Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO) 2022

Synopsis

Task: Given a Twitter feed in English, determine whether its author spreads Irony and Stereotypes.
Input: Timelines of authors sharing Irony and Stereotypes towards, for instance, women or the LGBT community.
600 users with 200 English tweets each.
Classes: (1) Irony with Stereotypes, (2) Irony without Stereotypes, (3) Stereotypes but no Irony, (4) Neither
Evaluation: Accuracy.
Submission: Although deployment on the TIRA platform is preferred to guarantee the reproducibility of the results, participants can also upload their runs in another way this year: they can bypass the VMs, download the test set (see Data), and upload only the predictions in the output format specified by the shared task organizers (like in the good, old, non-reproducible days). Participants must upload their results in a single zip archive (.zip).
Baselines: Character/word n-grams+ SVM/Logistic Regression, LDSE, ...
Results (47 Submissions)
Best approach: Soft-voting BERTweet Ensemble [Wentao Yu et al.]

Task

With irony, language is employed in a figurative and subtle way to mean the opposite of what is literally stated. In the case of sarcasm, a more aggressive type of irony, the intent is to mock or scorn a victim without excluding the possibility of hurting them. Stereotypes are often used, especially in discussions about controversial issues such as immigration or sexism and misogyny. At PAN’22, we will focus on profiling ironic authors on Twitter. Special emphasis will be given to those authors who employ irony to spread stereotypes, for instance, towards women or the LGBT community. The goal will be to classify authors as ironic or not depending on their number of tweets with ironic content. Among those authors, we will consider a subset that employs irony to convey stereotypes, in order to investigate whether state-of-the-art models are also able to distinguish these cases. Therefore, given Twitter authors together with their tweets, the goal will be to profile those authors who can be considered ironic.

For those who like challenges, there is also the opportunity to participate in the IROSTEREO subtask that addresses Stereotype Stance Detection. Ironic authors have employed stereotypes either to hurt the target (e.g. immigrants) or to somehow defend it. The goal of this subtask will be to detect the stance with which ironic authors use stereotypes: in favour of or against the target. Therefore, given the subset of ironic authors who employed stereotypes in some of their tweets, the goal will be to detect their overall stance.

Award

We are happy to announce that the best-performing team at the 10th International Competition on Author Profiling will be awarded 300,- Euro sponsored by Symanto.

This year, the winner of the task is:

Wentao Yu, Benedikt Boenninghoff, and Dorothea Kolossa, Institute of Communication Acoustics, Ruhr University Bochum, Germany

Data

Input

The uncompressed dataset consists of a folder which contains:

An XML file per author (Twitter user) with 200 tweets. The name of the XML file corresponds to the unique author id.
A truth.txt file with the list of authors and the ground truth.

The format of the XML files is:

                <author lang="en">
                <documents>
                <document>Tweet 1 textual contents</document>
                <document>Tweet 2 textual contents</document>
                ...
                </documents>
                </author>
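
As an illustration, an author file in this format could be read with Python's standard library roughly as follows. This is a minimal sketch, not part of the official task tooling: the function names read_author_file and read_all_authors are hypothetical, and it assumes each file is named <author-id>.xml as described above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_author_file(xml_path):
    """Return (author_id, language, tweets) for one author XML file."""
    root = ET.parse(str(xml_path)).getroot()   # the <author> element
    author_id = Path(xml_path).stem            # file name without ".xml"
    lang = root.get("lang")                    # e.g. "en"
    # Each <document> element holds the text of one tweet.
    tweets = [doc.text or "" for doc in root.iter("document")]
    return author_id, lang, tweets


def read_all_authors(dataset_dir):
    """Map every author id found in the folder to its list of tweets."""
    feeds = {}
    for xml_path in sorted(Path(dataset_dir).glob("*.xml")):
        author_id, _lang, tweets = read_author_file(xml_path)
        feeds[author_id] = tweets
    return feeds
```

Calling read_all_authors on the unpacked dataset folder would then give one list of tweets per author id; truth.txt is skipped because only *.xml files are matched.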

The format of the truth.txt file is as follows. The first column corresponds to the author id. The second column contains the truth label.

            2d0d4d7064787300c111033e1d2270cc:::I
            b9eccce7b46cc0b951f6983cc06ebb8:::NI
            f41251b3d64d13ae244dc49d8886cf07:::I
            47c980972060055d7f5495a5ba3428dc:::NI
            d8ed8de45b73bbcf426cdc9209e4bfbc:::I
            2746a9bf36400367b63c925886bc0683:::NI
            ...

Regarding the subtask on stance detection, the format will be the same except for the classes, whose labels are: INFAVOR and AGAINST.
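
The truth file can be parsed by splitting each line on the ":::" separator. The following is a minimal sketch under that assumption; read_truth is a hypothetical helper name, and the same code would apply to the stance subtask, where the labels are INFAVOR and AGAINST.

```python
def read_truth(truth_path):
    """Parse lines of the form '<author-id>:::<label>' into a dict."""
    labels = {}
    with open(truth_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue                      # skip blank lines
            author_id, label = line.split(":::", 1)
            labels[author_id] = label
    return labels
```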

Output

Your software must take as input the absolute path to an unpacked dataset, and must output, for each author file in the dataset, a corresponding XML file that looks like this:

                    <author id="author-id"
                    lang="en"
                    type="NI|I"
                    />

The naming of the output files is up to you. However, we recommend using the author-id as the filename and "xml" as the extension.

Regarding the subtask on stance detection, the format will be the same except for the classes, whose labels are: INFAVOR and AGAINST.
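
One way to produce such output files is sketched below with Python's standard library. This is an illustration only: write_prediction is a hypothetical helper name, and the sketch follows the recommended naming (author-id plus the "xml" extension). The exact whitespace and line breaks of the serialized element may differ from the sample above while carrying the same attributes.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_prediction(output_dir, author_id, label, lang="en"):
    """Write one <author id=... lang=... type=... /> file and return its path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)   # create the folder if needed
    element = ET.Element(
        "author", {"id": author_id, "lang": lang, "type": label}
    )
    out_path = out_dir / f"{author_id}.xml"
    ET.ElementTree(element).write(str(out_path), encoding="utf-8")
    return out_path
```

For the main task the label would be "I" or "NI"; for the stance subtask, "INFAVOR" or "AGAINST".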

Evaluation

The performance of your system will be ranked by accuracy.
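
Accuracy here is the fraction of authors whose predicted label matches the ground truth. A minimal sketch of that computation follows; it is not the official evaluator, the function name is hypothetical, and counting authors without a prediction as errors is an assumption of this sketch.

```python
def accuracy(truth, predictions):
    """Fraction of authors in `truth` whose predicted label is correct.

    Both arguments map author ids to labels. Authors missing from
    `predictions` are counted as wrong (an assumption of this sketch).
    """
    if not truth:
        raise ValueError("truth must contain at least one author")
    correct = sum(
        1 for author_id, label in truth.items()
        if predictions.get(author_id) == label
    )
    return correct / len(truth)
```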

For those who participate in the subtask on stance detection of ironic users towards stereotypes, the evaluation metric will be the macro-averaged F-measure (Macro-F). We will also analyse the precision, recall and F-measure per class to look into the performance of the systems for each class (in favour vs. against).
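
The per-class precision, recall and F-measure, and their macro average, can be computed as sketched below. This is an illustrative implementation of the standard definitions, not the organizers' evaluation script; the function names are hypothetical, and the sketch assumes a class with no predicted or no true instances scores 0.

```python
def per_class_scores(truth, predictions, classes=("INFAVOR", "AGAINST")):
    """Return {class: (precision, recall, f1)} over the authors in `truth`."""
    scores = {}
    for cls in classes:
        tp = sum(1 for a, l in truth.items()
                 if l == cls and predictions.get(a) == cls)
        fp = sum(1 for a, l in truth.items()
                 if l != cls and predictions.get(a) == cls)
        fn = sum(1 for a, l in truth.items()
                 if l == cls and predictions.get(a) != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[cls] = (precision, recall, f1)
    return scores


def macro_f1(truth, predictions, classes=("INFAVOR", "AGAINST")):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(truth, predictions, classes)
    return sum(f1 for _p, _r, f1 in scores.values()) / len(scores)
```

Because the macro average weights both classes equally, a system that always predicts the majority stance is penalized more than it would be under accuracy.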

Results

POS	Team	Accuracy
1	wentaoyu	0.9944
2	harshv	0.9778
3	edapal	0.9722
3	ikae	0.9722
5	JoseAGD	0.9667
5	Enrub	0.9667
7	fsolgui	0.9611
7	claugomez	0.9611
9	AngelAso	0.9556
9	alvaro	0.9556
9	xhuang	0.9556
9	toshevska	0.9556
9	tfnribeiro_g	0.9556
14	josejaviercalvo	0.9500
14	taunk	0.9500
14	your	0.9500
14	PereMarco	0.9500
14	Garcia_Sanches	0.9500
19	pigeon	0.9444
19	xmpeiro	0.9444
19	marcosiino	0.9444
19	dingtli	0.9444
19	moncho	0.9444
19	yifanxu	0.9444
19	yzhang	0.9444
19	longma	0.9444
	LDSE	0.9389
27	missino	0.9389
27	badjack	0.9389
27	sgomw	0.9389
27	wangbin	0.9389
27	caohaojie	0.9389
32	lwblinwenbin	0.9333
32	xuyifan	0.9333
32	dirazuherfa	0.9333
32	Los Pablos	0.9333
32	Metalumnos	0.9333
37	narcis	0.9278
37	stm	0.9278
37	huangxt233	0.9278
40	lzy	0.9222
40	avazbar	0.9222
40	fragilro	0.9222
40	whoami	0.9222
40	Garcia_Grau	0.9222
45	hjang	0.9167
45	nigarsas	0.9167
45	fernanda	0.9167
45	Hyewon	0.9167
49	zyang	0.9056
50	giglou	0.9000
50	sulee	0.9000
52	ehsan.tavan	0.8889
53	rlad	0.8778
54	balouchzahi	0.8722
	RF + char 2-ngrams	0.8610
55	manexagirrezabalgmail	0.8500
	LR + word 1-ngrams	0.8490
56	tamayo	0.8111
57	yuandong	0.7500
	LSTM+Bert-encoding	0.6940
58	G-Lab	0.6778
58	AmitDasRup	0.6778
60	Alpine_EP	0.6722
61	Kminos	0.6667
62	castro	0.6389
63	castroa	0.5833
64	sokhandan	0.5333
64	leila	0.5333

Results in the Stance Detection Subtask

POS	Team	Run	F1-Macro
-	LDSE		0.7600
1	dirazuherfa	3	0.6248
2	dirazuherfa	4	0.5807
-	RF + char 3-ngram		0.5673
3	toshevska	2	0.5545
4	dirazuherfa	1	0.5433
5	JoseAGD	1	0.5312
6	tamayo	1	0.4886
7	dirazuherfa	2	0.4876
8	tamayo	2	0.4685
-	SVM+word 2-ngram		0.4685
9	AmitDasRup	1	0.4563
10	toshevska	4	0.4444
10	taunk	1	0.4444
12	toshevska	3	0.4393
13	AmitDasRup	2	0.4357
14	toshevska	1	0.4340
15	fernanda	1	0.3119

Related Work

[1] Valerio Basile, Cristina Bosco, Elisabetta Fersini, Dora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, Manuela Sanguinetti (2019). SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. Proc. SemEval 2019
[2] Sánchez-Junquera J., Chulvi B., Rosso P., Ponzetto S. How Do You Speak about Immigrants? Taxonomy and StereoImmigrants Dataset for Identifying Stereotypes about Immigrants. In: Applied Sciences, 11(8), 3610, 2021 https://doi.org/10.3390/app11083610
[3] Sánchez-Junquera J., Rosso P., Montes-y-Gómez M., Chulvi B. Masking and BERT-based Models for Stereotype Identification. In: Procesamiento del Lenguaje Natural (SEPLN), num. 67, pp. 83-94, 2021
[4] Zhang S., Zhang X., Chan J., Rosso P. Irony Detection via Sentiment-based Transfer Learning. In: Information Processing & Management, vol. 56, issue 5, pp. 1633-1644, 2019
[5] Sulis E., Hernández I., Rosso P., Patti V., Ruffo G. Figurative Messages and Affect in Twitter: Differences Between #irony, #sarcasm and #not. In: Knowledge-Based Systems, vol. 108, pp. 132–143, 2016
[6] Hernández I., Patti V., Rosso P. Irony Detection in Twitter: The Role of Affective Content. In: ACM Transactions on Internet Technology, 16(3):1-24, 2016
[7] Ghosh A., Li G., Veale T., Rosso P., Shutova E., Barnden J., Reyes A. Semeval-2015 task 11: Sentiment Analysis of Figurative Language in Twitter. In: Proc. 9th Int. Workshop on Semantic Evaluation (SemEval 2015), Co-located with NAACL, Denver, Colorado, 4-5 June. Association for Computational Linguistics, pp. 470–478, 2015
[8] Reyes A., Rosso P. On the Difficulty of Automatically Detecting Irony: Beyond a Simple Case of Negation. In: Knowledge and Information Systems, 40(3):595-614, 2014
[9] Reyes A., Rosso P., Veale T. A Multidimensional Approach for Detecting Irony in Twitter. In: Language Resources and Evaluation, 47(1):239-268, 2013
[10] Reyes A., Rosso P., Buscaldi D. From Humor Recognition to Irony Detection: The Figurative Language of Social Media. In: Data & Knowledge Engineering, vol. 74, pp.1-12, 2012
[11] Francisco Rangel, Gretel Liz De La Peña Sarracén, Berta Chulvi, Elisabetta Fersini, Paolo Rosso. Profiling Hate Speech Spreaders on Twitter Task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, vol. 2936, pp. 1772-1789
[12] Francisco Rangel, Anastasia Giachanou, Bilal Ghanem, Paolo Rosso. Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In: L. Cappellato, C. Eickhoff, N. Ferro, and A. Névéol (eds.) CLEF 2020 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 2696
[13] Francisco Rangel and Paolo Rosso. Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling in Twitter. In: L. Cappellato, N. Ferro, D. E. Losada and H. Müller (eds.) CLEF 2019 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 2380
[14] Francisco Rangel, Paolo Rosso, Martin Potthast, Benno Stein. Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter. In: CLEF 2018 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 2125.
[15] Francisco Rangel, Paolo Rosso, Martin Potthast, Benno Stein. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. In: Cappellato L., Ferro N., Goeuriot L., Mandl T. (Eds.) CLEF 2017 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1866.
[16] Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, Benno Stein. Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations. In: Balog K., Cappellato L., Ferro N., Macdonald C. (Eds.) CLEF 2016 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1609, pp. 750-784
[17] Francisco Rangel, Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, Walter Daelemans. Overview of the 3rd Author Profiling Task at PAN 2015. In: Linda Cappellato and Nicola Ferro and Gareth Jones and Eric San Juan (Eds.): CLEF 2015 Labs and Workshops, Notebook Papers, 8-11 September, Toulouse, France. CEUR Workshop Proceedings. ISSN 1613-0073, http://ceur-ws.org/Vol-1391/, 2015.
[18] Francisco Rangel, Paolo Rosso, Irina Chugur, Martin Potthast, Martin Trenkmann, Benno Stein, Ben Verhoeven, Walter Daelemans. Overview of the 2nd Author Profiling Task at PAN 2014. In: Cappellato L., Ferro N., Halvey M., Kraaij W. (Eds.) CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180, pp. 898-827.
[19] Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, Giacomo Inches. Overview of the Author Profiling Task at PAN 2013. In: Forner P., Navigli R., Tufis D. (Eds.) Notebook Papers of CLEF 2013 Labs and Workshops. CEUR-WS.org, vol. 1179
[20] Francisco Rangel and Paolo Rosso. On the Implications of the General Data Protection Regulation on the Organisation of Evaluation Tasks. In: Language and Law / Linguagem e Direito, Vol. 5(2), pp. 80-102
[21] Francisco Rangel, Marc Franco-Salvador, Paolo Rosso. A Low Dimensionality Representation for Language Variety Identification. In: Postproc. 17th Int. Conf. on Comput. Linguistics and Intelligent Text Processing, CICLing-2016, Springer-Verlag, Revised Selected Papers, Part II, LNCS(9624), pp. 156-169 (arXiv:1705.10754)