Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO) 2022

Sponsored by
Symanto Research


  • Task: Given a Twitter feed in English, determine whether its author spreads Irony and Stereotypes.
  • Input: Timelines of authors sharing Irony and Stereotypes towards, for instance, women or the LGTB community [data].
    600 users with 200 English tweets each.
    Classes: (1) Irony with Stereotypes, (2) Irony without Stereotypes, (3) Stereotypes but no Irony, (4) Neither
  • Evaluation: Accuracy.
  • Submission: Deployment on the TIRA platform is preferred because it guarantees the reproducibility of the results, but this year participants can also submit their runs in another way: bypass the VMs, download the test set from ( Data ), and upload only the predictions in the output format specified by the shared task organizers (like in the good, old, non-reproducible days). Participants must upload their results in a single zip archive (.zip).
  • Baselines: Character/word n-grams+ SVM/Logistic Regression, LDSE, ...
  • Results (47 Submissions)
    Best approach: Soft-voting BERTweet Ensemble [Wentao Yu et al.]
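The best approach listed above is a soft-voting ensemble. As a minimal sketch of what soft voting means in general (this is not the winning team's implementation), the per-class probabilities of several models are averaged and the class with the highest mean wins. The function name and the label names `ironic` and `not_ironic` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def soft_vote(model_probs: Sequence[Dict[str, float]]) -> str:
    """Average per-class probabilities over several models and return the top class."""
    if not model_probs:
        raise ValueError("need the output of at least one model")
    totals: Dict[str, float] = {}
    for probs in model_probs:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    # Dividing by the number of models keeps the values readable as mean probabilities.
    means = {label: total / len(model_probs) for label, total in totals.items()}
    # Sorting first makes ties resolve deterministically (alphabetically first label).
    return max(sorted(means), key=means.get)


# Hypothetical outputs of three models for one author.
outputs = [
    {"ironic": 0.90, "not_ironic": 0.10},
    {"ironic": 0.40, "not_ironic": 0.60},
    {"ironic": 0.45, "not_ironic": 0.55},
]
print(soft_vote(outputs))  # -> ironic
```

In this example two of the three models lean towards `not_ironic`, so a majority (hard) vote would pick that class, while the averaged probabilities favour `ironic` because the first model is very confident.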


With irony, language is employed in a figurative and subtle way to mean the opposite of what is literally stated. In the case of sarcasm, a more aggressive type of irony, the intent is to mock or scorn a victim without excluding the possibility of hurting them. Stereotypes are often used, especially in discussions about controversial issues such as immigration or sexism and misogyny. At PAN’22, we will focus on profiling ironic authors on Twitter. Special emphasis will be given to those authors that employ irony to spread stereotypes, for instance, towards women or the LGTB community. The goal will be to classify authors as ironic or not depending on their number of tweets with ironic content. Among those authors, we will consider a subset that employs irony to convey stereotypes in order to investigate whether state-of-the-art models are also able to distinguish these cases. Therefore, given Twitter authors together with their tweets, the goal will be to profile those authors that can be considered ironic.

For those who like challenges, there is also the opportunity to participate in the IROSTEREO subtask that addresses Stereotype Stance Detection. Stereotypes have been employed by ironic authors either to hurt the target (e.g. immigrants) or to somehow defend it. The goal of this subtask will be to detect how ironic authors use stereotypes, that is, whether in favour of or against the target. Therefore, given the subset of ironic authors that employed stereotypes in some of their tweets, the goal will be to detect their overall stance.


We are happy to announce that the best performing team at the 10th International Competition on Author Profiling will be awarded 300,- Euro sponsored by Symanto.

This year, the winner of the task is:
  • Wentao Yu, Benedikt Boenninghoff, and Dorothea Kolossa, Institute of Communication Acoustics, Ruhr University Bochum, Germany



The uncompressed dataset consists of a folder which contains:
  • An XML file per author (Twitter user) with 200 tweets. The name of the XML file corresponds to the unique author id.
  • A truth.txt file with the list of authors and the ground truth.
The format of the XML files is:
                <author lang="en">
                <document>Tweet 1 textual contents</document>
                <document>Tweet 2 textual contents</document>
                ...
                </author>
The format of the truth.txt file is as follows. The first column corresponds to the author id. The second column contains the truth label.
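As a rough sketch of how such a folder could be read, the following code collects the tweets of every author and joins them with the labels from truth.txt. The function names are hypothetical. The column separator is not shown on this page, so `":::"` is only an assumed default that should be checked against the actual file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List, Optional


def read_author_tweets(xml_path: Path) -> List[str]:
    """Return the text of every <document> element in one author file."""
    root = ET.parse(str(xml_path)).getroot()
    # iter() walks all descendants, so any wrapper element around the tweets is tolerated.
    return [(doc.text or "").strip() for doc in root.iter("document")]


def read_truth(truth_path: Path, sep: str = ":::") -> Dict[str, str]:
    """Map author id (first column) to truth label (second column).

    The separator is an assumption; adjust `sep` to the real file.
    """
    labels: Dict[str, str] = {}
    for line in truth_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        author_id, label = line.split(sep, 1)
        labels[author_id.strip()] = label.strip()
    return labels


def load_dataset(folder: Path, sep: str = ":::") -> Dict[str, Dict[str, object]]:
    """Read every author XML file in `folder` and attach its truth label."""
    truth = read_truth(folder / "truth.txt", sep)
    dataset: Dict[str, Dict[str, object]] = {}
    for xml_path in sorted(folder.glob("*.xml")):
        author_id = xml_path.stem  # the file name is the unique author id
        label: Optional[str] = truth.get(author_id)
        dataset[author_id] = {"tweets": read_author_tweets(xml_path), "label": label}
    return dataset
```

Calling `load_dataset(Path("/path/to/unpacked/dataset"))` would return one entry per author, with `label` set to `None` for authors missing from truth.txt.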

Regarding the subtask on stance detection, the format will be the same except for the classes, whose labels are: INFAVOR and AGAINST.


Your software must take as input the absolute path to an unpacked dataset, and has to output, for each document of the dataset, a corresponding XML file that looks like this:

                    <author id="author-id"

The naming of the output files is up to you. However, we recommend using the author-id as the filename and "xml" as the extension.
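A possible way to produce these files and pack them into the single zip archive required for submission is sketched below. The function names are hypothetical. The output example on this page only shows the `id` attribute, so the attribute that carries the predicted class (`type` here) is an assumption that must be checked against the official output specification.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path
from typing import Dict


def write_predictions(predictions: Dict[str, str], out_dir: Path,
                      label_attr: str = "type") -> None:
    """Write one <author .../> file per author, named <author-id>.xml.

    `label_attr` is an assumed attribute name for the predicted class.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for author_id, label in predictions.items():
        element = ET.Element("author", {"id": author_id, label_attr: label})
        ET.ElementTree(element).write(str(out_dir / f"{author_id}.xml"), encoding="utf-8")


def zip_run(out_dir: Path, zip_path: Path) -> None:
    """Pack all prediction files of a run into a single .zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for xml_path in sorted(out_dir.glob("*.xml")):
            # Store files flat, without the local directory structure.
            archive.write(xml_path, arcname=xml_path.name)
```

For example, `write_predictions({"author-id": "INFAVOR"}, Path("run"))` followed by `zip_run(Path("run"), Path("run.zip"))` would create `run/author-id.xml` and an archive containing it.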

Regarding the subtask on stance detection, the format will be the same except for the classes, whose labels are: INFAVOR and AGAINST.


The performance of your system will be ranked by accuracy.
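For reference, accuracy is the fraction of authors whose predicted label equals the truth label. The sketch below is not the official evaluator; the function name is hypothetical, and counting authors without a prediction as errors is an assumption.

```python
from typing import Dict


def accuracy(truth: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Fraction of authors in `truth` whose predicted label matches.

    Authors without a prediction count as errors (an assumption).
    """
    if not truth:
        raise ValueError("truth must contain at least one author")
    correct = sum(1 for author_id, label in truth.items()
                  if predicted.get(author_id) == label)
    return correct / len(truth)


# Hypothetical author ids and labels: two of four authors are predicted correctly.
truth = {"a": "X", "b": "Y", "c": "X", "d": "Y"}
predicted = {"a": "X", "b": "X", "c": "X"}
print(accuracy(truth, predicted))  # -> 0.5
```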

For those who participate in the subtask on stance detection of ironic users towards stereotypes, the evaluation metric will be Macro-F1. We will also analyse the precision, recall and F-measure per class to look into the performance of the systems regarding each possibility (in favour vs. against).
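The subtask measures can be illustrated as follows: precision, recall and F-measure are computed for each class, and the macro score is the unweighted mean of the per-class F-measures. This is a sketch rather than the official evaluation script; the function names are hypothetical, and scoring a class with no predictions or no gold instances as 0 is an assumption.

```python
from typing import Dict, Sequence, Tuple


def per_class_prf(gold: Sequence[str], pred: Sequence[str],
                  labels: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for every label."""
    scores: Dict[str, Tuple[float, float, float]] = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        # Undefined ratios are scored as 0.0 (an assumption).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("INFAVOR", "AGAINST")) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_prf(gold, pred, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


gold = ["INFAVOR", "INFAVOR", "AGAINST", "AGAINST"]
pred = ["INFAVOR", "AGAINST", "AGAINST", "AGAINST"]
print(per_class_prf(gold, pred, ["INFAVOR", "AGAINST"]))
print(round(macro_f1(gold, pred), 4))  # -> 0.7333
```

In this toy example INFAVOR has precision 1.0 and recall 0.5, AGAINST has precision 2/3 and recall 1.0, so the macro score averages F-measures of about 0.667 and 0.8.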


POS Team Accuracy
1 wentaoyu 0.9944
2 harshv 0.9778
3 edapal 0.9722
3 ikae 0.9722
5 JoseAGD 0.9667
5 Enrub 0.9667
7 fsolgui 0.9611
7 claugomez 0.9611
9 AngelAso 0.9556
9 alvaro 0.9556
9 xhuang 0.9556
9 toshevska 0.9556
9 tfnribeiro_g 0.9556
14 josejaviercalvo 0.9500
14 taunk 0.9500
14 your 0.9500
14 PereMarco 0.9500
14 Garcia_Sanches 0.9500
19 pigeon 0.9444
19 xmpeiro 0.9444
19 marcosiino 0.9444
19 dingtli 0.9444
19 moncho 0.9444
19 yifanxu 0.9444
19 yzhang 0.9444
19 longma 0.9444
- LDSE 0.9389
27 missino 0.9389
27 badjack 0.9389
27 sgomw 0.9389
27 wangbin 0.9389
27 caohaojie 0.9389
32 lwblinwenbin 0.9333
32 xuyifan 0.9333
32 dirazuherfa 0.9333
32 Los Pablos 0.9333
32 Metalumnos 0.9333
37 narcis 0.9278
37 stm 0.9278
37 huangxt233 0.9278
40 lzy 0.9222
40 avazbar 0.9222
40 fragilro 0.9222
40 whoami 0.9222
40 Garcia_Grau 0.9222
45 hjang 0.9167
45 nigarsas 0.9167
45 fernanda 0.9167
45 Hyewon 0.9167
49 zyang 0.9056
50 giglou 0.9000
50 sulee 0.9000
52 ehsan.tavan 0.8889
53 rlad 0.8778
54 balouchzahi 0.8722
- RF + char 2-ngrams 0.8610
55 manexagirrezabalgmail 0.8500
- LR + word 1-ngrams 0.8490
56 tamayo 0.8111
57 yuandong 0.7500
- LSTM+Bert-encoding 0.6940
58 G-Lab 0.6778
58 AmitDasRup 0.6778
60 Alpine_EP 0.6722
61 Kminos 0.6667
62 castro 0.6389
63 castroa 0.5833
64 sokhandan 0.5333
64 leila 0.5333

Results in the Stance Detection Subtask

POS Team Run F1-Macro
- LDSE 0.7600
1 dirazuherfa 3 0.6248
2 dirazuherfa 4 0.5807
- RF + char 3-ngram 0.5673
3 toshevska 2 0.5545
4 dirazuherfa 1 0.5433
5 JoseAGD 1 0.5312
6 tamayo 1 0.4886
7 dirazuherfa 2 0.4876
8 tamayo 2 0.4685
- SVM+word 2-ngram 0.4685
9 AmitDasRup 1 0.4563
10 toshevska 4 0.4444
10 taunk 1 0.4444
12 toshevska 3 0.4393
13 AmitDasRup 2 0.4357
14 toshevska 1 0.4340
15 fernanda 1 0.3119

Related Work

  • [1] Valerio Basile, Cristina Bosco, Elisabetta Fersini, Dora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, Manuela Sanguinetti (2019). SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. Proc. SemEval 2019
  • [2] Sánchez-Junquera J., Chulvi B., Rosso P., Ponzetto S. How Do You Speak about Immigrants? Taxonomy and StereoImmigrants Dataset for Identifying Stereotypes about Immigrants. In: Applied Sciences, 11(8), 3610, 2021
  • [3] Sánchez-Junquera J., Rosso P., Montes-y-Gómez M., Chulvi B. Masking and BERT-based Models for Stereotype Identification. In: Procesamiento del Lenguaje Natural (SEPLN), num. 67, pp. 83-94, 2021
  • [4] Zhang S., Zhang X., Chan J., Rosso P. Irony Detection via Sentiment-based Transfer Learning. In: Information Processing & Management, vol. 56, issue 5, pp. 1633-1644, 2019
  • [5] Sulis E., Hernández I., Rosso P., Patti V., Ruffo G. Figurative Messages and Affect in Twitter: Differences Between #irony, #sarcasm and #not. In: Knowledge-Based Systems, vol. 108, pp. 132–143, 2016
  • [6] Hernández I., Patti V., Rosso P. Irony Detection in Twitter: The Role of Affective Content. In: ACM Transactions on Internet Technology, 16(3):1-24, 2016
  • [7] Ghosh A., Li G., Veale T., Rosso P., Shutova E., Barnden J., Reyes A. Semeval-2015 task 11: Sentiment Analysis of Figurative Language in Twitter. In: Proc. 9th Int. Workshop on Semantic Evaluation (SemEval 2015), Co-located with NAACL, Denver, Colorado, 4-5 June. Association for Computational Linguistics, pp. 470–478, 2015
  • [8] Reyes A., Rosso P. On the Difficulty of Automatically Detecting Irony: Beyond a Simple Case of Negation. In: Knowledge and Information Systems, 40(3):595-614, 2014
  • [9] Reyes A., Rosso P., Veale T. A Multidimensional Approach for Detecting Irony in Twitter. In: Language Resources and Evaluation, 47(1):239-268, 2013
  • [10] Reyes A., Rosso P., Buscaldi D. From Humor Recognition to Irony Detection: The Figurative Language of Social Media. In: Data & Knowledge Engineering, vol. 74, pp.1-12, 2012
  • [11] Francisco Rangel, Gretel Liz De La Peña Sarracén, Berta Chulvi, Elisabetta Fersini, Paolo Rosso. Profiling Hate Speech Spreaders on Twitter Task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers, vol. 2936, pp. 1772-1789
  • [12] Francisco Rangel, Anastasia Giachanou, Bilal Ghanem, Paolo Rosso. Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In: L. Cappellato, C. Eickhoff, N. Ferro, and A. Névéol (eds.) CLEF 2020 Labs and Workshops, Notebook Papers. CEUR Workshop, vol. 2696
  • [13] Francisco Rangel and Paolo Rosso. Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling in Twitter. In: L. Cappellato, N. Ferro, D. E. Losada and H. Müller (eds.) CLEF 2019 Labs and Workshops, Notebook Papers. CEUR Workshop, vol. 2380
  • [14] Francisco Rangel, Paolo Rosso, Martin Potthast, Benno Stein. Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter. In: CLEF 2018 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings, vol. 2125
  • [15] Francisco Rangel, Paolo Rosso, Martin Potthast, Benno Stein. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. In: Cappellato L., Ferro N., Goeuriot L., Mandl T. (Eds.) CLEF 2017 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings, vol. 1866
  • [16] Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, Benno Stein. Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations. In: Balog K., Cappellato L., Ferro N., Macdonald C. (Eds.) CLEF 2016 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings, vol. 1609, pp. 750-784
  • [17] Francisco Rangel, Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, Walter Daelemans. Overview of the 3rd Author Profiling Task at PAN 2015. In: Linda Cappellato, Nicola Ferro, Gareth Jones, and Eric San Juan (Eds.) CLEF 2015 Labs and Workshops, Notebook Papers, 8-11 September, Toulouse, France. CEUR Workshop Proceedings, ISSN 1613-0073, 2015
  • [18] Francisco Rangel, Paolo Rosso, Irina Chugur, Martin Potthast, Martin Trenkmann, Benno Stein, Ben Verhoeven, Walter Daelemans. Overview of the 2nd Author Profiling Task at PAN 2014. In: Cappellato L., Ferro N., Halvey M., Kraaij W. (Eds.) CLEF 2014 Labs and Workshops, Notebook Papers, vol. 1180, pp. 898-927
  • [19] Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, Giacomo Inches. Overview of the Author Profiling Task at PAN 2013. In: Forner P., Navigli R., Tufis D. (Eds.) Notebook Papers of CLEF 2013 Labs and Workshops, vol. 1179
  • [20] Francisco Rangel and Paolo Rosso. On the Implications of the General Data Protection Regulation on the Organisation of Evaluation Tasks. In: Language and Law / Linguagem e Direito, Vol. 5(2), pp. 80-102
  • [21] Francisco Rangel, Marc Franco-Salvador, Paolo Rosso. A Low Dimensionality Representation for Language Variety Identification. In: Postproc. 17th Int. Conf. on Comput. Linguistics and Intelligent Text Processing, CICLing-2016, Springer-Verlag, Revised Selected Papers, Part II, LNCS(9624), pp. 156-169 (arXiv:1705.10754)

Task Committee