Profiling Fake News Spreaders on Twitter 2020
Synopsis
- Task: Given a Twitter feed, determine whether its author is likely to be a spreader of fake news.
- Input: Timelines of users sharing fake news as per PolitiFact and Snopes; English and Spanish; 300 training cases each [data]
- Evaluation: Accuracy
- Submission: Deployment on TIRA [submit]
- Baseline: Random, LSTM, Neural Network using char 1-3-grams, SVM using char 2-6-grams, Symanto LDSE, Emotionally-Infused Neural Network.
Task
Fake news has become one of the main threats to our society. Although fake news is not a new phenomenon, the exponential growth of social media has offered an easy platform for its fast propagation. A great amount of fake news and rumors is propagated in online social networks, usually with the aim of deceiving users and shaping specific opinions. Users play a critical role in the creation and propagation of fake news online by consuming and sharing articles with inaccurate information, either intentionally or unintentionally. To this end, in this task, we aim at identifying possible fake news spreaders on social media as a first step towards preventing fake news from being propagated among online users.
After having addressed several aspects of author profiling in social media from 2013 to 2019 (bot detection, age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim at investigating whether it is possible to distinguish authors who have shared some fake news in the past from those who, to the best of our knowledge, have never done so.
As in previous years, we propose the task from a multilingual perspective:
- English
- Spanish
Award
We are happy to announce that the best performing team at the 8th International Competition on Author Profiling will be awarded 300,- Euro sponsored by Symanto.
This year, the winners of the task are (ex aequo):
- Jakab Buda and Flora Bolonyai, Eötvös Loránd University, Hungary
- Juan Pizarro, Chile
Data
Input
The uncompressed dataset consists of one folder per language (en, es). Each folder contains:
- An XML file per author (Twitter user) with 100 tweets. The name of the XML file corresponds to the unique author id.
- A truth.txt file with the list of authors and the ground truth.
<author lang="en">
    <documents>
        <document>Tweet 1 textual contents</document>
        <document>Tweet 2 textual contents</document>
        ...
    </documents>
</author>
The format of the truth.txt file is as follows. The first column corresponds to the author id. The second column contains the truth label.
b2d5748083d6fdffec6c2d68d4d4442d:::0
2bed15d46872169dc7deaf8d2b43a56:::0
8234ac5cca1aed3f9029277b2cb851b:::1
5ccd228e21485568016b4ee82deb0d28:::0
60d068f9cafb656431e62a6542de2dc0:::1
...
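The input format above could be read along the following lines. This is only a minimal sketch using the Python standard library, not part of the official task tooling; the function names (parse_truth, parse_author, load_language) are hypothetical, and the sketch assumes that the two columns of truth.txt are separated by ":::" as in the sample shown.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_truth(text):
    """Map author id -> integer label from 'author_id:::label' lines."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            author_id, label = line.split(":::")
            labels[author_id] = int(label)
    return labels


def parse_author(xml_data):
    """Return (lang, tweets) for the contents of one author XML file."""
    root = ET.fromstring(xml_data)
    tweets = [doc.text or "" for doc in root.iter("document")]
    return root.get("lang"), tweets


def load_language(folder):
    """Read one language folder into {author_id: (tweets, label)}.

    Assumption: every *.xml file in the folder is an author file whose
    name (without extension) is the author id used in truth.txt.
    """
    folder = Path(folder)
    labels = parse_truth((folder / "truth.txt").read_text(encoding="utf-8"))
    data = {}
    for xml_path in sorted(folder.glob("*.xml")):
        _, tweets = parse_author(xml_path.read_bytes())
        data[xml_path.stem] = (tweets, labels.get(xml_path.stem))
    return data
```

Calling load_language on the en or es folder of the unpacked dataset would then give one entry per author, with the label set to None for any author missing from truth.txt.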
Output
Your software must take as input the absolute path to an unpacked dataset and output, for each author in the dataset, a corresponding XML file that looks like this:
<author id="author-id" lang="en|es" type="0|1" />
The naming of the output files is up to you. However, we recommend using the author id as the filename and "xml" as the extension.
IMPORTANT! Languages should not be mixed. Create one folder per language and place in it only the prediction files for that language.
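As an illustration of the output requirements above, the following sketch writes one prediction file per author into a separate folder per language. It is a hedged example, not the official submission code: the function names (prediction_xml, write_predictions) and the folder layout output_dir/lang/author-id.xml are assumptions that follow the recommendation on file naming.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def prediction_xml(author_id, lang, label):
    """Build the <author id=... lang=... type=... /> element as a string."""
    elem = ET.Element(
        "author", {"id": author_id, "lang": lang, "type": str(label)}
    )
    return ET.tostring(elem, encoding="unicode")


def write_predictions(predictions, lang, output_dir):
    """Write one XML file per author into output_dir/<lang>/.

    predictions: dict mapping author id -> predicted label (0 or 1).
    Keeping one folder per language avoids mixing languages.
    """
    lang_dir = Path(output_dir) / lang
    lang_dir.mkdir(parents=True, exist_ok=True)
    for author_id, label in predictions.items():
        out_path = lang_dir / f"{author_id}.xml"
        out_path.write_text(
            prediction_xml(author_id, lang, label), encoding="utf-8"
        )
```

For example, write_predictions({"author-id": 1}, "en", "out") would create out/en/author-id.xml containing a single author element.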
Evaluation
The performance of your system will be ranked by accuracy. For each language, we will calculate individual accuracies in discriminating between the two classes. Finally, we will average the accuracy values per language to obtain the final ranking.
Results
POS | Team | EN | ES | AVG |
---|---|---|---|---|
1 | bolonyai20 | 0.7500 | 0.8050 | 0.7775 |
1 | pizarro20 | 0.7350 | 0.8200 | 0.7775 |
- | SYMANTO (LDSE) [1] | 0.7450 | 0.7900 | 0.7675 |
3 | koloski20 | 0.7150 | 0.7950 | 0.7550 |
3 | deborjavalero20 | 0.7300 | 0.7800 | 0.7550 |
3 | vogel20 | 0.7250 | 0.7850 | 0.7550 |
6 | higueraporras20 | 0.7250 | 0.7750 | 0.7500 |
6 | tarela20 | 0.7250 | 0.7750 | 0.7500 |
8 | babaei20 | 0.7250 | 0.7650 | 0.7450 |
9 | staykovski20 | 0.7050 | 0.7750 | 0.7400 |
9 | hashemi20 | 0.6950 | 0.7850 | 0.7400 |
11 | estevecasademunt20 | 0.7100 | 0.7650 | 0.7375 |
12 | castellanospellecer20 | 0.7100 | 0.7600 | 0.7350 |
- | SVM + c nGrams | 0.6800 | 0.7900 | 0.7350 |
13 | shrestha20 | 0.7100 | 0.7550 | 0.7325 |
13 | tommasel20 | 0.6900 | 0.7750 | 0.7325 |
15 | johansson20 | 0.7200 | 0.7350 | 0.7275 |
15 | murauer20 | 0.6850 | 0.7700 | 0.7275 |
17 | espinosagonzales20 | 0.6900 | 0.7600 | 0.7250 |
17 | ikae20 | 0.7250 | 0.7250 | 0.7250 |
19 | morenosandoval20 | 0.7150 | 0.7300 | 0.7225 |
20 | majumder20 | 0.6400 | 0.8000 | 0.7200 |
20 | sanchezromero20 | 0.6850 | 0.7550 | 0.7200 |
22 | lopezchilet20 | 0.6800 | 0.7550 | 0.7175 |
22 | nadalalmela20 | 0.6800 | 0.7550 | 0.7175 |
22 | carrodve20 | 0.7100 | 0.7250 | 0.7175 |
25 | gil20 | 0.6950 | 0.7350 | 0.7150 |
26 | elexpuruortiz20 | 0.6800 | 0.7450 | 0.7125 |
26 | labadietamayo20 | 0.7050 | 0.7200 | 0.7125 |
28 | grafiaperez20 | 0.6750 | 0.7450 | 0.7100 |
28 | jilka20 | 0.6650 | 0.7550 | 0.7100 |
28 | lopezfernandez20 | 0.6850 | 0.7350 | 0.7100 |
31 | pinnaparaju20 | 0.7150 | 0.7000 | 0.7075 |
31 | aguirrezabal20 | 0.6900 | 0.7250 | 0.7075 |
33 | kengyi20 | 0.6550 | 0.7550 | 0.7050 |
33 | gowda20 | 0.6750 | 0.7350 | 0.7050 |
33 | jakers20 | 0.6750 | 0.7350 | 0.7050 |
33 | cosin20 | 0.7050 | 0.7050 | 0.7050 |
37 | navarromartinez20 | 0.6600 | 0.7450 | 0.7025 |
38 | heilmann20 | 0.6550 | 0.7450 | 0.7000 |
39 | cardaioli20 | 0.6750 | 0.7150 | 0.6950 |
39 | females20 | 0.6050 | 0.7850 | 0.6950 |
39 | kaushikamardas20 | 0.7000 | 0.6900 | 0.6950 |
- | NN + w nGrams | 0.6900 | 0.7000 | 0.6950 |
42 | monteroceballos20 | 0.6300 | 0.7450 | 0.6875 |
43 | ogaltsov20 | 0.6950 | 0.6650 | 0.6800 |
44 | botticebria20 | 0.6250 | 0.7200 | 0.6725 |
45 | lichouri20 | 0.5850 | 0.7600 | 0.6725 |
46 | manna20 | 0.5950 | 0.7250 | 0.6600 |
47 | fersini20 | 0.6000 | 0.7150 | 0.6575 |
48 | jardon20 | 0.5450 | 0.7500 | 0.6475 |
- | EIN [2] | 0.6400 | 0.6400 | 0.6400 |
49 | shashirekha20 | 0.6200 | 0.6450 | 0.6325 |
50 | datatontos20 | 0.7250 | 0.5300 | 0.6275 |
51 | soleramo20 | 0.6100 | 0.6150 | 0.6125 |
- | LSTM | 0.5600 | 0.6000 | 0.5800 |
52 | russo20 | 0.5800 | 0.5150 | 0.5475 |
53 | igualadamoraga20 | 0.5250 | 0.5050 | 0.5150 |
- | RANDOM | 0.5100 | 0.5000 | 0.5050 |
54 | hoertenhuemer20 | 0.725 | - | |
55 | duan20 | 0.720 | - | |
55 | andmangenix20 | 0.720 | - | |
57 | saeed20 | 0.700 | - | |
58 | baruah20 | 0.690 | - | |
59 | anthonio20 | 0.685 | - | |
60 | zhang20 | 0.670 | - | |
61 | espinosaruiz20 | 0.665 | - | |
62 | shen20 | 0.650 | - | |
63 | suareztrashorras20 | 0.640 | - | |
64 | niven20 | 0.610 | - | |
65 | margoes20 | 0.570 | - | |
66 | wu20 | 0.560 | - | |
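The AVG column follows the ranking rule described under Evaluation: accuracy is computed per language and the two values are averaged. A minimal sketch of that computation is given below; the function names are hypothetical, and counting an author without a prediction as an error is an assumption of this sketch, not something stated by the task.

```python
def accuracy(truth, predicted):
    """Fraction of authors whose predicted label matches the truth.

    truth, predicted: dicts mapping author id -> label (0 or 1).
    Assumption: an author with no prediction counts as wrong.
    """
    if not truth:
        raise ValueError("truth must contain at least one author")
    correct = sum(
        1 for author_id, label in truth.items()
        if predicted.get(author_id) == label
    )
    return correct / len(truth)


def final_score(acc_en, acc_es):
    """Average the per-language accuracies, as in the AVG column."""
    return (acc_en + acc_es) / 2
```

For the first row of the table, final_score(0.7500, 0.8050) gives approximately 0.7775, which matches the reported AVG for bolonyai20.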
Related Work
- [1] Rangel F., Franco-Salvador M., Rosso P. A Low Dimensionality Representation for Language Variety Identification. In: Postproc. 17th Int. Conf. on Comput. Linguistics and Intelligent Text Processing, CICLing-2016, Springer-Verlag, Revised Selected Papers, Part II, LNCS(9624), pp. 156-169 (arXiv:1705.10754)
- [2] Ghanem, B., Rosso, P., and Rangel, F. (2020). An Emotional Analysis of False Information in Social Media and News Articles. ACM Transactions on Internet Technology (TOIT), 20(2), pp. 1-18.
- [3] Anastasia Giachanou, Paolo Rosso, Fabio Crestani. Leveraging Emotional Signals for Credibility Detection. Proceedings of the 42nd International ACM Conference on Research and Development in Information Retrieval (SIGIR). pp 877–880. (2019)
- [4] Andre Guess, Jonathan Nagler, and Joshua Tucker. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances vol. 5 (2019)
- [5] Andrew Hall, Loren Terveen, Aaron Halfaker. Bot Detection in Wikidata Using Behavioral and Other Informal Cues. Proceedings of the ACM on Human-Computer Interaction. 2018 Nov 1;2(CSCW):64.
- [6] Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum. DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp 22-32. (2018)
- [7] Francisco Rangel and Paolo Rosso. Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling in Twitter. In: L. Cappellato, N. Ferro, D. E. Losada and H. Müller (eds.) CLEF 2019 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 2380
- [8] Francisco Rangel, Paolo Rosso, Martin Potthast, Benno Stein. Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter. In: CLEF 2018 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 2125.
- [9] Francisco Rangel, Paolo Rosso, Martin Potthast, Benno Stein. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. In: Cappellato L., Ferro N., Goeuriot L, Mandl T. (Eds.) CLEF 2017 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1866.
- [10] Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, Benno Stein. Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations. In: Balog K., Cappellato L., Ferro N., Macdonald C. (Eds.) CLEF 2016 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1609, pp. 750-784
- [11] Francisco Rangel, Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, Walter Daelemans. Overview of the 3rd Author Profiling Task at PAN 2015. In: Linda Cappellato and Nicola Ferro and Gareth Jones and Eric San Juan (Eds.): CLEF 2015 Labs and Workshops, Notebook Papers, 8-11 September, Toulouse, France. CEUR Workshop Proceedings. ISSN 1613-0073, http://ceur-ws.org/Vol-1391/, 2015.
- [12] Francisco Rangel, Paolo Rosso, Irina Chugur, Martin Potthast, Martin Trenkmann, Benno Stein, Ben Verhoeven, Walter Daelemans. Overview of the 2nd Author Profiling Task at PAN 2014. In: Cappellato L., Ferro N., Halvey M., Kraaij W. (Eds.) CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180, pp. 898-927.
- [13] Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, Giacomo Inches. Overview of the Author Profiling Task at PAN 2013. In: Forner P., Navigli R., Tufis D. (Eds.) Notebook Papers of CLEF 2013 Labs and Workshops. CEUR-WS.org, vol. 1179
- [14] Francisco Rangel and Paolo Rosso. On the Implications of the General Data Protection Regulation on the Organisation of Evaluation Tasks. In: Language and Law / Linguagem e Direito, Vol. 5(2), pp. 80-102
- [15] Kai Shu, Suhang Wang, and Huan Liu. Understanding user profiles on social media for fake news detection. Proceedings of the IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 430-435 (2018)
- [16] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations Newsletter. (2017)