Profiling Fake News Spreaders on Twitter

Sponsored by
Symanto Research

Synopsis

  • Task: Given a Twitter feed, determine whether its author is likely to be a spreader of fake news.
  • Input: Timelines of users sharing fake news as per PolitiFact and Snopes; English and Spanish; 300 training cases each [data]
  • Evaluation: Accuracy
  • Submission: Deployment on TIRA [submit]
  • Baseline: Random, LSTM, Neural Network using word 1-3-grams, SVM using char 2-6-grams, Symanto LDSE, Emotionally-Infused Neural Network.

Task

Fake news has become one of the main threats to our society. Although fake news is not a new phenomenon, the exponential growth of social media has offered an easy platform for its rapid propagation. A great deal of fake news and rumors is propagated in online social networks, usually with the aim of deceiving users and shaping specific opinions. Users play a critical role in the creation and propagation of fake news online by consuming and sharing articles with inaccurate information, either intentionally or unintentionally. To this end, in this task we aim to identify possible fake news spreaders on social media as a first step towards preventing fake news from being propagated among online users.

After having addressed several aspects of author profiling in social media from 2013 to 2019 (bot detection, age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim to investigate whether it is possible to discriminate authors who have shared some fake news in the past from those who, to the best of our knowledge, have never done so.

As in previous years, we propose the task from a multilingual perspective:

  • English
  • Spanish
NOTE: Although we recommend participating in both languages (English and Spanish), it is possible to address the problem for just one language.

Award

We are happy to announce that the best performing team at the 8th International Competition on Author Profiling will be awarded 300,- Euro sponsored by Symanto.
This year, the winners of the task are (ex aequo):

  • Jakab Buda and Flora Bolonyai, Eötvös Loránd University, Hungary
  • Juan Pizarro, Chile

Data

Input

The uncompressed dataset consists of one folder per language (en, es). Each folder contains:
  • An XML file per author (Twitter user) with 100 tweets. The name of the XML file corresponds to the unique author id.
  • A truth.txt file with the list of authors and the ground truth.
The format of the XML files is:
    <author lang="en">
        <documents>
            <document>Tweet 1 textual contents</document>
            <document>Tweet 2 textual contents</document>
            ...
        </documents>
    </author>
      
The format of the truth.txt file is as follows. The first column corresponds to the author id. The second column contains the truth label.
    b2d5748083d6fdffec6c2d68d4d4442d:::0
    2bed15d46872169dc7deaf8d2b43a56:::0
    8234ac5cca1aed3f9029277b2cb851b:::1
    5ccd228e21485568016b4ee82deb0d28:::0
    60d068f9cafb656431e62a6542de2dc0:::1
    ...
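
The two formats above can be read with the Python standard library alone. The following is a minimal sketch, not an official loader: the function names load_truth and load_tweets are hypothetical, and it assumes that each language folder holds the author XML files next to truth.txt, as described above. Labels are kept as strings, exactly as they appear in the file.

```python
import os
import xml.etree.ElementTree as ET


def load_truth(lang_dir):
    """Read truth.txt into a dict mapping author id -> label (kept as a string)."""
    labels = {}
    with open(os.path.join(lang_dir, "truth.txt"), encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip empty lines
            # Each line has the form "<author id>:::<label>"
            author_id, label = line.split(":::")
            labels[author_id] = label
    return labels


def load_tweets(lang_dir):
    """Read every author XML file into a dict mapping author id -> list of tweets."""
    tweets = {}
    for name in sorted(os.listdir(lang_dir)):
        if not name.endswith(".xml"):
            continue  # ignores truth.txt and any other non-XML file
        # The file name (without extension) is the author id
        author_id = os.path.splitext(name)[0]
        root = ET.parse(os.path.join(lang_dir, name)).getroot()
        tweets[author_id] = [doc.text or "" for doc in root.iter("document")]
    return tweets
```

Calling load_tweets and load_truth on the same language folder yields two dictionaries keyed by author id, so the tweets and the label of an author can be joined by key.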
    

Output

Your software must take as input the absolute path to an unpacked dataset, and has to output, for each document of the dataset (that is, for each author XML file), a corresponding XML file that looks like this:

    <author id="author-id"
        lang="en|es"
        type="0|1"
    />
                              

The naming of the output files is up to you. However, we recommend using the author-id as the filename and "xml" as the extension.

IMPORTANT! Languages should not be mixed. A folder should be created for each language, and only the files with the predictions for that language should be placed inside it.
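
As an illustration of the output format, the sketch below writes one prediction file per author into a separate folder per language. It is only one possible way to do this: the function name write_prediction and its parameters are hypothetical, and naming the language folders en and es is an assumption based on the input layout.

```python
import os
import xml.etree.ElementTree as ET


def write_prediction(output_dir, author_id, lang, label):
    """Write <author id=... lang=... type=... /> to output_dir/<lang>/<author_id>.xml."""
    # One folder per language, so predictions for different languages never mix
    lang_dir = os.path.join(output_dir, lang)
    os.makedirs(lang_dir, exist_ok=True)
    element = ET.Element(
        "author", {"id": author_id, "lang": lang, "type": str(label)}
    )
    # Recommended naming: author id as filename, "xml" as extension
    path = os.path.join(lang_dir, author_id + ".xml")
    ET.ElementTree(element).write(path, encoding="utf-8")
    return path
```

For example, write_prediction(out_dir, "b2d5748083d6fdffec6c2d68d4d4442d", "en", 0) creates the file en/b2d5748083d6fdffec6c2d68d4d4442d.xml under out_dir.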

Evaluation

The performance of your system will be ranked by accuracy. For each language, we will calculate individual accuracies in discriminating between the two classes. Finally, we will average the accuracy values per language to obtain the final ranking.
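
The ranking measure described above can be sketched as follows. This is not the official evaluation script: the function names are hypothetical, and counting an author without a prediction as an error is an assumption.

```python
def accuracy(truth, predicted):
    """Fraction of authors in truth whose predicted label equals the true label."""
    # Assumption: an author missing from predicted counts as a wrong prediction
    correct = sum(
        1 for author_id, label in truth.items()
        if predicted.get(author_id) == label
    )
    return correct / len(truth)


def final_score(per_language_accuracy):
    """Unweighted mean of the per-language accuracies."""
    return sum(per_language_accuracy.values()) / len(per_language_accuracy)


# Example with the EN and ES accuracies of the first row of the results table
print(f"{final_score({'en': 0.7500, 'es': 0.8050}):.4f}")  # prints 0.7775
```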

Results

POS Team EN ES AVG
1 bolonyai20 0.7500 0.8050 0.7775
1 pizarro20 0.7350 0.8200 0.7775
- SYMANTO (LDSE) [1] 0.7450 0.7900 0.7675
3 koloski20 0.7150 0.7950 0.7550
3 deborjavalero20 0.7300 0.7800 0.7550
3 vogel20 0.7250 0.7850 0.7550
6 higueraporras20 0.7250 0.7750 0.7500
6 tarela20 0.7250 0.7750 0.7500
8 babaei20 0.7250 0.7650 0.7450
9 staykovski20 0.7050 0.7750 0.7400
9 hashemi20 0.6950 0.7850 0.7400
11 estevecasademunt20 0.7100 0.7650 0.7375
12 castellanospellecer20 0.7100 0.7600 0.7350
- SVM + c nGrams 0.6800 0.7900 0.7350
13 shrestha20 0.7100 0.7550 0.7325
13 tommasel20 0.6900 0.7750 0.7325
15 johansson20 0.7200 0.7350 0.7275
15 murauer20 0.6850 0.7700 0.7275
17 espinosagonzales20 0.6900 0.7600 0.7250
17 ikae20 0.7250 0.7250 0.7250
19 morenosandoval20 0.7150 0.7300 0.7225
20 majumder20 0.6400 0.8000 0.7200
20 sanchezromero20 0.6850 0.7550 0.7200
22 lopezchilet20 0.6800 0.7550 0.7175
22 nadalalmela20 0.6800 0.7550 0.7175
22 carrodve20 0.7100 0.7250 0.7175
25 gil20 0.6950 0.7350 0.7150
26 elexpuruortiz20 0.6800 0.7450 0.7125
26 labadietamayo20 0.7050 0.7200 0.7125
28 grafiaperez20 0.6750 0.7450 0.7100
28 jilka20 0.6650 0.7550 0.7100
28 lopezfernandez20 0.6850 0.7350 0.7100
31 pinnaparaju20 0.7150 0.7000 0.7075
31 aguirrezabal20 0.6900 0.7250 0.7075
33 kengyi20 0.6550 0.7550 0.7050
33 gowda20 0.6750 0.7350 0.7050
33 jakers20 0.6750 0.7350 0.7050
33 cosin20 0.7050 0.7050 0.7050
37 navarromartinez20 0.6600 0.7450 0.7025
38 heilmann20 0.6550 0.7450 0.7000
39 cardaioli20 0.6750 0.7150 0.6950
39 females20 0.6050 0.7850 0.6950
39 kaushikamardas20 0.7000 0.6900 0.6950
- NN + w nGrams 0.6900 0.7000 0.6950
42 monteroceballos20 0.6300 0.7450 0.6875
43 ogaltsov20 0.6950 0.6650 0.6800
44 botticebria20 0.6250 0.7200 0.6725
44 lichouri20 0.5850 0.7600 0.6725
46 manna20 0.5950 0.7250 0.6600
47 fersini20 0.6000 0.7150 0.6575
48 jardon20 0.5450 0.7500 0.6475
- EIN [2] 0.6400 0.6400 0.6400
49 shashirekha20 0.6200 0.6450 0.6325
50 datatontos20 0.7250 0.5300 0.6275
51 soleramo20 0.6100 0.6150 0.6125
- LSTM 0.5600 0.6000 0.5800
52 russo20 0.5800 0.5150 0.5475
53 igualadamoraga20 0.5250 0.5050 0.5150
- RANDOM 0.5100 0.5000 0.5050
54 hoertenhuemer20 0.725 -
55 duan20 0.720 -
55 andmangenix20 0.720 -
57 saeed20 0.700 -
58 baruah20 0.690 -
59 anthonio20 0.685 -
60 zhang20 0.670 -
61 espinosaruiz20 0.665 -
62 shen20 0.650 -
63 suareztrashorras20 0.640 -
64 niven20 0.610 -
65 margoes20 0.570 -
66 wu20 0.560 -

Task Committee