Oppositional thinking analysis: Conspiracy theories vs critical thinking narratives

Sponsored by
Symanto Research

Synopsis

Conspiracy theories are complex narratives that attempt to explain the ultimate causes of significant events as covert plots orchestrated by secret, powerful, and malicious groups [1]. A challenging aspect of identifying conspiracy theories with NLP models [2] stems from the difficulty of distinguishing critical thinking from conspiratorial thinking in automatic content moderation. This distinction is vital because labeling a message as conspiratorial when it is merely oppositional could drive those who were simply asking questions into the arms of conspiracy communities.

At PAN 2024 we aim to analyze texts that reflect oppositional thinking and contain either conspiracy or critical narratives. The task addresses two new challenges for the NLP research community: (1) distinguishing the conspiracy narrative from other oppositional narratives that do not express a conspiracy mentality (i.e., critical thinking); and (2) identifying, in online messages, the key elements of a narrative that fuels intergroup conflict in oppositional thinking. To this end, we provide two text corpora, one in English and one in Spanish, and propose two subtasks:

1. Distinguishing between critical and conspiracy texts (subtask 1):

A binary classification task differentiating between (1) critical messages that question major decisions in the public health domain, but do not promote a conspiracist mentality; and (2) messages that view the pandemic or public health decisions as a result of a malevolent conspiracy by secret, influential groups.

  • Input: Set of texts, each associated with one of the two categories: CONSPIRACY, CRITICAL
  • Official evaluation metric: MCC [3] (see the sketch after this list)
  • Baselines: BERT classifier [4]
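
For illustration, the following is a minimal sketch of scoring subtask 1 predictions with MCC via scikit-learn; the toy labels are invented for the example, and this is not the official scorer:

    # Minimal MCC scoring sketch for subtask 1 (not the official scorer).
    # Gold and predicted labels are assumed to be parallel lists of strings.
    from sklearn.metrics import matthews_corrcoef

    gold = ["CONSPIRACY", "CRITICAL", "CRITICAL", "CONSPIRACY"]   # toy labels
    pred = ["CONSPIRACY", "CRITICAL", "CONSPIRACY", "CONSPIRACY"]

    # MCC lies in [-1, 1]: 1 is perfect prediction, 0 is chance level,
    # and -1 is total disagreement between predictions and gold labels.
    print(matthews_corrcoef(gold, pred))  # ~0.577 for this toy example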

2. Detecting elements of the oppositional narratives (subtask 2):

A token-level classification task aimed at recognizing text spans corresponding to the key elements of oppositional narratives. Since conspiracy narratives are a special kind of causal explanation, we developed a span-level annotation scheme that identifies the goals, effects, agents, and groups in conflict in these narratives.

  • Input: Set of texts, each text accompanied by a (possibly empty) list of span annotations. Each annotation corresponds to a narrative element and is described by its category and its boundaries (start and end character offsets). There are six distinct span categories: AGENT, FACILITATOR, VICTIM, CAMPAIGNER, OBJECTIVE, NEGATIVE_EFFECT
  • Official evaluation metric: span-F1 [5]
  • Baseline: BERT-based multi-task token classifier (separate classification heads, common transformer backbone) [6]
A GitHub repository with utilities, baselines, and additional instructions and guidelines can be found here.

Task

Conspiracy theories (CTs) are complex narratives that attempt to explain the ultimate causes of significant events as covert plots orchestrated by secret, powerful, and malicious groups [1]. Automatic detection of CTs in text has recently gained popularity [2, 7, 8, 9]. The problem is commonly framed as binary classification, with fine-grained approaches corresponding to multi-label or multi-class classification. Two recent MediaEval challenges on coarse- and fine-grained classification of conspiratorial text [8, 9] led to a number of approaches demonstrating that the state-of-the-art architecture is a multi-task classifier [10, 11, 12] based on the domain-specific CT-BERT model [12]. An LLM-based approach was also attempted [13].

However, existing approaches do not distinguish between critical and conspiratorial thinking. This distinction is important because labeling a text as conspiratorial when it is, in fact, merely oppositional to mainstream views could push those who were simply asking questions closer to conspiracy communities. As several authors from the social sciences suggest, a fully-fledged conspiratorial worldview is the final step of a progressive "spiritual journey" that sets out by questioning social and political orthodoxies [1, 14, 15]. Additionally, recent research [16] has shown that the level of interaction with conspiracist users is the most important feature for predicting whether users join conspiracy communities. These insights have an important implication for automatic content moderation: if models do not differentiate between critical and conspiratorial thinking, there is a high risk of pushing people toward conspiracy communities.

Another important gap is that the computational analysis of conspiratorial texts fails to address the role that intergroup conflict (IGC) [17] plays in these narratives. Intergroup conflict is a way of framing events by emphasizing the hostility between groups, typically through an "us versus them" narrative, and by fueling the perceived injustice and threat to the group. The increasing, and potentially violent, involvement of conspiracist communities in political processes suggests that one of the purposes of CTs is to reinforce IGC and coordinate action [18]. Therefore, tools that enable an IGC-based analysis of conspiratorial texts could offer valuable insights for content moderation.

Motivated by the described issues, we propose a novel annotation scheme that distinguishes between conspiracy and critical texts, and defines important categories of oppositional narratives. In addition to standard elements of conspiracy narratives such as agents (conspirators) and victims, the proposed scheme identifies the following categories: “facilitators” (collaborators of the agents, such as the media) and “campaigners” (those who unmask the conspiracy agenda). These types of actors are “key players” in IGC: the facilitators are tangible targets with whom real conflict is possible (in contrast to abstract agents such as secret groups), and the campaigners are those who show their opposition to the facilitators and try to persuade the victims to join their cause.

We focus on oppositional texts from the Telegram platform related to the COVID-19 pandemic, and construct English and Spanish corpora annotated with the described labeling schemes. This enables the NLP community to tackle two new tasks related to the two previously described phenomena: the binary classification task of distinguishing between conspiratorial and critical texts, and the task of detecting the elements of the oppositional narrative.

Award

We are pleased to announce that the best-performing team of the Oppositional Thinking Analysis task at PAN 2024 will be awarded 300 Euro, sponsored by Symanto.

Data [download]

The participants will work with a JSON file that contains all the texts of the training dataset, together with their annotations. Each text corresponds to a dictionary that contains the ID, the tokenized text, the binary category, and the span annotations. The span annotations are a list of dictionaries, each corresponding to an annotated span and containing the span’s category and text, together with its start and end character offsets. This is an example of a JSON dictionary corresponding to a single, fully annotated text:

    {
        "id": "91221",
        "text": "\" Scientism backed ... not \" \" science - backed \" \" . There is nothing scientific about the Covid or childhood vaccine quackery that is slowly but surely killing , maiming and neurologically injuring the next generation — just as the Clintons and other Moloch worshippers want . \" ",
        "category": "CONSPIRACY",
        "annotations": [
            {
                "span_text": "that is slowly but surely killing , maiming and neurologically injuring the next generation",
                "category": "NEGATIVE_EFFECT",
                "start_char": 128,
                "end_char": 219
            },
            {
                "span_text": "the next generation",
                "category": "VICTIM",
                "start_char": 200,
                "end_char": 219
            },
            {
                "span_text": "the Clintons",
                "category": "AGENT",
                "start_char": 230,
                "end_char": 242
            },
            {
                "span_text": "other Moloch worshippers",
                "category": "AGENT",
                "start_char": 247,
                "end_char": 271
            }
        ]
    }

At test time, the participants will receive text data in the above format, but for each text, only the “id” and “text” fields will be provided. The required output is a JSON file in the same format, with each text identified by its ID and annotated with the text category and the span annotations. More details about the data and the data utilities can be found in the task’s GitHub repository.
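
To make the I/O concrete, here is a minimal sketch of reading the training file and writing a (dummy) prediction file in the same format. The file names, the list-of-dictionaries top level, and the 0-based, end-exclusive character offsets are assumptions for illustration:

    # Minimal I/O sketch for the task data (file names are assumptions).
    import json

    with open("train.json", encoding="utf-8") as f:
        texts = json.load(f)  # assumed to be a list of per-text dictionaries

    for item in texts:
        text, label = item["text"], item["category"]
        for ann in item.get("annotations", []):
            # character offsets are assumed 0-based and end-exclusive
            span = text[ann["start_char"]:ann["end_char"]]

    # At test time only "id" and "text" are given; the output file uses the
    # same format, filled in with the predicted category and spans.
    predictions = [{"id": item["id"], "category": "CRITICAL", "annotations": []}
                   for item in texts]  # dummy predictions
    with open("predictions.json", "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)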

Submission

Details will be announced at a later date.

Evaluation

The official evaluation metric for subtask 1 (critical vs. conspiracy classification) is MCC [3], while the official metric for subtask 2 (span-level detection of narrative elements) is span-F1 [5]. For subtask 1 we will also provide per-class F1 scores, and for subtask 2 per-category span-F1 scores. For each of the task languages, English and Spanish, a separate ranking will be maintained for each subtask.
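
As an illustration of span-level scoring, below is a simplified, exact-match span-F1 sketch; the official scorer [5] may differ in details such as credit for partially overlapping spans, so this should be read as an approximation:

    # Simplified exact-match span-F1 sketch (the official metric [5] may
    # award partial credit for overlapping spans; this version does not).
    def span_f1(gold_spans, pred_spans):
        """Each span is a (text_id, category, start_char, end_char) tuple."""
        gold, pred = set(gold_spans), set(pred_spans)
        tp = len(gold & pred)                        # exact matches only
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    gold = [("91221", "AGENT", 230, 242), ("91221", "VICTIM", 200, 219)]
    pred = [("91221", "AGENT", 230, 242), ("91221", "AGENT", 247, 271)]
    print(span_f1(gold, pred))  # 0.5: one exact match out of two on each side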

We provide a strong baseline for each subtask. For subtask 1, the baseline is a standard BERT [4] classifier. For subtask 2, the baseline is a BERT-based multi-task token classifier (separate classification heads on a common transformer backbone) [6]. The baselines use either English or Spanish BERT models, depending on the language. The task’s GitHub repository contains the code of the baselines.
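
For orientation, here is a minimal sketch of what such a multi-task token classifier could look like in PyTorch with Hugging Face Transformers. The model name, the head layout, and the number of span labels (BIO tags over the six categories) are assumptions, not the exact baseline implementation:

    # Sketch of a multi-task classifier: a shared transformer backbone with a
    # binary text-classification head and a token-classification head.
    # Model name and label counts are assumptions, not the official baseline.
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class MultiTaskClassifier(nn.Module):
        def __init__(self, model_name="bert-base-cased", num_span_labels=13):
            # num_span_labels: assumed BIO tagging, 2 * 6 categories + "O" = 13
            super().__init__()
            self.backbone = AutoModel.from_pretrained(model_name)
            hidden = self.backbone.config.hidden_size
            # Head 1: CONSPIRACY vs. CRITICAL, predicted from the [CLS] token.
            self.binary_head = nn.Linear(hidden, 2)
            # Head 2: per-token span labels for the narrative elements.
            self.token_head = nn.Linear(hidden, num_span_labels)

        def forward(self, input_ids, attention_mask):
            out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
            states = out.last_hidden_state                  # (batch, seq, hidden)
            return self.binary_head(states[:, 0]), self.token_head(states)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = MultiTaskClassifier()
    batch = tokenizer(["There is nothing scientific about ..."],
                      return_tensors="pt", truncation=True)
    binary_logits, token_logits = model(batch["input_ids"], batch["attention_mask"])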

  • [1] K. M. Douglas and R. M. Sutton, “What Are Conspiracy Theories? A Definitional Approach to Their Correlates, Consequences, and Communication,” Annu. Rev. Psychol., vol. 74, no. 1, Jan. 2023.
  • [2] A. Giachanou, B. Ghanem, and P. Rosso, “Detection of conspiracy propagators using psycho-linguistic characteristics,” Journal of Information Science, vol. 49, no. 1, pp. 3–17, Feb. 2023.
  • [3] D. Chicco, N. Tötsch, and G. Jurman, “The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation,” BioData Mining, vol. 14, no. 1, p. 13, Feb. 2021.
  • [4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv, May 24, 2019.
  • [5] G. Da San Martino, S. Yu, A. Barrón-Cedeño, R. Petrov, and P. Nakov, “Fine-Grained Analysis of Propaganda in News Articles,” in Proceedings of EMNLP-IJCNLP 2019, Hong Kong, China, pp. 5636–5646, Association for Computational Linguistics, Nov. 2019.
  • [6] S. Ruder, “An Overview of Multi-Task Learning in Deep Neural Networks.” arXiv, Jun. 15, 2017.
  • [7] J. D. Moffitt, C. King, and K. M. Carley, “Hunting Conspiracy Theories During the COVID-19 Pandemic,” Social Media + Society, vol. 7, no. 3, p. 20563051211043212, Jul. 2021.
  • [8] K. Pogorelov, D. T. Schroeder, S. Brenner, and J. Langguth, “FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task at MediaEval 2021”.
  • [9] K. Pogorelov, D. T. Schroeder, S. Brenner, A. Maulana, and J. Langguth, “Combining Tweets and Connections Graph for FakeNews Detection at MediaEval 2022”.
  • [10] Y. Peskine, G. Alfarano, I. Harrando, P. Papotti, and R. Troncy, “Detecting COVID-19-Related Conspiracy Theories in Tweets”.
  • [11] Y. Peskine, P. Papotti, and R. Troncy, “Detection of COVID-19-Related Conspiracy Theories in Tweets using Transformer-Based Models and Node Embedding Techniques”.
  • [12] D. Korenčić, I. Grubišić, A. H. Toselli, B. Chulvi, and P. Rosso, “Tackling Covid-19 Conspiracies on Twitter using BERT Ensembles, GPT-3 Augmentation, and Graph NNs”.
  • [13] Y. Peskine, D. Korenčić, I. Grubišić, P. Papotti, R. Troncy, and P. Rosso, “Definitions Matter: Guiding GPT for Multi-label Classification,” in Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore: Association for Computational Linguistics, 2023, pp. 4054–4063.
  • [14] E. Funkhouser, “A tribal mind: Beliefs that signal group identity or commitment,” Mind & Language, vol. 37, no. 3, pp. 444–464, 2022.
  • [15] B. Franks, A. Bangerter, M. W. Bauer, M. Hall, and M. C. Noort, “Beyond ‘Monologicality’? Exploring Conspiracist Worldviews,” Frontiers in Psychology, vol. 8, 2017.
  • [16] S. Phadke, M. Samory, and T. Mitra, “What Makes People Join Conspiracy Communities? Role of Social Factors in Conspiracy Engagement,” Proc. ACM Hum.-Comput. Interact., vol. 4, no. CSCW3, p. 223:1-223:30, Jan. 2021.
  • [17] R. Böhm, H. Rusch, and J. Baron, “The psychology of intergroup conflict: A review of theories and measures,” Journal of Economic Behavior & Organization, vol. 178, pp. 947–962, Oct. 2020.
  • [18] P. Wagner-Egger, A. Bangerter, S. Delouvée, and S. Dieguez, “Awake together: Sociopsychological processes of engagement in conspiracist communities,” Current Opinion in Psychology, vol. 47, p. 101417, Oct. 2022.

Important Dates

  • February 23, 2024: Train data release
  • May 30, 2024: Software submission deadline
  • June 15, 2024: Participant paper submission (midnight CEST)
  • July 1, 2024: Peer review notification
  • July 7, 2024: Camera-ready participant paper submission (midnight CEST)

Funds


Project XAI-DisInfodemics: eXplainable AI for disinformation and conspiracy detection during infodemics (PLEC2021-007681), funded by MICIU/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”.

Task Committee