PAN @ CLEF 2016

This is the 15th evaluation lab on uncovering plagiarism, authorship, and social software misuse. PAN will be held as part of the CLEF conference in Évora, Portugal, on September 5-8, 2016. Evaluations will run from January to June. We invite you to take part in any of the three tasks shown below.


Author Identification

Given a document, who wrote it?

This task focuses on author clustering and author diarization (also known as intrinsic plagiarism detection). Both tasks are concerned with measuring author similarity across and within texts.


Author Profiling

Given a document, what are its author's traits?

This task is concerned with predicting an author's demographics from her writing. For example, an author's style may reveal her age and gender.


Author Obfuscation

Given a document, hide its author.

This task works against identification and profiling by automatically paraphrasing a text to obfuscate its author's style. The tasks offered are author masking and obfuscation evaluation.



Tommaso Fornaciari

Detecting translingual plagiarism: A forensic linguistic contribution to computational processing

Rui Sousa-Silva
Universidade do Porto

Plagiarism detection has evolved significantly in recent years, partly in response to the media attention attracted by high-profile plagiarism cases involving journalists and politicians. A culture of control has taken hold to guarantee integrity and honesty in all areas related to copyright and authorship, including the implementation of policies, codes of conduct, penalty tariffs, and text-matching detection software. The latter has improved dramatically alongside technological developments over the years. Today, instances of linguistic plagiarism can easily be matched to their original, and the differences between the plagiarised and the plagiarising texts can be pointed out. These methods work particularly well with same-language texts; however, systematically detecting translingual plagiarism (i.e. where a derivative text copies from a source in another language without attribution) remains a problem area. This is especially so because the number of possible language pairs is immense, which requires enormous data-processing power. This session presents illustrative cases of translingual plagiarism and discusses some of the approaches adopted by forensic linguists to investigate and prove that a certain translated text is an instance of plagiarism. The keynote concludes by encouraging a discussion of computational approaches that can be adopted to assist forensic linguists in their own investigations.

Rui Sousa-Silva is an assistant professor at the Faculty of Arts and a post-doctoral researcher at the Linguistics Centre (CLUP) of the University of Porto, where he is currently conducting research into Forensic Linguistics and Cybercrime. He holds a PhD in Applied Linguistics from Aston University (Birmingham, UK), where he submitted his thesis on Forensic Linguistics: ‘Detecting Plagiarism in the Forensic Linguistics Turn’. He has studied cross-cultural attitudes to plagiarism and proposed an original approach to translingual plagiarism detection. He has also authored and co-authored several papers on (computational) authorship analysis, and is co-editor, with Professor Malcolm Coulthard, of the recently founded international bilingual biennial journal Language and Law / Linguagem e Direito.

Bauhaus-Universität Weimar logo
Universitat Politècnica de València logo
University of the Aegean logo
Autoritas Consulting logo
WiQ-Ei logo
CLEF'16 logo
CLEF Initiative logo