PAN @ CLEF 2012

This is the 7th evaluation lab on uncovering plagiarism, authorship, and social software misuse. PAN was held as part of the CLEF conference in Rome, Italy, on September 17-20, 2012. Evaluations ran from January to June in the three tasks described below.

Plagiarism Detection

Given a document, is it an original?

This task is divided into source retrieval and text alignment. Source retrieval is about searching for likely sources of a suspicious document. Text alignment is about matching passages of reused text between a pair of documents.

Author Identification

Given a document, who wrote it?

This task is divided into authorship attribution and sexual predator identification. For the former, we consider open/closed class situations for attribution as well as author clustering and intrinsic plagiarism. For the latter, the goal is to identify sexual predators in chat logs.

Wikipedia Quality Flaw Prediction

Wikimedia Deutschland

Given a Wikipedia article, what are its quality flaws?

This task is concerned with predicting the ten most frequent quality flaws of English Wikipedia articles, such as "citation needed", orphan, or advert.


Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?

Roberto Navigli
Università La Sapienza, Roma

In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is constructed automatically by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, machine translation is applied to enrich the resource with lexical information for all languages. We present experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource. In a second set of experiments, we show that, when provided with a vast amount of high-quality semantic relations, knowledge-rich word sense disambiguation (WSD) algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets. The second part of the talk is devoted to analyzing cases in which BabelNet can help in cross-language plagiarism detection. Can a large multilingual semantic network provide hints for detecting plagiarized text? We will see examples of how and when multilingual concepts and disambiguated text can support this task.

Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources

Ralf Steinberger
European Commission, Joint Research Centre (JRC), Ispra

A system that recognises cross-lingual plagiarism needs to establish, among other things, whether two pieces of text written in different languages are equivalent to each other. Potthast et al. (2010) give a thorough overview of this challenging task. While the Joint Research Centre (JRC) is not specifically concerned with plagiarism, it has been working for many years on developing other cross-lingual functionalities that may well be useful for the plagiarism detection task, namely (a) cross-lingual document similarity calculation, (b) subject domain profiling of documents in many different languages according to the same multilingual subject domain categorisation scheme, and (c) the recognition of name spelling variants for the same entity, both within the same language and across different languages and scripts. The speaker will explain the algorithms behind these software tools and present a number of freely available language resources that can be used to develop software with cross-lingual functionality.
