
Workshop Keynotes

PAN is co-located with the CLEF conference and will be held from September 17 to 20, 2012.

Roberto Navigli
Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?
Università La Sapienza, Roma

In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, machine translation is applied to enrich the knowledge resource with lexical information for all languages. We present experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource. In a second set of experiments, we show that, when provided with a vast amount of high-quality semantic relations, knowledge-rich word sense disambiguation (WSD) algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets. The second part of the talk is devoted to analyzing cases in which BabelNet can be of help in cross-language plagiarism detection. Can a large multilingual semantic network provide hints for detecting plagiarized text? We will see examples of how and when multilingual concepts and disambiguated text can support this task.

Ralf Steinberger
Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources
European Commission, Joint Research Centre (JRC), Ispra

A system that recognises cross-lingual plagiarism needs to establish, among other things, whether two pieces of text written in different languages are equivalent to each other. Potthast et al. (2010) give a thorough overview of this challenging task. While the Joint Research Centre (JRC) is not specifically concerned with plagiarism, it has been working for many years on developing other cross-lingual functionalities that may well be useful for the plagiarism detection task, namely (a) cross-lingual document similarity calculation, (b) subject domain profiling of documents in many different languages according to the same multilingual subject domain categorisation scheme, and (c) the recognition of name spelling variants for the same entity, both within the same language and across different languages and scripts. The speaker will explain the algorithms behind these software tools and present a number of freely available language resources that can be used to develop software with cross-lingual functionality.