Shared Tasks

Important Dates

  • March 16, 2012: Training data release
  • May 18, 2012: Test data release
  • June 22, 2012: Result submission deadline
  • August 17, 2012: Paper submission: [template] [guidelines] [submission]
  • September 17-20, 2012: Conference

The timezone of all deadlines is Anywhere on Earth.

Keynotes

Roberto Navigli
Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?
Università La Sapienza, Roma

In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the knowledge resource with lexical information for all languages. We present experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource. In a second set of experiments, we show that, when provided with a vast amount of high-quality semantic relations, knowledge-rich word sense disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets. The second part of the talk is devoted to analyzing cases in which BabelNet can be of help in cross-language plagiarism detection. Can a large multilingual semantic network provide hints for detecting plagiarized text? We will see examples of how and when multilingual concepts and disambiguated text can support this task.

Ralf Steinberger
Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources
European Commission, Joint Research Centre (JRC), Ispra

A system that recognises cross-lingual plagiarism needs to establish – among other things – whether two pieces of text written in different languages are equivalent to each other. Potthast et al. (2010) give a thorough overview of this challenging task. While the Joint Research Centre (JRC) is not specifically concerned with plagiarism, it has been working for many years on developing other cross-lingual functionalities that may well be useful for the plagiarism detection task, i.e. (a) cross-lingual document similarity calculation, (b) subject domain profiling of documents in many different languages according to the same multilingual subject domain categorisation scheme, and (c) the recognition of name spelling variants for the same entity, both within the same language and across different languages and scripts. The speaker will explain the algorithms behind these software tools and he will present a number of freely available language resources that can be used to develop software with cross-lingual functionality.

Program

PAN's program is part of the CLEF conference program.

September 17
14:00-16:00 Lab overviews (Room Loyola)
15 min. talk PAN'12 - Uncovering Plagiarism, Authorship, and Social Software Misuse
Martin Potthast
16:00-16:30 Coffee Break
16:30-17:00 Poster Boaster Session
17:00-18:30 Poster Session (Room Galleria A)
Encoplot - Tuned for High Recall (also proposing a new plagiarism detection score) - Notebook for PAN at CLEF 2012
Cristian Grozea, Marius Popescu
A Set-Based Approach to Plagiarism Detection - Notebook for PAN at CLEF 2012
Robin Küppers, Stefan Conrad
Applying Specific Clusterization and Fingerprint Density Distribution with Genetic Algorithm Overall Tuning in External Plagiarism Detection - Notebook for PAN at CLEF 2012
Yurii Palkovskii, Alexei Belov
Detailed Comparison Module In CoReMo 1.9 Plagiarism Detector - Notebook for PAN at CLEF 2012
Diego A. Rodríguez Torrejón, José Manuel Martín Ramos
Optimized Fuzzy Text Alignment for Plagiarism Detection - Notebook for PAN at CLEF 2012
Fernando Sánchez-Vega, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda
Bootstrapped Authorship Attribution in Compression Space - Notebook for PAN at CLEF 2012
Ramon de Graaff, Cor J. Veenman
Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic Vector-Space Model with Extrinsic Features - Notebook for PAN at CLEF 2012
Julian Brooke, Graeme Hirst
Sub-Profiling by Linguistic Dimensions to Solve the Authorship Attribution Task - Notebook for PAN at CLEF 2012
Upendra Sapkota, Thamar Solorio
Information Retrieval and Classification based Approaches for the Sexual Predator Identification - Notebook for PAN at CLEF 2012
Darnes Vilariño, Esteban Castillo, David Pinto, Iván Olmos, Saul León
September 19 (Room Trilussa)
Quality Flaw Prediction in Wikipedia, Chair: Matthias Hagen
10:30-10:45 Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia
Maik Anderka, Benno Stein
10:45-11:15 On the Use of PU Learning for Quality Flaw Prediction in Wikipedia - Notebook for PAN at CLEF 2012
Edgardo Ferretti, Donato Hernández Fusilier, Rafael Guzmán Cabrera, Manuel Montes-y-Gómez, Marcelo Errecalde, Paolo Rosso
11:15-11:30 FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia - Notebook for PAN at CLEF 2012
Oliver Ferschke, Iryna Gurevych, Marc Rittberger
Plagiarism Detection, Chair: Matthias Hagen
11:30-12:00 Overview of the 4th International Competition on Plagiarism Detection
Martin Potthast, Tim Gollub, Matthias Hagen, Jan Graßegger, Johannes Kiesel, Maximilian Michel, Arnd Oberländer, Martin Tippmann, Alberto Barrón-Cedeño, Parth Gupta, Paolo Rosso, Benno Stein
12:00-12:15 Approaches for Candidate Document Retrieval and Detailed Comparison of Plagiarism Detection - Notebook for PAN at CLEF 2012
Kong Leilei, Qi Haoliang, Wang Shuai, Du Cuixia, Wang Suhong, Han Yong
12:15-12:30 Educated guesses and equality judgements: using search engines and pairwise match for external plagiarism detection - Notebook for PAN at CLEF 2012
Lee Gillam, Neil Newbold, Neil Cooke
12:30-14:00 Lunch
Plagiarism Detection, Chair: Paolo Rosso
14:00-14:15 Three way search engine queries with multi-feature document comparison for plagiarism detection - Notebook for PAN at CLEF 2012
Simon Suchomel, Jan Kasprzak, and Michal Brandejs
Cross-Language Plagiarism Detection (Keynotes and Panel Discussion), Chair: Paolo Rosso
14:15-15:00 Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources
Ralf Steinberger
15:00-15:45 Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?
Roberto Navigli
15:45-16:00 Panel discussion
September 20 (Room Leopardi)
Traditional Authorship Attribution, Chair: Efstathios Stamatatos
9:30-10:00 An Overview of the Traditional Authorship Attribution Subtask + Mixture of Experts Authorship Attribution
Patrick Juola and Michael Ryan, John Noecker Jr
10:00-10:15 Authorship attribution: using rich linguistic features when training data is scarce - Notebook for PAN at CLEF 2012
Ludovic Tanguy, Franck Sajous, Basilio Calderone, Nabil Hathout
10:15-10:30 Feature Bagging for Author Attribution - Notebook for PAN at CLEF 2012
François-Marie Giraud, Thierry Artières
10:30-11:00 Break
Traditional Authorship Attribution & Sexual Predator Identification, Chair: Patrick Juola
11:00-11:15 Graph-based and Lexical-Syntactic Approaches for the Authorship Attribution Task - Notebook for PAN at CLEF 2012
Esteban Castillo, Darnes Vilariño, David Pinto, Iván Olmos, Jesús A. González, Maya Carrillos
11:15-11:45 Overview of the International Sexual Predator Identification Competition at PAN-2012
Giacomo Inches, Fabio Crestani
11:45-12:00 Vote/Veto Classification, Ensemble Clustering and Sequence Classification for Author Identification - Notebook for PAN at CLEF 2012
Roman Kern, Stefan Klampfl and Mario Zechner
12:00-12:15 Quite Simple Approaches for Authorship Attribution, Intrinsic Plagiarism Detection and Sexual Predator Identification - Notebook for PAN at CLEF 2012
Anna Vartapetiance, Lee Gillam
12:15-12:30 Kernel Methods and String Kernels for Authorship Analysis - Notebook for PAN at CLEF 2012
Marius Popescu, Cristian Grozea
12:30-12:45 Conversation Level Constraints on Pedophile Detection in Chat Rooms - Notebook for PAN at CLEF 2012
Claudia Peersman, Frederik Vaassen, Vincent Van Asch, Walter Daelemans
12:45-13:00 Identifying Predators Using ChatCoder 2.0 - Notebook for PAN at CLEF 2012
April Kontostathis, Will West, Andy Garron, Kelly Reynolds, Lynne Edwards
13:00-14:00 Lunch
Sexual Predator Identification, Chair: Giacomo Inches
14:00-14:15 A Two-step Approach for Effective Detection of Misbehaving Users in Chats - Notebook for PAN at CLEF 2012
Esaú Villatoro-Tello, Antonio Juárez-González, Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Villaseñor-Pineda
14:15-14:30 A Learning-Based Approach for the Identification of Sexual Predators in Chat Logs - Notebook for PAN at CLEF 2012
Javier Parapar, David E. Losada, Alvaro Barreiro
14:30-14:45 Features for modelling characteristics of conversations - Notebook for PAN at CLEF 2012
Gunnar Eriksson, Jussi Karlgren
14:45-15:00 Identifying Sexual Predators by SVM Classification with Lexical and Behavioral Features - Notebook for PAN at CLEF 2012
Colin Morris, Graeme Hirst

Organizing Committee