PAN at CLEF 2012


Shared Tasks

Important Dates

March 16, 2012: Training data release
May 18, 2012: Test data release
June 22, 2012: Result submission deadline
August 17, 2012: Paper submission
September 17-20, 2012: Conference

The timezone of all deadlines is Anywhere on Earth.

Keynotes

Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?

Roberto Navigli

Università La Sapienza, Roma

In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is constructed automatically by a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, machine translation is applied to enrich the resource with lexical information for all languages. We present experiments on new and existing gold-standard datasets that show the high quality and coverage of the resource. In a second set of experiments, we show that, when provided with a vast amount of high-quality semantic relations, knowledge-rich word sense disambiguation (WSD) algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets. The second part of the talk is devoted to analyzing cases in which BabelNet can help in cross-language plagiarism detection. Can a large multilingual semantic network provide hints for detecting plagiarized text? We will see examples of how and when multilingual concepts and disambiguated text can support this task.

Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources

Ralf Steinberger

European Commission, Joint Research Centre (JRC), Ispra

A system that recognises cross-lingual plagiarism needs to establish, among other things, whether two pieces of text written in different languages are equivalent to each other. Potthast et al. (2010) give a thorough overview of this challenging task. While the Joint Research Centre (JRC) is not specifically concerned with plagiarism, it has been working for many years on developing other cross-lingual functionalities that may well be useful for the plagiarism detection task, namely (a) cross-lingual document similarity calculation, (b) subject domain profiling of documents in many different languages according to the same multilingual subject domain categorisation scheme, and (c) the recognition of name spelling variants for the same entity, both within the same language and across different languages and scripts. The speaker will explain the algorithms behind these software tools and present a number of freely available language resources that can be used to develop software with cross-lingual functionality.

Program

PAN's program is part of the CLEF conference program.

September 17
14:00-16:00	Lab overviews (Room Loyola)
15 min. talk	PAN'12 - Uncovering Plagiarism, Authorship, and Social Software Misuse Martin Potthast
16:00-16:30	Coffee Break
16:30-17:00	Poster Boaster Session
17:00-18:30	Poster Session (Room Galleria A)
	Encoplot - Tuned for High Recall (also proposing a new plagiarism detection score) - Notebook for PAN at CLEF 2012 Cristian Grozea, Marius Popescu
	A Set-Based Approach to Plagiarism Detection - Notebook for PAN at CLEF 2012 Robin Küppers, Stefan Conrad
	Applying Specific Clusterization and Fingerprint Density Distribution with Genetic Algorithm Overall Tuning in External Plagiarism Detection - Notebook for PAN at CLEF 2012 Yurii Palkovskii, Alexei Belov
	Detailed Comparison Module In CoReMo 1.9 Plagiarism Detector - Notebook for PAN at CLEF 2012 Diego A. Rodríguez Torrejón, José Manuel Martín Ramos
	Optimized Fuzzy Text Alignment for Plagiarism Detection - Notebook for PAN at CLEF 2012 Fernando Sánchez-Vega, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda
	Bootstrapped Authorship Attribution in Compression Space - Notebook for PAN at CLEF 2012 Ramon de Graaff, Cor J. Veenman
	Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic Vector-Space Model with Extrinsic Features - Notebook for PAN at CLEF 2012 Julian Brooke, Graeme Hirst
	Sub-Profiling by Linguistic Dimensions to Solve the Authorship Attribution Task - Notebook for PAN at CLEF 2012 Upendra Sapkota, Thamar Solorio
	Information Retrieval and Classification based Approaches for the Sexual Predator Identification - Notebook for PAN at CLEF 2012 Darnes Vilariño, Esteban Castillo, David Pinto, Iván Olmos, Saul León
September 19 (Room Trilussa)
	Quality Flaw Prediction in Wikipedia, Chair: Matthias Hagen
10:30-10:45	Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia Maik Anderka, Benno Stein
10:45-11:15	On the Use of PU Learning for Quality Flaw Prediction in Wikipedia - Notebook for PAN at CLEF 2012 Edgardo Ferretti, Donato Hernández Fusilier, Rafael Guzmán Cabrera, Manuel Montes-y-Gómez, Marcelo Errecalde, Paolo Rosso
11:15-11:30	FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia - Notebook for PAN at CLEF 2012 Oliver Ferschke, Iryna Gurevych, Marc Rittberger
	Plagiarism Detection, Chair: Matthias Hagen
11:30-12:00	Overview of the 4th International Competition on Plagiarism Detection Martin Potthast, Tim Gollub, Matthias Hagen, Jan Graßegger, Johannes Kiesel, Maximilian Michel, Arnd Oberländer, Martin Tippmann, Alberto Barrón-Cedeño, Parth Gupta, Paolo Rosso, Benno Stein
12:00-12:15	Approaches for Candidate Document Retrieval and Detailed Comparison of Plagiarism Detection - Notebook for PAN at CLEF 2012 Kong Leilei, Qi Haoliang, Wang Shuai, Du Cuixia, Wang Suhong, Han Yong
12:15-12:30	Educated guesses and equality judgements: using search engines and pairwise match for external plagiarism detection - Notebook for PAN at CLEF 2012 Lee Gillam, Neil Newbold, Neil Cooke
12:30-14:00	Lunch
	Plagiarism Detection, Chair: Paolo Rosso
14:00-14:15	Three way search engine queries with multi-feature document comparison for plagiarism detection - Notebook for PAN at CLEF 2012 Simon Suchomel, Jan Kasprzak, and Michal Brandejs
	Cross-Language Plagiarism Detection (Keynotes and Panel Discussion), Chair: Paolo Rosso
14:15-15:00	Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources Ralf Steinberger
15:00-15:45	Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection? Roberto Navigli
15:45-16:00	Panel discussion
September 20 (Room Leopardi)
	Traditional Authorship Attribution, Chair: Efstathios Stamatatos
9:30-10:00	An Overview of the Traditional Authorship Attribution Subtask + Mixture of Experts Authorship Attribution Patrick Juola and Michael Ryan, John Noecker Jr
10:00-10:15	Authorship attribution: using rich linguistic features when training data is scarce - Notebook for PAN at CLEF 2012 Ludovic Tanguy, Franck Sajous, Basilio Calderone, Nabil Hathout
10:15-10:30	Feature Bagging for Author Attribution - Notebook for PAN at CLEF 2012 François-Marie Giraud, Thierry Artières
10:30-11:00	Break
	Traditional Authorship Attribution & Sexual Predator Identification, Chair: Patrick Juola
11:00-11:15	Graph-based and Lexical-Syntactic Approaches for the Authorship Attribution Task - Notebook for PAN at CLEF 2012 Esteban Castillo, Darnes Vilariño, David Pinto, Iván Olmos, Jesús A. González, Maya Carrillos
11:15-11:45	Overview of the International Sexual Predator Identification Competition at PAN-2012 Giacomo Inches, Fabio Crestani
11:45-12:00	Vote/Veto Classification, Ensemble Clustering and Sequence Classification for Author Identification - Notebook for PAN at CLEF 2012 Roman Kern, Stefan Klampfl and Mario Zechner
12:00-12:15	Quite Simple Approaches for Authorship Attribution, Intrinsic Plagiarism Detection and Sexual Predator Identification - Notebook for PAN at CLEF 2012 Anna Vartapetiance, Lee Gillam
12:15-12:30	Kernel Methods and String Kernels for Authorship Analysis - Notebook for PAN at CLEF 2012 Marius Popescu, Cristian Grozea
12:30-12:45	Conversation Level Constraints on Pedophile Detection in Chat Rooms - Notebook for PAN at CLEF 2012 Claudia Peersman, Frederik Vaassen, Vincent Van Asch, Walter Daelemans
12:45-13:00	Identifying Predators Using ChatCoder 2.0 - Notebook for PAN at CLEF 2012 April Kontostathis, Will West, Andy Garron, Kelly Reynolds, Lynne Edwards
13:00-14:00	Lunch
	Sexual Predator Identification, Chair: Giacomo Inches
14:00-14:15	A Two-step Approach for Effective Detection of Misbehaving Users in Chats - Notebook for PAN at CLEF 2012 Esaú Villatoro-Tello, Antonio Juárez-González, Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Villaseñor-Pineda
14:15-14:30	A Learning-Based Approach for the Identification of Sexual Predators in Chat Logs - Notebook for PAN at CLEF 2012 Javier Parapar, David E. Losada, Alvaro Barreiro
14:30-14:45	Features for modelling characteristics of conversations - Notebook for PAN at CLEF 2012 Gunnar Eriksson, Jussi Karlgren
14:45-15:00	Identifying Sexual Predators by SVM Classification with Lexical and Behavioral Features - Notebook for PAN at CLEF 2012 Colin Morris, Graeme Hirst