In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the knowledge resource with lexical information for all languages. We present experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource. In a second set of experiments, we show that, when provided with a vast amount of high-quality semantic relations, knowledge-rich word sense disambiguation (WSD) algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets. The second part of the talk is devoted to analyzing cases in which BabelNet can be of help in cross-language plagiarism detection. Can a large multilingual semantic network provide hints for detecting plagiarized text? We will see examples of how and when multilingual concepts and disambiguated text can support this task.
PAN at CLEF 2012
Shared Tasks
- Authorship Attribution
- Sexual Predator Identification
- Wikipedia Quality Flaw Prediction
- Source Retrieval
- Text Alignment
Important Dates
- March 16, 2012: Training data release
- May 18, 2012: Test data release
- June 22, 2012: Result submission deadline
- August 17, 2012: Paper submission
- September 17-20, 2012: Conference
The timezone of all deadlines is Anywhere on Earth.
Keynotes
A system that recognises cross-lingual plagiarism needs to establish – among other things – whether two pieces of text written in different languages are equivalent to each other. Potthast et al. (2010) give a thorough overview of this challenging task. While the Joint Research Centre (JRC) is not specifically concerned with plagiarism, it has been working for many years on developing other cross-lingual functionalities that may well be useful for the plagiarism detection task, i.e. (a) cross-lingual document similarity calculation, (b) subject domain profiling of documents in many different languages according to the same multilingual subject domain categorisation scheme, and (c) the recognition of name spelling variants for the same entity, both within the same language and across different languages and scripts. The speaker will explain the algorithms behind these software tools and he will present a number of freely available language resources that can be used to develop software with cross-lingual functionality.
Program
PAN's program is part of the CLEF conference program.
September 17 | |
Lab overviews (Room Loyola) |
15 min. talk | PAN'12 - Uncovering Plagiarism, Authorship, and Social Software Misuse Martin Potthast |
16:00-16:30 | Coffee Break |
16:30-17:00 | Poster Boaster Session |
17:00-18:30 | Poster Session (Room Galleria A) |
Encoplot - Tuned for High Recall (also proposing a new plagiarism detection score) - Notebook for PAN at CLEF 2012 Cristian Grozea, Marius Popescu |
A Set-Based Approach to Plagiarism Detection - Notebook for PAN at CLEF 2012 Robin Küppers, Stefan Conrad |
Applying Specific Clusterization and Fingerprint Density Distribution with Genetic Algorithm Overall Tuning in External Plagiarism Detection - Notebook for PAN at CLEF 2012 Yurii Palkovskii, Alexei Belov |
Detailed Comparison Module In CoReMo 1.9 Plagiarism Detector - Notebook for PAN at CLEF 2012 Diego A. Rodríguez Torrejón, José Manuel Martín Ramos |
Optimized Fuzzy Text Alignment for Plagiarism Detection - Notebook for PAN at CLEF 2012 Fernando Sánchez-Vega, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda |
Bootstrapped Authorship Attribution in Compression Space - Notebook for PAN at CLEF 2012 Ramon de Graaff, Cor J. Veenman |
Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic Vector-Space Model with Extrinsic Features - Notebook for PAN at CLEF 2012 Julian Brooke, Graeme Hirst |
Sub-Profiling by Linguistic Dimensions to Solve the Authorship Attribution Task - Notebook for PAN at CLEF 2012 Upendra Sapkota, Thamar Solorio |
Information Retrieval and Classification based Approaches for the Sexual Predator Identification - Notebook for PAN at CLEF 2012 Darnes Vilariño, Esteban Castillo, David Pinto, Iván Olmos, Saul León |
September 19 (Room Trilussa) | |
Quality Flaw Prediction in Wikipedia, Chair: Matthias Hagen | |
10:30-10:45 | Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia Maik Anderka, Benno Stein |
10:45-11:15 | On the Use of PU Learning for Quality Flaw Prediction in Wikipedia - Notebook for PAN at CLEF 2012 Edgardo Ferretti, Donato Hernández Fusilier, Rafael Guzmán Cabrera, Manuel Montes-y-Gómez, Marcelo Errecalde, Paolo Rosso |
11:15-11:30 | FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia - Notebook for PAN at CLEF 2012 Oliver Ferschke, Iryna Gurevych, Marc Rittberger |
Plagiarism Detection, Chair: Matthias Hagen | |
11:30-12:00 | Overview of the 4th International Competition on Plagiarism Detection Martin Potthast, Tim Gollub, Matthias Hagen, Jan Graßegger, Johannes Kiesel, Maximilian Michel, Arnd Oberländer, Martin Tippmann, Alberto Barrón-Cedeño, Parth Gupta, Paolo Rosso, Benno Stein |
12:00-12:15 | Approaches for Candidate Document Retrieval and Detailed Comparison of Plagiarism Detection - Notebook for PAN at CLEF 2012 Kong Leilei, Qi Haoliang, Wang Shuai, Du Cuixia, Wang Suhong, Han Yong |
12:15-12:30 | Educated guesses and equality judgements: using search engines and pairwise match for external plagiarism detection - Notebook for PAN at CLEF 2012 Lee Gillam, Neil Newbold, Neil Cooke |
12:30-14:00 | Lunch |
Plagiarism Detection, Chair: Paolo Rosso | |
14:00-14:15 | Three way search engine queries with multi-feature document comparison for plagiarism detection - Notebook for PAN at CLEF 2012 Simon Suchomel, Jan Kasprzak, and Michal Brandejs |
Cross-Language Plagiarism Detection (Keynotes and Panel Discussion), Chair: Paolo Rosso | |
14:15-15:00 | Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources Ralf Steinberger |
15:00-15:45 | Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection? Roberto Navigli |
15:45-16:00 | Panel discussion |
September 20 (Room Leopardi) | |
Traditional Authorship Attribution, Chair: Efstathios Stamatatos | |
9:30-10:00 | An Overview of the Traditional Authorship Attribution Subtask + Mixture of Experts Authorship Attribution Patrick Juola and Michael Ryan, John Noecker Jr |
10:00-10:15 | Authorship attribution: using rich linguistic features when training data is scarce - Notebook for PAN at CLEF 2012 Ludovic Tanguy, Franck Sajous, Basilio Calderone, Nabil Hathout |
10:15-10:30 | Feature Bagging for Author Attribution - Notebook for PAN at CLEF 2012 François-Marie Giraud, Thierry Artières |
10:30-11:00 | Break |
Traditional Authorship Attribution & Sexual Predator Identification, Chair: Patrick Juola | |
11:00-11:15 | Graph-based and Lexical-Syntactic Approaches for the Authorship Attribution Task - Notebook for PAN at CLEF 2012 Esteban Castillo, Darnes Vilariño, David Pinto, Iván Olmos, Jesús A. González, Maya Carrillos |
11:15-11:45 | Overview of the International Sexual Predator Identification Competition at PAN-2012 Giacomo Inches, Fabio Crestani |
11:45-12:00 | Vote/Veto Classification, Ensemble Clustering and Sequence Classification for Author Identification - Notebook for PAN at CLEF 2012 Roman Kern, Stefan Klampfl and Mario Zechner |
12:00-12:15 | Quite Simple Approaches for Authorship Attribution, Intrinsic Plagiarism Detection and Sexual Predator Identification - Notebook for PAN at CLEF 2012 Anna Vartapetiance, Lee Gillam |
12:15-12:30 | Kernel Methods and String Kernels for Authorship Analysis - Notebook for PAN at CLEF 2012 Marius Popescu, Cristian Grozea |
12:30-12:45 | Conversation Level Constraints on Pedophile Detection in Chat Rooms - Notebook for PAN at CLEF 2012 Claudia Peersman, Frederik Vaassen, Vincent Van Asch, Walter Daelemans |
12:45-13:00 | Identifying Predators Using ChatCoder 2.0 - Notebook for PAN at CLEF 2012 April Kontostathis, Will West, Andy Garron, Kelly Reynolds, Lynne Edwards |
13:00-14:00 | Lunch |
Sexual Predator Identification, Chair: Giacomo Inches | |
14:00-14:15 | A Two-step Approach for Effective Detection of Misbehaving Users in Chats - Notebook for PAN at CLEF 2012 Esaú Villatoro-Tello, Antonio Juárez-González, Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Villaseñor-Pineda |
14:15-14:30 | A Learning-Based Approach for the Identification of Sexual Predators in Chat Logs - Notebook for PAN at CLEF 2012 Javier Parapar, David E. Losada, Alvaro Barreiro |
14:30-14:45 | Features for modelling characteristics of conversations - Notebook for PAN at CLEF 2012 Gunnar Eriksson, Jussi Karlgren |
14:45-15:00 | Identifying Sexual Predators by SVM Classification with Lexical and Behavioral Features - Notebook for PAN at CLEF 2012 Colin Morris, Graeme Hirst |