Intrinsic Plagiarism Detection 2009
- Task: Given a set of suspicious documents, the task is to identify all plagiarized text passages, e.g., by detecting breaches in writing style. Comparing a suspicious document with other documents is not allowed in this task.
- Input: [data]
- Evaluator: [code]
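The task statement names writing-style breaches only as one possible signal. The following is a minimal illustrative sketch under our own assumptions, not any participant's system. It builds character-trigram frequency profiles, compares each sliding window with the profile of the whole document using cosine distance, and flags windows whose distance exceeds the mean by `k` standard deviations. The function names and the parameters `window`, `step`, and `k` are hypothetical choices, not values taken from the competition.

```python
import math
import statistics
from collections import Counter
from typing import List, Tuple


def profile(text: str, n: int = 3) -> Counter:
    """Character n-gram frequency profile of a text (lowercased)."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 minus the cosine similarity of two frequency profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def style_breaches(text: str, window: int = 1000, step: int = 200,
                   k: float = 2.0) -> List[Tuple[int, int]]:
    """Return (offset, length) spans whose style deviates from the whole document.

    Hypothetical parameters: window size and step in characters, and k, the
    number of standard deviations above the mean distance that counts as a breach.
    """
    doc_profile = profile(text)
    offsets = list(range(0, max(len(text) - window, 0) + 1, step))
    scores = [cosine_distance(profile(text[o:o + window]), doc_profile)
              for o in offsets]
    if len(scores) < 2:
        return []  # too short to estimate a threshold
    threshold = statistics.mean(scores) + k * statistics.stdev(scores)

    merged: List[List[int]] = []  # [start, end) pairs of flagged windows
    for o, score in zip(offsets, scores):
        if score <= threshold:
            continue
        end = min(o + window, len(text))
        if merged and o <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping window: extend span
        else:
            merged.append([o, end])
    return [(start, end - start) for start, end in merged]
```

Comparing every window against the whole document assumes that plagiarized passages make up a small share of the text, so the document profile mostly reflects the author's own style; a document dominated by inserted text would defeat this simple scheme.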
We are happy to announce the following overall winner of the 1st International Competition on Plagiarism Detection, who will be awarded 500 Euro sponsored by Yahoo! Research:
- The winner of the intrinsic analysis task is Efstathios Stamatatos from the University of the Aegean.
To develop your approach, we provide you with a training corpus which comprises a set of suspicious documents, each of which may contain plagiarized passages.
For each suspicious document suspicious-documentXYZ.txt found in the evaluation corpora, your plagiarism detector shall output an XML file suspicious-documentXYZ.xml which contains meta information about all plagiarism cases detected within:

    <document reference="suspicious-documentXYZ.txt">
      <feature name="detected-plagiarism" this_offset="5" this_length="1000" />
      ...
    </document>
The XML documents must be valid with respect to the provided XML schema.
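A minimal sketch of producing one output file per suspicious document with Python's standard `xml.etree.ElementTree`, assuming the detector already yields `(offset, length)` pairs in characters. The helper name `write_detections` is hypothetical, and the sketch does not validate the result against the XML schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable, Tuple


def write_detections(txt_path: str, detections: Iterable[Tuple[int, int]],
                     out_dir: str) -> Path:
    """Write suspicious-documentXYZ.xml describing detections in suspicious-documentXYZ.txt.

    `detections` holds (offset, length) pairs in characters; `out_dir` must exist.
    """
    txt_name = Path(txt_path).name
    root = ET.Element("document", reference=txt_name)
    for offset, length in detections:
        # One <feature> element per detected passage.
        ET.SubElement(root, "feature", {
            "name": "detected-plagiarism",
            "this_offset": str(offset),
            "this_length": str(length),
        })
    out_path = Path(out_dir) / (Path(txt_name).stem + ".xml")
    ET.ElementTree(root).write(str(out_path), encoding="utf-8",
                               xml_declaration=True)
    return out_path
```

For example, `write_detections("suspicious-document00001.txt", [(5, 1000)], "out")` would write `out/suspicious-document00001.xml` with a single `feature` element, mirroring the format shown above.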
Performance will be measured using macro-averaged precision and recall, granularity, and the plagdet score, which is a combination of the first three measures. For your convenience, we provide a reference implementation of the measures written in Python.
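To make the measures concrete, here is a simplified sketch of the character-level definitions as we understand them from the PAN evaluation framework. Recall averages, over the true plagiarism cases, the fraction of each case's characters covered by some detection; precision averages, over the detections, the fraction of each detection's characters that fall inside some true case; granularity is the average number of detections per case that is detected at all; and plagdet = F1 / log2(1 + granularity). The provided Python reference implementation remains authoritative. The function names are hypothetical, and spans are assumed to be `(document, offset, length)` tuples with positive lengths.

```python
import math
from typing import Sequence, Set, Tuple

# A plagiarism case or a detection: (document name, character offset, length).
Span = Tuple[str, int, int]


def chars(span: Span) -> Set[Tuple[str, int]]:
    """Set of (document, position) pairs covered by a span; length must be > 0."""
    doc, offset, length = span
    return {(doc, i) for i in range(offset, offset + length)}


def recall(cases: Sequence[Span], detections: Sequence[Span]) -> float:
    """Mean fraction of each true case's characters covered by any detection."""
    if not cases:
        return 0.0
    detected = set().union(*(chars(r) for r in detections))
    return sum(len(chars(s) & detected) / len(chars(s)) for s in cases) / len(cases)


def precision(cases: Sequence[Span], detections: Sequence[Span]) -> float:
    """Mean fraction of each detection's characters lying inside any true case."""
    if not detections:
        return 0.0
    plagiarized = set().union(*(chars(s) for s in cases))
    return sum(len(chars(r) & plagiarized) / len(chars(r))
               for r in detections) / len(detections)


def granularity(cases: Sequence[Span], detections: Sequence[Span]) -> float:
    """Average number of detections overlapping each case that is detected at all."""
    counts = []
    for s in cases:
        n = sum(1 for r in detections if chars(s) & chars(r))
        if n > 0:
            counts.append(n)
    return sum(counts) / len(counts) if counts else 1.0


def plagdet(cases: Sequence[Span], detections: Sequence[Span]) -> float:
    """F1 of precision and recall, discounted by log2(1 + granularity)."""
    p = precision(cases, detections)
    r = recall(cases, detections)
    f1 = 0.0 if p + r == 0 else 2 * p * r / (p + r)
    return f1 / math.log2(1 + granularity(cases, detections))
```

For instance, a single case `("doc1", 0, 100)` detected as two halves yields precision and recall of 1.0 but a granularity of 2, so plagdet drops to 1 / log2(3), about 0.63. The granularity term thus penalizes detectors that report one plagiarized passage as several fragments.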
Intrinsic Plagiarism Detection Performance

| Score | Participants | Affiliation |
|---|---|---|
| | E. Stamatatos | University of the Aegean, Greece |
| 0.1955 | B. Hagbi and M. Koppel | Bar Ilan University, Israel |
| 0.1766 | M. Zechner, M. Muhr, R. Kern, and M. Granitzer | Know-Center Graz, Austria |
| 0.1219 | L. M. Seaward and S. Matwin | University of Ottawa, Canada |
A more detailed analysis of detection performance with respect to precision, recall, and granularity can be found in the overview paper accompanying this task.