External Plagiarism Detection 2011
Synopsis
- Task: Given a set of suspicious documents and a set of potential source documents, find all plagiarized passages in the suspicious documents and their corresponding source passages in the source documents.
- Input: [data]
- Evaluator: [code]
Award
We are happy to announce the overall winners of the 3rd International Competition on Plagiarism Detection, who will be awarded 500 euros sponsored by Yahoo! Research:
- The winners of the external plagiarism detection task are J. Grman and R. Ravas from SVOP Ltd., Slovakia.
Congratulations!
Input
To develop your approach, we provide you with a training corpus which comprises a set of suspicious documents and a set of source documents. A suspicious document may contain plagiarized passages, the source passages of which may or may not be present in one or more of the source documents.
Output
For each suspicious document suspicious-documentXYZ.txt found in the evaluation corpora, your plagiarism detector shall output an XML file suspicious-documentXYZ.xml which contains meta information about all plagiarism cases detected within:
<document reference="suspicious-documentXYZ.txt">
  <feature name="detected-plagiarism"
           this_offset="5" this_length="1000"
           source_reference="source-documentABC.txt"
           source_offset="100" source_length="1000" />
  ...
</document>
The XML documents must be valid with respect to the XML schema found here.
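As an illustration of the output format above, the following minimal Python sketch serializes a list of detections into the XML structure shown. The function name `detections_to_xml` and the dictionary keys are hypothetical and chosen only to mirror the attribute names in the example; the sketch does not validate the result against the XML schema.

```python
import xml.etree.ElementTree as ET


def detections_to_xml(suspicious_name, detections):
    """Serialize the detections for one suspicious document as an XML string.

    `detections` is assumed to be a list of dicts whose keys mirror the
    attribute names of the <feature> element (a hypothetical convention).
    """
    root = ET.Element("document", {"reference": suspicious_name})
    for det in detections:
        ET.SubElement(root, "feature", {
            "name": "detected-plagiarism",
            "this_offset": str(det["this_offset"]),
            "this_length": str(det["this_length"]),
            "source_reference": det["source_reference"],
            "source_offset": str(det["source_offset"]),
            "source_length": str(det["source_length"]),
        })
    return ET.tostring(root, encoding="unicode")


# One detection, using the values from the example above.
example = detections_to_xml(
    "suspicious-documentXYZ.txt",
    [{
        "this_offset": 5,
        "this_length": 1000,
        "source_reference": "source-documentABC.txt",
        "source_offset": 100,
        "source_length": 1000,
    }],
)
print(example)
```

The returned string would be written to suspicious-documentXYZ.xml, one file per suspicious document.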
Evaluation
Performance will be measured using macro-averaged precision and recall, granularity, and the plagdet score, which is a combination of the first three measures. For your convenience, we provide a reference implementation of the measures written in Python.
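To make the combination concrete, here is a minimal Python sketch of the plagdet score as it is commonly defined for this evaluation framework: the harmonic mean (F1) of precision and recall, divided by log2(1 + granularity). The function name `plagdet` and the input checks are assumptions made for illustration; this is not the reference implementation, which remains authoritative for the precision, recall, and granularity values themselves.

```python
import math


def plagdet(precision, recall, granularity):
    """Combine precision, recall, and granularity into a single score.

    Assumes precision and recall lie in [0, 1] and granularity >= 1,
    where a granularity of 1 means each case is detected in one piece.
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    if precision + recall == 0:
        return 0.0  # F1 is taken to be 0 when nothing is detected correctly
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


# With granularity 1 the score equals F1; higher granularity lowers it.
print(round(plagdet(0.9, 0.6, 1.0), 4))
print(round(plagdet(0.9, 0.6, 3.0), 4))
```

With granularity 1 the denominator is log2(2) = 1, so the score reduces to F1; a detector that reports each plagiarism case in several fragments is penalized logarithmically.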
Results
External Plagiarism Detection Performance

Plagdet | Participant | Affiliation |
---|---|---|
0.5563 | J. Grman and R. Ravas | SVOP Ltd., Slovakia |
0.4153 | C. Grozea* and M. Popescu° | *Fraunhofer Institute FIRST, Germany; °University of Bucharest, Romania |
0.3469 | G. Oberreuter, G. L'Huillier, S.A. Ríos, and J.D. Velásquez | Universidad de Chile, Chile |
0.2467 | N. Cooke, L. Gillam, P. Wrobel, H. Cooke, and F. Al-Obaidli | University of Surrey, United Kingdom |
0.2340 | D.A. Rodríguez Torrejón*,° and J.M. Martín Ramos° | *IES "José Caballero", Spain; °Universidad de Huelva, Spain |
0.1991 | S. Rao, P. Gupta, K. Singhal, and P. Majumder | DA-IICT, India |
0.1892 | Y. Palkovskii, A. Belov, and I. Muzyka | Zhytomyr State University and SkyLine Inc., Ukraine |
0.0804 | R.M.A. Nawab, M. Stevenson, and P. Clough | University of Sheffield, United Kingdom |
0.0011 | A. Ghosh, P. Bhaskar, S. Pal, and S. Bandyopadhyay | Jadavpur University, India |
A more detailed analysis of the detection performances with respect to precision, recall, and granularity can be found in the overview paper accompanying this task.