External Plagiarism Detection 2011
- Task: Given a set of suspicious documents and a set of potential source documents, the task is to find all plagiarized passages in the suspicious documents and their corresponding source passages in the source documents.
- Input: [data]
- Evaluator: [code]
We are happy to announce the following overall winner of the 3rd International Competition on Plagiarism Detection, who will be awarded 500 euros sponsored by Yahoo! Research:
- The winners of the external plagiarism detection task are J. Grman and R. Ravas from SVOP Ltd., Slovakia.
To develop your approach, we provide you with a training corpus which comprises a set of suspicious documents and a set of source documents. A suspicious document may contain plagiarized passages, the source passages of which may or may not be present in one or more of the source documents.
For each suspicious document suspicious-documentXYZ.txt found in the evaluation corpora, your plagiarism detector shall output an XML file suspicious-documentXYZ.xml which contains meta information about all plagiarism cases detected within:
<document reference="suspicious-documentXYZ.txt">
  <feature name="detected-plagiarism"
           this_offset="5"
           this_length="1000"
           source_reference="source-documentABC.txt"
           source_offset="100"
           source_length="1000" />
  ...
</document>
The XML documents must be valid with respect to the XML schema found here.
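As an illustration of the output format above, the following is a minimal sketch of how a detector might serialize its detections using Python's standard `xml.etree.ElementTree` module. The function name `build_detection_xml` and the dictionary layout of a detection are assumptions made for this example; only the element and attribute names are taken from the sample above. The sketch does not validate the result against the XML schema.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the sample <feature> element above.
FEATURE_ATTRIBUTES = (
    "this_offset",
    "this_length",
    "source_reference",
    "source_offset",
    "source_length",
)


def build_detection_xml(suspicious_name, detections):
    """Return the XML text for one suspicious document.

    `detections` is a list of dicts keyed by FEATURE_ATTRIBUTES
    (a hypothetical in-memory layout chosen for this sketch).
    """
    root = ET.Element("document", reference=suspicious_name)
    for detection in detections:
        feature = ET.SubElement(root, "feature", name="detected-plagiarism")
        for key in FEATURE_ATTRIBUTES:
            # XML attribute values are strings, so offsets are converted.
            feature.set(key, str(detection[key]))
    return ET.tostring(root, encoding="unicode")


example = build_detection_xml(
    "suspicious-documentXYZ.txt",
    [
        {
            "this_offset": 5,
            "this_length": 1000,
            "source_reference": "source-documentABC.txt",
            "source_offset": 100,
            "source_length": 1000,
        }
    ],
)
print(example)
```

The returned string would then be written to suspicious-documentXYZ.xml, one file per suspicious document.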
Performance will be measured using macro-averaged precision and recall, granularity, and the plagdet score, which is a combination of the first three measures. For your convenience, we provide a reference implementation of the measures written in Python.
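To make the relationship between the measures concrete, here is a simplified sketch. It assumes that plagdet is the F1 score of precision and recall divided by log2(1 + granularity), and it simplifies the character-level measures in two ways: passages are (offset, length) pairs within a single suspicious document, and the source side is ignored. All function names are hypothetical. The reference implementation remains authoritative; this sketch is not a substitute for it.

```python
import math


def chars(offset, length):
    """Set of character positions covered by a passage."""
    return set(range(offset, offset + length))


def macro_precision(cases, detections):
    """Average fraction of each detection that covers true plagiarism."""
    if not detections:
        return 0.0
    case_chars = set().union(*(chars(*s) for s in cases))
    return sum(len(chars(*r) & case_chars) / r[1] for r in detections) / len(detections)


def macro_recall(cases, detections):
    """Average fraction of each plagiarism case that was detected."""
    if not cases:
        return 0.0
    detected_chars = set().union(*(chars(*r) for r in detections))
    return sum(len(chars(*s) & detected_chars) / s[1] for s in cases) / len(cases)


def granularity(cases, detections):
    """Average number of detections per detected case (1.0 is ideal)."""
    counts = []
    for s in cases:
        hits = sum(1 for r in detections if chars(*s) & chars(*r))
        if hits:
            counts.append(hits)
    return sum(counts) / len(counts) if counts else 1.0


def plagdet(precision, recall, gran):
    """F1 of precision and recall, discounted by log2(1 + granularity)."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + gran)


# One true case of 100 characters, reported as two separate detections.
cases = [(0, 100)]
detections = [(0, 50), (50, 25)]
p = macro_precision(cases, detections)
r = macro_recall(cases, detections)
g = granularity(cases, detections)
print(p, r, g, plagdet(p, r, g))
```

In this example the detector is fully precise and covers three quarters of the case, but it reports the case in two pieces, so the granularity of 2 lowers the combined score below the F1 value.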
External Plagiarism Detection Performance

| Plagdet | Participant | Affiliation |
|---|---|---|
| 0.5563 | J. Grman and R. Ravas | SVOP Ltd., Slovakia |
| 0.4153 | C. Grozea* and M. Popescu° | *Fraunhofer Institute FIRST, Germany; °University of Bucharest, Romania |
| 0.3469 | G. Oberreuter, G. L'Huillier, S.A. Ríos, and J.D. Velásquez | Universidad de Chile, Chile |
| 0.2467 | N. Cooke, L. Gillam, P. Wrobel, H. Cooke, and F. Al-Obaidli | University of Surrey, United Kingdom |
| 0.2340 | D.A. Rodríguez Torrejón*,° and J.M. Martín Ramos° | *IES "José Caballero", Spain; °Universidad de Huelva, Spain |
| 0.1991 | S. Rao, P. Gupta, K. Singhal, and P. Majumder | |
| 0.1892 | Y. Palkovskii, A. Belov, and I. Muzyka | Zhytomyr State University and SkyLine Inc., Ukraine |
| 0.0804 | R.M.A. Nawab, M. Stevenson, and P. Clough | University of Sheffield, United Kingdom |
| 0.0011 | A. Ghosh, P. Bhaskar, S. Pal, and S. Bandyopadhyay | Jadavpur University, India |
A more detailed analysis of the detection performances with respect to precision, recall, and granularity can be found in the overview paper accompanying this task.