External Plagiarism Detection 2009
Synopsis
- Task: Given a set of suspicious documents and a set of potential source documents, the task is to find all plagiarized passages in the suspicious documents and their corresponding source passages in the source documents.
- Input: [data]
- Evaluator: [code]
Award
We are happy to announce the overall winners of the 1st International Competition on Plagiarism Detection, who will be awarded a prize of 500 Euro sponsored by Yahoo! Research:
- The winners of the external plagiarism detection task, and overall winners, are Cristian Grozea, Christian Gehl, and Marius Popescu from Fraunhofer FIRST and the University of Bucharest.
Congratulations!
Input
To develop your approach, we provide you with a training corpus which comprises a set of suspicious documents and a set of source documents. A suspicious document may contain plagiarized passages from one or more source documents.
Output
For each suspicious document suspicious-documentXYZ.txt found in the evaluation corpora, your plagiarism detector shall output an XML file suspicious-documentXYZ.xml which contains meta information about all plagiarism cases detected within:
<document reference="suspicious-documentXYZ.txt">
  <feature name="detected-plagiarism"
           this_offset="5" this_length="1000"
           source_reference="source-documentABC.txt"
           source_offset="100" source_length="1000" />
  ...
</document>
The source_* attributes may be omitted in case no source document can be identified for a given detected plagiarized passage.
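The following is a minimal sketch of how a detector might write one such output file with Python's standard `xml.etree.ElementTree`. The `Detection` type, the `write_detections` function, and the example document names are hypothetical; offsets and lengths are assumed to be character positions, and the `source_*` attributes are emitted only when a source passage was identified, as the format allows.

```python
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Detection:
    """Hypothetical container for one detected plagiarized passage."""
    this_offset: int
    this_length: int
    source_reference: Optional[str] = None
    source_offset: Optional[int] = None
    source_length: Optional[int] = None


def write_detections(suspicious_name: str, detections: list[Detection], out_dir: Path) -> Path:
    """Write suspicious-documentXYZ.xml for suspicious-documentXYZ.txt (hypothetical helper)."""
    root = ET.Element("document", reference=suspicious_name)
    for d in detections:
        attrs = {
            "name": "detected-plagiarism",
            "this_offset": str(d.this_offset),
            "this_length": str(d.this_length),
        }
        # The source_* attributes are optional: add them only if the source is fully known.
        if None not in (d.source_reference, d.source_offset, d.source_length):
            attrs["source_reference"] = d.source_reference
            attrs["source_offset"] = str(d.source_offset)
            attrs["source_length"] = str(d.source_length)
        ET.SubElement(root, "feature", attrs)
    # suspicious-documentXYZ.txt -> suspicious-documentXYZ.xml
    out_path = out_dir / (Path(suspicious_name).stem + ".xml")
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_detections(
            "suspicious-document00001.txt",
            [
                Detection(5, 1000, "source-document00042.txt", 100, 1000),
                Detection(2000, 350),  # no source identified: source_* omitted
            ],
            Path(tmp),
        )
        print(out.read_text(encoding="utf-8"))
```

Using `ElementTree` rather than string formatting ensures that attribute values are escaped correctly and that the file is well-formed.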
Evaluation
Performance will be measured using macro-averaged precision and recall, granularity, and the plagdet score, which is a combination of the first three measures. For your convenience, we provide a reference implementation of the measures written in Python.
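To make the measures concrete, here is a simplified sketch of how they are commonly defined for this task: precision is averaged over detections, recall over true plagiarism cases, granularity is the average number of detections per detected case, and plagdet = F1 / log2(1 + granularity). This is not the reference implementation. The `Passage` type and the function names are hypothetical. The sketch considers only character spans in the suspicious documents, ignores the source side, and assumes that a detection "detects" a case when the two spans share at least one character.

```python
import math
from typing import NamedTuple, Sequence


class Passage(NamedTuple):
    """Hypothetical type: a character span in a suspicious document."""
    doc: str      # suspicious document name
    offset: int   # first character of the span
    length: int   # number of characters (assumed > 0)

    def chars(self) -> set[tuple[str, int]]:
        # Every character position covered by the span, tagged with its document.
        return {(self.doc, i) for i in range(self.offset, self.offset + self.length)}


def detects(r: Passage, s: Passage) -> bool:
    # Assumption: detection r detects case s if the spans overlap by at least one character.
    return not r.chars().isdisjoint(s.chars())


def _macro_coverage(units: Sequence[Passage], others: Sequence[Passage]) -> float:
    # Average, over `units`, of the fraction of each unit's characters covered by any of `others`.
    if not units:
        return 0.0
    total = 0.0
    for u in units:
        covered = set().union(*(u.chars() & o.chars() for o in others))
        total += len(covered) / u.length
    return total / len(units)


def precision(cases: Sequence[Passage], detections: Sequence[Passage]) -> float:
    return _macro_coverage(detections, cases)


def recall(cases: Sequence[Passage], detections: Sequence[Passage]) -> float:
    return _macro_coverage(cases, detections)


def granularity(cases: Sequence[Passage], detections: Sequence[Passage]) -> float:
    # Average number of detections per case, over cases detected at least once.
    counts = [sum(1 for r in detections if detects(r, s)) for s in cases]
    detected = [c for c in counts if c > 0]
    return sum(detected) / len(detected) if detected else 1.0


def plagdet(cases: Sequence[Passage], detections: Sequence[Passage]) -> float:
    p, r = precision(cases, detections), recall(cases, detections)
    if p + r == 0:
        return 0.0
    f1 = 2 * p * r / (p + r)
    return f1 / math.log2(1 + granularity(cases, detections))
```

For example, splitting a single 100-character case into two adjacent 50-character detections gives precision and recall of 1 but a granularity of 2, so plagdet = 1 / log2(3) ≈ 0.63. The granularity term therefore penalizes detectors that report one plagiarized passage as many fragments.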
Results
The following table lists the performances achieved by the participating teams:
Plagiarism Detection Performance

Plagdet | Participants | Affiliation |
---|---|---|
0.6957 | C. Grozea*, C. Gehl*, and M. Popescu° | *Fraunhofer FIRST, Germany, and °University of Bucharest, Romania |
0.6093 | J. Kasprzak, M. Brandejs, and M. Křipač | Masaryk University, Czech Republic |
0.6041 | C. Basile*, D. Benedetto°, E. Caglioti°, G. Cristadoro*, and M. Degli Esposti* | *Università di Bologna and °Università La Sapienza, Italy |
0.3045 | Y. A. Palkovskii | Zhytomyr State University, Ukraine |
0.1885 | M. Zechner, M. Muhr, R. Kern, and M. Granitzer | Know-Center Graz, Austria |
0.1422 | V. Shcherbinin* and S. Butakov° | *American University of Nigeria, Nigeria, and °Solbridge International School of Business, South Korea |
0.0649 | R. C. Pereira, V. P. Moreira, and R. Galante | Universidade Federal do Rio Grande do Sul, Brazil |
0.0264 | E. Vallés Balaguer | Private, Spain |
0.0187 | J. A. Malcolm and P. C. R. Lane | Ferret, University of Hertfordshire, UK |
0.0117 | J. Allen | Southern Methodist University in Dallas, USA |
A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.