Plagiarism Detection 2010
Synopsis
- Task: Given a set of suspicious documents and a set of potential source documents, the task is to find all plagiarized passages in the suspicious documents and their corresponding source passages in the source documents.
- Input: [data]
- Evaluator: [code]
Task
Given a set of suspicious documents and a set of source documents, the task is to find all plagiarized sections in the suspicious documents and, if available, the corresponding source sections.
Remark. This task combines both external plagiarism detection and intrinsic plagiarism detection, where the former refers to detecting plagiarized sections in a suspicious document and the corresponding source sections in a given set of source documents, and the latter refers to detecting plagiarized sections without comparing the suspicious document to any other documents, e.g., by detecting changes in writing style.
Award
We are happy to announce the following overall winner of the 2nd International Competition on Plagiarism Detection, who will be awarded 500,- Euro sponsored by Yahoo! Research:
- J. Kasprzak and M. Brandejs from Masaryk University, Czech Republic
Congratulations!
Input
To develop your approach, we provide you with a training corpus which comprises a set of suspicious documents and a set of source documents. A suspicious document may contain plagiarized passages, the source passages of which may or may not be present in one or more of the source documents.
Output
For each suspicious document suspicious-documentXYZ.txt found in the evaluation corpora, your plagiarism detector shall output an XML file suspicious-documentXYZ.xml which contains meta information about all plagiarism cases detected within:
<document reference="suspicious-documentXYZ.txt">
  <feature name="detected-plagiarism"
           this_offset="5"
           this_length="1000"
           source_reference="suspicious-documentABC.txt"
           source_offset="100"
           source_length="1000" />
  ...
</document>
The source_* attributes may be omitted in case no source document can be identified for a given detected plagiarized passage.
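As an illustration of the output format described above, the following is a minimal Python sketch that serializes a list of detections into the expected XML. The function name `build_detection_xml` and the dictionary keys are hypothetical choices made for this example and are not part of the task definition; only the element and attribute names come from the sample above. The source_* attributes are written only when a source document was identified.

```python
import xml.etree.ElementTree as ET


def build_detection_xml(suspicious_name, detections):
    """Return the detection XML for one suspicious document as a string.

    `detections` is a list of dicts (hypothetical structure for this sketch)
    with the keys this_offset and this_length, and optionally
    source_reference, source_offset and source_length.
    """
    root = ET.Element("document", {"reference": suspicious_name})
    for det in detections:
        attrs = {
            "name": "detected-plagiarism",
            "this_offset": str(det["this_offset"]),
            "this_length": str(det["this_length"]),
        }
        # The source_* attributes may be omitted when no source was found.
        if det.get("source_reference") is not None:
            attrs["source_reference"] = det["source_reference"]
            attrs["source_offset"] = str(det["source_offset"])
            attrs["source_length"] = str(det["source_length"])
        ET.SubElement(root, "feature", attrs)
    return ET.tostring(root, encoding="unicode")


# Example values mirror the sample above; the second detection has no source.
example = build_detection_xml(
    "suspicious-documentXYZ.txt",
    [
        {
            "this_offset": 5,
            "this_length": 1000,
            "source_reference": "suspicious-documentABC.txt",
            "source_offset": 100,
            "source_length": 1000,
        },
        {"this_offset": 4200, "this_length": 350},
    ],
)
print(example)
```

The returned string could then be written to suspicious-documentXYZ.xml, one file per suspicious document.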
Evaluation
Performance will be measured using macro-averaged precision and recall, granularity, and the plagdet score, which is a combination of the first three measures. For your convenience, we provide a reference implementation of the measures written in Python.
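To make the combination concrete, here is a small Python sketch of how the plagdet score can be computed from the three measures. It assumes the definition given in the evaluation framework that accompanies this competition series, namely the F1 measure (harmonic mean of precision and recall) divided by log2(1 + granularity). This page does not spell out the formula, so treat it as an assumption and defer to the reference implementation for the authoritative computation. The function name `plagdet` is chosen for this example only.

```python
import math


def plagdet(precision, recall, granularity):
    """Combine precision, recall and granularity into one score.

    Assumed formula: F1 / log2(1 + granularity), with granularity >= 1.
    A granularity of 1 (each case detected in one piece) leaves F1 unchanged.
    """
    if granularity < 1:
        raise ValueError("granularity is expected to be at least 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


# Perfect precision and recall, but every case reported in three fragments.
score = plagdet(1.0, 1.0, 3.0)
print(score)
```

Under this definition, a detector that reports each plagiarism case as several fragments is penalized even when its precision and recall are high.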
Results
The following table lists the performances achieved by the participating teams:
Plagiarism Detection Performance

Plagdet | Participant |
---|---|
0.7971 | J. Kasprzak and M. Brandejs Masaryk University, Czech Republic |
0.7090 | D. Zou, W. Long, and Z. Ling South China University of Technology, China |
0.6948 | M. Muhr, R. Kern, M. Zechner, and M. Granitzer Know-Center Graz, Austria |
0.6209 | C. Grozea* and M. Popescu° *Fraunhofer FIRST, Germany °University of Bucharest, Romania |
0.6066 | G. Oberreuter, G. L'Huillier, S.A. Ríos, and J.D. Velásquez University of Chile, Chile |
0.5851 | D.A.R. Torrejón*,° and J.M.M. Ramos° *IES "José Caballero", Spain °Universidad de Huelva, Spain |
0.5191 | R.C. Pereira, V.P. Moreira, and R. Galante Universidade Federal do Rio Grande do Sul, Brazil |
0.5093 | Y. Palkovskii, A. Belov, and I. Muzika Zhytomyr State University and SkyLine, Inc. Ukraine |
0.4378 | Sobha L., Pattabhi R.K R., Vijay S.R., A. Akilandeswari MIT Campus of Anna University Chennai, India |
0.2564 | T. Gottron Universität Koblenz-Landau, Germany |
0.2222 | D. Micol, Ó. Ferrández, and R. Muñoz University of Alicante, Spain |
0.2148 | M.R. Costa-jussà, R.E. Banchs, J. Grivolla, and J. Codina Barcelona Media Research Center, Spain |
0.2053 | R.M.A. Nawab, M. Stevenson, and P. Clough University of Sheffield, UK |
0.2034 | P. Gupta and S. Rao DA-IICT, India |
0.1375 | C. Vania and M. Adriani Universitas Indonesia, Indonesia |
0.0558 | P. Suárez*, J.C. González*,°, and J. Villena-Román*,^ *Daedalus - Data, Decisions and Language, Spain °Universidad Politécnica de Madrid, Spain ^Universidad Carlos III de Madrid, Spain |
0.0195 | S. Alzahrani* and N. Salim° *Taif University, Saudi Arabia °Universiti Teknologi Malaysia, Malaysia |
0.0008 | A. Iftene et al. University of Iasi, Romania |
A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.