Intrinsic Plagiarism Detection 2009

Synopsis

  • Task: Given a set of suspicious documents, the task is to identify all plagiarized text passages, e.g., by detecting breaches of writing style. Comparing a suspicious document with other documents is not allowed in this task.
  • Input: [data]
  • Evaluator: [code]
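To make "detecting breaches of writing style" concrete, here is a minimal sketch of one generic intrinsic approach. It compares the character trigram profile of each sliding window with the profile of the whole document and flags windows that deviate strongly. The window size, step, dissimilarity measure, and threshold rule (mean plus k standard deviations) are illustrative assumptions, not the method of any participant or of the organizers.

```python
from collections import Counter
from statistics import mean, pstdev


def trigram_profile(text):
    """Relative frequencies of the character trigrams in text."""
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def dissimilarity(window_profile, doc_profile):
    """Illustrative measure: sum of absolute frequency differences
    over the trigrams that occur in the window."""
    return sum(abs(f - doc_profile.get(g, 0.0)) for g, f in window_profile.items())


def detect_style_breaches(text, window=1000, step=200, k=2.0):
    """Return (offset, length) pairs of passages whose style deviates
    from the document as a whole. All parameters are assumptions."""
    doc_profile = trigram_profile(text)
    offsets = list(range(0, max(len(text) - window, 0) + 1, step))
    scores = [dissimilarity(trigram_profile(text[o:o + window]), doc_profile)
              for o in offsets]
    if len(scores) < 2:
        return []  # too short to estimate what "normal" style looks like
    threshold = mean(scores) + k * pstdev(scores)
    hits = [o for o, s in zip(offsets, scores) if s > threshold]

    # Merge overlapping flagged windows into contiguous passages.
    passages = []
    for o in hits:
        if passages and o <= passages[-1][0] + passages[-1][1]:
            start, _ = passages[-1]
            passages[-1] = (start, o + window - start)
        else:
            passages.append((o, window))
    return passages
```

The returned pairs correspond directly to the this_offset and this_length attributes of the output format described below.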

Award

We are happy to announce the following task winner of the 1st International Competition on Plagiarism Detection, who will be awarded 500 Euro sponsored by Yahoo! Research:

  • The winner of the intrinsic analysis task is Efstathios Stamatatos from the University of the Aegean.

Congratulations!

Input

To develop your approach, we provide you with a training corpus which comprises a set of suspicious documents, each of which may contain plagiarized passages.

Output

For each suspicious document suspicious-documentXYZ.txt found in the evaluation corpora, your plagiarism detector shall output an XML file suspicious-documentXYZ.xml which contains meta information about all plagiarism cases detected in that document:

<document reference="suspicious-documentXYZ.txt">
  <feature name="detected-plagiarism"
           this_offset="5"
           this_length="1000"
  />
  ...
</document>

The XML documents must be valid with respect to the XML schema found here.
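A minimal sketch of producing such a file with Python's standard xml.etree.ElementTree module follows. The helper names build_report and write_report, and the assumption that detections arrive as (offset, length) character spans, are hypothetical; validating the result against the official schema is still up to you.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_report(reference, detections):
    """Build the <document> element for one suspicious document.

    reference:  file name of the suspicious document, e.g. suspicious-documentXYZ.txt
    detections: iterable of (offset, length) character spans (assumed format)
    """
    root = ET.Element("document", reference=reference)
    for offset, length in detections:
        ET.SubElement(root, "feature", {
            "name": "detected-plagiarism",
            "this_offset": str(offset),
            "this_length": str(length),
        })
    return root


def write_report(reference, detections, out_dir="."):
    """Write suspicious-documentXYZ.xml into out_dir and return its path."""
    root = build_report(reference, detections)
    out_path = Path(out_dir) / (Path(reference).stem + ".xml")
    ET.ElementTree(root).write(str(out_path), encoding="utf-8", xml_declaration=True)
    return out_path
```

For example, write_report("suspicious-documentXYZ.txt", [(5, 1000)]) writes suspicious-documentXYZ.xml with a single detected-plagiarism feature matching the sample above.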

Evaluation

Performance will be measured using macro-averaged precision and recall, granularity, and the plagdet score, which is a combination of the first three measures. For your convenience, we provide a reference implementation of the measures written in Python.
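How plagdet combines the three measures can be sketched as follows, assuming the formulation used in the PAN overview papers: the F1 score (harmonic mean of precision and recall) is divided by log2(1 + granularity). A perfect granularity of 1 leaves F1 unchanged, while detections that split one case into several pieces are penalized. The provided Python reference implementation remains the authoritative definition.

```python
from math import log2


def plagdet(precision, recall, granularity):
    """Combine precision, recall and granularity (>= 1) into one score."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / log2(1 + granularity)
```

For instance, precision = recall = 0.5 with granularity 1 gives 0.5, but the same precision and recall with granularity 3 gives only 0.25.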

Results

Intrinsic Plagiarism Detection Performance
Plagdet  Participant
0.2462   E. Stamatatos (University of the Aegean, Greece)
0.1955   B. Hagbi and M. Koppel (Bar Ilan University, Israel)
0.1766   M. Zechner, M. Muhr, R. Kern, and M. Granitzer (Know-Center Graz, Austria)
0.1219   L. M. Seaward and S. Matwin (University of Ottawa, Canada)

A more detailed analysis of detection performance with respect to precision, recall, and granularity can be found in the overview paper accompanying this task.

Task Committee