Synopsis

  • Task: Given a document, determine who wrote it.
  • Input: [data]
  • Submission: [submit]

Task

The traditional authorship task comes in different flavors:

  • Traditional authorship attribution (closed-class or open-class, with varying numbers of candidate authors). In the closed-class setting you are given a closed set of candidate authors and asked to identify which of them wrote an anonymous text. In the open-class setting you must also consider the possibility that none of the candidates is the real author of the document.
  • Authorship clustering/intrinsic plagiarism: you are given a text (which, for simplicity, is segmented into a sequence of "paragraphs") and asked to cluster the paragraphs into exactly two clusters: one containing the paragraphs written by the "main" author of the text and another containing all paragraphs written by anybody else. (This year, intrinsic plagiarism has thus been moved from the plagiarism task to the author identification track.) A minimal sketch of both flavors is given after this list.
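
To make both flavors concrete, here is a minimal sketch (in Python, using scikit-learn) of one plausible approach: character n-gram profiles with cosine similarity for closed/open-class attribution, and a two-way clustering of paragraphs for the intrinsic-plagiarism flavor. This is not an official baseline; the function names, the similarity threshold, and the choice of features are illustrative assumptions only.

# Illustrative sketch only: character n-gram profiles for both task flavors.
# Assumes scikit-learn is installed; all names and thresholds are arbitrary.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

def attribute(candidate_texts, unknown_text, open_class=False, threshold=0.25):
    # candidate_texts: dict mapping author id -> concatenated known writings.
    # Returns the most similar candidate; in the open-class setting, returns
    # "None of the Above" when no candidate is similar enough (the threshold
    # value here is purely illustrative, not an official one).
    authors = list(candidate_texts)
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    profiles = vec.fit_transform([candidate_texts[a] for a in authors])
    sims = cosine_similarity(vec.transform([unknown_text]), profiles)[0]
    best = int(sims.argmax())
    if open_class and sims[best] < threshold:
        return "None of the Above"
    return authors[best]

def split_paragraphs(paragraphs):
    # Cluster the paragraphs of one document into exactly two groups; the
    # larger cluster is taken as the "main" author, the smaller as "other".
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    X = vec.fit_transform(paragraphs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    main = 1 if list(labels).count(1) >= list(labels).count(0) else 0
    return ["main" if lab == main else "other" for lab in labels]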

Input

To develop your software, we provide you with a training corpus that comprises several different common attribution and clustering scenarios.

Output

As per repeated requests, here is a sample submission format to use for the Traditional Authorship Attribution Competition for PAN/CLEF. Please note that following this format is not mandatory and we will continue to accept anything we can interpret.

For traditional authorship problems (e.g. problem A), use the following (all words in ALL CAPS should be filled out appropriately):

team TEAM NAME : run RUN NUMBER
task TASK IDENTIFIER
file TEST FILE = AUTHOR IDENTIFIER
file TEST FILE = AUTHOR IDENTIFIER
...

For problems E and F, there are no designated sample authors, so we recommend listing paragraph numbers. The author identifier is optional and arbitrary: if you prefer to talk about authors A and B, or authors 1 and 2, you can insert it into the appropriate field. Any paragraphs not listed will be assumed to belong to an unnamed default author.

team TEAM NAME : run RUN NUMBER
task TASK IDENTIFIER
file TEST FILE = AUTHOR IDENTIFIER (PARAGRAPH LIST)
file TEST FILE = AUTHOR IDENTIFIER
...

For example:

team Jacob : run 1
task B
file 12Btest01.txt = A
file 12Btest02.txt = A
file 12Btest03.txt = A
file 12Btest04.txt = None of the Above
file 12Btest05.txt = A
file 12Btest06.txt = A
file 12Btest07.txt = A
file 12Btest08.txt = A
file 12Btest09.txt = A
file 12Btest10.txt = A

task C
file 12Ctest01.txt = A
file 12Ctest02.txt = A
file 12Ctest03.txt = A
file 12Ctest04.txt = A
file 12Ctest05.txt = A
file 12Ctest06.txt = A
file 12Ctest07.txt = A
file 12Ctest08.txt = A
file 12Ctest09.txt = A

task F
file 12Ftest01.txt = (1,2,3,6,7)
file 12Ftest01.txt = (4,5)

In this sample file, anything not listed under task F (paragraphs 8 and beyond) is considered to belong to a third, unnamed author.
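
If it helps, a small helper along the following lines can serialize predictions into the run format shown above, including the paragraph lists used for problems E and F. It is purely illustrative; the function and argument names are my own and not part of any official tool.

def write_run(path, team, run, tasks):
    # tasks maps a task identifier (e.g. "B") to a list of
    # (test_file, author, paragraphs) tuples; paragraphs is None except for
    # problems E and F, where it is a list of paragraph numbers.
    with open(path, "w", encoding="utf-8") as out:
        out.write("team %s : run %s\n" % (team, run))
        for task_id, answers in tasks.items():
            out.write("task %s\n" % task_id)
            for test_file, author, paragraphs in answers:
                if paragraphs is not None:
                    plist = ",".join(str(p) for p in paragraphs)
                    out.write("file %s = %s (%s)\n" % (test_file, author, plist))
                else:
                    out.write("file %s = %s\n" % (test_file, author))
            out.write("\n")  # blank line between task blocks, as in the sample

For instance, calling write_run("run1.txt", "Jacob", 1, {"B": [("12Btest01.txt", "A", None)], "F": [("12Ftest01.txt", "A", [1, 2, 3, 6, 7]), ("12Ftest01.txt", "B", [4, 5])]}) reproduces the shape of the sample above; the author labels A and B for task F are arbitrary placeholders, as noted earlier.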

Evaluation

The performance of your authorship attribution will be judged by average precision, recall, and F1 over all authors in the given training set.
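For intuition only, the sketch below computes precision, recall, and F1 per author and then averages them. It is an illustrative re-implementation of macro-averaging under my own assumptions about the input representation, not the organizers' scoring script.

from collections import defaultdict

def macro_prf(gold, predicted):
    # gold and predicted: dicts mapping test file -> author label.
    # Returns (precision, recall, F1) averaged over all authors appearing
    # in the gold standard. Illustrative only.
    authors = set(gold.values())
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for doc, true_author in gold.items():
        pred = predicted.get(doc)
        if pred == true_author:
            tp[true_author] += 1
        else:
            fn[true_author] += 1
            if pred is not None:
                fp[pred] += 1
    precs, recs, f1s = [], [], []
    for a in authors:
        p = tp[a] / (tp[a] + fp[a]) if (tp[a] + fp[a]) else 0.0
        r = tp[a] / (tp[a] + fn[a]) if (tp[a] + fn[a]) else 0.0
        precs.append(p)
        recs.append(r)
        f1s.append(2 * p * r / (p + r) if (p + r) else 0.0)
    n = len(authors)
    return sum(precs) / n, sum(recs) / n, sum(f1s) / n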

Results

Authorship attribution performance
Overall score   Participant (affiliation on the following line)
86.37 Marius Popescu* and Cristian Grozea°
°Fraunhofer FIRST, Germany, and *University of Bucharest, Romania
83.40 Navot Akiva
Bar Ilan University, Israel
82.41 Michael Ryan and John Noecker Jr
Duquesne University, USA
70.81 Ludovic Tanguy, Franck Sajous, Basilio Calderone, and Nabil Hathout
CLLE-ERSS: CNRS and University of Toulouse, France
62.13 Esteban Castillo°, Darnes Vilariño°, David Pinto°, Iván Olmos°, Jesús A. González*, and Maya Carrillo°
°Benemérita Universidad Autónoma de Puebla and *Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Mexico
59.77 François-Marie Giraud and Thierry Artières
LIP6, Université Pierre et Marie Curie (UPMC), France
58.35 Upendra Sapkota and Thamar Solorio
University of Alabama at Birmingham, USA
57.55 Ramon de Graaff° and Cor J. Veenman*
°Leiden University and *Netherlands Forensics Institute, The Netherlands
57.40 Stefan Ruseti and Traian Rebedea
University Politehnica of Bucharest, Romania
54.88 Anna Vartapetiance and Lee Gillam
University of Surrey, UK
43.18 Roman Kern°*, Stefan Klampfl*, and Mario Zechner*
°Graz University of Technology and *Know-Center GmbH, Austria
16.63 Julian Brooke and Graeme Hirst
University of Toronto, Canada

Complete performances (Excel)

A more detailed analysis of the attribution performances with respect to precision, recall, and F1 can be found in the overview paper accompanying this task.

Task Committee