Authorship Attribution 2011

Synopsis

Task: Given a document, determine who wrote it.
Input: [data]
Submission: [submit]

Task

Authorship attribution: Given texts of uncertain authorship and texts from a set of candidate authors, the task is to map the uncertain texts onto their true authors among the candidates.
Authorship verification: Given a text of uncertain authorship and text from a specific author, the task is to determine whether the given text was written by that author.
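
The difference between the two problem settings can be sketched in a few lines of Python. This is only an illustrative baseline and not a method prescribed by the task: the character trigram profile, the cosine similarity, the 0.5 threshold, and the function names `attribute` and `verify` are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def attribute(unknown: str, candidates: dict[str, str]) -> str:
    """Attribution: choose the candidate whose known text is most similar."""
    profile = char_ngrams(unknown)
    return max(candidates, key=lambda author: cosine(profile, char_ngrams(candidates[author])))


def verify(unknown: str, known: str, threshold: float = 0.5) -> bool:
    """Verification: accept only if similarity to the single author reaches
    the threshold (0.5 is an arbitrary value chosen for this sketch)."""
    return cosine(char_ngrams(unknown), char_ngrams(known)) >= threshold
```

Attribution always returns one of the candidates, whereas verification has to be able to answer "no", which is why it needs a decision threshold rather than a simple maximum.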

Input

To develop your software, we provide a training corpus that covers several common attribution and verification scenarios. There are five training collections consisting of real-world texts (for authorship attribution), and three collections, each with a single author (for authorship verification).

Output

Classification results are to be formatted in an XML document similar to those found in the evaluation corpora, just without the <body> element.
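
As a rough sketch of how such a result file could be produced, the following Python snippet copies a corpus document, drops every <body> element, and fills in the predicted author. The sample markup is hypothetical: the element names `testing`, `text`, and `author`, the attributes `file` and `id`, and the helper `build_result` are assumptions for illustration, so the actual corpus files should be consulted for the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus snippet; apart from <body>, all element and
# attribute names here are assumptions made for this example.
SAMPLE = """<testing>
  <text file="doc-001">
    <author id="unknown"/>
    <body>Some text of uncertain authorship.</body>
  </text>
</testing>"""


def build_result(corpus_xml: str, predictions: dict[str, str]) -> str:
    """Return the corpus XML with <body> removed and predicted authors set."""
    root = ET.fromstring(corpus_xml)
    # Materialize the list first so removing children does not disturb iteration.
    for text in list(root.iter("text")):
        for body in text.findall("body"):
            text.remove(body)
        author = text.find("author")
        if author is not None and text.get("file") in predictions:
            author.set("id", predictions[text.get("file")])
    return ET.tostring(root, encoding="unicode")


result = build_result(SAMPLE, {"doc-001": "candidate-7"})
print(result)
```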

Evaluation

The performance of your authorship attribution approach will be judged by precision, recall, and F1, averaged over all authors in the given training set.
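
The averaging can be sketched as follows. This is a minimal example under stated assumptions, not the official evaluation script: it assumes that each text has exactly one true author, that a run may leave a text unanswered (represented here as `None`), and that the macro-averaged F1 is the mean of the per-author F1 values. The function names `prf` and `evaluate` are made up for this example. Micro-averaged values, which pool the counts of all authors before computing the measures, are included for contrast.

```python
from collections import Counter
from typing import Optional


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 where undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(truth: list[str], predicted: list[Optional[str]]) -> dict[str, tuple[float, ...]]:
    """Macro- and micro-averaged (precision, recall, F1) over all true authors."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, guess in zip(truth, predicted):
        if guess == gold:
            tp[gold] += 1
        else:
            fn[gold] += 1          # the true author was missed
            if guess is not None:
                fp[guess] += 1     # a wrong author was named
    authors = sorted(set(truth))
    per_author = [prf(tp[a], fp[a], fn[a]) for a in authors]
    # Macro: average the per-author scores, so every author counts equally.
    macro = tuple(sum(s[i] for s in per_author) / len(authors) for i in range(3))
    # Micro: pool the counts first, so every text counts equally.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"macro": macro, "micro": micro}


scores = evaluate(["A", "A", "B", "B"], ["A", "B", "B", None])
```

In this toy run, leaving the last text unanswered costs recall but not precision, which is the trade-off a cautious system makes.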

Results

The results of the evaluation have been compiled into this Excel sheet. For each authorship sub-task, the table shows micro-averaged and macro-averaged precision, recall, and F values for all submitted runs, together with the rankings obtained from these values. The last column shows an overall ranking based on the sum of all ranks.
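
The rank-sum idea behind the last column can be illustrated with a short Python sketch. The run names and scores below are invented for the example and are not taken from the actual results. How the sheet resolves ties is not stated, so the tie handling here (tied runs share the better rank, and equal sums are ordered by run name) is an assumption.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank runs by score, best first; tied runs share the better rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {run: ordered.index(score) + 1 for run, score in scores.items()}


def overall_ranking(table: dict[str, dict[str, float]]) -> list[tuple[str, int]]:
    """Sum each run's ranks over all measures; a lower sum is better."""
    totals: dict[str, int] = {}
    for scores in table.values():
        for run, rank in ranks(scores).items():
            totals[run] = totals.get(run, 0) + rank
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Made-up scores for three hypothetical runs on two measures.
table = {
    "macro_f": {"run-a": 0.60, "run-b": 0.55, "run-c": 0.40},
    "micro_f": {"run-a": 0.70, "run-b": 0.72, "run-c": 0.50},
}
overall = overall_ranking(table)
```

A run that is consistently near the top on every measure ends up with a low sum even if it never ranks first, which matches the intent of an overall ranking across sub-tasks.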

The authorship attribution approach of Ludovic Tanguy (University of Toulouse & CNRS, France) was highly ranked across all the attribution tasks. It was significantly outperformed only once, by the approach of Ioannis Kourtis (University of the Aegean, Greece) on the Large task. The approach of Mario Zechner (Know-Center, Austria) achieved very high precision on the small attribution tasks, but at the cost of recall.

Regarding the authorship verification tasks, Tim Snider (Porfiau, Canada) achieved the best precision overall, though not the highest recall. Note that authorship verification is more difficult than authorship attribution. Moreover, the distinction between macro- and micro-averaged performance is meaningless in this case, since there is only one author of interest; for this reason, just one set of results is reported.

A more detailed analysis of the performance of the submitted approaches can be found in the overview paper accompanying this task.

Related Work

Patrick Juola. Authorship Attribution. In Foundations and Trends in Information Retrieval, Volume 1, Issue 3, December 2006.
Moshe Koppel, Jonathan Schler, and Shlomo Argamon. Computational Methods in Authorship Attribution. Journal of the American Society for Information Science and Technology, Volume 60, Issue 1, pages 9-26, January 2009.
Efstathios Stamatatos. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, Volume 60, Issue 3, pages 538-556, March 2009.