Author Identification
2011

Within author identification, we address both authorship attribution and authorship verification.

Task
  • Given texts of uncertain authorship and texts from a set of candidate authors, the task is to map the uncertain texts onto their true authors among the candidates.
  • Given a text of uncertain authorship and texts from a specific author, the task is to determine whether the given text has been written by that author. (A toy sketch of both settings follows below.)
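
To make the two settings concrete, here is a deliberately simple sketch in Python: texts are represented by character 3-gram frequency profiles and compared by cosine similarity. This is only a toy baseline to illustrate the inputs and outputs of each sub-task; it is not the approach of any participant nor part of the evaluation, and the function names and the verification threshold are arbitrary choices.

    # Toy baseline: character 3-gram profiles compared by cosine similarity.
    # Illustration only -- not the evaluation method of this task.
    from collections import Counter
    from math import sqrt

    def profile(text, n=3):
        """Character n-gram frequency profile of a text."""
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    def cosine(p, q):
        """Cosine similarity between two n-gram profiles."""
        dot = sum(p[g] * q[g] for g in set(p) & set(q))
        norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
        return dot / norm if norm else 0.0

    def attribute(unknown, candidates):
        """Attribution: map the unknown text to the most similar candidate author.
        `candidates` maps an author name to that author's known texts (concatenated)."""
        u = profile(unknown)
        return max(candidates, key=lambda a: cosine(u, profile(candidates[a])))

    def verify(unknown, known, threshold=0.3):
        """Verification: accept the claimed author if the similarity clears a threshold.
        The threshold is arbitrary here and would have to be tuned on the training collections."""
        return cosine(profile(unknown), profile(known)) >= threshold

For example, attribute(unknown_text, {"A": texts_a, "B": texts_b}) returns the name of the candidate whose known texts are most similar to the unknown one, while verify(unknown_text, known_texts) returns a yes/no decision.
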
Training Corpus

To develop your software, we provide you with a training corpus that covers several common attribution and verification scenarios: five collections of real-world texts for authorship attribution, and three collections, each centered on a single author, for authorship verification.


Output
Classification results are to be formatted as XML documents similar to those found in the evaluation corpora, just without the <body> element.
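
As a rough illustration of how such a result file could be produced, the sketch below parses a corpus document with Python's xml.etree.ElementTree, removes every <body> element, and records the predicted author. Apart from <body>, which is named above, the element and attribute names (in particular the "author" attribute) are assumptions; follow the actual schema of the corpora you downloaded.

    # Minimal sketch: derive a result document from a corpus document by
    # dropping the <body> element and recording the predicted author.
    # The "author" attribute used here is a hypothetical name; use the schema
    # of the evaluation corpora instead.
    import xml.etree.ElementTree as ET

    def write_result(corpus_file, predicted_author, out_file):
        tree = ET.parse(corpus_file)
        root = tree.getroot()
        # Remove every <body> child, keeping the rest of the document intact.
        for parent in list(root.iter()):
            for body in parent.findall("body"):
                parent.remove(body)
        root.set("author", predicted_author)  # hypothetical attribute
        tree.write(out_file, encoding="utf-8", xml_declaration=True)
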
Performance Measures
The performance of your authorship attribution approach will be judged by averaging precision, recall, and F1 over all authors in the given training set.
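
For reference, these are the standard definitions behind the reported measures; the Results section below additionally distinguishes micro- and macro-averaged variants. Per author a, with true positives, false positives, and false negatives counted over the texts assigned to that author:

    P_a = \frac{TP_a}{TP_a + FP_a}, \qquad
    R_a = \frac{TP_a}{TP_a + FN_a}, \qquad
    F1_a = \frac{2 \, P_a R_a}{P_a + R_a}

Macro-averaging then gives every author equal weight, while micro-averaging pools the counts over all authors A before taking the ratio, e.g. for precision:

    P_{\mathrm{macro}} = \frac{1}{|A|} \sum_{a \in A} P_a, \qquad
    P_{\mathrm{micro}} = \frac{\sum_{a \in A} TP_a}{\sum_{a \in A} (TP_a + FP_a)}
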
Test Corpus

Once you have finished tuning your approach to achieve satisfying performance on the training corpus, you should run your software on the test corpus.

During the competition, the test corpus will not be released publicly. Instead, we ask you to submit the output of your software for evaluation as described below.

After the competition, the test corpus, including ground truth data, is made available. This way, you have everything you need to evaluate your approach on your own while remaining comparable to those who took part in the competition.


Submission

To submit your test run for evaluation, we ask you to send a Zip archive containing the output of your software when run on the test corpus to pan@webis.de.

Should the Zip archive be too large to be sent via mail, please upload it to a file hoster of your choosing and share a download link with us.

Results

The results of the evaluation have been compiled into this Excel sheet. The table shows micro-averaged and macro-averaged precision, recall, and F1 values for all submitted runs, broken down by authorship sub-task. It also shows the rankings obtained from these values; the last column gives an overall ranking based on the sum of all ranks.

The authorship attribution approach of Ludovic Tanguy (University of Toulouse & CNRS, France) was highly ranked across all the attribution tasks; it was significantly outperformed only once, by the approach of Ioannis Kourtis (University of the Aegean, Greece) on the Large task. The approach of Mario Zechner (Know-Center, Austria) achieved very high precision on the small attribution tasks, but paid for it in recall.

Regarding the authorship verification tasks, Tim Snider (Porfiau, Canada) achieved the best overall precision, though not the highest recall. It must be noted that authorship verification is more difficult to accomplish than authorship attribution. Moreover, the distinction between macro- and micro-averaged measures is meaningless in this case: since there is only one author of interest, just one set of results is reported.

A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.


Related Work

For an overview of approaches to automated authorship attribution, we refer you to recent survey papers in the area.

Task Chair

Patrick Juola

Duquesne University

Task Committee

Efstathios Stamatatos

University of the Aegean

Shlomo Argamon

Illinois Institute of Technology

Moshe Koppel

Bar-Ilan University