Within author identification, we address both authorship attribution and authorship verification.
To develop your software, we provide you with a training corpus that comprises several common attribution and verification scenarios. There are five training collections consisting of real-world texts (for authorship attribution), and three collections, each with a single author (for authorship verification).
Once you have finished tuning your approach to achieve satisfactory performance on the training corpus, you should run your software on the test corpus.
During the competition, the test corpus will not be released publicly. Instead, we ask you to submit your software for evaluation at our site as described below.
After the competition, the test corpus is made available together with its ground truth data. This gives you everything you need to evaluate your approach on your own while remaining comparable to those who took part in the competition.
To submit your test run for evaluation, we ask you to send a Zip archive containing the output of your software when run on the test corpus to email@example.com.
Should the Zip archive be too large to be sent by email, please upload it to a file hoster of your choice and share a download link with us.
The results of the evaluation have been compiled into this Excel sheet. The table shows micro-averaged and macro-averaged precision, recall, and F values for all runs submitted, dependent on the authorship sub-task. Moreover, the respective rankings obtained from these values are shown. The last column shows an overall ranking based on the sum of all ranks.
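As a rough illustration of the measures named above, the following Python sketch computes micro- and macro-averaged precision, recall, and F for a single run, and orders several runs by the sum of their per-measure ranks. It is not the official evaluation script: the function names, the use of `None` for an unanswered text, the choice of the balanced F measure (F1), and the handling of ties are all assumptions made for this example.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, pred):
    """gold, pred: parallel lists of author labels.

    Assumption: pred uses None when the software gives no answer.
    Returns ((micro_p, micro_r, micro_f), (macro_p, macro_r, macro_f)).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if p == g:
            tp[g] += 1
        else:
            fn[g] += 1          # the true author was missed
            if p is not None:
                fp[p] += 1      # a wrong author was credited
    authors = sorted(set(gold) | {p for p in pred if p is not None})
    # Macro: compute the measures per author, then average over authors.
    per_author = [prf(tp[a], fp[a], fn[a]) for a in authors]
    macro = tuple(sum(vals) / len(authors) for vals in zip(*per_author))
    # Micro: pool the counts over all authors, then compute once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


def rank_sum_order(scores):
    """scores: {run_name: [measure values]}, higher is better.

    Assumption: tied runs share the same rank for a measure.
    Returns run names ordered by the sum of their per-measure ranks.
    """
    runs = list(scores)
    n_measures = len(next(iter(scores.values())))
    totals = dict.fromkeys(runs, 0)
    for m in range(n_measures):
        for run in runs:
            # Rank = 1 + number of runs that are strictly better.
            totals[run] += 1 + sum(scores[o][m] > scores[run][m] for o in runs)
    return sorted(runs, key=lambda run: totals[run])


# Toy data, invented for illustration only.
gold = ["A", "A", "A", "B"]
pred = ["A", "A", "B", None]
micro, macro = micro_macro(gold, pred)
print("micro P/R/F:", micro)
print("macro P/R/F:", macro)

toy_scores = {"run1": [0.6, 0.5], "run2": [0.7, 0.2], "run3": [0.5, 0.3]}
print("overall order:", rank_sum_order(toy_scores))
```

Micro-averaging weights every text equally, so prolific authors dominate the score, whereas macro-averaging weights every author equally regardless of how many texts they contributed.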
The authorship attribution approach of Ludovic Tanguy (University of Toulouse & CNRS, France) was highly ranked across all the attribution tasks. It was significantly outperformed only once, by the approach of Ioannis Kourtis (University of the Aegean, Greece) on the Large task. The approach of Mario Zechner (Know-Center, Austria) achieved very high precision on the small attribution tasks, but at the cost of recall.
Regarding the authorship verification tasks, Tim Snider (Porfiau, Canada) achieved the best precision overall, though not the highest recall. Note that authorship verification is more difficult than authorship attribution. Moreover, the distinction between macro- and micro-averaged performance is meaningless in this case, since there is only one author of interest, which is why just one set of results is reported.
A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.
For an overview of approaches to automated authorship attribution, we refer you to recent survey papers in the area: