This is the 11th evaluation lab on uncovering plagiarism, authorship, and social software misuse. PAN will be held as part of the CLEF conference in Sheffield, UK, on September 15-18, 2014. Evaluations will run from January to June. We invite you to take part in any of the three tasks shown below.
Given a document, is it an original?
This task is divided into source retrieval and text alignment. Source retrieval is about searching for likely sources of a suspicious document. Text alignment is about matching passages of reused text between a pair of documents.
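As a rough illustration of what text alignment involves, the sketch below finds word n-grams shared by a suspicious document and a candidate source. It is only a minimal seeding heuristic under simple assumptions (whitespace tokenization, exact matches); the function names `word_ngrams` and `shared_ngrams` are hypothetical and this is not the lab's evaluation method.

```python
from typing import Dict, List, Tuple


def word_ngrams(text: str, n: int) -> Dict[Tuple[str, ...], List[int]]:
    """Map each lowercased word n-gram to the word offsets where it starts."""
    words = text.lower().split()
    index: Dict[Tuple[str, ...], List[int]] = {}
    for i in range(len(words) - n + 1):
        index.setdefault(tuple(words[i:i + n]), []).append(i)
    return index


def shared_ngrams(suspicious: str, source: str, n: int = 3) -> List[Tuple[int, int]]:
    """Return sorted (suspicious_offset, source_offset) pairs of matching n-grams."""
    source_index = word_ngrams(source, n)
    matches = []
    for gram, positions in word_ngrams(suspicious, n).items():
        for susp_pos in positions:
            for src_pos in source_index.get(gram, []):
                matches.append((susp_pos, src_pos))
    return sorted(matches)


if __name__ == "__main__":
    pairs = shared_ngrams(
        "we saw the quick brown fox jump",
        "the quick brown fox sleeps",
    )
    print(pairs)  # word offsets of shared trigrams in each document
```

Runs of consecutive offset pairs, such as (2, 0) followed by (3, 1), suggest a longer reused passage. A real aligner would also need to merge such seeds and tolerate paraphrase, which this sketch does not attempt.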
Given a document, who wrote it?
This task focuses on authorship verification and on methods for deciding whether two given documents have the same author or not. This question accurately emulates the real-world problem that most forensic linguists face every day.
Given a document, what are its author's traits?
This task is concerned with predicting an author's demographics from her writing. For example, an author's style may reveal her age and gender.
The University of Sheffield
Recently I was asked to assist in a text attribution problem: the illegitimate reuse of text (and images) from the web pages of a small business. When the offender was approached about the possible plagiarism, their response was “prove it”. This talk will describe how I approached the problem of proving ownership and the challenges it entailed. I will describe the experiences gained from working on the EPSRC-funded Measuring Text Reuse (METER) project with the UK Press Association, along with the current text attribution literature, that informed my textual analysis. I will also demonstrate how freely available resources can be used to tackle this kind of problem. In the end, was I successful? Did the offender admit their guilt? You’ll have to attend the talk to find out!
Paul Clough is a Reader in Information Retrieval at the University of Sheffield. Paul joined the university in 1999 working as a Research Assistant in the Department of Computer Science in collaboration with the Journalism Department and the British Press Association on a project entitled “Measuring Text Reuse”. Following various interests in NLP and Information Retrieval, Paul worked as a researcher on a range of projects until 2005 when he became a Lecturer in the Information School. Paul is now head of the Information Retrieval research group and coordinator for a new Masters programme in Data Science. He has continued teaching and researching various aspects of data management and information storage and retrieval. Paul has published over 100 peer-reviewed articles, including a co-authored Springer book on multilingual information retrieval.
University of Trento
Personality recognition from text is the automatic classification of authors' personality traits from texts they wrote. A classifier's predictions can be compared against gold-standard labels obtained by means of personality assessments such as the Big5 personality test. Until recently, the extraction of personality types was limited to blogs and offline texts, but in recent years the scientific community has shown strong interest in extracting personality from various sources, such as online social networks, speech, and video. Current approaches to personality recognition are based on supervised learning, which has several limitations, for example the cost of data annotation and the lack of domain adaptability and multilinguality. We present an unsupervised method for personality recognition from text and some of its applications in social network analysis as well as in other NLP tasks.
Fabio Celli (born 1981) is a computational linguist and data miner. He holds degrees in communication studies from the University of Urbino and in linguistics from the University of Bologna, and a PhD in cognitive science from the Center for Mind and Brain Sciences (CIMeC), University of Trento. He is one of the organizers of the Workshop on Computational Personality Recognition. His PhD thesis, Adaptive Personality Recognition from Text, has been published by Lambert Academic Publishing. He also loves contemporary art and electronic music.