This is the 17th evaluation lab on digital text forensics. PAN will be held as part of the CLEF conference in Dublin, Ireland, on September 11-14, 2017. Evaluations will run from January to June. We invite you to take part in any of the three tasks shown below.
Given a document, who wrote it?
This task focuses on author clustering and style breach detection. Author clustering will be done on short documents of paragraph length. Style breach detection has the goal of identifying breaches of writing style in longer texts.
Given a document, what are its author's traits?
This task focuses on gender and language variety identification on Twitter, providing a corpus of tweets annotated with gender and language variety.
Given a document, hide its author.
This task works against identification and profiling by automatically paraphrasing a text to obfuscate its author's style. The tasks offered are author masking and obfuscation evaluation.
University of Turin
Researchers have used large quantities of online data to study dynamics in novel ways. Consider the specific case of online networked individuals sharing geo-located, multimodal pieces of information on social media platforms, e.g., users of Twitter, Instagram, and Facebook. Can their social dynamics be used to unveil the hidden dimensions that regulate the social life of our cities? To answer this question, our research has focused on understanding how people psychologically experience cities and, as a result, we have created new mapping tools that capture the aesthetic, olfactory, and sonic layers of our cities, modeling happiness and the use of figurative language, e.g., irony. The work presented in this talk mixes data mining, urban informatics, and computational social science to study how these dimensions relate to the demographic factors, e.g., age or gender, and socio-economic factors, e.g., education, crime, race, or wealth, that characterize the profile of the modern urban fabric.
Rossano Schifanella is an Assistant Professor in Computer Science at the University of Turin, Italy, where he is a member of the Applied Research on Computational Complex Systems group. He is a visiting scientist at Nokia Bell Labs, Cambridge, UK, and a former visiting scientist at Yahoo Labs and at the Center for Complex Networks and Systems Research at Indiana University, where he applied computational methods to model social behavior in online platforms. His research embraces the creative energy of a range of disciplines across technology, computational social science, data visualization, and urban informatics. He is passionate about building new mapping tools that capture the sensorial layers of a city, and designing computational frameworks to model aesthetics, creativity, and figurative language in social media.
Authorship profiling and text-based deception detection have attracted attention in recent years, partly because of the rapid growth of Internet communication and the need to detect fraud in online reviews and dating profiles, to reveal suicidal tendencies in authors of social media texts, and to assess which groups (male, female, extroverts…) like or dislike particular products and services through linguistic analysis of their reviews. Industrial companies need techniques for quick and valid assessment of candidates’ intelligence and personality, and analyzing texts could provide such opportunities. The field of author profiling and deception detection is developing rapidly, but not for Slavic languages. These languages have long been beyond the scope of relevant studies, largely because no corresponding text corpora were available and no efficient natural language processing methods were in place. In this keynote we present the results of research aimed at deception detection and personality and gender recognition in Russian written texts. The research used RusPersonality, which is by far the largest corpus of written texts in Slavic languages with rich metadata (information on the authors of the texts: gender, age, occupation, scores on different personality traits, results of neuropsychological assessment, etc.; and information on the texts: genre, topic, deceptive/truthful, etc.). The second source of material for research conducted in RusProfiling Lab is social media. Most of the experiments used topic-independent features. The talk will pay special attention to estimating the likelihood of self-destructive behavior (including suicidal behavior as its most severe form) using linguistic analysis of writing.
The keynote concludes by encouraging a discussion on the need to seek topic-independent and language-independent features in authorship profiling, and to explain the estimated correlations between linguistic parameters of written texts and characteristics of their authors.
Tatiana Litvinova is head of the Corpus Sociolinguistics and Authorship Profiling Lab (RusProfiling Lab), a joint laboratory of Voronezh State University (Voronezh, Russia), Voronezh State Pedagogical University (Voronezh, Russia), and The Kurchatov Institute (Moscow, Russia). The lab's studies on authorship profiling and deception detection have been funded by leading Russian scientific funds. The lab also collaborates with industrial companies to obtain real-world data and to develop authorship profiling techniques that would be useful in HR. Tatiana has a PhD in Russian Linguistics from Voronezh State University (Voronezh, Russia). She is one of the organizers of the RusProfiling Shared Task on Gender Prediction in Cross-Genre perspective, which will be held in conjunction with FIRE in December 2017.