This is the 18th evaluation lab on digital text forensics. PAN will be held as part of the CLEF conference in Avignon, France, on September 10-14, 2018. Evaluations will run from January to June. We invite you to take part in any of the three tasks shown below.
Given a document, who wrote it?
One subtask focuses on cross-domain authorship attribution applied to fanfiction, and another subtask focuses on style change detection.
Given a document, what are its author's traits?
This task focuses on gender identification; both the text and the images of tweets in English, Spanish and Arabic may be used as information sources.
Given a document, hide its author.
This task works against identification and profiling by automatically paraphrasing a text to obfuscate its author's style. The tasks offered are author masking and obfuscation evaluation.
University of Santiago de Compostela (Spain)
In this talk I will review some recent results regarding early detection of signs of depression and anorexia. Since 2017, we have been organizing eRisk, a CLEF lab that promotes the development of effective and efficient solutions for early risk prediction on the Internet. eRisk explores the evaluation methodology, effectiveness metrics and practical applications of early risk detection on the Internet. Early detection technologies can be employed in different areas, particularly those related to health and safety. For instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area that would be potentially applicable to a wide variety of situations and to many different personal profiles. Examples include potential paedophiles, stalkers, individuals who could fall into the hands of criminal organisations, people with suicidal inclinations, or people susceptible to depression. I will also discuss the lessons learned over these two years and some future lines of work.
Dr. David E. Losada is an Associate Professor in Computer Science & Artificial Intelligence at the University of Santiago de Compostela (Spain). He is currently the Director of the Master's Programme on Big Data Analytics. He received his BS in Computer Science (with honors) in 1997 and his PhD in Computer Science (with honors) in 2001, both from the University of A Coruña (Spain). From 2001 to 2002, he was a lecturer at the San Pablo-CEU University (Spain) and, in 2003, he joined the University of Santiago de Compostela as a senior research fellow ("Ramón y Cajal" R&D programme). His current research interests span Information Retrieval (IR) and related areas such as early risk detection, text mining, IR evaluation, IR probabilistic models, summarization, novelty detection, and sentence retrieval. Losada is an active member of the IR community and regularly serves on the Programme Committees of prestigious international conferences such as SIGIR and ECIR. He has also led several R&D projects and contracts in the area of search technologies. In 2011, Losada was recognized with an ACM senior member award. He started organizing eRisk in 2017. eRisk is a CLEF lab that promotes the development of effective and efficient solutions for early risk prediction on the Internet.
The Web has changed drastically over the past decade, which saw exponential growth in social networking services. Traditionally, social network users are encouraged to complete their profiles by explicitly providing personal attributes such as age, gender, and interests. Such information is essential for marketing, facility arrangement, or candidate assessment but, unfortunately, is often not publicly available. This gives rise to user profiling, which aims at the automatic inference of individual user attributes based on their social network interactions. Considering that people frequently contribute multi-modal data to multiple online social networks at the same time, it is essential to apply inter-source complementary multi-view learning techniques to perform automatic user profiling efficiently. In this talk, we will give an overview of recent research on learning across multiple social networks and data modalities for automatic user profiling. We will also give several practical examples of how Multi-View User Profiling helps SoMin.ai boost the efficiency of enterprises' marketing efforts.
Aleksandr Farseev is an international researcher, entrepreneur and the founder of SoMin.ai, a Social Media Marketing platform driven by AI. He obtained his Ph.D. degree from the National University of Singapore and currently holds an Adjunct Professor position at ITMO University, Russia. Apart from his academic efforts, Aleksandr leads the AI research department at SoMin.ai, an AI-driven Social Discovery and Influencer Marketing Platform. Aleksandr's research interests include Social Media Analytics, Multi-View Learning, and Automatic User Profiling. He is known as one of the leading experts in Multi-View User Profile Learning.
Open University of Cyprus & Research Centre on Interactive Media, Smart Systems & Emerging Technologies, Nicosia (Cyprus)
There is much concern about the algorithms that underlie information services and the view of the social world they present to users. Image search engines are known to perpetuate gender stereotypes, particularly surrounding professions (e.g., returning primarily images of men on a search for "engineer," but few, if any, images of men on a search for "nurse"). In the first part of the talk, I discuss the problem of detecting social biases in image search results. We developed a novel method for automatically examining the content and strength of gender stereotypes in image results, which is inspired by the trait adjective checklist method. In experiments with Microsoft Bing, we found that photos of women are more often retrieved for searches on warm character traits (e.g., "emotional"), whereas agentic traits (e.g., "rational") typically result in more images of men. In the second part of the talk, I address questions surrounding the origin of social biases in search algorithms. I will argue that the quality of image metadata is a source of bias, as algorithms are typically trained on "gold standard," human-produced metadata. Specifically, in an experiment testing a commonly used crowdsourcing task for metadata generation, I will provide evidence that people's descriptions of men and women depicted in similar contexts differ in systematic ways that are predictable by theory. In conclusion, I shall argue that while the reproduction of social stereotypes in search algorithms is likely inevitable, there are ways to effectively raise users' awareness of biases in results.
Jahna Otterbacher received her doctorate from the University of Michigan (Ann Arbor, USA), where she was a member of the Computational Linguistics and Information Retrieval (CLAIR) research group. She is currently Assistant Professor at the Open University of Cyprus (OUC), Faculty of Pure and Applied Sciences, where she is the academic coordinator of the MSc in Social Information Systems. Jahna also coordinates the Cyprus Center for Algorithmic Transparency (CyCAT) at the OUC, a new initiative funded by the H2020 Widespread Twinning program. The CyCAT seeks to promote transparency and accountability in algorithmic systems that people routinely use, but that are rather opaque to them (e.g., search engines), through three types of interventions - data-, developer- and user-focused. In addition to her post at the OUC, Jahna holds a concurrent appointment as team leader of the Transparency in Algorithms Group at RISE (Research centre on Interactive media, Smart systems and Emerging technologies), a new center of excellence and innovation in Nicosia, Cyprus, in collaboration with two international Advanced Partners, UCL and MPI.