Researchers have used large quantities of online data to study social dynamics in novel ways. Consider the specific case of networked individuals sharing geo-located, multimodal pieces of information on social media platforms, e.g., users of Twitter, Instagram, or Facebook. Can their social dynamics be used to unveil the hidden dimensions that regulate the social life of our cities? To answer this question, our research has focused on understanding how people psychologically experience cities. As a result, we have created new mapping tools that capture the aesthetic, olfactory, and sonic layers of our cities, modeling happiness and the use of figurative language, e.g., irony. The work presented in this talk combines data mining, urban informatics, and computational social science to study how these dimensions relate to demographic factors, e.g., age or gender, and socio-economic factors, e.g., education, crime, race, or wealth, that characterize the profile of the modern urban fabric.
Authorship profiling and text-based deception detection have attracted considerable attention in recent years, partly because of the rapid growth of Internet communication and the need to detect fraud in online reviews and dating profiles, to reveal suicidal tendencies in authors of social media texts, and to assess which groups of people (men, women, extroverts…) like or dislike particular products and services through linguistic analysis of their reviews. Companies need techniques for quick and valid assessment of candidates' intelligence and personality, and text analysis could provide such opportunities. The field of author profiling and deception detection is developing rapidly, but not for Slavic languages. These languages have long remained beyond the scope of relevant studies, largely because no corresponding text corpora were available and no efficient natural language processing methods were in place. In this keynote we present the results of research aimed at deception detection and at personality and gender recognition in Russian written texts. The research used RusPersonality, which is by far the largest corpus of written texts in Slavic languages with rich metadata: information on the authors of the texts (gender, age, occupation, scores on different personality traits, results of neuropsychological assessment, etc.) and information on the texts themselves (genre, topic, deceptive/truthful, etc.). The second source of material for the research conducted in the RusProfiling Lab is social media. Most of the experiments used topic-independent features. The talk will pay special attention to estimating the likelihood of self-destructive behavior (including suicidal behavior as its most severe form) using linguistic analysis of writing.
The keynote concludes by encouraging a discussion of the need to seek topic-independent and language-independent features in authorship profiling, and to explain the estimated correlations between linguistic parameters of written texts and characteristics of their authors.