Synopsis

  • Task: Given the Twitter feeds of a celebrity's followers, determine the occupation, age, and gender of that celebrity.
  • Input: Celebrities as per Twitter's verified accounts and Wikidata notability; age, gender, occupation demographics; 2380 authors and the timelines of 10 random followers [data]
  • Evaluation: Harmonic mean of the per-demographic F1; age prediction assessment using ε-environment [code]
  • Submission: Deployment on TIRA [submit]
  • Baseline: Two multinomial logistic regression classifiers on TF-IDF-weighted word 1-2-grams, one trained on follower tweets and one on celebrity tweets. [code]

Task

Author profiling technology predicts personal or demographic traits of an author based on how these traits are expressed in the author's text. A long-standing question in author profiling is how much the expression assessed by algorithms depends on the characteristics of an individual author and how much on the expression of social groups and communities.

Celebrities are prolific and highly influential users on social media, and they act as hubs for the like-minded. The strong homophily within a celebrity's community is an ideal condition for studying the interplay of author characteristics and community expression in author profiling. It also opens the way to predicting traits of users who have no texts of their own, using only their follower network.

This year's Celebrity Profiling task is to develop a piece of software that predicts three demographics of a celebrity from the text of their followers: occupation, age, and gender.

Data

Each dataset contains three files: follower-feeds.ndjson as input, labels.ndjson as output, and celebrity-feeds.ndjson for additional study. Each file lists all celebrities as JSON objects, one per line, identified by the id key. The training dataset contains 1,920 celebrities and is balanced with respect to gender and occupation. The test dataset contains 400 celebrities and is also balanced with respect to gender and occupation. The supplement dataset contains the remaining 8,265 celebrities and is not balanced in any way.

PLEASE NOTE: We do not provide the celebrity timelines for the test dataset.

Input Format

The follower-feeds.ndjson contains, for each celebrity, the English tweets of at least 10 followers. Each follower timeline has at least 50 tweets, not counting retweets. It is formatted as:
{"id": 1234, "text": [["a tweet of follower 1", "another tweet of follower 1", ...], ["a tweet of follower 2", ...], ...]}
{"id": 5678, "text": [["a tweet of follower 1", "another tweet of follower 1", ...], ["a tweet of follower 2", ...], ...]}
...
follower-feeds.ndjson
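As a minimal sketch of how this format can be consumed, the following reads NDJSON records line by line and yields each celebrity id together with its list of follower timelines. The function name `read_follower_feeds` and the in-memory sample are illustrative assumptions, not part of the task's official code; in practice the iterable would be an open file handle on follower-feeds.ndjson.

```python
import io
import json


def read_follower_feeds(lines):
    """Yield (celebrity_id, follower_timelines) from an iterable of NDJSON lines.

    follower_timelines is a list of lists: one inner list of tweets per follower.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        yield record["id"], record["text"]


# Hypothetical in-memory stand-in for follower-feeds.ndjson.
sample = io.StringIO(
    '{"id": 1234, "text": [["a tweet of follower 1", "another tweet of follower 1"], '
    '["a tweet of follower 2"]]}\n'
    '{"id": 5678, "text": [["a tweet of follower 1"]]}\n'
)

feeds = dict(read_follower_feeds(sample))

# Number of tweets per follower, per celebrity.
tweets_per_follower = {
    cid: [len(timeline) for timeline in timelines]
    for cid, timelines in feeds.items()
}
```

Reading one line at a time keeps memory use bounded, which is likely to matter because each record holds the full timelines of several followers.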
The celebrity-feeds.ndjson contains the Twitter timelines of the original celebrities, formatted as:
{"id": 1234, "text": ["a tweet of celebrity 1", "another tweet of celebrity 1", ...]}
{"id": 5678, "text": ["a tweet of celebrity 2", "another tweet", ...]}
...
celebrity-feeds.ndjson

Output Format

The labels.ndjson contains the classes that should be predicted. A valid submission has to produce a labels.ndjson from the follower-feeds.ndjson, and it must contain an entry for each id given in the input.
{"id": 1234, "occupation": "sports", "gender": "female", "birthyear": 1992}
{"id": 5678, "occupation": "creator", "gender": "male", "birthyear": 1990}
...
labels.ndjson
The following values are possible for each of the traits:
occupation  := {sports, performer, creator, politics}
birthyear   := {1940, ..., 1999}
gender      := {male, female}
possible value instances for each label
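The constraints above can be checked before submitting. The sketch below validates a list of predictions against the allowed values and the set of input ids, and serializes them as NDJSON. The helper names `validate_labels` and `to_ndjson` are hypothetical; this is not the official checker, and TIRA may apply additional checks.

```python
import json

# Allowed values as listed in the task description.
OCCUPATIONS = {"sports", "performer", "creator", "politics"}
GENDERS = {"male", "female"}
BIRTHYEARS = range(1940, 2000)  # 1940, ..., 1999


def validate_labels(predictions, input_ids):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    seen = set()
    for p in predictions:
        cid = p.get("id")
        seen.add(cid)
        if p.get("occupation") not in OCCUPATIONS:
            problems.append(f"{cid}: invalid occupation {p.get('occupation')!r}")
        if p.get("gender") not in GENDERS:
            problems.append(f"{cid}: invalid gender {p.get('gender')!r}")
        if p.get("birthyear") not in BIRTHYEARS:
            problems.append(f"{cid}: invalid birthyear {p.get('birthyear')!r}")
    # Every id from the input must have an entry in the output.
    for cid in input_ids:
        if cid not in seen:
            problems.append(f"{cid}: missing prediction")
    return problems


def to_ndjson(predictions):
    """Serialize predictions as one JSON object per line."""
    return "".join(json.dumps(p) + "\n" for p in predictions)


good = [
    {"id": 1234, "occupation": "sports", "gender": "female", "birthyear": 1992},
    {"id": 5678, "occupation": "creator", "gender": "male", "birthyear": 1990},
]
bad = [{"id": 1234, "occupation": "professional", "gender": "female", "birthyear": 2002}]

good_problems = validate_labels(good, input_ids=[1234, 5678])
bad_problems = validate_labels(bad, input_ids=[1234, 5678])
```

Here `good_problems` is empty, while `bad_problems` reports the out-of-vocabulary occupation, the out-of-range birth year, and the missing entry for id 5678.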

Evaluation

Submissions are judged by a combined metric cRank, which is the harmonic mean of each label's metric:

$$ \text{cRank} = \frac{3}{\frac{1}{\text{F}_{1, \text{occupation}}} + \frac{1}{\text{F}_{1, \text{gender}}} + \frac{1}{\text{F}_{1, \text{age}}}} $$

All traits are judged by their respective F1. Precision and recall of birthyear are calculated leniently. If a prediction is within an m-window of the truth, it is counted as correct:

$$ \text{true birthyear} - m \le \text{predicted birthyear} \le \text{true birthyear} + m $$

The window size m is based on the birth year and increases linearly from about 3 years for 1999 to about 9 years for 1940.
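The two formulas can be sketched in a few lines of Python. The harmonic mean follows the cRank definition directly. The window function is an assumption: it interpolates linearly between exactly 3 years at 1999 and exactly 9 years at 1940, whereas the description only says "about" 3 and 9 and does not state how m is rounded, so the official evaluation code may differ at the boundaries. The names `window_size`, `birthyear_hit`, and `c_rank` are illustrative.

```python
def window_size(true_year, first=1940, last=1999, m_oldest=9.0, m_youngest=3.0):
    """Assumed linear window: m_youngest at `last`, m_oldest at `first`, no rounding."""
    fraction = (last - true_year) / (last - first)
    return m_youngest + fraction * (m_oldest - m_youngest)


def birthyear_hit(true_year, predicted_year):
    """Lenient correctness: the prediction lies within the m-window of the truth."""
    m = window_size(true_year)
    return true_year - m <= predicted_year <= true_year + m


def c_rank(f1_occupation, f1_gender, f1_age):
    """Harmonic mean of the three per-trait F1 scores."""
    scores = (f1_occupation, f1_gender, f1_age)
    if min(scores) <= 0:
        return 0.0  # the harmonic mean is taken as 0 if any score is 0
    return 3 / sum(1 / s for s in scores)


# F1 values 0.700 (occupation), 0.753 (gender), 0.500 (age) give a cRank of about 0.631.
example = round(c_rank(0.700, 0.753, 0.500), 3)
```

Because the harmonic mean is dominated by the lowest score, a system cannot compensate for a weak trait by excelling at the other two.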

Results

Scores on the test dataset:

Team                              cRank   Age     Gender   Occupation
baseline-ngram-celebrity-tweets   0.631   0.500   0.753    0.700
hodge20                           0.577   0.432   0.681    0.707
koloski20                         0.521   0.407   0.616    0.597
tuksa20                           0.477   0.315   0.696    0.598
baseline-ngram-follower-tweets    0.469   0.362   0.584    0.521
random                            0.333   0.333   0.500    0.250

Task Committee