Voight-Kampff Generative AI Authorship Verification 2024

Synopsis

  • Task: Given two texts, one authored by a human, one by a machine: pick out the human.
  • Input: Pairs of texts, one of which was written by a human. [download tba.]
  • Evaluation: tba.
  • Submission: Deployment on TIRA [submit]
  • Baseline: tba.

Task

With Large Language Models (LLMs) improving at breakneck speed and seeing more widespread adoption every day, it is getting increasingly hard to discern whether a given text was authored by a human being or a machine. Many classification approaches have been devised to help humans distinguish between human- and machine-authored text, though often without questioning the fundamental and inherent feasibility of the task itself.

With years of experience in a related but much broader field, authorship verification, we set out to answer whether this task can be solved at all. We start with the simplest suitable task setup: Given two texts, one authored by a human, one by a machine: pick out the human.

The Generative AI Authorship Verification Task @ PAN is organized in collaboration with the Voight-Kampff Task @ ELOQUENT Lab in a builder-breaker style. PAN participants will build systems to tell human and machine apart, while ELOQUENT participants will investigate novel text generation and obfuscation methods for avoiding detection.

Data

Test data for this task will be compiled from the submissions of ELOQUENT participants and will comprise multiple text genres, such as news articles, Wikipedia intro texts, or fanfiction. Additionally, PAN participants will be provided with a bootstrap dataset of real and fake news articles covering a range of 2021 U.S. news headlines. The bootstrap dataset can be used for training purposes, though we strongly recommend using other sources as well.

Download instructions: The dataset will be made available to participants via Zenodo at a later point.

The bootstrap dataset is provided as a set of newline-delimited JSON files. Each file contains a list of articles, written either by (any number of) human authors or a single machine. That is, the file human.jsonl contains only human texts, whereas a file gemini-pro.jsonl contains articles about the same topics, but written entirely by Google's Gemini Pro. The file format is as follows:

{"id": "news-2021-01-01-2021-12-31-kabulairportattack/art-081", "text": "..."}
{"id": "news-2021-01-01-2021-12-31-capitolriot/art-050", "text": "..."}
...

The article IDs and line orderings are the same across all files, so the same line always corresponds to the same topic, but from different “authors”.
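
Because of this alignment, training pairs in the same two-text format as the test data (see below) can be derived directly from the bootstrap files. The following Python sketch illustrates one way to do this; it assumes human.jsonl and gemini-pro.jsonl are available locally, and the pair format produced by the helper is illustrative, not prescribed:

import json
import random

def read_jsonl(path):
    # Read a newline-delimited JSON file into a list of dicts.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def build_pairs(human_path, machine_path, seed=0):
    # Zip the line-aligned human and machine articles into pairs and
    # randomly assign them to the text1/text2 slots. The "human" field
    # records which slot holds the human-written article.
    rng = random.Random(seed)
    humans = read_jsonl(human_path)
    machines = read_jsonl(machine_path)
    pairs = []
    for h, m in zip(humans, machines):
        if rng.random() < 0.5:
            pairs.append({"id": h["id"], "text1": h["text"], "text2": m["text"], "human": 1})
        else:
            pairs.append({"id": h["id"], "text1": m["text"], "text2": h["text"], "human": 2})
    return pairs

pairs = build_pairs("human.jsonl", "gemini-pro.jsonl")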

The test dataset will be provided in a different format. Instead of individual files, only a single JSONL file will be given, each line containing a pair of texts:

{"id": "iixcWBmKWQqLAwVXxXGBGg", "text1": "...", "text2": "..."}
{"id": "y12zUebGVHSN9yiL8oRZ8Q", "text1": "...", "text2": "..."}
...

The IDs will be scrambled, and the participant's task is to generate an appropriate output file with predictions for which of the two texts is the human-written one (see Submission below).
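
Reading the test file is straightforward; a short Python sketch (the file name pairs.jsonl is a placeholder):

import json

# Each line holds one test case: {"id": ..., "text1": ..., "text2": ...}
with open("pairs.jsonl", encoding="utf-8") as f:
    for line in f:
        case = json.loads(line)
        case_id, text1, text2 = case["id"], case["text1"], case["text2"]
        # ... decide which of text1/text2 is the human-written text ...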

Evaluation

Details will be announced at a later date.

Submission

Participants will submit their systems as Docker images through the TIRA platform. Submitted systems are not expected to be trained on TIRA, but they must be standalone and runnable on the platform without contact with the outside world (evaluation runs will be sandboxed).

The submitted software must be executable inside the container via a command-line call and must take two arguments: INPUT-FILE (the absolute path to the input JSONL file) and OUTPUT-DIRECTORY (the absolute path to the output directory):

$ mySoftware INPUT-FILE OUTPUT-DIRECTORY

Within OUTPUT-DIRECTORY, a single (!) file with the file extension *.jsonl must be created with the following format:

{"id": "iixcWBmKWQqLAwVXxXGBGg", "is_human": 1.0}
{"id": "y12zUebGVHSN9yiL8oRZ8Q", "is_human": 0.3}
...

For each test case in the input file, an output line must be written with the ID of the input text pair and a confidence score between 0.0 and 1.0. A score < 0.5 means that text1 is believed to be human-authored. A score > 0.5 means that text2 is believed to be human-authored. A score of exactly 0.5 means the case is undecidable. Participants are encouraged to answer with 0.5 rather than risk a wrong prediction.
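
Putting this together, a minimal end-to-end skeleton matching the command-line interface and output format above might look as follows. This is a sketch only: the constant score of 0.5 is a placeholder to be replaced with a real classifier, and the output file name predictions.jsonl is an arbitrary choice.

#!/usr/bin/env python3
import json
import os
import sys

def score_pair(text1, text2):
    # Placeholder scorer: 0.5 marks every case as undecidable.
    # Replace with a real model; return a value < 0.5 if text1 is
    # believed to be human-authored, > 0.5 if text2 is.
    return 0.5

def main():
    input_file, output_dir = sys.argv[1], sys.argv[2]
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, "predictions.jsonl")
    with open(input_file, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            case = json.loads(line)
            # Each case is scored independently of all other cases.
            score = score_pair(case["text1"], case["text2"])
            fout.write(json.dumps({"id": case["id"], "is_human": score}) + "\n")

if __name__ == "__main__":
    main()

Run locally, e.g. as python3 mysoftware.py pairs.jsonl output/, to check that exactly one *.jsonl file appears in the output directory before packaging the software as a Docker image.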

All test cases must be processed in isolation without information leakage between them! Even though systems may be given an input file with multiple JSON lines at once for reasons of efficiency, these inputs must be processed and answered just the same as if only a single line were given. Answers for any one test case must not depend on other cases in the input dataset!

Task Committee