Synopsis

  • Task: tba.
  • Submission: Deployment on TIRA [submit]
  • Input: tba. [download]
  • Evaluation Measures: tba. [code]
  • Baselines: tba. [code]
  • Evaluation: tba. [code]

Task

Develop models to detect and classify human-AI collaborative texts:

Subtask 1: AI Detection Sensitivity

  • Detect AI-generated text, both with and without obfuscation
  • Binary classification: human vs. machine

Subtask 2: Human-AI Collaborative Text Classification

Given a document that may be written by a human, by a machine, or by both, the goal is to classify it into one of the following six categories:

  • (1) Fully human-written
  • (2) Human-initiated, then machine-continued
  • (3) Human-written, then machine-polished
  • (4) Machine-written, then machine-humanized (obfuscated)
  • (5) Machine-written, then human-edited
  • (6) Deeply mixed text, in which some parts are written by a human and others are generated by a machine
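As a minimal sketch, the six categories could be encoded as integer labels that follow the numbering above. The class and member names below are hypothetical; the official label format has not been announced.

```python
from enum import IntEnum


class CollabLabel(IntEnum):
    """Hypothetical integer encoding of the six Subtask 2 categories."""
    FULLY_HUMAN = 1                        # (1) fully human-written
    HUMAN_INITIATED_MACHINE_CONTINUED = 2  # (2) human start, machine continuation
    HUMAN_WRITTEN_MACHINE_POLISHED = 3     # (3) human draft, machine polishing
    MACHINE_WRITTEN_MACHINE_HUMANIZED = 4  # (4) machine draft, machine obfuscation
    MACHINE_WRITTEN_HUMAN_EDITED = 5       # (5) machine draft, human edits
    DEEPLY_MIXED = 6                       # (6) interleaved human and machine parts
```

Using `IntEnum` keeps the labels interchangeable with plain integers, so `CollabLabel(3)` recovers the named category from a numeric label.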

Data

  • Multi-domain documents (academic writing, journalism, social media)
  • Human-written and machine-generated samples (GPT-4, Claude, PaLM)
  • Collaborative texts with annotation layers for human/machine contributions
  • Multiple languages (English, Spanish, German)

Evaluation

  • Macro-F1
  • Robustness testing against novel obfuscation techniques
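Macro-F1 is the unweighted mean of the per-class F1 scores, macro-F1 = (1/|C|) Σ_c F1_c with F1_c = 2·TP_c / (2·TP_c + FP_c + FN_c), so every class counts equally regardless of how many documents it contains. The function below is a minimal pure-Python sketch of that definition; whether the official scorer treats classes that never occur exactly this way is an assumption, since the evaluation code is still tba.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores.

    Classes default to the union of gold and predicted labels. A class with
    no true positives, false positives, or false negatives scores F1 = 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # gold class gets a false negative

    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with gold labels `["human", "human", "machine", "machine"]` and predictions `["human", "machine", "machine", "machine"]`, the human class scores 2/3 and the machine class 4/5, giving a macro-F1 of 11/15 (about 0.733).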

Submission

tba

Baselines

  • Zero-shot detectors: DetectGPT, Binoculars
  • Fine-tuned transformer encoders: RoBERTa-base, DeBERTa-v3
  • Ensemble methods with stylometric features
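To illustrate the kind of input a stylometric ensemble might use, the sketch below computes a few common surface features and combines per-model machine-probability scores by weighted soft voting. The feature set, the `soft_vote` helper, and the voting scheme are hypothetical choices for illustration, not the baselines' actual configuration.

```python
from __future__ import annotations

import re
import string


def stylometric_features(text: str) -> dict[str, float]:
    """A few illustrative surface features (hypothetical feature set)."""
    words = re.findall(r"\b\w+\b", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words, n_chars = len(words), len(text)
    return {
        # Lexical diversity: distinct words over total words.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "mean_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Share of characters that are ASCII punctuation.
        "punctuation_rate": (sum(ch in string.punctuation for ch in text) / n_chars
                             if n_chars else 0.0),
    }


def soft_vote(probs: list[float], weights: list[float] | None = None) -> float:
    """Weighted average of per-model P(machine) scores (hypothetical helper)."""
    if weights is None:
        weights = [1.0] * len(probs)
    if len(weights) != len(probs) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per probability")
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)
```

For `"The cat sat. The cat ran!"`, the features come out as a type-token ratio of 4/6, a mean word length of 3, a mean sentence length of 3, and a punctuation rate of 2/25. In practice, such features would more likely feed a trained classifier alongside detector scores than a fixed vote.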

Results

tba

Task Committee