Synopsis

  • Task: Given a triplet (user query, reasoning trajectory, final answer), detect (1) the source of the reasoning/answer (human vs. AI) and (2) the safety of the reasoning and the final answer.
  • Subtasks:
    • Subtask 1 (Source Detection): Identify whether the reasoning trajectory and final answer are human-written or AI-generated.
    • Subtask 2 (Safety Detection): Decide whether the reasoning trajectory is safe vs. unsafe, and whether the final answer is safe vs. unsafe.
  • Registration: TBD
  • Important dates:
    • May 07, 2026: software submission
    • May 28, 2026: participant notebook submission [template] [submission – select "Stylometry and Digital Text Forensics (PAN)"]
  • Data: To be released by February 10, 2026
  • Evaluation: File-based submissions on a separate evaluation platform (not executed inside the TIRA sandbox).

Task Overview

The year 2025 saw major advances in the reasoning capabilities of large language models (LLMs), which now produce explicit reasoning trajectories before giving a final answer. However, intermediate reasoning steps can still be spurious, illogical, or unsafe, and in some cases models may reach safe conclusions through deceptive or misaligned reasoning paths. To deepen our understanding of AI-generated reasoning and to support improvements in reasoning quality and safety, this task focuses on detecting the source and the safety of reasoning trajectories.

Subtask 1: Source Detection

Given a triplet (user query, reasoning trajectory, final answer), identify whether the reasoning trajectory and final answer are generated by an AI system or written by a human. Queries mainly involve math, coding, and real-life financial reasoning tasks.
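
For illustration only, one simple starting point is a stylometric classifier: concatenate the triplet into a single document and train a linear model on it. The Python sketch below is not an official baseline, and the field names ("query", "reasoning", "answer") and label values ("human", "ai") are assumptions, since the data format has not been released.

    # Minimal source-detection sketch: TF-IDF features over the concatenated
    # triplet plus logistic regression. Field names and labels are assumed;
    # the official data format is TBD.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def to_text(instance):
        # Concatenate the triplet into one document for the vectorizer.
        return " ".join([instance["query"], instance["reasoning"], instance["answer"]])

    # Toy examples standing in for the (unreleased) official training set.
    train = [
        {"query": "What is 2 + 2?", "reasoning": "2 plus 2 is 4.",
         "answer": "4", "label": "human"},
        {"query": "What is 2 + 2?",
         "reasoning": "First, note that 2 + 2 = 4. Therefore, the answer is 4.",
         "answer": "4", "label": "ai"},
    ]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit([to_text(x) for x in train], [x["label"] for x in train])
    print(clf.predict([to_text(train[1])]))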

Subtask 2: Safety Detection

Given a triplet (user query, reasoning trajectory, final answer), classify (1) the reasoning trajectory and (2) the final answer as safe or unsafe. The user queries come from three categories:

  • (a) risky queries that directly request harmful content,
  • (b) jailbreak attacks whose risks are obscured by various strategies,
  • (c) benign queries that contain superficially risky tokens (see the toy check below).
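
Category (c) is what defeats naive keyword filtering. The toy check below (purely illustrative; the token list is invented) flags a benign technical query as risky from surface tokens alone, which is exactly the failure mode that systems in this subtask must avoid.

    # Toy illustration of why category (c) is hard: a naive token blocklist
    # misfires on benign technical queries. Not a baseline.
    RISKY_TOKENS = {"kill", "bomb", "attack"}

    def naive_flag(query: str) -> bool:
        return any(tok in query.lower().split() for tok in RISKY_TOKENS)

    print(naive_flag("How do I kill a process in Linux?"))  # True, yet the query is benign
    print(naive_flag("How do I list files in a folder?"))   # False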

Data

TBD. Dataset release details (access, licensing, and download links) will be announced here.

Expected structure: each instance is a triplet (query, reasoning trajectory, final answer).
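
Until the format is announced, the Python sketch below shows one plausible instance layout; every field name and label value in it is hypothetical.

    # Hypothetical instance layout; the actual field names, label vocabulary,
    # and file format are TBD until the dataset is released.
    instance = {
        "id": "example-0001",
        "query": "How do I kill a process in Linux?",
        "reasoning": "The user asks about process management. The kill command ...",
        "answer": "Use kill <pid> or pkill <name> to terminate a process.",
        "source": "ai",               # Subtask 1: "human" or "ai"
        "reasoning_safety": "safe",   # Subtask 2: "safe" or "unsafe"
        "answer_safety": "safe",      # Subtask 2: "safe" or "unsafe"
    }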

Submission

File-only submission. The evaluation for this task is not conducted inside the TIRA sandbox. Instead, participants submit their system outputs as files in the required format on a separate evaluation platform.

Participants may develop and run their systems using any hardware or software resources of their choice (including local or cloud GPUs, open-weight models, or external APIs), as long as the final submission consists solely of the required output files.
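
Because the output format is still TBD, the sketch below only illustrates the general shape of a file-based submission, assuming one JSON object per instance in a JSONL file; the file name and field names are placeholders.

    # Sketch of writing a predictions file. The official submission format is
    # TBD; the JSONL layout and field names here are assumptions.
    import json

    predictions = [
        {"id": "example-0001", "source": "ai",
         "reasoning_safety": "safe", "answer_safety": "safe"},
    ]

    with open("predictions.jsonl", "w", encoding="utf-8") as f:
        for p in predictions:
            f.write(json.dumps(p) + "\n")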

Submission platform and output format: TBD.

Evaluation

TBD. Evaluation measures and aggregation will be described here once finalized.

The task targets both step-level (reasoning trajectory) and outcome-level (final answer) predictions, encouraging systems that can detect unsafe or synthetic reasoning trajectories and explain their decisions.
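
For intuition only, since the measures are not yet fixed: if the labels stay binary, one plausible scheme is a macro F1 per label dimension, computed separately for the reasoning trajectory and the final answer.

    # Illustrative scoring sketch only; the official evaluation is TBD.
    # Macro F1 per dimension: step-level (reasoning) and outcome-level (answer).
    from sklearn.metrics import f1_score

    gold_reasoning = ["safe", "unsafe", "safe"]
    pred_reasoning = ["safe", "unsafe", "unsafe"]
    gold_answer    = ["safe", "safe", "unsafe"]
    pred_answer    = ["safe", "safe", "unsafe"]

    print("reasoning macro-F1:", f1_score(gold_reasoning, pred_reasoning, average="macro"))
    print("answer macro-F1:   ", f1_score(gold_answer, pred_answer, average="macro"))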

Baselines

TBD. Baselines and starter code will be linked here.

Leaderboard

TBD

Task Committee