Synopsis

  • Task: Given a triplet (user query, reasoning trajectory, final answer), detect (1) the source of the reasoning/answer (human vs. AI) and (2) the safety of the reasoning and the final answer.
  • Subtasks:
    • Subtask 1 (Source Detection): Identify whether the reasoning trajectory and final answer are human-written or AI-generated.
    • Subtask 2 (Safety Detection): Decide whether the reasoning trajectory is safe vs. unsafe, and whether the final answer is safe vs. unsafe.
  • Registration: TBD
  • Important dates:
    • February 12, 2026: Training + Validation set release
    • March 30, 2026: Testing set release
    • May 07, 2026: Prediction submission
    • May 28, 2026: Participant notebook submission [template] [submission – select "Stylometry and Digital Text Forensics (PAN)"]
  • Data: Accessible at our GitHub repository.
  • Evaluation: File-based submissions on a separate evaluation platform (not executed inside the TIRA sandbox).

Task Overview

The year 2025 saw major advances in the reasoning capabilities of large language models, where models produce explicit reasoning trajectories before a final answer. However, intermediate reasoning steps can still be spurious, illogical, or unsafe, and in some cases models may reach safe conclusions through deceptive or misaligned reasoning paths. To deepen our understanding of LLM-generated reasoning and to support improvements in reasoning and safety, this task focuses on detecting the source and the safety of reasoning trajectories.

Subtask 1: Source Detection

Given a triplet (user query, reasoning trajectory, final answer), identify whether the reasoning trajectory and final answer are generated by an AI system or written by a human. Queries in the testing set may involve math, coding, and real-life financial reasoning tasks.

Subtask 2: Safety Detection

Given a triplet (user query, reasoning trajectory, final answer), classify (1) whether the reasoning trajectory (i.e., each step in the reasoning trace) is safe vs. unsafe and (2) whether the final answer is safe vs. unsafe. The user queries come from three categories:

  • (a) risky queries requesting harmful content,
  • (b) jailbreak attacks with risks obscured by various strategies,
  • (c) benign queries containing risky tokens.
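Category (c) is what makes this subtask more than keyword matching. A toy blocklist (the token list and function below are purely illustrative, not part of the task) shows how a naive filter misfires on benign queries that merely contain risky tokens:

```python
# Toy blocklist; tokens chosen only for illustration.
RISKY_TOKENS = {"bomb", "exploit", "poison"}

def naive_flag(query: str) -> bool:
    """Flag a query if any blocklisted token appears as a word."""
    return any(tok in query.lower().split() for tok in RISKY_TOKENS)

print(naive_flag("How do I exploit a buffer overflow?"))  # True (genuinely risky)
print(naive_flag("Was the movie a box-office bomb ?"))    # True (false positive on a benign query)
```

Systems for subtask 2 therefore need to assess the reasoning and answer in context, not just surface tokens in the query.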

Data

Dataset release details (access, licensing, and download links) will be announced at our GitHub repository.

Structure: each instance is a triplet (query, reasoning trajectory, final answer) with their labels.
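To make the structure concrete, a single instance might look like the following. All field names and values here are assumptions for illustration; the official schema will be published with the dataset:

```python
# Hypothetical instance layout; field names are illustrative, not official.
instance = {
    "query": "How do I compute compound interest on a savings account?",
    "reasoning_trajectory": [
        "Step 1: Identify principal P, annual rate r, and compounding frequency n.",
        "Step 2: Apply A = P * (1 + r/n) ** (n * t) for t years.",
    ],
    "final_answer": "Use A = P * (1 + r/n) ** (n * t) with your account's values.",
    "labels": {
        "source": "ai",              # Subtask 1: "ai" or "human"
        "reasoning_safety": "safe",  # Subtask 2: trajectory label
        "answer_safety": "safe",     # Subtask 2: final-answer label
    },
}
print(instance["labels"]["source"])
```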

Submission

File-only submission. The evaluation for this task is not conducted inside the TIRA sandbox. Instead, participants submit their system outputs as files in the required format on a separate evaluation platform.

Participants may develop and run their systems using any hardware or software resources of their choice (including local or cloud GPUs, open-weight models, or external APIs), as long as the final submission consists solely of the required output files.
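Since the output format is still TBD, the following is only a sketch of what a file-based submission could look like, assuming a JSON Lines file with one prediction object per test instance (the file name, `id` field, and label fields are all hypothetical):

```python
import json

# Hypothetical predictions; the real ids and label vocabulary will come
# from the official test set and format specification.
predictions = [
    {"id": "test-0001", "source": "ai", "reasoning_safety": "unsafe", "answer_safety": "safe"},
    {"id": "test-0002", "source": "human", "reasoning_safety": "safe", "answer_safety": "safe"},
]

# Write one JSON object per line (JSON Lines).
with open("predictions.jsonl", "w", encoding="utf-8") as f:
    for p in predictions:
        f.write(json.dumps(p) + "\n")
```

Once the official format is announced, only the field names and file layout in this sketch would need to change; the file-only workflow stays the same.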

Submission platform and output format: TBD.

Evaluation

Evaluation measures and aggregation will be described here once finalized. TBD.

The task targets both step-level (reasoning trajectory) and outcome-level (final answer) predictions, encouraging systems that can detect unsafe or synthetic reasoning trajectories and explain their decisions.
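While the official measures are TBD, both subtasks are binary classification problems, for which macro-averaged F1 is a common choice. A minimal sketch of that metric, assuming it were adopted (the label set below is illustrative):

```python
def f1(gold, pred, positive):
    """Binary F1, treating `positive` as the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def macro_f1(gold, pred, classes):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1(gold, pred, c) for c in classes) / len(classes)

gold = ["safe", "unsafe", "safe", "unsafe"]
pred = ["safe", "unsafe", "unsafe", "unsafe"]
print(round(macro_f1(gold, pred, ["safe", "unsafe"]), 3))  # → 0.733
```

The same function would apply unchanged to the source-detection labels ("ai" vs. "human").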

Baselines

Baselines and starter code will be linked here. TBD.

Leaderboard

TBD

Task Committee