Obfuscation Evaluation 2018


We call an obfuscation software

  • safe, if a forensic analysis does not reveal the original author of its obfuscated texts,
  • sound, if its obfuscated texts are textually entailed with their originals, and
  • sensible, if its obfuscated texts are inconspicuous.

These dimensions are orthogonal; an obfuscation software may meet each of them to a different degree.

The task is to devise and implement performance measures that quantify any of these aspects, or parts of them, for an obfuscation software.
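To give a feel for what a submitted measure might look like, here is a minimal sketch of a crude lexical-overlap score between an original text and its obfuscation. It is purely illustrative: the function names `token_set` and `lexical_overlap` are hypothetical, and Jaccard word overlap is only a rough stand-in chosen for this example, not a real measure of textual entailment and not a measure prescribed by the task.

```python
import re


def token_set(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def lexical_overlap(original: str, obfuscated: str) -> float:
    """Jaccard overlap of the two word sets (hypothetical example measure).

    Returns a value in [0, 1]; 1.0 means both texts use exactly the same
    vocabulary. This ignores word order and meaning entirely.
    """
    a, b = token_set(original), token_set(obfuscated)
    if not a and not b:
        # Two empty texts are treated as identical by convention here.
        return 1.0
    return len(a & b) / len(a | b)
```

A high overlap suggests the obfuscation stayed close to the original wording, which says little about safety; a real submission would need to argue which of the three dimensions its measure captures.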


We will provide you with the data generated by submitted obfuscation software as soon as it becomes available.

The input format will be the same as the output of the author masking task.


The output of an evaluation software should be formatted as follows:

measure {
  key  : "myMeasure"
  value: "0.567"
}
measure {
  key  : "myOtherMeasure"
  value: "1.5789"
}
measure {
  key  : "myThirdMeasure"
  value: "0.98421"
}

The output is formatted as ProtoBuf text, not JSON.

  • key can be any string that clearly and concisely names the performance measure.
  • value shall be a numeric quantification of the measure for a given obfuscated text.
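The output format described above can be produced with a few lines of code. The following is a minimal sketch, assuming that each measure block is closed with `}` as ProtoBuf text format requires, and that keys contain no double quotes or backslashes. The helper name `format_measures` is hypothetical and not part of any task tooling.

```python
def format_measures(measures: dict) -> str:
    """Render a mapping of measure names to numbers as ProtoBuf-style text.

    Assumption: keys are plain strings without quotes or backslashes,
    so no escaping is performed.
    """
    blocks = []
    for key, value in measures.items():
        blocks.append(
            "measure {\n"
            f'  key  : "{key}"\n'
            f'  value: "{value}"\n'
            "}"
        )
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(format_measures({"myMeasure": 0.567, "myOtherMeasure": 1.5789}), end="")
```

Values are written as quoted strings to match the example output, even though they represent numbers.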


Performance will be measured by assessing the validity of your corpus in two ways.

Detection: Your corpus will be fed into the text alignment prototypes that have been submitted in previous years to the text alignment task. The performance of each text alignment prototype in detecting the plagiarism in your corpus will be measured using macro-averaged precision and recall, granularity, and the plagdet score.
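For orientation, the plagdet score combines the three other quantities into a single number. The sketch below assumes the definition commonly used in the PAN text alignment evaluations: the harmonic mean (F1) of precision and recall, divided by log2(1 + granularity). It takes already computed precision, recall, and granularity as inputs and does not show how those are derived from detections. The function name `plagdet` is chosen for this example.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Combine precision, recall, and granularity into one score.

    Assumed definition: F1(precision, recall) / log2(1 + granularity),
    where granularity >= 1 and a value of 1 means each plagiarism case
    is detected as a single piece.
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)
```

With a granularity of 1 the denominator is 1, so the score equals F1; detectors that report one case in several fragments are penalized through a larger granularity.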

Peer-review: Your corpus will be made available to the other participants of this task and be subject to peer review. Every participant will be given a chance to assess and analyze the corpora of all other participants in order to determine corpus quality.

Task Committee