Wikipedia Vandalism Detection

The definition of vandalism at Wikipedia includes "any addition, removal, or change of content made in a deliberate attempt to compromise the integrity of Wikipedia." Hence, Wikipedia vandalism detection comprises the following classification task:

Given a set of edits on Wikipedia articles, the task is to separate the ill-intentioned edits from the well-intentioned edits.
Training Corpus

To develop your approach, we provide you with a training corpus which comprises a set of edits on Wikipedia articles. All of these edits have been manually annotated as to whether they constitute vandalism or not.

Learn more » Download corpus (en) Download corpus (de, es)


For all edits found in the evaluation corpora, your vandalism detector shall output a file classification.txt as follows:

26864258 27932250 V 0.92 0.864726878 5.054816462 0.285489458 ... 0.000000584
28689695 87188208 R 0.50 0.642019751 3.499755645 0.123675764 ... 0.050561605
85047080 85047157 V 0.67 0.505090519 9.306061202 0.055005005 ... 0.919051616
80637222 91249168 R 0.43 0.964561645 1.505164514 0.469614241 ... 0.000000031
  • The column OLDREVID is the edit's old revision ID.
  • The column NEWREVID is the edit's new revision ID.
  • The column C denotes the edit's class according to your classifier.
    V denotes vandalism edits and R denotes regular edits.
  • The column CONF denotes your classifier's confidence. If your classifier does not return confidence values, simply use 0 and 1 according to the classifier's output.
  • The columns FEATUREVAL1 to FEATUREVALn denote the n feature values your classifier has computed for the given edit. The feature values need not be normalized, and they should consist of unrounded values. If you have non-numeric features, include them as well. Please send along a short description of each feature, and make sure that all entries in column FEATUREVALi are based on the same feature implementation.
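The row layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `format_row` and `write_classification` are hypothetical, and the use of single spaces as the column separator is inferred from the sample rows rather than specified by the task.

```python
def format_row(old_rev_id, new_rev_id, is_vandalism, confidence, feature_values):
    """Build one row: OLDREVID NEWREVID C CONF FEATUREVAL1 ... FEATUREVALn.

    Assumption: columns are separated by single spaces, as the sample suggests.
    Non-numeric feature values must not contain whitespace themselves.
    """
    label = "V" if is_vandalism else "R"
    # repr() keeps the full precision of floats, since values should be unrounded.
    features = " ".join(
        repr(value) if isinstance(value, float) else str(value)
        for value in feature_values
    )
    return f"{old_rev_id} {new_rev_id} {label} {confidence} {features}".rstrip()


def write_classification(path, rows):
    """Write an iterable of (old, new, is_vandalism, confidence, features) tuples."""
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(format_row(*row) + "\n")
```

A classifier without confidence values would pass 0 or 1 as `confidence`, for example `format_row(1, 2, False, 0, [])`.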
Performance Measures

The performance of your vandalism detector will be measured based on the area under its precision-recall curve (PR-AUC). Details about the measure can be found in this paper (Section 1.2).
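To get a rough feeling for this measure during development, one option is average precision, a common step-wise approximation of the area under the precision-recall curve. The sketch below is not the official evaluation procedure, which may integrate the curve differently. It assumes that higher scores mean "more likely vandalism"; the function name `average_precision` is hypothetical.

```python
def average_precision(labels, scores):
    """Approximate PR-AUC by average precision.

    labels: iterable of booleans, True for vandalism edits.
    scores: iterable of floats, higher meaning more likely vandalism (assumption).
    Ties between scores are broken by input order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, is_vandalism in ranked if is_vandalism)
    if total_positives == 0:
        raise ValueError("at least one vandalism edit is required")

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_vandalism) in enumerate(ranked, start=1):
        if is_vandalism:
            hits += 1
            # Precision at the rank where this vandalism edit is retrieved.
            precision_sum += hits / rank
    return precision_sum / total_positives
```

A detector that ranks every vandalism edit above every regular edit obtains a value of 1.0 under this approximation.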

Test Corpus

Once you have finished tuning your approach to achieve satisfactory performance on the training corpus, you should run your software on the test corpus.

During the competition, the test corpus does not contain ground truth data that reveals whether or not an edit constitutes vandalism. To find out the performance of your software on the test corpus, you must collect its output and submit it as described below.

After the competition, the test corpus is updated to include the ground truth data. This way, you have all the necessary data to evaluate your approach on your own, without submitting its output, while remaining comparable to those who took part in the competition.
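Evaluating on your own starts with reading your detector's output back in. The following is a minimal sketch of a reader for one row of classification.txt, assuming whitespace-separated columns; the names `Prediction` and `parse_row` are hypothetical, and the format of the ground truth data is not covered here.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    old_rev_id: int
    new_rev_id: int
    label: str           # "V" for vandalism, "R" for regular
    confidence: float
    features: List[str]  # kept as strings, since features may be non-numeric


def parse_row(line):
    """Parse one row: OLDREVID NEWREVID C CONF FEATUREVAL1 ... FEATUREVALn."""
    fields = line.split()  # assumption: columns are separated by whitespace
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}")
    old_rev_id, new_rev_id, label, confidence = fields[:4]
    if label not in ("V", "R"):
        raise ValueError(f"unknown class label: {label!r}")
    return Prediction(int(old_rev_id), int(new_rev_id), label,
                      float(confidence), fields[4:])
```

The parsed edits can then be matched against the ground truth by their pair of revision IDs.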

Download corpus


To submit your test run for evaluation, we ask you to send a Zip archive containing the output of your software when run on the test corpus to

Should the Zip archive be too large to be sent via mail, please upload it to a file hoster of your choosing and share a download link with us.


The following table lists the performances (PR-AUC) achieved by the participating teams:

English Wikipedia vandalism detection performance
PR-AUC   Participant                                  Affiliation
0.82230  A.G. West and I. Lee                         University of Pennsylvania, USA
0.42464  C.-A. Drăguşanu, M. Cufliuc, and A. Iftene   AL.I.Cuza University, Romania

German Wikipedia vandalism detection performance
PR-AUC   Participant                                  Affiliation
0.70591  A.G. West and I. Lee                         University of Pennsylvania, USA
0.18978  F.G. Aksit                                   Maastricht University, Netherlands

Spanish Wikipedia vandalism detection performance
PR-AUC   Participant                                  Affiliation
0.48938  A.G. West and I. Lee                         University of Pennsylvania, USA
0.22077  F.G. Aksit                                   Maastricht University, Netherlands

A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.

Related Work

For an overview of approaches to Wikipedia vandalism detection, we would like to refer you to the proceedings of the past vandalism detection competition: PAN @ CLEF'10 (overview paper).

The current best performing vandalism detector combines the two top vandalism detectors of last year as well as a third approach:

There are a couple of free software tools that tackle vandalism detection:

  • WikiTrust --- A reputation system for Wikipedia authors and content [web, source, API]
  • Spatio Temporal Processing on Wikipedia (Stiki) [web, source, API]
  • ClueBot NG [web]

Task Chair

Martin Potthast

Bauhaus-Universität Weimar

Task Committee

Teresa Holfeld

Bauhaus-Universität Weimar

Benno Stein

Bauhaus-Universität Weimar