Wikipedia Vandalism Detection 2011

Synopsis

  • Task: Given a set of edits on Wikipedia articles, separate the ill-intentioned edits from the well-intentioned edits.
  • Input: [data (de, es)] [data (en)]

Task

The definition of vandalism at Wikipedia includes "any addition, removal, or change of content made in a deliberate attempt to compromise the integrity of Wikipedia." Hence, Wikipedia vandalism detection comprises the following classification task:

Given a set of edits on Wikipedia articles, the task is to separate the ill-intentioned edits from the well-intentioned edits.

Input

To develop your approach, we provide you with a training corpus that comprises a set of edits on Wikipedia articles. All of these edits have been manually annotated as to whether or not they constitute vandalism. Learn more »

Output

For all edits found in the evaluation corpora, your vandalism detector shall output a file classification.txt as follows:

OLDREVID NEWREVID C CONF FEATUREVAL1 FEATUREVAL2 FEATUREVAL3 ... FEATUREVALn
26864258 27932250 V 0.92 0.864726878 5.054816462 0.285489458 ... 0.000000584
28689695 87188208 R 0.50 0.642019751 3.499755645 0.123675764 ... 0.050561605
85047080 85047157 V 0.67 0.505090519 9.306061202 0.055005005 ... 0.919051616
80637222 91249168 R 0.43 0.964561645 1.505164514 0.469614241 ... 0.000000031
...
  • The column OLDREVID is the edit's old revision ID.
  • The column NEWREVID is the edit's new revision ID.
  • The column C denotes the edit's class according to your classifier:
    V denotes vandalism edits and R denotes regular edits.
  • The column CONF denotes your classifier's confidence. If your classifier does not return confidence values, simply use 0 and 1 according to the classifier's output.
  • The columns FEATUREVAL1 to FEATUREVALn denote the n feature values your classifier has computed for the given edit. The feature values need not be normalized, and they should be unrounded. Non-numeric features should be included as well. Please send along a short description of each feature, and make sure that all entries in column FEATUREVALi are based on the same feature implementation.
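The format above can be produced with a few lines of code. The following sketch is only an illustration: the edits, classes, confidences, and feature values are hypothetical placeholders taken from the example rows, not output of a real classifier.

```python
# Sketch: writing classification.txt in the required format.
# All values below are hypothetical placeholders.

edits = [
    # (old_revid, new_revid, predicted_class, confidence, feature_values)
    (26864258, 27932250, "V", 0.92, [0.864726878, 5.054816462, 0.285489458]),
    (28689695, 87188208, "R", 0.50, [0.642019751, 3.499755645, 0.123675764]),
]

with open("classification.txt", "w") as f:
    for old_id, new_id, cls, conf, features in edits:
        # Feature values are written unrounded (9 decimal places here
        # only to match the example layout).
        feature_str = " ".join(f"{v:.9f}" for v in features)
        f.write(f"{old_id} {new_id} {cls} {conf:.2f} {feature_str}\n")
```

One line is written per edit, with columns separated by single spaces.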

Evaluation

The performance of your vandalism detector will be measured based on the area under its precision-recall-curve (PR-AUC). Details about the measures can be found in this paper (Section 1.2).
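To make the measure concrete, here is a minimal sketch of a step-wise PR-AUC computation (average-precision style, with vandalism as the positive class). It is not the official evaluation script; the labels and scores are made-up examples, and tie handling is simplified.

```python
# Sketch: area under the precision-recall curve (PR-AUC) for
# vandalism confidence scores. "V" is treated as the positive class.

def pr_auc(labels, scores):
    """Step-wise PR-AUC: rank edits by descending confidence and
    accumulate precision weighted by the recall gained at each step."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(1 for label in labels if label == "V")
    tp = fp = 0
    area = prev_recall = 0.0
    for _score, label in ranked:
        if label == "V":
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area

# Hypothetical gold labels and classifier confidences:
labels = ["V", "R", "V", "R", "R"]
scores = [0.92, 0.50, 0.67, 0.43, 0.10]
print(pr_auc(labels, scores))  # a perfect ranking yields 1.0
```

Since both vandalism edits outrank all regular edits in this toy example, the curve stays at precision 1.0 and the area is 1.0; interleaving the classes lowers the score.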

Results

The following table lists the performances achieved by the participating teams:

English Wikipedia vandalism detection performance

  PR-AUC   Participant
  0.82230  A.G. West and I. Lee, University of Pennsylvania, USA
  0.42464  C.-A. Drăguşanu, M. Cufliuc, and A. Iftene, AL.I.Cuza University, Romania

German Wikipedia vandalism detection performance

  PR-AUC   Participant
  0.70591  A.G. West and I. Lee, University of Pennsylvania, USA
  0.18978  F.G. Aksit, Maastricht University, Netherlands

Spanish Wikipedia vandalism detection performance

  PR-AUC   Participant
  0.48938  A.G. West and I. Lee, University of Pennsylvania, USA
  0.22077  F.G. Aksit, Maastricht University, Netherlands

A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.

Task Committee