Wikipedia Vandalism Detection

Vandalism has always been one of Wikipedia's biggest problems, yet there are only a few automatic countermeasures. Instead, volunteers spend their time reverting vandalism edits, time that is not spent on improving other parts of Wikipedia. The goal of this evaluation campaign is to research and develop new, reliable ways to detect vandalism edits, which can be used to aid Wikipedians.

Vandalism is defined as "any addition, removal, or change of content made in a deliberate attempt to compromise the integrity of Wikipedia". Put another way, a vandalism edit is an edit made with bad intentions.

Solutions to vandalism detection will resemble those of, e.g., spam detection. Hence, applying machine learning to this problem is straightforward, which makes engineering features for an edit model that discriminates vandalism edits from regular edits one of the primary topics. You can use all features imaginable for your edit model, with one exception: you may not look into an edit's future. That is, to classify an edit, you may not analyze succeeding edits on the same article to see what became of it. Such a feature would be unusable in practice.
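As a minimal illustration of what an edit model's features might look like, the following sketch computes a few values from the old and the new revision text only. The function name edit_features and the three features chosen here (size ratio, upper-case ratio, longest character run) are assumptions made for illustration; they are not prescribed by this task. Nothing beyond the edit itself is consulted, so the sketch respects the rule against looking into an edit's future.

```python
from itertools import groupby


def edit_features(old_text: str, new_text: str) -> dict:
    """Compute a few illustrative features for one edit (hypothetical helper).

    Only the old and the new revision are used; no later revision is
    consulted, so the features are available at the time of the edit.
    """
    # Relative size of the new revision compared to the old one
    # (add-one smoothing avoids division by zero for empty revisions).
    size_ratio = (len(new_text) + 1) / (len(old_text) + 1)

    # Share of upper-case letters among all letters of the new revision.
    letters = [c for c in new_text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

    # Length of the longest run of one repeated character, e.g. "!!!!!!".
    longest_run = max((len(list(group)) for _, group in groupby(new_text)), default=0)

    return {
        "size_ratio": size_ratio,
        "upper_ratio": upper_ratio,
        "longest_run": longest_run,
    }


if __name__ == "__main__":
    print(edit_features("Berlin is the capital of Germany.", "BERLIN SUCKS!!!!!!"))
```

A feature vector like this would then be passed to a classifier of your choice; which features actually discriminate well is exactly what the task asks you to investigate.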

Given a set of edits on Wikipedia articles, separate the ill-intentioned edits from the well-intentioned edits.
Training Corpus

To develop your approach, we provide you with a training corpus which comprises a set of edits on Wikipedia articles. All of these edits have been manually annotated as to whether they constitute vandalism or not.



For all edits found in the evaluation corpora, your vandalism detector shall output a file classification.txt as follows:

26864258 27932250 V 0.92
28689695 87188208 R 0.50
85047080 85047157 V 0.67
80637222 91249168 R 0.43
  • The first column is the edit's old revision ID.
  • The second column is the edit's new revision ID.
  • The third column denotes whether the edit is vandalism (V) or regular (R), as determined by your classifier.
  • The fourth column denotes your classifier's confidence. Providing these confidence values is optional.
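The format above could be produced along the following lines. This is a sketch only: the helper names format_prediction and write_classification are hypothetical, and the two-decimal formatting of the confidence value is an assumption taken from the example lines rather than a stated requirement.

```python
from typing import Iterable, Optional, Tuple

# (old revision ID, new revision ID, is_vandalism, optional confidence)
Prediction = Tuple[int, int, bool, Optional[float]]


def format_prediction(old_id: int, new_id: int, is_vandalism: bool,
                      confidence: Optional[float] = None) -> str:
    """Render one edit as a line of classification.txt."""
    label = "V" if is_vandalism else "R"
    fields = [str(old_id), str(new_id), label]
    # The confidence column is optional, so it is only written when given.
    if confidence is not None:
        fields.append(f"{confidence:.2f}")
    return " ".join(fields)


def write_classification(predictions: Iterable[Prediction],
                         path: str = "classification.txt") -> None:
    """Write one line per classified edit to the output file."""
    with open(path, "w", encoding="utf-8") as out:
        for old_id, new_id, is_vandalism, confidence in predictions:
            out.write(format_prediction(old_id, new_id, is_vandalism, confidence) + "\n")
```

For instance, format_prediction(26864258, 27932250, True, 0.92) yields the first example line shown above.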
Performance Measures

Performance is measured using the receiver operating characteristic (ROC). More specifically, we measure the area under the ROC curve (AUC); the algorithm that maximizes the AUC performs best.
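For local experiments on the training corpus, the AUC can be computed from ground-truth labels and classifier scores as sketched below. The function roc_auc is a hypothetical helper, not the official evaluation script; it uses the rank-sum (Mann-Whitney U) formulation of the AUC and assumes that a higher score means the classifier considers the edit more likely to be vandalism.

```python
from typing import Sequence


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels[i] is True for a vandalism edit; scores[i] is assumed to grow
    with the classifier's belief that edit i is vandalism.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one vandalism and one regular edit")

    # Sort edit indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the vandalism class, normalized by the number of
    # (vandalism, regular) pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

The value equals the fraction of (vandalism, regular) edit pairs in which the vandalism edit receives the higher score, with ties counted as one half.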

Test Corpus

Once you have finished tuning your approach to achieve satisfactory performance on the training corpus, you should run your software on the test corpus.

During the competition, the test corpus does not contain ground truth data that reveals whether or not an edit constitutes vandalism. To find out the performance of your software on the test corpus, you must collect its output and submit it as described below.

After the competition, the test corpus is updated to include the ground truth data. This way, you have all the necessary data to evaluate your approach on your own, without submitting its output, while remaining comparable to those who took part in the competition.



To submit your test run for evaluation, we ask you to send a Zip archive containing the output of your software when run on the test corpus to

Should the Zip archive be too large to be sent via mail, please upload it to a file hoster of your choosing and share a download link with us.


The following table lists the performances achieved by the participating teams:

Wikipedia Vandalism Detection Performance

AUC      Participant
0.92236  S.M. Mola Velasco
         Private, Spain
0.90351  B.T. Adler*, L. de Alfaro° and I. Pye^
         *Fujitsu Labs of America, Inc., USA
         °Google, Inc., and University of California Santa Cruz, USA
         ^CloudFlare, Inc., USA
0.89856  S. Javanmardi et al.
         University of California Irvine, USA
0.89377  D. Chichkov
         SC Software Inc., USA
0.87990  L. Seaward
         University of Ottawa, Canada
0.87669  I. Hegedűs*, R. Ormándi*, R. Farkas*, and M. Jelasity*,°
         *University of Szeged, Hungary
         °Hungarian Academy of Sciences, Hungary
0.85875  M. Harpalani, T. Phumprao, M. Bassi, M. Hart, and R. Johnson
         Stony Brook University, USA
0.84340  J. White and R. Maessen
         University of California Irvine, USA
0.65404  A. Iftene
         University of Iasi, Romania

A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.


Related Work

On Wikipedia, there are a number of pages dealing with vandalism. The following pages offer a good introduction as well as many links to other pages about vandalism detection policies and tools:

Research on Wikipedia vandalism is still in its infancy. The following lists all related papers up to now:

  • Si-Chi Chin, W. Nick Street, Padmini Srinivasan, and David Eichmann. Detecting Wikipedia vandalism with active learning and statistical language models. Fourth Workshop on Information Credibility on the Web (WICOW 2010), Raleigh, NC, April 2010.
  • R. Stuart Geiger and David Ribes. The Work of Sustaining Order in Wikipedia: The Banning of a Vandal. In CSCW'10: Proceedings of the ACM Conference on Computer Supported Cooperative Work, pages 117-126, Savannah, Georgia, USA, 2010. ACM.
  • Kelly Y. Itakura and Charles L. A. Clarke. Using Dynamic Markov Compression to Detect Vandalism in the Wikipedia. In SIGIR'09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 822-823, New York, NY, USA, 2009. ACM.
  • Martin Potthast. Crowdsourcing a Wikipedia Vandalism Corpus. In 33rd Annual International ACM SIGIR Conference (to appear), Geneva, July 2010. ACM.
  • Martin Potthast, Benno Stein, and Robert Gerling. Automatic Vandalism Detection in Wikipedia. In ECIR'08: Proceedings of the 30th European Conference on IR Research, Glasgow, volume 4956 LNCS of Lecture Notes in Computer Science, pages 663-668, Berlin Heidelberg New York, 2008. Springer.
  • Reid Priedhorsky, Jilin Chen, Shyong (Tony) K. Lam, Katherine Panciera, Loren Terveen, and John Riedl. Creating, Destroying, and Restoring Value in Wikipedia. In Group'07: Proceedings of the International Conference on Supporting Group Work, Sanibel Island, Florida, USA, 2007.
  • Koen Smets, Bart Goethals, and Brigitte Verdonk. Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach. In WikiAI'08: Proceedings of the Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, pages 43-48. AAAI Press, 2008.
  • Andrew G. West, Sampath Kannan, and Insup Lee. Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata. Technical Report MS-CIS-10-05, University of Pennsylvania, 2010.

Task Chair

Martin Potthast

Bauhaus-Universität Weimar

Task Committee

Teresa Holfeld

Bauhaus-Universität Weimar

Benno Stein

Bauhaus-Universität Weimar