Wikipedia Quality Flaw Prediction 2012
Synopsis
- Task: Given a Wikipedia article, which quality flaws does it exhibit?
- Input: [data]
Introduction
In previous years, we have addressed quality issues in Wikipedia in the form of vandalism detection. However, the majority of quality flaws are not caused by malicious intent but stem from edits by inexperienced authors; examples include poor writing style, unreferenced statements, or a lack of neutrality. This year, we generalize the vandalism detection task and focus on the prediction of quality flaws in Wikipedia articles.
We cast quality flaw prediction in Wikipedia as a one-class classification problem (as proposed by Anderka, Stein, and Lipka in the SIGIR '12 paper listed under Related Work). The key characteristic of this problem is that there is no representative "negative" training data (articles that are explicitly tagged as not containing a particular flaw), which makes common discrimination-based classification techniques, such as binary or multiclass classification, inapplicable.
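As a minimal, illustrative sketch of this one-class setup, the following Python snippet trains a classifier on flawed articles only and then classifies untagged articles. The use of scikit-learn's OneClassSVM and the random feature matrices are assumptions for illustration, not the approach evaluated in the task.

```python
# Minimal one-class sketch (illustration only, not an official baseline):
# train only on articles that are known to contain a flaw, then decide for
# untagged articles whether they exhibit the same flaw.
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical feature matrices: one row per article, one column per feature
# (e.g., number of references, article length, link counts).
rng = np.random.default_rng(0)
X_flawed = rng.random((500, 20))   # articles tagged with the flaw (training data)
X_unseen = rng.random((100, 20))   # untagged articles to be classified

clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
clf.fit(X_flawed)                  # learn a model of the "flawed" class only

# OneClassSVM returns +1 for inliers (similar to the training class) and -1
# for outliers; map this to the 1/0 decision required by the task.
decisions = (clf.predict(X_unseen) == 1).astype(int)
print(decisions[:10])
```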
The task targets the ten most frequent quality flaws of English Wikipedia articles, which are listed in the following table. You can tackle each flaw individually, but you must predict all ten flaws. The prediction performance is evaluated individually for each flaw, and the results are averaged to a final score.
Flaw name | Description |
---|---|
Unreferenced | The article does not cite any references or sources. |
Orphan | The article has fewer than three incoming links. |
Refimprove | The article needs additional citations for verification. |
Empty section | The article has at least one section that is empty. |
Notability | The article does not meet the general notability guideline. |
No footnotes | The article’s sources remain unclear because it lacks inline citations. |
Primary sources | The article relies on references to primary sources. |
Wikify | The article needs to be wikified (internal links and layout). |
Advert | The article is written like an advertisement. |
Original research | The article contains original research. |
Background. Wikipedia users who encounter a flaw can tag the article with the respective cleanup tag. The existing cleanup tags correspond to the set of quality flaws that have been identified so far by Wikipedia users, and the tagged articles provide a source of human-labeled data (see the papers listed under Related Work). Hence, each of the ten flaws is defined by the respective cleanup tag.
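For illustration, the following hedged sketch shows how cleanup tags appear in an article's wikitext and how tagged articles could be identified as labeled training examples. The template names and regular expressions are simplifications (real tags have redirects and variant spellings), and this concerns only how labels arise, not features for prediction (see the remark below).

```python
# Illustrative sketch: cleanup tags are templates embedded in the article's
# wikitext, e.g. {{Unreferenced|date=March 2012}} or {{Orphan}}. Articles
# carrying such a tag can serve as human-labeled "flawed" training examples.
import re

CLEANUP_TAGS = {
    "Unreferenced": r"\{\{\s*unreferenced\b",
    "Orphan":       r"\{\{\s*orphan\b",
    "Refimprove":   r"\{\{\s*refimprove\b",
    "Advert":       r"\{\{\s*advert\b",
}

def tagged_flaws(wikitext):
    """Return the names of all flaws whose cleanup tag occurs in the wikitext."""
    return {name for name, pattern in CLEANUP_TAGS.items()
            if re.search(pattern, wikitext, flags=re.IGNORECASE)}

sample = "{{Unreferenced|date=March 2012}}\n'''Example article''' is a stub."
print(tagged_flaws(sample))   # {'Unreferenced'}
```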
Remark. Since quality flaw prediction in Wikipedia is a one-class problem, the engineering of features that discriminate articles containing a certain flaw from all other articles is one of the primary challenges. You may use any features and any source of information (e.g., the articles' revision history, Wikipedia's link graph, and also external sources), with one exception: you must not use any information concerning the cleanup tags that define the flaws. That is, to predict whether an article suffers from a certain flaw, you must neither analyze whether the article is tagged with the respective cleanup tag nor whether it is a member of a respective cleanup category. Such features would be unusable in practice.
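A minimal sketch of such features, assuming they are computed from the raw wikitext alone; the feature choices below are illustrative, not a recommended or evaluated feature set.

```python
# Hedged sketch of flaw-indicative features computed from the article source
# alone, without consulting cleanup tags or cleanup categories.
import re

def extract_features(wikitext):
    words = re.findall(r"\w+", wikitext)
    return {
        "length_words":   len(words),
        "num_references": len(re.findall(r"<ref[\s>]", wikitext)),    # inline citations
        "num_int_links":  wikitext.count("[["),                       # internal wiki links
        "num_ext_links":  len(re.findall(r"\[https?://", wikitext)),  # external links
        "num_sections":   len(re.findall(r"^==+[^=].*?==+\s*$", wikitext, re.MULTILINE)),
        "has_infobox":    int("{{infobox" in wikitext.lower()),
    }

print(extract_features("== History ==\nSome text.<ref>A source</ref> [[Internal link]]"))
```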
Task
Given a set of Wikipedia articles that are tagged with a particular quality flaw, decide whether an untagged article suffers from this flaw.
Award
We are happy to announce the following overall winner of the 1st International Competition on Quality Flaw Prediction in Wikipedia, who will be awarded 200 Euro sponsored by Wikimedia Deutschland:
- Edgardo Ferretti, Donato Hernández Fusilier, Rafael Guzmán Cabrera, Manuel Montes-y-Gómez, Marcelo Errecalde, and Paolo Rosso.
Congratulations!
Input
To develop your approach, we provide you with a training corpus which comprises a set of Wikipedia articles for a number of quality flaws.
Output
For each set of articles per quality flaw, your quality flaw prediction software shall output a file formatted as follows:
PAGEID C FEATUREVAL_1 FEATUREVAL_2 FEATUREVAL_3 ... FEATUREVAL_n
279320 1 0.8647264878 5.0548156462 0.2854089458 ... 0.0000000584
871808 0 0.6442019751 3.4979755645 0.1203675764 ... 0.0505761605
850457 1 0.5054090519 9.3060661202 0.0550005005 ... 0.9190851616
912468 0 0.9644561645 1.5059164514 0.4696140241 ... 0.0000000031
...
- The column PAGEID denotes the pageid of the test article.
- The column C denotes the decision of your classifier. This value should be either 1 or 0, where 1 means the article contains the flaw.
- The columns FEATUREVAL_1 to FEATUREVAL_n denote the n feature values of your document model for the given article. The feature values need not be normalized and should be given unrounded. If you have non-numeric features, include them as well. Please send along a short description of each feature, and make sure that all entries in a column are based on the same feature implementation. A minimal sketch for writing such a file is given below this list.
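The following sketch produces one prediction file per flaw in the format shown above. The function name, output file name, and example rows are hypothetical; page ids, decisions, and feature values come from your own classifier and document model.

```python
# Minimal sketch for writing a prediction file in the required format.
def write_predictions(path, rows):
    """rows: iterable of (pageid, decision, feature_values) tuples."""
    with open(path, "w") as f:
        for pageid, decision, features in rows:
            # Write the raw (unrounded) feature values, space-separated.
            values = " ".join(str(v) for v in features)
            f.write(f"{pageid} {decision} {values}\n")

write_predictions("unreferenced.txt", [
    (279320, 1, [0.8647264878, 5.0548156462, 0.2854089458]),
    (871808, 0, [0.6442019751, 3.4979755645, 0.1203675764]),
])
```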
Evaluation
The prediction performance will be judged by precision, recall, and F-measure, each computed per flaw and averaged over all ten quality flaws.
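The following sketch illustrates this averaging, assuming per-flaw gold and predicted labels are available; the official evaluation script may differ in details.

```python
# Per-flaw precision, recall, and F-measure, macro-averaged over the flaws.
def prf(gold, pred):
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_average(per_flaw):
    """per_flaw: dict mapping flaw name -> (gold labels, predicted labels)."""
    scores = [prf(gold, pred) for gold, pred in per_flaw.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

print(macro_average({
    "Unreferenced": ([1, 1, 0, 1], [1, 0, 0, 1]),
    "Orphan":       ([1, 0, 1, 0], [1, 0, 0, 0]),
}))
```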
Results
Wikipedia quality flaw prediction performance

Precision | Recall | F-measure | Participant |
---|---|---|---|
0.735400 | 0.917097 | 0.814529 | Edgardo Ferretti*, Donato Hernández Fusilier°, Rafael Guzmán Cabrera°, Manuel Montes-y-Gómez^, Marcelo Errecalde*, and Paolo Rosso†; * Universidad Nacional de San Luis, Argentina; ° Universidad de Guanajuato, Mexico; ^ Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Mexico; † Universidad Politécnica de Valencia, Spain |
0.753213 | 0.852926 | 0.798336 | Oliver Ferschke, Iryna Gurevych, and Marc Rittberger; Technische Universität Darmstadt, Germany |
0.043209 | 0.579241 | 0.079449 | Ionut Cristian Pistol and Adrian Iftene; "Alexandru Ioan Cuza" University of Iasi, Romania |
A more detailed analysis of the detection performances can be found in the overview paper accompanying this task.
Sponsor
Wikimedia Deutschland
Related Work
- Get additional Wikipedia data: MediaWiki API, Wikimedia database dumps, Wikimedia Toolserver
- A list of MediaWiki parsers and converters.
- Maik Anderka and Benno Stein. A Breakdown of Quality Flaws in Wikipedia. In Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality '12), pages 11-18, 2012.
- Maik Anderka, Benno Stein, and Nedim Lipka. Predicting Quality Flaws in User-generated Content: The Case of Wikipedia. In Proceedings of the 35th International ACM Conference on Research and Development in Information Retrieval (SIGIR '12), 2012.