TY - JOUR
T1 - Detecting Pages to Protect in Wikipedia Across Multiple Languages
AU - Spezzano, Francesca
AU - Suyehira, Kelsey
AU - Gundala, Laxmi Amulya
N1 - Publisher Copyright:
© 2019, Springer-Verlag GmbH Austria, part of Springer Nature.
PY - 2019/12
Y1 - 2019/12
N2 - Wikipedia is based on the idea that anyone can make edits to the website to create reliable and crowd-sourced content. Yet with the cover of internet anonymity, some users make changes to the website that do not align with Wikipedia’s intended uses. For this reason, Wikipedia allows for some pages of the website to become protected, where only certain users can make revisions to the page. This allows administrators to protect pages from vandalism, libel, and edit wars. However, with over five million pages on English Wikipedia, it is impossible for active editors to monitor all pages to suggest articles in need of protection. In this paper, we consider the problem of deciding whether a page should be protected or not in a collaborative environment such as Wikipedia. We formulate the problem as a binary classification task and propose a novel set of features to decide which pages to protect based on (1) users page revision behavior and (2) page categories. We tested our system, called DePP, on four different Wikipedia language versions: English, German, French, and Italian. Experimental results show that DePP reaches at least 0.93 in both AUROC and average precision across the four languages and significantly outperforms baselines. Moreover, DePP works well in a more realistic, unbalanced setting, that is, when unprotected pages are greatly outnumbered by protected pages, by achieving a good AUROC, a high recall and an average precision significantly higher than the baselines in all the settings and languages considered.
AB - Wikipedia is based on the idea that anyone can make edits to the website to create reliable and crowd-sourced content. Yet with the cover of internet anonymity, some users make changes to the website that do not align with Wikipedia’s intended uses. For this reason, Wikipedia allows for some pages of the website to become protected, where only certain users can make revisions to the page. This allows administrators to protect pages from vandalism, libel, and edit wars. However, with over five million pages on English Wikipedia, it is impossible for active editors to monitor all pages to suggest articles in need of protection. In this paper, we consider the problem of deciding whether a page should be protected or not in a collaborative environment such as Wikipedia. We formulate the problem as a binary classification task and propose a novel set of features to decide which pages to protect based on (1) users page revision behavior and (2) page categories. We tested our system, called DePP, on four different Wikipedia language versions: English, German, French, and Italian. Experimental results show that DePP reaches at least 0.93 in both AUROC and average precision across the four languages and significantly outperforms baselines. Moreover, DePP works well in a more realistic, unbalanced setting, that is, when unprotected pages are greatly outnumbered by protected pages, by achieving a good AUROC, a high recall and an average precision significantly higher than the baselines in all the settings and languages considered.
KW - Misinformation
KW - Page protection
KW - Semi-automated detection
KW - Wikis and open collaboration
UR - http://www.scopus.com/inward/record.url?scp=85063158204&partnerID=8YFLogxK
UR - https://scholarworks.boisestate.edu/cs_facpubs/217
U2 - 10.1007/s13278-019-0555-0
DO - 10.1007/s13278-019-0555-0
M3 - Article
SN - 1869-5450
VL - 9
JO - Social Network Analysis and Mining
JF - Social Network Analysis and Mining
IS - 1
M1 - 10
ER -