Detecting Undisclosed Paid Editing in Wikipedia

Nikesh Joshi, Francesca Spezzano, Mayson Green, Elijah Hill

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

8 Scopus citations

Abstract

Wikipedia, the free and open-collaboration-based online encyclopedia, has millions of pages that are maintained by thousands of volunteer editors. As per Wikipedia's fundamental principles, pages on Wikipedia are written with a neutral point of view and maintained by volunteer editors for free, with well-defined guidelines in order to avoid or disclose any conflict of interest. However, there have been several known incidents where editors intentionally violate such guidelines in order to get paid (or even extort money) for maintaining promotional spam articles without disclosing the payment. In this paper, we address for the first time the problem of identifying undisclosed paid articles in Wikipedia. We propose a machine learning-based framework using a set of features based on both the content of the articles and the edit-history patterns of the users who create them. To test our approach, we collected and curated a new dataset from English Wikipedia with ground truth on undisclosed paid articles. Our experimental evaluation shows that we can identify undisclosed paid articles with an AUROC of 0.98 and an average precision of 0.91. Moreover, our approach outperforms ORES, a scoring system tool currently used by Wikipedia to automatically detect damaging content, in identifying undisclosed paid articles. Finally, we show that our user-based features can also detect undisclosed paid editors with an AUROC of 0.94 and an average precision of 0.92, outperforming existing approaches.
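
The abstract reports results as AUROC and average precision. As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed from binary labels and classifier scores. The function names and the toy labels and scores are hypothetical illustrations, not the paper's data, features, or code, and the sketch says nothing about how the reported figures were obtained.

```python
def auroc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive item scores higher.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive
    item, with items ranked by descending score. Ties are broken by input
    order, which is a simplification."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


# Hypothetical toy data: 1 = undisclosed paid article, 0 = other article.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print(round(auroc(toy_labels, toy_scores), 4))
print(round(average_precision(toy_labels, toy_scores), 4))
```

AUROC summarizes how well positives are ranked above negatives overall, while average precision gives more weight to the top of the ranking, which is why the two are often reported together for imbalanced detection tasks.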

Original language: English
Title of host publication: The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
Pages: 2899-2905
Number of pages: 7
ISBN (Electronic): 9781450370233
State: Published - 20 Apr 2020
Event: 29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan, Province of China
Duration: 20 Apr 2020 - 24 Apr 2020

Publication series

Name: The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020

Conference

Conference: 29th International World Wide Web Conference, WWW 2020
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 20/04/20 - 24/04/20

Keywords

  • Detection of abusive content
  • Malicious editors
  • Sockpuppet accounts
  • Wikipedia
