Leveraging Machine Learning for Automatically Classifying Fake News in the COVID-19 Outbreak

Brian P. Daley, Francesca Spezzano

Research output: Contribution to conference › Presentation

Abstract

Fake news spreads disinformation and is a plague on modern journalism and the media. Because it poisons the reliability of sources, detecting the accuracy of news is necessary. In this research, we use machine learning to automatically classify the validity of COVID-19-related news and to find the headline features that matter most in determining accuracy. We used a dataset crawled from Politifact.com between March and June 2020 that contained 299 fake news items and 100 truthful news items, as determined by the website's fact-checkers. We extracted different features from the news headlines, including features from the Linguistic Inquiry and Word Count (LIWC) engine, for use in different machine learning models from the scikit-learn API. The model with the highest average precision was the Decision Tree Classifier, achieving 79% on five-fold cross-validation. The top features used by the classification model included the number of motion words, the number of relativity words, the number of prepositions in the headline, the authenticity of the tone in the headline, and the word count. Fake news outlets commonly try to put more description in their headlines to convince users that a headline is true, which explains the increase in prepositions, motion words, relativity words, and overall word count in fake news.
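As a rough illustration of the kind of headline features described above, the following minimal Python sketch counts words, prepositions, motion words, and relativity words in a headline, and assigns samples to five class-balanced folds. It is not the authors' pipeline: the word lists are small hypothetical stand-ins for the Linguistic Inquiry and Word Count (LIWC) dictionaries, the function names are invented for this example, the round-robin fold assignment is only one simple way to stratify, and the tone-authenticity feature is not reproduced.

```python
import re
from collections import Counter

# Hypothetical stand-in word lists (assumption): the real LIWC dictionaries
# are proprietary and far larger than these illustrative samples.
PREPOSITIONS = {"in", "on", "at", "to", "from", "with", "by", "of", "for", "into", "over"}
MOTION_WORDS = {"go", "goes", "went", "move", "moves", "arrive", "arrives",
                "spread", "spreads", "come", "comes"}
RELATIVITY_WORDS = {"in", "on", "at", "into", "over", "before", "after",
                    "during", "near", "until", "now"}


def tokenize(headline):
    """Lowercase the headline and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", headline.lower())


def headline_features(headline):
    """Return simple count-based features for a single headline."""
    tokens = tokenize(headline)
    return {
        "word_count": len(tokens),
        "prepositions": sum(t in PREPOSITIONS for t in tokens),
        "motion": sum(t in MOTION_WORDS for t in tokens),
        "relativity": sum(t in RELATIVITY_WORDS for t in tokens),
    }


def stratified_folds(labels, k=5):
    """Give each sample a fold id in 0..k-1, dealing each class round-robin
    so that every fold keeps roughly the overall class ratio."""
    folds = []
    seen = Counter()  # how many samples of each class have been assigned
    for label in labels:
        folds.append(seen[label] % k)
        seen[label] += 1
    return folds


features = headline_features("Virus spreads from bats to humans in markets")
# Label layout mirrors the dataset sizes in the abstract: 299 fake, 100 truthful.
fold_sizes = Counter(stratified_folds([1] * 299 + [0] * 100, k=5))
```

Feature dictionaries like these could then be turned into numeric rows and passed to a scikit-learn estimator such as `DecisionTreeClassifier`, scored with `cross_val_score(..., scoring="average_precision")`. With roughly three fake items for every truthful one, keeping the folds class-balanced matters, since an unstratified split could leave a fold with very few truthful headlines.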

Original language: American English
State: Published - 12 Jul 2020
