A New Attack Surface: XAI-guided Adversarial Comment Generation with LLMs to Attack Fake News Detectors

  • Boise State University

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The swift proliferation of false information on social media presents significant risks to public confidence and the stability of democracy, which motivates the development of machine learning and deep learning-based fake news detectors. While these systems can effectively analyze news content and user interactions, they remain vulnerable to adversarial attacks. Prior research has focused mainly on modifying article text or retrieving generic user comments, leaving comment-based attack strategies underexplored. Existing comment-based attacks often either generate synthetic text that can be unrealistic or retrieve existing comments without strategic guidance, ignoring feature importance. In this work, we introduce a novel attack surface that combines model interpretability with generative language models. Our approach uses SHAP (SHapley Additive exPlanations) to identify influential tokens driving fake or real classifications and prompts a large language model (LLM) to generate contextually credible, human-like comments that use these influential tokens. The generated comments are then appended to the article, evaluated against multiple state-of-the-art detectors (dEFEND, TextCNN, and RoBERTa), and compared against existing comment-based attacks such as MALCOM, CopyCat, and retrieval-based methods. Our XAI-guided LLM-based approach is competitive with existing generative and retrieval-based attack methods, achieving higher attack success rates while maintaining naturalness and contextual relevance.
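
The pipeline described in the abstract (attribute tokens, build a prompt around the influential ones, append the generated comment, re-score) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy word weights, the linear `fake_score` detector, the `build_prompt` wording, and the `stub_comment` function that stands in for a real LLM call are all hypothetical, and exact Shapley enumeration over a handful of tokens is used here in place of the SHAP library.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy weights, invented for illustration only.
# Positive pushes toward "fake", negative toward "real".
TOY_WEIGHTS = {"shocking": 1.5, "miracle": 1.2,
               "official": -1.0, "study": -0.8, "confirmed": -1.1}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption of this sketch)."""
    return text.lower().split()


def fake_score(tokens):
    """Toy linear detector: score > 0 means 'fake', score < 0 means 'real'."""
    return sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)


def shapley_values(tokens, score_fn):
    """Exact Shapley value per unique token, using 'token removed' as baseline.

    Enumerates every coalition, so it is only practical for short texts.
    """
    players = sorted(set(tokens))
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                kept = set(subset)
                without_p = [t for t in tokens if t in kept]
                with_p = [t for t in tokens if t in kept or t == p]
                total += weight * (score_fn(with_p) - score_fn(without_p))
        values[p] = total
    return values


def top_real_tokens(values, k=3):
    """Return up to k tokens with the most negative (real-leaning) attribution."""
    ranked = sorted(values.items(), key=lambda kv: kv[1])
    return [t for t, v in ranked[:k] if v < 0]


def build_prompt(article, tokens):
    """Hypothetical prompt template; the paper's actual prompt is not given here."""
    return ("Write a short, natural reader comment on the article below. "
            "Work in these words: " + ", ".join(tokens) + ".\n\n"
            "Article: " + article)


def stub_comment(tokens):
    """Placeholder for an LLM response; a real attack would call a model."""
    return "I keep seeing " + " ".join(tokens) + " mentioned about this story"


if __name__ == "__main__":
    reference = tokenize("official study confirmed the results")
    influential = top_real_tokens(shapley_values(reference, fake_score))
    article = "shocking miracle cure"
    comment = stub_comment(influential)
    print(build_prompt(article, influential))
    print("score before:", fake_score(tokenize(article)))
    print("score after: ", fake_score(tokenize(article + " " + comment)))
```

Because the toy detector is linear, each token's Shapley value reduces to its weight, which makes the sketch easy to check by hand; a neural detector such as those named in the abstract would require an approximate explainer instead of full enumeration.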

Original language: English
Title of host publication: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
Place of Publication: New York, NY
Pages: 1058-1062
Number of pages: 5
ISBN (Electronic): 9798400722929
State: Published - 21 Feb 2026
Event: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026 - Boise, United States
Duration: 22 Feb 2026 to 26 Feb 2026

Publication series

Name: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining

Conference

Conference: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026
Country/Territory: United States
City: Boise
Period: 22/02/26 to 26/02/26

Keywords

  • adversarial machine learning
  • computing methodologies
  • machine learning robustness
  • robustness in nlp
