TY - GEN
T1 - A New Attack Surface
T2 - 19th ACM International Conference on Web Search and Data Mining, WSDM 2026
AU - Ahmed, Md Shoaib
AU - Spezzano, Francesca
N1 - Publisher Copyright:
© 2026 Owner/Author.
PY - 2026/2/21
Y1 - 2026/2/21
N2 - The swift proliferation of false information on social media presents significant risks to public confidence and the stability of democracy, which motivates the development of machine learning and deep learning-based fake news detectors. While these systems can effectively analyze news content and user interactions, they remain vulnerable to adversarial attacks. Prior research has focused mainly on modifying article text or retrieving generic user comments, leaving comment-based attack strategies underexplored. Existing comment-based attacks often rely on generating synthetic text that can be unrealistic or retrieving existing comments without strategic guidance that ignores feature importance. In this work, we introduce a novel attack surface that combines model interpretability with generative language models. Our approach uses SHAP (SHapley Additive exPlanations) to identify influential tokens driving fake or real classifications and prompts a large language model (LLM) to generate contextually credible human-like comments that utilize influential tokens. The generated comments are then appended to the article, evaluated against multiple state-of-the-art detectors (dEFEND, TextCNN, and RoBERTa), and compared against existing comment-based attacks such as MALCOM, CopyCat, and retrieval-based methods. Our XAI-guided LLM-based approach is competitive compared to existing generative and retrieval-based attack methods, with higher attack success rates while maintaining naturalness and contextual relevance.
AB - The swift proliferation of false information on social media presents significant risks to public confidence and the stability of democracy, which motivates the development of machine learning and deep learning-based fake news detectors. While these systems can effectively analyze news content and user interactions, they remain vulnerable to adversarial attacks. Prior research has focused mainly on modifying article text or retrieving generic user comments, leaving comment-based attack strategies underexplored. Existing comment-based attacks often rely on generating synthetic text that can be unrealistic or retrieving existing comments without strategic guidance that ignores feature importance. In this work, we introduce a novel attack surface that combines model interpretability with generative language models. Our approach uses SHAP (SHapley Additive exPlanations) to identify influential tokens driving fake or real classifications and prompts a large language model (LLM) to generate contextually credible human-like comments that utilize influential tokens. The generated comments are then appended to the article, evaluated against multiple state-of-the-art detectors (dEFEND, TextCNN, and RoBERTa), and compared against existing comment-based attacks such as MALCOM, CopyCat, and retrieval-based methods. Our XAI-guided LLM-based approach is competitive compared to existing generative and retrieval-based attack methods, with higher attack success rates while maintaining naturalness and contextual relevance.
KW - adversarial machine learning
KW - computing methodologies
KW - machine learning robustness
KW - robustness in nlp
UR - https://www.scopus.com/pages/publications/105033151367
U2 - 10.1145/3773966.3779393
DO - 10.1145/3773966.3779393
M3 - Conference contribution
SN - 979-8-4007-2292-9
T3 - WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
SP - 1058
EP - 1062
BT - WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
CY - New York, NY
Y2 - 22 February 2026 through 26 February 2026
ER -