TY - GEN
T1 - Robust Data-centric Graph Structure Learning for Text Classification
AU - Zhuang, Jun
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/5/13
Y1 - 2024/5/13
AB - Over the past decades, text classification has evolved remarkably across diverse domains. Despite these advances, most existing model-centric methods cannot generalize well on class-imbalanced datasets that contain high-similarity textual information. Instead of developing new model architectures, data-centric approaches enhance performance by manipulating the data structure. In this study, we investigate robust data-centric approaches that can help text classification on our collected dataset, the metadata of survey papers about Large Language Models (LLMs). In the experiments, we explore four paradigms and observe that leveraging arXiv’s co-category information on graphs can help classify the text data more robustly than the other three paradigms: conventional machine-learning algorithms, fine-tuning of pre-trained language models, and zero-shot / few-shot classification using LLMs.
KW - Data-centric AI
KW - Graph neural networks
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85194484212&partnerID=8YFLogxK
U2 - 10.1145/3589335.3651915
DO - 10.1145/3589335.3651915
M3 - Conference contribution
AN - SCOPUS:85194484212
T3 - WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
SP - 1486
EP - 1495
BT - WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
T2 - 33rd ACM Web Conference, WWW 2024
Y2 - 13 May 2024 through 17 May 2024
ER -