Tiny Language Models Enriched with Multimodal Knowledge from Multiplex Networks

Clayton Fields, Osama Natouf, Andrew McMains, Catherine Henry, Casey Kennington

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Large transformer language models trained exclusively on massive quantities of text are now the standard in NLP. Beyond the impractical amounts of data used to train them, they require enormous computational resources. They also lack the rich array of sensory information available to humans, who learn language from far less linguistic exposure. In this study, performed for submission to the BabyLM challenge, we show that we can improve a tiny transformer model's data efficiency by swapping its learned word embeddings with vectors extracted from a custom multiplex network that encodes visual and sensorimotor information. We use a custom variation of the ELECTRA model that contains fewer than 7 million parameters and can be trained end-to-end on a single GPU. Our experiments show that models using these embeddings, when pretrained with only the 10-million-word BabyLM dataset, outperform equivalent models on a variety of natural language understanding tasks from the GLUE and SuperGLUE benchmarks and on a variation of the BLiMP task.
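For illustration only, the sketch below shows one way the embedding-swap idea described in the abstract could look in code. This is not the authors' implementation: it assumes the multiplex-network vectors have already been extracted into a word-to-vector mapping, and the function and variable names (swap_embeddings, multiplex_vectors) are hypothetical.

```python
import torch
import torch.nn as nn

def swap_embeddings(embedding: nn.Embedding,
                    vocab: dict,
                    multiplex_vectors: dict,
                    freeze: bool = True) -> None:
    """Overwrite rows of a tiny transformer's word-embedding matrix with
    vectors extracted from a multimodal multiplex network. Words without a
    multiplex vector keep their original (learned or random) embedding."""
    with torch.no_grad():
        for word, idx in vocab.items():
            vec = multiplex_vectors.get(word)
            if vec is not None and vec.numel() == embedding.embedding_dim:
                embedding.weight[idx] = vec.to(embedding.weight.dtype)
    if freeze:
        # Keep the injected multimodal vectors fixed during pretraining.
        embedding.weight.requires_grad = False

# Toy usage: a ~7M-parameter ELECTRA-style model would expose its input
# embedding as an nn.Embedding; here we use a standalone layer with made-up
# dimensions and vectors purely to show the mechanics.
vocab = {"dog": 0, "run": 1, "blue": 2}
emb = nn.Embedding(num_embeddings=3, embedding_dim=4)
multiplex_vectors = {"dog": torch.tensor([0.1, 0.2, 0.3, 0.4])}
swap_embeddings(emb, vocab, multiplex_vectors)
```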

Original language: English
Title of host publication: CoNLL 2023 - BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, Proceedings
Editors: Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjabe, Adina Williams, Tal Linzen, Ryan Cotterell
Pages: 47-57
Number of pages: 11
ISBN (Electronic): 9781952148026
State: Published - 2023
Event: BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, CoNLL 2023 - Singapore, Singapore
Duration: 6 Dec 2023 - 7 Dec 2023

Publication series

Name: CoNLL 2023 - BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, CoNLL 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 - 7/12/23
