Abstract
The goal of multi-label image classification is to predict a set of labels for a single image. Recent work has shown that explicitly modeling the co-occurrence relationships between classes is critical for achieving good performance on this task. State-of-the-art approaches model these relationships using graph convolutional networks, which are complex and computationally expensive. We propose a novel, efficient association module as an alternative and couple it with a transformer-based feature-extraction backbone. The proposed model was evaluated on two standard datasets, MS-COCO and PASCAL VOC. The results show that it outperforms several strong baseline models.
| Original language | English |
|---|---|
| State | Published - 2022 |
| Event | 33rd British Machine Vision Conference Proceedings, BMVC 2022 - London, United Kingdom |
| Duration | 21 Nov 2022 → 24 Nov 2022 |
Conference
| Conference | 33rd British Machine Vision Conference Proceedings, BMVC 2022 |
|---|---|
| Country/Territory | United Kingdom |
| City | London |
| Period | 21/11/22 → 24/11/22 |