Research on Visual Transformers Based on Class Queries
JIANG Chunyu, WANG Wei
School of Economics and Management, Jilin Institute of Chemical Technology, Jilin City 132022, China
Abstract: In recent years, the Transformer has gradually become a mainstream architecture in computer vision; its strong representational capacity and high degree of parallelism allow it to match the performance of convolutional neural networks (CNNs). However, applying the attention mechanism to computer vision currently faces two main problems: high computational complexity and the need for large amounts of training data. To address these issues, a class-query-based visual Transformer model (OB_ViT) is proposed. Its innovation lies in two aspects: the introduction of learnable class queries and a loss function based on the Hungarian algorithm. Specifically, learnable class queries are fed to the decoder as input, allowing the model to reason about the relationship between target classes and the global image context. In addition, the Hungarian algorithm enforces unique predictions, ensuring that each class query learns only one target class. Experimental results on the CIFAR-10 and 5-class flower image classification datasets show that, compared with ViT and ResNet50, the OB_ViT model significantly improves classification accuracy while reducing the number of parameters; on CIFAR-10, for example, the parameter count drops by 15% and accuracy improves by 22%.
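The two mechanisms described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch-style example, not the authors' actual OB_ViT code: the module name ClassQueryDecoder, the layer sizes, and the matching cost are all assumptions made for illustration. It shows (1) a set of learnable class queries decoded against the encoder's patch tokens via cross-attention and (2) a Hungarian assignment (via SciPy's linear_sum_assignment) that pairs each query with at most one ground-truth class.

```python
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment


class ClassQueryDecoder(nn.Module):
    """Learnable class queries decoded against ViT patch tokens (illustrative sketch)."""

    def __init__(self, num_classes: int, d_model: int = 256, nhead: int = 8, num_layers: int = 3):
        super().__init__()
        # One learnable query vector per target class.
        self.class_queries = nn.Parameter(torch.randn(num_classes, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, d_model) produced by the ViT encoder.
        b = patch_tokens.size(0)
        queries = self.class_queries.unsqueeze(0).expand(b, -1, -1)
        # Cross-attention lets every class query attend to the whole image.
        decoded = self.decoder(tgt=queries, memory=patch_tokens)
        return self.classifier(decoded)  # (batch, num_queries, num_classes)


def hungarian_match(logits: torch.Tensor, target: torch.Tensor):
    """One-to-one assignment of queries to ground-truth classes for a single image.

    logits: (num_queries, num_classes) predictions; target: (num_targets,) class indices.
    Returns index arrays (query_idx, target_idx) of the minimum-cost matching.
    """
    # Cost of assigning a query to a target = negative predicted probability of that class.
    prob = logits.softmax(-1)
    cost = -prob[:, target]  # (num_queries, num_targets)
    query_idx, target_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return query_idx, target_idx


# Toy usage: 10 classes (e.g., CIFAR-10), a batch of 2 images, 197 patch tokens per image.
model = ClassQueryDecoder(num_classes=10)
tokens = torch.randn(2, 197, 256)
logits = model(tokens)                              # (2, 10, 10)
match = hungarian_match(logits[0], torch.tensor([3]))
```

Under such a scheme, the classification loss (e.g., cross-entropy) would be computed only on the matched (query, class) pairs, which is what pushes each class query to specialize on a single target class.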
Published: 25 March 2024