Research on Visual Transformers Based on Class Queries
JIANG Chunyu, WANG Wei
School of Economics and Management, Jilin Institute of Chemical Technology, Jilin City 132022, China
Abstract: In recent years, the Transformer has gradually become a mainstream architecture in computer vision; its strong representational capacity and high degree of parallelism allow it to match the performance of convolutional neural networks (CNNs). However, applying the attention mechanism to computer vision currently faces two main problems: high computational complexity and the need for large amounts of training data. To address these issues, a class-query-based visual Transformer model (OB_ViT) is proposed. Its innovation lies in two aspects: the introduction of learnable class queries and a loss function based on the Hungarian algorithm. Specifically, learnable class queries are fed to the decoder as input, allowing the model to reason about the relationship between target classes and the global image context. In addition, the Hungarian algorithm enforces unique predictions, ensuring that each class query learns only one target class. Experimental results on the CIFAR-10 and 5-class flower image classification datasets show that, compared with ViT and ResNet50, the OB_ViT model significantly improves classification accuracy while reducing the number of parameters; on CIFAR-10, for example, the parameter count drops by 15% and accuracy improves by 22%.
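The two mechanisms described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch-style example, not the authors' actual OB_ViT code: the module name ClassQueryDecoder, the layer sizes, and the matching cost are all assumptions made for illustration. It shows (1) a set of learnable class queries decoded against the encoder's patch tokens via cross-attention and (2) a Hungarian assignment (via SciPy's linear_sum_assignment) that pairs each query with at most one ground-truth class.

```python
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment


class ClassQueryDecoder(nn.Module):
    """Learnable class queries decoded against ViT patch tokens (illustrative sketch)."""

    def __init__(self, num_classes: int, d_model: int = 256, nhead: int = 8, num_layers: int = 3):
        super().__init__()
        # One learnable query vector per target class.
        self.class_queries = nn.Parameter(torch.randn(num_classes, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, d_model) produced by the ViT encoder.
        b = patch_tokens.size(0)
        queries = self.class_queries.unsqueeze(0).expand(b, -1, -1)
        # Cross-attention lets every class query attend to the whole image.
        decoded = self.decoder(tgt=queries, memory=patch_tokens)
        return self.classifier(decoded)  # (batch, num_queries, num_classes)


def hungarian_match(logits: torch.Tensor, target: torch.Tensor):
    """One-to-one assignment of queries to ground-truth classes for a single image.

    logits: (num_queries, num_classes) predictions; target: (num_targets,) class indices.
    Returns index arrays (query_idx, target_idx) of the minimum-cost matching.
    """
    # Cost of assigning a query to a target = negative predicted probability of that class.
    prob = logits.softmax(-1)
    cost = -prob[:, target]  # (num_queries, num_targets)
    query_idx, target_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return query_idx, target_idx


# Toy usage: 10 classes (e.g., CIFAR-10), a batch of 2 images, 197 patch tokens per image.
model = ClassQueryDecoder(num_classes=10)
tokens = torch.randn(2, 197, 256)
logits = model(tokens)                              # (2, 10, 10)
match = hungarian_match(logits[0], torch.tensor([3]))
```

Under such a scheme, the classification loss (e.g., cross-entropy) would be computed only on the matched (query, class) pairs, which is what pushes each class query to specialize on a single target class.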
Published: 25 March 2024