Abstract: In recent years, the Transformer has gradually become a mainstream architecture in computer vision, where its strong expressiveness and high parallelism allow it to match the performance of convolutional neural networks (CNNs). However, applying the attention mechanism to computer vision currently faces two main problems: high computational complexity and the need for large amounts of training data. To address these issues, a category-query-based visual Transformer model (OB_ViT) is proposed. Its innovation lies in two aspects: the introduction of learnable category queries and a loss function based on the Hungarian algorithm. Specifically, learnable category queries are fed to the decoder as input, allowing the model to reason about the relationship between target categories and the global image context. In addition, the Hungarian algorithm enforces unique predictions, ensuring that each category query learns exactly one target category. Experiments on the CIFAR-10 and 5-class Flowers image classification datasets show that, compared with ViT and ResNet50, OB_ViT achieves notably higher accuracy with fewer parameters; on CIFAR-10, for example, it uses 15% fewer parameters while improving accuracy by 22%.
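To make the two ideas concrete, the following minimal PyTorch sketch pairs learnable category queries feeding a Transformer decoder with a Hungarian-matching loss computed via scipy's linear_sum_assignment. This is an illustration under stated assumptions, not the paper's implementation: the names CategoryQueryDecoder and hungarian_loss, and all hyperparameters, are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

class CategoryQueryDecoder(nn.Module):
    """Transformer decoder driven by learnable category queries (a sketch)."""
    def __init__(self, num_classes=10, num_queries=10, d_model=256,
                 nhead=8, num_layers=2):
        super().__init__()
        # One learnable embedding per category query
        # (analogous to DETR's object queries).
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        # Each query is classified into one of num_classes
        # plus an extra "no class" slot.
        self.classifier = nn.Linear(d_model, num_classes + 1)

    def forward(self, patch_features):
        # patch_features: (B, N_patches, d_model) from the ViT encoder.
        b = patch_features.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        out = self.decoder(q, patch_features)  # queries attend to the image context
        return self.classifier(out)            # (B, num_queries, num_classes + 1)

def hungarian_loss(logits, target_classes):
    # logits: (num_queries, num_classes + 1) for one image;
    # target_classes: (num_targets,) ground-truth class indices on the same device.
    probs = logits.softmax(-1)
    # Cost of matching query i to target j is the negative probability
    # query i assigns to target j's class; the Hungarian algorithm then
    # finds the one-to-one assignment with minimal total cost.
    cost = -probs[:, target_classes].detach().cpu().numpy()
    q_idx, t_idx = linear_sum_assignment(cost)
    no_class = logits.size(-1) - 1
    matched = torch.full((logits.size(0),), no_class,
                         dtype=torch.long, device=logits.device)
    matched[torch.as_tensor(q_idx)] = target_classes[torch.as_tensor(t_idx)]
    return F.cross_entropy(logits, matched)

As in DETR-style set prediction, queries left unmatched are supervised toward the extra "no class" slot; it is this one-to-one assignment that forces each category query to specialize on a single target category.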
JIANG Chunyu, WANG Wei. Research on visual Transformers based on class queries[J]. Journal of Jilin Institute of Chemical Technology, 2024, 41(3): 62-67.