Design of Rotation Invariant Model Based on Image Offset Angle and Multibranch Convolutional Neural Networks

ZHANG Meng; LI Xiang; ZHANG Jingwei

doi:10.11999/JEIT240417

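ZHANG Meng¹, LI Xiang², ZHANG Jingwei¹

1. School of Integrated Circuits, Southeast University, Nanjing 211189, China
2. School of Physical Sciences and Technology, Lanzhou University, Lanzhou 730000, China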

Funds: The Key-Area Research and Development Program of Guangdong Province (2021B1101270006)

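About the authors:
ZHANG Meng: male, professor and doctoral supervisor. Research interests: co-design of artificial intelligence algorithms and hardware accelerators, and FPGA system design and applications.

LI Xiang: male, master's student. Research interests: artificial intelligence and image processing algorithms.

ZHANG Jingwei: male, doctoral student. Research interests: FPGA intelligent computing, high-level synthesis design, and artificial intelligence compilers.

Corresponding author:
LI Xiang, 220220938961@lzu.edu.cn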

CLC number: TN911.73; TP391
Publication history
- Received: 2024-05-29
- Revised: 2024-11-08
- Available online: 2024-11-18
- Publication date: 2025-12-01

Abstract

Abstract: Convolutional Neural Networks (CNNs) exhibit translation invariance but lack rotation invariance. In recent years, rotation encoding for CNNs has become the mainstream approach to this problem, but it requires a large number of parameters and substantial computational resources. Given that images are the primary focus of computer vision, a model called Offset Angle and Multibranch CNN (OAMC) is proposed to achieve rotation invariance. First, the model detects the offset angle of the input image and rotates the image back by that angle. Second, the rotated image is fed into a multibranch CNN that uses no rotation encoding. Finally, an optimized response module outputs the best branch as the final prediction of the model. With the smallest parameter count among the compared models, 8 k, OAMC achieves the best classification accuracy of 96.98% on the rotated handwritten digit dataset. On remote sensing datasets, the model improves accuracy over existing work by up to about 8 percentage points while using only about one-third of the parameters of previous models.

Keywords: Deep learning; Rotated image classification; Offset angle; Multibranch Convolutional Neural Networks (CNN)
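The three stages named in the abstract (detect the offset angle, rotate the image back, let a response module choose among branch outputs) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the offset angle is taken as the direction from the image centre to the intensity-weighted centroid, rotation uses nearest-neighbour sampling, the branches are arbitrary callables returning logits, and the response rule simply picks the branch with the highest softmax confidence. The names `offset_angle`, `rotate`, `respond`, and `oamc_predict` are hypothetical.

```python
import math

def offset_angle(image):
    """Angle in degrees (counter-clockwise) from the image centre to the
    intensity-weighted centroid. Assumed stand-in for the paper's detector."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    total = sum(sum(row) for row in image)
    my = sum(y * v for y, row in enumerate(image) for v in row) / total
    mx = sum(x * v for row in image for x, v in enumerate(row)) / total
    # Row indices grow downwards, so negate dy to get a y-up coordinate system.
    return math.degrees(math.atan2(-(my - cy), mx - cx)) % 360

def rotate(image, degrees):
    """Rotate counter-clockwise about the centre (nearest-neighbour sampling)."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, -(y - cy)      # destination pixel, y-up
            sx = cos_t * dx + sin_t * dy    # inverse rotation gives the source
            sy = -sin_t * dx + cos_t * dy
            src_x, src_y = round(sx + cx), round(-sy + cy)
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = image[src_y][src_x]
    return out

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def respond(branch_logits):
    """Assumed response rule: keep the branch with the most confident prediction."""
    probs = [softmax(logits) for logits in branch_logits]
    best = max(range(len(probs)), key=lambda b: max(probs[b]))
    label = max(range(len(probs[best])), key=probs[best].__getitem__)
    return best, label

def oamc_predict(image, branches):
    """Undo the detected offset, run every branch, return (branch index, class)."""
    aligned = rotate(image, -offset_angle(image))
    return respond([branch(aligned) for branch in branches])
```

With a 5×5 image containing a single bright pixel, rotating by 90 degrees and then calling `rotate(rotated, -offset_angle(rotated))` recovers the original image, which is the alignment step the branches rely on in this sketch.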

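Figure 1 Overall flowchart of the offset angle detection and rotation module

Figure 2 Schematic of the construction of the rectangular coordinate system

Figure 3 Overall structure of the OAMC-B model

Figure 4 Test accuracy curves for the 36 rotated subsets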

Table 1 Test accuracy on the rotated MNIST dataset

Model	Parameters (k)	Accuracy (%)
ORN-8 (Align) [10]	969	83.76
ORN-8 (ORPooling) [10]	397	83.33
RotEqNet [5]	100	80.10
Spherical CNN [15]	68	94.00
E(2)-CNN [16]	2068	94.37
RIC-CNN [1]	289	95.52
OAMC-1 (ours)	8	63.18
OAMC-2 (ours)	8	85.06
OAMC-4 (ours)	8	96.98
OAMC-8 (ours)	8	93.70

Table 2 Test accuracy on remote sensing datasets

Model	Parameters (k)	NWPU-10 accuracy (%)	MTARSI accuracy (%)	AID accuracy (%)
VGG16 [20]	3372	82.33	60.15	54.59
RIC-VGG16 [1]	3372	91.65	72.21	66.22
OAMC-4	981	92.91	75.69	74.31
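The headline figures in the abstract can be checked against the two tables with a few lines of arithmetic. The sketch below copies the relevant rows by hand; the dictionary layout and variable names are illustrative choices, not part of the paper.

```python
# Rows copied from Table 1 and Table 2: parameters in thousands, accuracy in %.
table1 = {"RIC-CNN": (289, 95.52), "OAMC-4": (8, 96.98)}
table2_params = {"RIC-VGG16": 3372, "OAMC-4": 981}
table2_acc = {
    "NWPU-10": {"RIC-VGG16": 91.65, "OAMC-4": 92.91},
    "MTARSI": {"RIC-VGG16": 72.21, "OAMC-4": 75.69},
    "AID": {"RIC-VGG16": 66.22, "OAMC-4": 74.31},
}

# Table 2: fraction of the baseline's parameters that OAMC-4 uses.
param_ratio = table2_params["OAMC-4"] / table2_params["RIC-VGG16"]

# Table 2: accuracy gain over RIC-VGG16 per dataset, in percentage points.
gains = {
    name: round(acc["OAMC-4"] - acc["RIC-VGG16"], 2)
    for name, acc in table2_acc.items()
}

# Table 1: how many times fewer parameters OAMC-4 has than RIC-CNN,
# and its accuracy margin over RIC-CNN in percentage points.
shrink = table1["RIC-CNN"][0] / table1["OAMC-4"][0]
margin = round(table1["OAMC-4"][1] - table1["RIC-CNN"][1], 2)

print(f"parameter ratio: {param_ratio:.3f}")   # 0.291
print(f"gains: {gains}")                       # largest gain is 8.09 on AID
print(f"Table 1: {shrink:.1f}x fewer parameters, +{margin} points")
```

The ratio of roughly 0.29 matches the "one-third of the parameters" claim, and the 8.09-point gain on AID is the source of the "up to 8%" figure, which is therefore a difference in percentage points rather than a relative improvement.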

References (20)

[1]	MO Hanlin and ZHAO Guoying. RIC-CNN: Rotation-invariant coordinate convolutional neural network[J]. Pattern Recognition, 2024, 146: 109994. doi: 10.1016/j.patcog.2023.109994.
[2]	ZHU Tianyu, FERENCZI B, PURKAIT P, et al. Knowledge combination to learn rotated detection without rotated annotation[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 15518–15527. doi: 10.1109/CVPR52729.2023.01489.
[3]	HAN Jiaming, DING Jian, XUE Nan, et al. ReDet: A rotation-equivariant detector for aerial object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2785–2794. doi: 10.1109/CVPR46437.2021.00281.
[4]	LI Feiran, FUJIWARA K, OKURA F, et al. A closer look at rotation-invariant deep point cloud analysis[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 16198–16207. doi: 10.1109/ICCV48922.2021.01591.
[5]	MARCOS D, VOLPI M, KOMODAKIS N, et al. Rotation equivariant vector field networks[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 5058–5067. doi: 10.1109/ICCV.2017.540.
[6]	EDIXHOVEN T, LENGYEL A, and VAN GEMERT J C. Using and abusing equivariance[C]. Proceedings of 2023 IEEE/CVF International Conference on Computer Vision Workshops, Paris, France, 2023: 119–128. doi: 10.1109/ICCVW60793.2023.00019.
[7]	LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. doi: 10.1109/5.726791.
[8]	JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks[C]. The 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 2015: 2017–2025.
[9]	LAPTEV D, SAVINOV N, BUHMANN J M, et al. TI-POOLING: Transformation-invariant pooling for feature learning in convolutional neural networks[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 289–297. doi: 10.1109/CVPR.2016.38.
[10]	ZHOU Yanzhao, YE Qixiang, QIU Qiang, et al. Oriented response networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4961–4970. doi: 10.1109/CVPR.2017.527.
[11]	WORRALL D E, GARBIN S J, TURMUKHAMBETOV D, et al. Harmonic networks: Deep translation and rotation equivariance[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7168–7177. doi: 10.1109/CVPR.2017.758.
[12]	WEILER M, HAMPRECHT F A, and STORATH M. Learning steerable filters for rotation equivariant CNNs[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 849–858. doi: 10.1109/CVPR.2018.00095.
[13]	FIRAT H. Classification of microscopic peripheral blood cell images using multibranch lightweight CNN-based model[J]. Neural Computing and Applications, 2024, 36(4): 1599–1620. doi: 10.1007/s00521-023-09158-9.
[14]	WEI Xuan, SU Shixiang, WEI Yun, et al. Rotational convolution: Rethinking convolution for downside fisheye images[J]. IEEE Transactions on Image Processing, 2023, 32: 4355–4364. doi: 10.1109/TIP.2023.3298475.
[15]	COHEN T S, GEIGER M, KOEHLER J, et al. Spherical CNNs[C]. The Sixth International Conference on Learning Representations, Vancouver, Canada, 2018.
[16]	WEILER M and CESA G. General E(2)-equivariant steerable CNNs[J]. Advances in Neural Information Processing Systems, 2019, 32.
[17]	CHENG Gong, HAN Junwei, ZHOU Peicheng, et al. Multi-class geospatial object detection and geographic image classification based on collection of part detectors[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2014, 98: 119–132. doi: 10.1016/j.isprsjprs.2014.10.002.
[18]	WU Zhize, WAN Shouhong, WANG Xiaofeng, et al. A benchmark data set for aircraft type recognition from remote sensing images[J]. Applied Soft Computing, 2020, 89: 106132. doi: 10.1016/j.asoc.2020.106132.
[19]	XIA Guisong, HU Jingwen, HU Fan, et al. AID: A benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965–3981. doi: 10.1109/TGRS.2017.2685945.
[20]	SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. The 3rd International Conference on Learning Representations, San Diego, USA, 2015.