Ultrasound Image Lesion Detection Algorithm Optimized by Feature Feedback Mechanism
-
Abstract: A lesion detection method for ultrasound images based on a feature feedback mechanism is proposed to achieve real-time, accurate localization and detection of ultrasound lesions. The proposed method consists of two parts: a feature extraction network based on the feature feedback mechanism and an adaptive detection head based on a divide-and-conquer strategy. Through feedback feature selection and weighted fusion, the feature feedback network fully learns both the global context and the local low-level semantic details of ultrasound images, improving its ability to recognize local lesion features. The adaptive detection head applies divide-and-conquer preprocessing to the multi-level features extracted by the feedback network: by combining physiological prior knowledge with feature convolution, it adaptively models lesion shape and scale at each feature level, strengthening the detection of lesions of different sizes across the multi-level features. Tested on a thyroid ultrasound image dataset, the method achieves 70.3% AP, 99.0% AP50, and 88.4% AP75. The experimental results show that, compared with mainstream detection algorithms, the proposed algorithm achieves more accurate real-time detection and localization of lesions in ultrasound images.
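The paper does not include source code, so the feedback feature selection and weighted fusion step described above can only be sketched. Below is a minimal PyTorch-style illustration under our own assumptions: the module name `FeedbackFusion`, the single channel gate, and the two learned fusion scalars are all hypothetical and merely convey the general idea of gating a fed-back global-context feature before fusing it with a local low-level one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackFusion(nn.Module):
    """Illustrative sketch only: gate a fed-back high-level feature,
    then fuse it with a low-level feature via learned weights.
    Not the authors' reference implementation."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel gate standing in for "feedback feature selection" (assumption).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Two learned scalars for the weighted fusion of the two branches.
        self.w = nn.Parameter(torch.ones(2))

    def forward(self, low: torch.Tensor, fed_back: torch.Tensor) -> torch.Tensor:
        # Upsample the fed-back feature to the low-level resolution.
        fed_back = F.interpolate(fed_back, size=low.shape[-2:], mode="nearest")
        # Keep only the channels the gate considers informative.
        selected = fed_back * self.gate(fed_back)
        # Normalized weighted fusion of local detail and global context.
        w = torch.softmax(self.w, dim=0)
        return w[0] * low + w[1] * selected
```

In this reading, `low` would be a lateral backbone feature and `fed_back` a first-pass pyramid feature such as $\boldsymbol{P}_5^1$ in Table 4.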
-
Key words:
- Lesion detection
- Feature feedback
- Adaptive detection head
-
Table 1 Comparison of thyroid ultrasound lesion detection accuracy (%)

| Method | Backbone | AP | AP50 | AP75 | AP (benign) | AP (malignant) |
|---|---|---|---|---|---|---|
| Faster R-CNN[7] | ResNet50 | 64.3 | 96.6 | 79.2 | 61.5 | 67.1 |
| RetinaNet[10] | ResNet50 | 65.2 | 97.6 | 80.3 | 62.4 | 67.9 |
| YOLOv3[11] | Darknet53 | 64.7 | 95.2 | 81.5 | 62.5 | 66.8 |
| FCOS[12] | ResNet50 | 65.8 | 95.5 | 80.8 | 63.5 | 68.2 |
| EfficientDet[15] | EfficientNet-B1 | 66.1 | 98.7 | 77.1 | 63.8 | 68.5 |
| VarifocalNet[13] | ResNet50 | 64.5 | 97.3 | 78.5 | 64.4 | 64.6 |
| YOLOF[14] | ResNet50 | 65.9 | 99.2 | 81.4 | 64.8 | 66.9 |
| YOLOX[16] | Darknet53 | 67.0 | 98.1 | 83.4 | 64.4 | 69.5 |
| YOLOv7[17] | CBS+ELAN | 67.3 | 98.3 | 84.0 | 65.3 | 69.2 |
| DETR[20] | ResNet50 | 63.4 | 93.6 | 76.2 | 61.2 | 65.7 |
| DAB-DETR[21] | ResNet50 | 64.9 | 96.3 | 78.9 | 64.1 | 65.8 |
| DINO[22] | ResNet50 | 66.1 | 95.8 | 83.6 | 62.5 | 69.7 |
| Ours | ResNet50 | 69.6 | 99.0 | 87.7 | 68.2 | 71.0 |
| Ours | ConvNeXt-tiny | 70.3 | 99.0 | 88.4 | 68.9 | 71.6 |
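The AP, AP50, and AP75 columns follow the COCO protocol: AP averages over IoU thresholds 0.50:0.95, while AP50 and AP75 are fixed at IoU 0.50 and 0.75. As a hedged illustration, such numbers are typically reproduced from detector outputs with pycocotools; the file names below are placeholders, since the paper does not release its annotations or detection files.

```python
# Illustrative COCO-style AP/AP50/AP75 evaluation with pycocotools.
# Both JSON file names are hypothetical placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("thyroid_val_annotations.json")      # ground-truth boxes
coco_dt = coco_gt.loadRes("model_detections.json")  # detector outputs

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP (IoU=0.50:0.95), AP50, AP75, ...
```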
Table 2 Ablation experiments on lesion detection accuracy (%)

| Method | AP | AP50 | AP75 |
|---|---|---|---|
| Baseline | 65.8 | 95.5 | 80.8 |
| +ConvNeXt | 67.5 | 98.7 | 84.8 |
| +ConvNeXt + adaptive detection head | 68.5 | 98.6 | 86.8 |
| +ConvNeXt + adaptive detection head + feature feedback network | 70.3 | 99.0 | 88.4 |
Table 3 Comparison of different detection heads (%)

| Method | AP | AP50 | AP75 |
|---|---|---|---|
| Baseline head (FCOS) | 67.5 | 98.7 | 84.8 |
| Coupled head (YOLOv3) | 65.6 | 97.1 | 82.0 |
| Decoupled head (YOLOX) | 67.1 | 98.5 | 87.0 |
| Adaptive head (ours) | 68.5 | 98.6 | 86.8 |
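Table 3 contrasts the coupled head of YOLOv3, the decoupled head of YOLOX, and the proposed adaptive head. The paper describes the adaptive head as combining physiological priors with feature convolution to model lesion shape and scale; one plausible, purely illustrative reading is a per-level branch whose sampling locations adapt to the lesion, sketched here with torchvision's deformable convolution. The module name and structure are our assumptions, not the authors' design.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AdaptiveHeadBranch(nn.Module):
    """Purely illustrative: a small conv predicts per-location
    sampling offsets so the kernel can follow lesion shape and
    scale before the usual classification/regression towers."""

    def __init__(self, channels: int = 256, k: int = 3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position, predicted from the feature.
        self.offset = nn.Conv2d(channels, 2 * k * k, kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size=k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))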
Table 4 Comparison of different feedback schemes

| Method | AP (%) | AP50 (%) | AP75 (%) | FPS (frames/s) |
|---|---|---|---|---|
| No feedback network | 68.5 | 98.6 | 86.8 | 46 |
| $\boldsymbol{P}_3^1$ to $\boldsymbol{P}_5^1$ feedback network + ASPP | 69.6 | 98.6 | 87.4 | 40 |
| $\boldsymbol{P}_3^1$ to $\boldsymbol{P}_7^1$ feedback network + ASPP | 69.6 | 98.4 | 87.8 | 34 |
| $\boldsymbol{P}_3^1$ to $\boldsymbol{P}_5^1$ feedback network + feedback feature selection module (ours) | 70.3 | 99.0 | 88.4 | 39 |
| $\boldsymbol{P}_3^1$ to $\boldsymbol{P}_7^1$ feedback network + feedback feature selection module | 70.1 | 98.5 | 88.2 | 30 |
-
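Table 4 varies two design choices: how many pyramid levels are fed back ($\boldsymbol{P}_3^1$ to $\boldsymbol{P}_5^1$ versus $\boldsymbol{P}_3^1$ to $\boldsymbol{P}_7^1$) and how the fed-back features are processed (ASPP versus the feedback feature selection module). The following is a minimal sketch of the two-pass wiring under our own assumptions about the interface: `backbone`, `fpn`, and the per-level `fuse` modules are callables, and channel-matching 1×1 convolutions are omitted for brevity.

```python
def forward_with_feedback(backbone, fpn, fuse, image):
    """Schematic two-pass feedback over P3..P5 (cf. Table 4);
    an illustrative reading of the paper, not its exact design."""
    c3, c4, c5 = backbone(image)
    p3, p4, p5 = fpn(c3, c4, c5)      # first pass: P3^1 .. P5^1
    # Feed the first-pass pyramid back into the lateral inputs,
    # e.g. via FeedbackFusion modules, one per level.
    c3 = fuse[0](c3, p3)
    c4 = fuse[1](c4, p4)
    c5 = fuse[2](c5, p5)
    return fpn(c3, c4, c5)            # second pass: refined P3 .. P5
```

The FPS column suggests the design trade-off: extending feedback to $\boldsymbol{P}_7^1$ adds cost without improving AP, so the $\boldsymbol{P}_3^1$-to-$\boldsymbol{P}_5^1$ variant is the one the paper adopts.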
[1] WATAYA T, YANAGAWA M, TSUBAMOTO M, et al. Radiologists with and without deep learning–based computer-aided diagnosis: Comparison of performance and interobserver agreement for characterizing and diagnosing pulmonary nodules/masses[J]. European Radiology, 2023, 33(1): 348–359. doi: 10.1007/s00330-022-08948-4.
[2] SOLYMOSI T, HEGEDŰS L, BONNEMA S J, et al. Considerable interobserver variation calls for unambiguous definitions of thyroid nodule ultrasound characteristics[J]. European Thyroid Journal, 2023, 12(2): e220134. doi: 10.1530/ETJ-22-0134.
[3] YAP M H, GOYAL M, OSMAN F, et al. Breast ultrasound region of interest detection and lesion localisation[J]. Artificial Intelligence in Medicine, 2020, 107: 101880. doi: 10.1016/j.artmed.2020.101880.
[4] LI Yujie, GU Hong, WANG Hongyu, et al. BUSnet: A deep learning model of breast tumor lesion detection for ultrasound images[J]. Frontiers in Oncology, 2022, 12: 848271. doi: 10.3389/fonc.2022.848271.
[5] MENG Hui, LIU Xuefeng, NIU Jianwei, et al. DGANet: A dual global attention neural network for breast lesion detection in ultrasound images[J]. Ultrasound in Medicine and Biology, 2023, 49(1): 31–44. doi: 10.1016/j.ultrasmedbio.2022.07.006.
[6] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 580–587.
[7] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[8] LIANG Tingting, CHU Xiaojie, LIU Yudong, et al. CBNet: A composite backbone network architecture for object detection[J]. IEEE Transactions on Image Processing, 2022, 31: 6893–6906. doi: 10.1109/TIP.2022.3216771.
[9] QIAO Siyuan, CHEN L C, and YUILLE A. DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 10208–10219.
[10] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2999–3007.
[11] REDMON J and FARHADI A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
[12] TIAN Zhi, SHEN Chunhua, CHEN Hao, et al. FCOS: Fully convolutional one-stage object detection[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 9626–9635.
[13] ZHANG Haoyang, WANG Ying, DAYOUB F, et al. VarifocalNet: An IoU-aware dense object detector[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 8510–8519.
[14] CHEN Qiang, WANG Yingming, YANG Tong, et al. You only look one-level feature[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 13034–13043.
[15] TAN Mingxing, PANG Ruoming, and LE Q V. EfficientDet: Scalable and efficient object detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10778–10787.
[16] GE Zheng, LIU Songtao, WANG Feng, et al. YOLOX: Exceeding YOLO series in 2021[J]. arXiv preprint arXiv:2107.08430, 2021.
[17] WANG C Y, BOCHKOVSKIY A, and LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[J]. arXiv preprint arXiv:2207.02696, 2022.
[18] WANG Wen, ZHANG Jing, CAO Yang, et al. Towards data-efficient detection transformers[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 88–105.
[19] CHEN Xiangyu, HU Qinghao, LI Kaidong, et al. Accumulated trivial attention matters in vision transformers on small datasets[C]. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2023: 3973–3981.
[20] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 2020: 213–229.
[21] LIU Shilong, LI Feng, ZHANG Hao, et al. DAB-DETR: Dynamic anchor boxes are better queries for DETR[C]. The Tenth International Conference on Learning Representations (Virtual), 2022: 1–20. doi: 10.48550/arXiv.2201.12329.
[22] ZHANG Hao, LI Feng, LIU Shilong, et al. DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection[C]. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, 2023: 1–19. doi: 10.48550/arXiv.2203.03605.
[23] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
[24] LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 11966–11976.
[25] PENG Zhiliang, GUO Zonghao, HUANG Wei, et al. Conformer: Local features coupling global representations for recognition and detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 9454–9468. doi: 10.1109/TPAMI.2023.3243048.
[26] WANG W, DAI J, CHEN Z, et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 14408–14419.
[27] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 618–626.