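CL-YOLOv5: PET/CT Lung Cancer Detection With Cross-modal Lightweight YOLOv5 Model

doi: 10.11999/JEIT230052

ZHOU Tao^{1, 3},
YE Xinyu^{1, 3},
LIU Fengzhen^{1, 3},
LU Huiling^{2}

1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
2. School of Medical Information & Engineering, Ningxia Medical University, Yinchuan 750004, China
3. Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

Funds: National Natural Science Foundation of China (62062003), Ningxia Natural Science Foundation (2022AAC03149), Key Research and Development Program of Ningxia Hui Autonomous Region (2020BEB04022)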

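Details

Author biographies:
ZHOU Tao: male, professor and doctoral supervisor; research interests include medical image processing, computer-aided diagnosis, and pattern recognition

YE Xinyu: male, master's student; research interests include medical image processing and computer-aided diagnosis

LIU Fengzhen: female, master's student; research interests include medical image processing and computer-aided diagnosis

LU Huiling: female, professor; research interests include medical image analysis and processing, and machine learning

Corresponding author:
YE Xinyu, 3303626778@qq.com

CLC number: TP391.41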
Publication history
- Received: 2023-02-14
- Revised: 2023-05-05
- Published online: 2023-05-16
- Issue date: 2024-02-29

Abstract

Abstract: Multimodal medical images provide richer semantic information about the same lesion. To address the insufficient use of cross-modal semantic correlation and the excessive complexity of existing models, a lung tumor detection model based on Cross-modal Lightweight YOLOv5 (CL-YOLOv5) is proposed. First, a three-branch network is proposed to learn the semantic information of the Positron Emission Tomography (PET), Computed Tomography (CT), and PET/CT modalities. Second, a cross-modal interactive enhancement block is designed to fully learn multimodal semantic correlation: a cosine re-weighted Transformer efficiently learns global feature relationships, and interactive enhancement strengthens the network's ability to extract lesion features. Finally, a dual-branch lightweight block is proposed. In one branch, an ACtivate Or Not (ACON) bottleneck structure reduces parameters while increasing network depth and robustness; the other branch is a densely connected recursive re-parameterized convolution that maximizes feature transfer, with recursive spatial interaction efficiently learning multimodal features. On the lung tumor PET/CT multimodal dataset, the proposed model achieves the best performance with 94.76% mAP and the highest efficiency with 3238 s, using 0.81 M parameters, which is 7.7 times and 5.3 times lower than YOLOv5s and EfficientDet-d0, respectively. In multimodal comparison experiments it generally outperforms existing state-of-the-art methods, and ablation experiments and heat map visualization provide further verification.

Keywords:
- YOLOv5
- Cross-modal interactive enhancement block
- Dual-branch lightweight block
- PET/CT multimodal lung tumor images
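The abstract names a cosine re-weighted Transformer without giving its formulation. As a minimal sketch only, assuming the re-weighting resembles the cos-based scheme known from cosFormer-style attention (ReLU feature maps, with each score scaled by cos(pi/2 * (i - j) / M) for token positions i, j and sequence length M), the idea can be illustrated as follows. The function name, the tensor shapes, and the use of the plain quadratic form rather than an efficient linearized one are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cosine_reweighted_attention(q, k, v, eps=1e-6):
    """Illustrative cos-re-weighted attention (assumed formulation).

    q, k: arrays of shape (n, d); v: array of shape (n, d_v).
    Returns (output of shape (n, d_v), attention weights of shape (n, n)).
    """
    n = q.shape[0]
    # Non-negative feature maps replace the softmax exponent.
    q_pos = np.maximum(q, 0.0)
    k_pos = np.maximum(k, 0.0)
    # Positional re-weighting: nearby tokens get a factor close to 1,
    # distant tokens a smaller (but still positive) factor.
    idx = np.arange(n)
    reweight = np.cos(0.5 * np.pi * (idx[:, None] - idx[None, :]) / n)
    scores = (q_pos @ k_pos.T) * reweight
    # Row-wise normalization; eps guards against an all-zero row.
    weights = scores / (scores.sum(axis=1, keepdims=True) + eps)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(6, 8))
    k = rng.normal(size=(6, 8))
    v = rng.normal(size=(6, 4))
    out, w = cosine_reweighted_attention(q, k, v)
    print(out.shape, w.shape)
```

Because the ReLU features and the cosine factor are both non-negative, the normalized weights behave like an attention distribution without a softmax, which is what makes a linear-complexity rearrangement possible in the cosFormer-style variant this sketch is modeled on.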

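Figure 1 Overall framework of CL-YOLOv5

Figure 2 Structure of the recursive re-parameterized convolution

Figure 3 Structure of the dual-branch lightweight block

Figure 4 Structure of the cross-modal interactive enhancement block

Figure 5 Registered PET, CT, and PET/CT images

Figure 6 Visualization results of the ablation experiments

Figure 7 Detection results of different models on the lung tumor PET/CT multimodal dataset

Figure 8 PR curves of different models

Figure 9 F1 curves of different models

Figure 10 Lung tumor images and model heat maps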

Table 1 Ablation results on the lung tumor PET/CT multimodal dataset

Experiment	Added module	Parameters	Computational cost	Precision	Recall	mAP	F1	FPS	Total time (s)
–	YOLOv5s	7.06M	5.24G	0.9416±1.2	0.8965±1.4	0.9221±1.5	0.9185±1.4	102.63	3661
1	+Recursive re-parameterized convolution	2.77M	2.23G	0.9514±1.2	0.9108±1.3	0.9402±1.4	0.9306±1.3	124.15	3457
2	+Dual-branch lightweight block	473.09K	310.69M	0.9566±1.1	0.9160±1.2	0.9448±1.3	0.9359±1.2	149.34	3049
3	+Two modalities (CT)	717.04K	600.92M	0.9609±1.1	0.9186±1.2	0.9486±1.2	0.9393±1.2	141.50	3215
4	+Two modalities (PET)	717.04K	600.92M	0.9595±1.1	0.9326±1.0	0.9507±1.1	0.9458±1.1	143.13	3169
5	+Three modalities	717.04K	600.92M	0.9652±0.9	0.9354±0.9	0.9558±1.0	0.9501±0.9	142.35	3182
6	+Attention	814.39K	673.06M	0.9729±0.7	0.9476±0.8	0.9651±0.7	0.9603±0.7	138.47	3238
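The F1 column of Table 1 can be related to the precision and recall columns through the usual harmonic-mean definition, F1 = 2PR / (P + R). The sketch below recomputes F1 from the mean precision and recall of each row. The names `f1_score` and `TABLE1` are illustrative, and small deviations from the reported F1 are to be expected if the reported values were averaged over several runs (an assumption, since the page does not say how they were obtained).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (row label, precision, recall, reported F1), copied from Table 1.
TABLE1 = [
    ("baseline", 0.9416, 0.8965, 0.9185),
    ("exp. 1", 0.9514, 0.9108, 0.9306),
    ("exp. 2", 0.9566, 0.9160, 0.9359),
    ("exp. 3", 0.9609, 0.9186, 0.9393),
    ("exp. 4", 0.9595, 0.9326, 0.9458),
    ("exp. 5", 0.9652, 0.9354, 0.9501),
    ("exp. 6", 0.9729, 0.9476, 0.9603),
]

if __name__ == "__main__":
    for label, p, r, reported in TABLE1:
        recomputed = f1_score(p, r)
        print(f"{label:<10} recomputed={recomputed:.4f} reported={reported:.4f}")
```

For every row the recomputed value lies within 0.001 of the reported one.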

Table 2 Comparison of different models on the lung tumor PET/CT multimodal dataset

Detection model	Parameters	Computational cost	Precision	Recall	mAP	F1	FPS	Total time (s)
R-FCN(Res101-FPN)^[2]	50.80M	60.51G	0.8947±1.2	0.8839±1.4	0.9013±1.5	0.8893±1.4	15.33	8010
SSD512(VGG16)^[2]	23.75M	87.63G	0.8467±1.6	0.8398±2.1	0.8540±2.1	0.8433±2.0	34.62	4133
EfficientDet-d0^[18]	4.31M	2.58G	0.8934±1.3	0.8719±1.5	0.8962±1.7	0.8825±1.5	26.24	4474
YOLOv4l^[4]	63.96M	45.28G	0.9374±1.1	0.8926±1.4	0.9162±1.5	0.9146±1.4	63.99	5516
YOLOv5l^[5]	46.65M	36.56G	0.9495±1.1	0.8968±1.3	0.9307±1.3	0.9244±1.3	71.94	4968
TPH-YOLOv5^[19]	40.83M	36.26G	0.9523±1.2	0.9142±1.3	0.9408±1.4	0.9329±1.3	38.78	6795
PP-PicoDet-l^[3]	1.18M	4.59G	0.9342±1.4	0.8873±1.7	0.9131±1.8	0.9101±1.6	109.55	3601
NanoDet-Plus-m^[20]	1.19M	1.20G	0.9431±1.3	0.8987±1.6	0.9264±1.6	0.9204±1.5	117.18	3435
Poly-YOLO^[4]	6.16M	7.01G	0.9478±1.1	0.9101±1.4	0.9378±1.4	0.9286±1.3	69.12	4491
YOLOv7l^[5]	37.19M	33.64G	0.9558±0.9	0.9237±1.2	0.9476±1.2	0.9395±1.2	73.81	4712
YOLOv8l^[19]	43.63M	52.93G	0.9592±0.9	0.9287±1.1	0.9514±1.2	0.9437±1.1	56.12	5956
CL-YOLOv5	0.81M	0.67G	0.9729±0.7	0.9476±0.8	0.9651±0.7	0.9603±0.7	138.47	3238
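The parameter counts in Table 2 can be compared directly as ratios against CL-YOLOv5. A minimal sketch, with the values copied from the table; the dictionary `PARAMS_M` and the helper `param_ratio` are illustrative names only.

```python
# Parameter counts in millions, copied from Table 2.
PARAMS_M = {
    "R-FCN(Res101-FPN)": 50.80,
    "SSD512(VGG16)": 23.75,
    "EfficientDet-d0": 4.31,
    "YOLOv4l": 63.96,
    "YOLOv5l": 46.65,
    "TPH-YOLOv5": 40.83,
    "PP-PicoDet-l": 1.18,
    "NanoDet-Plus-m": 1.19,
    "Poly-YOLO": 6.16,
    "YOLOv7l": 37.19,
    "YOLOv8l": 43.63,
    "CL-YOLOv5": 0.81,
}


def param_ratio(model: str, reference: str = "CL-YOLOv5") -> float:
    """How many times more parameters `model` has than `reference`."""
    return PARAMS_M[model] / PARAMS_M[reference]


if __name__ == "__main__":
    for name in PARAMS_M:
        print(f"{name:<20} {param_ratio(name):6.2f}x")
```

For EfficientDet-d0 this gives 4.31 / 0.81, or roughly 5.3 times the parameters of CL-YOLOv5.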

Table 3 Comparison of multimodal detection models

Detection model	Precision	Recall	mAP	F1
ConvNet^[7]	0.9488	0.9197	0.9392±1.3	0.9340±1.3
BIRANet^[8]	0.9519	0.9211	0.9417±1.3	0.9362±1.2
MVDNet^[9]	0.9587	0.9282	0.9508±1.1	0.9432±1.0
ProbEn^[10]	0.9623	0.9310	0.9543±0.9	0.9464±0.9
CL-YOLOv5	0.9729	0.9476	0.9651±0.7	0.9603±0.7

References (20)

[1]	MIRANDA D, THENKANIDIYOOR V, and DINESH D A. Review on approaches to concept detection in medical images[J]. Biocybernetics and Biomedical Engineering, 2022, 42(2): 453–462. doi: 10.1016/j.bbe.2022.02.012.
[2]	ZHOU Tao, LIU Yuncan, LU Huiling, et al. ResNet and its application to medical image processing: Research progress and challenges[J]. Journal of Electronics & Information Technology, 2022, 44(1): 149–167. doi: 10.11999/JEIT210914.
[3]	YU Guanghua, CHANG Qinyao, LV Wenyu, et al. PP-PicoDet: A better real-time object detector on mobile devices[J]. arXiv: 2111.00902, 2021.
[4]	HURTIK P, MOLEK V, HULA J, et al. Poly-YOLO: Higher speed, more precise detection and instance segmentation for YOLOv3[J]. Neural Computing and Applications, 2022, 34(10): 8275–8290. doi: 10.1007/s00521-021-05978-9.
[5]	WANG C Y, BOCHKOVSKIY A, and LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[J]. arXiv: 2207.02696, 2022.
[6]	LIU Zhengyi, DUAN Quntao, SHI Song, et al. RGB-D image saliency detection based on multi-modal feature-fused supervision[J]. Journal of Electronics & Information Technology, 2020, 42(4): 997–1004. doi: 10.11999/JEIT190297.
[7]	ASVADI A, GARROTE L, PREMEBIDA C, et al. Real-time deep ConvNet-based vehicle detection using 3D-LIDAR reflection intensity data[C]. ROBOT 2017: Third Iberian Robotics Conference, Sevilla, Spain, 2017: 475–486.
[8]	YADAV R, VIERLING A, and BERNS K. Radar + RGB fusion for robust object detection in autonomous vehicle[C]. The 2020 IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 1986–1990.
[9]	QIAN Kun, ZHU Shilin, ZHANG Xinyu, et al. Robust multimodal vehicle detection in foggy weather using complementary Lidar and radar signals[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 444–453.
[10]	CHEN Yiting, SHI Jinghao, YE Zelin, et al. Multimodal object detection via probabilistic ensembling[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 139–158.
[11]	HERMESSI H, MOURALI O, and ZAGROUBA E. Multimodal medical image fusion review: Theoretical background and recent advances[J]. Signal Processing, 2021, 183: 108036. doi: 10.1016/j.sigpro.2021.108036.
[12]	MOKNI R, GARGOURI N, DAMAK A, et al. An automatic computer-aided diagnosis system based on the multimodal fusion of breast cancer (MF-CAD)[J]. Biomedical Signal Processing and Control, 2021, 69: 102914. doi: 10.1016/j.bspc.2021.102914.
[13]	RUBINSTEIN E, SALHOV M, NIDAM-LESHEM M, et al. Unsupervised tumor detection in dynamic PET/CT imaging of the prostate[J]. Medical Image Analysis, 2019, 55: 27–40. doi: 10.1016/j.media.2019.04.001.
[14]	MING Yue, DONG Xiying, ZHAO Jihuai, et al. Deep learning-based multimodal image analysis for cervical cancer detection[J]. Methods, 2022, 205: 46–52. doi: 10.1016/j.ymeth.2022.05.004.
[15]	QIN Ruoxi, WANG Zhenzhen, JIANG Lingyun, et al. Fine-grained lung cancer classification from PET and CT images based on multidimensional attention mechanism[J]. Complexity, 2020, 2020: 6153657. doi: 10.1155/2020/6153657.
[16]	DIRKS I, KEYAERTS M, NEYNS B, et al. Computer-aided detection and segmentation of malignant melanoma lesions on whole-body ¹⁸F-FDG PET/CT using an interpretable deep learning approach[J]. Computer Methods and Programs in Biomedicine, 2022, 221: 106902. doi: 10.1016/j.cmpb.2022.106902.
[17]	CAO Siyuan, YU Beinan, LUO Lun, et al. PCNet: A structure similarity enhancement method for multispectral and multimodal image registration[J]. Information Fusion, 2023, 94: 200–214. doi: 10.1016/j.inffus.2023.02.004.
[18]	TAN Mingxing, PANG Ruoming, and LE Q V. EfficientDet: Scalable and efficient object detection[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10778–10787.
[19]	LI Chuyi, LI Lulu, GENG Yifei, et al. YOLOv6 v3.0: A full-scale reloading[J]. arXiv: 2301.05586, 2023.
[20]	LI Dongyang and ZHAI Junyong. A real-time vehicle window positioning system based on NanoDet[C]. 2022 Chinese Intelligent Systems Conference, Singapore, 2022: 697–705.