基于高分辨扩展金字塔的场景文本检测

王满利 窦泽亚 蔡明哲 刘群坡 史艳楠

引用本文: 王满利, 窦泽亚, 蔡明哲, 刘群坡, 史艳楠. 基于高分辨扩展金字塔的场景文本检测[J]. 电子与信息学报. doi: 10.11999/JEIT241017
Citation: WANG Manli, DOU Zeya, CAI Mingzhe, LIU Qunpo, SHI Yannan. Scene Text Detection Based on High Resolution Extended Pyramid[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT241017


doi: 10.11999/JEIT241017
基金项目: 国家自然科学基金(52074305),河南省科技攻关项目(242102221006)
详细信息
    作者简介:

    王满利:男,副教授,博士,研究方向为图像处理和文本检测

    窦泽亚:男,硕士生,研究方向为文本检测和文本识别

    蔡明哲:男,硕士生,研究方向为目标检测和文本识别

    刘群坡:男,副教授,博士,研究方向为智能机器人、智能控制和机器视觉等

    史艳楠:男,副教授,博士,研究方向为信号与信息处理

    通讯作者:

    窦泽亚 dou09042230@163.com

  • 中图分类号: TN915.08; TP391.41

Scene Text Detection Based on High Resolution Extended Pyramid

Funds: The National Natural Science Foundation of China (52074305), The Science and Technology Research Project of Henan Province (242102221006)
  • 摘要: 文本检测作为计算机视觉领域的一项重要分支,在文字翻译、自动驾驶和票据信息处理等方面具有重要的应用价值。当前文本检测算法仍无法很好地解决实际拍摄图像中部分文本分辨率低、尺度变化大和有效特征不足的问题。针对上述问题,该文提出一种基于高分辨扩展金字塔的场景文本检测方法(HREPNet)。首先,构造一种改进型特征金字塔,引入高分辨扩展层和超分辨特征模块,有效增强文本分辨率特征,解决部分文本分辨率低的问题;同时,在主干网络传递特征的过程中引入多尺度特征提取模块,通过多分支空洞卷积结构与注意力机制,充分获取文本多尺度特征,解决文本尺度变化大的问题;最后,提出高效特征融合模块,选择性融合高分辨特征和多尺度特征,减少模型空间信息的丢失,解决有效特征不足的问题。实验结果表明,HREPNet在公开数据集ICDAR2015, CTW1500和Total-Text上的综合指标F值较基准方法分别提高了6.0%, 4.4%和2.5%,准确率和召回率均得到显著提升;此外,HREPNet对不同尺度和分辨率的文本检测效果均有明显提升,对小尺度和低分辨率文本的提升尤为显著。
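
    为便于理解摘要中“多分支空洞卷积结构与注意力机制”的基本思路,下面给出一个示意性的 PyTorch 草图:用不同膨胀系数的并行卷积分支提取多尺度特征,再经通道注意力加权后融合。其中类名 MultiDilationBlock、通道数与注意力的具体形式均为笔者假设,并非论文中多尺度特征提取模块(MFEM)的原始实现。

```python
# 示意代码(草图): 多分支空洞卷积 + 通道注意力, 仅用于理解多尺度特征提取的思路。
# 注意: 类名、通道数、注意力形式均为假设, 并非论文 MFEM 模块的原始实现。
import torch
import torch.nn as nn

class MultiDilationBlock(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        # 每个分支用不同膨胀系数的 3x3 卷积, 覆盖不同尺度的文本区域
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        fused_ch = out_ch * len(dilations)
        # SE 风格的通道注意力, 为不同尺度的特征分配权重
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused_ch, fused_ch // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused_ch // 4, fused_ch, 1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused_ch, out_ch, 1)  # 压缩回原通道数

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)  # 拼接多分支输出
        feats = feats * self.attn(feats)                         # 注意力加权
        return self.project(feats)

if __name__ == "__main__":
    x = torch.randn(1, 256, 80, 80)        # 假设的 1/8 分辨率特征图
    y = MultiDilationBlock(256, 256)(x)
    print(y.shape)                         # torch.Size([1, 256, 80, 80])
```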
  • 图  1  网络模型整体框架图

    图  2  SFM模块结构

    图  3  MFEM模块结构

    图  4  消融实验可视化结果

    图  5  EFFM模块中阈值大小对模型检测效果的影响

    图  6  本文方法与基准方法在不同尺度下检测效果比较

    图  7  本文方法与基准方法对不同尺度文本图像检测可视化结果

    图  8  可视化对比实验的结果

    表  1  各个创新点的影响实验结果

    HREP  MFEM  EFFM  EFFM*     P     R     F
                              83.6  74.0  78.5
                              85.4  75.6  80.2
                              87.4  74.4  80.4
                              85.7  80.1  82.8
                              88.7  80.3  84.3
                              88.2  77.1  82.3
                              88.9  80.6  84.5
                              88.5  79.9  84.0
    注:EFFM*表示不进行交叉重构
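
    表1中的综合指标F值是准确率P与召回率R的调和平均。下面给出一个最小的计算示例,便于核对表中数值(仅为示意,P, R均以百分数给出):

```python
# 示意代码: 综合指标 F 值为准确率 P 与召回率 R 的调和平均,
# 可用来核对表中数值(P, R 以百分数表示, 结果保留一位小数)。
def f_measure(p, r):
    return 2 * p * r / (p + r)

print(round(f_measure(83.6, 74.0), 1))  # 78.5, 对应基准行
print(round(f_measure(88.9, 80.6), 1))  # 84.5, 对应最优组合
```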

    表  2  MFEM模块中膨胀系数的选择对模型检测效果的影响

    膨胀系数 ICDAR2015 CTW1500
    P R F P R F
    (1,2) 87.3 79.6 83.3 85.4 79.3 82.2
    (1,4) 88.4 80.3 84.2 86.2 78.6 82.2
    (1,2,4) 88.9 80.6 84.5 87.1 80.7 83.8
    (1,2,6) 88.1 79.8 83.7 86.3 78.9 82.4
    (1,2,4,6) 87.9 78.2 82.8 86.5 78.4 82.3
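
    表2中不同膨胀系数组合的差异可以用空洞卷积的等效感受野来直观理解:对3×3卷积核,膨胀系数d对应的等效核尺寸为 3 + 2(d−1)。下面给出一个简单的计算示意(仅为辅助理解的假设性示例,并非论文中的实现):

```python
# 示意代码: 3x3 空洞卷积在膨胀系数 d 下的等效核尺寸 k_eff = k + (k-1)*(d-1),
# 用于直观比较表 2 中不同膨胀系数组合覆盖的感受野范围(仅为辅助理解)。
def effective_kernel(k, d):
    return k + (k - 1) * (d - 1)

for d in (1, 2, 4, 6):
    print(f"d={d}: k_eff={effective_kernel(3, d)}")  # 依次输出 3, 5, 9, 13
```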

    表  3  公开数据集上本文方法与其它方法的比较结果

    方法(年份)               Ext   CTW1500           Total-Text        ICDAR2015
                                   P    R    F       P    R    F       P    R    F
    PSENet (2019) [11]        ×    80.6 75.6 78.0    81.8 75.1 78.3    81.5 79.7 80.6
    PAN (2019) [32]                86.4 81.2 83.7    88.0 79.4 83.5    82.9 77.8 80.3
    TextField (2019) [33]          83.0 79.8 81.4    81.2 79.9 80.6    84.3 80.5 82.4
    DBNet (2020) [12]              84.3 79.1 81.6    87.1 82.5 84.7    86.5 80.2 83.2
    FCENet (2021) [15]        ×    85.7 80.7 83.1    87.4 79.8 83.4    85.1 84.2 84.6
    PCR (2021) [14]           ×    85.3 79.8 82.4    86.1 80.2 83.1    -    -    -
    CM-Net (2022) [31]        ×    86.0 82.2 84.1    88.5 81.4 84.8    86.7 81.3 83.9
    Wang et al. (2023) [30]   ×    84.6 77.7 81.0    88.7 79.9 84.1    88.1 78.8 83.2
    baseline                  ×    82.6 76.4 79.4    87.3 77.9 82.3    83.6 74.0 78.5
    本文方法                  ×    87.1 80.7 83.8    88.8 81.2 84.8    88.9 80.6 84.5
  • [1] WANG Xiaofeng, HE Zhihuang, WANG Kai, et al. A survey of text detection and recognition algorithms based on deep learning technology[J]. Neurocomputing, 2023, 556: 126702. doi: 10.1016/j.neucom.2023.126702.
    [2] NAIEMI F, GHODS V, and KHALESI H. Scene text detection and recognition: A survey[J]. Multimedia Tools and Applications, 2022, 81(14): 20255–20290. doi: 10.1007/s11042-022-12693-7.
    [3] 连哲, 殷雁君, 智敏, 等. 自然场景文本检测中可微分二值化技术综述[J]. 计算机科学与探索, 2024, 18(9): 2239–2260. doi: 10.3778/j.issn.1673-9418.2311105.

    LIAN Zhe, YIN Yanjun, ZHI Min, et al. Review of differentiable binarization techniques for text detection in natural scenes[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2239–2260. doi: 10.3778/j.issn.1673-9418.2311105.
    [4] EPSHTEIN B, OFEK E, and WEXLER Y. Detecting text in natural scenes with stroke width transform[C]. Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 2963–2970. doi: 10.1109/CVPR.2010.5540041.
    [5] LI Qian, PENG Hao, LI Jianxin, et al. A survey on text classification: From traditional to deep learning[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2022, 13(2): 31. doi: 10.1145/3495162.
    [6] KIM K I, JUNG K, and KIM J H. Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(12): 1631–1639. doi: 10.1109/TPAMI.2003.1251157.
    [7] TIAN Zhi, HUANG Weilin, HE Tong, et al. Detecting text in natural image with connectionist text proposal network[C]. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 56–72. doi: 10.1007/978-3-319-46484-8_4.
    [8] BAEK Y, LEE B, HAN D, et al. Character region awareness for text detection[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 9365–9374. doi: 10.1109/CVPR.2019.00959.
    [9] HE Minghang, LIAO Minghui, YANG Zhibo, et al. MOST: A multi-oriented scene text detector with localization refinement[C]. Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 8813–8822. doi: 10.1109/CVPR46437.2021.00870.
    [10] DENG Dan, LIU Haifeng, LI Xuelong, et al. PixelLink: Detecting scene text via instance segmentation[C]. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018. doi: 10.1609/aaai.v32i1.12269.
    [11] WANG Wenhai, XIE Enze, LI Xiang, et al. Shape robust text detection with progressive scale expansion network[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 9336–9345. doi: 10.1109/CVPR.2019.00956.
    [12] LIAO Minghui, ZOU Zhisheng, WAN Zhaoyi, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919–931. doi: 10.1109/TPAMI.2022.3155612.
    [13] ZHANG Chengquan, LIANG Borong, HUANG Zuming, et al. Look more than once: An accurate detector for text of arbitrary shapes[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 10552–10561. doi: 10.1109/CVPR.2019.01080.
    [14] DAI Pengwen, ZHANG Sanyi, ZHANG Hua, et al. Progressive contour regression for arbitrary-shape scene text detection[C]. Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 7393–7402. doi: 10.1109/CVPR46437.2021.00731.
    [15] ZHU Yiqin, CHEN Jianyong, LIANG Lingyu, et al. Fourier contour embedding for arbitrary-shaped text detection[C]. Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 3123–3131. doi: 10.1109/CVPR46437.2021.00314.
    [16] ZHANG Shixue, YANG Chun, ZHU Xiaobin, et al. Arbitrary shape text detection via boundary transformer[J]. IEEE Transactions on Multimedia, 2024, 26: 1747–1760. doi: 10.1109/TMM.2023.3286657.
    [17] YE Maoyuan, ZHANG Jing, ZHAO Shanshan, et al. DPText-DETR: Towards better scene text detection with dynamic points in transformer[C]. Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 3241–3249. doi: 10.1609/aaai.v37i3.25430.
    [18] YU Wenwen, LIU Yuliang, HUA Wei, et al. Turning a CLIP model into a scene text detector[C]. Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 6978–6988. doi: 10.1109/CVPR52729.2023.00674.
    [19] YE Maoyuan, ZHANG Jing, ZHAO Shanshan, et al. DeepSolo: Let transformer decoder with explicit points solo for text spotting[C]. Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 19348–19357. doi: 10.1109/CVPR52729.2023.01854.
    [20] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
    [21] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2117–2125. doi: 10.1109/CVPR.2017.106.
    [22] DENG Chunfang, WANG Mengmeng, LIU Liang, et al. Extended feature pyramid network for small object detection[J]. IEEE Transactions on Multimedia, 2022, 24: 1968–1979. doi: 10.1109/TMM.2021.3074273.
    [23] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834–848. doi: 10.1109/TPAMI.2017.2699184.
    [24] ZHANG Qiulin, JIANG Zhuqing, LU Qishuo, et al. Split to be slim: An overlooked redundancy in vanilla convolution[C]. Proceedings of the 29th International Joint Conference on Artificial Intelligence, 2021: 3195–3201. doi: 10.24963/ijcai.2020/442.
    [25] WU Yuxin and HE Kaiming. Group normalization[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 3–19. doi: 10.1007/978-3-030-01261-8_1.
    [26] LI Jiafeng, WEN Ying, and HE Lianghua. SCConv: Spatial and channel reconstruction convolution for feature redundancy[C]. Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 6153–6162. doi: 10.1109/CVPR52729.2023.00596.
    [27] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]. Proceedings of 2015 13th International Conference on Document Analysis and Recognition, Tunis, Tunisia, 2015: 1156–1160. doi: 10.1109/ICDAR.2015.7333942.
    [28] LIU Yuliang and JIN Lianwen. Deep matching prior network: Toward tighter multi-oriented text detection[C]. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1962–1969. doi: 10.1109/CVPR.2017.368.
    [29] CH'NG C K and CHAN C S. Total-text: A comprehensive dataset for scene text detection and recognition[C]. Proceedings of 2017 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Japan, 2017, 1: 935–942. doi: 10.1109/ICDAR.2017.157.
    [30] WANG Fangfang, XU Xiaogang, CHEN Yifeng, et al. Fuzzy semantics for arbitrary-shaped scene text detection[J]. IEEE Transactions on Image Processing, 2023, 32: 1–12. doi: 10.1109/TIP.2022.3201467.
    [31] YANG Chuang, CHEN Mulin, XIONG Zhitong, et al. CM-Net: Concentric mask based arbitrary-shaped text detection[J]. IEEE Transactions on Image Processing, 2022, 31: 2864–2877. doi: 10.1109/TIP.2022.3141844.
    [32] WANG Wenhai, XIE Enze, SONG Xiaoge, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]. Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 8440–8449. doi: 10.1109/ICCV.2019.00853.
    [33] XU Yongchao, WANG Yukang, ZHOU Wei, et al. TextField: Learning a deep direction field for irregular scene text detection[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5566–5579. doi: 10.1109/TIP.2019.2900589.
    [34] PENG Jingchao, ZHAO Haitao, ZHAO Kaijie, et al. CourtNet: Dynamically balance the precision and recall rates in infrared small target detection[J]. Expert Systems with Applications, 2023, 233: 120996. doi: 10.1016/j.eswa.2023.120996.
出版历程
  • 收稿日期:  2024-11-13
  • 修回日期:  2025-03-19
  • 网络出版日期:  2025-03-28
