Implementation of Digital Holographic Convolutional Reconstruction Algorithm Based on Open Computing Language Acceleration
Abstract: Digital holographic reconstruction algorithms are slow to compute and poorly suited to real-time use, and existing GPU acceleration strategies port poorly across platforms. To address these problems, this paper proposes a scheme based on the Open Computing Language (OpenCL) architecture that improves the execution efficiency of digital holographic reconstruction. The scheme exploits the heterogeneous collaborative computing capability of OpenCL: the digital holographic convolutional reconstruction algorithm is designed to run heterogeneously on CPU+GPU and is programmed in the data-parallel mode. Tests on digital holograms of various resolutions and on different GPU acceleration platforms show that the average execution time of the accelerated version is about one order of magnitude lower than that of the CPU, with a maximum total speedup of 54.2 and a maximum parallel-computing speedup of 94.7. The speedup grows with problem size and the scheme ports well across platforms, making it better suited to engineering implementations and real-time applications of digital holography.
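The convolution reconstruction named in the abstract is not spelled out in this section. As a rough illustration only, the sketch below assumes the common formulation in which the hologram spectrum is multiplied by a free-space transfer function and then transformed back. The angular-spectrum kernel, the unit-amplitude plane reference wave, and every function and parameter name here are assumptions made for illustration, not the authors' implementation. A naive standard-library DFT is used so the sketch runs without dependencies; the per-pixel multiplication in `reconstruct` is the kind of step that would map onto one OpenCL work-item per pixel.

```python
import cmath
import math


def dft_1d(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of one row or column."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def dft_2d(field, inverse=False):
    """Separable 2-D DFT: transform every row, then every column."""
    rows = [dft_1d(row, inverse) for row in field]
    cols = [dft_1d(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dft_frequency(index, n, pitch):
    """Spatial frequency (cycles per metre) of DFT bin `index`."""
    signed = index if index < (n + 1) // 2 else index - n
    return signed / (n * pitch)


def transfer_function(n_rows, n_cols, pitch, wavelength, distance):
    """Assumed kernel: angular-spectrum transfer function on the DFT grid."""
    k = 2 * math.pi / wavelength
    kernel = []
    for r in range(n_rows):
        fy = dft_frequency(r, n_rows, pitch)
        row = []
        for c in range(n_cols):
            fx = dft_frequency(c, n_cols, pitch)
            arg = 1 - (wavelength * fx) ** 2 - (wavelength * fy) ** 2
            if arg >= 0:
                row.append(cmath.rect(1.0, k * distance * math.sqrt(arg)))
            else:
                row.append(0j)  # evanescent components are discarded here
        kernel.append(row)
    return kernel


def reconstruct(hologram, pitch, wavelength, distance):
    """Propagate a hologram (reference wave assumed to be 1) by `distance`."""
    spectrum = dft_2d(hologram)
    kernel = transfer_function(len(hologram), len(hologram[0]),
                               pitch, wavelength, distance)
    # Every product below is independent of the others (data-parallel step).
    filtered = [[s * h for s, h in zip(s_row, h_row)]
                for s_row, h_row in zip(spectrum, kernel)]
    return dft_2d(filtered, inverse=True)
```

For example, `reconstruct(hologram, 5e-6, 633e-9, 0.05)` would propagate a small test array by 5 cm, assuming a 5 µm pixel pitch and a 633 nm wavelength. A practical implementation would replace the naive DFT with an FFT library and move the transforms and the multiplication to the GPU.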
Table 1. Parameters of the two GPU acceleration platforms

Platform                  Model                                 Frequency  Memory  Stream processors
Acceleration platform 1   CPU1: AMD Ryzen 5 3600                4.1 GHz    16 GB
                          GPU1: NVIDIA GeForce GTX 1660 SUPER   1530 MHz   6 GB    1408
Acceleration platform 2   CPU2: AMD Ryzen 7 Mobile              1.8 GHz    16 GB
                          GPU2: AMD Radeon(TM) Graphics         1750 MHz   512 MB  512

Table 2. Total hologram reconstruction execution time on the CPUs and GPU acceleration platforms (ms)

No.  Resolution (pixels)  CPU1   Platform 1 (OpenCL)  Platform 1 (CUDA)  CPU2    Platform 2
a    512×384              239    12                   10                 421     71.3
b    1024×768             828    24                   16.7               1475    138
c    1536×1152            1822   39                   30                 3045    273.6
d    2048×1536            3208   66.3                 50                 5495    445.9
e    2560×1920            4989   98.4                 70                 8758    739
f    3072×2304            7171   132.3                97                 11334   1147.4

Table 3. Itemized hologram reconstruction execution time on the GPU acceleration platforms, OpenCL version (ms)

                          Serial computation  Data transfer          Parallel computation
No.  Resolution (pixels)  CPU1    CPU2        CPU1-GPU1  CPU2-GPU2   GPU1    GPU2
a    512×384              1.3     3.3         0.7        2           10      66
b    1024×768             5       15.3        0.7        3.7         18.3    119
c    1536×1152            12.3    33.3        2          11          24.7    229.3
d    2048×1536            20.3    56.3        5          14.3        41      375.3
e    2560×1920            32.7    78          5.7        19.7        60      641.3
f    3072×2304            47.3    103.7       9.3        33          75.7    1010.7
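The speedups quoted in the abstract can be cross-checked against the timings in Tables 2 and 3. The sketch below assumes that the total speedup is the CPU1 time divided by the platform 1 OpenCL total time, and that the parallel-computing speedup is the CPU1 time divided by the GPU1 parallel time alone. These definitions are inferred from the numbers rather than stated in this section, and the variable names are illustrative.

```python
# Timings in ms, copied from Table 2 and Table 3 (platform 1, OpenCL version).
RESOLUTIONS = ["512x384", "1024x768", "1536x1152",
               "2048x1536", "2560x1920", "3072x2304"]
CPU1_TOTAL = [239, 828, 1822, 3208, 4989, 7171]
OPENCL_TOTAL = [12, 24, 39, 66.3, 98.4, 132.3]
GPU1_PARALLEL = [10, 18.3, 24.7, 41, 60, 75.7]


def speedups(baseline, accelerated):
    """Element-wise ratio of baseline time to accelerated time."""
    return [b / a for b, a in zip(baseline, accelerated)]


# Assumed definitions (not stated in the source text):
total_speedup = speedups(CPU1_TOTAL, OPENCL_TOTAL)
parallel_speedup = speedups(CPU1_TOTAL, GPU1_PARALLEL)

for name, total, parallel in zip(RESOLUTIONS, total_speedup, parallel_speedup):
    print(f"{name:>10}  total {total:5.1f}  parallel {parallel:5.1f}")
```

Under these assumed definitions, both ratios increase with resolution, and their maxima at 3072×2304 round to 54.2 and 94.7, matching the figures in the abstract.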