Error Correction of Lempel-Ziv-Welch Compressed Data

Abstract: Lossless data compression systems are prone to bit errors during transmission. Such errors corrupt the code table and the reconstructed data and cause error propagation, which limits the use of lossless compression in file systems and wireless communication. For Lempel-Ziv-Welch (LZW), a lossless compression algorithm widely used in general-purpose coding, this paper analyzes and exploits the redundancy remaining in LZW-compressed data: check codes are carried by selecting a subset of the codewords and dynamically adjusting the length of the symbol string that each selected codeword represents. On this basis, a lossless compression method with error correction capability, Carrier-LZW (CLZW), is proposed. The method adds no extra data and changes neither the data format nor the coding rules, so it remains compatible with the standard LZW algorithm. Experimental results show that files compressed with CLZW can still be decompressed by a standard LZW decoder and that, within its error correction capability, the method effectively corrects errors in LZW-compressed data.

Key words:
- Lempel-Ziv-Welch (LZW) algorithm /
- Data compression /
- Error correction
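
For readers unfamiliar with the baseline, the following is a minimal sketch of textbook LZW, not of CLZW and not of the authors' implementation. It assumes an unbounded dictionary and emits codes as plain Python integers rather than packed bit strings; the function names `lzw_encode` and `lzw_decode` are illustrative only. It is included to show why a single corrupted codeword can affect more than one output position: the decoder rebuilds its code table from the codes it receives.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW encoder (assumption: unbounded table, integer codes)."""
    table = {bytes([i]): i for i in range(256)}  # initial single-byte entries
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit code for the longest match
            table[candidate] = len(table)    # learn the new string
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Textbook LZW decoder that rebuilds the table from the code stream."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    text = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_encode(text)
    assert lzw_decode(codes) == text

    # Corrupt only the first code: later codes that reference table entries
    # built from it decode differently as well.
    corrupted = [codes[0] + 1] + codes[1:]
    damaged = lzw_decode(corrupted)
    wrong = sum(a != b for a, b in zip(text, damaged))
    print(f"{len(codes)} codes, {wrong} byte positions differ after one code error")
```

In this sketch the damage stays bounded because the input is short; on longer inputs a wrong table entry can be referenced repeatedly, which is the error-propagation problem the abstract describes.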
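
Table 1 Comparison of compressing the Canterbury corpus with LZW and with CLZW (K=3, L=1)

| File | $\lvert T \rvert$ | $\lvert T' \rvert$ | $\lvert T'_M \rvert$ | $l$ | $l_M$ | $\lvert T'_M \rvert - \lvert T' \rvert$ | $\lvert M \rvert$ | R | RM |
|---|---|---|---|---|---|---|---|---|---|
| alice29 | 152089 | 72322 | 76194 | 3.65 | 3.23 | 3872 | 3982 | 0.053538 | 0.055059 |
| cp | 24603 | 12228 | 12856 | 3.92 | 3.49 | 628 | 716 | 0.051358 | 0.058554 |
| fields | 11150 | 5316 | 5580 | 4.11 | 3.66 | 264 | 322 | 0.049661 | 0.060572 |
| ptt5 | 513216 | 70228 | 73961 | 5.78 | 5.30 | 3733 | 4295 | 0.053155 | 0.061158 |
| sum | 38240 | 31940 | 32605 | 2.49 | 2.17 | 665 | 1356 | 0.020820 | 0.043827 |

Table 2 Comparison of compressing the Canterbury corpus with LZW and with CLZW

| File | $\lvert T \rvert$ | $\lvert T' \rvert$ | $\lvert T'_M \rvert$ | $l$ | $l_M$ | $\lvert T'_M \rvert - \lvert T' \rvert$ | $\lvert M \rvert$ | R | RM |
|---|---|---|---|---|---|---|---|---|---|
| alice29 | 152089 | 72322 | 76194 | 3.65 | 3.23 | 3872 | 4113 | 0.053538 | 0.056871 |
| cp | 24603 | 12228 | 12856 | 3.92 | 3.49 | 628 | 758 | 0.051358 | 0.061989 |
| fields | 11150 | 5316 | 5580 | 4.11 | 3.66 | 264 | 331 | 0.049661 | 0.062265 |
| ptt5 | 513216 | 70228 | 73961 | 5.78 | 5.30 | 3733 | 4614 | 0.053155 | 0.065700 |
| sum | 38240 | 31940 | 32605 | 2.49 | 2.17 | 665 | 1370 | 0.020820 | 0.044279 |

Table 3 Experimental results for the carried message amount RM with 1 ≤ K ≤ 5 and 1 ≤ L ≤ 2

| File | L=1, K=1 | L=1, K=2 | L=1, K=3 | L=1, K=4 | L=1, K=5 | L=2, K=1 | L=2, K=2 | L=2, K=3 | L=2, K=4 | L=2, K=5 |
|---|---|---|---|---|---|---|---|---|---|---|
| alice29 | 0.081577 | 0.077739 | 0.055059 | 0.038675 | 0.024377 | 0.140538 | 0.100949 | 0.062275 | 0.037212 | 0.023465 |
| cp | 0.077334 | 0.080686 | 0.058554 | 0.040188 | 0.025369 | 0.126359 | 0.096884 | 0.058758 | 0.040893 | 0.025621 |
| fields | 0.079725 | 0.077587 | 0.060572 | 0.0398761 | 0.026660 | 0.116642 | 0.0866315 | 0.064385 | 0.040385 | 0.028232 |
| ptt5 | 0.083042 | 0.080529 | 0.061158 | 0.040919 | 0.030843 | 0.130991 | 0.104748 | 0.069976 | 0.043431 | 0.030271 |
| sum | 0.073440 | 0.072135 | 0.043827 | 0.026355 | 0.018469 | 0.072916 | 0.055985 | 0.038270 | 0.029750 | 0.016390 |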
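
The columns R and RM are not defined in this excerpt. As a reading aid, the Table 1 figures appear numerically consistent with $R = (\lvert T'_M \rvert - \lvert T' \rvert) / \lvert T' \rvert$ and $RM = \lvert M \rvert / \lvert T' \rvert$. This is an inference from the numbers, not a definition taken from the paper. The sketch below recomputes both ratios from the Table 1 rows and lists the files whose reported value differs; the names `TABLE_1` and `check_ratios` are hypothetical.

```python
# Rows copied from Table 1: file, |T'|, |T'_M|, |M|, reported R, reported RM.
TABLE_1 = [
    ("alice29", 72322, 76194, 3982, 0.053538, 0.055059),
    ("cp",      12228, 12856,  716, 0.051358, 0.058554),
    ("fields",   5316,  5580,  322, 0.049661, 0.060572),
    ("ptt5",    70228, 73961, 4295, 0.053155, 0.061158),
    ("sum",     31940, 32605, 1356, 0.020820, 0.043827),
]


def check_ratios(rows, tol=1e-6):
    """Return the files whose reported R / RM differ from the assumed ratios.

    Assumed (not stated in the source):
      R  = (|T'_M| - |T'|) / |T'|
      RM = |M| / |T'|
    """
    r_mismatch, rm_mismatch = [], []
    for name, t_lzw, t_clzw, m_len, r_reported, rm_reported in rows:
        r_calc = (t_clzw - t_lzw) / t_lzw
        rm_calc = m_len / t_lzw
        if abs(r_calc - r_reported) > tol:
            r_mismatch.append(name)
        if abs(rm_calc - rm_reported) > tol:
            rm_mismatch.append(name)
    return r_mismatch, rm_mismatch


if __name__ == "__main__":
    r_bad, rm_bad = check_ratios(TABLE_1)
    print("R mismatches: ", r_bad)
    print("RM mismatches:", rm_bad)
```

Under these assumed definitions, R matches in every row, and RM matches in every row except `sum`, where the reported RM is higher than $\lvert M \rvert / \lvert T' \rvert$. The excerpt gives no basis for deciding which figure in that row is in error, so the table values are left as published.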