An Efficient Bucket-allocation Decoding Method Based on Forward Error Correction Codes for DNA Storage

ZAN Xiangzhen, YAO Xiangyu, XU Peng, CHEN Zhihua, SHI Xiaolong, LI Shudong, LIU Wenbin

Citation: ZAN Xiangzhen, YAO Xiangyu, XU Peng, CHEN Zhihua, SHI Xiaolong, LI Shudong, LIU Wenbin. An Efficient Bucket-allocation Decoding Method Based on Forward Error Correction Codes for Deoxyribonucleic Acid Storage[J]. Journal of Electronics & Information Technology, 2022, 44(10): 3650-3656. doi: 10.11999/JEIT210697

Funding: National Natural Science Foundation of China (62072128, 61876047, 62002079)
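    Author biographies:

    ZAN Xiangzhen: male, Ph.D. candidate; research interests: DNA storage, bioinformatics

    YAO Xiangyu: male, master's student; research interests: DNA storage, bioinformatics

    XU Peng: male, associate professor; research interests: DNA storage, bioinformatics

    CHEN Zhihua: female, associate professor; research interests: DNA storage, bioinformatics

    SHI Xiaolong: male, professor; research interests: DNA storage, bioinformatics

    LI Shudong: male, associate professor; research interests: DNA storage, network security

    LIU Wenbin: male, professor; research interests: DNA storage, bioinformatics

    Corresponding author:

    LIU Wenbin, wbliu6910@gzhu.edu.cn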

CLC number: TN918.3

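Abstract: Compared with traditional storage media, the main difficulty of DeoxyriboNucleic Acid (DNA) storage is that insertion and deletion errors in sequencing reads make decoding the stored information very challenging. For DNA storage encoded with a forward error correction code that can correct one error, this paper proposes a bucket-allocation strategy that improves decoding accuracy and efficiency. First, all recognizable DNA codewords are searched for in every sequencing read of each group, and the valid codeword corresponding to each one is determined using the one-error correction capability. Second, the best position of each codeword (i.e., its bucket) is determined from where the recognizable DNA codeword occurs in the sequencing read. Finally, the final codeword in each bucket is determined by majority voting. Simulation results show that at error rates of 0.10 and 0.05 the average decoding accuracy exceeds 94% at 20X sequencing depth, and at an error rate of 0.15 it exceeds 90% at 60X sequencing depth.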
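The three decoding steps in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: codewords are the 6-nt entries of Table 1 (only a six-codeword subset is used here), a window counts as recognizable when exactly one codeword lies within Hamming distance 1 of it (substitutions only), a match at offset p is filed under the nearest codeword slot (the bucket), and vote ties are left to `Counter.most_common`. The names `CODEBOOK`, `correct_window`, `bucket_votes` and `decode_group` are hypothetical; this is not the authors' implementation.

```python
from collections import Counter, defaultdict

CODE_LEN = 6  # every codeword in Table 1 is 6 nt long

# Hypothetical subset of Table 1 (codeword -> letter); the full table has 30 codewords.
CODEBOOK = {
    "TAACCG": "a", "ATCACG": "e", "TAGGCT": "t",
    "TAGCGA": "r", "ACGACT": "o", "TCGAAC": "c",
}


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_window(window, codebook=CODEBOOK):
    """Step 1 (assumed form): map a window to the unique codeword within
    Hamming distance 1, or return None if there is no match or more than one."""
    hits = [cw for cw in codebook if hamming(cw, window) <= 1]
    return hits[0] if len(hits) == 1 else None


def bucket_votes(reads, n_buckets, code_len=CODE_LEN):
    """Steps 1-2: scan every offset of every read and file each recognized
    codeword under the nearest codeword slot (bucket) of its offset."""
    votes = defaultdict(Counter)
    for read in reads:
        for pos in range(len(read) - code_len + 1):
            cw = correct_window(read[pos:pos + code_len])
            if cw is None:
                continue
            # Integer rounding: offsets 6b-3 .. 6b+2 fall into bucket b,
            # so small indel-induced shifts still land in the right slot.
            bucket = (pos + code_len // 2) // code_len
            if bucket < n_buckets:
                votes[bucket][cw] += 1
    return votes


def decode_group(reads, n_buckets):
    """Step 3: majority vote in each bucket; None marks a bucket with no votes."""
    votes = bucket_votes(reads, n_buckets)
    return [votes[b].most_common(1)[0][0] if votes[b] else None
            for b in range(n_buckets)]
```

With the three reads `TAGGCTATCACGTAACCG` (error-free "tea"), `TAGGCTATGACGTAACCG` (one substitution) and `TAGCTATCACGTAACCG` (one deletion), `decode_group(reads, 3)` returns `["TAGGCT", "ATCACG", "TAACCG"]`. The deletion read contributes a spurious `TAGCGA` vote to bucket 0, which the two `TAGGCT` votes outvote, while its shifted `ATCACG` and `TAACCG` still round into buckets 1 and 2.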
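Figure 1 Schematic of the DNA storage sequence structure

Figure 2 Schematic of the bucket-allocation error-correction strategy

Figure 3 Performance comparison of the three grouping strategies

Figure 4 Average accuracy and average running time of the decoding process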

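Table 1 Encoding table

| DNA code | Letter | Punctuation | Digit | DNA code | Letter | Punctuation | Digit |
|---|---|---|---|---|---|---|---|
| TAACCG | a | @ | 4 | ACACAC | l | | 6 |
| TAAGGC | p | | | ACTCTG | i | " | 5 |
| ATCACG | e | $ | 2 | TCAGAG | j | % | |
| ATGAGC | y | , | 8 | ACAGGT | x | }{ | |
| ATGGAG | g | * | | ACTGCA | f | ~ | 9 |
| TACCAC | k | / | | AGACCT | s | + | 0 |
| ATCCGT | b | () | | TCTCGT | h | - | |
| ATGCCA | v | : | | TCGAAC | c | ? | 1 |
| TAGCGA | r | ' | 3 | ACGACT | o | ! | |
| TAGGCT | t | [] | | CATTCG | z | & | |
| ACATCG | w | ; | | CTACAG | q | = | |
| ACTTGC | n | . | | CAACGT | d | __ | |
| TCTACG | m | Enter | | CAGACA | u | # | 7 |
| TGCATA | Caps key | | | GTATGA | Punctuation key | | |
| CTTGTC | Number key | | | CGGTAT | Space key | | |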
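Table 1 gives each 6-nt codeword a letter, a punctuation mark and a digit, and reserves four function-key codewords (Caps, Punctuation, Number, Space). The table does not say how far a function key reaches, so the sketch below assumes each key applies only to the next codeword. It uses a hypothetical six-row subset of the table; `ROWS`, `LOOKUP`, `encode` and `decode` are illustrative names, not the paper's code.

```python
# Hypothetical six-row subset of Table 1: codeword -> (letter, punctuation, digit).
# None marks an empty cell in the table.
ROWS = {
    "TAACCG": ("a", "@", "4"),
    "ATCACG": ("e", "$", "2"),
    "TCGAAC": ("c", "?", "1"),
    "TAGCGA": ("r", "'", "3"),
    "ACGACT": ("o", "!", None),
    "ACTTGC": ("n", ".", None),
}

# Function-key codewords, copied from the last two rows of Table 1.
CAPS, PUNCT, NUM, SPACE = "TGCATA", "GTATGA", "CTTGTC", "CGGTAT"

# Reverse lookup (mode, symbol) -> codeword, built from ROWS.
LOOKUP = {
    (mode, sym): cw
    for cw, symbols in ROWS.items()
    for mode, sym in zip(("letter", "punct", "digit"), symbols)
    if sym is not None
}


def encode(text):
    """Encode text as a list of codewords. Assumption: a function key affects
    only the single codeword that follows it. Unsupported characters raise KeyError."""
    out = []
    for ch in text:
        if ch == " ":
            out.append(SPACE)
        elif ch.isalpha():
            if ch.isupper():
                out.append(CAPS)
            out.append(LOOKUP[("letter", ch.lower())])
        elif ch.isdigit():
            out += [NUM, LOOKUP[("digit", ch)]]
        else:
            out += [PUNCT, LOOKUP[("punct", ch)]]
    return out


def decode(codewords):
    """Invert encode() under the same one-shot function-key assumption."""
    text, mode = [], None
    for cw in codewords:
        if cw == SPACE:
            text.append(" ")
        elif cw in (CAPS, PUNCT, NUM):
            mode = cw  # remember the key for the next codeword only
        else:
            letter, punct, digit = ROWS[cw]
            if mode == PUNCT:
                text.append(punct)
            elif mode == NUM:
                text.append(digit)
            elif mode == CAPS:
                text.append(letter.upper())
            else:
                text.append(letter)
            mode = None
    return "".join(text)
```

Under this assumption `encode("Car 21?")` produces 11 codewords, beginning with `TGCATA` (Caps) and then `TCGAAC` (c). If the keys instead toggled a mode until the next key, runs of capitals or digits would need fewer key codewords; the table alone does not settle which reading the paper uses.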

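Table 2 Comparison with other methods

| Reference | Error-correction strategy | Handles insertions/deletions? | Sequencing depth (X) | Error rate (%) | Data recovery accuracy | Storage method |
|---|---|---|---|---|---|---|
| Ref. [20] | HEDGES | | 50 | 3 | 1.000 | In vitro |
| Ref. [21] | Levenshtein code | | NA | 1 | 0.900 | In vitro |
| Ref. [22] | De Bruijn graph | | 60 | 10 | 0.920 | In vitro |
| Ref. [23] | Three-layer error correction | | 20 | 5 | 0.905 | In vitro |
| Ref. [25] | | | 2740 | ≤2 | 0.998 | In vitro |
| Ref. [26] | | | NA | ≤2 | 1.000 | In vivo |
| This work | Bucket-allocation error correction | | 20 | 5 | 0.970 | In vitro |
| | | | 20 | 10 | 0.940 | |
| | | | 60 | 15 | 0.906 | |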
    [1] REINSEL D, GANTZ J, and RYDNING J. The digitization of the world from edge to core[EB/OL]. http://book.itep.ru/depository/dig_economy/idc-seagate-dataage-whitepaper.pdf, 2020.
    [2] WILLIAMS E D, AYRES R U, and HELLER M. The 1.7 kilogram microchip: Energy and material use in the production of semiconductor devices[J]. Environmental Science & Technology, 2002, 36(24): 5504–5510. doi: 10.1021/es025643o
    [3] GODA K and KITSUREGAWA M. The history of storage systems[J]. Proceedings of the IEEE, 2012, 100: 1433–1440. doi: 10.1109/JPROC.2012.2189787
    [4] XU Peng, FANG Gang, SHI Xiaolong, et al. DNA storage and its research progress[J]. Journal of Electronics & Information Technology, 2020, 42(6): 1326–1331. doi: 10.11999/JEIT190863
    [5] LIU Wenbin, ZHU Xiangou, WANG Xianghong, et al. A new method to optimize the template set in DNA computing[J]. Journal of Electronics & Information Technology, 2008, 30(5): 1131–1135.
    [6] CEZE L, NIVALA J, and STRAUSS K. Molecular digital data storage using DNA[J]. Nature Reviews Genetics, 2019, 20(8): 456–466. doi: 10.1038/s41576-019-0125-3
    [7] GAO Yanmin, CHEN Xin, QIAO Hongyan, et al. Low-bias manipulation of DNA oligo pool for robust data storage[J]. ACS Synthetic Biology, 2020, 9(12): 3344–3352. doi: 10.1021/acssynbio.0c00419
    [8] DONG Yiming, SUN Fajia, PING Zhi, et al. DNA storage: Research landscape and future prospects[J]. National Science Review, 2020, 7(6): 1092–1107. doi: 10.1093/nsr/nwaa007
    [9] HECKEL R, MIKUTIS G, and GRASS R N. A characterization of the DNA data storage channel[J]. Scientific Reports, 2019, 9(1): 9663. doi: 10.1038/s41598-019-45832-6
    [10] STANCU M C, VAN ROOSMALEN M J, RENKENS I, et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing[J]. Nature Communications, 2017, 8(1): 1326. doi: 10.1038/s41467-017-01343-4
    [11] TAKAHASHI C N, NGUYEN B H, STRAUSS K, et al. Demonstration of end-to-end automation of DNA data storage[J]. Scientific Reports, 2019, 9(1): 4998. doi: 10.1038/s41598-019-41228-8
    [12] KUMAR U K and UMASHANKAR B S. Improved hamming code for error detection and correction[C]. 2007 2nd International Symposium on Wireless Pervasive Computing, San Juan, USA, 2007: 1. doi: 10.1109/ISWPC.2007.342654
    [13] BLAWAT M, GAEDKE K, HÜTTER I, et al. Forward error correction for DNA data storage[J]. Procedia Computer Science, 2016, 80: 1011–1022. doi: 10.1016/j.procs.2016.05.398
    [14] LU Xiaozhou, JEONG J, KIM J W, et al. Error rate-based log-likelihood ratio processing for low-density parity-check codes in DNA storage[J]. IEEE Access, 2020, 8: 162892–162902. doi: 10.1109/ACCESS.2020.3021700
    [15] ORGANICK L, ANG S D, CHEN Y J, et al. Random access in large-scale DNA data storage[J]. Nature Biotechnology, 2018, 36(3): 242–248. doi: 10.1038/nbt.4079
    [16] ANTKOWIAK P L, LIETARD J, DARESTANI M Z, et al. Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction[J]. Nature Communications, 2020, 11(1): 5345. doi: 10.1038/s41467-020-19148-3
    [17] MEISER L C, ANTKOWIAK P L, KOCH J, et al. Reading and writing digital data in DNA[J]. Nature Protocols, 2020, 15(1): 86–101. doi: 10.1038/s41596-019-0244-5
    [18] ERLICH Y and ZIELINSKI D. DNA Fountain enables a robust and efficient storage architecture[J]. Science, 2017, 355(6328): 950–954. doi: 10.1126/science.aaj2038
    [19] JEONG J, PARK S J, KIM J W, et al. Cooperative sequence clustering and decoding for DNA storage system with fountain codes[J]. Bioinformatics, 2021, 37(19): 3136–3143. doi: 10.1093/bioinformatics/btab246
    [20] PRESS W H, HAWKINS J A, JONES JR S K, et al. HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints[J]. Proceedings of the National Academy of Sciences of the United States of America, 2020, 117(31): 18489–18496. doi: 10.1073/pnas.2004821117
    [21] XUE Tianbo and LAU F C M. Notice of violation of IEEE publication principles: Construction of GC-balanced DNA with deletion/insertion/mutation error correction for DNA storage system[J]. IEEE Access, 2020, 8: 140972–140980. doi: 10.1109/ACCESS.2020.3012688
    [22] SONG Lifu, GENG Feng, GONG Ziyi, et al. Robust data storage in DNA by de Bruijn graph-based decoding[J]. bioRxiv, 2022, 13(1): 5361. doi: 10.1101/2020.12.20.423642
    [23] ZAN Xiangzhen, YAO Xiangyu, XU Peng, et al. A hierarchical error correction strategy for text DNA storage[J]. Interdisciplinary Sciences: Computational Life Sciences, 2022, 14(1): 141–150. doi: 10.1007/s12539-021-00476-x
    [24] BORNHOLT J, LOPEZ R, CARMEAN D M, et al. A DNA-based archival storage system[J]. ACM SIGPLAN Notices, 2016, 51(4): 637–649. doi: 10.1145/2954679.2872397
    [25] ZHONG Yunpeng, QI Shanshan, SHENG Fuxu, et al. A new digital information storing and reading system based on synthetic DNA[J]. Science China Life Sciences, 2018, 61(6): 733–735. doi: 10.1007/s11427-017-9131-7
    [26] LEE U J, HWANG S, KIM K E, et al. DNA data storage in Perl[J]. Biotechnology and Bioprocess Engineering, 2020, 25(4): 607–615. doi: 10.1007/s12257-020-0022-9
Publication history
  • Received: 2021-07-13
  • Revised: 2021-09-30
  • Published online: 2021-10-26
  • Published: 2022-10-19
