DNA Data Storage
-
Abstract: Molecular data storage shows great potential as a durable, high-density storage medium, and it may help close the widening gap between the amount of information being produced and the capacity available to store it. As a representative form of molecular data storage, DNA data storage offers an alternative and potentially transformative storage substrate that could overcome the physical limits of existing media and meet the ever-growing demand for data storage. This review outlines the history, workflow, and current status of DNA data storage, and discusses its present problems, challenges, and future trends.
-
Keywords:
- Molecular data storage
- DNA data storage
- Encoding
- Decoding
- Read
-
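Table 1 Comparison of in vitro DNA data storage studies

| Reference | Data size | Synthesis method | Sequencing method | Physical redundancy (coverage) | Reassembly | Strand length (nt) | Logical density (bits/nt) | Logical density, payload (bits/nt) | Random access |
|---|---|---|---|---|---|---|---|---|---|
| [31] | 650 kB | Phosphoramidite (deposition) | Sequencing by synthesis | 3000× | Index sequences | 115 | 0.60 | 0.83 | No |
| [32] | 630 kB | Phosphoramidite (deposition) | Sequencing by synthesis | 51× | Overlapping sequences | 117 | 0.19 | 0.29 | No |
| [17] | 80 kB | Phosphoramidite (electrochemical) | Sequencing by synthesis | 372× | Index sequences | 158 | 0.86 | 1.16 | No |
| [37,45] | 3 kB | Phosphoramidite (deposition) | Nanopore sequencing | 200× | Index sequences | 880–1000 | 1.71 | 1.74 | Yes |
| [38] | 2 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 10.5× | Seed sequences | 152 | 1.18 | 1.55 | No |
| [46] | 22 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 160× | Index sequences | 230 | 0.89 | 1.08 | No |
| [36] | 150 kB | Phosphoramidite (electrochemical) | Sequencing by synthesis | 40× | Index sequences | 117 | 0.57 | 0.85 | Yes |
| [12] | 200 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 5× | Index sequences | 150–200 | 0.81 | 1.10 | Yes |
| [43] | 8.5 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 164× | Index sequences | 194 | 1.94 | 2.64 | No |
| [44] | 854 kB | Phosphoramidite (column) | Sequencing by synthesis | 250× | Index sequences | 85 | 1.78 | 3.37 | No |
| [12] | 33 kB | Phosphoramidite (deposition) | Nanopore sequencing | 36× | Index sequences | 150 | 0.81 | 1.10 | Yes |
| [47] | 18 B | Enzymatic (column-based) | Nanopore sequencing | 175× | None (single strand) | 150–200 | 1.57 | 1.57 | No |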
References:
[1] GANTZ J and REINSEL D. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far East[R]. IDC iView, 2012: 1–16.
[2] EXTANCE A. How DNA could store all the world’s data[J]. Nature, 2016, 537(7618): 22–24. doi: 10.1038/537022a
[3] ZHIRNOV V, ZADEGAN R M, SANDHU G S, et al. Nucleic acid memory[J]. Nature Materials, 2016, 15(4): 366–370. doi: 10.1038/nmat4594
[4] COLQUHOUN H and LUTZ J F. Information-containing macromolecules[J]. Nature Chemistry, 2014, 6(6): 455–456. doi: 10.1038/nchem.1958
[5] WANG Junke, YIN Jue, NIU Renjie, et al. DNA computing and DNA nanotechnology[J]. Journal of Electronics & Information Technology, 2020, 42(6): 1313–1325. doi: 10.11999/JEIT190826
[6] XU Jin, QIANG Xiaoli, ZHANG Kai, et al. A DNA computing model for the graph vertex coloring problem based on a probe graph[J]. Engineering, 2018, 4(1): 61–77. doi: 10.1016/j.eng.2018.02.011
[7] LAN Wenfei, XING Zhibao, HUANG Jun, et al. The DNA self-assembly computing model for solving perfect matching problem of bipartite graph[J]. Journal of Computer Research and Development, 2016, 53(11): 2583–2593. doi: 10.7544/issn1000-1239.2016.20150312
[8] ZHU Weijun, ZHOU Qinglei, and ZHANG Qinxian. A LTL model checking approach based on DNA computing[J]. Chinese Journal of Computers, 2016, 39(12): 2578–2597. doi: 10.11897/SP.J.1016.2016.02578
[9] XIA Hong and ZHANG Shijun. Constructing the logical model based on molecular computing[J]. Bulletin of Science and Technology, 2016, 32(5): 11–15. doi: 10.3969/j.issn.1001-7119.2016.05.003
[10] ZHOU Xu, ZHOU Yantao, OUYANG Aijia, et al. An efficient tile assembly model for maximum clique problem[J]. Journal of Computer Research and Development, 2014, 51(6): 1253–1262. doi: 10.7544/issn1000-1239.2014.20120904
[11] ZHOU Xu, ZHOU Yantao, LI Kenli, et al. Efficient maximum matching problem algorithms in the tile assembly model[J]. Acta Electronica Sinica, 2015, 43(2): 262–268. doi: 10.3969/j.issn.0372-2112.2015.02.009
[12] ORGANICK L, ANG S D, CHEN Y J, et al. Random access in large-scale DNA data storage[J]. Nature Biotechnology, 2018, 36(3): 242–248. doi: 10.1038/nbt.4079
[13] RUTTEN M G T A, VAANDRAGER F W, ELEMANS J A A W, et al. Encoding information into polymers[J]. Nature Reviews Chemistry, 2018, 2(11): 365–381. doi: 10.1038/s41570-018-0051-5
[14] DNA to the rescue for data storage[J]. Chemical & Engineering News, 2015, 93(35): 40–41.
[15] CHEN Weigang, HUANG Gang, LI Bingzhi, et al. DNA information storage for audio and video files[J]. Scientia Sinica Vitae, 2020, 50(1): 81–85. doi: 10.1360/SSV-2019-0211
[16] GREENGARD S. Cracking the code on DNA storage[J]. Communications of the ACM, 2017, 60(7): 16–18. doi: 10.1145/3088493
[17] GRASS R N, HECKEL R, PUDDU M, et al. Robust chemical preservation of digital information on DNA in silica with error-correcting codes[J]. Angewandte Chemie International Edition, 2015, 54(8): 2552–2555. doi: 10.1002/anie.201411378
[18] LUNT B M. How long is long-term data storage?[C]. Archiving Conference, Society for Imaging Science and Technology, 2011: 29–33.
[19] SHRIVASTAVA S and BADLANI R. Data storage in DNA[J]. International Journal of Electrical Energy, 2014, 2(2): 119–124.
[20] GREENBERG A, HAMILTON J, MALTZ D A, et al. The cost of a cloud: Research problems in data center networks[J]. ACM SIGCOMM Computer Communication Review, 2008, 39(1): 68–73. doi: 10.1145/1496091.1496103
[21] SHETH R U and WANG H H. DNA-based memory devices for recording cellular events[J]. Nature Reviews Genetics, 2018, 19(11): 718–732. doi: 10.1038/s41576-018-0052-8
[22] WIENER N. Interview: Machines smarter than men[J]. US News & World Report, 1964, 56: 84–86.
[23] NEIMAN M S. On the molecular memory systems and the directed mutations[J]. Radiotekhnika, 1965, 6: 1–8.
[24] DAVIS J. Microvenus[J]. Art Journal, 1996, 55(1): 70–74. doi: 10.1080/00043249.1996.10791743
[25] CLELLAND C T, RISCA V, and BANCROFT C. Hiding messages in DNA microdots[J]. Nature, 1999, 399(6736): 533–534. doi: 10.1038/21092
[26] BANCROFT C, BOWLER T, BLOOM B, et al. Long-term storage of information in DNA[J]. Science, 2001, 293(5536): 1763–1765.
[27] AILENBERG M and ROTSTEIN O D. An improved Huffman coding method for archiving text, images, and music characters in DNA[J]. BioTechniques, 2009, 47(3): 747–754. doi: 10.2144/000113218
[28] WONG P C, WONG K K, and FOOTE H. Organic data memory using the DNA approach[J]. Communications of the ACM, 2003, 46(1): 95–98. doi: 10.1145/602421.602426
[29] ARITA M and OHASHI Y. Secret signatures inside genomic DNA[J]. Biotechnology Progress, 2004, 20(5): 1605–1607. doi: 10.1021/bp049917i
[30] YACHIE N, SEKIYAMA K, SUGAHARA J, et al. Alignment-based approach for durable data storage into living organisms[J]. Biotechnology Progress, 2007, 23(2): 501–505. doi: 10.1021/bp060261y
[31] CHURCH G M, GAO Yuan, and KOSURI S. Next-generation digital information storage in DNA[J]. Science, 2012, 337(6102): 1628. doi: 10.1126/science.1226355
[32] GOLDMAN N, BERTONE P, CHEN Siyuan, et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA[J]. Nature, 2013, 494(7435): 77–80. doi: 10.1038/nature11875
[33] GIBSON D G, GLASS J I, LARTIGUE C, et al. Creation of a bacterial cell controlled by a chemically synthesized genome[J]. Science, 2010, 329(5987): 52–56. doi: 10.1126/science.1190719
[34] HECKEL R, SHOMORONY I, RAMCHANDRAN K, et al. Fundamental limits of DNA storage systems[C]. 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 2017: 3130–3134.
[35] KOSURI S and CHURCH G M. Large-scale de novo DNA synthesis: Technologies and applications[J]. Nature Methods, 2014, 11(5): 499–507. doi: 10.1038/nmeth.2918
[36] BORNHOLT J, LOPEZ R, CARMEAN D M, et al. A DNA-based archival storage system[J]. ACM SIGPLAN Notices, 2016, 50(4): 637–649.
[37] YAZDI S M H T, YUAN Yongbo, MA Jian, et al. A rewritable, random-access DNA-based storage system[J]. Scientific Reports, 2015, 5: 14138. doi: 10.1038/srep14138
[38] ERLICH Y and ZIELINSKI D. DNA fountain enables a robust and efficient storage architecture[J]. Science, 2017, 355(6328): 950–954. doi: 10.1126/science.aaj2038
[39] TAN Li, SUN Jifeng, and GUO Lihua. DNA sequence data compression method based on memetic algorithm[J]. Journal of Electronics & Information Technology, 2014, 36(1): 121–127.
[40] SHANNON C E. A mathematical theory of communication[J]. The Bell System Technical Journal, 1948, 27(3): 379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x
[41] HECKEL R, MIKUTIS G, and GRASS R N. A characterization of the DNA data storage channel[J]. Scientific Reports, 2019, 9(1): 9663. doi: 10.1038/s41598-019-45832-6
[42] REED I S and SOLOMON G. Polynomial codes over certain finite fields[J]. Journal of the Society for Industrial and Applied Mathematics, 1960, 8(2): 300–304. doi: 10.1137/0108018
[43] ANAVY L, VAKNIN I, ATAR O, et al. Improved DNA based storage capacity and fidelity using composite DNA letters[J]. bioRxiv, 2018. doi: 10.1101/433524
[44] CHOI Y, RYU T, LEE A C, et al. Addition of degenerate bases to DNA-based data storage for increased information capacity[J]. bioRxiv, 2018. doi: 10.1101/367052
[45] YAZDI S M H T, GABRYS R, and MILENKOVIC O. Portable and error-free DNA-based data storage[J]. Scientific Reports, 2017, 7: 5011. doi: 10.1038/s41598-017-05188-1
[46] BLAWAT M, GAEDKE K, HÜTTER I, et al. Forward error correction for DNA data storage[J]. Procedia Computer Science, 2016, 80: 1011–1022. doi: 10.1016/j.procs.2016.05.398
[47] LEE H H, KALHOR R, GOELA N, et al. Enzymatic DNA synthesis for digital information storage[J]. bioRxiv, 2018. doi: 10.1101/348987
[48] BAUM E. Building an associative memory vastly larger than the brain[J]. Science, 1995, 268(5210): 583–585. doi: 10.1126/science.7725109
[49] CARUTHERS M H. The chemical synthesis of DNA/RNA: Our gift to science[J]. Journal of Biological Chemistry, 2013, 288(2): 1420–1427. doi: 10.1074/jbc.X112.442855
[50] GOODWIN S, MCPHERSON J D, and MCCOMBIE W R. Coming of age: Ten years of next-generation sequencing technologies[J]. Nature Reviews Genetics, 2016, 17(6): 333–351. doi: 10.1038/nrg.2016.49
[51] SHENDURE J, BALASUBRAMANIAN S, CHURCH G M, et al. DNA sequencing at 40: Past, present and future[J]. Nature, 2017, 550(7676): 345–353. doi: 10.1038/nature24286
[52] DEAMER D, AKESON M, and BRANTON D. Three decades of nanopore sequencing[J]. Nature Biotechnology, 2016, 34(5): 518–524. doi: 10.1038/nbt.3423
[53] FONTANA JR R E and DECAD G M. Moore’s law realities for recording systems and memory storage components: HDD, tape, NAND, and optical[J]. AIP Advances, 2018, 8(5): 056506. doi: 10.1063/1.5007621
[54] BONNET J, COLOTTE M, COUDY D, et al. Chain and conformation stability of solid-state DNA: Implications for room temperature storage[J]. Nucleic Acids Research, 2010, 38(5): 1531–1546. doi: 10.1093/nar/gkp1060
[55] PRAKADAN S M, SHALEK A K, and WEITZ D A. Scaling by shrinking: Empowering single-cell 'omics' with microfluidic devices[J]. Nature Reviews Genetics, 2017, 18(6): 345–361. doi: 10.1038/nrg.2017.15
[56] NEWMAN S, STEPHENSON A P, WILLSEY M, et al. High density DNA data storage library via dehydration with digital microfluidic retrieval[J]. Nature Communications, 2019, 10(1): 1706. doi: 10.1038/s41467-019-09517-y