Since binary became the language of computers, the development of human society has accelerated enormously: 0s and 1s, racing through countless circuits, have changed the way people live. The arrival of computers was like a big bang for the information age, allowing humans to process huge amounts of data quickly. Both daily life and scientific research have been transformed.
This flood of data has brought both changes and challenges. According to one estimate, global data totalled 84.5 ZB in 2021 (1 ZB = 10²¹ bytes). Data on this scale places heavy demands on computing power and even heavier demands on storage capacity. All this data needs “buildings” of its own: data centers. By 2024, the number of hyperscale data centers worldwide may reach 1,000. More and more data centers are being built, but land is limited and building data-center “skyscrapers” is a luxury, so increasing data storage density has become the other way forward.
In search of a more efficient storage medium, researchers turned to DNA, the carrier of genetic information in nature. DNA is a familiar term from genetics. DNA sequences store genetic information, which is copied when DNA replicates and expressed through transcription and translation, sustaining the development and normal functioning of organisms. Some researchers have even speculated that aliens (or an advanced civilization) hid information in the genomes of living things, waiting for humans to decode it. This seemingly science-fictional idea rests on an important fact: DNA has carried vital information throughout the evolution of life, and it is one of the densest and most stable information media known.
How is DNA storage technology realized, and what changes can it bring?
01
Is DNA storage reliable?
On a technical level, DNA storage has proven to be feasible.
The idea of using DNA to store information dates back to the dawn of molecular biology. The biochemist Frederick Sanger invented the sequencing method that bears his name, making DNA sequences readable. From then on, people could read the order of the nucleotides, denoted A, T, C, and G, in a DNA strand. Just as sequences of 0s and 1s became the language of computers, sequences of these four letters could in principle carry specific information. At the time, however, synthesizing a DNA sequence just 10 bases long cost $6,000, so although DNA performed well as a material, it was far too expensive.
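Because there are four bases, each base can in principle carry two bits. The minimal Python sketch below assumes the hypothetical mapping 00→A, 01→C, 10→G, 11→T; it is an illustrative choice, not a scheme used by any particular team, and it ignores practical constraints such as avoiding long runs of the same base.

```python
# Hypothetical mapping: each base carries 2 bits (illustrative only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode raw bytes as a DNA string, 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


encoded = bytes_to_dna(b"Hi")
print(encoded)                # CAGACGGC
print(dna_to_bytes(encoded))  # b'Hi'
```

Here "H" (binary 01001000) becomes CAGA and "i" (01101001) becomes CGGC, so each byte costs four bases.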
As new technologies for DNA synthesis and sequencing developed, DNA as a digital storage medium stopped being a fantasy. In 2001, a research team encoded two quotes from Charles Dickens into DNA sequences, using three bases to represent each English letter (for example, A = AAA and B = AAC). In 2009, another research team encoded the lyrics, the musical score, and a picture of the nursery rhyme “Mary Had a Little Lamb” into a set of DNA sequences.
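A three-base code gives 4³ = 64 possible triplets, more than enough for 26 letters. The sketch below assumes the triplets are assigned in the base order A, C, G, T, which reproduces the two examples given (A = AAA, B = AAC); the rest of the table is an extrapolation, not the 2001 team's actual code, and it handles letters only (a real table would also need spaces and punctuation).

```python
from itertools import product

# Assumed base order A, C, G, T; only A = AAA and B = AAC come from the text.
BASES = "ACGT"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]  # 64 triplets
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LETTER_TO_CODON = dict(zip(LETTERS, CODONS))
CODON_TO_LETTER = {codon: letter for letter, codon in LETTER_TO_CODON.items()}


def encode_text(text: str) -> str:
    """Encode letters (case-insensitive) as three bases each."""
    return "".join(LETTER_TO_CODON[ch] for ch in text.upper())


def decode_text(seq: str) -> str:
    """Decode a sequence three bases at a time."""
    return "".join(CODON_TO_LETTER[seq[i:i + 3]] for i in range(0, len(seq), 3))


print(encode_text("BAD"))  # AACAAAAAT
```

The 38 unused triplets could be reserved for punctuation or left unassigned so that some single-base errors produce an invalid triplet rather than a wrong letter.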
DNA storage has two main advantages. The first is that the storage conditions are simple. As long as DNA is kept cold enough, the data it holds can last for thousands of years, so the cost of keeping it is close to zero. This makes DNA storage especially suited to important “cold data” that is rarely accessed or retrieved: in theory, such data could be kept for more than a thousand years with almost no energy consumption. In the future, DNA may well become the main medium for storing large volumes of cold data.
The second advantage is that DNA has a very high storage density and takes up very little space. Stored as DNA, every movie ever made could fit in a space smaller than a sugar cube. Calculations published in Nature Materials in 2016 by George Church of Harvard University and colleagues put the storage density of the simple bacterium Escherichia coli at about 10¹⁹ bits per cubic centimeter. At that density, a cube of DNA about a meter on a side could meet the world’s current storage needs for a year. By weight, a single gram of DNA can store up to 215 PB, or about 225,443,840 gigabytes (GB), equivalent to the capacity of about 220,000 1 TB hard drives.
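The figures above are unit conversions that can be checked in a few lines of Python. This sketch assumes binary prefixes for the per-gram figures (1 PB = 1,024 TB = 1,024² GB), which is what reproduces the 225,443,840 GB quoted, and decimal units (1 ZB = 10²¹ bytes, as defined earlier) for the cubic-meter estimate.

```python
# Per-gram figures, assuming binary prefixes (1 PB = 1024 TB = 1024**2 GB).
PB_PER_GRAM = 215
gb_per_gram = PB_PER_GRAM * 1024**2   # GB in one gram of DNA
tb_per_gram = PB_PER_GRAM * 1024      # TB in one gram, i.e. number of 1 TB drives

# Church et al.'s E. coli estimate: about 1e19 bits per cubic centimeter.
BITS_PER_CM3 = 10**19
CM3_PER_M3 = 100**3                   # a 1 m cube holds 1,000,000 cm^3
bytes_per_m3 = BITS_PER_CM3 * CM3_PER_M3 // 8

print(f"{gb_per_gram:,} GB per gram")                        # 225,443,840 GB per gram
print(f"{tb_per_gram:,} TB per gram")                        # 220,160 TB per gram
print(f"{bytes_per_m3 / 10**21:,.0f} ZB per cubic meter")    # 1,250 ZB per cubic meter
```

About 1,250 ZB per cubic meter comfortably exceeds the 84.5 ZB of global data estimated for 2021, which is consistent with the one-meter-cube claim.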
02
There have been breakthroughs in DNA storage
Research on DNA storage has made several breakthroughs in recent years. Researchers struggling to make sense of the deluge of data are already using DNA to manage data in new ways. Recent advances in next-generation sequencing make it easy to read billions of DNA sequences simultaneously. With this capability, researchers can use DNA sequences as molecular identification “tags” to track experimental results.
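Tagging works by attaching a short known sequence (a barcode) to every DNA molecule from a given sample, so that after many samples are pooled and sequenced together, each read can be traced back to its source. The toy sketch below assumes hypothetical 6-base barcodes placed at the start of each read; real barcode sets are designed so that codes differ in several positions and tolerate sequencing errors.

```python
from collections import defaultdict

# Hypothetical barcodes and sample names, for illustration only.
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
BARCODE_LEN = 6


def demultiplex(reads):
    """Group reads by the sample whose barcode prefixes them."""
    groups = defaultdict(list)
    for read in reads:
        prefix = read[:BARCODE_LEN]
        if prefix in BARCODES:
            # Strip the barcode and keep the payload under its sample.
            groups[BARCODES[prefix]].append(read[BARCODE_LEN:])
        else:
            # Unknown prefix: keep the whole read for inspection.
            groups["unassigned"].append(read)
    return dict(groups)


reads = ["ACGTACGGATTC", "TGCATGCCAAGT", "ACGTACTTTAGA", "GGGGGGAAAAAA"]
print(demultiplex(reads))
```

Exact prefix matching keeps the example short; tolerating one or two mismatches is the usual next step.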
A Harvard team used CRISPR DNA editing to record images of a human hand into the genome of Escherichia coli bacteria and read them back with more than 90 percent accuracy. Researchers in Switzerland have devised a “DNA-of-things” (DoT) memory architecture for producing materials with immutable memory. In the DoT framework, DNA molecules record the data and are encapsulated in silica nanobeads, which are then fused into various materials used to print or cast objects of any shape.
Researchers at the University of Washington and Microsoft Research have developed a fully automated system for writing, storing and reading DNA-encoded data.
In December 2021, Chinese DNA storage researchers announced the development of a SlipChip, a microfluidic device that can hold DNA samples as well as various reagents. A SlipChip can incorporate electrodes whose charge changes depending on whether a particular DNA sequence is present.
In 2022, the synthetic biology team at Tianjin University stored 10 selected Dunhuang murals in DNA and reported that the information could be preserved for thousands of years at room temperature and for 20,000 years at 9.4°C.
03
DNA storage technology endorsed by giants
DNA storage may matter for generations to come, but can it actually be put into practice? The giants of the storage industry are optimistic. Gurtej Sandhu, senior fellow and vice president at Micron Technology, was one of the first project team members to work on DNA storage; in 2016 he collaborated with George M. Church’s research group at Harvard University. Seagate has brought Catalog’s DNA storage technology into its “lab-on-a-chip.” Seagate’s DNA storage and microfluidics research project has been running for two and a half years, and four patent applications from it are currently known.
Seagate’s partner, Catalog, is an American start-up founded in 2016. Catalog has produced DNA fragments of 20 to 30 base pairs, stitched them together with enzymes, and arranged them in different orders to store data. Catalog has stored the novel “The Hitchhiker’s Guide to the Galaxy” and the poem “The Road Not Taken” in DNA.
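The appeal of assembling premade fragments is that writing data becomes a matter of choosing and joining pieces rather than synthesizing every base from scratch. The toy sketch below is not Catalog’s actual scheme: it assumes a hypothetical library of four invented 20-base fragments, each standing for a 2-bit symbol, and uses string concatenation as a stand-in for enzymatic stitching.

```python
# Hypothetical library of prefabricated 20-base fragments (sequences invented).
FRAGMENTS = {
    "00": "ACGTTGCAACGTTGCAACGT",
    "01": "TTGACCAGTTGACCAGTTGA",
    "10": "GGCATCTAGGCATCTAGGCA",
    "11": "CATGGATCCATGGATCCATG",
}
FRAGMENT_LEN = 20
FRAGMENT_TO_BITS = {frag: bits for bits, frag in FRAGMENTS.items()}


def assemble(bits: str) -> str:
    """'Write' an even-length bit string by joining premade fragments in order."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    return "".join(FRAGMENTS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def disassemble(seq: str) -> str:
    """'Read' the data back by splitting the sequence into known fragments."""
    return "".join(FRAGMENT_TO_BITS[seq[i:i + FRAGMENT_LEN]]
                   for i in range(0, len(seq), FRAGMENT_LEN))


strand = assemble("0110")
print(len(strand))          # 40
print(disassemble(strand))  # 0110
```

Only four distinct fragments ever need to be synthesized here; the information lives entirely in their order.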
Storage giants are optimistic about DNA storage, but even more of the companies in the field are start-ups built around biotechnology. The main reason is that the key technologies underlying DNA storage are DNA sequencing, DNA synthesis, and DNA preservation.
Besides Catalog, the main companies working on DNA data storage include the American start-up Iridia. Founded in 2016, Iridia aims to develop the world’s first commercially attractive DNA-based data storage solution. By combining DNA polymer synthesis, electronic nanoswitches, and semiconductor fabrication techniques, the company is developing a highly parallel format: arrays of nanomodules that could potentially store data at extremely high densities.
Companies specializing in DNA synthesis include the French company DNA Script and the American company Molecular Assemblies.
Founded in 2014, DNA Script manufactures synthetic DNA using a proprietary template-free technology. Rapid, affordable, high-quality DNA synthesis could greatly accelerate new applications such as novel therapeutics, sustainable chemical production, improved crops, and data storage. According to the company, its enzymatic technology and nucleotide chemistry platform can synthesize longer DNA sequences at higher purity, improving sequence accuracy 500-fold and cutting synthesis time 50-fold.
Founded in 2013, Molecular Assemblies develops enzymatic DNA synthesis technology that can power new products in the fields of industrial synthetic biology, personalized therapy, precision diagnostics, information storage, and nanotechnology. The company’s proprietary DNA synthesis method is designed to provide cost-effective, sustainable production of high-quality, sequence-specific DNA.
Founded in 2013, Twist Bioscience provides high-throughput DNA synthesis and sequencing services to customers in medicine, agriculture, industrial chemicals, and data storage. The company’s semiconductor-based DNA synthesis process reduces reaction volumes a millionfold while increasing output 1,000-fold, allowing 9,600 genes to be synthesized on a single silicon wafer. In 2016, Microsoft signed an agreement with Twist Bioscience to buy about 10 million DNA strands (oligonucleotides) for testing DNA data storage.
The main DNA sequencing companies include the British company Oxford Nanopore Technologies. Founded in 2005, Oxford Nanopore develops disruptive electronic single-molecule sensing systems based on nanopore science. Its new generation of sensing technology embeds nanopores (holes just nanometers across) in high-tech electronics to perform comprehensive molecular analysis.
In China, Huawei announced in 2019 the establishment of a Strategic Research Institute to research and develop cutting-edge technologies, including DNA storage. At the 2021 Huawei Global Analyst Conference, Xu Wenwei, a Huawei board director and director of the Strategic Research Institute, said that Huawei’s DNA storage work would pursue breakthroughs in ultra-large-capacity storage models and encoding technology in order to break through the “capacity wall.”
Zhongke Carbon (Shenzhen) Biotechnology Co., Ltd. (C-ATOM) was officially established on May 26, 2021. That September, building on the earlier DNA storage work of researcher Dai Junbiao at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, C-ATOM used a synthesizer and a sequencer to complete the entire DNA storage workflow: encoding, synthesis, preservation, sequencing, and decoding.
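The five stages of that workflow can be illustrated with a toy simulation. This sketch assumes a 2-bit-per-base mapping, models synthesis, preservation and sequencing together as producing many copies of the strand with random substitution errors, and recovers the data with a per-position majority vote. Real systems also face insertions, deletions and lost strands and rely on proper error-correcting codes, none of which is modelled here; all function names are hypothetical.

```python
import random
from collections import Counter

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Encoding: bytes -> bases, 2 bits per base (assumed mapping)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def synthesize_and_sequence(seq: str, copies: int, error_rate: float,
                            rng: random.Random) -> list[str]:
    """Synthesis, preservation, sequencing: noisy reads (substitutions only)."""
    reads = []
    for _ in range(copies):
        read = [rng.choice("ACGT".replace(base, "")) if rng.random() < error_rate
                else base for base in seq]
        reads.append("".join(read))
    return reads


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across all reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def decode(seq: str) -> bytes:
    """Decoding: bases -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


rng = random.Random(0)
reads = synthesize_and_sequence(encode(b"DNA"), copies=15, error_rate=0.05, rng=rng)
print(decode(consensus(reads)))
```

Reading many copies and voting trades extra sequencing for reliability; with 15 copies and a 5 percent error rate, a wrong majority at any position is very unlikely.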
04
Challenges and Potential of DNA Storage
At present, DNA storage still faces technical difficulties. Fan Chunhai, an academician of the Chinese Academy of Sciences, has said that writing and reading data in DNA remain inefficient, slow, and expensive. Yuan Yingjin, also an academician of the Chinese Academy of Sciences and vice president of Tianjin University, has said that DNA information storage is an emerging and highly interdisciplinary research direction, and that commercializing it will require research teams from many fields to work together on the key problems.
If cost is the only problem, it can eventually be solved. There is no doubt that DNA storage is one of the most promising data storage methods.