Day 2: in anticipation of World DNA Day on the 25th of April (this Sunday), today we share further insights into the structure of the DNA molecule and its massive capacity to store information.
The structure of DNA was discovered over 70 years ago, and it is the basis of every genetic diagnostic today (see the History of the discovery of DNA). Here we want to talk more about that structure, why DNA is so essential for the body, and its incredible capacity to store information.
DNA is the material that holds all the information on how our bodies function. Each piece of information is carried on a different section of the DNA, the so-called genes. The helical DNA is “compressed” into chromosomes, which lie within the nucleus of our cells.
The genes are the sections where the specific information lies. For instance, genes encode most of our proteins, which regulate all kinds of functions inside our bodies, e.g., transmitting nerve signals, forming the building blocks of our cells, and controlling our blood flow. In this sense, our DNA holds the construction plan of the body. Scientists estimate that this plan probably originated more than four billion years ago. Only in the last 30 years have we been able to decode it and use it to learn more about diseases, with the overall goal of preventing them.
Although the complete DNA differs from individual to individual, we humans still share 99.5% of our 3 billion bases. The differences lie in the order of these bases. In computing terms, DNA can be viewed as a quaternary code made up of 4 bases (adenine, guanine, cytosine, and thymine), which the body runs like a script to keep us alive.
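To make the "quaternary code" idea concrete, here is a minimal Python sketch. Four symbols need only 2 bits each, so four bases fit into one byte. The bit patterns assigned to each base and the function names are our own illustrative assumptions, not a standard format. Real sequencing files are much larger than this raw estimate because they also store read names and quality scores, and they cover the genome many times over.

```python
# Hypothetical 2-bit mapping: which bit pattern stands for which base
# is an arbitrary choice made for this illustration.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a final chunk that has fewer than four bases.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Reverse pack(); n_bases tells us where the padding starts."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


def raw_size_bytes(n_bases: int) -> int:
    """Bytes needed for n_bases at 2 bits per base, rounded up."""
    return (n_bases * 2 + 7) // 8


packed = pack("GATTACA")
restored = unpack(packed, 7)
genome_bytes = raw_size_bytes(3_000_000_000)
```

Under these assumptions, the 3 billion bases of a single genome copy would take about 750 MB as a bare sequence.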
Since the advent of genetic analysis through Next-Generation Sequencing (check out the overview by our CSO Luca), we have been transferring biologically stored information into electronic storage. Twenty years ago, sequencing an entire genome took multiple years; today you receive the results within a couple of days. The process generates a single legacy-format file (gzipped FASTQ) of 67 GB straight from the sequencing machine.
Genetic analysis is poised to be a key accelerator for personalized medicine, the next evolutionary step of medicine towards treatments tailored to the individual. Broader application of this preventive tool will lead to a rapidly growing volume of genetic information. As with the management of any electronic data, addressing the storage of such an amount of genetic data is essential. With the high compression of MPEG-G, an ISO-standardized (ISO/IEC 23092) genomic data format, the size of a whole-genome file is reduced by more than 75%. This reduction makes it possible, for example, to store this sensitive data directly on an ordinary smartphone (64 GB) and still leave space for pictures, videos, and text messages from the family.
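As a rough back-of-the-envelope check of the smartphone example, the short Python sketch below works through the numbers. It assumes that the 75% reduction applies to the 67 GB gzipped FASTQ file mentioned above and that the full 64 GB of the phone is available, which ignores the operating system and apps. The variable and function names are ours, for illustration only.

```python
FASTQ_GZ_GB = 67       # whole-genome gzipped FASTQ size from the text
MIN_REDUCTION = 0.75   # "more than 75%" reduction, taken as the worst case
PHONE_GB = 64          # nominal smartphone storage from the text


def compressed_upper_bound_gb(original_gb: float, reduction: float) -> float:
    """Largest file size still consistent with the stated reduction."""
    return original_gb * (1 - reduction)


compressed_gb = compressed_upper_bound_gb(FASTQ_GZ_GB, MIN_REDUCTION)
remaining_gb = PHONE_GB - compressed_gb

print(f"Compressed genome: at most {compressed_gb:.2f} GB")
print(f"Left on the phone: at least {remaining_gb:.2f} GB")
```

Under these assumptions the compressed genome takes at most about 17 GB, which leaves roughly 47 GB free, whereas the original 67 GB file would not fit on the phone at all.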