Data security in genetics means protecting the most sensitive information about us as living beings. Our DNA contains the blueprint of our bodies, and variations within its 3 billion base pairs can reveal information more intimate than any other health data, because it concerns not only us but also our families. Just 75 statistically independent single nucleotide variants (SNVs) are enough to single out one individual within the entire world population.[1] Commonly used healthcare anonymization techniques therefore cannot provide complete protection: the raw genomic sequence itself points to a person, and anonymizing the sequence would alter the very information it carries.
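
A back-of-the-envelope calculation illustrates why a few dozen variants can be enough. The sketch below rests on assumptions made only for illustration and not taken from the cited paper: 75 independent biallelic SNVs, each with a minor allele frequency of 0.5 under Hardy-Weinberg proportions, and a world population of 8 billion. The function names are hypothetical.

```python
from math import comb


def genotype_match_probability(maf: float) -> float:
    """Probability that two unrelated people share the same genotype at one
    biallelic SNV, assuming Hardy-Weinberg proportions (an assumption)."""
    p, q = 1.0 - maf, maf
    genotype_freqs = (p * p, 2 * p * q, q * q)  # hom-ref, het, hom-alt
    return sum(f * f for f in genotype_freqs)


def expected_matching_pairs(n_snvs: int, maf: float, population: int) -> float:
    """Expected number of pairs in the population whose genotypes agree
    at all n_snvs independent sites."""
    per_pair = genotype_match_probability(maf) ** n_snvs
    return comb(population, 2) * per_pair


if __name__ == "__main__":
    per_site = genotype_match_probability(0.5)
    pairs = expected_matching_pairs(75, 0.5, 8_000_000_000)
    print(f"match probability per site: {per_site}")
    print(f"expected identical pairs worldwide: {pairs:.2e}")
```

Under these assumptions the expected number of coinciding pairs is far below one, so the 75-site profile is effectively unique. The result depends strongly on the assumed allele frequencies: rarer variants discriminate less per site.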

The urgency of protecting genetic data has increased with recent trends in genetic testing. Over the past seven years, Direct-to-Consumer (DtC) genetic testing services have grown, driven mainly by U.S. players pioneering this new market. These services are offered to any individual, particularly in countries with less strict data protection regulations, who wants to learn more about their genetic heritage, traits, and potential genetic risk of developing diseases. The DtC genetic testing market has grown from $54 million in 2014 and is expected to reach $340 million this year.[2] The two major providers are AncestryDNA, with an estimated 14 million customers, and 23andMe, with 9 million, as of 2019.[3]

Graphic 1 & 2 – Market size growth for DtC genetic testing (yellow)[2] and growth of consumers for the two major providers of these services (green).[3]

The growing interest in genetics among the broader public is, in itself, admirable. DtC genetic tests allow individuals to learn more about how their genes affect their health, raise awareness of scientific and genetic research, and, particularly through heritage analysis (whose actual scientific value is debatable), let people engage with science in an entertaining way. But these tests and their results are based on our most intimate source of information, our DNA, so protection is vital.

This is where the business models of the large DtC genetic testing providers cast a shadow on the aforementioned benefits. Although user consent has become mandatory for sharing customers’ information and genomic data, there are plenty of recent examples showing that current genetic data privacy and security measures are insufficient. Recently, a big DtC player that went public last year openly declared that its business model is to leverage its users’ genomic data for pharmaceutical research and other studies, paid for by pharmaceutical companies, from which it derives most of its revenue.[4]

For that reason, the MPEG-G standard, a representation format for genomic data, includes built-in security features, giving the genetics field worldwide a standard way to handle this sensitive information with the utmost care.

What is built-in that makes MPEG-G secure?

Before going into the details of MPEG-G’s built-in security structure, it is essential to note that a data format alone does not guarantee full data protection. It must be combined with appropriate, legally mandated management of patients’ consent, cryptographic techniques, and further protection measures to guarantee the privacy and protection of genomic data.[5]

The MPEG-G (ISO/IEC 23092) format, developed by the MPEG Standardization Committee, has treated security and protection as design goals from the beginning, in 2014, following a privacy-by-design approach. Built from several proposals by different research organizations, MPEG-G can represent exactly the same information as legacy formats while meeting the requirements of stakeholders along the genomic value chain, providing a practical solution for handling genomic data safely.[5]

The format’s security feature comes from MPEG-G’s hierarchical representation of genomic information. This hierarchy contains several levels:

  • Genomic Study (Dataset Group in MPEG-G | e.g., a population study or a family study),
  • Genomic Dataset (Dataset in MPEG-G | e.g., the DNA of a single person in the population or in the family),
  • Genomic Region (set of Access Units in MPEG-G | e.g., coding region(s) of a gene), and
  • Genomic Data (Block in MPEG-G | e.g., a specific information descriptor, such as the positions of reads with respect to the reference assembly).

Each level can be protected individually by specifying protection elements that define encryption and authentication strategies. Genomic data in MPEG-G is therefore protected not by a single fixed security protocol but by a syntax that offers flexibility in what to protect and can be adapted to individual needs.[5]
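
The idea of attaching protection at any level of the hierarchy can be sketched as a small data model. This is an illustration under assumptions, not the MPEG-G syntax: the class names mirror the levels listed above, while `Protection`, `key_id`, `protected_elements`, and the example study are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Protection:
    """Hypothetical protection element: names the key that encrypts a level
    and says whether the content is also authenticated."""
    key_id: str
    authenticated: bool = True


@dataclass
class Block:  # Genomic Data, e.g. read positions
    descriptor: str
    protection: Optional[Protection] = None


@dataclass
class AccessUnit:  # one piece of a Genomic Region
    region: str
    blocks: List[Block] = field(default_factory=list)
    protection: Optional[Protection] = None


@dataclass
class Dataset:  # Genomic Dataset, e.g. one person
    name: str
    access_units: List[AccessUnit] = field(default_factory=list)
    protection: Optional[Protection] = None


@dataclass
class DatasetGroup:  # Genomic Study, e.g. a family
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    protection: Optional[Protection] = None


def protected_elements(group: DatasetGroup) -> List[Tuple[str, str]]:
    """Return (path, key_id) for every level carrying its own protection."""
    found: List[Tuple[str, str]] = []

    def visit(path: str, node) -> None:
        if node.protection is not None:
            found.append((path, node.protection.key_id))

    visit(group.name, group)
    for dataset in group.datasets:
        ds_path = f"{group.name}/{dataset.name}"
        visit(ds_path, dataset)
        for unit in dataset.access_units:
            au_path = f"{ds_path}/{unit.region}"
            visit(au_path, unit)
            for block in unit.blocks:
                visit(f"{au_path}/{block.descriptor}", block)
    return found


def build_example() -> DatasetGroup:
    """A family study in which only selected levels are protected."""
    mother = Dataset("mother", access_units=[
        AccessUnit("BRCA1", blocks=[Block("positions")],
                   protection=Protection("key-oncology")),
    ])
    child = Dataset("child", access_units=[
        AccessUnit("chr1", blocks=[
            Block("positions", protection=Protection("key-research")),
            Block("quality_values"),
        ]),
    ])
    return DatasetGroup("family_study", datasets=[mother, child])


if __name__ == "__main__":
    for path, key_id in protected_elements(build_example()):
        print(f"{path} -> {key_id}")
```

In this example one whole region of the mother's dataset and a single block of the child's dataset carry protection, while everything else stays unprotected, which is the kind of selectivity the hierarchical syntax is meant to allow.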

Graphic 3 – MPEG-G Hierarchical structure that makes it possible to encrypt and provide access to different levels of genomic data.[5]

Based on this, MPEG-G’s security strategies – defined in part 3 of the format – provide confidentiality and integrity to user-specified elements in the file and access control for specific methods of the Application Programming Interface (API).[5]

Data protection in MPEG-G thus follows from the hierarchical and modular structure of the file. This structure enables fine-grained access control: depending on their role, different users can receive different access rights. For example, after an individual has agreed to have their exome sequenced and analyzed, the resulting MPEG-G file can be shared with third parties either in full or in part. As the legal owner of the file and its information, the individual can delegate access, using an application built on MPEG-G, to the entire file or only to the portions needed, through the format’s built-in role segregation. Potential third-party users include the individual’s general practitioner, who needs genetic information to define an appropriate treatment strategy, or research groups conducting genomic research. In both cases, consent forms are still required, but the MPEG-G format allows future applications to establish genomic data protection efficiently through self-ownership by the person whose DNA it is, rather than through an intermediary.
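
The delegation flow described here, where the owner grants a third party access to selected portions only, can be sketched as follows. This is a minimal illustration under assumptions, not an MPEG-G API: `GenomicFile`, its methods, the user names, and the region contents are all hypothetical, and consent management is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class GenomicFile:
    """Hypothetical stand-in for a file whose regions can be shared separately."""
    owner: str
    regions: Dict[str, bytes]
    # Maps each grantee to the set of regions that grantee may read.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def delegate(self, acting_user: str, grantee: str,
                 regions: Iterable[str]) -> None:
        """Let the owner grant read access to selected regions."""
        if acting_user != self.owner:
            raise PermissionError("only the owner can delegate access")
        requested = set(regions)
        unknown = requested.difference(self.regions)
        if unknown:
            raise KeyError(f"unknown regions: {sorted(unknown)}")
        self.grants.setdefault(grantee, set()).update(requested)

    def revoke(self, acting_user: str, grantee: str) -> None:
        """Let the owner withdraw all access previously given to a grantee."""
        if acting_user != self.owner:
            raise PermissionError("only the owner can revoke access")
        self.grants.pop(grantee, None)

    def read(self, user: str, region: str) -> bytes:
        """Return a region's data if the user is the owner or holds a grant."""
        if user == self.owner or region in self.grants.get(user, set()):
            return self.regions[region]
        raise PermissionError(f"{user} may not read {region}")


if __name__ == "__main__":
    exome = GenomicFile("alice", {"BRCA1": b"ACGT", "TP53": b"TTGA"})
    exome.delegate("alice", "dr_smith", ["BRCA1"])
    print(exome.read("dr_smith", "BRCA1"))
```

Here the general practitioner can read only the region the owner delegated; a request for any other region, or a delegation attempt by anyone other than the owner, raises `PermissionError`.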

GenomSys is putting your DNA always in your hands – real self-ownership for maximum data protection

At GenomSys, we believe that self-ownership of genomic data is essential to minimize privacy concerns and data misuse that could lead to privacy infringements for individuals, and that technology can help combine high levels of privacy with genomic data sharing for legitimate use. Our core conviction is shared by most relevant authorities and institutions, and we are seeing growing interest in, and efforts to investigate, privacy-preserving sharing technologies (e.g., from the NHS).[6]

With regard to DtC genetic testing, our GenomYou app offers genetically interested individuals secure self-ownership by combining technologies such as the MPEG-G format with the convenience of a smartphone app. The interplay of the MPEG-G format’s built-in security features, selective access, and compression is vital to making this possible. Its hierarchical structure allows even the smallest sections to be encrypted and made available to different users (e.g., treating physicians, research groups) by means of individually defined keys, providing a high level of data security. Access then takes place through the Selective Access feature, which makes accessing the genomic data faster and more practical without overloading the smartphone’s processor. Last but not least, MPEG-G’s compression is the basis for fitting even large genomic datasets, such as those from Whole-Exome Sequencing or Whole-Genome Sequencing, onto the smartphone without maxing out its storage space.
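
One common way to obtain individually defined keys per section is to derive them from a single secret held by the owner. The sketch below is an assumption for illustration, not the mechanism specified by MPEG-G or used in GenomYou: it derives one key per region with HMAC-SHA256 (a simplified, HKDF-like step) and uses that key to authenticate the region's payload. Actual encryption would additionally need an authenticated cipher from a cryptography library, which is omitted here to stay within Python's standard library. The function names and region labels are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_region_key(master_key: bytes, region: str) -> bytes:
    """Derive a 32-byte key for one region from the owner's master key.

    A single HMAC-SHA256 call with the region name as label; a simplified
    stand-in for a full key derivation function such as HKDF.
    """
    label = b"region-key:" + region.encode("utf-8")
    return hmac.new(master_key, label, hashlib.sha256).digest()


def tag_region(region_key: bytes, payload: bytes) -> bytes:
    """Compute an authentication tag over a region's payload."""
    return hmac.new(region_key, payload, hashlib.sha256).digest()


def verify_region(region_key: bytes, payload: bytes, tag: bytes) -> bool:
    """Check a tag in constant time; False means wrong key or altered data."""
    return hmac.compare_digest(tag_region(region_key, payload), tag)


if __name__ == "__main__":
    master = secrets.token_bytes(32)  # stays with the owner
    brca1_key = derive_region_key(master, "BRCA1")  # handed to the physician
    tp53_key = derive_region_key(master, "TP53")    # handed to a research group

    payload = b"ACGTACGT"
    tag = tag_region(brca1_key, payload)
    print("physician verifies BRCA1:", verify_region(brca1_key, payload, tag))
    print("research key on BRCA1:", verify_region(tp53_key, payload, tag))
```

Because each region key is derived separately, handing out the key for one region reveals nothing usable about the keys for other regions, and the master key never leaves the owner's device.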

Graphic 4 – MPEG-G’s benefits enable new ways of handling genomic data, establishing maximum data privacy with 21st-century convenience.

And this is how we at GenomSys want to deliver on our promise to democratize genomics: by putting your DNA always in your hands, so that you are in charge of protecting your most personal information, your DNA.

 


By Lucas Laner on August 31st, 2022.

[1] Bonomi, L., Huang, Y. & Ohno-Machado, L. Privacy challenges and research opportunities for genomic data sharing. Nat Genet 52, 646–654 (2020). https://doi.org/10.1038/s41588-020-0651-0
[2] Katharina Buchholz; Consumer Genetic Testing Grows in Popularity (2019). https://www.statista.com/chart/19996/size-of-global-direct-to-consumer-gentic-testing-market/
[3] Sarah Feldman; Consumer Genetic Testing Is Gaining Momentum (2019). https://www.statista.com/chart/17023/commercial-genetic-testing/
[4] Kristen V Brown; All Those 23andMe Spit Tests Were Part of a Bigger Plan (2021). https://www.bloomberg.com/news/features/2021-11-04/23andme-to-use-dna-tests-to-make-cancer-drugs
[5] Delgado J, Llorente S, Naro D. Adding Security and Privacy to Genomic Information Representation. Stud Health Technol Inform. 2019;258:75-79. PMID: 30942718.
[6] Xinghua Shi, Xintao Wu; An overview of human genetic privacy. Ann N Y Acad Sci. 2017 Jan; 1387(1): 61–72. Published online 2016 Sep 14. https://doi.org/10.1111/nyas.13211

Picture Source: GLady/ pixabay
