In this series, “MPEG-G, can I eat it?”, we introduce and illustrate the main features of the new ISO international standard for genomic data representation, which is at the core of all of our solutions. Over the next couple of weeks, each article will present this new and unique standard by focusing on what it is, who developed it, its history, and its benefits in terms of storage, genomic data processing, interoperability, and data protection.
Today we start with the definition of MPEG-G, how it is built, who participated, its purpose, and its history.
What is MPEG-G?
MPEG-G (ISO/IEC 23092) is a series of ISO international standards for the representation of genome sequencing data and associated metadata. The MPEG-G standard aims to provide a framework for developing interoperable applications that handle genomic information efficiently and economically.
The MPEG-G Standard consists of six parts:
- Part 1: File and Transport Format – The technology to transport and access data
- Part 2: Genomic Information Representation – The compressed representation
- Part 3: APIs and Metadata – The standard interfaces with DNA data applications and legacy formats; metadata for content protection and annotation
- Part 4: Reference Software – The standard's support for the implementation of applications
- Part 5: Conformance – The methodology to test compliance with the standard
- Part 6: Annotations – A unified compressed format for high-level analysis
Each part of the standard focuses on one distinct area of the specification to enable more straightforward implementation and testing of compliant devices. For example, MPEG standards usually devote one part to the reference software implementation and another to the conformance tests. Technical parts are organized to mirror the typical architecture of a conformant device. New parts can be created to address new industry needs that emerge during the lifetime of the standard series.
What is an international standard?
International Standards are technical specifications developed by experts belonging to international industrial and academic organizations under the coordination of entities such as ISO, the International Organization for Standardization. ISO is the world’s largest developer of voluntary international standards and facilitates world trade by providing common standards between nations.
MPEG-G is the first and only standard specification published by ISO for the compression and transport of Next Generation Sequencing data.
Who developed it?
The new standard was developed by the Moving Picture Experts Group (MPEG), one of the most prolific ISO working groups, with more than 1,700 delegates from over 40 countries worldwide. This group has been developing globally used digital media encoding, transmission, and processing standards for more than 30 years. Any individual or organization can join the ISO/MPEG-G working group, which keeps the spirit of an open standard truly alive.
Why was it developed?
The technological breakthroughs of the last decade in the field of genetics, especially the advent of next-generation sequencing (NGS), have produced vast volumes of raw data. At the same time, ICT costs for storing, transmitting, and processing DNA sequence data and related information are rising due to the increased amount of data.
Genetics is a relatively young field in medicine. The most widely used formats were developed on the go, without the formal standardization process that is typical of ISO standards in other industries. In the absence of universal standards, there is a risk that the timely application of effective treatments will be delayed in an increasingly interconnected world.
The solution to that potential threat was to leverage the expertise of the ISO working group MPEG, along with its previous success in the video industry (making online video streaming possible), and adapt it to genomic data. Thus the vision of MPEG-G was born: a standard for genomic data representation that provides a high compression rate, faster processing, high interoperability, and secure built-in protection for precious DNA information.
A brief history of MPEG-G
The development of MPEG-G is a recent undertaking, and to shed light on it, we would like to give you a brief overview of its history.
Graphic 1 – Timeline of the development of the MPEG-G standard
New editions of Parts 1 and 2 were published between 2020 and 2021, and further editions will be published to implement fixes and enhancements to the specifications.
A brief outlook on MPEG-G’s journey
Our highly talented Chief of Technology, Co-Founder of GenomSys, and key contributor to the MPEG-G standard, Claudio Alberti, shares his outlook:
“In the next years, MPEG-G will evolve to integrate new types of genomic data (e.g., genomic annotation, 3D contact matrices, gene expression data) and implement compression techniques that are more efficient in terms of speed, resource consumption, and indexability. Other areas of exploration may include the processing of other ‘omics’ data such as transcriptomics and proteomics data. Still, to be able to do so, the data formats need to stabilize and become less heterogeneous.”
About Claudio Alberti:
Claudio was at the origin of MPEG-G, the new ISO/IEC standard for genome sequencing data representation, and remains a major contributor to it. After receiving an engineering master’s degree from the Polytechnic School of Milan (Italy) and a Ph.D. from EPFL (Lausanne, Switzerland), Claudio participated in several standardization activities within ISO/IEC and designed and developed solutions for digital media processing and information security. He has collaborated with genomics competence centers such as the Swiss Institute of Bioinformatics, the James Hutton Institute (UK), and the Carl R. Woese Institute for Genomic Biology (U. Illinois at Urbana-Champaign) on the development of MPEG-G-compliant genome processing applications.
Stay tuned for next week's article in the series “MPEG-G, can I eat it?”, which covers the ISO standard’s compression rate.
By Lucas Laner and Claudio Alberti on March 02, 2022.
Picture Source: Furiosa-L / pixabay