Genomic analysis is poised to become the major generator of big data by 2025, and 2 million genomes have already been sequenced thanks to Next-Generation Sequencing (NGS) devices. Yet the stakeholders involved in genomic data analysis and management (research and clinical centers, biobanks, genome service providers) face two problems:
- the increasing cost of data storage (on average €850 per terabyte per year, which amounts to several million euros for large data repositories) and
- the lack of system interoperability due to poorly specified interfaces, which prevents the efficient data transport and sharing needed to analyse large heterogeneous datasets.
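As a rough illustration of the storage figure quoted above, the following minimal Python sketch multiplies the stated average cost by a repository size. The repository size of 5,000 TB and the function name are hypothetical, chosen only to show how the "several million euros" order of magnitude arises.

```python
# Source figure: average storage cost of 850 EUR per terabyte per year.
COST_PER_TB_YEAR_EUR = 850


def annual_storage_cost_eur(repository_tb: float) -> float:
    """Yearly storage cost in euros for a repository of the given size in TB."""
    return repository_tb * COST_PER_TB_YEAR_EUR


# Hypothetical repository size, used only for illustration (not from the source).
example_tb = 5_000
print(f"{annual_storage_cost_eur(example_tb):,.0f} EUR per year")
# prints "4,250,000 EUR per year"
```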
GenomSys is participating in the standardization of MPEG-G, the new ISO standard for genomic information representation, and is implementing the first MPEG-G-compliant encoder and decoder. The main advantages of MPEG-G are:
- Enhanced compression: from 50% to 100x compression, depending on the selected coding mode.
- Processing time reduction: up to 50x faster than current practice for a typical genome analysis, thanks to selective and rapid access to specific blocks of data and metadata.
- Open process of technology specification: the ISO process of standards development offers enterprise-grade technology specifications and long-term support and maintenance.
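To give a sense of what the compression figures could mean for storage budgets, here is a minimal Python sketch. It rests on several assumptions that are not stated in the source: that "50%" means a halving of stored size (a 2x ratio), that the ratios apply to the data volume being priced, that the €850 per terabyte per year average applies unchanged, and that the repository holds a hypothetical 5,000 TB. The function name is illustrative only.

```python
# Source figure: average storage cost of 850 EUR per terabyte per year.
COST_PER_TB_YEAR_EUR = 850


def compressed_storage_cost_eur(raw_tb: float, compression_ratio: float) -> float:
    """Yearly cost in euros after shrinking raw_tb by the given compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_tb / compression_ratio * COST_PER_TB_YEAR_EUR


# Hypothetical repository size (not from the source).
raw_tb = 5_000

# Ratios: 1 = uncompressed, 2 = assumed reading of "50%", 100 = "100x".
for label, ratio in [("uncompressed", 1), ("50% (assumed 2x)", 2), ("100x", 100)]:
    cost = compressed_storage_cost_eur(raw_tb, ratio)
    print(f"{label}: {cost:,.0f} EUR per year")
```

Under these assumptions the yearly cost falls from 4,250,000 EUR to 2,125,000 EUR at 2x and to 42,500 EUR at 100x; real savings would depend on the coding mode and on the format the data is currently stored in.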
SME Phase 1 project
The GenCoder project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement 827840.