The GenCoder project aims to implement the first prototype of an MPEG-G compliant encoder and decoder for the efficient compression, storage, transport and analysis of genome sequencing data.
Genomic analysis is poised to become the major generator of big data by 2025, with 2 million genomes already sequenced thanks to Next-Generation Sequencing (NGS) devices. The stakeholders involved in genomic data analysis and management (research and clinical centers, biobanks, genome service providers) face two problems:
- the increasing costs of data storage (on average 850 €/TeraByte per year, which means several million € for large data repositories) and
- the lack of system interoperability due to poorly specified interfaces, which prevents the efficient data transport and sharing needed to perform analysis on large heterogeneous datasets.
GenomSys is participating in the standardization of MPEG-G, the new ISO standard for genomic information representation, and is implementing the first MPEG-G compliant encoder and decoder. The main advantages of MPEG-G are:
- Enhanced compression: from 50% up to 100x, depending on the selected coding mode.
- Processing time reduction: up to 50x faster than current practice for a typical genome analysis, thanks to selective and rapid access to specific blocks of data and metadata.
- Open process of technology specification: the ISO process of standards development offers enterprise-grade technology specifications and long-term support and maintenance.
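To put the storage cost and compression figures quoted above in perspective, here is a rough back-of-the-envelope sketch. It is only an illustration under stated assumptions: the 2 PB repository size is hypothetical, 1 PB is taken as 1000 TB, "50%" is read as a 2x size reduction, the cost is assumed to scale linearly with stored volume, and the function name `annual_cost_eur` is made up for this example. The baseline against which the compression figures are measured is not specified in the text, so the results indicate orders of magnitude only.

```python
# Back-of-the-envelope storage cost estimate (illustrative assumptions only).
COST_EUR_PER_TB_YEAR = 850  # average figure quoted in the text
TB_PER_PB = 1000            # assumption: decimal units


def annual_cost_eur(size_tb: float, compression_ratio: float = 1.0) -> float:
    """Yearly cost of storing size_tb of data after shrinking it by compression_ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return size_tb / compression_ratio * COST_EUR_PER_TB_YEAR


repository_tb = 2 * TB_PER_PB  # hypothetical 2 PB repository

scenarios = [
    ("baseline (no further compression)", 1),
    ("50% reduction, read as 2x", 2),  # assumption about what "50%" means
    ("100x compression", 100),
]
for label, ratio in scenarios:
    print(f"{label}: {annual_cost_eur(repository_tb, ratio):,.0f} EUR/year")
```

Under these assumptions the baseline comes to 1.7 million € per year, which is consistent with the "several million €" quoted for large repositories.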
SME Phase 1 project
The GenCoder project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 827840.
The technical feasibility study aims to optimize and validate the performance of the GenCoder software for encoding and streaming genomic data in real application scenarios, in order to deliver a software library compliant with the MPEG-G ISO standard specifications. The work will involve two partners with significant genomics datasets (more than 2 PB each). The current GenCoder libraries will be tested on data compressed in different formats (gzipped FASTQ, BAM, CRAM) and from different species (human, animal, plant, bacteria), as well as on cancer cell lines and RNA-seq data, in order to assess Key Performance Indicators.
The project also covers the validation of the business model, the revenue streams, and the pricing strategy, with the aim of setting adequate price points for all customer categories. This will be done by defining business cases that involve real customers in various application scenarios (research centers and hospitals, biotech and pharma companies, data repositories).
Dr. Iordanis Arzimanoglou visited us from Oct. 31st to Nov. 2nd within the scope of the SME Instrument Phase 1 coaching service. Over three days he gave us an extensive overview of global trends in genomics, genetics and Next-Generation Sequencing, and of a wide range of funding opportunities from both public institutions and private sources. He also helped us improve our communication materials so that they better express the unique selling points of GenomSys technology and solutions. He was very helpful as well in reviewing and understanding the impact of current and future legislation on the certification of medical software and devices. Finally, we gained some very interesting contacts for potential partnerships and collaborations at both national and international level. We are very happy and grateful to have met him, and we definitely recommend him as a coach.