Genomic Data Compression

GenomSys is collaborating with Vital-IT, the High Performance Computing center of the Swiss Institute of Bioinformatics (SIB), and EPFL on a project to develop the technology for a new generation of tools and devices that compress, store, transport and manipulate genomic data efficiently. This technology will be the backbone of a class of products for genomic applications. These products will offer new data access and handling features, dramatically reduce storage costs, and support the transfer of data from sequencing facilities to storage and/or analysis sites.


This project aims to develop tools for the efficient representation, compression and transport of genomic information along three major axes:

  1. a new generation of genomic information compressors that exploit the expertise of the company and its partners in digital information processing and entropy coding;
  2. a genomic information transport layer that encapsulates the compressed genomic data and provides standard interfaces for efficient access and system interoperability;
  3. technologies and tools conforming to the ISO standard that will be developed over the next two years with the collaboration of a large number of experts from international genomic R&D centers and genomic sequencing companies.
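Entropy coding (axis 1) matters because raw reads are usually stored as text, with one 8-bit character per base, while the order-0 Shannon entropy of a sequence over the alphabet {A, C, G, T} can never exceed log2(4) = 2 bits per base. The short Python sketch below is an illustration only, not the project's compressor: it computes that empirical entropy for a made-up read. Practical genomic compressors use richer context models and must also handle quality scores and read metadata, which this sketch ignores.

```python
import math
from collections import Counter


def order0_entropy(seq: str) -> float:
    """Empirical order-0 Shannon entropy of seq, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    # H = -sum(p * log2(p)) over the observed symbol frequencies
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical example read, not real sequencing data
read = "ACGTACGTTTGACCAGTAGGCTAACG"
h = order0_entropy(read)
print(f"{h:.3f} bits/base, versus 8 bits/base when stored as ASCII text")
```

Because the alphabet has only four symbols, the result is bounded by 2 bits per base; an entropy coder driven by these frequencies approaches that figure, which is already roughly a fourfold saving over plain text.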

The project will deliver:

  1. A client-server system able to compress, transmit and decompress genomic data as soon as they are produced by a sequencing device. This involves:
    • receiving raw reads as input,
    • compressing them in the ISO format,
    • encapsulating the compressed data in an ISO standard transport layer,
    • sending the data over an IP network,
    • receiving the data,
    • decompressing them for analysis or storage at the receiving end.
  2. A genomic information compression and decompression API able to expose most of the functionality provided today by Samtools, the current benchmark for genomic data manipulation, together with additional features such as:
    • enhanced selective access
    • access control
    • signalling for transport
  3. An optimized version of the genomic compressor and decompressor running on an Intel Xeon Phi co-processor
  4. A demonstrator of the integrated system running with existing genomic analysis pipelines used in genomic research centers such as Vital-IT, the High Performance Computing Center of the SIB.
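The chain of steps in deliverable 1 (compress, encapsulate, send, receive, decompress) can be sketched end to end in a few lines of Python. This is a minimal, hypothetical illustration and not the project's design: zlib stands in for the ISO-format compressor, the 8-byte `GDAT` header (a 4-byte magic plus a 4-byte payload length) is an invented stand-in for the ISO transport layer, and a local `socket.socketpair()` replaces the IP network link. The function names are also illustrative assumptions.

```python
import socket
import struct
import zlib

# Hypothetical frame header: 4-byte magic + 4-byte big-endian payload length.
MAGIC = b"GDAT"
HEADER = struct.Struct(">4sI")


def compress_reads(reads):
    """Stand-in for the ISO-format compressor: zlib over newline-joined reads."""
    return zlib.compress("\n".join(reads).encode("ascii"))


def encapsulate(payload):
    """Stand-in for the transport layer: prefix the payload with a header."""
    return HEADER.pack(MAGIC, len(payload)) + payload


def recv_exact(sock, n):
    """Read exactly n bytes, since a stream socket may return partial chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before frame was complete")
        buf += chunk
    return bytes(buf)


def receive_frame(sock):
    """Read one header, check it, then read exactly one payload."""
    magic, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    if magic != MAGIC:
        raise ValueError("unexpected frame magic")
    return recv_exact(sock, length)


def decompress_reads(payload):
    """Inverse of compress_reads."""
    return zlib.decompress(payload).decode("ascii").split("\n")


# Hypothetical reads; a real client would take them from the sequencer output.
reads = ["ACGTACGTTTGA", "CCAGTAGGCTAA", "ACGTACGTTTGA"]
sender, receiver = socket.socketpair()  # stands in for an IP connection
with sender, receiver:
    sender.sendall(encapsulate(compress_reads(reads)))
    received = decompress_reads(receive_frame(receiver))

print(f"round trip: {len(received)} reads, identical = {received == reads}")
```

Length-prefixing each frame lets the receiver extract exactly one compressed unit from a byte stream such as TCP, which does not preserve message boundaries; a real transport layer would add further signalling, such as data type and access units for selective access.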