Enhanced selective access to compressed data

MPEG-G supports several types of selective access to compressed genomic data that are important in genomic analysis. The data structure specified by MPEG-G supports partitioning and efficient querying of data according to the characteristics of the encoded genome sequencing reads. For instance, an analyst can select reads from an MPEG-G file according to the following criteria:

For unmapped reads:
- patterns of nucleotides of interest contained in the sequences
- quality criteria such as those produced by tools like FastQC
For mapped reads:
- all reads mapped in a genomic region specified in terms of sequence (e.g. chromosome) and a start and end mapping position
- reads perfectly matching the reference genome
- reads with substitutions only (no indels or clipped bases), with an optional maximum number of substitutions
- reads with substitutions, indels and clipped bases with no mismatches
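
Some of the mapped-read criteria above can be illustrated with a minimal Python sketch. This is not the MPEG-G API or its data layout: the `MappedRead` record, its field names and the three helper functions are hypothetical, and the region test assumes that "mapped in a region" means overlapping it, with 1-based inclusive coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedRead:
    # Hypothetical record; field names are illustrative, not MPEG-G syntax.
    sequence_id: str        # e.g. "chr1"
    start: int              # 1-based inclusive mapping start (assumed convention)
    end: int                # 1-based inclusive mapping end
    substitutions: int = 0
    indels: int = 0
    clipped_bases: int = 0


def in_region(read, sequence_id, start, end):
    """True if the read overlaps [start, end] on the given sequence."""
    return read.sequence_id == sequence_id and read.start <= end and read.end >= start


def is_perfect_match(read):
    """True if the read has no substitutions, indels or clipped bases."""
    return read.substitutions == 0 and read.indels == 0 and read.clipped_bases == 0


def has_substitutions_only(read, max_substitutions=None):
    """True if the read differs from the reference by substitutions only,
    optionally capped at max_substitutions."""
    if read.indels or read.clipped_bases or read.substitutions == 0:
        return False
    return max_substitutions is None or read.substitutions <= max_substitutions


reads = [
    MappedRead("chr1", 100, 199),
    MappedRead("chr1", 150, 249, substitutions=2),
    MappedRead("chr1", 900, 999, substitutions=5),
    MappedRead("chr2", 100, 199, indels=1),
]

region_hits = [r for r in reads if in_region(r, "chr1", 120, 300)]
perfect = [r for r in reads if is_perfect_match(r)]
few_subs = [r for r in reads if has_substitutions_only(r, max_substitutions=3)]
```

In this sketch every read is inspected in memory. The point of the MPEG-G data structure described above is that such selections can be answered from the way the compressed data is partitioned, without scanning every read.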

The selected regions (potentially scattered across several chromosomes) can be labelled with a single identifier and later retrieved with a single query. This way, the analyst can easily save her work and resume it later by quickly accessing only the sub-regions of interest selected in previous sessions.
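
The labelling idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `labels` mapping, the `add_region` and `query_label` helpers, and the representation of regions and reads as `(sequence_id, start, end)` tuples with 1-based inclusive coordinates. It does not show how MPEG-G stores labels.

```python
from collections import defaultdict

# Hypothetical in-memory store: label -> list of (sequence_id, start, end) regions.
labels = defaultdict(list)


def add_region(label, sequence_id, start, end):
    """Attach one more genomic region to a label."""
    labels[label].append((sequence_id, start, end))


def query_label(label, reads):
    """Return the reads that overlap any region stored under the label."""
    regions = labels.get(label, [])
    return [
        r for r in reads
        if any(r[0] == seq and r[1] <= end and r[2] >= start
               for seq, start, end in regions)
    ]


# Regions on two different chromosomes share a single identifier.
add_region("candidate_loci", "chr1", 100, 200)
add_region("candidate_loci", "chr7", 5000, 6000)

reads = [
    ("chr1", 150, 250),
    ("chr2", 150, 250),
    ("chr7", 5990, 6090),
    ("chr7", 7000, 7100),
]

selected = query_label("candidate_loci", reads)
```

A single call to `query_label` returns the reads from both chromosomes, which is the "single query" behaviour described above.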

Furthermore, MPEG-G also allows the same selective access operations to be performed directly on remote content, for instance to carry out large population studies efficiently over genomic databases on the web.
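
As a generic illustration of partial access to remote content, the sketch below turns a region query into HTTP range requests. It does not describe how MPEG-G itself transports data, which is not stated here. The `index` table, its block boundaries and byte offsets, the helper names and the URL are all hypothetical; only `urllib.request.Request` and the HTTP `Range` header are real. The requests are built but not sent.

```python
from urllib.request import Request

# Hypothetical index: (sequence_id, first position, last position)
# -> (byte offset, byte length) of the compressed block covering that span.
index = {
    ("chr1", 1, 1_000_000): (4096, 20_000),
    ("chr1", 1_000_001, 2_000_000): (24_096, 18_500),
    ("chr2", 1, 1_000_000): (42_596, 21_000),
}


def byte_ranges_for(sequence_id, start, end):
    """Byte ranges of the indexed blocks that overlap [start, end]."""
    ranges = []
    for (seq, blk_start, blk_end), (offset, length) in index.items():
        if seq == sequence_id and blk_start <= end and blk_end >= start:
            # HTTP byte ranges are inclusive at both ends.
            ranges.append((offset, offset + length - 1))
    return sorted(ranges)


def build_request(url, byte_range):
    """Build (but do not send) a request for one byte range of a remote file."""
    first, last = byte_range
    return Request(url, headers={"Range": f"bytes={first}-{last}"})


ranges = byte_ranges_for("chr1", 900_000, 1_100_000)
# Placeholder URL, used only to construct the request objects.
range_requests = [build_request("https://example.org/sample.mgb", r) for r in ranges]
```

Under these assumptions, only the two blocks overlapping the queried region would be fetched, rather than the whole file.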
