With MPEG-G, several related studies can be encapsulated in the same compressed file and still be accessed separately. Transversal queries across all the compressed studies are also possible (e.g. “select chr1 of all compressed samples”). This is particularly useful when a study covers a large population of individuals of the same species, or when the same individual is sequenced or analyzed several times over their lifetime.
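The difference between per-study access and a transversal query can be pictured with a small Python sketch. This is a toy model only: it does not reproduce the MPEG-G file format or any real MPEG-G library, and the class `ToyContainer`, its methods, and the study and sample names are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContainer:
    """Hypothetical stand-in for one file holding several studies."""
    # Maps (study, sample, region) to an opaque compressed payload.
    blocks: dict = field(default_factory=dict)

    def add(self, study, sample, region, payload):
        self.blocks[(study, sample, region)] = payload

    def study(self, name):
        """Per-study access: every block that belongs to one study."""
        return {k: v for k, v in self.blocks.items() if k[0] == name}

    def region(self, name):
        """Transversal query: one region across all studies and samples."""
        return {k: v for k, v in self.blocks.items() if k[2] == name}


container = ToyContainer()
# Placeholder payloads; real payloads would be compressed genomic data.
container.add("study_A", "sample_1", "chr1", b"<compressed>")
container.add("study_A", "sample_1", "chr2", b"<compressed>")
container.add("study_A", "sample_2", "chr1", b"<compressed>")
container.add("study_B", "sample_3", "chr1", b"<compressed>")

# "select chr1 of all compressed samples"
chr1_everywhere = container.region("chr1")
for study, sample, region in sorted(chr1_everywhere):
    print(study, sample, region)
```

The point of the sketch is that both kinds of lookup work on the same file by filtering on an index, without decoding the studies that are not requested.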
Furthermore, when several datasets from the same individual or species are aggregated, the reference sequences they share (e.g. the human genome) need to be stored only once, which yields a significant gain in storage efficiency. These aggregation capabilities also make it possible to store updated versions of an individual’s genome assembly only as differences with respect to the first encoded genome. This is particularly useful when new genome assemblies are produced for an individual, either because a new assembly tool was used or because new sequencing data became available.
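The idea of keeping an updated assembly only as differences from the first encoded genome can be sketched as follows. This is a conceptual illustration built on Python's standard `difflib` module, not the encoding that MPEG-G actually uses; the function names `encode_delta` and `apply_delta` and the short example sequences are assumptions made for the sketch.

```python
from difflib import SequenceMatcher


def encode_delta(base, updated):
    """Return the edits that turn `base` into `updated`.

    Each edit is (start, end, replacement): base[start:end] is replaced
    by `replacement`. Unchanged stretches are not stored at all.
    """
    matcher = SequenceMatcher(None, base, updated, autojunk=False)
    return [
        (i1, i2, updated[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(base, delta):
    """Rebuild the updated sequence from the base and its stored edits."""
    pieces, pos = [], 0
    for start, end, replacement in delta:
        pieces.append(base[pos:start])  # unchanged stretch copied from base
        pieces.append(replacement)      # substituted or inserted bases
        pos = end                       # skip the replaced or deleted span
    pieces.append(base[pos:])
    return "".join(pieces)


# Toy sequences; a real assembly is billions of bases long.
first_assembly = "ACGTACGTTAGCCGATAGGCTA"
new_assembly = "ACGTACCTTAGCCGATAGGCTAA"

delta = encode_delta(first_assembly, new_assembly)
restored = apply_delta(first_assembly, delta)
print(restored == new_assembly)
```

Only `first_assembly` and `delta` would need to be stored; the new assembly is recovered on demand. The saving grows with how similar the two assemblies are.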