By Luca Trotta on April 07, 2021

Genetic diagnosis of rare diseases

As described in last month’s article, the clinical recognition and treatment of patients affected by rare diseases can often be challenging, and next-generation sequencing (NGS)-based methods have proved to be an effective support for diagnostic, preventive, and therapeutic strategies.

Gene-panels (targeted sequencing), whole-genome sequencing (WGS), and whole-exome sequencing (WES) are the most widely adopted methods for identifying rare disease-causing variants in clinical settings. ‘Genotype-first’ approaches like WES and WGS are efficient for tackling rare diseases even in the absence of a strong clinical suspicion, in the presence of atypical manifestations, or when novel disease-causing variants are involved.

Next-generation sequencing (NGS) refers to massively parallel sequencing technologies that have revolutionised genomic research over the last decades. NGS provides high-throughput, scalable methods supporting a wide range of applications in research and diagnostics.

Gene-panels are tests targeting a set of genes, genetic regions, or variants that have known or suspected associations with the studied disease or phenotype.

Whole-genome sequencing (WGS) is a sequencing technique for the analysis of the entire genome.
Whole-exome sequencing (WES) is a sequencing technique for the analysis of all the protein-coding regions of the genome (i.e., the exome).
The genotype describes an individual’s genetic characteristics.
The phenotype refers to an individual’s observable physical traits.

All humans share full sequence identity for about 99.9% of their genomes (7). The remaining 0.1%, along with environmental factors, accounts for the differences among individuals. These differences in sequence are referred to as genetic variants. Variation can occur in different forms, involve different regions of the genome, and affect human phenotypes to different extents.

Genetic variation can be described at every scale, from the nucleotide to the chromosome. Changes affecting single nucleotides, the monomers of DNA strands, are single-nucleotide variants (SNVs). Small insertions or deletions (indels) are instead gains or losses of one or more nucleotides (fewer than 50 bp). Copy-number variants (CNVs) are DNA segments present in a number of copies different from the reference (duplications or deletions). An individual human genome carries an estimated ~4-5 million variant sites, mostly (>99.9%) represented by SNVs and short indels (The 1000 Genomes Project Consortium 2015) (8).
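The size-based categories above can be illustrated with a minimal Python sketch. The function name, the labels, and the handling of equal-length substitutions are assumptions made for illustration; only the 50 bp cutoff comes from the text, and real variant callers apply more elaborate rules.

```python
def classify_variant(ref: str, alt: str, indel_max_bp: int = 50) -> str:
    """Assign a coarse, size-based category to a variant.

    ref / alt are the reference and alternate alleles as plain strings.
    indel_max_bp is the cutoff below which a gain or loss counts as a small indel.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no variant"
    length_change = len(alt) - len(ref)
    if length_change == 0:
        # Same length: a single base change is an SNV.
        return "SNV" if len(ref) == 1 else "multi-nucleotide substitution"
    kind = "insertion" if length_change > 0 else "deletion"
    if abs(length_change) < indel_max_bp:
        return f"small {kind} (indel)"
    # Larger gains or losses fall outside the indel range; they may or may not be CNVs.
    return f"large {kind} (possible CNV)"


print(classify_variant("A", "G"))
print(classify_variant("A", "ATG"))
print(classify_variant("ATG", "A"))
```

The alleles are compared only by length here; the sketch does not look at genomic context or copy number, which a real CNV call would require.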

The Online Mendelian Inheritance in Man (OMIM®) database lists variants in more than 3,000 genes linked to more than 6,000 clinical phenotypes, including both single-gene disorders and susceptibilities to cancer and complex diseases (9). The Human Gene Mutation Database (HGMD®) lists ~180,000 disease-associated variants (10).

The consequence of variants occurring in protein-coding genes can be evaluated on the resulting protein structure. Missense (non-synonymous) variants substitute one amino acid with a different one; loss-of-function (LoF) variants reduce or abolish protein function, for example by introducing premature stop codons, removing physiological splice sites or creating aberrant ones, or disrupting the reading frame of the amino acid sequence.
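As a rough illustration of these categories, the sketch below translates a short coding sequence before and after a change and labels the outcome. The toy sequence and the function names are invented for this example. It assumes the standard genetic code, and it does not model splice-site variants or stop-loss variants, which need transcript-level information.

```python
from itertools import product

# Standard genetic code, built from the compact TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon ('*')."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def classify_coding_variant(ref_cds: str, alt_cds: str) -> str:
    """Label the protein-level consequence of a change in a coding sequence."""
    length_change = len(alt_cds) - len(ref_cds)
    if length_change % 3 != 0:
        return "frameshift (LoF)"
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    if alt_protein == ref_protein:
        return "synonymous"
    if "*" in alt_protein and len(alt_protein) < len(ref_protein):
        return "premature stop (LoF)"
    if length_change != 0:
        return "in-frame indel"
    return "missense"


REF = "ATGGAAGTGTGCTAA"  # toy sequence: Met-Glu-Val-Cys-stop
print(classify_coding_variant(REF, "ATGGAGGTGTGCTAA"))  # GAA -> GAG
print(classify_coding_variant(REF, "ATGGTAGTGTGCTAA"))  # GAA -> GTA
print(classify_coding_variant(REF, "ATGGAAGTGTGATAA"))  # TGC -> TGA
print(classify_coding_variant(REF, "ATGGAGTGTGCTAA"))   # one base deleted
```

Annotation tools used in practice work on full transcript models rather than isolated sequences, so this is only a way to see why the same kind of DNA change can have very different effects on the protein.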

In addition to inherited (germline) variation, de novo variants, which appear without apparent parental inheritance, can play a significant role in disease pathogenesis. De novo mutations arise in the germ cell of one of the parents, or in the fertilized egg during early embryogenesis, with an estimated frequency of ~1.5×10⁻⁸ SNVs per site per generation (11). If variants arise after conception in cell lineages other than the germline, they are somatic. Somatic variants are carried only by that specific cell population and can contribute to several types of cancer.

DNA is the molecule that carries all the information about the development and functioning of living things. At the molecular level, DNA is composed of two strands coiled around each other to form a double helix. Each strand consists of multiple basic units called nucleotides. At the cellular level, DNA is “compressed” into chromosomes and lies within the cell nucleus.

Nucleotides are the basic building blocks of nucleic acids. RNA and DNA are made of long chains of nucleotides. A nucleotide consists of a sugar molecule attached to a phosphate group and a nitrogenous base. When forming a double helix, the bases on opposite polynucleotide chains pair reciprocally: adenine (A) with thymine (T) and cytosine (C) with guanine (G). In RNA, the base uracil (U) takes the place of thymine.

A chromosome is the organized package of DNA within a cell. DNA chains are organised into complexes with proteins (histones), and the resulting chromatin is tightly packaged in increasing levels of complexity, up to the highly coiled chromosomal structures. Each species has a characteristic set of chromosomes with respect to number and organization. For example, humans have 23 pairs of chromosomes: 22 pairs of numbered chromosomes called autosomes, 1 through 22, and one pair of sex chromosomes, X and Y. Each parent contributes one chromosome of each pair to an offspring.

Genes are the basic physical and functional units of heredity. The genetic information is organized into DNA segments with a specific biochemical function, also referred to as coding regions, which encode the synthesis of a gene product (protein or non-coding RNA). Genes are passed from parents to their children and contain the information needed to specify traits.

Mitochondrial DNA is the DNA located in the mitochondria in most eukaryotic organisms, organized within a single small circular chromosome. The mitochondria are sub-cellular organelles generating most of the cells’ supply of chemical energy. The mitochondria, and thus mitochondrial DNA, are passed from mother to offspring.

RNA, or ribonucleic acid, is a nucleic acid that is similar in structure to DNA but different in subtle ways. RNA is usually single-stranded and contains uracil (U) instead of thymine (T). The cell uses different RNA molecules for a number of tasks, such as transferring information from the genome into proteins through transcription and translation, catalyzing biological reactions, controlling gene expression, or sensing and communicating responses to cellular signals.
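The base-pairing rule (A with T, C with G) and the replacement of T by U in RNA can be expressed in a few lines of Python. This is a minimal sketch with hypothetical function names; it handles only the four unambiguous DNA letters.

```python
# Translation table implementing the pairing rule: A<->T and C<->G.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the RNA with the same sequence as the DNA coding strand, with U replacing T."""
    return coding_strand.upper().replace("T", "U")


print(reverse_complement("ATGC"))
print(transcribe("ATGC"))
```

The reversal step reflects the fact that the two strands of the double helix run in opposite directions.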

The Human Genome Project (HGP) was the international, collaborative research program (1990-2003) whose goal was the complete mapping and understanding of all the genes of human beings. All our genes together are known as our “genome.”

An estimated 6,000-8,000 rare diseases exist, most of them monogenic (Mendelian). Despite being individually rare, Mendelian diseases collectively exert a large burden on public health due to the large number of patients affected (see our previous article: Insight on Rare Diseases).

Given their mostly monogenic etiology, Mendelian disorders have always been at the forefront of genetic studies. Understanding the genetic bases of rare diseases can improve clinical practice and shed light on the biological mechanisms determining clinical conditions (12). Nonetheless, the low prevalence of Mendelian disorders has always been an obstacle to identifying their causes, and the overlap of clinical symptoms among different conditions hampers diagnostic procedures, which are often long, erroneous, and/or inconclusive. Detecting the molecular bases of Mendelian disorders can help identify the specific conditions, reduce the difficulties that patients experience, and support improved medical practice (12,13).

Etiology is the cause of a disease or abnormal condition.
Pathogenesis is the manner of development of a disease.
Prevalence is the proportion of a population who have a specific characteristic (or the number of cases) in a given time period.
Monogenic diseases are caused by variation in a single gene and are typically recognized by their striking familial inheritance patterns.
Multigenic diseases are caused by variants in multiple genes that affect a single phenotypic trait.

NGS-based methods used for the study and the diagnostics of rare diseases

Technological breakthroughs over the past decade have drastically improved genetic testing methods, increasing the automation of sequencing processes and reducing the cost per analysis. Next-generation sequencing (NGS)-based methods, introduced in 2005, allow parallel sequencing of multiple genes, up to the whole protein-coding region (whole-exome sequencing, WES) or the whole genome (whole-genome sequencing, WGS) (1).

The ever-increasing adoption of NGS-based methods in clinical settings has dramatically advanced the identification of the genetic causes of Mendelian phenotypes, providing support for diagnostic, preventive, and therapeutic strategies, and is expected to support the development of personalized medicine approaches in the coming years (14, 15). Nonetheless, the use of NGS brings concerns and drawbacks associated with the demanding process of storing, processing, analyzing, and interpreting the massive amount of genomic data generated (1).

For the clinical use of NGS, it is crucial to select the testing applications that are most appropriate to the specific diagnostic requirements. Gene-panels (targeted sequencing), WGS, and WES are among the most widely adopted methods in clinical genetics.

Targeted approaches are used with clinical phenotypes suggestive of a specific genetic etiology (‘phenotype-first’ approach). The genes of interest are targeted using panels, generally screening only the coding sequence of the candidates (1). Targeted approaches are time- and cost-effective because of the small amount of data they generate. On the other hand, their largest limitation is the inability to identify variants that are not included in the target design. This means that targeted approaches rely entirely on (known) genotype-phenotype correlations and, therefore, on an accurate initial clinical diagnosis. In addition, they have limited power in deciphering genetically heterogeneous conditions and in identifying new disease genes.
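The main limitation of a panel, that variants outside the target design cannot be seen, can be made concrete with a small Python sketch. The panel coordinates and names below are invented for illustration and do not correspond to any real panel design.

```python
from typing import Dict, List, Tuple

# Hypothetical panel design: chromosome -> list of (start, end) target intervals,
# 1-based and inclusive. All coordinates are invented for this example.
PANEL_TARGETS: Dict[str, List[Tuple[int, int]]] = {
    "chr1": [(1_000, 1_500), (8_000, 8_250)],
    "chr7": [(20_000, 20_400)],
}


def is_covered_by_panel(chrom: str, pos: int,
                        targets: Dict[str, List[Tuple[int, int]]]) -> bool:
    """Return True if the position falls inside any target interval of the panel."""
    return any(start <= pos <= end for start, end in targets.get(chrom, []))


# A variant inside a target can be detected; one outside the design cannot,
# no matter how relevant it is to the patient's phenotype.
print(is_covered_by_panel("chr1", 1_200, PANEL_TARGETS))
print(is_covered_by_panel("chr1", 5_000, PANEL_TARGETS))
print(is_covered_by_panel("chrX", 100, PANEL_TARGETS))
```

A linear scan is enough for a handful of intervals; designs with many thousands of targets are usually queried with sorted intervals or an interval tree.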

WGS and WES can instead be used with a ‘hypothesis-free’ approach, allowing the detection of disease-causing mutations even when the clinical presentation does not conclusively match a known condition. By covering a wider set of low-prevalence conditions, WGS and WES support a comprehensive and timely unraveling of the genetic causes of rare diseases.

WGS identifies variants genome-wide without target selection and/or enrichment strategies, providing more uniform coverage and the highest diagnostic yield among NGS methods, including variants outside the reach of WES (intronic variants, non-coding RNA genes, small CNVs). However, WGS has drawbacks due to its costs and the demanding requirements for data analysis, interpretation, and storage, which still prevent its routine application in diagnostic settings (Lionel et al. 2017).

WES targets only about 1% of the human genome, corresponding to the coding region, drastically reducing the sequencing costs and the drawbacks associated with the analysis and interpretation of non-coding variation. Yet WES provides a relevant diagnostic yield, since coding variants are predominant in the etiology of monogenic diseases (2,15).
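A back-of-the-envelope calculation shows how targeting about 1% of the genome reduces the amount of sequence to generate. The genome size and the sequencing depths below are assumptions chosen for illustration, not figures from the cited studies, and the sketch ignores off-target reads and capture efficiency.

```python
GENOME_BP = 3_100_000_000     # assumed approximate size of the haploid human genome
EXOME_BP = GENOME_BP // 100   # "about 1%" of the genome, as stated in the text


def sequenced_bases(target_bp: int, mean_depth: int) -> int:
    """Bases that must be sequenced to cover a target at a given mean depth."""
    return target_bp * mean_depth


wgs_bases = sequenced_bases(GENOME_BP, 30)   # hypothetical 30x whole genome
wes_bases = sequenced_bases(EXOME_BP, 100)   # hypothetical 100x exome
ratio = wgs_bases / wes_bases

print(f"WGS: {wgs_bases:,} bases; WES: {wes_bases:,} bases; ratio: {ratio:.0f}x")
```

Under these assumptions the exome is covered more than three times deeper than the genome while requiring about thirty times fewer sequenced bases.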

Nonetheless, specific technical limitations or specific genomic features (e.g. target design, selection bias, repetitive sequences, homopolymers, high-GC content) could hamper the completeness and uniformity of the coverage of phenotype-related genes (1,16).
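Two of the sequence features mentioned above, GC content and homopolymer runs, are simple to compute. The sketch below is a minimal Python illustration with hypothetical function names; the point at which a region becomes hard to sequence depends on the platform and is not modelled here.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    runs = re.finditer(r"(.)\1*", seq.upper())
    return max((len(match.group(0)) for match in runs), default=0)


example = "ACG" + "T" * 6 + "GA"
print(round(gc_content(example), 3), longest_homopolymer(example))
```

Computing these values for each target region is one way to flag, before sequencing, the regions where coverage of phenotype-related genes may turn out incomplete.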

What is the best choice for the study and the diagnostics of rare diseases?

Overall, WES can be preferred for the routine diagnostics of rare diseases, given that almost 85% of disease-causing variants lie within the coding region and given the current limitations of WGS (cost, lower coverage, and analysis and storage efforts).

Despite decreasing over time, WGS costs remain prohibitive for most researchers (almost twice those of WES), while the diagnostic utility of the two methods has not shown significant differences in monogenic diseases (17). The clinical significance of non-coding variants is more complicated to assess than that of coding ones, and their number (hundreds of times more than in the coding regions) makes data handling and analysis more demanding. WES can instead provide an optimal balance, and it has become widely adopted in research and diagnostic contexts as the most cost-effective approach for investigating the coding regions of a genome (14).

(1) Jamuar SS, Tan EC. Clinical application of next-generation sequencing for Mendelian diseases. Hum Genomics 2015 Jun 16;9:10-015-0031-5.
(2) Shen T, Lee A, Shen C, Lin CJ. The long tail and rare disease research: the impact of next-generation sequencing for rare Mendelian disorders. Genet Res (Camb) 2015 Sep 14;97:e15.
(3) ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012 Sep 6;489(7414):57-74.
(4) Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, et al. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet 2014 Nov 15;23(22):5866-5878.
(5) 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature 2015 Oct 1;526(7571):68-74.
(6) Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016 Aug 18;536(7616):285-291.
(7) Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The Sequence of the Human Genome. Science 2001 Feb 16;291(5507):1304-1351.
(8) Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001 Jan 1;29(1):308-311.
(9) Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 2005 Jan 1;33(Database issue):D514-7.
(10) Krawczak M, Cooper DN. The human gene mutation database. Trends Genet 1997 Mar;13(3):121-122.
(11) Turner TN, Coe BP, Dickel DE, Hoekzema K, Nelson BJ, Zody MC, et al. Genomic Patterns of De Novo Mutation in Simplex Autism. Cell 2017 Oct 19;171(3):710-722.e12.
(12) Dodge JA, Chigladze T, Donadieu J, Grossman Z, Ramos F, Serlicorni A, et al. The importance of rare diseases: from the gene to society. Arch Dis Child 2011 Sep;96(9):791-792.
(13) von der Lippe C, Diesen PS, Feragen KB. Living with a rare disorder: a systematic review of the qualitative literature. Molecular Genetics & Genomic Medicine 2017;5(6):758-773.
(14) Rabbani B, Tekin M, Mahdieh N. The promise of whole-exome sequencing in medical genetics. J Hum Genet 2014 Jan;59(1):5-15.
(15) Chong JX, Buckingham KJ, Jhangiani SN, Boehm C, Sobreira N, Smith JD, et al. The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities. Am J Hum Genet 2015 Aug 6;97(2):199-215.
(16) Ruiz-Schultz N, Sant D, Norcross S, Dansithong W, Hart K, Asay B, et al. Methods and feasibility study for exome sequencing as a universal second-tier test in newborn screening. Genet Med 2021 Jan 13. doi: 10.1038/s41436-020-01058-w. Epub ahead of print. PMID: 33442025.
(17) Clark MM, Stark Z, Farnaes L, et al. Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases. NPJ Genom Med 2018;3:16.

Picture: gagnonm1993/ pixabay
