KidsGenomics - Next-generation genomics for rare diseases and pediatric cancers
Origins and Cautions for the gnomAD Database
March 1, 2022 by dkoboldt

High-throughput sequencing has accelerated human genetics research in countless ways, from rapid newborn sequencing to Mendelian disorders to the many discoveries being made in common, complex disease cohorts. Yet one of the most fundamental benefits of this technology is our ability to catalogue human genetic variation in large cohorts of individuals. For a long while, such catalogues necessarily focused on variation in human protein-coding genes. Coding regions harbor the vast majority of medically relevant sequence variants, in part because their biological effects are easier to predict, but they occupy only 1-2% of the human genome.

Whole-genome sequencing offers a far more comprehensive interrogation of genetic variation. The gnomAD database and, more recently, the Trans-Omics for Precision Medicine (TOPMed) program have catalogued the genomes of tens of thousands of individuals. Unlike the 1000 Genomes Project, which surveyed the genomes of numerous diverse, representative world populations without collecting any phenotypic data, most of the individuals in gnomAD and TOPMed are part of disease studies. Even so, the sheer size of these cohorts has made them vital tools for human genetics in both clinical and research settings.

Origins and Composition of the gnomAD Database

The Genome Aggregation Database (gnomAD), owing to its size and accessibility, has served as a vital resource for human genetics over the past several years. Like its exome-centric predecessor (ExAC), gnomAD’s catalogue was produced by compiling, harmonizing, and generating summary data across numerous large-scale sequencing projects. More than 60 projects contributed data to the gnomAD database.
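To make “generating summary data” a little more concrete: once every sample has been genotyped consistently, the headline statistics reduce to counting alleles. The sketch below pools alternate allele counts (AC) and allele numbers (AN) across several cohorts into an overall and a per-population allele frequency. The cohorts, counts, and function name are hypothetical, the population labels merely mimic gnomAD-style codes, and the real gnomAD pipeline (which reprocesses and jointly calls the raw sequence data, with extensive quality control) is far more involved.

```python
from collections import defaultdict

# Hypothetical per-cohort summary counts for a single variant:
# (cohort, population, AC = alternate allele count, AN = total called alleles)
COHORT_COUNTS = [
    ("cohort_A", "nfe", 12, 20_000),
    ("cohort_B", "nfe", 3, 8_000),
    ("cohort_B", "afr", 0, 4_000),
    ("cohort_C", "fin", 9, 6_000),
]


def aggregate_af(counts):
    """Pool AC and AN across cohorts; return (overall AF, {population: AF})."""
    ac = defaultdict(int)
    an = defaultdict(int)
    for _cohort, population, allele_count, allele_number in counts:
        ac[population] += allele_count
        an[population] += allele_number
    total_an = sum(an.values())
    overall = sum(ac.values()) / total_an if total_an else 0.0
    # Only report populations that actually have called alleles.
    per_population = {pop: ac[pop] / an[pop] for pop in an if an[pop] > 0}
    return overall, per_population


overall, per_population = aggregate_af(COHORT_COUNTS)
print(f"overall AF: {overall:.2e}")
for pop, af in sorted(per_population.items()):
    print(f"  {pop}: {af:.2e}")
```

Summing counts, rather than averaging each cohort’s frequency, weights cohorts by how many alleles they contributed. The hypothetical “afr” row also shows how a variant can appear absent from a population that was sampled only modestly.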
Most of the contributing projects are disease cohorts such as TCGA (cancer), ADSP (Alzheimer’s), and T2D-GENES (diabetes). There are two major releases of gnomAD available, with some key differences between them:

Release:    gnomAD v2.1.1    gnomAD v3.1.2
Assembly:   build37/hg19     GRCh38
Genomes:    15,708           76,156
Exomes:     125,748          0

As you’ll notice, the first key difference is the genome assembly: gnomAD v2.1.1, which has data from more individuals but is exome-centric, is on build 37. The more recent release, v3.1.2, is on the newer assembly (GRCh38) but contains only genomes. This dichotomy is understandable; it’s a lot more work to harmonize exome sequencing data because it was generated using multiple versions of target enrichment kits from several different manufacturers, each with a slightly different definition of the exome. However, it’s also unfortunate because neither release contains the maximum available information. In fact, it’s an often-cited reason for labs not moving to the newer* genome assembly (GRCh38).

*Newer is a relative term. GRCh38 was released in December 2013, more than 8 years ago.

The Caveats of gnomAD

The gnomAD database is a wonderful resource. I use it every day, as do many of my colleagues. Yet anyone who relies on gnomAD data for analysis and interpretation of genetic variants should be fully aware of its caveats.

1. Not everyone in gnomAD is healthy.

As I mentioned earlier, more than half of the contributing projects are disease studies (especially cardiovascular disease studies). Although individuals with severe pediatric disease and their first-degree relatives are excluded, “some individuals with severe disease may still be included in the data sets, albeit likely at a frequency equivalent to or lower than that seen in the general population.”

2. Not everyone in gnomAD is young.

In fact, very few people are. According to a talk by the gnomAD team at ASHG 2017, the average age of a gnomAD individual is 54 years old.
Perhaps that should not be surprising given the types of disease cohorts that went into it, but it’s a rather advanced age, and it brings up another important caveat.

3. Some variants are somatic clonal mutations.

This came to attention a few years ago when researchers observed that gnomAD contained loss-of-function variants in key developmental genes that also happen to be frequently mutated in myelodysplastic syndrome.

A great example of this phenomenon is ASXL1, a gene that encodes a member of the Polycomb group of proteins, which are necessary for the maintenance of stable repression of homeotic and other loci. De novo loss-of-function mutations in ASXL1 cause Bohring-Opitz syndrome (BOS), a severe congenital malformation disorder, so it was initially puzzling to observe nonsense, frameshift, and splice site variants in some individuals in the gnomAD database, especially severe truncating mutations that would be expected to cause BOS if they were present in an embryo.

Closer inspection of the aligned sequencing data for individuals carrying such variants indicates that they’re probably mosaic alterations, i.e. present in only a subpopulation of cells. For example, two gnomAD individuals carry a variant at chr20-31021118-C-T (GRCh37) that encodes a nonsense change (p.Gln373Ter) classified as Pathogenic in ClinVar. Scroll down on the variant page and you can see the aligned sequence data:

[Figure: Mosaic ASXL1 nonsense variant in gnomAD]

Genome sequencing is generally done on DNA extracted from blood, and the older a person is, the more likely they are to have mosaic hematopoietic cell populations. The proliferative advantage conferred by ASXL1 mutations allows them to reach appreciable allele frequencies in blood cells. As a result, we observe a fair number of recurrent LOF variants, such as p.Arg417Ter (9 heterozygotes), p.Arg404Ter (7 heterozygotes), and p.Arg965Ter (3 heterozygotes), all of which appear to be mosaic clonal events.
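The allele-balance reasoning above can be made concrete with a simple binomial test. A germline heterozygous variant should be supported by roughly half of the reads at a site, whereas a clonal variant carried by only a fraction of blood cells will typically sit well below that. The sketch below is an illustration only, not gnomAD’s actual CHIP analysis: it computes the one-sided probability of seeing so few alternate reads if the true allele fraction were 0.5. The helper names and the 0.001 cutoff are hypothetical choices.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def flag_possible_mosaic(alt_reads: int, depth: int, alpha: float = 1e-3) -> tuple[float, float, bool]:
    """Return (VAF, one-sided p-value vs. 0.5, flagged?) for one heterozygous call.

    Hypothetical helper: a low p-value means the alternate allele is
    under-represented relative to a germline heterozygote, which is
    consistent with (but does not prove) a mosaic or clonal variant.
    """
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    vaf = alt_reads / depth
    p_value = binom_cdf(alt_reads, depth, 0.5)
    return vaf, p_value, p_value < alpha


# Hypothetical calls: (alternate reads, total depth)
for alt, depth in [(18, 40), (6, 40), (3, 35)]:
    vaf, p, flagged = flag_possible_mosaic(alt, depth)
    print(f"alt={alt:>2} depth={depth} VAF={vaf:.2f} p={p:.2e} possible_mosaic={flagged}")
```

Real data need more care than this: reference-mapping bias, low coverage, and sequencing error all shift allele balance, which is one reason the gnomAD team pairs allele balance with age data.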
The gnomAD team even added a note to the page for ASXL1:

“Analysis of allele balance and age data indicates that this gene shows evidence of clonal hematopoiesis of indeterminate potential (CHIP). The potential presence of somatic variants should be taken into account when interpreting the penetrance, pathogenicity, and frequency of assumed germline variants. For more information, see pages 37-40 of supplementary information for The mutational constraint spectrum quantified from variation in 141,456 humans and Pathogenic ASXL1 somatic variants in reference databases complicate germline variant interpretation for Bohring-Opitz Syndrome.”

This is one reason that the presence of a variant in gnomAD, even in a few individuals, should be considered carefully when evaluating its pathogenicity.

4. Many populations are underrepresented.

The individuals in gnomAD were not selected as representatives of world populations, but rather because they belonged to groups selected for large-scale sequencing projects. Here’s the breakdown by population for the two major versions of the database:

[Figure: Individuals in gnomAD by population for the two major releases]

Unsurprisingly, the majority are of Western European ancestry. There are also a large number of Finnish individuals, whose unique population history makes them especially valuable for genetic studies. Yet many other significant world populations are underrepresented.

Summary: gnomAD is great, use with appropriate caution

I’d intended to also talk about the TOPMed database, but since I’m already a thousand words in, I’ll have to save that for another time. In summary, the gnomAD database is a spectacular resource for human genetics. Like any resource, it is not without certain flaws.
Anyone using it should be aware of these caveats and account for them in their analyses.

Filed Under: Clinical Sequencing, Rare Diseases

Prenatal testing, rare diseases, and the New York Times
January 7, 2022 by dkoboldt

Some of the (many) recent advances in genetic testing are in the area of non-invasive prenatal testing, or NIPT. This form of genetic screening uses the blood of an expectant mother to screen for chromosomal abnormalities and other rare disorders in the fetus. It’s an area of intensive research at the moment, and also the subject of a high-profile article (probably paywalled) in the New York Times by Sarah Kliff and Aatish Bhatia. The story was published on New Year’s Day with the provocative title:

When They Warn of Rare Disorders, These Prenatal Tests Are Usually Wrong.

Essentially, it’s about the advent of NIPT in the United States, the companies that offer the tests, and the positive predictive value of the results. This excerpt captures the thrust of it:

In just over a decade, the tests have gone from laboratory experiments to an industry that serves more than a third of the pregnant women in America, luring major companies like Labcorp and Quest Diagnostics into the business, alongside many start-ups.

The tests initially looked for Down syndrome and worked very well.
But as manufacturers tried to outsell each other, they began offering additional screenings for increasingly rare conditions.

The grave predictions made by those newer tests are usually wrong, an examination by The New York Times has found.

The NYT’s “examination” included, among other things:

Pooling the results of five published studies on NIPT outcomes
Conducting interviews with genetic counselors and other healthcare professionals
Reviewing the marketing materials of several commercial NIPT offerings
Collecting personal stories from patients who received false positive results

On the Accuracy of Rare Positive Results

Notice the subtle and careful wording used in the headline and in the excerpt above:

“When they warn of rare disorders”
“The grave predictions made”

In scientific terms, both of these phrases refer to positive screening results, not all screening results. My response to the sensationalist claim that positive NIPT results are often false positives? Of course they are. This is the nature of screening. I’m reminded of it every time I go through a metal detector at the airport.

[Image credit: New York Times]

The excerpt I provided is not the start of the article. This is the NYT, and they know what they’re doing. The article opens with the story of a young pregnant woman whose NIPT returns a scary-sounding diagnosis for her unborn child (Prader-Willi syndrome) that turns out to be a false positive. It goes on to briefly summarize the rise of NIPT and its expansion to include increasingly rare disorders, especially ones associated with microdeletions.

The outcome, as illustrated by numerous graphics like the one at right, is that for every 15 times such tests correctly identify a genomic alteration, they are wrong 85 times. However, what none of these visually striking graphics tell you is that for every positive result there are thousands of negative results.
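The apparent contradiction between a low hit rate among positives and a large excess of negatives is Bayes’ rule at work. The sketch below computes positive predictive value (PPV), negative predictive value (NPV), and overall accuracy for a screen of a rare condition. The prevalence (1 in 20,000), sensitivity (99%), and specificity (99.97%) are hypothetical round numbers chosen for illustration, not the measured performance of any real NIPT product, and the function name is our own.

```python
def screening_metrics(prevalence: float, sensitivity: float, specificity: float) -> dict[str, float]:
    """Expected performance of a screening test across a population (illustrative)."""
    tp = prevalence * sensitivity              # affected and screen-positive
    fn = prevalence * (1 - sensitivity)        # affected but screen-negative
    fp = (1 - prevalence) * (1 - specificity)  # unaffected but screen-positive
    tn = (1 - prevalence) * specificity        # unaffected and screen-negative
    return {
        "ppv": tp / (tp + fp),                 # chance a positive result is real
        "npv": tn / (tn + fn),                 # chance a negative result is real
        "accuracy": tp + tn,                   # fraction of all results that are correct
        "negatives_per_positive": (tn + fn) / (tp + fp),
    }


# Hypothetical parameters for a screen targeting a rare microdeletion.
m = screening_metrics(prevalence=1 / 20_000, sensitivity=0.99, specificity=0.9997)
for name, value in m.items():
    print(f"{name:>22}: {value:.4f}")
```

With these made-up inputs, only about one positive in seven is real, even though negatives outnumber positives by nearly three thousand to one. Lowering the prevalence argument while holding specificity fixed shows how quickly PPV falls as a condition gets rarer.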
Put simply, the overall accuracy of NIPT is extremely high.

Genomic Alteration Detection 101: Size Matters

Chromosomal abnormalities are a common cause of syndromic birth defects. The most prevalent of these, trisomy 21 (the cause of Down syndrome), has a prevalence of 1 in 664 newborns according to Smith’s Recognizable Patterns of Human Malformation (8th edition). Naturally, whole-chromosome or chromosome-arm abnormalities are also among the easiest things to detect by NIPT. Even with the technical challenges involved, such events leave massive footprints in the genome. That combination of large size and relative prevalence is why early forms of NIPT that screened for trisomy 21 performed very well.

The term microdeletion comes from the field of cytogenetics, and it’s a bit deceptively named: it refers to genomic deletions that are too small to be identified by looking at chromosomes with light microscopy. Yes, these events are more challenging to detect by NIPT than losses or gains of entire chromosomes. Even so, such microdeletions can still be huge, i.e. millions of base pairs long, encompassing dozens or hundreds of genes. In the prenatal setting, they tend to cause severe syndromic disorders. Many such disorders are extremely rare, and as noted, they can be more challenging to detect accurately. So yes, false positives may be more likely. However, the NYT reporters gloss over one point rather quickly: a positive screening result is not a diagnosis. It’s a signal that more testing should be performed.

Population Screening Benefits and Consequences

In biology in general, false positives for extremely rare things are expected because of the signal-to-noise ratio. A classic example that illustrates this is the identification of inherited versus de novo variants in a child.

Every human carries about 4-6 million sequence variants relative to the reference sequence. The vast majority of those are inherited from one’s parents.
Most, in fact, are common in human populations because they arose a long time ago. In contrast, mutations that arise de novo in a child (i.e. are absent from the parents) occur at a rate of about 1e-08 per base pair per generation, making them extremely rare. On average, there are 50-70 such mutations genome-wide. When we sequence a family trio, we identify inherited variants with extremely high accuracy, i.e. >99.9%. Sure, there are a few hundred false positives, but there are many millions of true positives. The signal-to-noise ratio is very high. This changes when we look only for de novo mutations. Now it’s 50 true positives and 200 false positives: a very different signal-to-noise ratio.

However, and this is important, we don’t blithely call 200 de novo mutations in a child and assume they’re all real. With further scrutiny (e.g. using population databases and removing common artifacts), we can filter out most false positives and get to the correct number.

Positive results from NIPT also get follow-up. In the case of the woman introduced at the start of the NYT article, an amniocentesis later revealed that the fetus did not have Prader-Willi syndrome. And this, perhaps, is the biggest thing missed by the reporters: positive NIPT results are the beginning of a process. Many of those frightening positive results are later refuted by direct molecular testing or imaging studies. This does not mean we should stop screening altogether or demand that the screen yield perfect results. If it did, we wouldn’t do PSA screens or mammograms.

It’s like my airport analogy: most people who set off a metal detector are not carrying a weapon. That’s why TSA agents don’t open fire at the sound of a buzzer. When I set one off (which seems to be one of my talents in life), I empty my pockets and try again. Yet I don’t want them to stop checking people with metal detectors.

Emotional Trauma of False Positives

This is not to discount or ignore the emotional trauma that a positive screening result brings.
These are very real consequences for the families affected. If you’re reading this, you’re probably close to someone who got a scary-sounding result from a medical test (as I am). Even if unconfirmed, a possible diagnosis is usually terrifying.

Medical professionals take this risk quite seriously. It’s one of the major topics considered whenever population screening is discussed, one of the “costs” in a cost-benefit analysis. Yet we still do a lot of population screening because the benefits of early detection for many conditions outweigh the potential consequences.

Summary and Outlook for NIPT

I read the New York Times regularly, and I appreciate that they’re one outlet that produces in-depth articles, often about scientific and technical topics like genomics, that are backed by real reporting. I also acknowledge a lesson given to me by my AP English teacher in high school: good writing sometimes needs to take a stance on something. Yet I think this particular article has far too much of a negative slant. It does not, for example, comment on the fact that NIPT does detect true cases of genetic disorders, some of which are extremely rare. It also suggests that the only reason NIPT providers expanded their tests was to out-compete one another and make money, as if they were just another flavor of Silicon Valley biotech looking to make a profit.

The timing of this article is hardly accidental. The verdict in the Theranos trial (Elizabeth Holmes) has left a great many people, especially the wealthy investors who lost money, feeling quite uncharitable about technology firms that over-promise on medical testing. This NYT article is written to draw on the parallels: the promises made, the misleading marketing, the billions of dollars to be made. Yet there’s at least one key distinction: NIPT actually works.

Another distinction, and a nuance probably missed by the reporters, is that the customers of the tests are not really the pregnant mothers, but the clinicians who care for them.
NIPT is not a direct-to-consumer test like Ancestry or 23andMe. One reason that NIPT and other panel tests continue to expand is that medical professionals want them to. Yes, many of the conditions being tested for are individually quite rare, but collectively they are not. Furthermore, many of these conditions are actionable. Let me take one example that the reporters enjoyed highlighting in their colorful circle plot graphics.

[Image credit: New York Times]

1p36 deletion syndrome, also called monosomy 1p36, is the most commonly observed terminal deletion in the human population, with an estimated prevalence of 1 in 5,000. This syndromic condition comprises many common clinical features, including growth deficiency, brain malformations, seizures, craniofacial dysmorphism, congenital heart defects, and hearing and vision problems. Almost all patients have congenital hypotonia (low muscle tone), which is associated with feeding difficulties and developmental delays. Intellectual disability is also common, but variable in severity.

Most cases of 1p36 deletion are sporadic, meaning there’s no family history. Most patients survive well into adult life, but the severity of the disease and its ultimate effects vary widely. According to the Orphanet page for this condition:

“Management should be multi-disciplinary and include a regular follow-up. Early diagnosis and access to personalized rehabilitation therapies focusing on motor development, cognition, communication, and social skills are highly recommended.”

This is a severe disease with lifelong medical issues, but many of them can be managed. Congenital heart defects may require surgery. Seizures can be treated with standard anti-epileptic medications. Infantile spasms are responsive to corticotropin. Feeding and growth should be monitored, especially early in life.

There were 3.6 million babies born last year in the US alone; based on the prevalence estimate (3.6 million / 5,000), about 720 of them would be expected to have a 1p36 deletion.
Yes, based on the false positive rate, another 4,500 could receive a false positive result from NIPT, but the syndrome would be clinically obvious well before birth. And there are still 720 babies who would be correctly diagnosed. Given all of the potential benefits of early, multidisciplinary intervention, it seems like a non-invasive screening test is still a good idea.

Filed Under: Clinical Sequencing, Rare Diseases

Genome Reference: Moving to Build 38
March 1, 2021 by dkoboldt

This year marks the 20th anniversary of the publication of the human genome reference sequence. As I enjoy recounting to people outside of the genomics field, the investment required to complete that initial assembly was staggering: ten years, dozens of laboratories, hundreds of sequencing instruments, and a billion dollars. Today, using the latest next-generation sequencing, we can sequence a human genome in about two days for a few thousand dollars (yes, a “$1000 genome” is feasible, but only in terms of reagent costs and at centers that can sequence at factory scale).

The human genome reference, advances in sequencing technology, and many years of prolific disease gene discovery have facilitated the widespread adoption of genetic testing as a frontline diagnostic tool. Most single-gene, gene panel, exome, and whole-genome tests now use next-generation sequencing. They have another commonality as well: most rely on alignment of those sequencing reads to “build 37” of the human genome sequence, which dates back to 2009.

There is a newer, better assembly of the human genome: build 38, which has been available for... (checks watch)... more than seven years.
Build 38 offers several key advantages over its predecessor, as highlighted in 2017 by the Genome Reference Consortium:

Resolution of assembly errors and gaps associated with complex haplotypes and segmental duplications
Base-pair-level updates for sequencing errors
Addition of “missing” sequences, with an emphasis on paralogous sequences and population variation
Better sequence representation for certain difficult genomic structures, such as centromeres and telomeres

However, build 38 also comes with a significant cost: it changes the coordinates of genomic loci. In other words, a SNP’s position on build 37 is (most of the time) different on build 38. The same is true of genes and other annotations of the genome: everything has to be re-mapped to build 38.

LiftOver versus Remapping to Build 38

The UCSC Genome Browser has a useful tool, called liftOver, which allows one to convert coordinates between different versions of the reference assembly. You provide a BED file, select the genome assemblies to convert to and from, and it produces an output BED file with coordinates based on the desired assembly. LiftOver works *pretty* well when one is attempting to obtain the coordinates for regions that are accurately represented in both genome assemblies, i.e. when the only thing that has changed is the position number. That’s the case for ~95% of things one might need to convert. The problem is addressing the other 5% of loci that don’t have 1-to-1 unique map locations between assemblies.

For a possible liftOver use case, consider the gnomAD database, which contains variant allele frequencies from large human study cohorts (~120,000 exome sequences and ~18,000 genome sequences). Obviously, gnomAD and its exome-focused predecessor (ExAC) have proven exceptionally valuable when interpreting genetic variants because they offer a somewhat accurate estimate of worldwide prevalence [in the populations that are represented, at least].
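Why most positions lift over cleanly while some do not can be sketched with a toy version of the chain-file idea that liftOver relies on: a list of ungapped blocks aligned between the two assemblies, each with its own offset. Positions inside a block shift by that block’s offset; positions that fall in a gap have no counterpart and come back unmapped. The block coordinates and the `lift` function below are invented for illustration and are not real GRCh37-to-GRCh38 chain data; real chain files also handle strand changes, many chromosomes, and ambiguous matches.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block (0-based, half-open) between two assemblies."""
    old_start: int
    new_start: int
    size: int


# Invented blocks on a single hypothetical chromosome, sorted by old_start.
BLOCKS = [
    Block(old_start=0, new_start=100, size=1_000),        # shifted by +100
    Block(old_start=1_500, new_start=1_550, size=2_000),  # shifted by +50; old 1000-1500 has no match
    Block(old_start=4_000, new_start=3_900, size=500),    # shifted by -100; old 3500-4000 has no match
]
_STARTS = [b.old_start for b in BLOCKS]


def lift(pos: int) -> Optional[int]:
    """Map a 0-based old-assembly position to the new assembly, or None if unmapped."""
    i = bisect_right(_STARTS, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = BLOCKS[i]
    if pos >= block.old_start + block.size:
        return None  # position falls in a gap between aligned blocks
    return block.new_start + (pos - block.old_start)


positions = [10, 999, 1_200, 2_000, 3_600, 4_250]
lifted = {p: lift(p) for p in positions}
unmapped = [p for p, q in lifted.items() if q is None]
print(lifted)
print(f"{len(unmapped)}/{len(positions)} positions unmapped: {unmapped}")
```

The unmapped positions are the toy analogue of the ~5% of loci described above: the conversion is not failing at random, it is reporting places where the two assemblies disagree about what sequence exists.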
Not all of the individuals in gnomAD are perfectly healthy (after all, many of them were part of large cohort studies of common complex disease), but the curators excluded individuals with severe congenital disorders. Thus, an X-linked variant that’s hemizygous in dozens of males in gnomAD is unlikely to cause a rare X-linked recessive disorder.

As noted in the flagship gnomAD paper, the initial release yielded 14.9 million high-quality variants from the WES dataset and 229.9 million variants from the WGS dataset, all on build 37. A liftOver to build 38 is available for download. However, even if 95% of variants were successfully converted to GRCh38 coordinates using liftOver, we’d still be missing about 750,000 exome variants and 11.5 million genome variants. These variants, when they’re identified in patients and cohorts with sequence aligned directly to build 38, will [incorrectly] appear novel to gnomAD and thus be presumed quite rare in human populations. The result is a much-diminished signal-to-noise ratio for extremely rare variants in analyses that rely on liftOver data.

Key Genomic Resources Moving to Build 38

Many of the key resources for genome analysis and variant interpretation have fully embraced GRCh38. The ClinVar database and the UCSC Genome Browser, for example, now default to build 38 coordinate systems while providing backward compatibility with build 37. NCBI’s dbSNP database moved to GRCh38/hg38 with release 143 in March 2015. Many commercial tools are fully compatible with both genome assemblies, including VarSome, which we use in-house.

Other resources increasingly support build 38 analysis but are still in a transition period. Although the UCSC Genome Browser database defaults to GRCh38, anyone familiar with its annotation tracks will notice that many of them are still missing from the newer assembly. This is an unfortunate reality of having infrastructure entrenched in a certain genome version.
The best option would be to ask the groups who contributed key annotation tracks to regenerate their datasets on the new genome assembly. That’s a big ask, especially for groups like the ENCODE Project that have massive datasets on build 37 coordinates.

The vital gnomAD database, unfortunately, falls into the partial-transition category. The liftOver versions of gnomAD have been available for some time, and the latest release of gnomAD included a larger WGS dataset, all mapped to GRCh38 (it has other advantages too, like better representation of minority populations). However, the critical WES dataset of gnomAD has yet to be remapped, and the gnomAD team recommends sticking with the old version for analysis of coding regions. This is a problem, since that’s where most clinical variant interpretation happens, and it undoubtedly contributes to why many clinical labs are reluctant to make the switch.

The good news is that gnomAD plans to make another release, with WES data fully remapped to GRCh38, sometime in 2021. We can expect it in October, since the gnomAD team likes to make their big announcements around the time of the annual ASHG meeting.

Let’s Move to Build 38. Now, and Together

I firmly believe it’s time for the human genetics community as a whole to make a concerted effort to move to build 38 in 2021. In fact, I challenge everyone to do so. The more stakeholders (research groups, clinical laboratories, databases, and even journals) who embrace build 38 and consider it standard, the better. Yes, there are bound to be hiccups. Having gone through this transition before, I’d like to offer a few bits of advice:

Before, during, and after the transition, train yourself to indicate the assembly version whenever providing chromosomal positions or coordinates (in e-mails, Excel files, presentations, etc.). Make it a habit.

Establish shared resources for converting between coordinate systems, such as a local installation of the UCSC liftOver tool, and use a common nomenclature for files and folders that makes the genome version obvious.

Plan to conduct analyses on both build 37 and build 38 during the transition period, and systematically compare results to make sure that everything is working as it should.

There are other, more subtle differences that we’re likely to encounter as we move analyses to the new genome assembly, such as differences in read mapping performance given the better representation of duplicated sequences and alternate haplotypes. GRCh38 will have its quirks, and we’ll be better off if we muddle through them together.

Filed Under: Clinical Sequencing, Rare Diseases

New Insights into Human Gene Regulation
October 5, 2020 by dkoboldt

[Image: V. Altounian/Science, in collaboration with Christian Stolte (data: GTEx Consortium)]

Understanding the impact of genetic variants on observable traits is a fundamental goal of human genetics. Yet for the >98% of known sequence variants that reside outside of protein-coding sequences, this remains a significant challenge.

There is considerable evidence that noncoding variation can and does impact observable phenotypes. Genome-wide association studies, for example, have pinpointed thousands of loci that are associated with complex disease. Many of them are in noncoding regions or gene deserts. RNA sequencing studies of undiagnosed patients with Mendelian disorders, too, have uncovered causal variants well beyond the coding regions. Our recent study of an intronic variant that disrupts ATP7B splicing in Wilson disease is just one example.

Most geneticists recognize that noncoding variants are important.
Even so, interpreting them remains difficult because we have not yet deciphered the “regulatory code.” A bevy of papers from two large-scale international consortia, the ENCODE Project and the GTEx Consortium, has shed new light on regulatory sequences in the genome and their impact on gene expression. In this post, I’ll explore two papers from the latter project that offer fascinating insights into the human genetic regulatory code.

The GTEx Dataset

The GTEx project was launched in 2010 with the goal of cataloguing gene expression across a variety of human tissues using the emerging technology of massively parallel RNA sequencing (RNA-seq). Many gene hunters like myself have benefited from the patterns of gene expression across different tissues when considering potential new disease genes. Last month, the consortium published their latest atlas of gene expression variation in Science, which comprises 15,201 RNA-seq datasets representing 49 tissues from 838 postmortem donors. This approximately doubles the catalogue since the intermediate publication in 2017 (42 tissues from 449 donors).

Whole-genome sequencing was performed on all 838 donors as well, enabling the authors to search for relationships between sequence variation and gene expression differences between individuals. They identified a total of 43.1 million SNPs after quality control and phasing. That’s an impressive number, especially when one considers that there were only ~30.4 million human variants catalogued just a decade ago (dbSNP build 132, September 2010).

cis-eQTL Discovery in GTEx

The authors searched for variants associated with the expression of nearby genes (cis-eQTLs), finding that 4.23 million variants were associated with the expression level of at least one gene in at least one tissue. This is nearly half (43%) of common population variants (MAF>0.01) in the cohort. Some interesting findings about cis-eQTLs:

Most genes have at least one eQTL. Some 18,262 protein-coding genes (94.7%) and 5,006 long noncoding RNA genes (57.3%) had at least one significantly associated cis-regulatory variant.

Most cis-eQTLs had small effect sizes. However, about one in five (22%) had a greater-than-twofold effect on gene expression.

Discovery of cis-eQTLs saturates at ~1,500 genes in tissues with >200 samples. In other words, this study is extremely well-powered; only 200 individuals are required to discover all of the large-effect cis-eQTLs (which should be around 1,500).

cis-sQTL (Splicing) Discovery in GTEx

[Figure: GTEx Consortium, Science 2020]

The authors mapped variants associated with exon-intron splicing patterns of nearby genes (cis-sQTLs) using intron excision ratios from LeafCutter.

Splice-QTLs are pervasive. 12,828 protein-coding genes (66.5%) and 1,600 lincRNA genes (21.5%) had at least one sQTL in at least one tissue.

Cis-sQTL enrichment is confined almost entirely to transcribed regions. In other words, variants that affect splicing are located within the transcript (i.e. UTR, exon, splice region, or intron). This is somewhat intuitive when you think about it, but reassuring to see.

Variants in expected and unexpected places affect splicing. Splice acceptor, splice donor, splice region, and loss-of-function variants were most enriched for sQTLs. This again is somewhat expected. Yet there was also >2x enrichment for sQTLs among missense, synonymous, UTR, and intronic variants. See the figure at right.

Rare Variants That Regulate Genes

[Figure: Adapted from Figure 1, N. Ferraro et al, Science 369 (2020)]

Another article from the GTEx Consortium in the same issue of Science explores in depth the role of rare variation in driving transcriptomic signatures across tissues.
Using the WGS data for all 838 individuals, Ferraro et al evaluated how rare genetic variants contribute to:
Differences in gene expression (eOutliers)
Differences in allelic expression (aseOutliers)
Differences in splicing (sOutliers)
I chose to discuss this paper along with the main GTEx article because it’s arguably the most relevant to those of us working on the genomic basis of rare and pediatric diseases. Whereas the flagship findings above pertain largely to common (MAF>0.01) variants, which offer the statistical power to detect associations in a cohort of ~800 individuals, this study took a complementary approach. The authors identified individuals who were outliers with respect to gene expression, allelic expression, or splicing in the RNA-seq datasets, and then interrogated those individuals’ genomes for nearby rare variation that might explain the aberration. In other words, this is a study of extreme outliers whose unique patterns of gene activity could be due to rare, large-effect variants.
The authors prioritized outlier observations that were consistent across multiple tissues from the same individual, eventually identifying, per individual, a median of:
4 genes that were outliers for expression (eOutliers)
4 genes that were outliers for allelic expression (aseOutliers)
5 genes that were outliers for splicing (sOutliers)
Gene Outliers and “Suspect” Rare Variants
When the most stringent thresholds were applied to identify these outliers, most individuals (82-94%) harbored at least one rare variant in the outlier gene’s body or within 2 kb of it. I should point out here that the co-occurrence of outlier status and a rare variant is not direct evidence of causality. The authors use language such as “variants leading to any outlier status,” which, in my opinion, presumes a directly causal relationship that has not been proven. For example, it’s very possible that the nearest rare variant (RV) to an outlier gene in an individual has nothing to do with the gene’s outlier status.
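The outlier-first logic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Ferraro et al's actual method: for one gene, expression is z-scored across individuals within each tissue, each person's median z-score across tissues is taken (a simple way to reward consistency across tissues), and people whose absolute median reaches a threshold are flagged. The threshold of 2.0, the function names, and the data layout are hypothetical choices. The last function checks for a rare variant in the gene body or within 2 kb of it, the window mentioned above, assuming positions are on the same chromosome as the gene. A hit from that check shows co-occurrence only, not causation.

```python
from statistics import mean, median, pstdev


def zscores(values: dict[str, float]) -> dict[str, float]:
    """Z-score one gene's expression across individuals within a single tissue."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    if sd == 0:
        return {person: 0.0 for person in values}
    return {person: (v - mu) / sd for person, v in values.items()}


def multi_tissue_outliers(expr_by_tissue: dict[str, dict[str, float]],
                          threshold: float = 2.0) -> set[str]:
    """Flag individuals whose median z-score across tissues is extreme.

    expr_by_tissue maps tissue -> {individual: expression} for ONE gene.
    The 2.0 threshold is a hypothetical choice for illustration.
    """
    z_by_person: dict[str, list[float]] = {}
    for tissue_values in expr_by_tissue.values():
        for person, z in zscores(tissue_values).items():
            z_by_person.setdefault(person, []).append(z)
    return {p for p, zs in z_by_person.items() if abs(median(zs)) >= threshold}


def has_nearby_rare_variant(gene_start: int, gene_end: int,
                            rare_variant_positions: list[int],
                            flank: int = 2000) -> bool:
    """True if any rare variant lies in the gene body or within `flank` bp of it."""
    return any(gene_start - flank <= pos <= gene_end + flank
               for pos in rare_variant_positions)


if __name__ == "__main__":
    # Toy data: ten hypothetical people; "p9" is high in all three tissues.
    people = [f"p{i}" for i in range(10)]
    tissue = {p: (100.0 if p == "p9" else 10.0) for p in people}
    expr = {"muscle": tissue, "skin": dict(tissue), "blood": dict(tissue)}
    print(multi_tissue_outliers(expr))                        # {'p9'}
    print(has_nearby_rare_variant(10_000, 20_000, [21_500]))  # True
    print(has_nearby_rare_variant(10_000, 20_000, [22_500]))  # False
```

Taking the median rather than the mean keeps a single aberrant tissue from turning a person into a multi-tissue outlier.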
Even so, most of human genetics is about probabilities, so I think it is reasonable to say that a rare variant observed in or near a gene that is an outlier in a particular individual has a good probability of being contributory.
Interestingly, although RVs were observed near most outlier genes, the converse was not true. That is, a large proportion of genes with rare variants did not appear to be outliers, even for the most severe predicted consequences, such as loss-of-function variants. This is perhaps an unexpected and important finding. Even the most predictive categories, splice donor and splice acceptor variants, caused a splice outlier only 7.2% and 6.8% of the time, respectively. I hope nobody tells ACMG about this.
The Impact of Splice Region Variants
The relevance of variation in and around splice sites remains an area of vigorous debate, even within our lab. Most agree that variants in the canonical splice donor (the first two bases of the intron after an exon) and splice acceptor (the last two bases of the intron before the next exon) are the most likely to disrupt splicing, and that holds true in the GTEx dataset. To their credit, the authors explored the “relative risk” of splice disruption based on the enrichment of implicated rare variants across the wider splice region:
(Figure: from Figure 2, Ferraro et al, Science 2020)
On average, the relative risk for a rare variant in the canonical splice site was 195, and most of that signal comes from the -2 splice acceptor position (the “A” in “GT-AG”). This matches evolutionary conservation evidence. Interestingly, we do see strong enrichment for rare variants throughout the rest of the splice region, though not nearly at the same level. There’s an interesting spike at +6 bp into the intron that might be worth evaluating further.
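The positional reasoning in the splice region discussion can be made concrete with a small Python sketch. The first function labels an intronic variant by its distance from the exon-intron boundary: the first two intronic bases on the donor side (the “GT”) and the last two on the acceptor side (the “AG”) are canonical, and offsets 3 through 8 are called “splice region” here. That window is an assumption borrowed loosely from the intronic part of the splice-region definition used by annotation tools such as Ensembl VEP, not the paper's definition. The second function computes one simple enrichment-style relative risk: the fraction of splice outliers that carry a rare variant in a category, divided by the fraction of non-outliers that carry one. Ferraro et al's exact formula may differ, and the counts in the example are invented for illustration.

```python
def splice_category(offset: int, side: str) -> str:
    """Label an intronic variant by its distance from the nearest exon.

    offset: 1-based intronic distance from the exon-intron boundary. On the
        donor side, offset 1 is the "G" of GT (+1); on the acceptor side,
        offset 1 is the "G" of AG (-1) and offset 2 is the "A" (-2).
    side: "donor" or "acceptor".
    The 3-8 bp "splice region" window is an assumed convention.
    """
    if side not in ("donor", "acceptor"):
        raise ValueError("side must be 'donor' or 'acceptor'")
    if offset < 1:
        raise ValueError("offset must be a positive intronic distance")
    if offset <= 2:
        return f"canonical_{side}"
    if offset <= 8:
        return "splice_region"
    return "other_intronic"


def relative_risk(outliers_with_variant: int, outliers_total: int,
                  nonoutliers_with_variant: int, nonoutliers_total: int) -> float:
    """Carrier fraction among outliers divided by carrier fraction among non-outliers.

    Computed as one ratio of integer products to avoid intermediate rounding.
    """
    if min(outliers_total, nonoutliers_total, nonoutliers_with_variant) <= 0:
        raise ValueError("ratio undefined for empty groups or zero non-outlier carriers")
    return (outliers_with_variant * nonoutliers_total) / (
        outliers_total * nonoutliers_with_variant)


if __name__ == "__main__":
    print(splice_category(2, "acceptor"))   # canonical_acceptor (the -2 "A")
    print(splice_category(6, "donor"))      # splice_region (the +6 position)
    print(splice_category(50, "donor"))     # other_intronic
    # Invented counts: 20 of 100 outliers vs 1 of 1,000 non-outliers carry a variant.
    print(relative_risk(20, 100, 1, 1000))  # 200.0
```

Computing the ratio per position, rather than per category, is what would reveal a spike like the one at +6.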
I’ve seen plenty of splice region variants that don’t impact splicing at all (according to the RNA-seq data), but the enrichment data suggest that some of them probably do.
All told, it’s an interesting study, and the highlights I’ve shared here barely scratch the surface of the findings in the massive GTEx v8 dataset. It represents a huge body of work and an important step forward in our understanding of the human genetic regulatory code.
Filed Under: Genome Function
The wide phenotypic spectrum of BICD2 variants in dominant SMA
March 26, 2020 by dkoboldt
The identification of novel disease genes sometimes overshadows another crucial form of genomic discovery: expanding the phenotype associated with known disease genes. This is especially important in the era of pervasive clinical genetic testing.
Exome sequencing, which interrogates all ~20,000 protein-coding genes simultaneously, is rapidly becoming a frontline diagnostic test for patients with rare genetic conditions. The obvious advantage of exome sequencing is its comprehensiveness: the coding regions of virtually all genes are sequenced. From a clinician’s point of view, I imagine that has tremendous appeal. Yet the cost of this comprehensive approach is that it uncovers thousands of protein-coding variants in each patient.
Automated filtering can reduce the list to ~100-300 variants that are somewhat rare and predicted to affect a gene associated with at least one of the patient’s phenotypes. That’s still far too many to subject to full ACMG assessment. Thus, in our laboratory and many others, the decision of whether or not to assess a variant gives considerable weight to the overlap between the patient’s clinical features and the phenotype set of a genetic disorder.
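The two-stage triage described above, automated filtering followed by a phenotype-driven decision about what to assess in full, might look roughly like the following Python sketch. It is a hypothetical illustration, not our laboratory's pipeline: the `Variant` class, the 0.01 population allele-frequency ceiling, the 0.5 minimum overlap, the plain-text phenotype labels, and the gene names are all made up. Real workflows typically use a standardized vocabulary such as the Human Phenotype Ontology and richer similarity measures than the simple fraction used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    population_af: float  # allele frequency in a reference population (assumed field)


def phenotype_overlap(patient_terms: set[str], disorder_terms: set[str]) -> float:
    """Fraction of the patient's features that a disorder's phenotype set covers."""
    if not patient_terms:
        return 0.0
    return len(patient_terms & disorder_terms) / len(patient_terms)


def triage(variants: list[Variant], gene_phenotypes: dict[str, set[str]],
           patient_terms: set[str], max_af: float = 0.01,
           min_overlap: float = 0.5) -> list[Variant]:
    """Keep rare variants in genes whose disorder overlaps the patient's features,
    ordered from best to worst phenotype match. Thresholds are hypothetical."""
    scored = []
    for v in variants:
        if v.population_af > max_af:
            continue  # too common in the population to explain a rare disorder
        overlap = phenotype_overlap(patient_terms, gene_phenotypes.get(v.gene, set()))
        if overlap >= min_overlap:
            scored.append((overlap, v))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in scored]


if __name__ == "__main__":
    patient = {"joint contractures", "muscle atrophy", "hypotonia"}
    gene_phenotypes = {  # hypothetical gene-to-phenotype annotations
        "GENE_A": {"joint contractures", "muscle atrophy"},
        "GENE_B": {"seizures"},
    }
    candidates = [Variant("GENE_A", 0.0001), Variant("GENE_B", 0.0),
                  Variant("GENE_A", 0.2)]
    print(triage(candidates, gene_phenotypes, patient))
    # [Variant(gene='GENE_A', population_af=0.0001)]
```

Note how the outcome depends entirely on the contents of `gene_phenotypes`: if a gene's known phenotype set is incomplete, a causal variant can be filtered out here.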
The accuracy of this phenotype-driven triage depends on:
1. Thorough and precise phenotyping of the patient by expert clinicians
2. The expertise of clinical directors and variant scientists
3. Current knowledge of genotype-phenotype relationships for disease genes
Dependency #3 is the one most likely to change over time. This is supported by recently published studies of clinical WES reanalysis, which have consistently found that new knowledge (e.g. a newly identified disease gene) is the most common source of positive findings among previously negative WES cases.
A Patient with Severe Muscular Atrophy
The rare disease genomics study at our institution has been running for about four years. One of the first cases we enrolled was a child with arthrogryposis multiplex congenita, i.e. joint contractures affecting multiple parts of the body. This condition is believed to result from reduced intrauterine movement. In the case of our patient, the reason for that reduced movement was a striking lack of skeletal muscle.
We enrolled the patient and his parents on our research protocol and performed whole-genome sequencing. This uncovered a de novo variant in a gene called BICD cargo adaptor 2 (BICD2). Back in 2013, three studies of large family kindreds with dominant muscular atrophy affecting predominantly the lower limbs had linked the disorder to missense variants in BICD2.
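Sequencing the patient together with both parents is what makes a de novo call possible: a variant is a de novo candidate when the child carries it but neither parent does. A minimal Python sketch of that logic follows, assuming biallelic sites, genotypes encoded as alternate-allele counts (0, 1, or 2), and a hypothetical minimum read depth of 10 in all three samples. The `Call` type and the threshold are illustrative assumptions; production de novo callers also weigh genotype likelihoods, allele balance, and the possibility of parental mosaicism, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    genotype: int  # alternate-allele count: 0 (hom-ref), 1 (het), 2 (hom-alt)
    depth: int     # read depth at the site


def is_candidate_de_novo(child: Call, mother: Call, father: Call,
                         min_depth: int = 10) -> bool:
    """Child heterozygous, both parents homozygous reference, all samples well covered.

    Requiring adequate parental depth guards against calling a variant "absent"
    in a parent simply because too few reads covered the site.
    """
    well_covered = min(child.depth, mother.depth, father.depth) >= min_depth
    return (well_covered and child.genotype == 1
            and mother.genotype == 0 and father.genotype == 0)


if __name__ == "__main__":
    print(is_candidate_de_novo(Call(1, 35), Call(0, 40), Call(0, 38)))  # True
    print(is_candidate_de_novo(Call(1, 35), Call(1, 40), Call(0, 38)))  # False: inherited
    print(is_candidate_de_novo(Call(1, 35), Call(0, 4), Call(0, 38)))   # False: low depth
```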
The Online Mendelian Inheritance in Man (OMIM) database, the definitive resource for gene-disease associations, described that lower-limb-predominant condition as spinal muscular atrophy, lower extremity dominant, type 2 (SMALED2).
A dominant form of muscular atrophy could in theory describe our patient, but there were two problems:
All previously reported disease-causing variants were missense changes, whereas our patient had an inframe deletion of a single amino acid.
Our patient had severe atrophy throughout the body at birth, whereas patients with SMALED2 generally presented later in life, usually with only the lower limbs affected.
Because of these inconsistencies, we chased other leads in this case, none of which panned out. BICD2 remained our top candidate, but not everyone, myself included, was convinced that it was the answer.
Serendipitous Meeting at ASHG
In October 2017, I went to the annual meeting of the American Society of Human Genetics (ASHG). On the way to the meeting, I perused the program through the mobile app and searched the submitted abstracts for some of my topics of interest. One of those was BICD2, and the search came back with a hit: a poster from a researcher at Mount Sinai that described a patient with a de novo inframe indel in BICD2.
And it was the same indel we’d found in our patient, a deletion of a single amino acid: p.(Asn546del).
I met the poster’s author, an Ob/Gyn doing a fellowship in genetics. My main concern was that we might have the same patient. We quickly determined that this was not the case: her patient was several years older and female. Then we discussed the clinical features and determined that the overlap was significant. Same gene, same variant, similar clinical presentation in unrelated patients. That’s the holy grail of rare disease research. It was enough to publish, and (importantly) enough to convince me and my collaborators that BICD2 was the answer after all.
BICD2 Structure and Function
BICD2 itself is a fascinating gene.
It’s one of two human homologs of the fly gene Bicaudal D (BicD), so named because mutating it in Drosophila produces embryos with two tails, that is, mirror-image posterior ends, rather than a head at one end and a tail at the other. As a component of the dynein molecular motor complex, human BICD2 plays a pivotal role in intracellular transport. The protein’s three coiled-coil domains have specific binding partners:
(Figure 1. BICD2 binding partners and gene structure)
BICD2’s N-terminus stabilizes the dynein-dynactin complex. Direct visualization in live cells indicates that the complex alone is unable to interact with microtubules, but tethering BICD2’s C-terminus to different membrane cargoes induces their movement toward microtubule minus ends. As a cell prepares to divide, BICD2 switches its binding preference from RAB6A to the nucleoporin RANBP2, recruiting dynein to the nuclear pore complex and regulating dynein and kinesin to keep the centrosomes closely tethered to the nucleus prior to entering mitosis.
Many of the consequences of pathogenic BICD2 mutations in muscle cells are striking enough to be seen under a microscope. The most common mutation in SMALED2 (p.S107L) occurs in the first coiled-coil domain and appears to increase BICD2’s binding affinity for dynein. This causes the normally compact Golgi apparatus to disperse throughout the cell, a hallmark of impaired dynein function known as Golgi fragmentation.
New Information on BICD2 and Disease
Some new information emerged as I worked on the case report. First, the study coordinator discovered that our patient had recently passed away at the age of six. This was saddening, even though our finding would not have changed the outcome.
Second, I learned that there had been new reports on BICD2 in the intervening years, some of which described a more severe phenotype with onset in utero, associated with de novo mutations in BICD2.
At around the time I’d gone to ASHG in 2017, a group at the University of Cologne (Storbeck et al) published a study emphasizing the phenotypic extremes of BICD2 mutation carriers. Four of their five reported patients had also passed away at a young age. The last line of their abstract read:
“Our data define an additional severe disease type caused by BICD2 and emphasize a possibly variable etiology of BICD2-opathies with regard to primary muscle and neuronal involvement.”
Around the time we submitted our manuscript, a group in France published a case report of an inframe indel in BICD2 segregating in a family with non-progressive SMA. Now there was precedent for the variant type as well.
Our report in Molecular Case Studies added to the growing number of reports of patients with BICD2 variants who manifested a disease far more severe than the later-onset, lower-extremity-predominant SMA on record for BICD2. This was brought to the attention of the curators of the OMIM database, who revised their entry for BICD2 to recognize two associated disorders:
SMALED 2A (MIM #615290), the classic later-onset presentation affecting mainly the lower limbs
SMALED 2B (MIM #618291), the severe systemic disease with prenatal onset
Both conditions are caused by dominant missense or inframe variants in BICD2. The large family kindreds published in 2013 fall under type 2A. Our patient and others with de novo mutations generally fall under type 2B.
A Comprehensive Look at BICD2 Variants and Disease
New information continues to emerge as more patients are described in the literature or their variants are submitted to the ClinVar database by clinical laboratories.
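Collating BICD2 reports from the literature and ClinVar starts with sorting variants by type, since this case turned partly on missense versus inframe changes: our patient's p.(Asn546del) versus the classic p.S107L. The Python sketch below classifies a few common protein-level HGVS patterns. It is a deliberately simplified, hypothetical helper that handles only the forms shown (one- or three-letter amino acid codes, optional parentheses for predicted changes). It is not a complete HGVS parser; real curation work would use a dedicated HGVS library and check each consequence against the transcript.

```python
import re

# One- or three-letter codes for the 20 standard amino acids.
_AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
_AA = rf"(?:{_AA3}|[ACDEFGHIKLMNPQRSTVWY])"
_MISSENSE = re.compile(rf"^({_AA})(\d+)({_AA})$")


def protein_change_class(hgvs_p: str) -> str:
    """Roughly classify a protein-level HGVS description such as 'p.(Asn546del)'.

    Mixed one- and three-letter notation (e.g. 'Ser107S') is not handled.
    """
    change = hgvs_p.strip().removeprefix("p.").strip("()")
    if "fs" in change:
        return "frameshift"
    if change.endswith(("Ter", "*")):
        return "nonsense"
    if "delins" in change:
        return "inframe_delins"
    if "ins" in change or change.endswith("dup"):
        return "inframe_insertion"
    if change.endswith("del"):
        return "inframe_deletion"
    match = _MISSENSE.match(change)
    if match and match.group(1) != match.group(3):
        return "missense"
    return "other"


if __name__ == "__main__":
    for hgvs in ["p.(Asn546del)", "p.S107L", "p.Ser107Leu",
                 "p.Arg97ProfsTer23", "p.Gln50Ter"]:
        print(hgvs, protein_change_class(hgvs))
    # p.(Asn546del) inframe_deletion
    # p.S107L missense
    # p.Ser107Leu missense
    # p.Arg97ProfsTer23 frameshift
    # p.Gln50Ter nonsense
```

The frameshift check runs first because frameshift descriptions also end in a stop codon notation.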
As best I could tell, there were close to a hundred patients with pathogenic BICD2 variants described across a dozen publications and/or the ClinVar database.
I spent much of 2019 collating all of these reports and organizing them into the most comprehensive review to date of the genetic basis and phenotypic spectrum of BICD2 disease in humans. It was just published in Annals of Neurology, and I hope you’ll give it a read:
Koboldt DC, Waldrop MA, Wilson RK, and Flanigan KM. The Genotypic and Phenotypic Spectrum of BICD2 Variants in Spinal Muscular Atrophy. Ann Neurol. 2020 Apr;87(4):487-496. doi: 10.1002/ana.25704. PubMed: 32057122
Filed Under: Rare Diseases
Disclaimer: The views expressed on this site do not reflect the opinions of Nationwide Children’s Hospital or The Ohio State University.
Copyright ©2022 · Lifestyle Pro on Genesis Framework · WordPress