LONDON – A vast new body of genomics research has identified thousands of rare genetic variants that are predicted to cause loss of function in protein coding genes, providing novel in vivo models of human gene inactivation.
The proof of concept for using those in vivo models as proxies for human gene knockouts, in research and in evaluating drug targets, is described in a collection of seven papers published online in the Nature journals on May 27, 2020, by researchers involved in the Genome Aggregation Database (Gnomad) consortium.
Working from a collection of DNA sequences of the protein-coding exome of 125,748 individuals and the whole genome sequences of 15,708 others, the researchers identified 443,769 variants that they predict will cause loss of function, a huge increase on the number of such variants discovered to date.
They then classified each protein-coding gene according to its tolerance to inactivating variants – that is, how likely a gene is to cause significant disease when disrupted by genetic changes.
The classification was validated using data from model organisms and engineered human cells, to show how that tolerance spectrum can be used to improve the power of gene discovery in both common and rare diseases.
As one example, the researchers found that rare variants in genes that do not tolerate loss of function occurred more often in people who have intellectual disability or autism spectrum disorder than in people who do not.
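To make the idea of a tolerance spectrum concrete, the sketch below ranks genes by the ratio of loss of function variants actually observed to the number that would be expected by chance. This is a minimal illustration only: the article does not spell out the consortium's scoring method, and the gene names, counts and the `oe_ratio` helper are all made up for the example.

```python
def oe_ratio(observed: int, expected: float) -> float:
    """Ratio of observed to expected loss of function variants in a gene.

    A low ratio means far fewer inactivating variants are seen than chance
    would predict, suggesting the gene does not tolerate inactivation.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Hypothetical genes: (observed count, expected count). Not real data.
genes = {
    "GENE_A": (2, 40.0),
    "GENE_B": (18, 20.0),
    "GENE_C": (9, 30.0),
}

# Sort from least tolerant (lowest ratio) to most tolerant (highest ratio).
ranked = sorted(genes, key=lambda name: oe_ratio(*genes[name]))

for name in ranked:
    observed, expected = genes[name]
    print(f"{name}: observed/expected = {oe_ratio(observed, expected):.2f}")
```

In this toy ranking, the gene at the top of the list is the one where inactivating variants are most depleted, which is the kind of gene the researchers found to be enriched for rare variants in people with intellectual disability or autism spectrum disorder.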
“What Gnomad is trying to do is to look into diversity; getting as much sequence data as [possible] to see what natural variation exists in the population,” said Nicola Whiffin, of Imperial College London, a member of the Gnomad consortium and lead author of two of the papers. “We are only just beginning to be able to do that on a large scale. Many variants are very rare, so we need a lot of individuals to be able to see them,” she told BioWorld.
Those first major studies of human genetic variation in Gnomad set the scene for a human “knockout” project, which would attempt to discover the phenotypic consequences of disruptive loss of function variants of all human protein coding genes. That would require whole genome sequences, and consistently collected in-depth phenotype data from millions of people, to make it possible to directly link gene-disrupting variation to human biology, the researchers said.
“The Gnomad catalogue gives us our best look so far at the spectrum of genes’ sensitivity to variation, and provides a resource to support gene discovery in common and rare diseases,” said Konrad Karczewski, computational biologist in the Broad Institute’s analytical and translational genetics unit at MIT, and lead author of the flagship paper summarizing the body of research.
The potential value of such a resource to drug discovery is illustrated by one of the current best known examples, the PCSK9 gene, in which loss of function variants have been causally linked to low levels of low-density lipoprotein cholesterol. That observation led to the development of drugs that inhibit the gene, which are used to reduce cardiovascular disease risk.
“A systematic catalogue of potential loss of function variants in humans and the classification of genes along a spectrum of tolerance to inactivation would provide a valuable resource for medical genetics, identifying disease-causing mutations, potential drug targets, and windows into the normal function of many currently uncharacterized human genes,” the researchers said.
Whiffin said the huge catalogue of normal variation in Gnomad already is proving its worth in rare diseases. “All clinical geneticists automatically refer to Gnomad when they get sequence data from a [rare disease] patient to see if variants are common, in which case [they] won’t be the basis of a diagnosis,” she said.
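The filtering step Whiffin describes can be sketched in a few lines: discard any patient variant that is common in a reference catalogue, and keep those that are rare or absent. This is an illustrative sketch, not the consortium's or any clinic's actual tooling; the `Variant` class, the `filter_rare` function, the variant identifiers and the 0.1% frequency cutoff are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    variant_id: str  # hypothetical identifier
    # Allele frequency in the reference catalogue; None if never observed there.
    population_af: Optional[float]


def filter_rare(variants: List[Variant], max_af: float = 0.001) -> List[Variant]:
    """Keep variants that are absent from the catalogue or no more common than max_af."""
    return [
        v for v in variants
        if v.population_af is None or v.population_af <= max_af
    ]


# Made-up patient variants, for illustration only.
patient_variants = [
    Variant("var_1", 0.12),     # common in the population: unlikely to explain a rare disease
    Variant("var_2", 0.00002),  # very rare: stays on the candidate list
    Variant("var_3", None),     # never seen in the catalogue: stays on the candidate list
]

candidates = filter_rare(patient_variants)
print([v.variant_id for v in candidates])
```

The right cutoff in practice depends on the disease and its inheritance pattern, which is why it is left as a parameter here.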
An important advance described in one of the papers is a catalogue of structural variants, including duplications, deletions and inversions, which at 50 bases or longer are difficult to identify with typical short-read DNA sequencing. They are nonetheless known to be capable of having a bigger influence on physiological traits and disease than the single nucleotide polymorphisms on which much current understanding of genetic variation rests.
A group led by Michael Talkowski at the Center for Genomic Medicine at Massachusetts General Hospital, Boston, devised a technique for detecting structural variants from short-read sequences on a population scale. Applying the technique to 14,891 whole genome sequences in Gnomad, Talkowski's group drew up a list of 433,371 structural variants. The variants represent most of the major known classes of structural variation and form the largest such collection to date.
“Structural variants are notoriously challenging to identify within whole genome data, and have not previously been surveyed at this scale,” Talkowski said. “But they alter more bases in the genome than any other form of variation and are well-established drivers of human evolution and disease.”
The researchers found that at least 25% of all rare loss of function variants in the average individual genome are actually structural variants, and that many people carry what on the face of it appear to be harmful structural alterations, but without the phenotypes or clinical outcomes that would be expected.
It was also noted that many genes are just as sensitive to duplication as to deletion, and that from an evolutionary perspective gaining one or more copies of a gene can be just as undesirable as losing one.
A great deal has been learned in compiling the catalogue of structural variants, Talkowski said. “But we’ve clearly only scratched the surface of understanding the influence of genome structure on biology and disease.”
Evaluating drug targets
One of the papers in the collection sets out a framework for using loss of function variants to evaluate drug targets, improve the clinical interpretation of genetic variants and investigate specific loss of function variants in more detail.
Whiffin applied those principles to LRRK2, a gene in which gain of kinase function variants are known to significantly increase the risk of Parkinson’s disease.
That suggests inhibiting LRRK2 activity is a promising approach and, indeed, Denali Therapeutics Inc., of South San Francisco, has two Parkinson’s candidates in its pipeline aimed at that gene.
However, some animal models have raised concerns about on-target toxicity in the liver, lungs and kidneys.
A systematic analysis of sequence data identified 1,455 individuals with loss of function variants in LRRK2 that led to reduced production of the kinase.
“Looking at naturally occurring variants that lower the amount of protein in an individual, what is the effect? When we looked at the phenotypes, there was no lung, liver or kidney involvement, even though in model organisms knockouts get lung, liver and kidney problems,” Whiffin said.
“We’ve catalogued large amounts of gene disrupting variation in Gnomad,” said Daniel MacArthur, scientific lead of the Gnomad project, formerly of the Broad Institute and now director of the center for population genomics at the Garvan Institute of Medical Research in Sydney, Australia. “With these two studies, we’ve shown how you can then leverage those variants to illuminate and assess potential drug targets,” he said.