LONDON – While large-scale biobanks that link genomics to longitudinal health records of diagnosis, treatment and outcomes promise to revolutionize understanding of the genetics of complex disease, the detailed statistical analysis of those high-dimensional data is still very much in its infancy.
Now researchers in the U.K. have devised a method for mapping genetic risk across different disorders and applied it to UK Biobank data to identify risk factors that are common to multiple diseases. They said the method will make it easier to track down underlying molecular mechanisms and biological pathways, increasing the value of genome-wide association studies (GWAS) in target identification and validation.
GWAS have proved their worth in identifying genetic variants associated with complex diseases, such as diabetes, lung cancer, multiple sclerosis and asthma. However, they have focused on a relatively small number of diseases, most usually drawing on data from patient cohorts with clear-cut diagnoses and uniform clinical symptoms.
The UK Biobank, with its samples and health data from 500,000 middle-aged volunteers who were healthy when they enrolled, makes it possible to take a disease-agnostic approach to the investigation of genetic associations, said Gil McVean, professor of statistical genetics and director of the Big Data Institute at Oxford University. McVean is lead author of a study on the cross-disease components of genetic risk, published in the Dec. 23, 2019, online issue of Nature Genetics.
“GWAS have been done for a range of diseases over the past 15 years and have given huge insights into their genetic basis. But typically, they look at one disease at a time. What UK Biobank allows us to do is look at all diseases simultaneously,” McVean told BioWorld.
The health care data UK Biobank holds on 320,644 participants of British Isles ancestry include 19,155 diagnostic terms relating to episodes of hospital treatment, coded according to the World Health Organization’s standardized International Classification of Diseases.
McVean and colleagues have devised a method for mapping genetic risk across the disease classification codes, which allows shared signals across related disease codes to be combined effectively, but also picks up patterns of risk in unrelated diseases.
The approach quantifies the evidence that a genetic variant has any effect on any disease classification code.
Previous studies have used biobank data to look at a few diseases together. For example, in September, researchers at Osaka University in Japan published a cross-trait analysis using data from 46,837 participants in Biobank Japan, in which they identified genetic associations showing there is shared pathophysiology between five distinct gynecological diseases.
Similarly, UK Biobank data previously have been used to investigate the genetic links between asthma and allergy, and between asthma and mental health disorders.
“But UK Biobank covers thousands of diseases and the issue is the lack of tools that can handle that scale,” McVean said. “One aspect of [our] research is technological: showing we have a methodology for doing statistical analysis of all phenotype data.”
It has been known for a long time that there is overlap in genetic risk, for example between hypertension and heart disease, or ankylosing spondylitis and multiple sclerosis, implying some level of shared causes.
The ability to systematically characterize the shared genetic basis of different diseases will lead to improved understanding of the relevant biological pathways, paving the way for improved clinical care and informing target discovery and validation.
“When you put lots of diseases together, you can see how genetic variants that affect disease fall into different clusters,” said McVean. The common genetic structure shared by different diseases helps to identify the underlying pathways, he said.
For example, the single nucleotide polymorphism (SNP) rs4420638 on the apolipoprotein E (APOE) haplotype is the strongest known genetic risk factor for Alzheimer’s disease and is also associated with cardiovascular disease and lipid metabolism disorders.
The researchers found that this SNP confers risk for 53 diagnostic terms recorded in hospital records of UK Biobank participants. Those included the to-be-expected categories “other diseases of the nervous system,” “disorders of lipoprotein metabolism” and “diseases of the circulatory system.” But the analysis also showed evidence of rs4420638 being protective in the category titled “diseases of the liver.”
That demonstrates the approach can reveal previously unrecognized associations, even in a gene such as APOE, where there has been detailed study of how it influences seemingly unrelated phenotypic traits.
Common diseases, distinct pathways
The cross-trait association patterns picked up in the study also uncovered distinctions between genes thought to affect similar biological pathways. For example, the SNP rs2289252 in the F11 blood-clotting factor locus, which is associated with deep vein thrombosis, has a restricted set of associations, whereas rs6025 in the F5 blood-clotting factor gene, also associated with deep vein thrombosis, has a much more diverse range of associations. In addition to vascular traits, those include pneumonia and allergic reactions to drugs.
Although both SNPs influence blood coagulation, the fact that their disease association profiles overlap only partially suggests they affect different biological pathways.
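To make the idea of partially overlapping association profiles concrete, the comparison can be sketched in a few lines of Python. This is an illustration only and not the researchers' statistical method: the two sets below hold just the associations named in this article, not the full profiles from the study, and the `jaccard` helper is a generic overlap measure chosen here as an assumption.

```python
# Illustrative only: each set lists just the associations mentioned in the
# article, not the complete association profiles reported in the study.
profiles = {
    "rs2289252 (F11)": {"deep vein thrombosis"},
    "rs6025 (F5)": {
        "deep vein thrombosis",
        "pneumonia",
        "allergic reactions to drugs",
    },
}


def jaccard(a: set, b: set) -> float:
    """Share of diagnostic terms common to both sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


f11 = profiles["rs2289252 (F11)"]
f5 = profiles["rs6025 (F5)"]

shared = f11 & f5        # terms associated with both SNPs
only_f5 = f5 - f11       # terms associated with the F5 SNP alone
overlap = jaccard(f11, f5)

print(f"shared: {sorted(shared)}")
print(f"F5 only: {sorted(only_f5)}")
print(f"Jaccard overlap: {overlap:.2f}")
```

An overlap well below 1 for two variants that act on the same broad process is the kind of pattern that, in the study, pointed to distinct underlying pathways.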
Across 3,025 SNPs in the study, 96.9% affect more than one diagnostic term in the hospital records.
Almost a quarter of individuals have at least one record of primary hypertension, which is the most common diagnostic term in the whole dataset.
The analysis found 27 distinct clusters strongly associated with hypertension, with a median of six SNPs each. Among the clusters, one affects hypertension only, eight are associated with type II diabetes, eight with high cholesterol and 17 with angina, heart attack or other acute or chronic ischemic disease. Four are associated with chronic renal failure, two with disorders of the gall bladder and bile duct and three with obesity.
The researchers note that this heterogeneity in risk profile among clusters is obscured by genome-wide measures of genetic correlation between traits. Their approach of finding clusters of variants can be used to generate hypotheses about what biological processes are modulated and can help in searching for causal relationships between different phenotypes, they said. In addition, by pooling information from different phenotypes, it will be possible to get a much more precise idea of the impact of modulating specific targets.
In clinical care, there are implications for individual patient risk prediction and, potentially, for diagnosis, prognosis and treatment.
For example, as the research shows, two individuals may have an identical risk for hypertension, but be very different in terms of the risks for potential co-morbidities.
“In the case of hypertension, two people might have the same risk through different pathways, for example kidney disease or an obesity-related profile. It’s not obvious they should be treated the same,” McVean said. “You can use this approach to understand why a person is at risk, or why they have got hypertension.”