BioWorld International Correspondent
LONDON - Research groups around the globe are poised to apply the latest remarkable information on how the human genome is organized to parts of the genome that researchers know are linked with specific diseases.
Such new avenues of study inevitably will follow the publication of the pilot project of the ENCODE consortium - a detailed analysis of the entire activity of 1 percent of the genome.
Results published in the June 14 Nature described in detail which regions of the genome are copied actively in the cell, where the elements that control gene activity are located, and how DNA-associated proteins are related to gene activity and DNA replication.
The findings are astonishing. Tim Hubbard, from the Wellcome Trust Sanger Institute in Hinxton, UK, said, "The new view transforms our view of the genomic fabric. The majority of the genome is transcribed into RNA. . . This is a remarkable finding, since most prior research suggested only a fraction of the genome was transcribed."
Not only that, but the RNA transcripts overlap known genes, and many are found in what were previously thought to be gene "deserts."
Early studies of gene activity in bacteria indicated that the regions that controlled genes usually were located at or near the sites where gene transcription started. Yet the findings of the ENCODE project include descriptions of many previously unknown control regions, and the discovery that control regions are just as likely to be beyond the end of the gene they control.
Manolis Dermitzakis, investigator at the Wellcome Trust Sanger Institute and a corresponding author on the paper, said alterations in control regions increasingly are thought to be of significance for human disease. "For the first time, we can see DNA sequence variation in the context of the biochemical workings of the cell. We can now begin to unravel the consequences that such variations hold for individuals and their susceptibility to disease," he said.
Dermitzakis told BioWorld International, "Researchers will now be able to put together information such as that published in Nature . . . on the susceptibility loci for seven common diseases, with the information on how we conducted the ENCODE studies, to plan projects that will tell them about the function of these susceptibility regions." (See BioWorld International, June 7, 2007.)
Although the whole picture of what is going on in those regions is now more complex than most previous models of genomic activity had predicted, he added, knowing the function of the various components of such DNA regions can only help to advance our knowledge about the genetic causes of disease.
He gave two examples. In a monogenic disease, it is often the case that some people clearly have the symptoms of the disease, yet sequencing the relevant protein-coding gene fails to identify any mutation that could be responsible. "Knowing what we know now, we can now say that the cause may lie in a variant in a region nearby that controls the activity of the protein-coding gene. By preventing transcription of the protein, this could have the same effect as, say, a mutation that results in a truncated and nonfunctional protein," Dermitzakis said. "We can now look for such variants, outside the protein-coding gene, in an informed way."
Many diseases also have been linked to single nucleotide polymorphisms (SNPs). Yet it is almost certain, Dermitzakis said, that in the current studies, a SNP that is statistically associated with a particular genetic disease or genetic predisposition to a disease is not itself responsible for the condition. "It is much more likely that there is some variant that is inherited alongside the region containing the SNP," he said.
Dermitzakis predicted that it might take three or four years to complete the ENCODE project for the entire genome. The ENCODE consortium is organized by the National Human Genome Research Institute in the U.S., and involves 35 groups from 80 centers around the world.
The title of the Nature paper is: "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project."
Members of the ENCODE consortium also published 28 papers in the June issue of Genome Research. For example, Alexandre Reymond and colleagues from the University of Lausanne, Switzerland, set out to annotate all 399 protein-coding genes in the ENCODE regions.
They found that more than half of the genes produced transcripts that contained sequences mapping outside of the known boundaries of these genes. Those transcribed sequences often overlapped with other genes, and were often located a long way from the main portion of the protein-coding sequence.
"Our results modify our current understanding of the architecture and regulation of protein-coding genes," Reymond explained. "Furthermore, some sequence polymorphisms hitherto considered to be located in noncoding regions may ultimately be related to disease."