BioWorld International Correspondent

LONDON - A newly identified type of variation in the human genome has huge implications for understanding the genetic causes of disease.

It turns out that individuals differ not only in the single-base changes known as single nucleotide polymorphisms (SNPs), but also in the number of copies they carry of large chunks of the genome.

A new study, published Nov. 23, 2006, in Nature, has provided a map of that copy number variation (CNV). Commenting on that work in a News & Views article in the same issue, Kevin Shianna and Huntington Willard of the Institute for Genome Sciences and Policy at Duke University in Durham, N.C., wrote: "The data suggest that the greatest source of genetic diversity in our species lies not in millions of SNPs, but rather in larger segments of the genome whose presence or absence calls into question what exactly is a 'normal' human genome."

The number of CNV regions identified by the study indicates, Shianna and Willard noted, that there is between five- and 10-fold more variation between any two randomly chosen genomes than previously suggested by studies of SNPs alone.

Carried out by an international consortium of researchers, the Copy Number Variation Project aimed to map the gains and losses of large chunks of DNA sequence consisting of between 10,000 and 5 million letters. The group used DNA from 270 individuals from four populations: 30 parent-offspring trios of the Yoruba from Nigeria, 30 parent-offspring trios of European descent from Utah, 45 unrelated Japanese from Tokyo, and 45 unrelated Han Chinese from Beijing. (Those samples are known as the HapMap collection.)

The researchers defined CNV as a DNA segment that is 1 kilobase or larger and present at variable copy number in comparison with a reference genome.

Their paper is titled "Global variation in copy number in the human genome."

Using microarray-based genome-scanning techniques capable of finding changes at least 1,000 bases long, they found an average of 70 CNVs, averaging 250,000 nucleotides in size, in each DNA sample. In all, the group identified 1,447 different CNVs that collectively covered about 12 percent of the human genome, and 6 percent to 19 percent of any given chromosome, making the phenomenon far more widespread than anyone previously had thought.

Matthew Hurles, one of the project's leaders at the Wellcome Trust Sanger Institute, in Hinxton, UK, said: "One of the real surprises of those results was just how much of our DNA varies in copy number. The copy number variation that researchers had seen before was simply the tip of the iceberg, while the bulk lay submerged, undetected. We now appreciate the immense contribution of this phenomenon to genetic differences between individuals."

But what will the discovery mean for medical research and the search for genetic causes of disease?

Charles Lee, one of the project leaders from Brigham and Women's Hospital and Harvard Medical School, both in Boston, said: "Many examples of diseases resulting from changes in copy number are emerging. A recent review lists 17 conditions of the nervous system alone, including Parkinson's disease and Alzheimer's disease, that can result from such copy number changes."

CNVs, researchers believe, could influence the amount of messenger RNA produced by the affected gene, and thus the amount of protein it produces, providing a mechanism by which the phenomenon could influence susceptibility to various diseases.

Studies already have shown that the number of copies of a gene can vary and can be linked to the presence or absence of disease. Aside from the well-known case of rearrangements of the globin genes leading to alpha-thalassaemia, more recent studies have identified many other examples.

Last year, for instance, Sunil Ahuja from the University of Texas Health Science Center in San Antonio reported that HIV-positive people had fewer copies of a gene called CCL3L1 than HIV-negative subjects from the same geographical area. Each additional copy of the CCL3L1 gene reduced the risk of HIV infection by 5 percent to 10 percent, and slowed disease progression following infection.

The Wellcome Trust Sanger Institute and its partners already have developed a database of CNVs associated with clinical conditions. The database, called DECIPHER, allows researchers around the world to submit clinical information about patients, together with details of their CNVs, over the Internet. That patient information is then mapped onto the human genome.

A companion paper, published online by Nature Genetics, described how members of that same consortium compared the two genome maps produced by Celera Genomics Inc., of Rockville, Md., and the public Human Genome Project. Its title is "Genome assembly comparison identifies structural variants in the human genome."

The study found thousands of differences. Commenting on that paper, Stephen Scherer, a geneticist at the Hospital for Sick Children and the University of Toronto, said, "Other people have [compared the two human genome sequences], but they found so many differences that they mostly attributed the results to error."

Personalized genome sequencing, for individualized diagnosis, treatment and prevention of disease, is not far off, he observed. "This paper helps us think about how complex it will be," he concluded.