1,000 Genomes Project Has Passed 1,000 Sequences Mark
By Anette Breindl
The 1,000 Genomes Project has surpassed the 1,000 genomes mark. This week, the international consortium that is behind the sequencing effort reported on genome data from 1,092 individuals from 14 separate populations.
Not that this means their work is complete. The project, in a fine example of fuzzy numbers, actually plans to sequence 2,500 people altogether. The authors refer to their data, which were published in the Nov. 1, 2012, issue of Nature, as Phase I data. They plan to sequence an additional 1,500 individuals from 12 additional populations in the final phase of the project.
Pilot phase data, which consisted of whole-genome data from about 180 individuals plus the exomes, or protein-coding regions, of another 700, were published almost to the day two years ago. (See BioWorld Today, Oct. 28, 2010.)
The current study combines low-coverage whole genome sequencing – which is faster and cheaper, but also less accurate than higher-coverage sequencing – with higher-coverage sequencing of exomes, SNP sequencing and bioinformatics methods. The researchers used those methods to look at base pair variations, insertions and deletions in their subjects.
The combination of approaches, the authors wrote in their paper, "was shown by the pilot phase to be powerful and cost-effective in discovering and genotyping all but the rarest SNP and short insertion and deletion (indel) variants." They estimated that with their approach, they have been able to detect 98 percent of variation that occurs in at least 1 percent of the population.
At the press conference that announced the completion of the pilot phase, 1,000 Genomes co-chair David Altshuler had stressed that "we want to be very careful as a project not to suggest that this framework project is itself medical research. It is simply a tool; it is not being done in disease samples." (See BioWorld Today, Oct. 28, 2010.)
In the new paper, the authors pointed out two ways in which that tool currently is being used in medical genetics. "Data from the 1,000 Genomes Project," the authors wrote, "are widely used to screen variants discovered in exome data from individuals with genetic disorders and in cancer genome projects."
As the sample size of the 1,000 Genomes Project increases, so does the accuracy of such comparisons.
Another use of the data is for "imputing" genotypes in genomewide association studies. Such studies look at single-nucleotide polymorphisms but do not sequence whole genomes; the untyped parts of the sequence can be inferred because variants that are located near each other physically tend to be inherited together. The authors said that for high-frequency variants, such imputation now has an accuracy of between 90 percent and 95 percent, while for low-frequency variants the accuracy is between 60 percent and 90 percent.
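The idea behind imputation can be illustrated with a toy sketch in Python. It is not the consortium's method: real imputation software relies on statistical models of haplotype sharing, whereas this sketch substitutes the simplest possible rule, copying missing alleles from the closest-matching reference haplotype. The function name, the five-site reference panel and the observed sample are all made up for illustration.

```python
from typing import List, Optional


def impute_haplotype(
    observed: List[Optional[int]],
    reference_panel: List[List[int]],
) -> List[int]:
    """Fill untyped sites (None) by copying from the closest reference haplotype.

    Alleles are coded 0 or 1. This nearest-match rule is a deliberate
    simplification, assumed here only to show the principle.
    """

    def mismatches(ref: List[int]) -> int:
        # Count disagreements at the sites that were actually typed.
        return sum(
            1 for o, r in zip(observed, ref) if o is not None and o != r
        )

    # Pick the reference haplotype that agrees best with the typed sites.
    best = min(reference_panel, key=mismatches)

    # Keep typed alleles; borrow the rest from the best-matching haplotype.
    return [r if o is None else o for o, r in zip(observed, best)]


# Hypothetical reference panel: three fully sequenced haplotypes, five sites.
reference_panel = [
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
]

# Hypothetical study sample typed at sites 0, 2 and 4 only.
observed = [1, None, 0, None, 0]

imputed = impute_haplotype(observed, reference_panel)
print(imputed)
```

In this made-up case the typed sites match the second reference haplotype exactly, so the two untyped sites are filled in from it. A larger reference panel makes such matches more reliable, which is consistent with the point that accuracy improves as the project's sample size grows.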
The pilot phase published in 2010 suggested that individuals, on average, manage to get on with life quite well despite having hundreds of deleterious mutations. The new, larger dataset confirmed that finding.
Looking at transcription factor binding sites alone, the team found that typically "individuals carry 700-900 conserved motif losses . . . of which 18-69 are rare . . . and show strong evidence for being selected against."
The data also make apparent just how much local variation in genome sequences remains alive and kicking in the supposedly global age. The subjects in the current study came from distinct populations in Europe, Asia, Africa and the Americas. Those populations varied "substantially" in the profiles of both their rare and their more common variants, showing that much of human genetic variation arose relatively recently, in an evolutionary time frame.
In a prepared statement, Gil McVean, of Britain's Oxford University, said that "even just within the UK, Orkney islanders will have different variations from mainlanders, and will be different again from those from other nearby islands."
McVean, in fact, envisions sequencing on a scale that those with a less sanguine view of widespread sequencing might consider downright Orwellian. "In the future, we would like to reach the scale of having a grid of individuals giving us a different genome every couple of square kilometres," he said. "But there is a long way to go before we can make this a reality."