By David N. Leff
Two archrivals of genome sequencing, from industry and academia, faced off Monday at a press conference in Washington to present the near-total sequencing of the human genome.
The event, staged jointly by the journals Nature and Science, brought together the principal competing authors of the long-awaited project. Molecular geneticist Eric Lander, director of the Whitehead Institute Center for Genome Research in Cambridge, Mass., is principal author of an introductory 62-page paper in a 123-page edition of Nature, dated Feb. 15, 2001. Titled "Initial sequencing and analysis of the human genome," it presents the readout of the total DNA in a human. The Nature issue comprises 21 related articles in all, involving about 1,000 authors in six countries.
J. Craig Venter, president of Celera Genomics in Rockville, Md., is lead author of a 48-page paper - with 282 co-authors - titled: "The Sequence of the Human Genome." It tops a special 15-paper edition of Science, dated Feb. 16, 2001.
The bottom lines of these contending public-vs.-private strategies are surprisingly similar, Lander told BioWorld Today, in an interview before the press conference. "For one thing," he began, "let's start with the fact that the result turned out to be DNA sequence assemblies of the same length. The public project's assembly is 2.69 billion base pairs; Celera's, 2.65 billion - essentially the same.
"What's so interesting," Lander observed, "were the methods each of us employed. For the last three years there has been a running debate over whether or not our public human genome project was wasting taxpayer dollars by using a clone-by-clone, map-based approach, instead of Celera's whole-genome shotgun.
"Now, of course," he continued, "it emerges in their Science paper that Celera assembled its genome two ways: one, using their closely held whole-genome shotgun approach; the other, the map-based approach used by our public project, which was freely available to Celera. Their shotgun approach produced 120,000 freely floating components - far too many to localize. They weren't able to localize them.
"Celera moved on instead to our map-based approach," Landers went on, "which produced only 3,800 components that had to be localized. So after all of the Sturm und Drang," he pointed out, "it emerges that contrary to the press releases they put out last June, the Celera assembly clearly was based on the public map-based approach. Now I've got to say for the record," he added, "this does not mean that the whole genome shotgun is necessarily going to be a failure. It just means that in this case, it failed."
In Gene Array, Small Is Surprising
Both sides concurred on one point: The biggest surprise of their human genome census was the paucity of protein-coding genes. In recent months, geneticists have bandied about guesstimates running up to 100,000 genes and higher. Now the final published count ranges from 26,383 to 39,114. This sobering statistic puts Homo sapiens at less than thrice the 13,601 genes of the fruit fly Drosophila melanogaster, and only about six times the 5,570 of the bacterium Pseudomonas aeruginosa.
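The comparison can be checked with a few lines of arithmetic. The sketch below uses only the gene counts quoted above; the variable and function names are illustrative and come from neither paper.

```python
# Gene counts as quoted in the article.
human_low, human_high = 26_383, 39_114   # published range for Homo sapiens
fly = 13_601                             # Drosophila melanogaster
pseudomonas = 5_570                      # Pseudomonas aeruginosa


def fold(human: int, other: int) -> float:
    """Return how many times larger the human gene count is, to 2 decimals."""
    return round(human / other, 2)


# (low estimate, high estimate) of the human-to-organism ratio
ratios = {
    "fly": (fold(human_low, fly), fold(human_high, fly)),
    "pseudomonas": (fold(human_low, pseudomonas), fold(human_high, pseudomonas)),
}

for organism, (low, high) in ratios.items():
    print(f"human vs. {organism}: {low}x to {high}x")
```

Even at the high end of the range the human tally stays under three times the fly's, and "six times" the bacterium's sits near the middle of a spread that runs from just under five to about seven.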
"It looks like it's a lot smaller than we thought," Lander observed, "and that has implications. In addition, 30,000 or so proteins have emerged, which are going to be the list that the pharmaceutical industry is going to be using for all time forward. Because this is it," he emphasized. "There aren't any more drug targets. The lists are not complete; they're still imperfect, but this is the beginning of the end of drug-target discovery.
"Within a year or two," Lander went on, "we think these lists will be sufficiently well cleaned up that drug discovery from this time forth will be evaluating a list known to all players to figure out the right ways to tweak them. And that means you could take small-molecule drug candidates, and evaluate them for binding to every protein. The implications," he summed up, "are that you might be able to find targets for drugs of unknown mechanism of action, locate the likely causes of side effects, and engineer around them."
A frequently asked question is what specific human specimen the genome sequencers tapped for their paradigmatic DNA model. "As described in the Nature paper," Lander recounted, "there was unanimous concern that it should be completely anonymous, and therefore steps were taken to advertise. Many volunteers were taken, with fully informed consent. Their donated DNA samples were coded, so there was no way to go back to know who."
Lander added, "About 10 people's DNAs were looked at, and we made a set of libraries, which were distributed to the whole consortium. So while a number of different people were sequenced, the majority contributor from the international human project was an anonymous guy from Buffalo, N.Y."
Venter recalled, "Celera's recruitment of donors of DNA for sequencing was done via self-referral, newspaper ads and outreach activities to ensure ethnic diversity. Five individuals were chosen from our donor pool of 21 people."
To allow for individual genetic variation in the human gene pool, "from eye color and height to aging and cancer," Lander recounted, "our sequence was just shy of 1.5 million SNPs - single nucleotide polymorphisms - reflecting human variation. For that purpose we took 24 people from around the world, mixed their DNAs in equal amounts, did a tremendous amount of sequencing, and lined it up against the public sequence. Lo and behold, it gave rise to 1.42 million variants, nucleotide positions, where we know there is an alternative variation in the population.
"Any two people," he explained, "are 99.9 percent identical at the genetic level. So even though a good proportion of the sequence comes from one guy in Buffalo, if we had used a woman from Papua New Guinea, the difference would be less than one letter in a thousand."
Not By Humankind Alone
Both Nature and Science describe their published genome sequences, covering more than 90 percent of the known genome, as 'initial drafts.' "There are still many holes to fill in, a tremendous amount to do," Lander said. He counted the ways:
"There is the sequencing of other genomes, such as mouse and primate, which will shed tremendous light on the human condition. There is producing cDNAs for genes, and expressing their proteins. Completing the list of human genetic variation. Understanding the expression pattern and cellular localization of every gene. Making a small molecule that binds to every protein.
"Basically," Lander summed up, "the genome project was just a warm-up act for the infrastructure building needed for this century. It told us that by working together in a team, we could lay infrastructure that would be useful for tens of thousands of experiments, and hundreds of thousands of scientists. It's going to be a lot more efficient if we join together in public/private precompetitive consortia, and build infrastructure available to all. I think that companies are not wise to compete on these infrastructure. They should compete on therapies. Where companies will get real value added will be coming up with cures for disease.
"Fifteen years ago," he concluded, "it was just a nutty notion to imagine that we could ever be at this point where we would have the sequence of the human genome." n