If the Olympic Games included an event for gene sequencing, a bi-national consortium of 55 cloners would have won the gold medal for the longest unbroken stretch of DNA on record. "2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans," reads the title of their paper in today's issue of Nature.
The team members, all listed under the title, included 33 researchers from the British Medical Research Council's molecular biology lab at Cambridge University (plus one visiting scientist from France) and 21 from Washington University's genome sequencing center in St. Louis.
Working separately but in close consultation for three years, the British and American groups spliced their respective sequences by computer late last year and counted 2,181,032 base pairs in the central gene cluster of the nematode's six-chromosome genome.
The silver medal would have gone to the 315,316 base pairs of Saccharomyces cerevisiae's chromosome III, sequenced in 1992, and the bronze to the Vaccinia virus' complete genome of 191,737 base pairs, sequenced in 1990.
Meanwhile, work on sequencing the human genome has reached 73 kilobases of the beta-globin region, noted Stephen Oliver of the Manchester, England, Biotechnology Center, who wrote a commentary in Nature on the record-breaking C. elegans sequence. It uses complementary rather than genomic DNA, in view of the human genome's estimated 3 billion base pairs.
A single base measures 3.4 Angstrom units, so if the 2.2 megabases were uncoiled and stretched out, they would extend about 0.74 millimeter, somewhat less than the length of a single adult Caenorhabditis elegans roundworm (roughly 1 millimeter). Once that nematode's entire 100-Mb genome is sequenced, it would reach some 3.4 centimeters.
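The length arithmetic can be checked directly. The short Python sketch below multiplies a base-pair count by the 3.4 Angstroms per base quoted in the article; the function and constant names are illustrative, not from the Nature paper.

```python
# Stretched-out length of a DNA sequence, using the article's figure of
# 3.4 angstroms per base. Names here are illustrative assumptions.
ANGSTROMS_PER_BASE = 3.4
ANGSTROM_IN_MM = 1e-7  # 1 angstrom = 1e-10 m = 1e-7 mm


def stretched_length_mm(base_pairs: int) -> float:
    """Return the end-to-end length in millimeters of `base_pairs` bases."""
    return base_pairs * ANGSTROMS_PER_BASE * ANGSTROM_IN_MM


contig_mm = stretched_length_mm(2_181_032)    # the record contig
genome_mm = stretched_length_mm(100_000_000)  # the estimated 100-Mb genome

print(f"contig: {contig_mm:.2f} mm")           # about 0.74 mm
print(f"whole genome: {genome_mm / 10:.1f} cm")  # about 3.4 cm
```

At that scale the 2,181,032-base contig comes to roughly three-quarters of a millimeter, and the full 100-Mb genome to about 3.4 centimeters.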
Washington University geneticist Richard Wilson, first author of the Nature paper, predicted that the Anglo-American collaborative project will complete sequencing of the remaining 97 percent of the nematode's genome before the end of 1998. Wilson, who is co-principal investigator of the St. Louis arm of the project, told BioWorld that since doing the first 2.2 Mb, his group has gone on to come "pretty close to four contiguous megabases." Cambridge, he surmised, has probably extended that another three.
Wilson emphasized that the record sequencing feat is not just a tour de force, but has considerable utility; he enumerated two "take-home" points. "First, it demonstrates that it is now feasible to do large-scale genome sequencing on the order of somewhere between 1 million and 10 million base pairs a year. Second, you can uncover a tremendous number of previously undiscovered genes by genomic sequencing," he said. "A lot of those genes should be very similar to genes in humans, so that may provide a new way to find previously undiscovered genes," he added.
Bearing out this point, the Nature paper lists 198 of the 483 genes predicted in the 2.2-Mb C. elegans contig, and notes their sequence homology with other genomes, from yeast and fruit fly to rodent, bovine and human. The nematode, it notes, has "smaller and fewer introns, thus simplifying the identification of previously undiscovered genes."
"One of the things you have to consider," Wilson said, "is that you need model organisms like C. elegans to go about analyzing human sequences, finding where the genes are. The significance of this paper is that you probably can take big, interesting regions of the human genome, for instance, parts of the X chromosome, and sequence those entirely."
The nematode's total genome is available in a set of cosmid clones. (A cosmid, Wilson explained, is essentially the unit for large-scale sequencing these days.) Cosmids are bacterial cloning vectors, each of which accepts a foreign DNA insert of about 40,000 bases.
The project sequencers took these cosmids and chopped them into random fragments that they then subcloned in a bacteriophage vector, serving as the sequencing substrate. "For each of these cosmid clones," Wilson continued, "we sequenced six, seven or eight hundred random 'phage' clones. Then the computer reassembled these back into the original cosmid."
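The reassembly step Wilson describes can be illustrated with a toy example. The article does not say which assembly software or algorithm the project used, so the Python sketch below substitutes a simple greedy overlap merge, purely as an illustration: it repeatedly joins the two fragments whose suffix and prefix share the longest match. The function names, the `min_len` threshold and the sample reads are all hypothetical, and real assemblers must also cope with sequencing errors, repeats and reverse-complement reads, which this sketch ignores.

```python
# Toy greedy overlap assembly. This is NOT the project's actual method,
# only a minimal illustration of rebuilding a sequence from overlapping reads.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads pairwise by best overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads wholly contained in another read.
    contigs = [r for r in contigs
               if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs)
                if idx not in (best_i, best_j) and c not in merged]
        contigs = rest + [merged]
    return contigs


# Hypothetical 14-base "clone" covered by three overlapping 8-base reads,
# supplied out of order as random subclones would be.
reads = ["ACGTTAGC", "ATGCGTAC", "CGTACGTT"]
assembled = greedy_assemble(reads)
print(assembled)
```

With several hundred random reads per 40,000-base cosmid, each base is covered many times over, which is what lets overlaps of this kind tie the fragments back together.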
The work was funded jointly by the National Institutes of Health National Center for Human Genome Research and Britain's Medical Research Council. Wilson estimated the three-year NIH grant at some $6 million. At present, the cost per base is about a dollar, he said, "and we can already see it going below that."
Commenting on the super-contig, molecular biologist and biochemist Ellson Chen, director of the Advanced Center for Genetic Technology at Applied Biosystems/Perkin-Elmer Corp., told BioWorld: "On the one hand, it's encouraging to do a couple of million bases, so we might be able to see things that we were unable to see before. You can now achieve a 4-Mb level. That's the positive side."
Turning to the "slightly negative" other hand, he added, "It took the group some two years to get this far. We must improve our technology still further if we want to even talk about sequencing the entire human genome, with its 3 billion bases."
But geneticist Eric Lander, director of the Whitehead/MIT Center for Genome Research in Cambridge, Mass., told BioWorld, "It indicates that total genome sequencing is becoming a very important reality, which I think is a surprise to many people."
The 2.2-Mb C. elegans sequence reported in Nature, he said, "is really 3 percent of a major metazoan organism. This is on the order of one-tenth of a percent of the human genome. That's already an order of magnitude over last year. If we can keep this up for the next two years, that's 1 percent of the human genome in a year. It begins to build up."
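Lander's round percentages can be compared with the figures given elsewhere in the article. The sketch below uses the 2,181,032-base contig, the estimated 100-Mb nematode genome and the estimated 3-billion-base human genome; the variable names are illustrative.

```python
# Share of each genome represented by the record contig, using the
# genome sizes quoted in the article (both are estimates).
CONTIG_BP = 2_181_032
WORM_GENOME_BP = 100_000_000
HUMAN_GENOME_BP = 3_000_000_000

worm_pct = 100 * CONTIG_BP / WORM_GENOME_BP    # percent of C. elegans genome
human_pct = 100 * CONTIG_BP / HUMAN_GENOME_BP  # percent of human genome

print(f"C. elegans: {worm_pct:.1f}%")  # about 2.2%
print(f"human: {human_pct:.2f}%")      # about 0.07%
```

The contig is a little over 2 percent of the worm genome and about 0.07 percent of the human genome, in line with the "3 percent" and "one-tenth of a percent" that Lander cites as round numbers.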
Lander acclaimed the record sequence as "becoming the major research tool for this organism. Those working in these C. elegans regions have tremendous gold mines of information being handed to them," he said.
-- David N. Leff Science Editor
(c) 1997 American Health Consultants. All rights reserved.