An essential element of the Human Genome Project, which includes efforts to sequence and map the genomes of other species as well as human, is the compilation and comparative analysis of DNA sequence information in publicly accessible data bases.
Academic and medical institutions, private companies and non-profit research groups generate DNA sequences worldwide in search of new genes, their functions and proteins.
But the closest thing to a centralized, public repository for the information is a global network of three data bases: the DNA Data Bank of Japan (DDBJ), the European Bioinformatics Institute (EBI) in Cambridge, England, and GenBank, of Bethesda, Md.
A second U.S. data base, the Genome Sequence Data Base in Santa Fe, N.M., is associated with the other three. It shares information, receives submissions from researchers and gets federal funding, but technically is not a member of the international network.
GenBank, the older of the two U.S. data bases, is managed by the National Center for Biotechnology Information, which operates within the National Library of Medicine under the umbrella of the National Institutes of Health in Bethesda, Md.
The Genome Sequence Data Base (GSDB) is managed by the National Center for Genome Resources (NCGR) in Santa Fe, N.M., a non-profit corporation that receives funding from the U.S. Department of Energy.
Are They Redundant?
Why two data bases in the U.S.? The community of genomics researchers is not a huge group and they are reluctant to criticize, but some admit GSDB, which staged a grand opening in February, has yet to define itself.
GenBank and GSDB are related, but in many respects they represent the competition between the NIH and the DOE in the push for genomics research. Both the NIH and the DOE fund U.S. efforts on the Human Genome Project.
GenBank, formed in 1982, initially was part of the DOE's Los Alamos National Laboratory, but it was funded mostly by the NIH. In 1992, GenBank was moved to NCBI in Bethesda, leaving Los Alamos with data resources, but no data.
Last year, the Los Alamos data base was relocated to Santa Fe with $5 million from the U.S. Small Business Administration and $2 million from the DOE.
Chris Fields, NCGR's vice president for scientific affairs, said the DOE recently committed $10 million in funding over the next five years for GSDB.
David Lipman, NCBI director, said he gets $9 million from NIH, more than two-thirds of which supports service-oriented activities, including GenBank. The remainder goes to NCBI's research group.
Lipman said the advisory board of the international data base network, which includes representatives from GenBank, DDBJ and EBI, has been reluctant to accept GSDB as a fourth member. But he said board members are planning to visit GSDB to discuss the situation.
"It's difficult enough to get agreement with three partners," Lipman said. "To bring in a fourth would be more complicated. And it would set a precedent for other countries to say they wanted to be part of the network. A few years ago we had a similar situation with the Russians."
While all researchers have access to the DNA information, the international network composed of GenBank, DDBJ and EBI was set up to receive and catalogue all public DNA sequence data submissions.
"It's an awkward situation now," Lipman said, referring to GSDB. "And it is an unnecessary duplication of services."
Established To Help Industry
Edward Cantrall, CEO of NCGR and a former executive with American Cyanamid Co. in Wayne, N.J., said his non-profit corporation was established to facilitate the flow of genomics information from the academic community to the biopharmaceutical industry.
Cantrall said NCGR's role is to develop "one-stop shopping" for biotechnology and pharmaceutical companies that want to translate data from the Human Genome Project into usable information.
Said Fields, "Our job is to be a bridge between the research community and anyone else who wants to use the information. The sequence data are the infrastructure. We view our task as providing links among data bases . . . between sequence data bases and mapping data bases."
GenBank has been attacking the data base interconnection challenge. It has integrated data bases with gene sequences, protein sequences, and scientific journal literature. A taxonomy data base that classifies DNA material also is included. A 3-D protein structure data base and complete genome data base will be added this summer.
GenBank's Senior Medical Researcher Mark Boguski said NCBI also has organized an industrial advisory group, made up of representatives from major biopharmaceutical companies, for input on responding to business interests.
David Galas, who helped establish NCGR when he was with the DOE and who is now president and CEO of a genomics company, said, "The data base resources are increasingly important in the identification of genes and in determination of the functions of genes. As the data bases get larger and larger, they have become more sophisticated."
Data Bases Are 'Complementary'
Galas, head of Bothell, Wash.-based Darwin Molecular Corp., said there may be some redundancies in GSDB and GenBank, but he views the two as complementary. In addition, he said, the total resources invested in public data bases of a biological nature are small.
"Interconnection of data bases, such as mapping data bases, sequence data bases and enzyme data bases is important," Galas said. "That will be really powerful. A lot of companies don't realize how important it will be."
Genomics researchers at Sequana Therapeutics Inc., of La Jolla, Calif., said GenBank is an essential resource in their drug discovery efforts.
"Any genomics company needs to access the public data bases," said Sequana's president and CEO, Kevin Kinsella. "We generate a gigabyte of gene sequence data a day and it has to be compared with what's already known," he added. Kinsella also noted that Boguski is a member of Sequana's scientific advisory board.
Francis Collins, director of the National Institutes of Health's National Center for Human Genome Research, said that without the public data bases the Human Genome Project could not exist.
"They provide the tools to search for similarities and that has resulted in some very sophisticated scientific discoveries," said Collins, whose group oversees U.S. research efforts on the global Human Genome Project.
-- Charles Craig
(c) 1997 American Health Consultants. All rights reserved.