BALTIMORE -- The biggest challenge in the Human Genome Project will soon be managing a flood of information, and getting the job done will require disseminating and analyzing science via computer networks instead of by journal.
Also, patent law might have to change to accommodate the new management methods.
These were some of the main messages at a conference here this week on BioInformatics and the Human Genome Project. It was organized by BEST North America, Alex. Brown & Sons Inc. and The Genome Database at Johns Hopkins University.
The pace of sequencing is about to take off. To attain the goals of the Human Genome Project, "by 1998, we should be able to sequence on the order of 100 megabases per year across the U.S.," said Chris Fields, laboratory director of genome informatics at The Institute for Genomic Research in Gaithersburg, Md.
But U.S. researchers will be sequencing 18 to 36 times this volume in just three to five years, Fields said. And that assumes only incremental improvements in current technology.
With all that data pouring in, "if we stop only for a day, we'll get a backlog," said Fields. New kinds of tools have to be developed so that analysis can take place day in and day out, conducted "often by people with little training."
This first-cut analysis will identify potential sites of genes, regulatory sequences, protein binding sites and the like. But the real goal is to figure out what the sequences mean, and this, too, must be done before reams of raw data clog the memory banks.
Forget about journals. "The literature is too slow and too big," said Fields. "There is no way to read it and understand everything that is going on, even about a small gene family."
"Direct electronic data submission is the only way to keep up with the data explosion," said Kenneth Fasman, director of BioInformatics at Johns Hopkins University's Genome Database.
"We need data bases that are flexible and fast and networks that are accessible and decentralized so that every person who has new information can get it into the data base in time for it to be useful," said Fields.
Data bases must allow "complicated batch queries, multifield browsing, as in a library, without a real query. ... They have to be interoperable over the network ... (and) we have to have robust security, transaction logging and recovery," said Fields.
Researchers also must have easy access to all this information. "We have got to be able to deliver the data on your Mac and PC," said Fasman, and software must be designed so that biologists can manipulate the data without having to learn computer codes.
Fields pointed to his company's information-handling system as a model for the community. Programs take data from instruments, manipulate it and send it into the laboratory's data base. That data base tells technicians what to sequence and scientists what to analyze more deeply.
Until the sequencing pace picks up, there are ways to get the most out of today's primitive sequencing tools. One is to target the information-rich regions and leave the less dense stuff for the tools of tomorrow, as Leroy Hood's group is doing, said Fields.
Changes in the patent laws would ease adoption of the new methods. "Systems for managing data ought to be protectable, and yet the patent system has trouble dealing with them," said Kate Murashige, a patent lawyer with the Washington, D.C., firm Morrison & Foerster.
"There should be no going back to laboratory notebooks," said C. Thomas Caskey of the department of human genetics at Baylor College of Medicine. "I would like for the discovery process to be documented by entry into data bases. Early entry allows that information to be used by the general research community." This standard, he said, might lead to less dispute about discovery.
-- David C. Holzman, Washington Editor
(c) 1997 American Health Consultants. All rights reserved.