By Lisa Seachrist

Washington Editor

BETHESDA, Md. — The Human Genome Project (HGP) faces an ambitious set of goals over the next five years, as the architects of the program intend to complete the sequence of the human genome and set in place the technology that will aid researchers in analyzing the massive amount of resulting information.

Francis Collins, director of the National Human Genome Research Institute, highlighted the five-year plan for the HGP at a meeting of the advisory committee to the director of the National Institutes of Health (NIH).

"This is a bold set of predictions," Collins said, "and it's really going to stretch us. The most bold aspect of the plan is to finish the Human Genome Sequence by the year 2003, two years early."

Collins noted that the project intends to have a series of intermediate goals, rather than spring the sequence on an eager scientific community in five years. The plan calls for the project to finish one-third of the human sequence by 2001.

In attempting to achieve that goal, the HGP will employ a technique used by industry: focusing on the gene-rich regions.

"The genome doesn't have genes uniformly spread throughout its length," Collins said. "We intend to develop an international peer review process for prioritizing the regions of special biological interest."

Collins said he expects that regions of 2 to 10 megabases would qualify if they could be shown likely to reveal a large number of genes.

The HGP won't lower its standards to speed completion of the sequence, Collins said, but will attempt to create a "working draft" of 90 percent of the genome by 2001, in anticipation of the final sequence. That sequence will be based on mapped clones of the genome.

"We are hoping to create a high quality database in the end," Collins said. "Therefore, the raw data will have to be of high quality."

Project Aims To Advance Comparative Genomics

The project has some new goals as well, aiming to advance sequencing technology to facilitate comparative genomics by speeding the sequencing of other species, such as rat, zebrafish and perhaps chimpanzee.

In addition, Collins said, cataloging sequence variation, in order to understand polygenic diseases that are inherited in a non-Mendelian way, will become part of the project.

As a result, the project will attempt to develop better technology to identify and score single nucleotide polymorphisms (SNPs), the single-base variations in DNA sequence that may, for example, make a person more likely to respond to a certain type of drug.

"We intend to create an SNP map and stimulate methods of analyzing those data sets," Collins said. "We also intend to create a public resource of DNA samples that are inclusive of diverse groups."

Collins said the project has collected 450 DNA samples from Americans whose ethnic origins broadly represent all the human groups in the world. All ethnic identifiers have been removed, and the samples have been anonymized.

The project also is attempting to foster the technology needed to examine the function of all these genes under different conditions, and is investing in the development of bioinformatics tools and the training of bioinformatics specialists.

"We are painfully short in people trained in biology and mathematics or engineering," Collins said. "It's been tough going. There is this giant sucking sound from the private bioinformatics industry of the talent and experts we have."

Collins said universities are also to blame for the current shortage of bioinformatics specialists, because the schools have been slow to develop interdisciplinary tracks that would not only train such specialists but also offer them an academic career path.