It took nearly 1,000 geneticists and technicians from half a dozen countries to report, in Nature dated Thursday, Feb. 15, 2001, an article titled "Initial sequencing and analysis of the human genome." This horde of co-authors made up the public International Sequencing Consortium that mapped 2.69 billion DNA base pairs of Homo sapiens.

Then Science for Friday, Feb. 16, 2001, with 262 co-authors, reported the same seminal project fielded by the consortium's privately held rival, Celera Genomics, accounting for a near-identical 2.65 billion bases. Estimates of the total number of human genes ranged from 26,383 to 39,114.

BioWorld Today covered these long-awaited feats in a news story Feb. 13, 2001, bannered "Genome Publications Reveal Surprises."

Fast forward to Jan. 1, 2003, the date of Nature's 99-co-author news item titled "The DNA sequence and analysis of human chromosome 14," as reported by the same International Consortium. Of that 99-strong team, 80 percent or so consisted of scientists from Genoscope, France's National Sequencing Center. The remainder are from the University of Washington, Seattle, and Washington University, St. Louis. The paper's senior author is molecular biologist and geneticist Jean Weissenbach, director of Genoscope.

"There are no big surprises when we sequence a chromosome," he told BioWorld Today. "Number 14 is an average chromosome in terms of gene content, repeats, and other features. There is nothing really special about it. It's one of the 23 chromosome pairs in the entire human karyotype, all of which have yet to be sequenced. There is no specific reason for sequencing 14, but there is no reason for missing it. A number of chromosomes will be completed within this year, 2003, and others will follow.

"Two years ago," Weissenbach recalled, "the public consortium published a draft sequence of the human genome, which contained more than 200,000 gaps. Here in number 14 we have a piece of 87,410,661 base pairs without any gaps. Being a partner of the draft sequence, this was just a first step towards a complete sequence. That was already useful because people could use the data for specific purposes, such as trying to find genes in given areas. But to be completely useful they need to have a sequence that is complete, where there are no gaps - because you never know what you may meet when you have gaps."

Filling Sequence Gaps Points To Pathologies

Weissenbach made the point that "the draft sequences of the human genome have provided an unprecedented wealth of information and have facilitated the identification of genes involved in human diseases." The International Consortium that read the DNA letters of chromosome 14 identified 1,050 genes and gene fragments, plus 393 pseudogenes. On the basis of comparisons with other vertebrate genomes, the researchers estimate they have documented more than 96 percent of the chromosome's genes.

"These," Weissenbach continued, "contain two loci crucial to the immune system. One is alpha/delta TCR, a complex of genes that encode a receptor expressed on T lymphocytes - white blood cells [WBCs]. Those receptors catch antigens. And once a receptor has caught its antigen, it interacts with other WBCs to trigger a synthesis of antibodies. So that's the first step in selecting the molecule against which an antibody will be synthesized by an organism.

"The second critical immune-system locus," he went on, "is another complex of genes, which encode the immunoglobulin heavy-chain. It is located at the other end of human chromosome 14.

"Some of the 30 or so genes in 14 that, when mutated, are responsible for causing diseases have not yet been identified," he noted. "Definitely, their sequences will greatly facilitate identifying those pathologies. Some of them identified earlier are genes involved in spastic paraplegia, Niemann-Pick disease and early onset Alzheimer's. A severe form of Usher syndrome, which causes both blindness and deafness, is a very invalidating disorder. One gene on chromosome 14 is responsible for Usher syndrome. People know the region where it's located, but they haven't discovered it yet.

"When they find it," Weissenbach observed, "the possible therapy will depend on the gene. No one can tell. But the first medical step one can take is always diagnostic. With all those genes responsible for genetic diseases, the first application is diagnosis. After that, once we have the gene, we will try to find exactly what it is doing, because one cannot just guess from the sequence what is the function of the gene. This usually takes a while, and then we can try to imagine new ways to therapy, to treatment - which is a completely different story. But as long as the information is lacking, we can't do anything.

"When we try to find the genes on the sequence, there are several ways to go about it," Weissenbach explained. "None of those is absolute. Whatever way we use, there is always a problem. One of those is to compare sequences from different species. For instance, comparisons between the human sequence and the mouse sequence reveal some regions that are conserved and others that are not. And among the conserved regions there is a very good chance of finding gene fragments of interest. So this is why we use mouse, Mus musculus. It's to try to find gene-coding segments in the human genome."
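The cross-species comparison Weissenbach describes can be illustrated with a toy sketch. It assumes plain exact matching between two short DNA strings; the function name, the sample sequences and the min_len cutoff are all hypothetical, and this is not Genoscope's actual method. Real comparisons rely on alignment tools that tolerate substitutions and small insertions.

```python
def conserved_segments(human: str, other: str, min_len: int = 8) -> list[tuple[int, int, str]]:
    """Return (human_start, other_start, segment) for every maximal exact
    match of at least min_len bases shared by the two sequences.

    Toy illustration only: exact matching, no mismatches or gaps allowed.
    """
    segments = []
    # prev[j] = length of the exact match ending at human[i-2] and other[j-1]
    prev = [0] * (len(other) + 1)
    for i in range(1, len(human) + 1):
        curr = [0] * (len(other) + 1)
        for j in range(1, len(other) + 1):
            if human[i - 1] == other[j - 1]:
                curr[j] = prev[j - 1] + 1
                # The match cannot be extended to the right if either
                # sequence ends here or the next bases differ.
                at_end = i == len(human) or j == len(other)
                if (at_end or human[i] != other[j]) and curr[j] >= min_len:
                    length = curr[j]
                    segments.append((i - length, j - length, human[i - length:i]))
        prev = curr
    return segments


# Hypothetical sequences: a shared 12-base core with unrelated flanks.
human_seq = "TTTT" + "ATGGCCAAGTGC" + "GGGG"
mouse_seq = "CACA" + "ATGGCCAAGTGC" + "TATA"

for h_start, m_start, seg in conserved_segments(human_seq, mouse_seq):
    print(h_start, m_start, seg)
```

In this made-up example only the shared core is reported; the unrelated flanking bases drop out. That is the intuition behind looking for gene fragments among conserved regions.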

A Compact Puffer Fish Shall Lead Them

"A quite different organism, a puffer fish (Tetraodon), is doing the same type of job," Weissenbach recounted. "We call it 'exofish,' which stands for 'exon-finding by using sequence homology.' So, in both animals if we have sequences that are very similar - homologous - there is a high chance of those sequences being practically identical between human and fish. The only sequences we detect between puffer fish and man are coding sequences. They are highly conserved during evolution. So its genome is very compact - about one-eighth the size of the human genome. So with a much smaller piece of sequence you have the same information. This is why those puffer fishes are so interesting.

"We are also participating in some other sequencing projects in connection with international consortia. One is an international rice genome initiative. We are also sequencing a number of smaller genomes - bacteria and paramecia," he concluded, "on which we have started recently."