Ask your friendly neighborhood geneticist how many genes there are in the human genome, and the answer will be wide open at both ends: "Anywhere from 30,000 to 120,000."

But molecular biologist Chris Fields, who directs the genome informatics department at The Institute for Genomic Research (TIGR) in Gaithersburg, Md., has a more precise and reasoned estimate: "We predict a number of human genes between 60,000 and 70,000," he wrote in the July issue of Nature Genetics, out last Friday.

Fields is the first author of the editorial, titled "How many genes in the human genome?" Its principal author is J. Craig Venter, who founded TIGR as a very large-scale gene sequencing facility.

Before attempting to answer his self-imposed question, Fields told BioWorld Today, "Counting genes requires being clear about what counts as a gene." He added, "This is a crucial issue here, because 'gene' can mean virtually anything."

He explained that "the categories of genes, structural, regulatory and so on, are not even well defined. It's extraordinarily difficult, working from the names that have been given to genes, to understand what they do." As an example, he cited the protein encoded by the recently discovered gene for Huntington's chorea. "It's got a name associated with the disease, but no one knows what the protein does."

Moreover, Fields noted, genes come in a vast range of sizes, from the gene for Duchenne's muscular dystrophy, which spans 2 to 3 million bases, down to a myriad of mini-genes at 1,000 or 2,000 bases each.

At TIGR, Fields and his staff "have been pulling out of the Genome Data Base at Johns Hopkins University in Baltimore non-redundant protein-coding sequences, which by latest published count early this year totaled 3,483 unique known coding regions."
This number excludes all immunology-associated genes, and certain other categories, Fields said.

The count is now at 4,500, the data base's founder, medical geneticist Peter Pearson, told BioWorld Today, adding that this represents "less than 10 percent of all those in the human genome."

TIGR's co-authors observed that "the number of ways people are going about approaching the number-of-genes question [which defines the basic task of the Human Genome Project] is increasing," so they decided to see "if their answers were plausibly consistent."

Here is a sampling of their survey:

- 50,000 to 100,000 genes "has the appearance of a rough guess";
- A total human genome of 3 billion bases could carry 300,000 10-kilobase (kb) genes laid end to end;
- 100,000 genes, averaging 30 kb, would be more reasonable, said Nobelist Walter Gilbert, co-founder of gene sequencing;
- Recent data suggest that only 12 percent of the human genome, 360 megabases, is transcribed. An average gene size of 18 kb predicts a total of 20,000;
- Some 10,000 distinct genes expressed in a typical mammalian cell prompt estimates of 20,000 to 40,000 genes in the genome;
- Noting that the genome contains 45,000 CpG islands (gene-enhancing regions rich in cytosine and guanine), and that 56 percent of sequenced genes carry these islands, a group at the University of Edinburgh, Scotland, estimates the genomic total at 80,858, which TIGR, by the same CpG rationale, scales down to 67,000.

Procedure Compares cDNA, ESTs

Fields' own procedure involves statistical comparisons of cDNA and expressed sequence tags (ESTs), which are random, partial gene fragments.

"If the set of complete cDNA sequences we extracted from the data base is representative of human genes in general," he reasoned, "then the fraction of known cDNAs matched by randomly selected ESTs should equal the fraction of novel sequences matched by randomly selected ESTs."

TIGR's human EST sequencing project has so far identified ESTs matching 1,877 (54 percent) of the 3,483 unique coding regions it found. Thus, by statistical extrapolation, Fields and his team arrived at an overall figure of 77,700 genes. Applying alternative estimates of average redundancy, they obtained a more refined prediction of 52,000 to 64,000 genes. Then, correcting for built-in bias as to gene-rich and gene-poor genomic regions, they reached a final guesstimate: "between 60,000 and 70,000 human genes."
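
For readers who want to check the back-of-envelope arithmetic behind several of the figures quoted above, here is a minimal Python sketch. It is not TIGR's actual procedure. The function names are illustrative, and the last calculation assumes a deliberately simple model (total genes = distinct genes sampled by ESTs divided by the matched fraction) that the article does not spell out. It shows only what sample size the quoted 77,700 total would imply under that assumed model.

```python
# Back-of-envelope checks of figures quoted in the article.
# All names here are illustrative; none come from the TIGR editorial.

GENOME_BASES = 3_000_000_000  # "3 billion bases"


def genes_by_size(total_bases: int, avg_gene_bases: int) -> int:
    """Number of genes of a given average size that fit in total_bases."""
    return total_bases // avg_gene_bases


# 10-kb genes laid end to end across the whole genome
end_to_end = genes_by_size(GENOME_BASES, 10_000)

# Gilbert's figure: genes averaging 30 kb
gilbert = genes_by_size(GENOME_BASES, 30_000)

# Only 12 percent of the genome transcribed, average gene size 18 kb
transcribed_bases = GENOME_BASES * 12 // 100
transcribed = genes_by_size(transcribed_bases, 18_000)

# Fraction of known coding regions matched by ESTs
est_match_fraction = 1877 / 3483

# ASSUMPTION: simple model, total = sampled / fraction. Under it, the
# quoted total of 77,700 implies this many distinct genes sampled by ESTs.
implied_sampled = 77_700 * est_match_fraction

print(f"10-kb genes end to end: {end_to_end:,}")
print(f"30-kb genes:            {gilbert:,}")
print(f"Transcribed bases:      {transcribed_bases:,}")
print(f"18-kb transcribed:      {transcribed:,}")
print(f"EST match fraction:     {est_match_fraction:.1%}")
print(f"Implied sampled genes:  {implied_sampled:,.0f}")
```

The first three results reproduce the article's 300,000, 100,000 and 20,000 figures, and the match fraction rounds to the 54 percent cited. The redundancy and gene-density corrections that narrow the estimate to 60,000 to 70,000 are not modeled here.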
-- David N. Leff Science Editor
(c) 1997 American Health Consultants. All rights reserved.