What do mice and humans have in common?

Genes, of course. And now a team of researchers based in Russia and the U.S. has devised a way to identify human genes by using analogous genes from mice, or any other life form for that matter, as a handy template.

The method relies on a complex algorithm built into newly developed software, which analyzes previously sequenced genes from any organism other than human and uses them as models for assembling the sequences of analogous human genes.

Mikhail Gelfand, of the Institute of Protein Research at the Russian Academy of Sciences, in Moscow, and his colleagues report that their work represents a step toward solving "one of the most important problems in computational molecular biology." The report appears in the Aug. 20, 1996, Proceedings of the National Academy of Sciences (PNAS).

Pavel Pevzner, a mathematician from the University of Southern California, in Los Angeles, and a member of Gelfand's team, said, "Hunting for human genes is a massive, painstaking undertaking that typically takes years and costs tens or even hundreds of millions of dollars.

"With this method, we can find a human gene if an analogous gene from another species has been identified. The species doesn't matter," Pevzner said. "Anything alive can be used as a template to find human genes."

The method should prove particularly useful for researchers who are trying to pinpoint disease-causing genes in humans. For instance, many cancer-causing genes already have been sequenced in mice and other laboratory animals, and these are thought to have analogues that cause cancer in man. But the effort to find those analogues in humans has proven to be maddeningly slow.

To understand how the method works, Pevzner said, it's important to recognize a key difference between genes that are found in simple life forms and those found in man. In very simple life forms, the four-letter alphabet of DNA (A, C, G, T) is organized into continuous strings of information. In more complex organisms, including man, genes that consist of as many as 2,000 base pairs are broken up into submessages called exons.

These exons are scattered, seemingly at random, through sections of chromosomal DNA that consist of as many as a million letters of the DNA alphabet. Since each gene can be made up of 10 exons or more, the microbiologist must identify each exon, sequence it, and place it in its proper order to decipher the gene's complete message.

It's Like Following An Article That Jumps

Pevzner likens this situation "to a magazine article that begins on page one, continues on page 13, and then takes up again on pages 43, 51, 53, 59, 70, 74, 80 and 91, with pages of advertising and other articles appearing in between.

"We don't know why these jumps occur or what purpose they serve," he says. "Thankfully, the exons stay in order. They don't jump backward. You always read in the same direction."

To further complicate matters, however, the jumps are inconsistent from species to species, he said. The placement of exons in an insect gene will be different from the placement of the same exons in a worm gene.

Using the magazine analogy, Pevzner explained: "The information that appears on a single page in the human edition may be broken up into two in the wheat version, or vice versa."

There is yet another obstacle complicating such species-to-species comparisons. Each gene is spelled out in a manner peculiar to the organism in which the gene resides. A mouse gene will be written in a mouse language; a human gene will be written in human language.

Pevzner likened this distinction to two different but related languages, German and English. In these and other Germanic languages, he said, many of the words "are identical or similar, but many others are not."

When comparing genes from dissimilar species, he said, "we must be able to recognize differently spelled words, written on different pages, as part of the same message."

If this isn't complex enough, the researchers had to contend with another problem: the genetic material separating exons, known as "junk" DNA, can mimic exons. Long sequences of this mystery DNA may, indeed, be identical to exon segments without being part of the gene itself.

These intervening sequences are skipped when the cell reads the gene's message to assemble a protein.

Thus far, efforts to resolve these complexities have relied on statistical associations. Using the magazine analogy, Gelfand explained, "it is something like going through back issues and finding that human gene stories are less likely to contain phrases like 'for sale,' telephone numbers, and the dollar sign."

These statistical methods, while useful, are inaccurate at best, Gelfand and Pevzner said.

Their new method, developed with Andrey Mironov, of the Laboratory of Mathematical Methods of the National Center for Biotechnology, in Moscow, makes a list of all of the "pages" that potentially are part of the "story": all DNA segments that have sequences that seem to be part of the genetic message.

Their software automatically combines and recombines these sections into a set that seems to fit neatly together. The method is most effective when the researchers have a target protein to guide the search.

This protein, a sequence of amino acids assembled in accordance with the gene's instructions, can be used to decipher the order of exons in the gene itself.
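The combine-and-compare idea can be illustrated with a toy Python sketch. It is not the researchers' software: the function names, the made-up sequences, the use of a DNA target in place of a target protein, the brute-force search, and the plain edit distance used for scoring are all simplifying assumptions chosen for illustration. Given a stretch of DNA and a list of candidate "pages," it tries every left-to-right, non-overlapping combination and keeps the one whose joined text most closely matches the template.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def best_exon_chain(genomic, candidates, target):
    """Return (chain, cost) for the ordered, non-overlapping set of candidate
    blocks whose concatenation is closest to the target sequence.

    Brute force over all subsets: fine for a handful of candidates only.
    """
    candidates = sorted(candidates)
    best_chain, best_cost = (), len(target)  # empty chain: insert everything
    for k in range(1, len(candidates) + 1):
        for chain in combinations(candidates, k):
            # Skip chains in which consecutive blocks overlap.
            if any(chain[i][1] > chain[i + 1][0] for i in range(k - 1)):
                continue
            spliced = "".join(genomic[s:e] for s, e in chain)
            cost = edit_distance(spliced, target)
            if cost < best_cost:
                best_chain, best_cost = chain, cost
    return best_chain, best_cost


# Hypothetical data: two real exons, one exon-like decoy between them.
genomic = "ATGGCC" + "TTTTT" + "CCCGGG" + "AAAAA" + "GATTACA"
candidates = [(0, 6), (11, 17), (22, 29)]   # (start, end) of each "page"
target = "ATGGCAGATTACA"                    # related gene, one letter differs

chain, cost = best_exon_chain(genomic, candidates, target)
print(chain, cost)
```

In this made-up example the decoy block is left out of the winning chain because including it pushes the joined sequence further from the template, even though the template itself differs from the true exons by one letter.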

When such proteins are available, the researchers said, the new method is remarkably accurate.

The researchers tested the method on nearly 100 different genes, 47 of them from mammals (mainly mice) and 45 from other organisms, including bacteria.

They found that 40 of 47 mammalian reconstructions were perfect. In six other instances, the accuracy ranged from 94 to 97 percent, according to the PNAS report.

In the one instance where the prediction was only 75 percent accurate, a reconstruction of a human gene using an analogous mouse gene, the researchers decided to repeat the process using a corresponding chicken gene. This time they obtained a perfect match.

"This is surprising, given that we think of humans as more closely related to mice than to chickens," Pevzner observed.

The method also proved to be surprisingly accurate using genes from organisms vastly different from humans, including bacteria and yeast, the article says. In 25 such cases, the method proved to be 100 percent accurate.

"Our method will prove extremely useful to researchers, not just in biotechnology, but also in evolutionary biology," Pevzner said. "It will enable biologists to trace, with exceptional precision, exact degrees of difference between genes from different species. And it will help to establish evolutionary relationships between species."

-- Steve Sternberg Special To BioWorld Today

(c) 1997 American Health Consultants. All rights reserved.