By David N. Leff

On Wednesday at 2:00 p.m. EST, the National Science Foundation and Nature magazine hosted a press conference in Washington to celebrate the genome-decoding of a weed.

Simultaneous media events took place in London, Brussels and Tokyo to announce that Arabidopsis thaliana - a small cress of the mustard family - had become the first member of the plant kingdom to join the roster of eukaryotes that have had their genes completely described.

Nature, dated Dec. 14, 2000, reported the multinational feat in a 20-page cover story titled "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana." Three supplemental articles plus a "News & Views" commentary rounded out its coverage of the plant's DNA blueprint. The lead paper listed 148 co-authors, from seven countries in Europe and North America, who had labored for over five years under the aegis of an international consortium, The Arabidopsis Genome Initiative (AGI). Sequencing of the plant's chromosomes 2 and 4 was reported in 1999, with chromosomes 1, 3 and 5 now announced.

"The Arabidopsis genome is entirely in the public domain," AGI adviser Daphne Preuss told the news media, "so the research results just announced are immediately available to scientists around the world."

American gardeners and botanists know Arabidopsis as the "mouse-ear cress" - of no crop or ornamental value. But to plant biologists, A. thaliana is a matchless laboratory model, on a par with animal-model fruit flies (Drosophila melanogaster), nematode worms (Caenorhabditis elegans), mice (Mus musculus), rats (Rattus norvegicus), African clawed frogs (Xenopus laevis), zebra fish (Brachydanio rerio) and baby chicks (Gallus gallus). As a Nature summary noted, "Arabidopsis joins yeast, the nematode worm, the fruit fly, over 30 bacteria - and (almost) a person - in the genome hall of fame." Subsequent improvements in sequencing technology mean that this is the most accurate eukaryotic genome sequenced so far.

Dissecting A Weed's Genetic Endowment

The sequenced regions of Arabidopsis' five chromosomes, the Nature paper reported, total 115,409,949 base pairs. That genome contains 25,498 genes - the largest gene set published to date - of which slightly under 15,000 encode unique proteins. That figure is similar to the predicted functional diversity of Drosophila and C. elegans, with 13,601 and 19,099 genes, respectively.
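
For a rough sense of scale, the figures quoted above can be put through some simple arithmetic. The short Python sketch below is illustrative only: it uses just the base-pair and gene totals reported here, and the names (SEQUENCED_BASE_PAIRS, GENE_COUNTS, average_spacing, gene_count_ratio) are made up for this example. It estimates how many base pairs of sequence there are per gene, on average, and how the raw Arabidopsis gene total compares with those of the fly and the worm.

```python
# Figures as quoted from the Nature paper in the paragraph above.
SEQUENCED_BASE_PAIRS = 115_409_949
GENE_COUNTS = {
    "Arabidopsis thaliana": 25_498,
    "Caenorhabditis elegans": 19_099,
    "Drosophila melanogaster": 13_601,
}


def average_spacing(base_pairs: int, genes: int) -> float:
    """Return the average number of sequenced base pairs per gene."""
    return base_pairs / genes


def gene_count_ratio(species_a: str, species_b: str) -> float:
    """Return how many genes species_a has for each gene in species_b."""
    return GENE_COUNTS[species_a] / GENE_COUNTS[species_b]


if __name__ == "__main__":
    spacing = average_spacing(
        SEQUENCED_BASE_PAIRS, GENE_COUNTS["Arabidopsis thaliana"]
    )
    print(f"About one gene per {spacing:,.0f} base pairs")
    for other in ("Drosophila melanogaster", "Caenorhabditis elegans"):
        ratio = gene_count_ratio("Arabidopsis thaliana", other)
        print(f"Arabidopsis has {ratio:.2f} times the raw gene total of {other}")
```

On these numbers the cress carries roughly one gene for every 4,500 base pairs of sequence. The ratios compare raw gene totals only; they say nothing about how many of those genes are duplicates.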

"The implications of these discoveries," the paper commented, "are not only relevant for plant biologists, but will also affect agricultural science, evolutionary biology, bioinformatics, combinatorial chemistry, functional and comparative genomics and molecular medicine." On this score, it reported, "The Arabidopsis genes include homologues of many DNA repair genes that are defective in different human diseases, for example, hereditary breast cancer, nonpolyposis colon cancer, xeroderma pigmentosum, immunodeficiency, hereditary deafness, cystic fibrosis and Cockayne's syndrome."

"Many of the molecular and cellular processes involved," the press conference heard, "are common to all higher organisms, and some of them are easier to study in Arabidopsis than in human or animal models."

Nature concluded, "The 20th century began with the rediscovery of Mendel's rules of inheritance in pea [plants], and it ends with the elucidation of the complete genetic complement of a model flowering plant, Arabidopsis."

What makes the mouse-ear cress, known in Europe as "thale cress," such an experimental ideal is that it's cheap, conveniently small - it towers only a few inches tall - and easy to grow. It also produces large numbers of offspring, has a very short life cycle and carries a relatively small nuclear genome. It's a model for over 250,000 other plant species.

Its rapid reproduction time enabled scientists to implant a specific Arabidopsis gene into the shoots of poplars, shortening that tree's flowering time from six years to six months. From A. thaliana's genes, researchers have already learned over the years how to protect wheat from disease, ripen tomatoes and double the yield of canola - rapeseed oil.

"The ancestral lineages of Arabidopsis and the Brassica (cabbage and mustard) genera diverged 12.2 million to 19.2 million years ago," the main Nature paper pointed out, adding, "Brassica genes show a high level of nucleotide conservation with their Arabidopsis orthologues, typically more than 85 percent in coding regions. The Arabidopsis and tomato lineages diverged roughly 150 million years ago."
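
A figure such as "more than 85 percent" nucleotide conservation is typically a percent-identity score: the share of aligned positions at which two sequences carry the same base. The Python sketch below shows one simple way such a score can be computed. It is not the method used in the Nature paper, which this article does not describe. The function name percent_identity and the two short sequences are invented for illustration, and the sketch assumes the sequences have already been aligned, with "-" marking gaps.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, gap-free positions where both sequences match.

    Assumes seq_a and seq_b are already aligned to the same length,
    with '-' marking alignment gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical 20-base fragments, not real Arabidopsis or Brassica data.
    fragment_one = "ATGGCTAGCTTAGGCTAACG"
    fragment_two = "ATGGCAAGCTTAGGTTAACG"
    print(f"{percent_identity(fragment_one, fragment_two):.1f} percent identical")
```

The two made-up fragments differ at 2 of 20 positions, so they score 90 percent. By the paper's measure, typical Brassica and Arabidopsis coding regions would score above 85.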

"With the genome in hand," observed corn geneticist Virginia Walbot of Stanford University, "the next challenge will be to unravel experimentally the roles of individual Arabidopsis proteins." She is the author of the "News & Views" article, titled "A green chapter in the book of life," accompanying Nature's quartet of research articles.

At the outset of the genome project, she recalled, one argument for choosing Arabidopsis to sequence was that it had only one copy of each gene, with minimal repetitive DNA. The big surprise turned out to be that at least 70 percent of its 26,000-gene genome is duplicated, leaving fewer than 15,000 functional entities. "Despite the apparent duplication," Walbot pointed out, "geneticists have identified thousands of mutations (in maize, tomato, wheat) that produce visible defects in the plant, but occur in only one of the duplicated genes, so many duplicated genes in these species have unique roles."

Bottomless Treasure Chest Of Metabolites

She made the added point that "most of the Arabidopsis genes have counterparts in the Drosophila and C. elegans animal species, indicating the common ancestry of plants and animals. Flowering plants," Walbot continued, "are also estimated to synthesize at least 100,000 secondary compounds - those that are not found in animals, and are not essential to the life of the plant. Diversification of chemistry, readily measured by our taste buds, supplies dyes, flavors, fragrances and most of the therapeutic drugs that we use. The repertoire of Arabidopsis genes contains the information to synthesize the archetypal precursors of secondary products."

Noting the U.S. National Science Foundation's $150-million investment thus far in the Plant Genome Awards Program, and other large prospective funding programs in Europe and Asia, she concluded that such financing "will speed our understanding of this highly successful branch of the plant kingdom on which we are all so heavily dependent."

Editor's note: For further information about the AGI, visit on the web.