LONDON - More than 200 previously unknown proteins have been identified that are encoded on one of the chromosomes of Plasmodium falciparum, the parasite that causes most deaths from malaria, creating a wealth of potential targets for anti-malarial drugs and vaccines.

A consortium of researchers sequencing the genome of P. falciparum reported their latest results in the Aug. 5, 1999, issue of Nature in a paper titled "The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum."

Sharen Bowman, project manager at the Sanger Centre at the Wellcome Trust Genome Campus in Hinxton, UK, and first author of the paper, told BioWorld International, "When we produced the sequence of chromosome 3, we found that only six out of the 215 proteins that we think are encoded on that chromosome had been identified before. People can now take the sequence of those genes and start thinking about ways to design new drugs or look for vaccine candidates in the proteins we have predicted to be encoded on this chromosome."

Commenting on the paper, Mats Wahlgren and Maria Teresa Bejarano, of the Microbiology and Tumour Biology Centre at the Karolinska Institute and the Swedish Institute for Infectious Disease Control in Stockholm, Sweden, wrote in a News and Views article in the same issue of Nature that it is "imperative" to use the new sequence to identify new drug targets, to understand the pathogenesis of malaria and to help in constructing vaccines.

In their article, titled "A blueprint of 'bad air,'" they warn the task will not be easy. They wrote, "The P. falciparum genome is more complicated than 14 chromosomes - it is plastic, recombinations occur, and it is modified at fertilization in the mosquito's gut. So although the sequences of the chromosomes are a great step forward in understanding the parasite and the disease, there is still much to be done."

More than 1 million African children under the age of 5 die from malaria each year, with almost all deaths due to P. falciparum. The parasite has been fiendishly difficult to study, mainly because of its complicated life cycle. Transmitted into the bloodstream of its human host by the bite of the Anopheles mosquito, it quickly goes on to infect liver cells, where it grows and divides. After bursting out of these cells, it infects erythrocytes, where it again grows and divides before bursting out into the bloodstream once more.

Several more similar cycles in erythrocytes can follow, before the parasite differentiates into gametocytes. These are taken in when a mosquito bites the infected human, and the parasite's life cycle continues in its insect host.

P. falciparum expresses a different set of proteins at each stage of its life cycle, but because only the blood stages can be cultured, little is known about the proteins important to the parasite during its other life cycle stages.

For this reason, Bowman said, sequencing P. falciparum's genome was the obvious way to find out about these molecules.

Of the parasite's 14 chromosomes, chromosomes 2 and 3 have been completely sequenced. The work on the 14 is divided among three centers: nine chromosomes at the Pathogen Sequencing Unit at the Sanger Centre, four at The Institute for Genomic Research in Rockville, Md., and one at Stanford University in Stanford, Calif. Bowman predicted it will take another couple of years to complete the project.

True to form, sequencing the genome of P. falciparum has not been easy. More than 80 percent of its DNA comprises only the bases adenine and thymine, with few cytosines or guanines. For that reason, its DNA is not well tolerated by Escherichia coli, the bacterium researchers normally use when sequencing genetic material. Instead, the project has had to rely on sequencing chunks of P. falciparum's DNA in yeast artificial chromosomes, coupled with a technique called a "whole chromosome shotgun."
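As a rough illustration of the base-composition figure above, the share of adenine and thymine in a stretch of DNA can be counted directly. The following is a minimal Python sketch only; the function name `at_fraction` and the toy sequence are hypothetical and are not taken from the sequencing project.

```python
def at_fraction(seq: str) -> float:
    """Return the fraction of A and T among the A, C, G and T bases in seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / total


# Toy sequence invented for illustration (not real P. falciparum data).
example = "ATATTAAATGCATTTAATAT"
print(f"AT content: {at_fraction(example):.0%}")
```

Ambiguous characters such as N are ignored here, which is one possible design choice; a real analysis might treat them differently.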

Once they had the sequence, the researchers noticed the high number of genes encoding previously unknown proteins. Bowman said, "We don't know for sure that these are protein-coding genes, but from our computer analysis it looks like they are. These genes are more complicated than we expected them to be. Most of the genes previously identified in P. falciparum have not been highly spliced - generally they have been shown to have one or two exons [protein-coding regions]. But some of these new genes have a multiple exon structure more typical of higher eukaryotes."

The sequence also provides information about the structure of chromosome 3. The ends of chromosome 3 - the telomeres - feature a series of repetitive sequences, similar to those of chromosome 2. These sequences, and their order, have been conserved, Bowman said, in each of the four telomeric regions that have been sequenced to date.

The telomeres of chromosome 3 also contain several "multigene families." Some of these already had been identified; they are involved in bringing about some of the antigenic variation that allows P. falciparum to evade its human host's immune system so successfully. The proteins encoded by these genes are transported to the surface of the infected erythrocytes, and different members of these gene families can be transcribed in order to stay one step ahead of the immune system.

Bowman said, "As well as the protein families that were already known, we have also identified what might be four new multigene families, which are encoded in the telomere regions. We think it will be interesting for researchers to examine these to find out if they are actually transported to the surface of the red blood cells and involved in antigenic variation."

A candidate for the centromere of chromosome 3 also has been found. The Sanger team has no experimental evidence to support this hypothesis. "But, from the computer analysis, and from comparison with the centromere region of other organisms, such as various yeasts, this region has features that would fit with it being the centromere," Bowman said.

"These [features] are that it is almost totally comprised of adenines and thymines, which is very unusual, it contains arrays of tandem repeats, and it is very gene-poor - the nearest protein coding sequences we can identify are 12 kilobases apart," she said.
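One of the features Bowman lists, a region made up almost entirely of adenines and thymines, is the kind of signal a simple sliding-window scan can flag. The sketch below is an assumption-laden illustration, not the Sanger team's method: the function `at_rich_windows`, its window size, step and threshold are all hypothetical, and it does not look at tandem repeats or gene density.

```python
def at_rich_windows(seq, window=10, step=5, threshold=0.95):
    """Return (start, AT fraction) for each window whose AT share meets threshold.

    The AT fraction is taken over the window length, so any non-A/T
    character (including N) lowers it.
    """
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        frac = (chunk.count("A") + chunk.count("T")) / window
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Toy sequence invented for illustration: a GC-containing flank,
# a pure A/T core, and another GC-containing flank.
toy = "GCGCGCATGC" + "ATATATATATTATATAATAT" + "GGCCGATCGC"
for start, frac in at_rich_windows(toy):
    print(start, frac)
```

On real chromosome-scale data the window would be far larger (hundreds or thousands of bases), and any flagged region would still be only a candidate, as the article notes.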

The next chromosome to be fully sequenced will probably be chromosome 4, Bowman predicted. "Here at the Sanger Centre, we are committed to finishing about half of the genome of P. falciparum. That's about 14 megabases, of which we have completed one megabase so far - so that will keep us busy for some time."