By Randall Osborne

Editor

Picture long tables full of people hunched over magazines taken from nearby stacks, poring through the scientific articles with a kind of rabbinical intensity, checking cross-references, taking notes. Taking more notes.

That's what you've got, sort of, in Proteome Inc., said Sharan Pagano, director of business development and marketing for the firm based in Beverly, Mass., although the practice of the company is hardly as simple as the concept.

While "database company" is a phrase that slides easily off the tongue and is not incorrectly used to describe Proteome's products, Pagano describes them more exactly as "knowledge bases," built purely from literature in peer-reviewed journals.

"We compile it, distill it and transform it," she said, using proprietary algorithms to sort the hard-won published findings that individual researchers at other companies would spend many, many hours collating, and even then would most likely miss some articles, possibly critical ones.

Proteome, which already had 25 biotech and pharmaceutical customers for what it calls its "knowledge-based products," made headlines last week, when the firm launched "the first human databases coupling protein functional information to the human genome," thus taking what it deems the vital next step in proteomics investigations.

The two new products, called the Human Proteome Survey Database (HumanPSD) and the G Protein-Coupled Receptor Database (GPCR-PD), supplemented Proteome's BioKnowledge Library, which is a group of searchable Proteome databases derived from journals and updated weekly.

HumanPSD contains fundamental properties for more than 18,000 human, mouse and rat proteins, including biochemical function, cellular role, role in the organism and cellular location. GPCR-PD consolidates a wide variety of information, too. Like HumanPSD, it's mammalian-focused and incorporates data from human, mouse and rat biology.

Shortly after the launch of the new databases, Celera Genomics became Proteome's customer No. 26, buying the BioKnowledge Library to be coupled with the Celera Discovery System for sale to researchers.

Pagano described the process at Proteome as "curation. We systematically go through the literature, extract information and put it in tables that make sense for protein function. Then, we condense the annotation."

It's like having a fleet of librarians at work, except that they're "Ph.D.-level scientists, who are experts in their areas," Pagano said. "This is brainpower interacting with the literature."

Genomics The First Step, Now On To Proteins

Genomics has heated up proteomics ("We've got the genome; now we want to focus on the proteins," Pagano said), but Proteome has been operating for five years, signing up a pharmaceutical heavyweight, Pfizer Inc., as its first customer. The field grows hotter as the days pass, and Proteome's platform grows stronger.

The company received its first round of venture financing, $8.1 million, last December. "Before that, we were profitable without any venture capital," Pagano said.

She spoke from The Institute for Genomic Research's 12th annual Genome Sequencing and Analysis Conference in Miami, where Proteome gave a presentation and made public the deal with Celera. At the same conference, Compugen Ltd., of Tel Aviv, Israel, said it was launching Gencarta, providing access to the firm's annotated genome, transcriptome and proteome databases. Pagano said Proteome occupies a special niche, however, in that the firm offers scientists a way to find connections between the many bits of data now available.

And the competition to deliver genomics and proteomics information to researchers continues to heat up. Celera last week launched its Single Nucleotide Polymorphism (SNP) database for the human genome. From the company's sequencing of five ethnically diverse donors, the database contains 2.4 million unique, proprietary SNPs. Added to the 400,000 non-overlapping, unique SNPs screened from the public databases, the number brings the total SNPs in Celera's SNP Reference Database to 2.8 million.

"The genome is linear, in a sense, and the fruits of it we see in companies like Celera and Incyte [Genomics Inc.]," Pagano said. "You can use that information to do functional genomics." But the amount of data is not so huge compared to the "bytes of text in scientific literature," which can help make sense of it, she added. Ordinary text mining doesn't work because of the way articles are organized and kept.

"The way the literature's been written is interesting, but if one thinks about a computer going in and extracting knowledge, it's almost impossible," Pagano said. "That has to do with how it's been built. It really doesn't have a language."

Proteome creates that language, so the journals can speak directly to researchers.

Proteomics Still A Young Field

Although Pagano doesn't regard Proteome as "being strictly in proteomics," she finds the promise of biomedical research mostly in that realm, along with the difficulties.

"There are a lot of problems in proteomics," she said. "Proteins do all the work. They course through the body, up-regulate and down-regulate. They're really the messengers of all the biology, and heart of understanding biological mechanisms. So there's a huge amount of promise in translating the genome into the proteome, and understanding how it's all orchestrated.

"It's a much richer area than understanding the gene sequence."

And yet ...

"The technology just isn't as mature, and the challenges are greater," Pagano said. "It's not as small and fast and cheap. And there's still more sequencing to be done, understanding the variation in human populations."

Should investors put their money on genomics, or on pure proteomics? Or should they stick with good old monoclonal antibodies and companies with late-stage drugs or products already on the market?

Despite her years in biotech, Pagano is reluctant to pose as a financial advisor.

"I don't really have a crystal ball on these [other] companies," she said. "One problem is that the biology hasn't really caught up with the number of molecules people can find."

The investor clamor for drugs, for solid evidence of progress, should be well served by the likes of Proteome, as it is, in a different way, by other database firms, Pagano said.

"Most of them are in the area of technology," she said. "We're in the area of knowledge. There isn't a common way to refer to a gene you might have on a chip. If a proteomics company is trying to understand it at the protein or polypeptide level, the information we have can be used."

That makes Proteome's platform valuable for companies that might, at first glance, seem more like competitors (bioinformatics and microarray companies), not to mention bench scientists.

"A key point is, this saves scientists a tremendous amount of time by showing functions across species," Pagano said. "That [alone] will get drugs to market more quickly."

The company's simple approach stunned her when she first saw it, Pagano said, noting that it made so much sense that it had been overlooked, or had been conceived and given up as too daunting.

"What's the context to form a hypothesis? Usually, you begin with the literature," she said. "You always need an anchor. We've done it for all organisms, but there's been no systematic, people-focused attempt to describe what a gene does, where it does it, and is there a drug associated with it?"

Such an attempt, she said, can be made efficiently only through a study of scientific journals.

"We keep the literature updated," Pagano said. "[The knowledge bases] never end because the literature never ends. And if new data are published to suggest there's something that has to be changed, it's changed throughout. Some people say this is not possible, but it is, and we're doing it."