In the Jan. 18, 2007, issue of the New England Journal of Medicine, researchers from the University of Michigan, Stanford University, and OncoMed Pharmaceuticals, a biotechnology company based in Mountain View, Calif., report on an invasiveness gene expression signature in cancer cells that is characteristic of cancer stem cells, and on its utility as a prognostic tool.

The scientists developed a gene expression signature from cancer stem cells. Such cancer stem cells can be identified by the expression patterns of several surface proteins, but the gene expression signature gives "a much broader assessment" of those cells, OncoMed CEO Paul Hastings told BioWorld Today.

The researchers compared the gene expression profile of such cancer stem cells with that of normal breast epithelial tissue, and found a set of 186 genes that were expressed differentially between the two. They then used breast cancer patient information from two large gene expression and clinical databases to determine the relationship between the gene expression signature and patient outcomes.

Patients whose tumor gene expression profiles were similar to the invasiveness gene signature had increased risks of recurrence, metastasis and death. The researchers tested smaller groups of patients with several other types of cancers as well, and found a similar correlation between invasiveness genes and overall prognosis.

OncoMed, which was founded in 2004 and is in preclinical development with antibodies and small molecules to target cancer stem cells, hopes the signature can be used to stratify patients both for clinical trials and eventually for therapeutic regimens. "It's a step forward in personalized medicine," Hastings said.

Using separate datasets to develop and validate the gene signature might seem like an obvious approach, but it already puts the authors ahead of a number of their peers. The NEJM paper comes on the heels of a review, published in the Jan. 17, 2007, Journal of the National Cancer Institute by scientists from the National Institutes of Health, showing that half of the microarray papers it examined contained "one of three major flaws."

Senior author Richard Simon, chief of the biometric research branch of the National Cancer Institute, enumerated those major flaws as the overuse of cluster analysis, the failure to separate datasets used to generate and to test predictions, and the failure to use statistical criteria that avoid false positives. The authors of the JNCI paper, who are with the National Cancer Institute and the Universite Paris VII Denis Diderot, reviewed 90 cancer research papers published through 2004 that contained both microarray and clinical outcome data.
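The third flaw on that list, statistical criteria that let false positives through, can be illustrated with a small simulation. The sketch below is not taken from the JNCI review; the gene count, the 0.05 cutoff and the choice of a Bonferroni correction are assumptions made for illustration only. It tests 10,000 hypothetical genes that have no real association with outcome and counts how many look "significant" under an uncorrected cutoff versus a corrected one.

```python
import random
from statistics import NormalDist

# Hypothetical setup: 10,000 genes, none truly associated with outcome,
# so each gene's test statistic is a draw from a standard normal.
N_GENES = 10_000
ALPHA = 0.05

rng = random.Random(1)
z_scores = [rng.gauss(0, 1) for _ in range(N_GENES)]

# Two-sided cutoff applied to each gene with no correction (about 1.96).
naive_cutoff = NormalDist().inv_cdf(1 - ALPHA / 2)

# Bonferroni cutoff: divide the error rate by the number of tests.
bonferroni_cutoff = NormalDist().inv_cdf(1 - ALPHA / (2 * N_GENES))

naive_hits = sum(abs(z) > naive_cutoff for z in z_scores)
bonferroni_hits = sum(abs(z) > bonferroni_cutoff for z in z_scores)

print(f"uncorrected cutoff {naive_cutoff:.2f}: {naive_hits} genes flagged")
print(f"Bonferroni cutoff {bonferroni_cutoff:.2f}: {bonferroni_hits} genes flagged")
```

With an uncorrected cutoff, roughly 5 percent of the genes (about 500) are expected to be flagged even though none is real. Bonferroni is only one possible correction and is conservative; other approaches, such as controlling the false discovery rate, make a different tradeoff.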

Simon, who said that microarray studies have been met with both "unwarranted hype and excessive skepticism," noted that he does not want to give the impression that microarray studies are an exercise in futility. Because of redundancies in data analysis, not every paper that contained what the authors called a major flaw had invalid conclusions, he added.

Simon told BioWorld Today that the technology itself is extremely powerful, and that microarray studies "have gotten better, and many of them are very good." But he also said that the power of the newly available technologies has at times outstripped the ability of researchers to use them to their best advantage.

One of the specific problems, Simon explained, is that researchers "have not been very careful about separating model building from model testing." In an analysis where there are few variables and many patients, he said, that failure is "not such a big deal." And those used to be typical analyses.

But with genomic data, where many more variables are collected than there are patients in the sample, "then it's a new ball game."
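Simon's point about many variables and few patients can be sketched with a short simulation. The code below is an illustration under stated assumptions, not the method of either paper: the patient and gene counts, the nearest-centroid classifier and the function names are all hypothetical. The data are pure noise, so no classifier can truly do better than chance. One classifier is built and scored on the same patients; the other is built on half of the patients and scored on the other half.

```python
import random

def build_classifier(data, labels, idx, n_selected):
    """Using only patients in idx, pick the genes with the largest
    difference in class means and return (genes, centroid0, centroid1)."""
    g0 = [i for i in idx if labels[i] == 0]
    g1 = [i for i in idx if labels[i] == 1]
    p = len(data[0])
    mean0 = [sum(data[i][g] for i in g0) / len(g0) for g in range(p)]
    mean1 = [sum(data[i][g] for i in g1) / len(g1) for g in range(p)]
    ranked = sorted(range(p), key=lambda g: abs(mean1[g] - mean0[g]), reverse=True)
    genes = ranked[:n_selected]
    return genes, [mean0[g] for g in genes], [mean1[g] for g in genes]

def accuracy(data, labels, idx, model):
    """Fraction of patients in idx assigned to the nearer class centroid correctly."""
    genes, c0, c1 = model
    correct = 0
    for i in idx:
        x = [data[i][g] for g in genes]
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        predicted = 1 if d1 < d0 else 0
        correct += predicted == labels[i]
    return correct / len(idx)

def simulate(n_patients=40, n_genes=1000, n_selected=10, n_repeats=10, seed=0):
    rng = random.Random(seed)
    same_data, held_out = [], []
    for _ in range(n_repeats):
        # Noise only: expression values are unrelated to the labels.
        data = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_patients)]
        labels = [i % 2 for i in range(n_patients)]
        everyone = list(range(n_patients))
        train, test = everyone[: n_patients // 2], everyone[n_patients // 2 :]
        # Flawed: build and evaluate on the same patients.
        model_all = build_classifier(data, labels, everyone, n_selected)
        same_data.append(accuracy(data, labels, everyone, model_all))
        # Separated: build on one half, evaluate on the other.
        model_train = build_classifier(data, labels, train, n_selected)
        held_out.append(accuracy(data, labels, test, model_train))
    return sum(same_data) / n_repeats, sum(held_out) / n_repeats

apparent, honest = simulate()
print(f"scored on the building data: {apparent:.2f}")
print(f"scored on held-out patients: {honest:.2f}")
```

Because the genes are chosen from a pool far larger than the number of patients, some of them separate the two groups by chance alone, and the classifier scored on its own building data looks accurate. The held-out score stays near 50 percent, which is the only honest answer for noise.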

Simon said that some of his colleagues have yet to even recognize the nature of the problem. "It's viewed as a data management problem," he said. But "the crux of it is: How do we design studies properly, and how do we analyze them properly?"

Asked about the statistical analysis in the New England Journal of Medicine paper, Simon said that the paper appears to "have not violated the cardinal principle of separating the data used for development of the classifier from the data used for evaluating the classifier."

He did note, however, that "the study used very heterogeneous sets of cases. Consequently, it's difficult to discern whether there are therapeutic implications of the result."

Simon also noted that the heterogeneity of the data may have led to one unexpected result in the NEJM data. "One item that surprised me was the indication . . . that lymph node status (positive versus negative) was not prognostic. That is strange and may reflect the heterogeneity of the patients with regard to treatment."

John Lewicki, OncoMed senior vice president of research and development, agreed that lymph node status is "clearly a prognostic factor in breast cancer," and that its failure to show up as a prognostic factor in the data used to validate the gene signature was unexpected. But he also noted that in their work, the OncoMed scientists and their colleagues were not focusing on whether the predictive value of their gene signature correlated with other known prognostic factors.

The bottom line, to Lewicki, is that the invasiveness gene signature described in the NEJM paper "is predictive of outcomes independently of other variables - whether it syncs up with lymph node status or not."