Researchers at Duke University have developed a set of methods to distinguish contaminating microbes from species that are genuinely part of a tumor's microbiome, and have used those methods to gain new insights into tumor microbiomes.

They reported their work in the Dec. 30, 2020, online issue of Cell Host & Microbe.

Human organs have distinct microbiomes, and so, it seems, do tumors. Some individual bacteria appear to play a role in specific tumor types. Fusobacterium nucleatum, for example, is enriched in colorectal tumors and its abundance is linked to tumor features including stage and drug sensitivity.

At the population level, too, tumor microbiomes differ from those of healthy tissue. The tumor-associated microbiome of long-term pancreatic cancer survivors is more heterogeneous than that of short-term survivors, and long-term and short-term survivors have distinct tumor-associated microbial signatures.

While the mechanistic link between the tumor-associated microbiome and survival is not clear, one possibility is that more immunogenic microbes help stimulate an immune response against the tumor in which they reside.

Another possibility is that metabolites produced by specific bacteria are useful to the tumor cells.

Earlier this year, researchers at the Weizmann Institute of Science sequenced the microbiomes of roughly 1,000 individual tumors of seven different cancer types and compared them to roughly 500 samples from normal tissues. They showed that different cancer types and subtypes were associated with distinct microbiomes, and that the bacteria were mostly intracellular and present in both tumor cells and macrophages.

The team also demonstrated associations of specific microbiomes with both smoking status and response to immunotherapy.

The findings, in the words of an accompanying editorial, raised "multiple important questions for future study."

But such future study has been easier said than done.

The Cancer Genome Atlas (TCGA) includes data on tumor-associated microbes. But in contrast to the samples used to study the gut or skin microbiome, tumor biopsies are difficult to procure. And within those biopsies, tumor microbiome denizens are present only in relatively minute quantities.

That low abundance makes such samples vulnerable to contamination. While stool and skin samples contain far more bona fide microbial DNA than contaminant DNA, tumor microbiome samples can contain as much contaminant DNA as bona fide microbial DNA.

In the work now published in Cell Host & Microbe, the Duke team took advantage of the redundancy built into TCGA samples. For inclusion in TCGA, "matched tissue and blood samples from various cancer types are processed and sequenced in parallel using various sequencing platforms at designated centers," they explained in their paper.

That parallel processing enabled the team to separate contaminants, which had signatures unique to each sequencing facility and were present in both blood and tissue, from DNA that truly came from microbes residing in the tumor tissue.
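The logic described above can be illustrated with a minimal Python sketch. It is not the Duke team's published pipeline. It assumes a simplified rule: within each sequencing center, a taxon is flagged as a likely contaminant if it is detected in blood at some minimum prevalence and is not markedly more prevalent in tissue than in blood. The function names, the thresholds `min_prev` and `max_ratio`, the center labels, and the toy records are all hypothetical.

```python
from collections import defaultdict


def prevalence(samples, taxon):
    """Fraction of samples (each a set of detected taxa) that contain taxon."""
    if not samples:
        return 0.0
    return sum(taxon in s for s in samples) / len(samples)


def flag_contaminants(records, min_prev=0.2, max_ratio=2.0):
    """Flag likely contaminants separately for each sequencing center.

    records: iterable of (center, sample_type, taxa), where sample_type is
    "tissue" or "blood" and taxa is a collection of detected taxon names.
    The thresholds are illustrative assumptions, not published values.
    """
    by_center = defaultdict(lambda: {"tissue": [], "blood": []})
    for center, sample_type, taxa in records:
        by_center[center][sample_type].append(set(taxa))

    flagged = {}
    for center, groups in by_center.items():
        all_taxa = set().union(*groups["tissue"], *groups["blood"])
        hits = set()
        for taxon in all_taxa:
            blood_prev = prevalence(groups["blood"], taxon)
            tissue_prev = prevalence(groups["tissue"], taxon)
            # Seen in both blood and tissue, and not clearly tissue-enriched.
            if (blood_prev >= min_prev and tissue_prev > 0
                    and tissue_prev <= max_ratio * blood_prev):
                hits.add(taxon)
        flagged[center] = hits
    return flagged


# Toy, made-up records for two hypothetical centers "A" and "B".
records = [
    ("A", "tissue", {"Fusobacterium nucleatum", "Ralstonia"}),
    ("A", "tissue", {"Fusobacterium nucleatum", "Ralstonia"}),
    ("A", "tissue", {"Ralstonia"}),
    ("A", "blood", {"Ralstonia"}),
    ("A", "blood", {"Ralstonia"}),
    ("A", "blood", set()),
    ("B", "tissue", {"Fusobacterium nucleatum", "Bradyrhizobium"}),
    ("B", "tissue", {"Bradyrhizobium"}),
    ("B", "blood", {"Bradyrhizobium"}),
    ("B", "blood", {"Bradyrhizobium"}),
]

flagged = flag_contaminants(records)
print(flagged)
```

In this toy data, each center has its own flagged taxon, while Fusobacterium nucleatum, which never appears in blood, is retained as tissue-resident. A real analysis would work from read counts rather than presence or absence and would need a statistical treatment of the ambiguous cases.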

The work included methods to analyze ambiguous cases. Escherichia coli, for example, is present in the microbiome but is also a likely contaminant.

For now, the team has used its approach to identify contaminants in oropharyngeal, esophageal, gastrointestinal, and colorectal tissues, and has published the decontaminated data in a public database.

In addition to tumor and blood, TCGA also includes matched normal tissue. By comparing the microbial signatures of tumor and normal tissue, the investigators identified nearly 40 bacterial species whose presence differed between colorectal tumor tissue and matched normal tissue. The strongest association was between Fusobacterium and tumor tissue, replicating previous findings and validating the method. But the researchers also found other species that were either more or less prevalent in tumor tissue than in normal tissue.
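One simple way to picture a matched tumor-versus-normal comparison is a paired test on presence or absence. The Python sketch below is an illustration under stated assumptions, not the statistical method used in the paper: it applies an exact McNemar-style sign test to the patients in whom a taxon was detected in only one of the two matched tissues. The function names and the toy pairs are hypothetical.

```python
from math import comb


def exact_mcnemar(tumor_only, normal_only):
    """Two-sided exact p-value for discordant pairs under a 50:50 null."""
    n = tumor_only + normal_only
    if n == 0:
        return 1.0
    k = min(tumor_only, normal_only)
    # Probability of a split at least this lopsided, in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def paired_prevalence_test(pairs, taxon):
    """pairs: list of (tumor_taxa, normal_taxa) sets from the same patient."""
    tumor_only = sum(taxon in t and taxon not in n for t, n in pairs)
    normal_only = sum(taxon in n and taxon not in t for t, n in pairs)
    return tumor_only, normal_only, exact_mcnemar(tumor_only, normal_only)


# Toy, made-up data: 8 patients with the taxon in tumor only, 2 in both.
pairs = [({"Fusobacterium"}, set())] * 8
pairs += [({"Fusobacterium"}, {"Fusobacterium"})] * 2

result = paired_prevalence_test(pairs, "Fusobacterium")
print(result)
```

Pairs in which the taxon is present in both tissues, or absent from both, carry no information in this test. Screening many species at once would also call for a multiple-testing correction, which the sketch omits.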

The authors concluded that "the ability to retroactively remove contaminant species from NGS sequencing datasets will greatly expand the breadth and accessibility of metagenomic profiles for downstream analyses," and that the approach is applicable to datasets other than TCGA. The Genotype-Tissue Expression (GTEx) project, for example, is a multi-institutional project that studies how genomic variants, including expression quantitative trait loci (eQTL) and splicing quantitative trait loci (sQTL), affect gene expression and splicing across tissues, and its data could be analyzed using the same methods.