Tumors and other tissues are usually thought of as a mass of identical cells. Nothing could be further from the truth.

A tumor has a multitude of cells that can have subtle genetic differences. Those differences can be critical. In tumors, the genetic differences may help some cells resist treatment.

Single-cell RNA sequencing enables detailed study of such subtle differences. The technique works by looking at the cell’s messenger RNA as it carries genetic information from a cell’s DNA blueprint to the machinery that produces the enzymes and other cellular proteins. Single-cell RNA sequencing provides a snapshot of a cell’s distinctive pattern of gene activation, or “expression,” and thus the cell’s function.

The technique involves isolating individual cells and extracting the messenger RNA. When the messenger RNA is copied back to DNA, researchers can sequence that DNA to reveal the active genetic machinery of the cell.

But there are limits.

Large volumes of noisy data

This type of sequencing produces masses of genetic data that may hide key differences in expression among the different cells. It’s also hard to get the data for genes that may be important but are expressed at low levels. Another challenge is that the information may constitute a low “signal” in a sea of “noise” produced by all the other genetic and non-genetic sources. These problems become worse as the number of cells analyzed increases to the hundreds of thousands.

Another data-analysis problem arises from “batch” effects. These reflect subtle day-to-day differences in how samples from the same tumor are prepared and analyzed.

We developed a new computational technique for analyzing sequencing data to detect biologically meaningful subpopulations of cells in tumors and other tissues. The technique, called Latent Cellular Analysis (LCA), will enable researchers to derive insights into the differences among cells. It is described in the journal Nucleic Acids Research.

Our lab develops techniques to mine the data from single-cell RNA sequencing and reveal the key gene expression differences among cell subpopulations. My colleagues and I have developed analytical tools that can overcome the many barriers to understanding those differences.

Previous analytical approaches have not been particularly user-friendly. Many techniques require significant effort from scientists to interpret the results and generate hypotheses. Most methods also have high computational costs for data sets that represent large numbers of cells.

Moreover, several of the methods require users to estimate the number of so-called “clusters” in the data. Clustering organizes data about individual cells into groups with common features that may be biologically important.

Our method accomplishes the clustering task using the kind of machine-learning approach originally developed to automatically analyze text by recognizing patterns of word co-occurrence. This approach makes it easier to recognize subtle differences among cells, such as those that distinguish subpopulations within a tumor.
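The text-analysis analogy can be made concrete with a small sketch. The code below is not the Latent Cellular Analysis implementation. It is a minimal illustration that assumes one well-known text-mining technique, latent semantic analysis, in which a truncated singular value decomposition finds patterns of co-occurrence. Here each cell plays the role of a document and each gene the role of a word. The function names, the simulated data and every parameter value are hypothetical and chosen only for illustration.

```python
import numpy as np


def latent_embedding(counts, n_components=2):
    """Project cells into a low-dimensional latent space (LSA-style sketch).

    counts: (n_cells, n_genes) array of non-negative expression counts.
    Returns an (n_cells, n_components) array of latent coordinates.
    """
    # Scale each cell by its total count so sequencing depth does not dominate.
    totals = counts.sum(axis=1, keepdims=True)
    normalized = counts / np.maximum(totals, 1)
    # Log-transform to damp the influence of very highly expressed genes.
    logged = np.log1p(normalized * 1e4)
    # Center each gene so the decomposition captures variation among cells.
    centered = logged - logged.mean(axis=0)
    # Truncated SVD: keep only the strongest patterns of gene co-expression.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def split_two_groups(embedding):
    """Assign cells to two groups by the sign of the first latent component."""
    return (embedding[:, 0] > 0).astype(int)


def simulate_counts(n_per_group=100, n_genes=200, n_markers=40, seed=0):
    """Simulate two hypothetical subpopulations that differ in a few genes."""
    rng = np.random.default_rng(seed)
    base = rng.gamma(shape=5.0, scale=4.0, size=n_genes)
    shifted = base.copy()
    shifted[:n_markers] *= 4.0  # marker genes are more active in group 1
    group_0 = rng.poisson(base, size=(n_per_group, n_genes))
    group_1 = rng.poisson(shifted, size=(n_per_group, n_genes))
    counts = np.vstack([group_0, group_1])
    truth = np.repeat([0, 1], n_per_group)
    return counts, truth


if __name__ == "__main__":
    counts, truth = simulate_counts()
    labels = split_two_groups(latent_embedding(counts))
    # Group labels are arbitrary, so score agreement up to a label swap.
    agreement = max(np.mean(labels == truth), np.mean(labels != truth))
    print(f"Agreement with simulated subpopulations: {agreement:.2f}")
```

Splitting on the sign of the first component works only because the simulation contains exactly two equally sized groups. Real data would need a proper clustering step in the latent space, and this sketch does nothing about batch effects.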

To demonstrate Latent Cellular Analysis, we used the technique to overcome strong batch effects in rhabdomyosarcoma sequencing data. Rhabdomyosarcoma is the most common soft-tissue tumor in children. This seemingly homogeneous tumor has two major subpopulations of cells, and our LCA technique effectively removed the batch effects and clearly identified the two subpopulations.

We are making our technique freely available to researchers to download and install. We are also developing the interface to make it easier to use.

Detecting and understanding the differences among subpopulations of cancer cells will enable better treatments and more sensitive diagnostic tests. Understanding the differences among seemingly identical cells in other tissues can also lead to important biological insights.