Abstract

Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.
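
The abstract does not spell out how "common sources of variation" are found. The paper's method is built on canonical correlation analysis (CCA) across data sets that share genes. The following is a minimal sketch of that general idea on toy data, not the Seurat implementation. The function names (`standardize_cells`, `shared_components`), the per-cell standardization step, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def standardize_cells(m):
    """Center and scale each column (cell) across genes."""
    m = m - m.mean(axis=0, keepdims=True)
    sd = m.std(axis=0, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return m / sd

def shared_components(x, y, k=2):
    """CCA-style sketch: x is genes x cells_a, y is genes x cells_b (same gene order).

    Returns per-cell embeddings for each data set and the singular values
    of the cross-product matrix, which indicate how strongly each
    component is shared.
    """
    xs, ys = standardize_cells(x), standardize_cells(y)
    # SVD of the cells_a x cells_b cross-product matrix
    u, s, vt = np.linalg.svd(xs.T @ ys, full_matrices=False)
    return u[:, :k], vt[:k].T, s[:k]

# Toy data (hypothetical): one gene program shared by two data sets,
# with data set B measured on a different scale and offset.
rng = np.random.default_rng(0)
n_genes = 200
program = rng.normal(size=(n_genes, 1))
state_a = np.tile([1.0, -1.0], 20)  # 40 cells, two alternating states
state_b = np.tile([1.0, -1.0], 30)  # 60 cells, two alternating states
data_a = program * state_a + 0.1 * rng.normal(size=(n_genes, 40))
data_b = 3.0 * program * state_b + 5.0 + 0.1 * rng.normal(size=(n_genes, 60))

emb_a, emb_b, strength = shared_components(data_a, data_b, k=2)
```

In this toy setting, the first column of `emb_a` and of `emb_b` should separate the two simulated cell states in each data set despite the scale and offset difference, which is the sense in which the component is shared. Real data would also need normalization, gene selection, and an alignment step, all of which this sketch omits.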

Keywords

Computational biology; Biology; Profiling (computer programming); Phenotype; Transcriptome; Computer science; Genetics; Gene

MeSH Terms

Animals; Computers, Molecular; Data Analysis; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Humans; Leukocytes, Mononuclear; Mice; Sequence Analysis, RNA; Single-Cell Analysis; Software; Transcriptome

Publication Info

Year: 2018
Type: article
Volume: 36
Issue: 5
Pages: 411-420
Citations: 13637
Access: Closed

Citation Metrics

OpenAlex citations: 13637
Influential citations: 1244

Cite This

Andrew Butler, Paul Hoffman, Peter Smibert et al. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5), 411-420. https://doi.org/10.1038/nbt.4096

Identifiers

DOI: 10.1038/nbt.4096
PMID: 29608179
PMCID: PMC6700744
