Abstract

Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
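The idea of a mutual nearest neighbor pair can be illustrated with a short Python sketch. This is not the authors' implementation. It is a minimal toy example that assumes plain Euclidean distance on unnormalized coordinates, and the names `k_nearest`, `find_mnn_pairs`, `batch_a` and `batch_b` are hypothetical. Two cells form an MNN pair when each lies among the other's k nearest neighbors in the opposite batch, so a population present in only one batch finds no partner.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest(query, reference, k):
    """Return the indices of the k cells in `reference` closest to `query`."""
    order = sorted(range(len(reference)), key=lambda j: dist(query, reference[j]))
    return set(order[:k])


def find_mnn_pairs(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each among the other's k nearest neighbors in the opposite batch."""
    nn_a_to_b = [k_nearest(cell, batch_b, k) for cell in batch_a]
    nn_b_to_a = [k_nearest(cell, batch_a, k) for cell in batch_b]
    return sorted(
        (i, j)
        for i, neighbors in enumerate(nn_a_to_b)
        for j in neighbors
        if i in nn_b_to_a[j]
    )


# Toy data (assumption): batch_a holds two populations, batch_b holds only the
# first one, shifted by +1 along the first axis to mimic a batch effect.
batch_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
batch_b = [(1.0, 0.0), (1.0, 1.0)]

print(find_mnn_pairs(batch_a, batch_b, k=1))  # → [(0, 0), (1, 1)]
```

The third cell of `batch_a` belongs to a population absent from `batch_b`. Its nearest neighbor in `batch_b` does not choose it in return, so it is never paired, which is why only a shared subset of the population is needed.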

Keywords

Matching (statistics); Population; Computer science; Single cell sequencing; RNA; Data mining; RNA-Seq; Computational biology; k-nearest neighbors algorithm; Artificial intelligence; Biology; Gene; Mathematics; Gene expression; Genetics; Phenotype; Statistics; Transcriptome

MeSH Terms

Algorithms; Cluster Analysis; Data Analysis; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA; Single-Cell Analysis

Publication Info

Year: 2018
Type: article
Volume: 36
Issue: 5
Pages: 421-427
Citations: 2526
Access: Closed

Citation Metrics

OpenAlex: 2526
Influential: 155

Cite This

Laleh Haghverdi, Aaron T. L. Lun, Michael D. Morgan et al. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology, 36(5), 421-427. https://doi.org/10.1038/nbt.4091

Identifiers

DOI: 10.1038/nbt.4091
PMID: 29608177
PMCID: PMC6152897
