Abstract

Single-cell RNA sequencing (scRNA-seq) data are commonly affected by technical artifacts known as "doublets," which limit cell throughput and lead to spurious biological conclusions. Here, we present a computational doublet detection tool, DoubletFinder, that identifies doublets using only gene expression data. DoubletFinder predicts doublets according to each real cell's proximity in gene expression space to artificial doublets created by averaging the transcriptional profile of randomly chosen cell pairs. We first use scRNA-seq datasets where the identity of doublets is known to show that DoubletFinder identifies doublets formed from transcriptionally distinct cells. When these doublets are removed, the identification of differentially expressed genes is enhanced. Second, we provide a method for estimating DoubletFinder input parameters, allowing its application across scRNA-seq datasets with diverse distributions of cell types. Lastly, we present "best practices" for DoubletFinder applications and illustrate that DoubletFinder is insensitive to an experimentally validated kidney cell type with "hybrid" expression features.
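
The idea described in the abstract (average random cell pairs into artificial doublets, then score each real cell by how many of its nearest neighbors are artificial) can be illustrated with a minimal Python sketch. This is not the DoubletFinder implementation: the function name `artificial_nn_fraction`, the parameters `n_artificial` and `k`, the brute-force Euclidean neighbor search, and the absence of any normalization or dimensionality reduction are all assumptions made for illustration.

```python
import numpy as np

def artificial_nn_fraction(expr, n_artificial, k, seed=0):
    """Score real cells by the fraction of artificial doublets among their k nearest neighbors.

    expr is a (cells x genes) array. All names and defaults here are
    hypothetical; this is an illustrative sketch, not the published tool.
    """
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    n_cells = expr.shape[0]

    # Pick random pairs of distinct cells (the offset trick avoids self-pairs).
    first = rng.integers(0, n_cells, size=n_artificial)
    offset = rng.integers(1, n_cells, size=n_artificial)
    second = (first + offset) % n_cells

    # Artificial doublets: average of the two cells' expression profiles.
    artificial = (expr[first] + expr[second]) / 2.0

    # Merge real and artificial profiles; remember which rows are artificial.
    merged = np.vstack([expr, artificial])
    is_artificial = np.arange(merged.shape[0]) >= n_cells

    # Brute-force squared Euclidean distances from each real cell to every row.
    diff = expr[:, None, :] - merged[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # A cell is not its own neighbor.
    dist[np.arange(n_cells), np.arange(n_cells)] = np.inf

    # Fraction of artificial doublets among the k nearest neighbors.
    nearest = np.argsort(dist, axis=1)[:, :k]
    return is_artificial[nearest].mean(axis=1)
```

Under these assumptions, a real cell whose profile lies between two transcriptionally distinct populations tends to be surrounded by artificial doublets and so receives a high score, while cells inside a single population mostly have real neighbors. Doublets formed from two similar cells are not separated this way, which is consistent with the abstract's emphasis on transcriptionally distinct cells.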

Keywords

Spurious relationship, Computational biology, Gene, Gene expression, Identification (biology), Biology, RNA, Expression (computer science), Limit (mathematics), Cell, RNA-Seq, Biological system, Computer science, Pattern recognition (psychology), Genetics, Artificial intelligence, Transcriptome, Mathematics, Machine learning

Publication Info

Year: 2019
Type: article
Volume: 8
Issue: 4
Pages: 329-337.e4
Citations: 3891
Access: Closed

Citation Metrics

3891 (OpenAlex)

Cite This

Christopher S. McGinnis, Lyndsay M. Murrow, Zev J. Gartner (2019). DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems, 8(4), 329-337.e4. https://doi.org/10.1016/j.cels.2019.03.003

Identifiers

DOI: 10.1016/j.cels.2019.03.003