Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization.

Results: We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development.

Availability and Implementation: The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater.

Supplementary information: Supplementary data are available at Bioinformatics online.
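The QC and normalization steps the abstract refers to can be illustrated with a minimal Python sketch. This is not scater's API (scater is an R/Bioconductor package); it is a generic illustration under stated assumptions: a dense genes × cells count matrix, mitochondrial genes identified by a hypothetical "MT-" name prefix, and fixed, hypothetical filtering thresholds (`min_total`, `min_detected`, `max_pct_mito`). Normalization here uses library-size factors scaled to a mean of 1, followed by log2(count / factor + 1), one common scheme among several.

```python
import math
from typing import Dict, List


def qc_metrics(counts: List[List[int]], gene_names: List[str],
               mito_prefix: str = "MT-") -> List[Dict[str, float]]:
    """Per-cell QC metrics from a genes x cells count matrix.

    Assumption: mitochondrial genes are recognised by a name prefix.
    """
    n_cells = len(counts[0])
    mito_rows = [i for i, name in enumerate(gene_names) if name.startswith(mito_prefix)]
    metrics = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        total = sum(column)
        mito = sum(column[i] for i in mito_rows)
        metrics.append({
            "total_counts": float(total),                         # library size
            "n_detected": float(sum(1 for c in column if c > 0)),  # genes with non-zero counts
            "pct_mito": 100.0 * mito / total if total > 0 else 0.0,
        })
    return metrics


def filter_cells(metrics: List[Dict[str, float]], min_total: float,
                 min_detected: float, max_pct_mito: float) -> List[int]:
    """Indices of cells passing all three (hypothetical) fixed thresholds."""
    return [
        j for j, m in enumerate(metrics)
        if m["total_counts"] >= min_total
        and m["n_detected"] >= min_detected
        and m["pct_mito"] <= max_pct_mito
    ]


def log_normalize(counts: List[List[int]], keep: List[int],
                  pseudocount: float = 1.0) -> List[List[float]]:
    """Library-size normalization of the kept cells, then a log2 transform.

    Cells with zero total counts must be filtered out first, otherwise
    their size factor is zero and the division fails.
    """
    totals = [sum(row[j] for row in counts) for j in keep]
    mean_total = sum(totals) / len(totals)
    size_factors = [t / mean_total for t in totals]  # scaled so the mean factor is 1
    return [
        [math.log2(row[j] / sf + pseudocount) for j, sf in zip(keep, size_factors)]
        for row in counts
    ]


if __name__ == "__main__":
    genes = ["GeneA", "GeneB", "MT-CO1"]
    counts = [[10, 0, 5], [5, 0, 5], [5, 1, 0]]  # rows = genes, columns = cells
    metrics = qc_metrics(counts, genes)
    keep = filter_cells(metrics, min_total=5, min_detected=2, max_pct_mito=50.0)
    print(keep)  # [0, 2]
    print(log_normalize(counts, keep))
```

In this toy example the second cell is removed (one count, all mitochondrial), and GeneA ends up with the same normalized value in both remaining cells because its share of each cell's library is the same. In practice, data-driven thresholds (for example, outliers a few median absolute deviations from the median) are often preferred over fixed cut-offs.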

Keywords

Normalization (sociology), Visualization, Computer science, RNA-Seq, Software, Computational biology, Data mining, Biology, Transcriptome, Genetics, Gene expression, Programming language, Gene

MeSH Terms

Cell Line; Humans; Principal Component Analysis; Programming Languages; Quality Control; RNA; Sequence Analysis, RNA; Single-Cell Analysis; Software; Statistics as Topic

Publication Info

Year: 2016
Type: article
Volume: 33
Issue: 8
Pages: 1179-1186
Citations: 1915
Access: Closed

Citation Metrics

OpenAlex: 1915
Influential: 111
CrossRef: 1679

Cite This

Davis J. McCarthy, Kieran R. Campbell, Aaron T. L. Lun et al. (2016). Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics, 33(8), 1179-1186. https://doi.org/10.1093/bioinformatics/btw777

Identifiers

DOI: 10.1093/bioinformatics/btw777
PMID: 28088763
PMCID: PMC5408845