Abstract

Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast, as most are developed in high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading the data multiple times also makes preprocessing slow and I/O-inefficient.

Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt, despite performing far more operations than similar tools.

Availability and implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.
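To make the abstract's "single scan" idea concrete, here is a minimal, single-threaded Python sketch written under stated assumptions. It is not fastp's C++ implementation, and it does not reproduce fastp's algorithms. The function names, the Phred+33 quality encoding, the example adapter string, the exact-match adapter search, the sliding-window tail trim, and every threshold (window of 4, mean quality 20, base quality 15, at most 40% low-quality bases, minimum length 15) are hypothetical choices for illustration. Each record is read once and passes through adapter trimming, per-read quality pruning, filtering and QC counting before the next record is read.

```python
"""Single-pass FASTQ preprocessing sketch (hypothetical; not fastp's code)."""
from dataclasses import dataclass
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding
ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, chosen for illustration


@dataclass
class Stats:
    """QC counters gathered during the same pass as trimming/filtering."""
    reads_in: int = 0
    bases_in: int = 0
    reads_out: int = 0
    bases_out: int = 0


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from consecutive 4-line records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        yield header, seq, qual


def trim_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 3) -> Tuple[str, str]:
    """Cut at an exact full-adapter match, else at a partial adapter
    prefix (>= min_overlap bases) at the 3' end. No mismatches allowed."""
    pos = seq.find(adapter)
    if pos == -1:
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_tail(seq: str, qual: str, window: int = 4, min_mean: float = 20) -> Tuple[str, str]:
    """Drop 3' bases one at a time while the trailing window's mean quality < min_mean."""
    end = len(seq)
    while end > 0:
        scores = [ord(c) - PHRED_OFFSET for c in qual[max(0, end - window):end]]
        if sum(scores) / len(scores) >= min_mean:
            break
        end -= 1
    return seq[:end], qual[:end]


def passes_filters(qual: str, min_len: int = 15, low_q: int = 15, max_low_frac: float = 0.4) -> bool:
    """Reject short reads and reads with too many low-quality bases."""
    if len(qual) < min_len:
        return False
    low = sum(1 for c in qual if ord(c) - PHRED_OFFSET < low_q)
    return low / len(qual) <= max_low_frac


def preprocess(lines: List[str], adapter: str) -> Tuple[List[str], Stats]:
    """Trim, filter and count every record in a single scan of the input."""
    stats = Stats()
    kept = []
    for header, seq, qual in parse_fastq(lines):
        stats.reads_in += 1
        stats.bases_in += len(seq)
        seq, qual = trim_adapter(seq, qual, adapter)
        seq, qual = trim_tail(seq, qual)
        if passes_filters(qual):
            stats.reads_out += 1
            stats.bases_out += len(seq)
            kept.append(f"{header}\n{seq}\n+\n{qual}")
    return kept, stats


def record(name: str, seq: str, qual: str) -> List[str]:
    """Build one 4-line FASTQ record for the usage example."""
    return [f"@{name}", seq, "+", qual]


lines = (
    record("r1", "ACGT" * 5 + ADAPTER + "TT", "I" * 35)  # full adapter
    + record("r2", "G" * 20, "#" * 20)                   # all Q2: trimmed away
    + record("r3", "T" * 18 + ADAPTER[:5], "I" * 23)     # partial adapter
    + record("r4", "C" * 20, "#" * 10 + "I" * 10)        # 50% low-quality
)
kept, stats = preprocess(lines, ADAPTER)
print(stats)  # Stats(reads_in=4, bases_in=98, reads_out=2, bases_out=38)
```

Applying every step inside one loop is what avoids re-reading the data once per tool. A practical preprocessor would stream records from disk rather than hold them in a list, and could split batches of reads across threads.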

Keywords

Computer science; Preprocessor; Trimming; Python (programming language); Adapter (computing); Java; Data mining; Data pre-processing; Programming language; Source code; Operating system

MeSH Terms

Humans; Programming Languages; Quality Control

Publication Info

Year: 2018
Type: article
Volume: 34
Issue: 17
Pages: i884-i890
Citations: 25069
Access: Closed

Citation Metrics

OpenAlex: 25069
Influential: 3902
CrossRef: 21931

Cite This

Shifu Chen, Yanqing Zhou, Yaru Chen et al. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884-i890. https://doi.org/10.1093/bioinformatics/bty560

Identifiers

DOI: 10.1093/bioinformatics/bty560
PMID: 30423086
PMCID: PMC6129281