Abstract
Large volumes of sequencing data are generated and processed every day as sequencing technology evolves and its applications expand. One consequence of this data explosion is the rising cost and complexity of data processing. Preprocessing of FASTQ data, which consists of removing adapter contamination, filtering low-quality reads, and correcting wrongly represented bases, is an indispensable but resource-intensive part of sequencing data analysis. Although many software applications have been developed for this task, bioinformatics scientists and engineers continue to pursue faster, simpler, and more energy-efficient software. Several years ago, the author developed fastp, an ultrafast all-in-one FASTQ data preprocessor with many modern features. The software has been widely adopted by bioinformatics users and has been continuously maintained and updated. Since the first publication on fastp, it has been greatly improved, making it even faster and more powerful. For instance, the duplication evaluation module has been improved, and a new deduplication module has been added. This study introduces the new features of fastp and demonstrates how it was designed and implemented.
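To make one of the preprocessing steps named in the abstract concrete, the following is a minimal Python sketch of low-quality read filtering. It is an illustration only and not fastp's implementation: the function names, the Phred+33 encoding assumption, and the thresholds (`min_phred=15`, `max_low_fraction=0.4`) are all hypothetical choices made for this example.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header line, sequence, quality string).
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield records from a list of lines laid out as 4-line FASTQ entries."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (s.rstrip("\n") for s in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header, seq, qual


def low_quality_fraction(qual: str, min_phred: int = 15) -> float:
    """Fraction of bases whose Phred score (assumed Phred+33) is below min_phred."""
    if not qual:
        return 1.0  # treat an empty read as entirely low quality
    low = sum(1 for c in qual if ord(c) - 33 < min_phred)
    return low / len(qual)


def filter_reads(
    records: Iterable[Record],
    min_phred: int = 15,            # hypothetical threshold for this sketch
    max_low_fraction: float = 0.4,  # hypothetical threshold for this sketch
) -> List[Record]:
    """Keep reads whose share of low-quality bases does not exceed the limit."""
    return [
        r for r in records
        if low_quality_fraction(r[2], min_phred) <= max_low_fraction
    ]


# Two toy reads: 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
EXAMPLE = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTACGT", "+", "!!!!!!II",
]

kept = filter_reads(parse_fastq(EXAMPLE))
print([header for header, _, _ in kept])
```

In this toy input, `read2` has six of eight bases below the threshold and is discarded, while `read1` is kept. A production tool would stream compressed files and combine this step with adapter trimming and base correction, which this sketch does not attempt.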
Publication Info
- Year: 2023
- Type: article
- Volume: 2
- Issue: 2
- Pages: e107
- Citations: 1338
- Access: Closed
Identifiers
- DOI: 10.1002/imt2.107