Abstract

FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score, despite lacking any formal definition to date and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Because this is an open access publication, it is hoped that this description, together with the example files provided as Supplementary Data, will serve as a future reference for this important file format.
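
As a rough illustration of the variant conversion mentioned in the abstract, the Python sketch below assumes the commonly documented encodings: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Solexa FASTQ stores Solexa scores with an offset of 64, and Illumina 1.3+ FASTQ stores PHRED scores with an offset of 64. The function and constant names are illustrative only; they are not taken from the article or from any library.

```python
import math

SANGER_OFFSET = 33    # assumed: Sanger variant, PHRED scores, ASCII offset 33
ILLUMINA_OFFSET = 64  # assumed: Solexa and Illumina 1.3+ variants, ASCII offset 64


def solexa_to_phred(q_solexa):
    """Map a Solexa score to the equivalent PHRED score.

    PHRED uses -10*log10(p); Solexa uses -10*log10(p / (1 - p)),
    where p is the estimated error probability.
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred):
    """Map a PHRED score (must be > 0) to the equivalent Solexa score."""
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


def decode_quality(qual, offset):
    """Turn an ASCII quality string into a list of integer scores."""
    return [ord(c) - offset for c in qual]


def illumina_to_sanger(qual):
    """Re-encode an Illumina 1.3+ (PHRED+64) quality string as Sanger (PHRED+33)."""
    return "".join(chr(ord(c) - ILLUMINA_OFFSET + SANGER_OFFSET) for c in qual)
```

For example, `decode_quality("!I", SANGER_OFFSET)` yields the PHRED scores 0 and 40, and `illumina_to_sanger("@h")` re-encodes the same two scores from the offset-64 form. Converting a Solexa-variant file additionally needs `solexa_to_phred` on each decoded score, usually followed by rounding to an integer.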

Keywords

Biology, Sanger sequencing, Documentation, File format, Computational biology, Information retrieval, Computer science, DNA sequencing, Database, Genetics, Gene, Operating system

Related Publications

The Phusion Assembler

The Phusion assembler has assembled the mouse genome from the whole-genome shotgun (WGS) dataset collected by the Mouse Genome Sequencing Consortium, at ∼7.5× sequence coverage,...

2002 Genome Research 220 citations

Publication Info

Year: 2009
Type: review
Volume: 38
Issue: 6
Pages: 1767-1771
Citations: 1863
Access: Closed

Citation Metrics

1863 (OpenAlex)

Cite This

Peter Cock, Christopher J. Fields, N. Goto et al. (2009). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research, 38(6), 1767-1771. https://doi.org/10.1093/nar/gkp1137

Identifiers

DOI: 10.1093/nar/gkp1137