Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome

2019 Nature Biotechnology 1,766 citations

Abstract

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the ‘genome in a bottle’ (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads. High-fidelity reads improve variant detection and genome assembly on the PacBio platform.
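Two figures quoted in the abstract, the contig N50 and the 99.8% read accuracy, follow standard definitions that can be sketched in a few lines. The sketch below is illustrative only and is not the authors' pipeline: the function names `n50` and `phred` and the toy contig lengths are assumptions introduced here, and the Phred conversion is the conventional Q = -10 log10(error rate), not something the abstract states.

```python
import math

def n50(lengths):
    """Return the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

def phred(accuracy):
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred-scaled quality."""
    return -10 * math.log10(1 - accuracy)

# Hypothetical toy contig lengths, not data from the paper.
toy_contigs = [40, 30, 20, 10]
print(n50(toy_contigs))       # 30: 40 + 30 = 70 covers half of 100
print(round(phred(0.998)))    # 27: 99.8% accuracy is roughly Q27
```

Under these definitions, a contig N50 above 15 Mb means that at least half of the assembled bases lie in contigs longer than 15 Mb.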

Keywords

Human genome; Computational biology; Genome; Sequence assembly; Genetics; DNA sequencing; Biology; Gene

MeSH Terms

Base Sequence; DNA, Circular; Genetic Variation; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA

Publication Info

Year: 2019
Type: article
Volume: 37
Issue: 10
Pages: 1155-1162
Citations: 1766
Access: Closed

Citation Metrics

OpenAlex citations: 1766
Influential citations: 117

Cite This

Aaron M. Wenger, Paul Peluso, William J. Rowell et al. (2019). Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology, 37(10), 1155-1162. https://doi.org/10.1038/s41587-019-0217-9

Identifiers

DOI: 10.1038/s41587-019-0217-9
PMID: 31406327
PMCID: PMC6776680
