Abstract

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings. NGMLR and Sniffles perform highly accurate alignment and structural variation detection from long-read sequencing data.
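The abstract says that Sniffles can filter false events and operate on low-coverage data, but it does not say how. The sketch below is a generic illustration of read-support filtering. It is not Sniffles' actual algorithm. It assumes per-read structural variant signals (for example, deletions or insertions seen in long-read alignments) have already been extracted. It groups signals of the same type that lie close together on the reference and keeps a candidate only if enough distinct reads support it. `SVSignal`, `cluster_signals`, `call_svs`, `min_support` and `max_distance` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SVSignal:
    """One piece of per-read evidence for a structural variant (hypothetical type)."""
    read_id: str
    chrom: str
    sv_type: str   # e.g. "DEL" or "INS"
    pos: int       # breakpoint position on the reference
    length: int    # event length in bases


def cluster_signals(signals, max_distance=100):
    """Group signals with the same chromosome and type when each lies within
    max_distance bases of the previous signal in sorted order."""
    clusters = []
    ordered = sorted(signals, key=lambda s: (s.chrom, s.sv_type, s.pos))
    for sig in ordered:
        last = clusters[-1][-1] if clusters else None
        if (last is not None and last.chrom == sig.chrom
                and last.sv_type == sig.sv_type
                and sig.pos - last.pos <= max_distance):
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


def call_svs(signals, min_support=3, max_distance=100):
    """Emit one call per cluster supported by at least min_support distinct reads."""
    calls = []
    for cluster in cluster_signals(signals, max_distance):
        support = len({s.read_id for s in cluster})  # count each read once
        if support < min_support:
            continue  # too few reads: treat as likely sequencing/alignment noise
        first = cluster[0]
        calls.append({
            "chrom": first.chrom,
            "type": first.sv_type,
            "pos": int(median(s.pos for s in cluster)),
            "length": int(median(s.length for s in cluster)),
            "support": support,
        })
    return calls


signals = [
    SVSignal("r1", "chr1", "DEL", 10_000, 500),
    SVSignal("r2", "chr1", "DEL", 10_020, 498),
    SVSignal("r3", "chr1", "DEL", 10_045, 503),
    SVSignal("r4", "chr1", "INS", 50_000, 120),  # lone, unsupported signal
]
print(call_svs(signals, min_support=3))
# [{'chrom': 'chr1', 'type': 'DEL', 'pos': 10020, 'length': 500, 'support': 3}]
```

The support threshold trades precision against sensitivity. A higher `min_support` discards more spurious events. On low-coverage data, however, a true variant may be backed by only a few reads, so the threshold has to be lowered to avoid missing it.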

Keywords

Computer science; Structural variation; Identification (biology); Computational biology; Field (mathematics); Data mining; Filter (signal processing); Genome; Biology; Genetics; Gene

MeSH Terms

DNA Mutational Analysis; Genome, Human; Genomics; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA

Publication Info

Year: 2018
Type: article
Volume: 15
Issue: 6
Pages: 461-468
Citations: 1775
Access: Closed

Citation Metrics

OpenAlex citations: 1775
Influential citations: 187

Cite This

Fritz J. Sedlazeck, Philipp Rescheneder, Moritz Smolka et al. (2018). Accurate detection of complex structural variations using single-molecule sequencing. Nature Methods, 15(6), 461-468. https://doi.org/10.1038/s41592-018-0001-7

Identifiers

DOI: 10.1038/s41592-018-0001-7
PMID: 29713083
PMCID: PMC5990442