Minimap2: pairwise alignment for nucleotide sequences

Heng Li
2018 Bioinformatics 14,625 citations

Abstract

Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms.

Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions and introduces new heuristics to reduce spurious alignments. It is 3–4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Availability and implementation: https://github.com/lh3/minimap2

Supplementary information: Supplementary data are available at Bioinformatics online.
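The "concave gap cost" mentioned in the abstract can be illustrated with a two-piece affine function: the cost of a gap is the cheaper of two affine costs, one with a low opening penalty and steep extension, the other with a high opening penalty and shallow extension. The following is a minimal sketch under that assumption; the function name `gap_cost` and the parameter values (4, 2, 24, 1) are illustrative choices, not figures stated in the text above.

```python
def gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    Assumed form (illustrative): min(open1 + length*ext1, open2 + length*ext2).
    The first piece is cheaper for short gaps, the second for long gaps,
    so the overall cost grows more slowly as gaps get longer (concave).
    """
    if length <= 0:
        return 0  # no gap, no penalty
    return min(open1 + length * ext1, open2 + length * ext2)


if __name__ == "__main__":
    # Compare short and long gaps; with these values the pieces cross at length 20.
    for n in (1, 10, 20, 100):
        print(n, gap_cost(n))
```

With these assumed values a 100-base gap costs 124 rather than the 204 a single affine piece would charge, which is why such a scheme tolerates long insertions and deletions better than a plain affine cost.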

Keywords

Pairwise comparison; Nucleotide; Multiple sequence alignment; Sequence alignment; Computational biology; Computer science; Genetics; Biology; Artificial intelligence; Gene; Peptide sequence

MeSH Terms

Algorithms; Base Sequence; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software

Publication Info

Year: 2018
Type: article
Volume: 34
Issue: 18
Pages: 3094-3100
Citations: 14625
Access: Closed

Citation Metrics

OpenAlex: 14625
Influential: 1420
CrossRef: 12824

Cite This

Heng Li (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094-3100. https://doi.org/10.1093/bioinformatics/bty191

Identifiers

DOI: 10.1093/bioinformatics/bty191
PMID: 29750242
PMCID: PMC6137996
arXiv: 1708.01492
