Abstract

The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays.
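As a rough illustration of the kind of index the abstract refers to, the sketch below builds a toy Ferragina-Manzini (FM) index over a single linear string and runs backward search on it. This is not HISAT2's code: HISAT2 uses a graph FM index that also encodes variants and a hierarchical indexing scheme, neither of which is modeled here. The function names `build_fm_index` and `find` are hypothetical, and the naive suffix-array construction is only suitable for small examples.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM index for a linear string (assumption: no '$' in text)."""
    text += "$"  # unique terminator; sorts before A, C, G, T in ASCII
    # Naive suffix array construction, fine for a short illustrative string.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ


def find(pattern, sa, c_table, occ):
    """Return sorted start positions of pattern using backward search."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    index = build_fm_index("GATTACAGATTACA")
    print(find("ATTA", *index))
    print(find("TTT", *index))
```

Backward search consumes the pattern from its last character to its first, narrowing a range of suffix-array rows at each step, so the number of steps depends on the pattern length and not on the text length. A graph-based index extends this idea so that a read can match paths through known variants as well as the reference sequence.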

Keywords

Genome, Genotyping, Computational biology, Reference genome, Haplotype, Search engine indexing, Population, Biology, Computer science, Genomics, Genetics, Human genome, Genotype, Gene, Artificial intelligence

MeSH Terms

Base Sequence; Benchmarking; Genetic Variation; Genome, Human; Genomics; Genotype; Humans; Reproducibility of Results; Sequence Analysis, DNA; Sequence Analysis, RNA; Software

Publication Info

Year: 2019
Type: article
Volume: 37
Issue: 8
Pages: 907-915
Citations: 13675
Access: Closed

Citation Metrics

OpenAlex citations: 13675
Influential citations: 2011

Cite This

Daehwan Kim, Joseph M. Paggi, Chanhee Park et al. (2019). Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology, 37(8), 907-915. https://doi.org/10.1038/s41587-019-0201-4

Identifiers

DOI: 10.1038/s41587-019-0201-4
PMID: 31375807
PMCID: PMC7605509
