Abstract
Abstract Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of the next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty. Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors. Availability: http://samtools.sourceforge.net Contact: hengli@broadinstitute.org
Keywords
Affiliated Institutions
Related Publications
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel
Abstract Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (...
MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes
Abstract Genome‐wide association studies (GWAS) can identify common alleles that contribute to complex disease susceptibility. Despite the large number of SNPs assessed in each ...
High-throughput genotyping by whole-genome resequencing
The next-generation sequencing technology coupled with the growing number of genome sequences opens the opportunity to redesign genotyping strategies for more effective genetic ...
DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome
Acute myeloid leukaemia is a highly malignant haematopoietic tumour that affects about 13,000 adults in the United States each year. The treatment of this disease has changed li...
Integrative analysis of the melanoma transcriptome
Global studies of transcript structure and abundance in cancer cells enable the systematic discovery of aberrations that contribute to carcinogenesis, including gene fusions, al...
Publication Info
- Year
- 2011
- Type
- article
- Volume
- 27
- Issue
- 21
- Pages
- 2987-2993
- Citations
- 6923
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/bioinformatics/btr509