Abstract

Numerous large-scale genomic studies of matched tumor-normal samples have established the somatic landscapes of most cancer types. However, the downstream analysis of data from somatic mutations entails a number of computational and statistical approaches, requiring usage of independent software and numerous tools. Here, we describe an R Bioconductor package, Maftools, which offers a multitude of analysis and visualization modules that are commonly used in cancer genomic studies, including driver gene identification, pathway, signature, enrichment, and association analyses. Maftools only requires somatic variants in Mutation Annotation Format (MAF) and is independent of larger alignment files. With the implementation of well-established statistical and computational methods, Maftools facilitates data-driven research and comparative analysis to discover novel results from publicly available data sets. In the present study, using three of the well-annotated cohorts from The Cancer Genome Atlas (TCGA), we describe the application of Maftools to reproduce known results. More importantly, we show that Maftools can also be used to uncover novel findings through integrative analysis.

Keywords

Bioconductor; Biology; Computational biology; Somatic cell; Annotation; Genome; Visualization; Identification (biology); Genetics; Bioinformatics; Data mining; Computer science; Gene

MeSH Terms

Clonal Evolution; Humans; Mutation Rate; Neoplasms; Sequence Analysis, DNA; Software

Publication Info

Year: 2018
Type: article
Volume: 28
Issue: 11
Pages: 1747-1756
Citations: 5045
Access: Closed

Citation Metrics

OpenAlex: 5045
Influential: 262
CrossRef: 3740

Cite This

Anand Mayakonda, De‐Chen Lin, Yassen Assenov et al. (2018). Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Research, 28(11), 1747-1756. https://doi.org/10.1101/gr.239244.118

Identifiers

DOI: 10.1101/gr.239244.118
PMID: 30341162
PMCID: PMC6211645
