HTSeq—a Python framework to work with high-throughput sequencing data

Abstract

Motivation: A wide range of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.

Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many data formats common in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls. It also provides data structures that can be queried by genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.

Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.

Contact: sanders@fs.tum.de
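The counting step that the abstract attributes to htseq-count can be illustrated with a small standalone sketch. It does not use the HTSeq API: the `Interval` tuple, the `overlaps` and `count_reads` helpers, and the `__no_feature` and `__ambiguous` labels are all assumptions made for this illustration, and the linear scan over genes stands in for the coordinate-indexed data structures that the library is described as providing.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a genomic interval; not an HTSeq class.
# Coordinates are half-open: [start, end).
Interval = namedtuple("Interval", ["chrom", "start", "end", "strand"])


def overlaps(a, b):
    """Return True if two intervals share at least one base on the same strand."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def count_reads(genes, reads):
    """Count reads per gene.

    A read is assigned to a gene only if it overlaps exactly one gene.
    Reads overlapping no gene or several genes are tallied under
    separate labels (names chosen for this sketch).
    """
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        # Linear scan for brevity; a real tool would use an indexed lookup.
        hits = {name for name, iv in genes.items() if overlaps(iv, read)}
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hits.pop()] += 1
    return counts


# Toy data: two overlapping genes and four aligned reads.
genes = {
    "geneA": Interval("chr1", 100, 200, "+"),
    "geneB": Interval("chr1", 150, 300, "+"),
}
reads = [
    Interval("chr1", 110, 140, "+"),  # inside geneA only
    Interval("chr1", 160, 190, "+"),  # inside both genes
    Interval("chr1", 250, 280, "+"),  # inside geneB only
    Interval("chr2", 10, 40, "+"),    # no gene on this chromosome
]
counts = count_reads(genes, reads)
for name in sorted(counts):
    print(name, counts[name])
```

Treating a read that overlaps several genes as ambiguous, rather than counting it for each gene, is one possible policy; it avoids counting the same read twice when the table feeds a differential expression analysis.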

Keywords

Python (programming language), Computer science, Scripting language, Workflow, Programming language, Software, Parsing, Operating system, Data mining, Database

Affiliated Institutions

European Molecular Biology Laboratory (DE)

Publication Info

Year: 2014
Type: article
Volume: 31
Issue: 2
Pages: 166-169
Citations: 21465
Access: Closed

Cite This

APA Style

                            
                                    Simon Anders, 
                                
                                    Paul Theodor Pyl, 
                                
                                    Wolfgang Huber
                                
                            (2014). 
                            HTSeq—a Python framework to work with high-throughput sequencing data. 
                            Bioinformatics
                            , 31
                            (2)
                            , 166-169.
                            https://doi.org/10.1093/bioinformatics/btu638

Identifiers

DOI: 10.1093/bioinformatics/btu638