Data Structures for Statistical Computing in Python

Abstract

In this paper we are concerned with the practical issues of working with data sets common to finance, statistics, and other related fields. pandas is a new library which aims to facilitate working with these data sets and to provide a set of fundamental building blocks for implementing statistical models. We will discuss specific design issues encountered in the course of developing pandas with relevant examples and some comparisons with the R language. We conclude by discussing possible future directions for statistical computing and data analysis using Python.

Keywords

Python (programming language), Computer science, Data science, Data exploration, Statistical analysis, Data structure, Computational statistics, Theoretical computer science, Data mining, Software engineering, Programming language, Machine learning, Statistics, Visualization, Mathematics

Affiliated Institutions

Capital University US

Related Publications

SCANPY: large-scale single-cell gene expression data analysis

F. Alexander Wolf, Philipp Angerer, Fabian J. Theis

Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inferenc...

2018 Genome biology 8088 citations

Figuring Out Factors: The Use and Misuse of Factor Analysis

David L. Streiner

Factor analysis is a technique which is designed to reveal whether or not the pattern of responses on a number of tests can be explained by a smaller number of underlying traits...

1994 The Canadian Journal of Psychiatry 749 citations

New tools for automated high-resolution cryo-EM structure determination in RELION-3

Jasenko Zivanov, Takanori Nakane, Björn Forsberg +4 more

Here, we describe the third major release of RELION. CPU-based vector acceleration has been added in addition to GPU support, which provides flexibility in use of resources and ...

2018 eLife 5243 citations

fastp: an ultra-fast all-in-one FASTQ preprocessor

Shifu Chen, Yanqing Zhou, Yaru Chen +1 more

Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for e...

2018 Bioinformatics 25069 citations

HTSeq—a Python framework to work with high-throughput sequencing data

Simon Anders, Paul Theodor Pyl, Wolfgang Huber

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from stand...

2014 Bioinformatics 21465 citations

Publication Info

Year: 2010
Type: article
Pages: 56-61
Citations: 10212
Access: Closed

Citation Metrics

10212 (OpenAlex)

Cite This

APA Style

Wes McKinney (2010). Data Structures for Statistical Computing in Python. Proceedings of the Python in Science Conferences, 56-61. https://doi.org/10.25080/majora-92bf1922-00a

Identifiers

DOI: 10.25080/majora-92bf1922-00a