Abstract
A new projection pursuit algorithm for exploring multivariate data is presented that has both statistical and computational advantages over previous methods. A number of practical issues concerning its application are addressed. A connection to multivariate density estimation is established, and its properties are investigated through simulation studies and application to real data.
The goal of exploratory projection pursuit is to use the data to find low- (one-, two-, or three-) dimensional projections that provide the most revealing views of the full-dimensional data. With these views the human gift for pattern recognition can be applied to help discover effects that may not have been anticipated in advance. Since linear effects are directly captured by the covariance structure of the variable pairs (which are straightforward to estimate), the emphasis here is on the discovery of nonlinear effects such as clustering or other general nonlinear associations among the variables. Although arbitrary nonlinear effects are impossible to parameterize in full generality, they are easily recognized when presented in a low-dimensional visual representation of the data density.
Projection pursuit assigns a numerical index to every projection that is a functional of the projected data density. The intent of this index is to capture the degree of nonlinear structuring present in the projected distribution. The pursuit consists of maximizing this index with respect to the parameters defining the projection. Since it is unlikely that there is only one interesting view of a multivariate data set, this procedure is iterated to find further revealing projections. After each maximizing projection has been found, a transformation is applied to the data that removes the structure present in the solution projection while preserving the multivariate structure that is not captured by it. The projection pursuit algorithm is then applied to these transformed data to find additional views that may yield further insight.
This projection pursuit algorithm has potential advantages over other dimensionality reduction methods that are commonly used for data exploration. It focuses directly on the "interestingness" of a projection rather than indirectly through the interpoint distances. This allows it to be unaffected by the scale and (linear) correlational structure of the data, helping it to overcome the "curse of dimensionality" that tends to plague methods based on multidimensional scaling, parametric mapping, cluster analysis, and principal components.
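The loop described in the abstract (score a projection, maximize the score, remove the structure found, repeat) can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the index is the absolute excess kurtosis of the projected data (a simple stand-in for a measure of nonlinear structure), the maximization is a crude random search over unit directions, and structure removal is done by replacing the projected values with rank-based normal scores. The function names `sphere`, `index`, `pursue`, and `remove_structure` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def sphere(X):
    """Center and whiten X so that its sample covariance is the identity.

    This removes scale and linear correlation before the search starts.
    """
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ evecs @ np.diag(evals ** -0.5) @ evecs.T


def index(z):
    """Stand-in projection index (an assumption, not the paper's index):
    absolute excess kurtosis of the standardized projection."""
    z = (z - z.mean()) / z.std()
    return abs(np.mean(z ** 4) - 3.0)


def pursue(Z, n_starts=2000, seed=0):
    """Crude random search for the unit direction that maximizes the index."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_starts, Z.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    scores = [index(Z @ a) for a in dirs]
    best = int(np.argmax(scores))
    return dirs[best], scores[best]


def remove_structure(Z, a):
    """Replace the projection onto unit vector a with normal scores.

    Only the component along a changes; projections onto directions
    orthogonal to a are left as they were.
    """
    proj = Z @ a
    n = len(proj)
    ranks = np.argsort(np.argsort(proj)) + 1  # ranks 1..n
    nd = NormalDist()
    gauss = np.array([nd.inv_cdf((r - 0.5) / n) for r in ranks])
    return Z + np.outer(gauss - proj, a)


# Synthetic data: two clusters along the first coordinate, noise elsewhere.
rng = np.random.default_rng(1)
n, d = 1000, 4
X = rng.normal(size=(n, d))
X[:, 0] = rng.choice([-2.0, 2.0], size=n) + 0.5 * rng.normal(size=n)

Z = sphere(X)
a, score = pursue(Z)          # first "interesting" direction
Z2 = remove_structure(Z, a)   # data with that structure removed

print("direction:", np.round(a, 2), "index:", round(float(score), 2))
print("index along the same direction after removal:",
      round(float(index(Z2 @ a)), 3))
```

Calling `pursue(Z2)` would then look for a second view. A practical implementation would likely replace the random search with a proper numerical optimizer and use an index that is less sensitive to outliers than kurtosis; both choices here were made only to keep the sketch short.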
Related Publications
Exploratory Projection Pursuit
A new projection pursuit algorithm for exploring multivariate data is presented that has both statistical and computational advantages over previous methods. A number o...
Nonlinear Dimensionality Reduction by Locally Linear Embedding
Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensional...
ROBPCA: A New Approach to Robust Principal Component Analysis
We introduce a new method for robust principal component analysis (PCA). Classical PCA is based on the empirical covariance matrix of the data and hence is highly sensit...
Projection-Based Approximation and a Duality with Kernel Methods
Projection pursuit regression and kernel regression are methods for estimating a smooth function of several variables from noisy data obtained at scattered sites. Methods based ...
A Projection Pursuit Algorithm for Exploratory Data Analysis
An algorithm for the analysis of multivariate data is presented and is discussed in terms of specific examples. The algorithm seeks to find one- and two-dimensional linear projec...
Publication Info
- Year: 1987
- Type: article
- Volume: 82
- Issue: 397
- Pages: 249-266
- Citations: 787
- Access: Closed
Identifiers
- DOI: 10.1080/01621459.1987.10478427