Abstract
Classification and regression trees are ideally suited for the analysis of complex ecological data. For such data, we require flexible and robust analytical methods, which can deal with nonlinear relationships, high-order interactions, and missing values. Despite such difficulties, the methods should be simple to understand and give easily interpretable results. Trees explain variation of a single response variable by repeatedly splitting the data into more homogeneous groups, using combinations of explanatory variables that may be categorical and/or numeric. Each group is characterized by a typical value of the response variable, the number of observations in the group, and the values of the explanatory variables that define it. The tree is represented graphically, and this aids exploration and understanding. Trees can be used for interactive exploration and for description and prediction of patterns and processes. Advantages of trees include: (1) the flexibility to handle a broad range of response types, including numeric, categorical, ratings, and survival data; (2) invariance to monotonic transformations of the explanatory variables; (3) ease and robustness of construction; (4) ease of interpretation; and (5) the ability to handle missing values in both response and explanatory variables. Thus, trees complement or represent an alternative to many traditional statistical techniques, including multiple regression, analysis of variance, logistic regression, log-linear models, linear discriminant analysis, and survival models. We use classification and regression trees to analyze survey data from the Australian central Great Barrier Reef, comprising abundances of soft coral taxa (Cnidaria: Octocorallia) and physical and spatial environmental information. Regression tree analyses showed that dense aggregations, typically formed by three taxa, were restricted to distinct habitat types, each of which was defined by combinations of 3–4 environmental variables. 
The habitat definitions were consistent with known experimental findings on the nutrition of these taxa. When used separately, physical and spatial variables were similarly strong predictors of abundances and lost little in comparison with their joint use. The spatial variables are thus effective surrogates for the physical variables in this extensive reef complex, where information on the physical environment is often not available. Finally, we compare the use of regression trees and linear models for the analysis of these data and show how linear models fail to find patterns uncovered by the trees.
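The splitting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single numeric explanatory variable, uses the within-group sum of squares as the homogeneity measure, and runs on hypothetical toy data (`depth`, `cover`) invented for the example. It also checks the stated invariance to monotonic transformations of the explanatory variable by repeating the split on a log scale.

```python
import math
from typing import List, Optional, Tuple


def sum_sq(values: List[float]) -> float:
    """Sum of squared deviations from the group mean (lower = more homogeneous)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Find the binary split on x that minimizes the total within-group
    sum of squares of y. Returns (threshold, cost), or None if x has
    fewer than two distinct values."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sum_sq(left) + sum_sq(right)
        if best is None or cost < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, cost)
    return best


def membership(x: List[float], threshold: float) -> List[bool]:
    """True for observations that fall in the left-hand group."""
    return [xi <= threshold for xi in x]


# Hypothetical toy data: an environmental variable and a response.
depth = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 15.0, 20.0]
cover = [30.0, 28.0, 33.0, 29.0, 5.0, 4.0, 6.0, 3.0]

split_raw = best_split(depth, cover)

# A monotonic transformation changes the threshold value
# but not which observations end up in each group.
log_depth = [math.log(d) for d in depth]
split_log = best_split(log_depth, cover)

same_groups = membership(depth, split_raw[0]) == membership(log_depth, split_log[0])
```

A full tree would apply `best_split` recursively to each resulting group and search over every explanatory variable at each step. Handling of categorical predictors, missing values, and pruning is omitted here.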
Publication Info
- Year: 2000
- Type: article
- Volume: 81
- Issue: 11
- Pages: 3178-3192
- Citations: 3065
- Access: Closed
Identifiers
- DOI: 10.1890/0012-9658(2000)081[3178:cartap]2.0.co;2