Multicollinearity and misleading statistical results

2019 · Korean Journal of Anesthesiology · 2,024 citations

Abstract

Multicollinearity is a high degree of linear intercorrelation among the explanatory variables of a multiple regression model, and it leads to incorrect results of regression analyses. Diagnostic tools for multicollinearity include the variance inflation factor (VIF), the condition index and condition number, and the variance decomposition proportion (VDP). Multicollinearity can be expressed by the coefficient of determination ($R_h^2$) of a multiple regression model in which one explanatory variable ($X_h$) serves as the response variable and the others ($X_i$, $i \neq h$) serve as its explanatory variables. The variance ($\sigma_h^2$) of each regression coefficient in the final regression model is proportional to its VIF, which equals $1/(1 - R_h^2)$. Hence, an increase in $R_h^2$ (strong multicollinearity) increases $\sigma_h^2$. A larger $\sigma_h^2$ produces unreliable P values and confidence intervals for the regression coefficients. The condition index is the square root of the ratio of the maximum eigenvalue to each eigenvalue of the correlation matrix of the standardized explanatory variables, and the condition number is the maximum condition index. Multicollinearity is present when the VIF is higher than 5 to 10 or the condition indices are higher than 10 to 30. However, these measures cannot identify which explanatory variables are multicollinear. VDPs, obtained from the eigenvectors, can identify the multicollinear variables by showing the proportion of each $\sigma_h^2$ that is associated with each condition index. When two or more VDPs that correspond to a common condition index higher than 10 to 30 exceed 0.8 to 0.9, their associated explanatory variables are multicollinear. Excluding multicollinear explanatory variables leads to statistically stable multiple regression models.
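The relationships summarized in the abstract can be written out as follows. This is a sketch in standard textbook notation, not a reproduction of the article's own equations; the symbols $b_h$ (estimated coefficient of $X_h$), $\sigma^2$ (error variance), $n$ (sample size), $s_h^2$ (sample variance of $X_h$), $\lambda_j$ and $v_{hj}$ (the $j$-th eigenvalue, and the $h$-th element of the $j$-th eigenvector, of the correlation matrix of the standardized explanatory variables), $\eta_j$ (condition index) and $\pi_{hj}$ (VDP) are introduced here for illustration.

```latex
\begin{aligned}
\mathrm{VIF}_h &= \frac{1}{1 - R_h^2}, \\
\sigma_h^2 = \operatorname{Var}(b_h) &= \frac{\sigma^2}{(n-1)\, s_h^2}\, \mathrm{VIF}_h, \\
\eta_j &= \sqrt{\frac{\lambda_{\max}}{\lambda_j}}, \qquad
  \text{condition number} = \max_j \eta_j = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}, \\
\mathrm{VIF}_h &= \sum_{k} \frac{v_{hk}^2}{\lambda_k}, \qquad
  \pi_{hj} = \frac{v_{hj}^2 / \lambda_j}{\sum_{k} v_{hk}^2 / \lambda_k}.
\end{aligned}
```

The last line shows why VDPs can locate the culprits: a near-zero $\lambda_j$ (a large $\eta_j$) inflates the VIF, and hence $\sigma_h^2$, of every variable that loads heavily on the $j$-th eigenvector, and $\pi_{hj}$ is the share of that inflation attributable to the $j$-th condition index.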
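A minimal NumPy sketch of the three diagnostics described in the abstract, assuming the explanatory variables are the columns of a numeric array `X`. The function names `vif` and `eigen_diagnostics` and the simulated data are hypothetical illustrations, not the article's code. VIFs are computed from auxiliary regressions as $1/(1 - R_h^2)$; condition indices and VDPs come from the eigen-decomposition of the correlation matrix of the standardized explanatory variables.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factors, VIF_h = 1 / (1 - R_h^2).

    R_h^2 is the coefficient of determination from an ordinary least-squares
    regression of column h on the remaining columns plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for h in range(p):
        y = X[:, h]
        design = np.column_stack([np.ones(n), np.delete(X, h, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[h] = 1.0 / (1.0 - r2)
    return out


def eigen_diagnostics(X: np.ndarray):
    """Condition indices and variance decomposition proportions (VDPs).

    Both come from the eigen-decomposition of the correlation matrix of the
    standardized explanatory variables. Eigenvalues are sorted in descending
    order, so condition indices increase along the returned array.
    """
    R = np.corrcoef(X, rowvar=False)
    lam, V = np.linalg.eigh(R)            # ascending eigenvalues, eigenvectors in columns
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    cond_idx = np.sqrt(lam[0] / lam)      # eta_j = sqrt(lambda_max / lambda_j)
    phi = V ** 2 / lam                    # phi[h, j] = v_hj^2 / lambda_j
    vdp = phi / phi.sum(axis=1, keepdims=True)  # each variable's row sums to 1
    return cond_idx, vdp


# Hypothetical simulated data: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.02 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

print("VIF:", np.round(vif(X), 1))
cond_idx, vdp = eigen_diagnostics(X)
print("condition indices:", np.round(cond_idx, 1))
print("VDPs (rows = x1, x2, x3; columns = condition indices):")
print(np.round(vdp, 3))
```

With this simulated data, the VIFs of x1 and x2 and the largest condition index should lie well above the cut-offs quoted in the abstract, and in the column of the largest condition index the VDPs of x1 and x2 should both be close to 1 while that of x3 stays small, singling out x1 and x2 as the multicollinear pair. The sketch follows the abstract in using the correlation matrix; Belsley's original procedure instead rescales the uncentered design matrix, intercept column included, so its condition indices can differ.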

Keywords

Multicollinearity; Variance inflation factor; Statistics; Mathematics; Linear regression; Regression analysis; Econometrics; Regression; Explained variation; Variance (accounting); Linear predictor function; Regression diagnostic; Variables; Proper linear model; Bayesian multivariate linear regression; Economics

MeSH Terms

Bias; Biomedical Research; Data Interpretation, Statistical; Humans; Liver Circulation; Liver Regeneration; Liver Transplantation; Models, Statistical; Regression Analysis; Research Design

Publication Info

Year: 2019
Type: article
Volume: 72
Issue: 6
Pages: 558-569
Citations: 2,024
Access: Closed

Citation Metrics

Citations (OpenAlex): 2,024
Influential citations: 172

Cite This

Jonghae Kim (2019). Multicollinearity and misleading statistical results. Korean Journal of Anesthesiology, 72(6), 558-569. https://doi.org/10.4097/kja.19087

Identifiers

DOI: 10.4097/kja.19087
PMID: 31304696
PMCID: PMC6900425