Abstract
Missing values and outliers are frequently encountered during data collection. Missing values reduce the amount of data available for analysis, which compromises the statistical power of a study and, ultimately, the reliability of its results. They can also bias the results significantly and degrade the efficiency of the data. Outliers distort the estimation of statistics (e.g., the mean and standard deviation of a sample), producing overestimated or underestimated values. The results of a data analysis therefore depend considerably on how missing values and outliers are handled. This review discusses the types of missing values, ways of identifying outliers, and methods for dealing with both.
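The two effects named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample values are invented, the helper names (summarize, iqr_outliers, drop_missing) are hypothetical, and Tukey's 1.5 × IQR fences are only one common way of flagging outliers, not necessarily the method the review recommends.

```python
import statistics

# Hypothetical sample, invented for illustration only (not data from the review).
clean = [10, 11, 12, 13, 14]
with_outlier = clean + [100]          # one extreme observation added
with_missing = [10, None, 12, None, 14]  # None marks a missing value


def summarize(values):
    """Return (mean, sample standard deviation, median) of a list of numbers."""
    return (
        statistics.mean(values),
        statistics.stdev(values),
        statistics.median(values),
    )


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def drop_missing(values):
    """Complete-case (listwise) deletion: keep only observed values."""
    return [v for v in values if v is not None]


if __name__ == "__main__":
    # A single outlier shifts the mean and inflates the standard deviation,
    # while the median barely moves.
    print("clean:       ", summarize(clean))
    print("with outlier:", summarize(with_outlier))
    print("flagged:     ", iqr_outliers(with_outlier))

    # Dropping missing values shrinks the sample available for analysis.
    observed = drop_missing(with_missing)
    print("n before:", len(with_missing), "n after:", len(observed))
```

The median is included because it is a robust statistic and makes the distortion of the mean easy to see by comparison. Complete-case deletion is shown only because it is the simplest way to see the loss of sample size; it is not presented here as a recommended strategy.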
Related Publications
The logistic analysis of epidemiologic prospective studies: Investigation by simulation
We performed a Monte Carlo computer simulation of the Walker‐Duncan logistic regression technique in a typical epidemiologic prospective setting and analysed the result...
Estimating Mean and Standard Deviation from the Sample Size, Three Quartiles, Minimum, and Maximum
Background: We sometimes want to include in a meta-analysis data from studies where results are presented as medians and ranges or interquartile ranges rather than as means and ...
Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling
Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions that oft...
Time Series Model Specification in the Presence of Outliers
Outliers are commonplace in data analysis. Time series analysis is no exception. Noting that the effect of outliers on model identification statistics could be serious,...
A simulation study of the number of events per variable in logistic regression analysis
We performed a Monte Carlo study to evaluate the effect of the number of events per variable (EPV) analyzed in logistic regression analysis. The simulations were based on data f...
Publication Info
- Year: 2017
- Type: review
- Volume: 70
- Issue: 4
- Pages: 407-407
- Citations: 630
- Access: Closed
Identifiers
- DOI: 10.4097/kjae.2017.70.4.407
- PMID: 28794835
- PMCID: PMC5548942