Abstract

Missing values and outliers are frequently encountered during data collection. Missing values reduce the amount of data available for analysis, which compromises the statistical power of the study and, ultimately, the reliability of its results. They can also introduce substantial bias and reduce the efficiency of estimates. Outliers distort the estimation of statistics (e.g., the mean and standard deviation of a sample), resulting in overestimated or underestimated values. The results of data analysis therefore depend considerably on how missing values and outliers are handled. This review discusses the types of missing values, methods for identifying outliers, and approaches to dealing with both.
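The two effects described in the abstract can be illustrated with a minimal sketch. The sample values, the helper names `drop_missing` and `iqr_outliers`, and the choice of Tukey's 1.5 × IQR rule are assumptions made for illustration only; they are not taken from the review, and removing flagged values is only one of several possible ways to handle outliers.

```python
import statistics

# Hypothetical sample: None marks a missing value, 250.0 is an implausible entry.
raw = [72.0, 68.0, None, 75.0, 70.0, 250.0, None, 71.0]


def drop_missing(values):
    """Complete-case handling: keep only the observed (non-missing) values."""
    return [v for v in values if v is not None]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


observed = drop_missing(raw)          # missing values shrink the usable sample
outliers = iqr_outliers(observed)     # candidate outliers under the IQR rule
cleaned = [v for v in observed if v not in outliers]

print("n collected / n observed:", len(raw), "/", len(observed))
print("flagged as outliers:", outliers)
print("mean, SD with outlier:   ",
      statistics.mean(observed), statistics.stdev(observed))
print("mean, SD without outlier:",
      statistics.mean(cleaned), statistics.stdev(cleaned))
```

In this made-up sample, two of the eight collected values are missing, so only six remain for analysis, and a single extreme value inflates both the mean and the standard deviation relative to the remaining observations.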

Keywords

Outlier, Missing data, Statistics, Standard deviation, Reliability (semiconductor), Medicine, Sample (material), Statistical power, Data mining, Econometrics, Power (physics), Computer science, Mathematics

Publication Info

Year
2017
Type
review
Volume
70
Issue
4
Pages
407-407
Citations
630
Access
Closed

Citation Metrics

OpenAlex
630
Influential
22

Cite This

Sang Gyu Kwak, Jonghae Kim (2017). Statistical data preparation: management of missing values and outliers. Korean Journal of Anesthesiology, 70(4), 407-407. https://doi.org/10.4097/kjae.2017.70.4.407

Identifiers

DOI
10.4097/kjae.2017.70.4.407
PMID
28794835
PMCID
PMC5548942
