Abstract

We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
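
The abstract does not spell out the procedure, so the following is only a minimal sketch of one way to produce resampling weights by matching the training and test means in a kernel feature space. Everything named here is an assumption made for illustration: the function names `rbf` and `mean_matching_weights`, the Gaussian RBF kernel, the bandwidth `sigma`, the weight bound `B`, and the projected gradient solver. It also keeps only the box constraint 0 <= beta_i <= B and omits any constraint on the sum of the weights, so it should not be read as the authors' exact formulation.

```python
import math


def rbf(x, z, sigma):
    """Gaussian RBF kernel between two points given as tuples of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mean_matching_weights(train, test, sigma=1.0, B=5.0, iters=2000):
    """Weights beta that bring the weighted training mean close to the
    test mean in the RBF feature space (illustrative sketch only).

    Minimizes 0.5 * beta^T K beta - kappa^T beta subject to 0 <= beta_i <= B,
    which equals the squared feature-space distance between the two means
    up to a positive scale factor and an additive constant.
    """
    n, m = len(train), len(test)
    # Kernel matrix on the training points.
    K = [[rbf(xi, xj, sigma) for xj in train] for xi in train]
    # Similarity of each training point to the test sample, scaled by n/m.
    kappa = [(n / m) * sum(rbf(xi, z, sigma) for z in test) for xi in train]

    beta = [1.0] * n          # start from uniform weights
    step = 1.0 / n            # safe step: largest eigenvalue of K <= trace(K) = n
    for _ in range(iters):
        grad = [sum(K[i][j] * beta[j] for j in range(n)) - kappa[i]
                for i in range(n)]
        # Gradient step followed by projection onto the box [0, B].
        beta = [min(B, max(0.0, b - step * g)) for b, g in zip(beta, grad)]
    return beta


# Toy data: training points spread over [0, 4], test points concentrated near 3 to 4.
train_points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
test_points = [(3.0,), (3.5,), (4.0,), (4.0,)]
weights = mean_matching_weights(train_points, test_points)
```

In this toy setup the training points that lie near the test sample should receive larger weights than those far from it. No density is estimated for either sample; only kernel evaluations between points are used.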

Keywords

Selection bias, Selection (genetic algorithm), Sample (material), Sampling bias, Computer science, Artificial intelligence, Statistics, Pattern recognition (psychology), Sample size determination, Mathematics, Chromatography, Chemistry

Publication Info

Year: 2007
Type: book-chapter
Pages: 601-608
Citations: 1529
Access: Closed

Citation Metrics

OpenAlex: 1529
Influential: 213
CrossRef: 415

Cite This

Jiayuan Huang, Alexander J. Smola, Arthur Gretton et al. (2007). Correcting Sample Selection Bias by Unlabeled Data. The MIT Press eBooks, 601-608. https://doi.org/10.7551/mitpress/7503.003.0080

Identifiers

DOI: 10.7551/mitpress/7503.003.0080
