Abstract

The paper introduces some generalizations of Vapnik's (1982) method of structural risk minimization (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case when the hierarchy of classes is chosen in response to the data. A result is presented on the generalization performance of classifiers with a "large margin". This theoretically explains the impressive generalization performance of the maximal margin hyperplane algorithm of Vapnik and co-workers (which is the basis for their support vector machines). The paper concludes with a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets. Four examples are given of such functions, including the Vapnik-Chervonenkis (1971) dimension measured on the sample.
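The "large margin" in the abstract is a data-dependent quantity: for a hyperplane (w, b) and a labelled sample, it is the distance from the closest correctly classified point to the separating surface, and the paper's bounds improve as this margin grows. A minimal sketch of that quantity in plain Python — the `margin` function and the 2-D sample are illustrative assumptions, not the authors' algorithm:

```python
import math

def margin(w, b, sample):
    """Geometric margin of the hyperplane w.x + b = 0 on a labelled sample.

    Returns min_i y_i * (w . x_i + b) / ||w||, with labels y_i in {-1, +1}.
    The value is positive iff every point is classified correctly, and it
    then equals the distance from the closest point to the hyperplane.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(
        y * (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
        for x, y in sample
    )

# Hypothetical linearly separable 2-D sample.
sample = [((2.0, 2.0), +1), ((3.0, 1.0), +1),
          ((-1.0, -1.0), -1), ((-2.0, 0.0), -1)]

# The hyperplane x1 + x2 = 0 separates the sample with margin sqrt(2).
print(margin((1.0, 1.0), 0.0, sample))  # → 1.4142135623730951
```

The maximal margin hyperplane of Vapnik and co-workers is the (w, b) maximizing this quantity over all separating hyperplanes; the sketch only evaluates the margin of a given one.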

Keywords

Generalization, Hyperplane, Structural risk minimization, Margin (machine learning), VC dimension, Computer science, Minification, Dimension (graph theory), Hierarchy, Sample (material), Basis (linear algebra), Support vector machine, Machine learning, Artificial intelligence, Algorithm, Mathematics, Combinatorics


Publication Info

Year: 1998
Type: article
Volume: 44
Issue: 5
Pages: 1926-1940
Citations: 527
Access: Closed


Cite This

John Shawe-Taylor, Peter L. Bartlett, Robert C. Williamson et al. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5), 1926-1940. https://doi.org/10.1109/18.705570

Identifiers

DOI: 10.1109/18.705570