Abstract
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T)regret. The setting is a natural generalization of the nonstochastic multiarmed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by previous approaches are overcome by the use of a self-concordant potential function. Our approach presents a novel connection between online learning and interior point methods.
Keywords
Affiliated Institutions
Related Publications
ROBUST MODELING WITH ERRATIC DATA
An attractive alternative to least‐squares data modeling techniques is the use of absolute value error criteria. Unlike the least‐squares techniques the inclusion of some infini...
Handbook of Genetic Algorithms
This book sets out to explain what genetic algorithms are and how they can be used to solve real-world problems. The first objective is tackled by the editor, Lawrence Davis. Th...
Atomic Decomposition by Basis Pursuit
The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries --- stationary wavelets, wavelet packets, cosine packe...
Performance evaluation of genetic algorithms for flowshop scheduling problems
The aim of this paper is to evaluate the performance of genetic algorithms for the flowshop scheduling problem with an objective of minimizing the makespan. First we examine var...
Active Learning with Statistical Models
For many types of machine learning algorithms, one can compute the statistically `optimal' way to select training data. In this paper, we review how optimal data selection techn...
Publication Info
- Year
- 2008
- Type
- article
- Pages
- 263-274
- Citations
- 192
- Access
- Closed