Randomized Algorithms for Matrices and Data

Michael W. Mahoney

doi:10.1201/b11822-37

Abstract

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, largely because matrices are popular structures with which to model data drawn from a wide range of application domains, and it was carried out by researchers from many different communities. While the most obvious benefit of randomization is that it can lead to faster algorithms, in worst-case asymptotic theory, in numerical implementation, or both, there are numerous other benefits that are at least as important. For example, randomization can lead to simpler algorithms that are easier to analyze or reason about when applied in counterintuitive settings; it can lead to algorithms with more interpretable output, which matters in applications where analyst time, and not just computational time, is a concern; it can implicitly provide regularization and more robust output; and randomized algorithms can often be organized to exploit modern computational architectures better than classical numerical methods.

This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms, as well as the application of those ideas to the solution of practical problems in large-scale data analysis. Throughout this review, the emphasis will be on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these tools in large-scale data applications. Crucial in this context is the connection with the concept of statistical leverage. This concept has long been used in statistical regression diagnostics to identify outliers, and it has recently proved crucial in the development of improved worst-case matrix algorithms that are also amenable to high-quality numerical implementation and useful to domain scientists. The connection arises naturally when one explicitly decouples the effect of randomization in these matrix algorithms from the underlying linear algebraic structure. This decoupling also permits much finer control in the application of randomization, and it makes domain knowledge easier to exploit.

Most of the review will focus on random sampling algorithms and random projection algorithms for versions of the linear least-squares problem and the low-rank matrix approximation problem. These two problems are fundamental in theory and ubiquitous in practice. Randomized methods solve them by constructing and operating on a randomized sketch of the input matrix A: for random sampling methods, the sketch consists of a small number of carefully sampled and rescaled columns/rows of A, while for random projection methods, it consists of a small number of linear combinations of the columns/rows of A. Depending on the specifics of the situation, when compared with the best previously existing deterministic algorithms, the resulting randomized algorithms have worst-case running time that is asymptotically faster; their numerical implementations are faster in terms of clock time; or they can be implemented in parallel computing environments where existing numerical algorithms fail to run at all. Numerous examples illustrating these observations will be described in detail.
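A minimal NumPy sketch (not code from the monograph) of the two kinds of sketch described above, applied to overdetermined least squares in the simple "sketch-and-solve" form: leverage-score row sampling with rescaling, and a dense Gaussian random projection. All function names are hypothetical. For clarity, the leverage scores are computed exactly from a QR factorization, which is as expensive as solving the original problem. A dense Gaussian matrix is likewise used only as the simplest random projection; the asymptotic speedups discussed in this literature rely instead on approximate leverage scores or fast structured transforms (for example, the subsampled randomized Hadamard transform).

```python
import numpy as np


def leverage_scores(A):
    """Leverage scores of the rows of a tall, full-column-rank matrix A.

    Computed exactly from a thin QR factorization for clarity. This costs as
    much as solving the full problem, so it only illustrates the idea.
    """
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q * Q, axis=1)  # squared Euclidean norms of the rows of Q


def sampling_sketch(A, b, s, rng):
    """Random sampling: draw s rows i.i.d. with probability proportional to
    leverage, and rescale each kept row of A and b by 1/sqrt(s * p_i)."""
    lev = leverage_scores(A)
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=s, replace=True, p=p)
    scale = 1.0 / np.sqrt(s * p[idx])
    return scale[:, None] * A[idx], scale * b[idx]


def projection_sketch(A, b, s, rng):
    """Random projection: s Gaussian linear combinations of the rows of A and b."""
    S = rng.standard_normal((s, A.shape[0])) / np.sqrt(s)
    return S @ A, S @ b


def sketch_and_solve(A, b, s, sketch, seed=0):
    """Solve the small sketched problem min_x ||S A x - S b||_2."""
    rng = np.random.default_rng(seed)
    SA, Sb = sketch(A, b, s, rng)
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d, s = 5_000, 10, 400
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
    r_opt = np.linalg.norm(A @ x_opt - b)
    for sketch in (sampling_sketch, projection_sketch):
        x_s = sketch_and_solve(A, b, s, sketch)
        # Ratio >= 1 by optimality of x_opt; values near 1 mean a good sketch.
        print(sketch.__name__, np.linalg.norm(A @ x_s - b) / r_opt)
```

The printed ratios compare each sketched solution's residual with the optimal residual on the full problem. Under these assumptions they are typically close to 1, and shrinking `s` toward `d` usually makes them worse.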

Keywords

Computer science, Exploit, Leverage (statistics), Counterintuitive, Context (archaeology), Randomized algorithm, Algorithm, Regularization (linguistics), Theoretical computer science, Data science, Artificial intelligence

Affiliated Institutions

Stanford University US

Related Publications

Robust Solutions to Least-Squares Problems with Uncertain Data

Laurent El Ghaoui, Hervé Lebret

We consider least-squares problems where the coefficient matrices A, b are unknown but bounded. We minimize the worst-case residual error using (convex) second-order cone progra...

1997 SIAM Journal on Matrix Analysis and A... 1049 citations

Determinant Maximization with Linear Matrix Inequality Constraints

Lieven Vandenberghe, Stephen Boyd, Shao-Po Wu

The problem of maximizing the determinant of a matrix subject to linear matrix inequalities (LMIs) arises in many fields, including computational geometry, statistics, system id...

1998 SIAM Journal on Matrix Analysis and A... 651 citations

Simulation-Based Optimization with Stochastic Approximation Using Common Random Numbers

Nathan L. Kleinman, James C. Spall, Daniel Q. Naiman

The method of Common Random Numbers is a technique used to reduce the variance of difference estimates in simulation optimization problems. These differences are commonly used t...

1999 Management Science 95 citations

Stable signal recovery from incomplete and inaccurate measurements

Emmanuel J. Candès, Justin Romberg, Terence Tao

Suppose we wish to recover a vector $x_0 \in \mathbb{R}^m$ (e.g., a digital signal or image) from incomplete and contaminated observations $y = Ax_0 + e$; $A$ is an $n \times m$ matrix wi...

2006 Communications on Pure and Applied Ma... 7037 citations

Random Matrix Theory and Wireless Communications

Antonia M. Tulino, Sergio Verdú

Random matrix theory has found many applications in physics, statistics and engineering since its inception. Although early developments were motivated by practical experimental...

2004 now publishers, Inc. eBooks 812 citations

Publication Info

Year: 2012
Type: book-chapter
Citations: 447
Access: Closed

Cite This

APA Style

Mahoney, M. W. (2012). Randomized Algorithms for Matrices and Data. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. https://doi.org/10.1201/b11822-37
