Abstract
We investigate the computational complexity of two closely related classes of combinatorial optimization problems for linear systems which arise in various fields such as machine learning, operations research and pattern recognition. In the first class (Min ULR) one wishes, given a possibly infeasible system of linear relations, to find a solution that violates as few relations as possible while satisfying all the others. In the second class (Min RVLS) the linear system is supposed to be feasible and one looks for a solution with as few nonzero variables as possible. For both Min ULR and Min RVLS the four basic types of relational operators =, ⩾, > and ≠ are considered. While Min RVLS with equations was mentioned to be NP-hard in (Garey and Johnson, 1979), we established in (Amaldi; 1992; Amaldi and Kann, 1995) that min ULR with equalities and inequalities are NP-hard even when restricted to homogeneous systems with bipolar coefficients. The latter problems have been shown hard to approximate in (Arora et al., 1993). In this paper we determine strong bounds on the approximability of various variants of Min RVLS and min ULR, including constrained ones where the variables are restricted to take binary values or where some relations are mandatory while others are optional. The various NP-hard versions turn out to have different approximability properties depending on the type of relations and the additional constraints, but none of them can be approximated within any constant factor, unless P = NP. Particular attention is devoted to two interesting special cases that occur in discriminant analysis and machine learning. In particular, we disprove a conjecture of van Horn and Martinez (1992) regarding the existence of a polynomial-time algorithm to design linear classifiers (or perceptrons) that involve a close-to-minimum number of features.
Keywords
Affiliated Institutions
Related Publications
On the Complexity of Multiple Sequence Alignment
We study the computational complexity of two popular problems in multiple sequence alignment: multiple alignment with SP-score and multiple tree alignment. It is shown that the ...
On the NP-hardness of solving bilinear matrix inequalities and simultaneous stabilization with static output feedback
In this paper, it is shown that the problem of checking the solvability of a bilinear matrix inequality (BMI), is NP-hard. A matrix valued function, F(X,Y), is called bilinear i...
A Direct Formulation for Sparse PCA Using Semidefinite Programming
Given a covariance matrix, we consider the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number o...
Machine learning for combinatorial optimization: A methodological tour d'horizon
This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization pr...
Categorical Data Analysis
Preface. 1. Introduction: Distributions and Inference for Categorical Data. 1.1 Categorical Response Data. 1.2 Distributions for Categorical Data. 1.3 Statistical Inference for ...
Publication Info
- Year
- 1998
- Type
- article
- Volume
- 209
- Issue
- 1-2
- Pages
- 237-260
- Citations
- 757
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1016/s0304-3975(97)00115-1