
This builds on my earlier questions here and here.


Let $B$ be a symmetric positive definite matrix in $\mathbb{R}^{k\times k}$ and consider the problem

$$\begin{array}{ll} \text{maximize} & x^\top B x\\ \text{subject to} & \|x\|=1 \\ & b^\top x = a\end{array}$$

where $b$ is an arbitrary unit vector and $a > 0$ is small. Let $$\lambda_1 > \lambda_2 \geq \cdots \geq \lambda_k > 0$$ be the eigenvalues of $B$ with corresponding eigenvectors $z_1,\dots,z_k$. I conjecture that the optimal value of the problem is bounded below by $a^2 \lambda_1 + \left(1-a^2\right)\lambda_2$, at least if $a$ is small enough.
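As a sanity check, the conjecture can be probed numerically by maximizing the objective from many feasible starting points. A sketch (numpy/scipy assumed; the matrix, $b$, and $a$ below are arbitrary choices, and multistart SLSQP only approximates the global maximum):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Random SPD matrix with a clear spectral gap lambda_1 > lambda_2.
k = 5
eigs = np.array([4.0, 2.0, 1.5, 1.0, 0.5])
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
B = Q @ np.diag(eigs) @ Q.T

b = rng.standard_normal(k)
b /= np.linalg.norm(b)
a = 0.1

cons = [
    {"type": "eq", "fun": lambda x: x @ x - 1.0},  # ||x|| = 1
    {"type": "eq", "fun": lambda x: b @ x - a},    # b^T x = a
]

best = -np.inf
for _ in range(30):                                # multistart to dodge local maxima
    w = rng.standard_normal(k)
    w -= (b @ w) * b                               # project onto the b-perp subspace
    w /= np.linalg.norm(w)
    x0 = a * b + np.sqrt(1 - a**2) * w             # exactly feasible starting point
    res = minimize(lambda x: -(x @ B @ x), x0, constraints=cons)
    if res.success:
        best = max(best, -res.fun)

bound = a**2 * eigs[0] + (1 - a**2) * eigs[1]
print(f"numerical optimum ~ {best:.6f}, conjectured lower bound = {bound:.6f}")
```

On the unit sphere the objective always lies between $\lambda_k$ and $\lambda_1$, so the printed optimum should land in that interval; comparing it against the printed bound tests the conjecture on this instance.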


To motivate this conjecture, let us consider two special cases. First, suppose that $a= 0$. Then, as was explained to me in one of my previous posts, the optimal value is between $\lambda_1$ and $\lambda_2$ by the Courant-Fischer theorem. Thus, $\lambda_2$ is a lower bound, and it also coincides with my conjectured lower bound in this special case.

Second, let $a > 0$ but suppose that $b = z_i$ for some $i = 1,...,k$. Any feasible $x$ can be written as

$$x = ab + \sqrt{1-a^2} \cdot \hat{b}$$

where $\hat{b}\perp b$. If $b = z_1$, I can take $\hat{b} = z_2$, and if $b = z_i$ for $i \neq 1$, I can take $\hat{b} = z_1$. Either way, the objective value of $x$ is bounded below by $a^2 \lambda_1 + \left(1-a^2\right)\lambda_2$ as long as $a$ is small enough (note that this requires $\lambda_1 > \lambda_2$).
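The two special cases above are easy to verify numerically. A quick check (numpy assumed; the eigenvalues are arbitrary choices with $\lambda_1 > \lambda_2$):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
eigs = np.array([3.0, 2.0, 1.0, 0.5])              # lambda_1 > lambda_2
Z, _ = np.linalg.qr(rng.standard_normal((k, k)))   # columns play the role of z_1..z_k
B = Z @ np.diag(eigs) @ Z.T
a = 0.2
bound = a**2 * eigs[0] + (1 - a**2) * eigs[1]

# Case b = z_1: take bhat = z_2, so x = a z_1 + sqrt(1-a^2) z_2.
x1 = a * Z[:, 0] + np.sqrt(1 - a**2) * Z[:, 1]
val1 = x1 @ B @ x1      # equals a^2 lambda_1 + (1-a^2) lambda_2: the bound, with equality

# Case b = z_3 (i != 1): take bhat = z_1, so x = a z_3 + sqrt(1-a^2) z_1.
x3 = a * Z[:, 2] + np.sqrt(1 - a**2) * Z[:, 0]
val3 = x3 @ B @ x3      # equals a^2 lambda_3 + (1-a^2) lambda_1: exceeds the bound for small a

print(val1, val3, bound)
```

In the second case the bound holds because $\left(1-a^2\right)\left(\lambda_1-\lambda_2\right) \geq a^2\left(\lambda_1-\lambda_i\right)$ once $a$ is small enough, which is where $\lambda_1 > \lambda_2$ is needed.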

The difficulty is showing that the bound holds in the case where $b$ is not one of the eigenvectors of $B$ (perhaps with additional restrictions on how large $a$ can be). My intuition is that, if $b$ is not required to be orthogonal to $x$ but only "almost" orthogonal (meaning that $a$ may be required to be sufficiently small), then one should be able to go a bit further in the direction of the principal eigenvector than in the case $a = 0$.


Here is the most up-to-date work on this problem. In the answer below, it was found that an optimal $x$ must satisfy the generalized eigenvalue system

$$PBx = vPx,$$

which in turn was derived from the system

$$PBPy + aPBb = v Py.$$

Any pair $\left(y,v\right)$ that solves these equations then leads to a feasible $x = ab+Py$ satisfying the first-order optimality conditions, with $v$ as the associated multiplier; the objective value at such a point is $x^\top B x = v + a\left(b^\top Bx - av\right)$, which reduces to $v$ when $a = 0$.

We can write

$$\left(vI - PB\right)Py = aPBb.$$

Note that, for any $v$ that is not an eigenvalue of $PB$, the matrix $vI-PB$ is invertible, whence

$$Py = a\left(vI-PB\right)^{-1}PBb.$$

The normalization $x^\top x = 1$ then becomes $y^\top P y = 1-a^2$, leading to the equation

$$\frac{1-a^2}{a^2} = b^\top BP\left(vI-PB\right)^{-2} PBb.$$

The roots of this equation correspond to the stationary points of the problem, and the optimal value is attained at one of them. Perhaps, as suggested, the relevant root can be found numerically.
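Here is a sketch of that numerical approach (numpy/scipy assumed; the instance is an arbitrary random choice). Working in the eigenbasis of $PBP$ turns the right-hand side into a scalar sum with poles at its eigenvalues, so the largest root can be bracketed just above the top pole and found with `brentq` (this assumes the coupling of $PBb$ to the top eigenvector is nonzero, which holds for generic $b$):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
k = 5
eigs = np.array([4.0, 2.5, 1.5, 1.0, 0.5])
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
B = Q @ np.diag(eigs) @ Q.T

b = rng.standard_normal(k)
b /= np.linalg.norm(b)
a = 0.15

P = np.eye(k) - np.outer(b, b)      # nullspace projector of b
M = P @ B @ P                        # symmetric; agrees with PB on range(P)
g = P @ B @ b                        # PBb

mu, U = np.linalg.eigh(M)            # eigenbasis of PBP
c = U.T @ g                          # coordinates of PBb in that basis

def rhs(v):                          # b^T BP (vI - PB)^{-2} PBb as a scalar sum
    return np.sum(c**2 / (v - mu)**2)

target = (1 - a**2) / a**2
# rhs decreases monotonically to the right of the top pole mu[-1],
# so the largest root lies in (mu[-1], mu[-1] + d) for large enough d.
d = 1.0
while rhs(mu[-1] + d) > target:
    d *= 2.0
v = brentq(lambda t: rhs(t) - target, mu[-1] + 1e-10, mu[-1] + d)

# Recover the stationary point: Py = a (vI - PB)^{-1} PBb, x = ab + Py.
Py = a * np.linalg.solve(v * np.eye(k) - P @ B, g)
x = a * b + Py
print("||x|| =", np.linalg.norm(x), " b.x =", b @ x,
      " x^T B x =", x @ B @ x, " v =", v)
```

The recovered $x$ is feasible ($\|x\|=1$, $b^\top x=a$) and stationary ($P\left(Bx - vx\right)=0$); its objective value $x^\top Bx$ is printed alongside the root $v$.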

sven svenson
  • I guess in your example with $b=z_i$, we have $x^\top B x=\sum \lambda_i x_i^2$. – Alex Ravsky Sep 09 '20 at 21:37
  • if $b=0$ and the $||\cdot||_\infty$-norm is used, this problem is known to be NP-complete – LinAlg Sep 09 '20 at 23:56
  • Sure, but I am not trying to solve the problem, only to bound the solution. For $a=0$ such a bound exists, so the question is whether a more general form of it applies. – sven svenson Sep 10 '20 at 12:13
  • What do you mean by "for more general $B$"? Without loss of generality, it suffices to consider symmetric $B$ since $x^TBx = x^TSx$, where $S = \frac 12 (B + B^T)$. Do you simply mean that we allow $B$ to be a symmetric matrix with negative eigenvalues? – Ben Grossmann Sep 12 '20 at 20:33
  • I meant a symmetric, positive definite $B$ for which $b$ was not necessarily an eigenvector. I could only show my bound in the special case where $b = z_i$ for some $i$, but am trying to show it for any $b$. I will edit to clarify, thanks. – sven svenson Sep 12 '20 at 20:35
  • @svensvenson I had missed somehow that the $z_i$ refer to the eigenvectors. Thanks for that – Ben Grossmann Sep 12 '20 at 20:39
  • One quick inequality: $$ x^TBx = a^2 (b^TBb) + (1 - a^2)(\hat b ^T B \hat b) - 2 a\sqrt{1 - a^2}\hat b^T B b =\ a^2 |b|_B + (1 - a^2)|\hat b|_B - 2 a\sqrt{1 - a^2}\hat b^T B b \geq \ a^2 |b|_B ^2+ (1 - a^2)|\hat b|_B^2 - 2 a\sqrt{1 - a^2}|b|_B\cdot |\hat b |_B = \ (a|b|_B - \sqrt{1 - a^2}|\hat b|_B)^2 $$ where $|x|_B = \sqrt{x^TBx}$. – Ben Grossmann Sep 12 '20 at 20:50
  • I don't follow your argument that the optimum objective value is bounded below by $\ a^2\lambda_1+\left(1-a^2\right)\lambda_2\ $ when $\ b=z_i\ $ with $\ i\not\in\{1,2\}\ $. In that case, if $\ a>0\ $, $\ \lambda_1=\lambda_2>\lambda_i\ $, and $\ \displaystyle x=\sum_j x_j z_j\ $, then $\ x_i=a\ $ and $\ \displaystyle \sum_{j\ne i}x_j^2= 1-a^2\ $, and $$x^\top Bx =\sum_j\lambda_j x_j^2\ \le\lambda_1\sum_{j\ne i}x_j^2+\lambda_ix_i^2\ =\lambda_1\left(1-a^2\right) +a^2\lambda_i\ <a^2\lambda_1 + \left(1-a^2\right)\lambda_2\ . $$ So it looks to me like this case provides a counterexample. – lonza leggiera Sep 13 '20 at 07:31
  • If $\lambda_1 > \lambda_2$, it is true for small enough $a$. I guess the top two can't be equal though unless $a = 0$. Edited accordingly. – sven svenson Sep 13 '20 at 13:18
  • This Lagrangian approach looks more promising than my pseudo-eigenvalue approach. – greg Sep 13 '20 at 17:11
  • Looks like the main issue is that we have $\left(PB+\mu I\right)^{-2}$ in the condition but only $\left(PB +\mu I\right)^{-1}$ in the objective function. I take it that $PB +\mu I$ is not a projection, so these do not seem to simplify much. I guess we know $\left(PB +\mu I\right)^{-1} PB b = \left(PBP + \mu I\right)^{-1} PBb$. – sven svenson Sep 13 '20 at 17:13
  • You just need to solve a scalar nonlinear equation for $\mu$. It's analytically difficult but numerically easy. – greg Sep 13 '20 at 17:31
  • Please, avoid making several edits. – Aloizio Macedo Sep 16 '20 at 13:22

2 Answers


The following analysis explores various approaches to the problem, but ultimately fails to produce a satisfactory solution.

One of the constraints can be rewritten using the nullspace projector of $b$

$$\eqalign{ P &= \Big(I-(b^T)^+b^T\Big) = \left(I-\frac{bb^T}{b^Tb}\right) \;=\; I-\beta bb^T \\ Pb &= 0,\qquad P^2=P=P^T \\ }$$

and the introduction of an unconstrained vector $y$

$$\eqalign{ b^Tx &= a \\ x &= Py + (b^T)^+a \\ &= Py + a\beta b \\ &= Py + \alpha_0 b \\ }$$

The remaining constraint can be absorbed into the definition of the objective function itself:

$$\lambda = \frac{x^TBx}{x^Tx} \;=\; \frac{y^TPBPy +2\alpha_0y^TPBb +\alpha_0^2\,b^TBb}{y^TPy +\alpha_0^2\,b^Tb} \;=\; \frac{\theta_1}{\theta_2} \tag{0}$$

The gradient can be calculated by a straightforward (if tedious) application of the quotient rule as

$$\frac{\partial\lambda}{\partial y} = \frac{2\theta_2(PBPy +\alpha_0PBb)-2\theta_1Py}{\theta_2^2}$$

Setting the gradient to zero yields

$$PBPy +\alpha_0PBb = \lambda Py \tag{1}$$

which can be rearranged into a generalized eigenvalue equation:

$$\eqalign{ PB\left(Py+\alpha_0b\right) &= \lambda Py \\ PBx &= \lambda Px \tag{2} \\ }$$

Note that multiplying the standard eigenvalue equation

$$Bx = \lambda x \tag{3}$$

by $P$ reproduces equation $({2})$. So both standard and generalized eigenvalues are potential solutions.
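The closed-form gradient above can be sanity-checked against central finite differences on a random instance (numpy assumed; the matrix, $b$, and $\alpha_0$ below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4
A = rng.standard_normal((k, k))
B = A @ A.T + k * np.eye(k)             # symmetric positive definite
b = rng.standard_normal(k)
b /= np.linalg.norm(b)
a0 = 0.2                                 # alpha_0 (equals a for unit b)

P = np.eye(k) - np.outer(b, b)           # nullspace projector of b

def lam(y):                              # equation (0): the quotient theta_1/theta_2
    x = P @ y + a0 * b
    return (x @ B @ x) / (x @ x)

def grad(y):                             # the closed-form quotient-rule gradient
    x = P @ y + a0 * b
    th1 = x @ B @ x
    th2 = x @ x
    return (2 * th2 * (P @ B @ P @ y + a0 * P @ B @ b) - 2 * th1 * P @ y) / th2**2

y = rng.standard_normal(k)
g = grad(y)
h = 1e-6                                 # central finite differences
g_fd = np.array([(lam(y + h * e) - lam(y - h * e)) / (2 * h) for e in np.eye(k)])
print(np.max(np.abs(g - g_fd)))
```

The maximum deviation should be near floating-point finite-difference noise, confirming the quotient-rule computation.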

Unlike the discrete $\lambda$ values yielded by the eigenvalue methods, equation $({1})$ is solvable for a continuous range of $\lambda$
$$\eqalign{ y &= \alpha_0(\lambda P-PBP)^+PBb \\ }$$ and produces a $y$ vector which satisfies the zero gradient condition $({1})$.

Unfortunately, none of these approaches yields a solution which satisfies all of the constraints.

But solving equation $(0)$ for an optimal $y$ vector is still the appropriate goal, and requires a numerical rather than an analytical approach.

greg
  • Thanks. This looks difficult to solve analytically. I suppose one could assume the bound does not hold, so $\lambda < a^2\lambda_1 + \left(1-a^2\right)\lambda_2$, and then try to convert these into contradictory inequalities. The difficulty is that it is not clear whether $\lambda$ is being multiplied by positive or negative numbers in each equation. But I also think that using the nullspace projector is key somehow. I have been trying to come up with some vector whose projection would have an objective value satisfying the bound (then the bound holds for the opt. value). – sven svenson Sep 12 '20 at 23:34
  • Interesting! What does the $\left(\cdot\right)^+$ notation mean here? Normalization by $b^\top b$? We can assume that $b^\top b = 1$ for simplicity. – sven svenson Sep 13 '20 at 15:17
  • The $A^+$ denotes the Moore-Penrose pseudoinverse of $A$. – greg Sep 13 '20 at 15:27
  • Does it then follow that $AA^+ = I$? Wouldn't that be necessary to satisfy the gradient equation? Also, another question -- when you write the objective, you only substitute the constraint into the last term $\alpha^2_0 b^\top b$. First, shouldn't that term be $b^\top B b$? Second, why is the substitution only made there, and not, e.g., for $\alpha_0$ in the second term? – sven svenson Sep 13 '20 at 15:35
  • Damn! You're right, that term should be $b^TBb$. – greg Sep 13 '20 at 15:56
  • There might be a way to fix that. What if we keep $y^\top P y = 1 - a^2$ as a constraint and add a Lagrange multiplier? I will edit the post with some additional work, but am hoping you can glance over it. (Edit: work added.) – sven svenson Sep 13 '20 at 15:58
  • I noticed that the eigenvalues $\lambda_i$ of $B$ also satisfy $PBz_i = \lambda_i Pz_i$. I then considered $x = ab + aP\left(\lambda_i P - PBP\right)^+ PBb$ following your eqs. (3)-(4), and tried computing some numerical instances with $b \neq z_i$ for any $i$. In all cases, I got $\frac{x^\top B x}{x^\top x} = \lambda_i$ again, so the max is $\lambda_1$!! The only exception was when $b = z_i$, in which case I already have a bound. Can this be proved in general?! – sven svenson Sep 15 '20 at 17:17
  • The solution that you noted is a well known solution to the Rayleigh quotient problem, but it doesn't satisfy the $b^Tx$ constraint. I updated the post with my latest thoughts. – greg Sep 15 '20 at 18:03
  • Ah, I see what you mean. In fact, one can first pick any $v$ and obtain an appropriate $Py$. Then the normalization condition produces a nonlinear equation whose largest root is the optimal value. I wonder if something can be done to bound the root. The RHS seems to have local maxima at the eigenvalues of $PB$. – sven svenson Sep 16 '20 at 00:14

I don't think the conjecture is correct. For a counter example, take $B=\begin{pmatrix} 1&0&0 \\0&1&0 \\ 0&0&\varepsilon \end{pmatrix}$ and $b=\begin{pmatrix}0\\ 0\\ 1\end{pmatrix}$. Then the desired maximum is $(1-a^2)+a^2\varepsilon < 1$.
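The arithmetic here is easy to confirm (numpy assumed; the values of $\varepsilon$ and $a$ are arbitrary). Every feasible $x$ has $x_3 = a$ and $x_1^2 + x_2^2 = 1 - a^2$, so the objective is constant on the feasible set:

```python
import numpy as np

eps, a = 0.1, 0.3
B = np.diag([1.0, 1.0, eps])
b = np.array([0.0, 0.0, 1.0])

# Any feasible point attains the same value (1 - a^2) + eps * a^2.
x = np.array([np.sqrt(1 - a**2), 0.0, a])
val = x @ B @ x
bound = a**2 * 1.0 + (1 - a**2) * 1.0    # conjectured bound: lambda_1 = lambda_2 = 1
print(val, bound, val < bound)
```

As the comments below note, this violates the conjecture's hypothesis $\lambda_1 > \lambda_2$, since the top eigenvalue here is repeated.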

sss89
  • While very insightful, note that this example does not satisfy $\lambda_1 > \lambda_2$. – Surb Sep 14 '20 at 08:19
  • Ow, missed this condition. Back to the blackboard then – sss89 Sep 14 '20 at 09:23
  • Yeah, as was pointed out in the comments above, there is some relationship between how small $a$ has to be and the gaps between the eigenvalues. If $\lambda_1=\lambda_2$ then $a$ must be zero. In the case where $b$ is an eigenvector the range is $\frac{1-a^2}{a^2} < \frac{\lambda_1}{\lambda_1-\lambda_2}$. – sven svenson Sep 14 '20 at 12:50