
I'd like to perform constrained gradient descent in a high-dimensional space. I'm planning to compute the gradient in the high-dimensional space and then project it onto a lower-dimensional subspace that satisfies multiple constraints. I don't know how to express that lower-dimensional subspace mathematically, or how to project a higher-dimensional vector onto it.

The objective is to have a matrix of $\gamma$ values that are as close to one as possible.

$$ \begin{bmatrix} \gamma_a & \gamma_b \\ \gamma_c & \gamma_d \end{bmatrix} $$

The $\gamma$ values are non-negative weights that are multiplied by non-negative $n$ values. The constraints act upon the products of the $\gamma$ and $n$ values.

$$ \begin{bmatrix} \gamma_a & \gamma_b \\ \gamma_c & \gamma_d \end{bmatrix} \odot \begin{bmatrix} n_a & n_b \\ n_c & n_d \end{bmatrix} = \begin{bmatrix} \gamma_a n_a & \gamma_b n_b \\ \gamma_c n_c & \gamma_d n_d \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} $$

The constraints are that the rows and columns of the $\begin{bmatrix} a & b \\ c & d \end{bmatrix} $ matrix sum to the same values:

$$ a + b = c + d $$ $$ a + c = b + d $$

Since $\gamma$ is a weight I'd like the objective function to score $2$ and $\frac{1}{2}$ equivalently (and $3$ and $\frac{1}{3}$ equivalently), as if they're 'equally far from 1'. Because $\log(\frac{1}{x}) = -\log(x)$, squaring the logarithm gives exactly that symmetry, so I've come up with the following objective function to optimize: $$ f = \sum_{i \in \{a, b, c, d\}} \log(\gamma_i)^2 $$

Let's say - for example's sake - that the values of $n$ are as follows: $$ \begin{bmatrix} n_a & n_b \\ n_c & n_d \end{bmatrix} = \begin{bmatrix} 1 & 60 \\ 99 & 40 \end{bmatrix} $$
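For concreteness, here is a quick NumPy check (my own illustration) of why $\gamma = 1$ everywhere doesn't work for these $n$ values, which is why the weights have to deviate from 1 at all:

```python
import numpy as np

# Example n values; with gamma = 1 everywhere the products are just n,
# and the row sums (61 vs 139) do not match, so some gamma must deviate from 1.
n = np.array([[1.0, 60.0],
              [99.0, 40.0]])
gamma = np.ones((2, 2))

prod = gamma * n            # element-wise (Hadamard) product
print(prod.sum(axis=1))     # row sums:    [ 61. 139.]
print(prod.sum(axis=0))     # column sums: [100. 100.]
```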

How do I figure out the optimal values for $ \begin{bmatrix} \gamma_a & \gamma_b \\ \gamma_c & \gamma_d \end{bmatrix} $?


I've tried using Lagrange multipliers for this, but that requires solving a system of equations containing logarithms and reciprocals, which breaks my automatic equation solver, sympy: it throws a NotImplementedError on the system. I'm looking to extend this idea to many dimensions, so I need an automatic solver; solving by hand is not an option.
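For reference, this is roughly the system I'm handing to sympy (a minimal sketch; the variable names are only for illustration):

```python
import sympy as sp

# The four weights, the two multipliers, and the example n values
g = sp.symbols('gamma_a gamma_b gamma_c gamma_d', positive=True)
lam1, lam2 = sp.symbols('lambda_1 lambda_2')
n = [1, 60, 99, 40]

f = sum(sp.log(gi)**2 for gi in g)
c1 = n[0]*g[0] + n[1]*g[1] - n[2]*g[2] - n[3]*g[3]   # a + b = c + d
c2 = n[0]*g[0] - n[1]*g[1] + n[2]*g[2] - n[3]*g[3]   # a + c = b + d
L = f + lam1*c1 + lam2*c2

# Stationarity conditions mix log(gamma) and 1/gamma terms
eqs = [sp.diff(L, v) for v in (*g, lam1, lam2)]
# sp.solve(eqs, [*g, lam1, lam2])   # this is the call that fails for me
```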

The derivative of the objective function is: $$ \frac{\partial f}{\partial \gamma_i} = 2 \log(\gamma_i)\frac{1}{\gamma_i} $$

I'm also fine with redefining the objective function (so long as $x$ and $\frac{1}{x}$ receive equal scores):

$$ f = \sum_{i \in \{a, b, c, d\}} \left[ (1 - \gamma_i)^2 + \left(1 - \frac{1}{\gamma_i}\right)^2 \right] $$

But this didn't help me with the Lagrange Multiplier approach.


Back to constrained gradient descent. I'm guessing that this is a convex problem, which would mean that gradient descent would converge to the global optimum.

I got the idea of constrained gradient descent from this post: Gradient descent with constraints

The idea is to project the high-dimensional gradient onto the subspace that meets the constraints. In their case the projection is onto a sphere, which can be performed by normalizing the vector to unit norm.

Their solution doesn't fit my needs, because I don't know how to formulate and perform the projection onto the lower-dimensional subspace that satisfies my constraints.

Any help is appreciated.

1 Answer

To answer my own question: I found a useful reference on p. 367 of Luenberger's Linear and Nonlinear Programming: https://grapr.files.wordpress.com/2011/09/luenberger-linear-and-nonlinear-programming-3e-springer-2008.pdf.

The constraints should be written in $Ax = b$ format: $$ n_a \gamma_a + n_b \gamma_b - n_c \gamma_c - n_d \gamma_d = 0 $$ $$ n_a \gamma_a - n_b \gamma_b + n_c \gamma_c - n_d \gamma_d = 0 $$ $$ \begin{bmatrix} n_a & n_b & -n_c & -n_d \\ n_a & -n_b & n_c & -n_d \\ \end{bmatrix} \begin{bmatrix} \gamma_a \\ \gamma_b \\ \gamma_c \\ \gamma_d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$ The projection matrix that projects a vector onto the subspace is then: $$ P = I - A^T (A A^T)^{-1}A $$
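Here is a minimal NumPy sketch of building $A$ and $P$ for the example $n$ values (the variable names are my own); $P$ is the orthogonal projector onto the null space of $A$, i.e. onto the set of directions that keep both constraints satisfied:

```python
import numpy as np

# Example n values from the question
n_a, n_b, n_c, n_d = 1.0, 60.0, 99.0, 40.0

# A @ gamma = 0 encodes the two balance constraints
A = np.array([[n_a,  n_b, -n_c, -n_d],    # a + b = c + d
              [n_a, -n_b,  n_c, -n_d]])   # a + c = b + d

# Orthogonal projector onto the null space of A (the feasible directions)
P = np.eye(4) - A.T @ np.linalg.inv(A @ A.T) @ A

v = np.random.rand(4)                     # any vector
print(np.allclose(A @ (P @ v), 0.0))      # its projection satisfies A @ (P v) = 0
```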

Constrained gradient descent can then be performed by computing the gradient of the objective function at the current $\gamma$ values, projecting the gradient onto the subspace by multiplying it with the matrix $P$, and taking a step along the projected gradient. It is important to start with an initial $\gamma_0$ that is already on the subspace; any $\gamma$ proportional to $\frac{1}{n_i}$ works, because it makes all four products equal. I went with: $$ \vec{\gamma_0} = \frac{n_a + n_b + n_c + n_d}{4} \begin{bmatrix} \frac{1}{n_a} \\ \frac{1}{n_b} \\ \frac{1}{n_c} \\ \frac{1}{n_d} \end{bmatrix} = \begin{bmatrix} 50 \\ 0.833 \\ 0.505 \\ 1.25 \end{bmatrix} $$
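A bare-bones version of that loop for the $\log$ objective, as a sketch (the step size and iteration count are arbitrary choices and may need tuning):

```python
import numpy as np

n = np.array([1.0, 60.0, 99.0, 40.0])

# Constraint matrix and projector, as above
A = np.array([[ n[0],  n[1], -n[2], -n[3]],
              [ n[0], -n[1],  n[2], -n[3]]])
P = np.eye(4) - A.T @ np.linalg.inv(A @ A.T) @ A

gamma = (n.sum() / 4.0) / n              # start on the constraint set: all products equal
lr = 0.01                                # step size; must be small enough to keep gamma > 0

for _ in range(100_000):
    grad = 2.0 * np.log(gamma) / gamma   # gradient of sum(log(gamma_i)**2)
    gamma -= lr * (P @ grad)             # projected step stays in the null space of A

print(gamma)                             # weights after descent
print(gamma * n)                         # products a, b, c, d; rows and columns still balance
```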

I also found it helpful to add the constraint

$$ a + b + c + d = 1 $$

(this adds the row $[\,n_a \ n_b \ n_c \ n_d\,]$ to $A$ and makes the right-hand side nonzero, so the starting point has to satisfy this constraint too, e.g. $\gamma_{0,i} = \frac{1}{4 n_i}$, which gives every product the value $\frac{1}{4}$)

And to use the objective function

$$ f = \frac{1}{N} \sum_{i=1}^{N} \left[ (1 - \gamma_i)^2 + \left(1 - \frac{1}{\gamma_i}\right)^2 \right] $$

instead of the $\log$-based one (because the quadratic one punishes outliers more heavily).
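A sketch of how the pieces above change with the extra constraint and the quadratic objective (same projected-descent loop; I'm only showing the modified constraint matrix, the feasible start, and the new gradient):

```python
import numpy as np

n = np.array([1.0, 60.0, 99.0, 40.0])

# Three constraints now: the two balance equations plus a + b + c + d = 1,
# so the right-hand side is no longer all zeros.
A = np.array([[ n[0],  n[1], -n[2], -n[3]],   # a + b = c + d
              [ n[0], -n[1],  n[2], -n[3]],   # a + c = b + d
              [ n[0],  n[1],  n[2],  n[3]]])  # a + b + c + d = 1
b = np.array([0.0, 0.0, 1.0])
P = np.eye(4) - A.T @ np.linalg.inv(A @ A.T) @ A

gamma0 = 1.0 / (4.0 * n)                      # feasible start: every product equals 1/4
assert np.allclose(A @ gamma0, b)

def grad_quadratic(gamma, N=4):
    """Gradient of (1/N) * sum((1 - g)**2 + (1 - 1/g)**2)."""
    return (2.0 * (gamma - 1.0) + 2.0 * (1.0 - 1.0 / gamma) / gamma**2) / N

# Plug grad_quadratic into the same loop as before (gamma -= lr * (P @ grad_quadratic(gamma))).
# Every step lies in the null space of A, so A @ gamma stays equal to b; note that the
# 1/gamma**2 term can get large for small gamma, so the step size may need to be much
# smaller (or adaptive) here.
```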

Hope this helps someone out there!