
I am working with three-dimensional measurement data and want to model it using a multivariate linear regression. I have already implemented a simple gradient descent algorithm to solve the classic regression problem

$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3.$
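A minimal NumPy sketch of such a plain gradient descent fit (the function name, step size and iteration count are purely illustrative):

```python
import numpy as np

# Illustrative gradient descent for y = b0 + b1*x1 + b2*x2 + b3*x3.
# X has shape (N, 3), y has shape (N,).
def fit_single_target(X, y, lr=1e-2, n_iter=5000):
    N = X.shape[0]
    Xa = np.hstack([np.ones((N, 1)), X])   # prepend intercept column -> (N, 4)
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        resid = Xa @ beta - y              # residuals, shape (N,)
        grad = 2.0 / N * Xa.T @ resid      # gradient of the mean squared error
        beta -= lr * grad
    return beta                            # [b0, b1, b2, b3]
```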

However my ultimate goal is to fit three target variables with a hard condition:

$y_1 = \beta_{10} + \beta_{11}x_1 + \beta_{12}x_2 + \beta_{13}x_3,\\ y_2 = \beta_{20} + \beta_{21}x_1 + \beta_{22}x_2 + \beta_{23}x_3,\\ y_3 = \beta_{30} + \beta_{31}x_1 + \beta_{32}x_2 + \beta_{33}x_3,\\ \text{subject to }\beta_{11}+\beta_{22}+\beta_{33}=0.$

Or in other words

$\underline{y} = \underline{\underline{B}}\cdot\underline{x} + \underline{\beta_0},\\ \text{subject to } \mathrm{tr}(\underline{\underline{B}})=0.$

I've searched high and low for multi-target or multi-output regression methods, but I could not find out how to incorporate a hard constraint into any of them. I'm happy for any kind of recommendation, even just keywords for further research.


1 Answer


After some further research and inspiration I have figured out the following method:

I am using the vector notation of my problem with $\underline{y}=\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix},~\underline{x}=\begin{pmatrix}1\\x_1\\x_2\\x_3\end{pmatrix},~\underline{\underline{B}}=\begin{pmatrix}\beta_{10}&\beta_{11}&\beta_{12}&\beta_{13} \\ \beta_{20}&\beta_{21}&\beta_{22}&\beta_{23} \\ \beta_{30}&\beta_{31}&\beta_{32}&\beta_{33} \end{pmatrix}$. This leads to the optimisation problem over the $N$ examples

$\underline{\underline{\hat{B}}}=\underset{\underline{\underline{B}} \in C}{\operatorname{argmin}}\,L(\underline{\underline{B}})=\underset{\underline{\underline{B}} \in C}{\operatorname{argmin}}\sum_{i=1}^{N}\lVert \underline{y}_i - \underline{\underline{B}}\cdot\underline{x}_i \rVert^2,~~~~\text{with}~~~~C= \lbrace \underline{\underline{B}}\in \mathbb{R}^{3\times4} \mid \beta_{11}+\beta_{22}+\beta_{33}=0\rbrace$
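For the gradient steps below, the unconstrained gradient of this loss with respect to $\underline{\underline{B}}$ is the standard least-squares expression

$\nabla_{\underline{\underline{B}}}\,L = -2\sum_{i=1}^{N}\left(\underline{y}_i - \underline{\underline{B}}\cdot\underline{x}_i\right)\underline{x}_i^{\mathsf{T}}.$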

To solve this, I am using projected gradient descent, where $\underline{\underline{B}}$ is projected onto $C$ after every iteration. In general, the projection operator would require another optimisation process, $\operatorname{proj}_C(x)=\operatorname{argmin}_u \big\lbrace \chi_C(u) + \frac{1}{2}\lVert u-x\rVert^2\big\rbrace$, with $\chi_C$ the characteristic function of $C$.

Now comes the neat part: since the constraint set $C$ is a linear subspace of the 12-dimensional parameter space, the projection can be written as a simple linear map.

Using the reduced vector of diagonal coefficients $\underline{\beta}=\begin{pmatrix}\beta_{11}\\\beta_{22}\\\beta_{33}\end{pmatrix}$ (in the projection step $\underline{\underline{B}}$ is flattened to a vector to avoid working with 3D tensors), the projection $\underline{\beta}'=\operatorname{proj}_C(\underline{\beta})$ can be written as

$\underline{\beta}' = \underline{\underline{P}}\cdot\underline{\beta}~~~~\text{with}~~~~ \underline{\underline{P}}=\frac{1}{3}\begin{pmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{pmatrix}$

The vector $\underline{\beta}$ and the projection map $\underline{\underline{P}}$ can then be augmented with the remaining entries of $\underline{\underline{B}}$, on which the projection acts as the identity, since they are not affected by the constraint.
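In code, applying $\underline{\underline{P}}$ to the diagonal coefficients amounts to subtracting the mean of $\beta_{11},\beta_{22},\beta_{33}$ from each of them while leaving all other entries untouched. A minimal NumPy sketch (the function name is my own):

```python
import numpy as np

# Projection onto C: subtract tr(B_coeff)/3 from each diagonal coefficient,
# which is exactly the action of P on (b11, b22, b33). The intercept column
# and the off-diagonal entries are left unchanged.
def project_trace_zero(B):
    # B has shape (3, 4): column 0 is the intercept, columns 1..3 the coefficients.
    B = B.copy()
    coeff = B[:, 1:]                       # view of the 3x3 coefficient block
    shift = np.trace(coeff) / 3.0
    coeff[np.diag_indices(3)] -= shift     # enforces b11 + b22 + b33 = 0
    return B
```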

With the above method, my constrained gradient descent is only slightly more expensive than ordinary gradient descent: it costs one extra linear operation per iteration.
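Putting the pieces together, a sketch of the resulting projected gradient descent loop (reusing NumPy and the projection function from above; learning rate and iteration count are illustrative):

```python
# Projected gradient descent: a plain least-squares gradient step
# followed by the projection onto C.
def fit_constrained(X, Y, lr=1e-2, n_iter=5000):
    # X: (N, 3) inputs, Y: (N, 3) targets.
    N = X.shape[0]
    Xa = np.hstack([np.ones((N, 1)), X])   # (N, 4) with intercept column
    B = project_trace_zero(np.zeros((3, 4)))  # start inside C
    for _ in range(n_iter):
        resid = Xa @ B.T - Y               # (N, 3) residuals
        grad = 2.0 / N * resid.T @ Xa      # (3, 4) gradient of the mean squared error
        B = project_trace_zero(B - lr * grad)
    return B
```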

The results I have obtained so far with this method seem promising.
