
First of all, I apologize if my question is ill-posed, a duplicate, or sounds naïve to the audience.

For a problem I have been working on, I have formulated an optimization problem of the form: $$\psi^* = \operatorname*{arg\,min}_{\psi}\; d(x, g_{\psi}(x))$$ $$\text{s.t.}\quad f_{\theta}(x) \neq f_{\theta}(g_{\psi}(x)),$$ where $d: X \times X \to \mathbb{R}_{\geq 0}$ is a distance function and $f_{\theta}: X \to Y$ is a fixed parameterized model. The goal is to find the optimal parameters $\psi^*$ of the function $g_{\psi}: X \to X$.

In general, the function $d$ will be non-linear, and so will the inequality constraint. The range of $f_{\theta}$ can be either a subset of the real numbers (i.e., $Y \subseteq \mathbb{R}$) or a discrete set (e.g., $Y = \{0,1,\ldots,k-1\}$). In the latter case, we could also regard the output of $f_{\theta}$ as a $k$-dimensional stochastic vector representing a probability distribution over the classes.
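For concreteness, here is a minimal sketch (in PyTorch) of the kind of setup I have in mind when $Y$ corresponds to $k$ classes and $f_{\theta}$ outputs a probability vector. All architectures and names below are placeholders I made up for illustration, not my actual models:

```python
import torch
import torch.nn as nn

# Placeholder instantiation of the symbols above:
# f_theta : X -> Y, a fixed classifier whose output is a k-dimensional
#           probability vector (the "discrete distribution" case).
# g_psi   : X -> X, the parameterized map whose parameters psi I want to learn.
# d       : X x X -> R_{>=0}, here simply the squared Euclidean distance.

k, dim = 3, 8

f_theta = nn.Sequential(nn.Linear(dim, k), nn.Softmax(dim=-1))
for p in f_theta.parameters():        # f_theta is fixed, so freeze its parameters
    p.requires_grad_(False)

g_psi = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

def d(x, x_prime):
    # squared L2 distance between an input and its transformed version
    return ((x - x_prime) ** 2).sum(dim=-1)

x = torch.randn(4, dim)               # a batch of inputs
objective = d(x, g_psi(x)).mean()     # the term I want to minimize,
                                      # subject to f_theta(x) != f_theta(g_psi(x))
```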

My questions are as follows.

(1) Is there a way to handle the non-linear inequality constraint? I have found an answer to a very similar problem for the case where the two sides of the inequality are scalars (i.e., $x_1 \neq x_2$ with $x_1, x_2 \in \mathbb{R}$). In that case, it is pretty common to "linearize" the non-linear inequality using some threshold $\varepsilon > 0$, i.e., to require $|x_1 - x_2| \geq \varepsilon$. This might work if $f_{\theta}(x) = x_1 \in \mathbb{R}$ and $f_{\theta}(g_{\psi}(x)) = x_2 \in \mathbb{R}$. However, what happens if $f_{\theta}$ is a vector-valued function instead? Can I extend the approach above from scalars to vectors? Specifically, what if the output of $f_{\theta}$ is a discrete probability distribution? (A sketch of the kind of relaxation I have in mind appears after question (3) below.)

(2) Can I still use gradient-based methods to solve this constrained problem?

(3) If the answer to (2) is no, what are the best alternatives?
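To make question (1) concrete, below is the kind of relaxation I have tried to write down, again in PyTorch and reusing the placeholder $d$, $f_{\theta}$, $g_{\psi}$ from the sketch above. I am not sure this is a sound way to handle the constraint, which is exactly what I am asking:

```python
import torch

eps = 0.1   # margin epsilon; value chosen arbitrarily for illustration

def scalar_penalty(y1, y2):
    # Scalar case: replace y1 != y2 by |y1 - y2| >= eps and penalize
    # violations with a hinge term max(0, eps - |y1 - y2|).
    return torch.clamp(eps - (y1 - y2).abs(), min=0.0)

def vector_penalty(p1, p2):
    # Tentative vector/distribution case: replace the constraint by
    # "some distance between the two probability vectors is at least eps",
    # here the total-variation distance 0.5 * ||p1 - p2||_1.
    tv = 0.5 * (p1 - p2).abs().sum(dim=-1)
    return torch.clamp(eps - tv, min=0.0)

# Penalized objective I would then try to minimize with a gradient method
# (lambda_ is a penalty weight; whether this is legitimate is what
# questions (2) and (3) are about):
# loss = d(x, g_psi(x)).mean() \
#        + lambda_ * vector_penalty(f_theta(x), f_theta(g_psi(x))).mean()
```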

Any help will be much appreciated.

Thanks!
