Gradient descent can be used to minimize an objective function $\Phi:\mathbb{R}^d \to \mathbb{R}$ when we can evaluate $\Phi$ (and hence approximate its gradient) at any input of our choice.
However, my situation is a little different. I have an objective function $\Phi$ of the form
$$\Phi(x) = \Phi_1(x) + \Phi_2(x),$$
where I can evaluate $\Phi_1$ at any input of my choice, but not $\Phi_2$. Instead, I have access only to a thresholded (quantized) version of $\Phi_2$: I can evaluate $f_2:\mathbb{R}^d \to \{0,1\}$ at any input of my choice, where $f_2$ is defined by
$$f_2(x) = \begin{cases} 0 &\text{if } \Phi_2(x)\le t\\ 1 &\text{if } \Phi_2(x) > t\\ \end{cases}$$
and $t$ is fixed. You can assume that $\Phi_2$ is smooth and has all the nice properties you might like, but I can only evaluate $f_2$, never $\Phi_2$ itself. How can I search for an $x$ that is likely to make $\Phi(x)$ as small as possible in this situation? Is there any way to adapt gradient descent, or some other mathematical optimization method, to this setting?
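To make the access model concrete, here is a minimal sketch of what I can and cannot call. The names `phi1`, `_phi2_hidden`, `f2`, the specific formulas, and the threshold value are all placeholders invented for illustration:

```python
import numpy as np

# Hypothetical stand-ins for the two parts of the objective; in the real
# problem I can call phi1 and f2, but I have no access to _phi2_hidden.
def phi1(x):
    return float(np.sum(x**2))       # a term I can evaluate directly

def _phi2_hidden(x):
    return float(np.sum(np.sin(x)))  # the term I cannot observe

t = 0.5                              # fixed threshold

def f2(x):
    """The only access I have to Phi_2: a 0/1 threshold query."""
    return 1 if _phi2_hidden(x) > t else 0
```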
Why I think there might be some hope: if we find $x',\delta \in \mathbb{R}^d$ such that $f_2(x')=0$ and $f_2(x'+\delta)=1$, where $x' \approx x$ and $\delta \approx 0$, then we've learned something about $\Phi_2$, e.g., that the directional derivative of $\Phi_2$ in the $\delta$ direction is likely to be large near $x'$. It seems like it might be possible to build an algorithm that exploits this kind of information. Are there any techniques for handling this kind of situation?
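To illustrate the kind of probing I have in mind (this is only a sketch of the idea, not a worked-out algorithm; the function `probe_threshold` and its parameters are invented for this example), one could sample small random perturbations $\delta$ around a point and average the ones that flip the oracle:

```python
import numpy as np

def probe_threshold(f2, x, n_probes=200, eps=1e-2, rng=None):
    """Sample tiny random perturbations of x and average the ones that flip f2.

    If f2(x) = 0 and f2(x + delta) = 1 for a small delta, then Phi_2 has
    crossed the threshold t along delta, so its directional derivative along
    delta is probably large and positive.  Averaging the flipping directions
    gives a crude estimate of the ascent direction of Phi_2 near the level
    set {Phi_2 = t}; far from that level set no flips occur and we learn
    nothing.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = f2(x)
    flips = []
    for _ in range(n_probes):
        delta = rng.normal(size=x.shape)
        delta *= eps / np.linalg.norm(delta)   # step of length eps
        if f2(x + delta) != base:
            # orient the direction so it always points toward increasing Phi_2
            flips.append(delta if base == 0 else -delta)
    if not flips:
        return None                            # no crossings observed: no information here
    g_hat = np.mean(flips, axis=0)
    return g_hat / np.linalg.norm(g_hat)       # unit-norm estimate of the ascent direction
```

If such an estimated direction were reliable, I imagine one could combine its negative with the exact gradient of $\Phi_1$ in a descent step, but I don't know whether anything like this is a standard technique, or whether there is a better-founded approach.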