Here is a minimization procedure I've "dreamed up." I'm hoping to gain a better understanding of its mathematical properties and practical efficiency.
Given a (locally) convex function $f(x):{\mathbb{R}}^n \to \mathbb{R}$, an initial point $x_1$, an initial step size $a_1$, and a tolerance $\delta$, set $k=1$ and repeat:
1. If $\lVert\nabla f(x_k )\rVert<\delta$, return $x_k$; otherwise:
2. Pick step direction $d_k \equiv -\nabla f(x_k )/\lVert\nabla f(x_k )\rVert$.
3. Pick step size $a_k$.
4. Let $x_{k+1} \equiv x_k +a_k d_k$.
5. Let $a_{k+1} \equiv a_1 /(k+1)$.
6. Let $k\equiv k+1$ and return to step 1.
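In MATLAB, the general $n$-dimensional loop would look something like this (a minimal sketch; the name minimize_nd is just illustrative, and it assumes the gradient is available analytically as a function handle rather than computed numerically):

function xk = minimize_nd(gradf, x1, a1, delta)
% Normalized gradient steps with diminishing step size a_k = a1/k.
xk = x1;
k = 1;
g = gradf(xk);
while norm(g) >= delta
    xk = xk + (a1/k) * (-g / norm(g));  % unit direction, step size a1/k
    k = k + 1;
    g = gradf(xk);
end
end

% e.g., for f(x) = x'*x, whose gradient is 2x:
% minimize_nd(@(x) 2*x, [3; -4], 1, 1e-3)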
Most optimization procedures require some kind of line search after the step direction is chosen, but this algorithm avoids that computation by simply choosing an arbitrary $a_1$ and letting the step size decrease as the iteration proceeds. Since
$$a_k =\frac{a_1 }{k},$$
the step size approaches $0$ in the limit $k\to \infty$, so successive iterates move arbitrarily little (which suggests, though does not by itself prove, that the sequence of iterates $\left\{ x_k \right\}$ is convergent). On the other hand, since the sum
$$\sum_{k=1}^{\infty } a_k =a_1 \sum_{k=1}^{\infty } \frac{1}{k}$$
is divergent, the cumulative sum of the step sizes is infinite, so, assuming convexity, we will never get "stuck" at a point $x$ far from the minimizer $x^*$. (I am unsure of how to prove this formally.)
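(A sketch of what I think the formal argument would look like, borrowed from standard subgradient-method analysis: expanding the squared distance to a minimizer $x^*$ gives
$$\lVert x_{k+1} -x^* \rVert^2 =\lVert x_k -x^* \rVert^2 -\frac{2a_k }{\lVert \nabla f(x_k )\rVert}\nabla f(x_k )^{\top} (x_k -x^* )+a_k^2 ,$$
and convexity gives $\nabla f(x_k )^{\top} (x_k -x^* )\ge f(x_k )-f(x^* )\ge 0$, so the distance to $x^*$ shrinks whenever the middle term dominates $a_k^2$; combining this with $a_k \to 0$ and $\sum_k a_k =\infty$ appears to be the usual route to a proof.)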
These two properties (vanishing step size, divergent sum) also hold for a more general algorithm where, in step 5, we let $a_{k+1} \equiv a_1 /(k+1)^t$ with $t\in (0,1]$.
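(Both properties reduce to the standard $p$-series facts:
$$\sum_{k=1}^{\infty } \frac{1}{k^t} =\infty \;\text{ for } t\le 1,\qquad \sum_{k=1}^{\infty } \frac{1}{k^t} <\infty \;\text{ for } t>1,$$
while $a_1 /k^t \to 0$ for every $t>0$, so $t\in (0,1]$ is exactly the range on which the step sizes vanish and their sum still diverges.)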
Is there a name for this optimization procedure? What are its convergence properties? How should one select the initial values $x_1$ and $a_1$ in the general case?
Here is a proof-of-concept implementation in MATLAB. Since the gradient has to be computed numerically, I have it evaluate the objective on a 3-point-per-axis grid of half-width Nsize centered on $x_k$ and take the numerical gradient of those samples. Nsize is initialized to 0.01 and shrinks by a factor of $k$ at each iteration, which prevents cycling.
[x, y] = minimize2d(@obj, -1.34, 1.79, 1, 0.01, 10e-15);
x_star = x(end)
y_star = y(end)
f_star = obj(x_star, y_star)
[x_plot, y_plot] = meshgrid(linspace(-1.6, 0.3, 51),linspace(.9, 1.9, 51));
z_plot = obj(x_plot, y_plot);
contour(x_plot, y_plot, z_plot, 10)
hold on
plot(x, y, "-k")
scatter(x_star, y_star)
hold off
function f = obj(x, y)
f = 4*x.^2 + exp(1.5*y) + exp(-y) - 10*y;
end
function [x, y] = minimize2d(fun, x0, y0, a0, Nsize, tol)
x = x0; y = y0; a = a0;
grad_magnitude = tol + 1;  % ensure the loop body runs at least once
i = 1;
while grad_magnitude > tol
    a = a0 / i;          % diminishing step size a_k = a_1/k
    Nsize = Nsize / i;   % shrink the sampling neighborhood each iteration
    % Sample the objective on a 3x3 grid centered on the current iterate.
    [xN, yN] = meshgrid(linspace(x(i)-Nsize, x(i)+Nsize, 3), ...
        linspace(y(i)-Nsize, y(i)+Nsize, 3));
    f = fun(xN, yN);
    % Numerical gradient at the grid center (2,2). gradient() assumes unit
    % spacing, so px and py are the partial derivatives scaled by Nsize;
    % the scaling cancels in the step direction but does make the stopping
    % test stricter as Nsize shrinks.
    [px, py] = gradient(f);
    g = [px(2,2), py(2,2)];
    grad_magnitude = norm(g);
    step = -a * g / grad_magnitude;  % unit direction scaled by a_k
    x(i+1) = x(i) + step(1);
    y(i+1) = y(i) + step(2);
    i = i + 1;
end
nit = i  % display the iteration count
end
Output:
nit = 16
x_star = -7.5968e-06
y_star = 1.2651
f_star = -5.6986
