
I need to solve an optimization problem involving an Extreme Learning Machine $z = W_2\sigma(W_1 x)$, where the hidden-layer weight matrix $W_1$ is a fixed random matrix, $\sigma(\cdot)$ is the activation function, and the output weight matrix $W_2$ is chosen by minimizing the Mean Squared Error of the produced output, $\| W_2 \sigma(W_1 x) - y \|_2^2$, plus an $L_1$ (lasso) regularization term. So the objective function is as follows: $$f(W_2) = \| W_2 \sigma(W_1 x) - y \|_2^2 + \lambda \| W_2 \|_1$$ I need to solve the problem using Nesterov Accelerated Gradient (NAG), but ELMs typically use a non-iterative method for learning the output weights. We can get around this by adapting the network to include a gradient-based iterative process. To ensure that NAG is applicable to the ELM with an MSE loss regularized by $L_1$, we need to account for the fact that the $L_1$ term introduces non-smoothness: NAG, as traditionally applied, assumes the objective has an L-Lipschitz continuous gradient, which the $L_1$-regularized loss lacks.
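For concreteness, here is a minimal sketch of the setup in NumPy (the toy data sizes, the tanh activation, and the zero initialization of $W_2$ are just illustrative assumptions, not part of my actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: d input features, m outputs, n samples (placeholder sizes).
n, d, m, h = 200, 10, 1, 50
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, m))

# Fixed random hidden-layer weights W1 and activation sigma (tanh assumed here).
W1 = rng.standard_normal((h, d))
H = np.tanh(X @ W1.T)              # sigma(W1 x) for every sample, shape (n, h)

lam = 0.1                          # L1 regularization strength lambda

def objective(W2):
    # f(W2) = ||W2 sigma(W1 x) - y||_2^2 + lambda * ||W2||_1
    residual = H @ W2.T - Y        # predictions minus targets, shape (n, m)
    return np.sum(residual ** 2) + lam * np.sum(np.abs(W2))

def grad_smooth_part(W2):
    # Gradient of the MSE term only; the L1 term is non-differentiable at 0,
    # which is exactly the smoothness issue described above.
    residual = H @ W2.T - Y
    return 2.0 * residual.T @ H    # shape (m, h), same as W2

W2 = np.zeros((m, h))
print(objective(W2), grad_smooth_part(W2).shape)
```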

Can the problem be solved by adapting the ELM or by modifying the NAG approach? Can you also provide some references to study the appropriate solution?

