
I have a nonlinear system of equations $$ \left(\mathbf{K}_{\mathbf{L}}+\mathbf{K}_{\mathbf{NL}}(\mathbf{X})\right) \mathbf{X}=\mathbf{F} $$ in which $\mathbf{K}_{\mathbf{NL}}(\mathbf{X})$ is the nonlinear stiffness matrix, which depends on $\mathbf{X}$. I'm solving it with Picard iteration like this (a sketch of the loop follows the list):

  1. First ignore the nonlinear stiffness matrix and solve the linear system for $\mathbf{X}$.
  2. Substitute the resulting $\mathbf{X}$ into the nonlinear stiffness matrix and solve the full equation for $\mathbf{X}$ again.
  3. Check convergence and repeat step 2 if convergence is not satisfied.
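For concreteness, here is roughly what that loop looks like in MATLAB (a minimal sketch, not my exact code; `assembleKL` and `assembleKNL` are hypothetical routines that build $\mathbf{K}_{\mathbf{L}}$ and $\mathbf{K}_{\mathbf{NL}}(\mathbf{X})$, and `F` is the assembled load vector):

```matlab
% Minimal Picard (fixed-point) loop -- a sketch, not my exact code.
KL  = assembleKL();            % constant linear stiffness (hypothetical routine)
X   = KL \ F;                  % step 1: drop K_NL and solve the linear system
tol = 1e-8;  maxit = 500;
for k = 1:maxit
    KNL  = assembleKNL(X);     % step 2: rebuild K_NL at the current X
    Xnew = (KL + KNL) \ F;     %         and solve the full system again
    if norm(Xnew - X) <= tol*norm(Xnew)   % step 3: convergence check
        X = Xnew;  break
    end
    X = Xnew;
end
```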

The problem I have here is that when the force vector $\mathbf{F}$ is small, the nonlinear equation solves very fast, but when I increase the force beyond some threshold it takes ages to converge. I have tried to solve it using MATLAB's fsolve function with algorithms like 'trust-region' and 'levenberg-marquardt', but the same thing happens with large force vectors.
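For reference, the residual form I hand to fsolve looks roughly like this (a sketch, reusing `KL`, `F`, and the hypothetical `assembleKNL` from above):

```matlab
% The same system posed as a root-finding problem R(X) = 0 for fsolve (sketch).
res  = @(X) (KL + assembleKNL(X))*X - F;
opts = optimoptions('fsolve', 'Algorithm', 'levenberg-marquardt');
X    = fsolve(res, zeros(size(F)), opts);
```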

Is there any way I can improve the convergence speed?

P.S. Here's a GIF of the result vector $\mathbf{X}$ inside the convergence loop, with a force vector slightly over the threshold.

[GIF: the solution vector $\mathbf{X}$ oscillating back and forth between iterations of the convergence loop]

Edit (more details): My problem is the bending of a nonlinear Timoshenko beam, which has the three governing equations below: $$ -\frac{d}{d x}\left\{A_{x x}\left[\frac{d u}{d x}+\frac{1}{2}\left(\frac{d w}{d x}\right)^{2}\right]+B_{x x} \frac{d \phi_{x}}{d x}\right\}=0 $$ $$ -\frac{d}{d x}\left\{A_{x x} \frac{d w}{d x}\left[\frac{d u}{d x}+\frac{1}{2}\left(\frac{d w}{d x}\right)^{2}\right]+B_{x x} \frac{d w}{d x} \frac{d \phi_{x}}{d x}\right\}-\frac{d}{d x}\left[S_{x x}\left(\frac{d w}{d x}+\phi_{x}\right)\right]=q $$ $$ -\frac{d}{d x}\left\{D_{x x} \frac{d \phi_{x}}{d x}+B_{x x}\left[\frac{d u}{d x}+\frac{1}{2}\left(\frac{d w}{d x}\right)^{2}\right]\right\}+S_{x x}\left(\frac{d w}{d x}+\phi_{x}\right)=0 $$ Along with the proper boundary conditions, and discretized using finite differences, these assemble into: $$ \left(\mathbf{K}_{\mathbf{L}}+\mathbf{K}_{\mathbf{NL}}(\mathbf{X})\right) \mathbf{X}=\mathbf{F} $$

  • Have you tried ode23s and others? – user10354138 Jul 26 '20 at 09:56
  • @user10354138 What's the difference between fsolve and ode23s? – omgtheykilledkenny Jul 26 '20 at 10:23
  • The ordinary Picard algorithm converges linearly (notice that the scheme revolves around a fixed-point iteration), so it is not "superfast" even if you are close to the solution; this can be a nightmare (or a night with eyes open until the end of the computation) for some large systems. Still, you may want to check whether the conditions for using the scheme hold, i.e. that the function and its derivative are continuous at the point where you start your iterations (typically the origin). – Basco Jul 28 '20 at 21:23
  • The function is continuous, but I didn't really consider the derivative. Do you mean the derivative with respect to time or to $X$ itself? Because $X$ has three separate variables $\{u; w; s\}$ – omgtheykilledkenny Jul 29 '20 at 12:19
  • Your numerical question sounds like a conditioning issue. The ode23s solver might resolve the issue, since it is designed for solving "stiff" problems. I've got a few questions: 1) What do you mean by a nonlinear matrix? 2) Also, what is the error tolerance your solver is using? Lowering your error tolerance may yield faster "convergence" – Zim Jul 29 '20 at 19:37
  • This approach is reasonable only if $\mathbf K_{NL} \ll \mathbf K_L$. Otherwise you should use a linearization of $\mathbf K_{NL}$, either full (Newton's method) or partial. The smaller the nonlinear term, the faster the convergence will be. – uranix Jul 29 '20 at 20:25
  • Thanks for your suggestion @Zim, I edited my question to answer your first question. About your second question: I don't think lowering the error tolerance is going to fix it, because it is obvious from the GIF I provided that the solution is not converging; it is oscillating with a very big difference. I think @uranix is right, because as I increase the force, $\mathbf{K}_{\mathbf{NL}}\left(\mathbf{X}\right)$ gets bigger and the approach gets inefficient. But for Newton's method I have used MATLAB's fsolve, and from the documentation it uses Newton's method, yet the same thing happens. – omgtheykilledkenny Jul 30 '20 at 09:54
  • It may help if you put more details in. Best, just let us know what $K_L$ and $K_{NL}(X)$ are. Right now the question is too general to offer anything better than wild shots (and, from what you are saying, it looks like they don't help much) – fedja Jul 30 '20 at 16:19
  • Okay, I will add more details to the post. Thanks @fedja – omgtheykilledkenny Jul 30 '20 at 19:53
  • Are $A,B,D,S$ constants and $u,w,\phi_x$ unknown functions? If so, one thing that strikes me is that your first equation allows one to eliminate $u$ and all nonlinearity from the second two if we introduce the constant parameter $C=A_{xx}\left[\frac{du}{dx}+\frac{1}{2}\left(\frac{dw}{dx}\right)^{2}\right]+B_{xx}\frac{d\phi_{x}}{dx}$ and then quickly solve the resulting linear system, solving for $u$ in the end (another linear equation). So it looks like you are down to a one-variable problem $H(C)=\text{something}$, where $H$ is obtained by solving a purely linear system and the "something", apparently, is determined by the boundary conditions. Am I missing anything? – fedja Jul 30 '20 at 21:08
  • @fedja Yes, you are right about the constants and unknown functions, but I am not understanding your method for eliminating $u$ and the nonlinear parts. Could you elaborate on that some more? Thank you – omgtheykilledkenny Jul 31 '20 at 05:33
  • According to the first equation, the full expression in braces is constant. Denote that constant by $C$. Then $\frac{du}{dx}+\frac 12(\frac{dw}{dx})^2=(C-B_{xx}\frac {d\phi_x}{dx})/A_{xx}$. Now plug this into the second and the third equations and treat $C$ as a known parameter. You'll get two linear ODEs for $w$ and $\phi_x$ that you can solve as quickly as you can solve your linear systems in general. Then, when you have your $w$ and $\phi_x$, you can solve the first equation for $u$ in no time. The result will satisfy all 3 equations, so only the boundary conditions may be hurt. (contd) – fedja Jul 31 '20 at 14:55
  • What exactly are the boundary conditions, by the way? Can you post them too? – fedja Jul 31 '20 at 14:55
  • If none of these suggestions help, one thing that you can of course always do is throw more compute at it. In particular, if you have access to a fast graphics card, GPU computing can provide pretty substantial gains for linear algebra problems. Also, automatic differentiation libraries can help in providing accurate derivatives. You could try, for example, PyTorch's torch.optim.LBFGS optimizer. – Hyperplane Aug 04 '20 at 10:58

3 Answers


Here's a simple idea you could try with very little extra effort: your GIF shows that the iteration is oscillating back and forth, a phenomenon that can also occur in classical gradient descent algorithms when the problem is ill-conditioned. A very popular and powerful method to alleviate this kind of problem is called momentum, which basically consists of averaging over previous iterates.

So instead of throwing away all the previous iterates, you can do something like

$$ x_{k+1} = (1-\beta)g(x_{k}) + \beta x_k$$

Note that when $\beta=0$, we recover the standard fixed-point iteration. Consider a simple fixed-point problem like $x=\cos(x)$, which exhibits the oscillatory phenomenon. Then, starting from the same seed, here are the residuals $|x_*-x_k|$ for different values of $\beta$:

$$ \small\begin{array}{lllllll} k & \beta=0 & \beta=0.1 &\beta=0.2 &\beta=0.3 &\beta=0.4 &\beta=0.5 \\\hline 0 & 5.45787 &5.45787 &5.45787 &5.45787 &5.45787 &5.45787 \\1 & 0.2572 & 0.777267 & 1.29733 & 1.8174 & 2.33747 & 2.85754 \\2 & 0.19566 & 0.538475 & 0.690985 & 0.555697 & 0.107195 & 0.610102 \\3 & 0.116858 & 0.162927 & 0.0696096 & 0.00419339 & 0.00218156 & 0.0454083 \\4 & 0.0835784 & 0.0908543 & 0.0249916 & 0.000723828 & 8.0351e-06 & 0.0070347 \\5 & 0.053654 & 0.0431759 & 0.00828335 & 0.000124022 & 3.34983e-08 & 0.0011389 \\6 & 0.0371882 & 0.0224696 & 0.00282738 & 2.12772e-05 & 1.39595e-10 & 0.000185622 \\7 & 0.0245336 & 0.0112062 & 0.000955803 & 3.64953e-06 & 5.81757e-13 & 3.02859e-05 \\8 & 0.0167469 & 0.00571477 & 0.000324182 & 6.26001e-07 & 2.44249e-15 & 4.94232e-06 \\9 & 0.0111768 & 0.00288222 & 0.000109831 & 1.07377e-07 & 1.11022e-16 & 8.06552e-07 \end{array} $$
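A few lines of MATLAB reproduce this kind of experiment (a sketch; the seed below is arbitrary, not necessarily the one behind the table):

```matlab
% Damped fixed-point iteration x_{k+1} = (1-beta)*g(x_k) + beta*x_k (sketch).
g     = @(x) cos(x);                  % toy map with fixed point x* ~ 0.739085
beta  = 0.4;                          % momentum weight; beta = 0 is the plain iteration
xstar = fzero(@(t) cos(t) - t, 0.7);  % reference solution for the residual
x     = -4.7;                         % an arbitrary seed
for k = 1:9
    x = (1 - beta)*g(x) + beta*x;     % momentum update
    fprintf('k = %d, |x - x*| = %g\n', k, abs(x - xstar));
end
```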

A well-chosen momentum can speed up convergence tremendously! A variant of this idea specific to fixed-point iterations appears to be known as Anderson Acceleration.
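For the curious, a minimal Anderson step with memory $m=1$ might look like this (my sketch of the standard scheme, not something from the original answer; `g` is the fixed-point map, here the toy $\cos$ map again):

```matlab
% Anderson acceleration with memory m = 1 for a fixed-point map g (sketch).
g  = @(x) cos(x);                    % stand-in map; replace with the Picard update
x0 = -4.7;  g0 = g(x0);  f0 = g0 - x0;   % f(x) = g(x) - x is the residual
x1 = g0;                             % the first step is a plain fixed-point step
for k = 1:10
    g1 = g(x1);  f1 = g1 - x1;
    d  = f1 - f0;
    if norm(d) < eps, break, end              % residuals stalled; stop
    gamma = (f1(:)'*d(:)) / norm(d)^2;        % least-squares mixing weight
    xnew  = g1 - gamma*(g1 - g0);             % extrapolate the two latest g-values
    x0 = x1;  g0 = g1;  f0 = f1;  x1 = xnew;
end
```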

Hyperplane
  • I like the proposal, but although I cannot confirm it from the information in the question as posed by @omgtheykilledkenny, it seems that those oscillations are the result of plotting the solution for the linear stiffness and the update considering the nonlinear part. If using damping factors, it may be necessary to consider dissipation in some problems (physical systems may not conserve energy under some of these schemes). – Basco Jul 29 '20 at 00:50
  • Thank you for your proposal @Hyperplane. I am going to try this method and will let you know how it works. If I got it correctly, the function $g(x)$ gives the result of my system of nonlinear equations, right? – omgtheykilledkenny Jul 29 '20 at 12:14
  • Isn't $g(x_k)$ supposed to be $x_{k-1}$? Otherwise you are just using the previous iterate. – Mick Aug 02 '20 at 13:24
  • @Mick $g$ returns the solution to the linear system with stiffness matrix evaluated at $x_k$, i.e. $g(x_k) = \text{solve}(K_L+K_{NL}(x_k),\, F)$ – Hyperplane Aug 04 '20 at 10:37

I assume that $K_{NL}(0) = 0$.

Currently you are using the iteration $$ (K_L + K_{NL}(X_n))X_{n+1} = F $$ with $X_0 = 0$. Instead, first solve $$ (K_L + K_{NL}(X_\sigma))X_\sigma = \sigma F $$ for small $\sigma$, using this method; as you noticed, this converges quickly. Then solve $$ (K_L + K_{NL}(X_{\sigma'}))X_{\sigma'} = \sigma' F $$ for some $\sigma' > \sigma$, using the same iteration but now starting with the previously found $X_\sigma$ rather than the zero vector. And so on, until the right-hand side is $F$.

For example, choose $\sigma = N^{-1}, \, \sigma' = 2 N^{-1}$ and so on, for sufficiently large $N$.
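In code, this outer loading loop could look roughly as follows (a sketch, reusing the hypothetical `assembleKNL`, `KL`, and `F` from the question):

```matlab
% Incremental loading ("continuation") wrapped around the Picard loop (sketch).
N = 10;                              % number of load steps
X = zeros(size(F));                  % warm start: solution at zero load
for step = 1:N
    sigma = step/N;                  % load factor ramps 1/N, 2/N, ..., 1
    for k = 1:500                    % inner Picard iteration at this load level
        Xnew = (KL + assembleKNL(X)) \ (sigma*F);
        if norm(Xnew - X) <= 1e-8*norm(Xnew), X = Xnew; break, end
        X = Xnew;
    end                              % X now warm-starts the next load step
end
```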

Of course Anderson acceleration is also a good idea here :)

Hans Engler
  • Oh, so you mean that I should start solving the nonlinear equations for small forces, and from their solution keep increasing the force and converging at each load? Hmm, seems like a reasonable idea, let me try that. I have read the Anderson acceleration papers, but it looks so complicated; however, I tried the suggestion of @Hyperplane, but it gives different answers for different beta ratios. – omgtheykilledkenny Aug 03 '20 at 07:59

You can try some of the convergence acceleration algorithms, which can work very well for fixed-point (Picard-type) iterations. If you are using R, there is a package called SQUAREM which implements a reliable convergence acceleration scheme. It is based on the paper Varadhan and Roland, "Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm", Scandinavian Journal of Statistics, 2008. EM algorithms are essentially Picard-like algorithms: they are contraction mappings which converge slowly.
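Even outside R, the core squared-extrapolation step is only a few lines. Here is my reading of it as a MATLAB sketch (`G` is the fixed-point map and `x` the current iterate; the steplength is the paper's "S3" choice, and the details should be checked against the paper or the package):

```matlab
% One SQUAREM-style squared-extrapolation step for a fixed-point map G (sketch).
x1 = G(x);   x2 = G(x1);     % two ordinary fixed-point evaluations
r  = x1 - x;                 % first difference
v  = (x2 - x1) - r;          % second difference
a  = -norm(r)/norm(v);       % steplength (the paper's "S3" choice)
x  = x - 2*a*r + a^2*v;      % extrapolated update
x  = G(x);                   % stabilizing fixed-point step
```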

user67724