12

I have to solve a system of up to 10000 equations with 10000 unknowns as fast as possible (preferably within a few seconds). I know that Gaussian elimination is too slow for that, so what algorithm is suitable for this task?

All coefficients and constants are non-negative integers modulo p (where p is a prime). There is guaranteed to be only 1 solution. I need the solution modulo p.

tmwilliamlin168
  • 325
  • 1
  • 2
  • 7

4 Answers4

13

A LU decomposition of a $n \times n$ matrix can be computed in $O(M(n))$ time, where $M(n)$ is the time to multiply two $n \times n$ matrices. Therefore, you can find a solution to a system of $n$ linear equations in $n$ unknowns in $O(M(n))$ time. For instance, Strassen's algorithm achieves $M(n) = O(n^{2.8})$, which is faster than Gaussian elimination. See https://en.wikipedia.org/wiki/Invertible_matrix#Blockwise_inversion.

Rather than trying to implement this yourself, I would suggest using a library: e.g., a BLAS library.

D.W.
  • 167,959
  • 22
  • 232
  • 500
5

There is what you want to achieve, and there is reality, and sometimes they are in conflict. First you check if your problem is a special case that can be solved quicker, for example a sparse matrix. Then you look for faster algorithms; LU decomposition will end up a bit faster. Then you investigate what Strassen can do for you (which is not very much; it can save 1/2 the operations if you multiply the problem size by 32).

And then you use brute force. Use a multi-processor system with multiple threads. Use available vector units. Arrange your data and operations to be cache friendly. Investigate what is the fastest way to do calculations modulo p for some fixed p. And you can often save operations by not doing operations modulo p (result in the range 0 ≤ result < p) but a bit more relaxed (for example result in the range -p < result < p).

gnasher729
  • 32,238
  • 36
  • 56
3

The best way to solve big linear equations is to use parallelisation or somehow to distribute computations among CPUs or so.

See CUDA, OpenCL, OpenMP.

A lot of people suggests Strassen's algorithm but it has a very big hidden constant which makes it inefficient.

By the way: your linear equations might be very sparse(a lot of zeros), there are few very neat optimisations to solve them in parallel.

Oleg Kovalov
  • 131
  • 4
2

In real problems when the size is large ($n>100000$) the best approach is the numerical solution of the system by an iterative method like GMRES, CG, BiCG, etc. All those algorithms depends strongly on matrix-vector products, so the complexity is as much as $O(n^2)$ which performs better than a direct solution by a LU decomposition with complexity of $O(n^3)$.

Sometimes the scale of the problem is not large enough to the proposed methods to be an improvement compared to LU decomposition, so there is a work you have to do in balance the applicability of the methods.

xskxzr
  • 7,613
  • 5
  • 24
  • 47
Carlos
  • 21
  • 1