9

Why do we use the 2-norm all over the place instead of the 1-norm, although both are equivalent and the 1-norm is more efficiently computable? I am currently implementing an algorithm that normalizes a vector in each step (using the 2-norm). So could I also just use the 1-norm instead? More details: power iteration for the calculation of the dominant eigenvalue of a matrix:

x := random vector of length numrows(A)
x = x/||x||
while l changes (i.e. until abs(l - l_last) < tolerance)
    x = A*x
    l = ||x||
    x = x/||x||
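
A minimal runnable version of the pseudocode above (a sketch assuming NumPy; the tolerance and iteration cap are illustrative):

    import numpy as np

    def power_iteration(A, tol=1e-10, max_iter=1000):
        """Power iteration with 2-norm scaling, following the pseudocode above."""
        x = np.random.rand(A.shape[0])
        x = x / np.linalg.norm(x)        # normalize with the 2-norm
        l_last = np.inf
        l = 0.0
        for _ in range(max_iter):
            x = A @ x
            l = np.linalg.norm(x)        # estimate of the dominant eigenvalue's magnitude
            x = x / l
            if abs(l - l_last) < tol:    # stop once l no longer changes
                break
            l_last = l
        return l, x
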
Paul
    What do you mean they are equivalent? In what sense? – mathreadler Jul 14 '18 at 21:08
  • All norms are equivalent on finite-dimensional vector spaces; that's a theorem I can remember – Paul Jul 14 '18 at 21:14
  • 2
    How will it help you if you don't know in what sense this equivalence holds? Here is what Wikipedia says: "However, all these norms are equivalent in the sense that they all define the same topology." So unless we are interested in the topology of the space, maybe we should not care about that equivalence. – mathreadler Jul 14 '18 at 21:17
  • 1
    That the 1- and 2-norms are not equivalent in other senses has been very important for lots of the work on various types of regularized optimization over the last 10-15 years. "Total variation" (TV), for example, was extremely popular in all kinds of applications 5-10 years ago because new efficient numerical methods to solve such problems were found. – mathreadler Jul 14 '18 at 21:25
  • 1
    The 2-norm is not always better, but it does have certain nice properties. For example, $\| x \|_2$ can be interpreted geometrically as the length of the vector $x$. Also, $\|x\|_2^2 = \langle x, x \rangle$, which is a simple formula. – littleO Jul 14 '18 at 21:27
  • 1
    https://math.stackexchange.com/questions/63238/why-do-we-use-a-least-squares-fit this may be useful in terms of optimization – qwr Jul 14 '18 at 22:30
  • In New York City, the 1-norm is definitely better than the 2-norm. – Eric Duminil Jul 15 '18 at 08:58
  • 1
    @ericduminil In Manhattan, sure, but the whole city? – mathreadler Jul 17 '18 at 17:05
  • @mathreadler: You're right, I was thinking about Manhattan. – Eric Duminil Jul 18 '18 at 18:01

3 Answers

9

In your use case, the actual norm of the vector does not matter since you are only concerned with the dominant eigenvalue. The only reason to normalize during the iteration is to keep the numbers from growing exponentially. You can scale the vector however you want to prevent numeric overflow.

A key concept about eigenvectors and eigenvalues is that the set of vectors corresponding to an eigenvalue forms a linear subspace. This is a consequence of multiplication by a matrix being a linear map. In particular, any scalar multiple of an eigenvector is also an eigenvector for the same eigenvalue.
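
In symbols: if $\, Av = \lambda v \,$ and $\,c\,$ is any nonzero scalar, then $\, A(cv) = c\,Av = c\,\lambda v = \lambda (cv), \,$ so $\,cv\,$ is an eigenvector for the same eigenvalue.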

The Wikipedia article Power method mentions the use of the Rayleigh quotient to compute an approximation to the dominant eigenvalue. For real vectors and matrices it is given by the value $\, (v\cdot Av)/(v \cdot v). \,$ There are probably good reasons for the use of this formula. Of course, if $\,v\,$ is normalized so that $\, v \cdot v = 1, \,$ then you can simplify that to $\, v\cdot Av. \,$
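
A small sketch of that point (assuming NumPy; the function name is made up): scale with whichever norm is convenient and read the eigenvalue off the Rayleigh quotient rather than off the scaling factor.

    import numpy as np

    def power_iteration_any_norm(A, tol=1e-10, max_iter=1000):
        """Power iteration scaled with the 1-norm; eigenvalue from the Rayleigh quotient."""
        x = np.random.rand(A.shape[0])
        x = x / np.linalg.norm(x, 1)        # any norm is fine for the scaling
        l_last = np.inf
        l = 0.0
        for _ in range(max_iter):
            Ax = A @ x
            l = (x @ Ax) / (x @ x)          # Rayleigh quotient (v.Av)/(v.v)
            x = Ax / np.linalg.norm(Ax, 1)  # rescale only to prevent overflow/underflow
            if abs(l - l_last) < tol:
                break
            l_last = l
        return l, x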

Somos
  • So this is not true in general, I guess. But I think if you can use the 1-norm, then it is better, since it is cheaper to compute – Paul Jul 14 '18 at 21:27
  • Depends on what you mean by "in general". – Somos Jul 14 '18 at 21:29
  • I mean: does there exist an algorithm that uses vector normalization for which the 2-norm produces different results than the 1-norm? – Paul Jul 14 '18 at 21:41
  • For this particular problem, the dominant eigenvalue is the same no matter how you iterate. Try variations and find out for yourself. – Somos Jul 14 '18 at 22:33
  • I tried using the 1-norm and I get different results; the power method no longer converges to the dominant eigenvalue – Paul Jul 15 '18 at 11:59
  • You have to decide what works for you. Not going to argue here. – Somos Jul 15 '18 at 13:18
6

Tell me which of these looks more like a ball, and I'll tell you which one is the better norm:

[image: the unit balls of the 1-norm and the 2-norm]

More seriously, the 2-norm is given by an inner product, and this has far-reaching consequences: orthogonality, projections, complemented subspaces, orthonormal bases, etc., etc., etc., features that we see in Hilbert spaces.
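
For instance, the inner product behind the 2-norm gives an explicit formula for the orthogonal projection of $x$ onto a vector $y$, something that has no $\ell_1$ counterpart: $$\operatorname{proj}_y(x) = \frac{\langle x, y\rangle}{\langle y, y\rangle}\, y, \qquad \big\langle x - \operatorname{proj}_y(x),\, y\big\rangle = 0.$$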

Martin Argerami
5

In the eigenvalue algorithms, you're attempting to generate the eigenvalues and the orthogonal eigenvectors. That step right there is the normalization part. If you look at this:

[image: the normalization step of the algorithm]

the next step doesn't work with the $\ell_{1}$ norm; or rather, its actual purpose isn't served. The entire point of that step is to produce a normalized vector. If you look at how these algorithms work, eigenvalues that are close together already make them take much longer to converge, so numerical error will destroy the algorithm.
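
A quick numeric illustration (assuming the normalization is meant to produce a unit vector in the inner-product sense, as orthogonalization steps require):

    import numpy as np

    x = np.array([3.0, 4.0])
    x1 = x / np.linalg.norm(x, 1)   # 1-norm scaling: absolute entries sum to 1
    x2 = x / np.linalg.norm(x, 2)   # 2-norm scaling: unit length
    print(x1 @ x1)                  # 0.5102... -- not 1, so x1 is not a unit vector
    print(x2 @ x2)                  # 1.0       -- what an orthonormal basis needs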