9

Why do we use the 2-norm all over the place instead of the 1-norm, although both are equivalent and the 1-norm is more efficiently computable? I am currently implementing an algorithm that normalizes a vector in each step (using the 2-norm). So could I also just use the 1-norm instead? More details: power iteration for the calculation of the dominant eigenvalue of a matrix:

x := random vector of length numrows(A)
x = x/||x||
while l changes (i.e. until abs(l - l_last) < tolerance)
    x = A*x
    l = ||x||
    x = x/||x||
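
A minimal runnable version of the pseudocode above (a sketch assuming NumPy; the tolerance and iteration cap are illustrative):

    import numpy as np

    def power_iteration(A, tol=1e-10, max_iter=1000):
        """Power iteration with 2-norm scaling, following the pseudocode above."""
        x = np.random.rand(A.shape[0])
        x = x / np.linalg.norm(x)        # normalize with the 2-norm
        l_last = np.inf
        l = 0.0
        for _ in range(max_iter):
            x = A @ x
            l = np.linalg.norm(x)        # estimate of the dominant eigenvalue's magnitude
            x = x / l
            if abs(l - l_last) < tol:    # stop once l no longer changes
                break
            l_last = l
        return l, x
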
Paul
    What do you mean they are equivalent? In what sense? – mathreadler Jul 14 '18 at 21:08
  • All norms are equivalent on finite-dimensional vector spaces; that's a theorem I can remember – Paul Jul 14 '18 at 21:14
  • 2
    How will it help you if you don't know in what sense this equivalence holds? Here is what Wikipedia says: "However, all these norms are equivalent in the sense that they all define the same topology." So unless we are interested in the topology of the space, maybe we should not care about that equivalence. – mathreadler Jul 14 '18 at 21:17
  • 1
    That the 1- and 2-norms are not equivalent in other senses has been very important for lots of the work on various types of regularized optimization over the last 10-15 years. "Total variation" (TV), for example, was extremely popular in all kinds of applications 5-10 years ago because new efficient numerical methods to solve such problems were found. – mathreadler Jul 14 '18 at 21:25
  • 1
    The 2-norm is not always better, but it does have certain nice properties. For example, $\| x \|_2$ can be interpreted geometrically as the length of the vector $x$. Also, $\|x\|_2^2 = \langle x, x \rangle$, which is a simple formula. – littleO Jul 14 '18 at 21:27
  • 1
    https://math.stackexchange.com/questions/63238/why-do-we-use-a-least-squares-fit this may be useful in terms of optimization – qwr Jul 14 '18 at 22:30
  • In New York City, the 1-norm is definitely better than the 2-norm. – Eric Duminil Jul 15 '18 at 08:58
  • 1
    @ericduminil In Manhattan, sure, but the whole city? – mathreadler Jul 17 '18 at 17:05
  • @mathreadler: You're right, I was thinking about Manhattan. – Eric Duminil Jul 18 '18 at 18:01

3 Answers

9

In your use case, the actual norm of the vector does not matter since you are only concerned with the dominant eigenvalue. The only reason to normalize during the iteration is to keep the numbers from growing exponentially. You can scale the vector however you want to prevent numeric overflow.

A key concept about eigenvectors and eigenvalues is that the set of vectors corresponding to an eigenvalue forms a linear subspace. This is a consequence of multiplication by a matrix being a linear map. In particular, any scalar multiple of an eigenvector is also an eigenvector for the same eigenvalue.
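
In symbols: if $\, Av = \lambda v \,$ and $\,c\,$ is any nonzero scalar, then $\, A(cv) = c\,Av = c\,\lambda v = \lambda (cv), \,$ so $\,cv\,$ is an eigenvector for the same eigenvalue.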

The Wikipedia article Power method mentions the use of the Rayleigh quotient to compute an approximation to the dominant eigenvalue. For real vectors and matrices it is given by the value $\, (v\cdot Av)/(v \cdot v). \,$ There are probably good reasons for the use of this formula. Of course, if $\,v\,$ is normalized so that $\, v \cdot v = 1, \,$ then you can simplify that to $\, v\cdot Av. \,$
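
A small sketch of that point (assuming NumPy; the function name is made up): scale with whichever norm is convenient and read the eigenvalue off the Rayleigh quotient rather than off the scaling factor.

    import numpy as np

    def power_iteration_any_norm(A, tol=1e-10, max_iter=1000):
        """Power iteration scaled with the 1-norm; eigenvalue from the Rayleigh quotient."""
        x = np.random.rand(A.shape[0])
        x = x / np.linalg.norm(x, 1)        # any norm is fine for the scaling
        l_last = np.inf
        l = 0.0
        for _ in range(max_iter):
            Ax = A @ x
            l = (x @ Ax) / (x @ x)          # Rayleigh quotient (v.Av)/(v.v)
            x = Ax / np.linalg.norm(Ax, 1)  # rescale only to prevent overflow/underflow
            if abs(l - l_last) < tol:
                break
            l_last = l
        return l, x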

Somos
  • So this is not true in general, I guess. But I think if you can use the 1-norm, then it is better, since it is cheaper to compute – Paul Jul 14 '18 at 21:27
  • Depends on what you mean by "in general". – Somos Jul 14 '18 at 21:29
  • I mean: does there exist an algorithm that uses vector normalization for which the 2-norm produces different results than the 1-norm? – Paul Jul 14 '18 at 21:41
  • For this particular problem, the dominant eigenvalue is the same no matter how you iterate. Try variations and find out for yourself. – Somos Jul 14 '18 at 22:33
  • I tried using the 1-norm and I get different results; the power method no longer converges to the dominant eigenvalue – Paul Jul 15 '18 at 11:59
  • You have to decide what works for you. Not going to argue here. – Somos Jul 15 '18 at 13:18
6

Tell me which of these looks more like a ball, and I'll tell you which one is the better norm:

[image: the unit balls of the 1-norm and the 2-norm]

More seriously, the 2-norm is given by an inner product, and this has far-reaching consequences: orthogonality, projections, complemented subspaces, orthonormal bases, etc., etc., etc., features that we see in Hilbert spaces.
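
For instance, the inner product behind the 2-norm gives an explicit formula for the orthogonal projection of $x$ onto a vector $y$, something that has no $\ell_1$ counterpart: $$\operatorname{proj}_y(x) = \frac{\langle x, y\rangle}{\langle y, y\rangle}\, y, \qquad \big\langle x - \operatorname{proj}_y(x),\, y\big\rangle = 0.$$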

Martin Argerami
5

In the eigenvalue algorithms, you're attempting to generate the eigenvalues and the orthogonal eigenvectors. That step right there is the normalization part. If you look at this:

[image: the normalization step of the algorithm]

the next step doesn't work with the $\ell_{1}$ norm; or rather, its actual purpose isn't served. The entire point of that step is to produce a normalized vector. If you look at how these algorithms work, eigenvalues that are close together already make them take much longer to converge, so numerical error will destroy the algorithm.
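
A quick numeric illustration (assuming the normalization is meant to produce a unit vector in the inner-product sense, as orthogonalization steps require):

    import numpy as np

    x = np.array([3.0, 4.0])
    x1 = x / np.linalg.norm(x, 1)   # 1-norm scaling: absolute entries sum to 1
    x2 = x / np.linalg.norm(x, 2)   # 2-norm scaling: unit length
    print(x1 @ x1)                  # 0.5102... -- not 1, so x1 is not a unit vector
    print(x2 @ x2)                  # 1.0       -- what an orthonormal basis needs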