I'm trying to grasp the difference between the affine and projective transformations.

I got the point of the "line at infinity", but their matrix representation is not yet clear enough.

Here's the affine transformation $A$:

$$ A = \begin{bmatrix} a_1 & a_2 & t_x \\ a_3 & a_4 & t_y \\ 0 & 0 & 1 \end{bmatrix} $$

The matrix $A$ has 2 rotational, 2 scale and 2 translational parameters.

For the projective one (homography) $H$:

$$ H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} $$

We still have the same parameters as the affine, but we added the last row, which is the projection vector.

Why do we need to add this vector to the matrix? What does it mean? What is the impact of that vector on the projection itself?

Maystro

3 Answers

First, I think it may be a mistake to think about "translation", "rotation" and "scale", which is one particular decomposition of the affine group -- perhaps it's better to think about what transformations can be effected by affine maps.

For affine maps: We can move any collection of three noncollinear points to any other collection of three points (which must be noncollinear if we want the map to be invertible).
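As a minimal sketch of the three-point claim, the snippet below solves for the six affine entries from three point correspondences using exact fractions. The helper names (`solve`, `affine_from_points`, `apply`) and the sample points are illustrative assumptions, not part of the original answer.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick a row with a nonzero pivot (raises StopIteration if singular,
        # which happens exactly when the source points are collinear).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def affine_from_points(src, dst):
    """3x3 affine matrix sending three src points to three dst points."""
    A = [[x, y, 1] for x, y in src]
    row1 = solve(A, [u for u, _ in dst])  # a1, a2, t_x
    row2 = solve(A, [v for _, v in dst])  # a3, a4, t_y
    return [row1, row2, [Fraction(0), Fraction(0), Fraction(1)]]

def apply(M, p):
    """Apply a 3x3 matrix to the point p = (x, y) and rehomogenize."""
    x, y = p
    X, Y, W = (r[0] * x + r[1] * y + r[2] for r in M)
    return (X / W, Y / W)

# Hypothetical sample data: three noncollinear points and their targets.
src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3), (1, 5)]
A = affine_from_points(src, dst)
mapped = [apply(A, p) for p in src]
```

Three correspondences give six linear equations, which matches the six unknown entries of the affine matrix.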

For projective maps: we can move any collection of four points (no three collinear) to any collection of four points. (Although to make complete sense of this, points and lines at infinity must be included.)
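The four-point claim can be sketched the same way. The snippet below assumes the lower-right entry of the matrix can be normalized to 1, which leaves eight unknowns for the eight equations that four correspondences provide. The function names and the sample quadrilateral are illustrative assumptions.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick a row with a nonzero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def homography_from_points(src, dst):
    """3x3 homography (lower-right entry fixed to 1) sending src[i] to dst[i]."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), cleared of the denominator.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h4 x + h5 y + h6) / (h7 x + h8 y + 1), cleared of the denominator.
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve(rows, rhs)
    return [h[0:3], h[3:6], h[6:8] + [Fraction(1)]]

def apply(H, p):
    """Apply a 3x3 matrix to the point p = (x, y) and rehomogenize."""
    x, y = p
    X, Y, W = (r[0] * x + r[1] * y + r[2] for r in H)
    return (X / W, Y / W)

# Hypothetical sample data: the unit square sent to a non-parallelogram,
# which no affine map could achieve.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (3, 3), (0, 1)]
H = homography_from_points(src, dst)
mapped = [apply(H, p) for p in src]
```

Because the target quadrilateral is not a parallelogram, the solved bottom row is necessarily different from $(0, 0, 1)$.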

Similar characterizations for smaller groups:

Translation: we can move any point to any other.

Rotation: we can move any line through the origin to any other line through the origin.

T + R: we can move any point-line pair to any other point-line pair, where a "point-line pair" means a line L and a point P that lies on L.

I'll let you work out descriptions of the transformative power of things like "all scales and rotations", etc.

ADDITIONAL REMARKS

  1. Although a homography has 9 entries, there are really only 8 free parameters, in the sense that two matrices that differ by a nonzero multiplicative constant represent the same homography. So we might as well simplify a bit by dividing through by $h_9$ to get a matrix whose lower-right entry is 1. (That misses the matrices whose lower-right entry is 0, but this is a small set, and once you understand the others, this last set won't give you any problems.)

  2. Such a matrix can now be factored into \begin{align} \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \end{bmatrix} & = \begin{bmatrix} h_1- h_3 h_7& h_2 - h_3 h_8& h_3 \\ h_4 - h_6 h_7 & h_5 - h_6 h_8 & h_6 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ h_7 & h_8 & 1 \end{bmatrix} \end{align} i.e., your transformation becomes a combination of an affine transform (on the left), albeit one slightly different from the one you "see" in the top 6 matrix entries of your original matrix, and a transform whose only interesting entries are in the bottom row (on the right, so it is applied first). Since you understand affine transformations already, let's look at the rightmost matrix, which I'll rewrite $$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ u & v & 1 \end{bmatrix} $$ to avoid having to type subscripts. Note that if $(u, v) = (0, 0)$, then this is the identity, which is affine and which you already know about, so from here on, we'll assume that $u$ and $v$ are not both zero.
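The factorization is easy to check numerically. This is a small sketch with an arbitrary, hypothetical matrix whose lower-right entry is already 1; `matmul` and `factor` are illustrative helper names.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def factor(H):
    """Split H (with H[2][2] == 1) into (affine, pure projective) factors."""
    (h1, h2, h3), (h4, h5, h6), (h7, h8, h9) = H
    if h9 != 1:
        raise ValueError("normalize H so that its lower-right entry is 1")
    affine = [[h1 - h3 * h7, h2 - h3 * h8, h3],
              [h4 - h6 * h7, h5 - h6 * h8, h6],
              [0, 0, 1]]
    projective = [[1, 0, 0],
                  [0, 1, 0],
                  [h7, h8, 1]]
    return affine, projective

# Arbitrary example values, chosen only to exercise the formula.
H = [[2, 1, 3],
     [0, 4, 5],
     [6, 7, 1]]
A, P = factor(H)
product = matmul(A, P)  # should reproduce H entry by entry
```

Note that the upper-left block of the affine factor differs from the upper-left block of `H`, as the text says.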

What does this do to a point $(x, y)$ of the plane? Well, we write $(x,y)$ as a column vector by appending a "1", so we get $$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ u & v & 1 \end{bmatrix} \begin{bmatrix} x\\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x\\ y \\ ux + vy + 1 \end{bmatrix} $$ which, when "rehomogenized" (i.e., when divided by its last coordinate to make the last coordinate be "1"), becomes $$ \begin{bmatrix} x/ (ux + vy + 1)\\ y/ (ux + vy + 1) \\ 1 \end{bmatrix}. $$ In short, we get the transformation $$ (x,y) \mapsto \left(\frac{x}{ux + vy + 1}, \frac{y}{ux + vy + 1}\right). $$ What does that "look like"? Well, it sends the line where $ux + vy = -1$ to infinity. It takes the line where $ux + vy + 1 = 1$, i.e., $ux + vy = 0$, to itself (in fact, it fixes every point on that line). But as for the details...let's simplify a little.
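A quick way to see both claims is to apply the map to a few points by hand. The sketch below uses hypothetical values $u = 2$, $v = 1$ and returns `None` for points whose last coordinate becomes 0, i.e., points sent to infinity.

```python
from fractions import Fraction

def pure_projective(u, v, p):
    """Apply [[1,0,0],[0,1,0],[u,v,1]] to p = (x, y), then rehomogenize.

    Returns None when the point is sent to infinity (last coordinate 0).
    """
    x, y = p
    w = u * x + v * y + 1
    if w == 0:
        return None
    return (Fraction(x) / w, Fraction(y) / w)

u, v = 2, 1  # hypothetical values for illustration

on_fixed_line = pure_projective(u, v, (1, -2))   # 2*1 + 1*(-2) = 0
on_vanishing_line = pure_projective(u, v, (0, -1))  # 2*0 + 1*(-1) = -1
generic = pure_projective(u, v, (1, 1))          # divisor is 2 + 1 + 1 = 4
```

Here the point on $ux + vy = 0$ comes back unchanged, the point on $ux + vy = -1$ has no finite image, and the generic point is scaled by $1/4$.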

  3. By rotating the coordinate system, we can assume that the point $(u, v)$ lies on the positive $y$-axis; by uniformly scaling the coordinate system, we can make $(u, v)$ be $(0, 1)$. So now all we have to understand is the transformation defined by the matrix $$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x\\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x\\ y \\ y + 1 \end{bmatrix} $$ i.e., $$ (x, y) \mapsto \left(\frac{x}{y+1}, \frac{y}{y+1}\right). $$ This transformation fixes the origin, and sends the line $y = -1$ to infinity. It holds the line $y = 0$ fixed, pointwise. And it takes the point $(0, -1, 1)$ [now I'm including the 3rd homogeneous coordinate] to $(0, -1, 0)$, the point at infinity representing all lines parallel to the $y$-axis.

To be more explicit: you can think of this as transforming the plane by fixing the $x$-axis, and transforming each line through $(0, -1)$ into a vertical line. If the line $L$ passes through $(0, -1)$ and $(a, 0)$, then the transformed line will pass through $(a, 0)$ and be vertical. People in computer graphics sometimes call this the "unhinging" transformation, thinking of two diagonal lines through $(0, -1)$ as forming a "hinge", while after transformation, they become parallel vertical lines.
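The "unhinging" behaviour can be checked directly: every sampled point on a line through $(0, -1)$ and $(a, 0)$ should land on the vertical line $x = a$. This is a sketch with a hypothetical choice $a = 3$ and hand-picked sample parameters.

```python
from fractions import Fraction

def unhinge(p):
    """The map (x, y) -> (x / (y + 1), y / (y + 1)), undefined on y = -1."""
    x, y = p
    return (Fraction(x) / (y + 1), Fraction(y) / (y + 1))

def points_on_hinge_line(a, ts):
    """Points (0, -1) + t * (a, 1) on the line through (0, -1) and (a, 0)."""
    return [(a * t, t - 1) for t in ts]

a = 3  # hypothetical x-intercept of the line
ts = [Fraction(1, 2), 1, 2, 3, -1]  # t = 0 is excluded: that is the hinge itself

images = [unhinge(p) for p in points_on_hinge_line(a, ts)]
image_xs = {x for x, _ in images}  # collapses to a single value if vertical

x_axis_point = unhinge((5, 0))  # points on the x-axis should not move
```

The sample with $t = -1$ lies on the far side of the line $y = -1$ and still lands on $x = a$, so the whole line (minus the hinge point) maps into the same vertical line.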

John Hughes
  • Thanks, but I already know most of this. What I need is to understand the difference in their matrix representations: what is the purpose of the last row in the projective transformation matrix, and what data does it represent? – Maystro Jun 10 '15 at 09:54
  • And I'm suggesting that asking "what does this number represent" is a little like asking "What does the fourth bit of this hash-value mean?" It may have an answer, but probably not one that leads to insights. But see my additional remarks above. – John Hughes Jun 11 '15 at 10:29

The whole point of the representation you're using for affine transformations is that you're viewing the affine plane as a subset of the projective plane. A line has been chosen as the line at infinity, and the affine transformations are those projective transformations that map this line to itself (as a set, not necessarily point by point).

Therefore, abstractly, the use of the extra parameters is to describe where the line at infinity moves during the projective transformation.
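To make this concrete, here is a small sketch that computes where the line at infinity lands. It relies on the fact that the points at infinity $(1,0,0)$ and $(0,1,0)$ map to the first two columns of the matrix, so the image line is their cross product. Lines are written as coefficient triples $(a, b, c)$ meaning $ax + by + c = 0$; the function names and sample matrices are illustrative assumptions.

```python
def cross(a, b):
    """Cross product of two 3-vectors; gives the line through two points."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def image_of_line_at_infinity(H):
    """Line (a, b, c) onto which H maps the line at infinity."""
    c1 = [row[0] for row in H]  # image of the point at infinity (1, 0, 0)
    c2 = [row[1] for row in H]  # image of the point at infinity (0, 1, 0)
    return cross(c1, c2)

def line_sent_to_infinity(H):
    """Line (a, b, c) whose points get last coordinate 0: the bottom row."""
    return list(H[2])

# An affine example and the pure projective example with (u, v) = (0, 1).
A = [[2, -1, 2], [0, 2, 3], [0, 0, 1]]
P = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]

affine_image = image_of_line_at_infinity(A)      # multiple of (0, 0, 1)
projective_image = image_of_line_at_infinity(P)  # the finite line y = 1
projective_vanishing = line_sent_to_infinity(P)  # the line y = -1
```

For the affine matrix the result is a multiple of $(0, 0, 1)$, i.e., the line at infinity stays put; for the projective one it becomes the ordinary line $y = 1$.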

rschwieb
  • You can puzzle out what the extra entries do explicitly, but I'm not sure it is worth the effort. It depends on your needs if the effort will gain you anything. – rschwieb Jun 10 '15 at 10:16
  • Thanks. Actually, I have to dig into the details because I'm currently working on object detection application where I think the homography is the best choice. One last question, not sure if you have an idea, we're talking here about transformation between 2D planes so if I have a non planar object, knife for example, that I need to project into another image (2D plane), it won't work properly since the knife is not a plane, is it right? – Maystro Jun 10 '15 at 12:01
  • @Maystro This description you just gave isn't making a lot of sense to me. If you carefully explain the problem again, maybe I will see what you mean. – rschwieb Jun 10 '15 at 17:26

I made a few animations that illustrate what the transforms in John Hughes's examples "look like" for the "pure projective" case

$$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ u & v & 1 \end{bmatrix} $$

In each animation, the values of $u$ and $v$ are displayed on the left with the original image; the warped image is on the right. Keep in mind that this is an image coordinate system, so the top left is $(0, 0)$ and the positive $y$ direction is downwards.

The first animation shows $v$ ranging from 0 to 1 and clearly illustrates the "hinge" behavior mentioned in previous posts.

[Animation: warp as $v$ ranges from 0 to 1]

It looks a bit different if I range $v$ from 0 to -1, but it's the same idea.

[Animation: warp as $v$ ranges from 0 to -1]

Then I do the same for $u$, varying it between 0 and 1.

[Animation: warp as $u$ ranges from 0 to 1]

And here are $u$ and $v$ varying between 0 and 1 together.

[Animation: warp as $u$ and $v$ range from 0 to 1 together]

Lastly, I wanted to get a sense of what was happening in the other quadrants not shown in the image, so I augmented the transform to include a translation before the projective matrix and then its inverse after, i.e.,

$$ \begin{bmatrix} 1 & 0 & 128 \\ 0 & 1 & 128 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ u & v & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -128 \\ 0 & 1 & -128 \\ 0 & 0 & 1 \end{bmatrix} $$

This produced an effect where the "hinge" now appears around the line $y=128$.

[Animation: warp with the projective transform centered at $(128, 128)$]

I'm not sure whether the image "re-entering" the canvas from the bottom represents something happening mathematically or is an artifact of the OpenCV implementation I used for the projective transform. If someone could shed some light on that, I'd be interested and I'll update this post accordingly.

The intuition I gained from doing this is that the projective component needs to be small in magnitude in order not to completely distort the image; otherwise points on the image itself start getting sent to infinity, and that doesn't make for an appealing or useful result.

EDIT: It does appear that the wrap-around effect is a real part of the transform and not an artifact. You can verify this by manually transforming a point and looking at where it appears. I chose a few of these "control points", drew them on the image and drew lines between their original positions and where they warped to:
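The manual check can be sketched as follows. It applies the centered transform by hand and also reports the last homogeneous coordinate $w$ before division; points on opposite sides of the line $w = 0$ get opposite signs of $w$, which is what flips them to the other side after rehomogenizing. The values $u = 0$, $v = -1/50$ and the sample points are hypothetical, not the ones used in the animations.

```python
from fractions import Fraction

def centered_warp(u, v, p, c=128):
    """Translate by (-c, -c), apply [[1,0,0],[0,1,0],[u,v,1]], translate back.

    Returns (image_point, w), where w is the last homogeneous coordinate
    before division; image_point is None when w == 0 (sent to infinity).
    """
    x, y = p[0] - c, p[1] - c
    w = u * x + v * y + 1
    if w == 0:
        return None, w
    return (Fraction(x) / w + c, Fraction(y) / w + c), w

u, v = 0, Fraction(-1, 50)  # hypothetical values; w vanishes on y = 178

near_side = centered_warp(u, v, (128, 168))     # w > 0: pushed far down
on_vanishing = centered_warp(u, v, (128, 178))  # w = 0: sent to infinity
far_side = centered_warp(u, v, (128, 228))      # w < 0: reappears above
```

With these values, a point just before the vanishing line is thrown far past the bottom of a 256-pixel canvas, while a point beyond it has negative $w$ and comes back in on the opposite side.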

[Image: control points with lines drawn from their original to their warped positions]

Erotemic