A matrix represents a linear transformation that rotates, scales and shears whatever you put into it; so feeding in the coordinates of a square could give you back, say, a parallelogram. An important fact is that there is a one-to-one correspondence between real matrices and linear transformations (once you fix a basis): if you can think of a linear transformation, then there is a way to write it as a matrix.
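As a minimal sketch with NumPy (the shear matrix here is a made-up example, not anything from your question), applying a matrix to the corners of the unit square does exactly this:

```python
import numpy as np

# Corners of the unit square, one (x, y) point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

# A shear matrix, chosen purely for illustration.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Applying the matrix to every corner maps the square to a parallelogram.
parallelogram = A @ square
print(parallelogram)
# [[0. 1. 2. 1.]
#  [0. 0. 1. 1.]]
```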
SVD is based on a theorem that says any matrix $\mathbf A$ can be written in the form $\mathbf{U\Sigma V}^T$, where $\mathbf U$ and $\mathbf V$ are orthogonal (rotations, possibly combined with reflections) and $\mathbf \Sigma$ is a diagonal matrix with non-negative entries that only scales. So any linear transformation can be broken down into three steps: rotate first (by $\mathbf V^T$), then stretch/scale (not necessarily by the same amount in all directions; you could stretch the x-axis twice as much as the y-axis), then rotate again (by $\mathbf U$).
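You can check this numerically; here is a minimal sketch, reusing the same illustrative shear matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # the same shear as above

# numpy returns U, the singular values, and V^T directly.
U, s, Vt = np.linalg.svd(A)

# U and Vt are orthogonal (rotations, possibly with a reflection),
# and Sigma is diagonal with non-negative entries.
Sigma = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T
```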
For instance, to transform a square into a parallelogram, you could rotate by some angle, scale the axes by different factors, then rotate by another angle (in general the two angles are different; their exact values are not too important to the intuition, and they depend on which parallelogram you are aiming for).
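A sketch of reading those two angles off the SVD of the shear example (assuming NumPy, and assuming each orthogonal factor comes out as a pure rotation, i.e. with determinant $+1$; signs can vary between libraries):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # a shear: unit square -> parallelogram
U, s, Vt = np.linalg.svd(A)

# Read the angle off each orthogonal factor (valid when det = +1,
# i.e. when the factor is a pure rotation rather than a reflection).
angle_U = np.degrees(np.arctan2(U[1, 0], U[0, 0]))
angle_V = np.degrees(np.arctan2(Vt[0, 1], Vt[0, 0]))
print(angle_U, angle_V)  # about 31.7 and 58.3 degrees, up to sign conventions
print(s)                 # scale factors, about 1.618 and 0.618
```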
Points 1 and 2 are related in the following way: a projection (point 2) is a 'simplified' transformation. Suppose you had a transformation that changes a $1 \times 1$ square into a $10 \times 0.1$ rectangle. A projection would be to simply say that this transformation changes the square into a $10 \times 0$ 'rectangle' (which is a line). This is dimensionality reduction: your 2-dimensional square is projected onto a 1-dimensional line. If you did an SVD with this, $\mathbf U$ and $\mathbf V$ would be the identity matrices, and $\mathbf \Sigma$ would be a diagonal matrix (as it always is) with entries 10 and 0.1.
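In code, that example and its projected version look like this (a minimal sketch of the numbers above):

```python
import numpy as np

# The scaling part of the 10 x 0.1 example: U = V = I, so A = Sigma.
Sigma = np.diag([10.0, 0.1])

# Replacing the small singular value with zero turns the scaling
# into a projection onto the x-axis.
Sigma_proj = np.diag([10.0, 0.0])

corner = np.array([1.0, 1.0])   # top-right corner of the unit square
print(Sigma @ corner)           # [10.   0.1]
print(Sigma_proj @ corner)      # [10.   0. ]  -- squashed onto a line
```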
The key to understanding the dimensionality reduction part is to forget about rotations for a moment: by the SVD theorem they can be factored out and 'added back in' before or after, so all you really need to know is how things scale along the different axes. The SVD strips away the rotation: a matrix that turns a square into a parallelogram can be seen as something that scales a square into a rectangle, sandwiched between two rotations. If one direction scales to a small value (relative to everything else), you can pretend it scales to zero, which, in the context of transformations, is a projection that approximates the original transformation.
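Here is a sketch of that idea on a matrix whose rotations are not trivial (the matrix is made up for illustration): zero out the small singular value and the reconstructed matrix barely changes.

```python
import numpy as np

# A matrix with one dominant direction of scaling, chosen for illustration.
A = np.array([[3.0, 2.9],
              [2.9, 3.0]])

U, s, Vt = np.linalg.svd(A)
print(s)   # one large and one small singular value: about 5.9 and 0.1

# Zero out the small singular value: a rank-1 approximation of A.
s_trunc = s.copy()
s_trunc[1] = 0.0
A_approx = U @ np.diag(s_trunc) @ Vt

print(np.abs(A - A_approx).max())  # small: the projection barely changes A
```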
To summarise the answer to your question: when your transformation is just a scaling and one of the scale factors is relatively small, you can replace that smallest scale factor with zero, and this gives you a projection. SVD tells you that every transformation can be expressed as a scaling between two rotations, and the idea of dimensionality reduction is to replace that scaling with a projection.
'Selecting the right axes' refers to the rotation: you want to rotate your shape (or data) first, so that when you do 'project away' a direction, you are sure you lose as little as possible.
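A minimal sketch of this on data rather than a shape (the synthetic points and the random seed are my own illustration): the rows of $\mathbf V^T$ are the 'right axes', and projecting onto the first one keeps almost all of the variation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D points stretched along the 45-degree diagonal.
t = rng.normal(size=500)
data = np.column_stack([t + 0.05 * rng.normal(size=500),
                        t + 0.05 * rng.normal(size=500)])
data -= data.mean(axis=0)            # center before the SVD

# Vt's rows are the 'right axes'; the first points along the diagonal.
U, s, Vt = np.linalg.svd(data, full_matrices=False)
print(Vt[0])                         # roughly [0.707, 0.707], up to sign

# Projecting onto the first axis keeps almost all the variation.
coords_1d = data @ Vt[0]             # 1-D coordinates along the best axis
print(s)                             # first singular value dwarfs the second
```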