
Suppose that we have an $m\times n$ matrix $A$ of rank $n$, whose entries are 8-bit unsigned integers obtained from a grayscale image. Now we want to apply SVD to $A$ and use the first $k$ singular values to construct the best rank-$k$ approximation of $A$, compressing the image for storage.

I understand that for a floating-point image, the compression rate for SVD is

$$\frac{mn}{k(m+n+1)},\tag{1}$$

as mentioned in a lot of places including some textbooks. For example, a quick internet search gives us this article and this convenient demo link.
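To make the counting concrete, here is a minimal NumPy sketch of the rank-$k$ approximation and of Formula $(1)$; the $512\times 512$ dimensions in the example are my own assumption, not taken from any particular image:

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def compression_ratio(m, n, k):
    """Formula (1): mn original values over the k(m+n+1) stored values."""
    return (m * n) / (k * (m + n + 1))

# Hypothetical example: a 512x512 image kept to rank k = 100.
print(round(compression_ratio(512, 512, 100), 2))  # 2.56
```

The $k(m+n+1)$ count is simply $k$ columns of $U$ (length $m$), $k$ rows of $V^T$ (length $n$), and $k$ singular values.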

Coming back to our case, the entries of $A$ are 8-bit unsigned integers, while we still need floating-point numbers (32 or 64 bits) to store the singular values and vectors (8 bits do not seem to provide enough precision). It would then seem that the compression rate for this case becomes

$$\frac{mn}{4k(m+n+1)}, \quad\text {or} \quad \frac{mn}{8k(m+n+1)}.\tag{2}$$

This would render the compression scheme impractical in most cases. For example, using the demo link above, the default image there needs at least the first 100 singular values even for a less-than-satisfactory compressed result. By Formula $(1)$, that is, assuming the original image uses a floating-point data type, the compression rate is $2.40$; but if the original image uses the 8-bit unsigned integer data type, then by Formula $(2)$ the compressed image becomes at least $0.6$ times bigger than the original one.
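To see where a sub-unity ratio like that comes from, here is a small byte-counting sketch; the $512\times 512$ dimensions are my own assumption, not the demo image's actual size:

```python
def uint8_bytes(m, n):
    # Original image: one byte per pixel.
    return m * n

def svd_bytes(m, n, k, bytes_per_value=4):
    # k singular triplets: k columns of U (length m), k rows of V^T
    # (length n), and k singular values, at the given precision.
    return bytes_per_value * k * (m + n + 1)

m, n, k = 512, 512, 100
ratio = uint8_bytes(m, n) / svd_bytes(m, n, k)  # Formula (2), 32-bit floats
print(round(ratio, 2))  # 0.64: the "compressed" data exceeds the original
```

With 64-bit floats (`bytes_per_value=8`) the ratio halves again, which is the second term in Formula $(2)$.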

Does SVD compression only work for floating point images? What am I missing here?

Peradventure

1 Answer


Does SVD compression only work for floating point images?

It seems quite obvious that it works for arbitrary images. Whether you are satisfied with the amount of compression or not doesn't really change the fact that it works.

I think looking at the compression ratio is going to lead to misleading results. Suppose you take two images: $I_1$ uses 8-bit RGB, and $I_2$ is the same image but captured using 24-bit RGB. Note that it is often fairly hard to distinguish two such images; the differences are usually perceptually minor. Now suppose you compress both with lossy compression, and suppose that both compressed images happen to be the same size. Then the compression ratio of $I_1$ is 3x worse than that of $I_2$. Does that mean that compression doesn't work for 8-bit RGB images? No; it means that $I_2$ carries a bunch of extra information that isn't semantically relevant, so a compression algorithm that throws away this irrelevant information achieves a higher compression ratio. That's not because the compression algorithm is better; it's just a reflection of the fact that $I_2$ has more irrelevant information (or information with low relevance).

And this is exactly the situation you're in. There is no way that the low-order 8 bits of those 32-bit floating point numbers are perceptually meaningful; no human is going to be able to perceive them. So, if you're capturing an image with 32-bit floats, some of the bits of those floats are pure noise that is perceptually irrelevant, and of course a compression algorithm that throws away that irrelevant information is going to achieve a higher compression ratio. Does that mean that compression on 8-bit images doesn't work? No, it just means there was less irrelevant information to throw away. This phenomenon is not specific to SVD-based compression; it will likely hold for any good form of lossy compression.

I can always make the compression ratio arbitrarily good by appending a bunch of irrelevant useless information to the input, and then modifying my compression algorithm to throw away the irrelevant information. That doesn't mean I have a better compression algorithm, or that the compression algorithm doesn't work or is unacceptable on the original images.

For all of these reasons, I think a better way to compare the two schemes is by the size of the compressed image, not by the compression ratio. And both variants you describe produce compressed images of the same size, so the apparent difference evaporates.

Second, when you do the comparison, there is no reason to hold $k$ fixed. Usually we think of lossy compression as a tradeoff between the size of the compressed image and its perceptual quality. You have two tunable parameters: $k$, and the number of bits of precision you use for the entries of the compressed output. You are free to choose them however you like to maximize the perceptual quality for a given image. For instance, instead of using one value of $k$ with 32-bit floats, you might do better to double $k$ and use 16-bit floats. Probably only empirical experiments can set those parameters optimally... but you shouldn't assume in advance what the optimal setting will be, or that it will necessarily be the same for both settings (8-bit inputs vs 32-bit inputs).
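A quick experiment along those lines might look like the following sketch. The dimensions and the `reconstruction_error` function are my own assumptions: it uses Frobenius error as a crude stand-in for a true perceptual metric, and compares $k$ triplets in float32 against $2k$ triplets in float16 at the same byte budget:

```python
import numpy as np

def compressed_bytes(m, n, k, bytes_per_value):
    # Total storage for k singular triplets at the given precision.
    return bytes_per_value * k * (m + n + 1)

def reconstruction_error(A, k, dtype):
    # Crude quality proxy: Frobenius error of the rank-k reconstruction
    # after rounding the stored factors to the given float precision.
    U, s, Vt = np.linalg.svd(A.astype(np.float64), full_matrices=False)
    Uk = U[:, :k].astype(dtype)
    sk = s[:k].astype(dtype)
    Vk = Vt[:k, :].astype(dtype)
    Ak = (Uk * sk) @ Vk                     # scale columns of U by s
    return np.linalg.norm(A - Ak.astype(np.float64))

rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# Same byte budget: k triplets in float32 vs 2k triplets in float16.
m, n, k = 64, 64, 8
assert compressed_bytes(m, n, k, 4) == compressed_bytes(m, n, 2 * k, 2)
print(reconstruction_error(A, k, np.float32),
      reconstruction_error(A, 2 * k, np.float16))
```

Which setting wins will depend on the image; on a random matrix like this one there is no low-rank structure to exploit, so treat it only as a template for the experiment.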

D.W.