What's the difference/connection between PCA and inverse Fourier transform?

Question

Principal Component Analysis (PCA) finds the components with the highest contribution, which seems very similar to the idea of the inverse Fourier transform, which finds the frequencies with the highest weight. Could someone help clarify their difference/connection? It seems that they are connected in some mathematical form.

The discrete Fourier transform is $x \mapsto Wx$ where $x$ is your data vector and $W$ is the DFT matrix, which is unitary, i.e. $W^* W x = x$. PCA is based on the SVD $M = U S V^*$ where $M$ is your data matrix, $S$ is a diagonal matrix and $U, V$ are unitary. PCA is the idea of approximating $M$ by $U \tilde{S} V^*$ where in $\tilde{S}$ we kept only the few largest (singular) values in $S$. — reuns, Aug 05 '17 at 22:44
@reuns Isn't there a connection here, since $W$, $U$ and $V$ are all unitary? Both methods represent the signal in a space spanned by orthogonal basis vectors. The difference is that they choose different orthogonal vectors. For example, Fourier chooses sin/cos functions only. — uPhone, Aug 06 '17 at 01:36
No. The difference is that $x$ is a vector, $M$ is a matrix, and $U,V$ depend on $M$. In the case $M_{ij} = x_{i-j}$ then $M$ acts on vectors as a convolution and $M = W^* S W$ where (the diagonal of) $S$ is the DFT of $x$. https://en.wikipedia.org/wiki/Toeplitz_matrix — reuns, Aug 06 '17 at 01:46
@reuns Okay. I think I start to get the point. But what if x is a matrix? Say, 2-D DFT? — uPhone, Aug 06 '17 at 02:37
What if $x$ is a vector of length $N$ ? What is its DFT ? How do you show it is $x$ multiplied by some unitary matrix ? — reuns, Aug 06 '17 at 03:01
Both of them involve the "decomposition of data". Principal Component Analysis extracts the principal components which represent directions in the data that capture the most variance. Fourier Transform extracts the frequencies from the signal. In the former case, the principal components are eigenvectors, and in the latter case, the frequencies are sinusoidal functions. Both, however, represent the salient features of an aggregate (sometimes noisy) whole. — Neel Sandell, Aug 02 '23 at 14:40
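
The two matrix facts from the comments above can be checked numerically. The following is a minimal NumPy sketch, not part of the original thread; the test vector `x`, the size `N`, and the variable names are illustrative assumptions. It builds the unitary DFT matrix $W$, confirms $W^* W = I$, and confirms that the matrix $M_{ij} = x_{(i-j) \bmod N}$ satisfies $M = W^* S W$ with the DFT of $x$ on the diagonal of $S$.

```python
import numpy as np

# Illustrative test vector (an assumption, not from the thread)
N = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(N)

# Unitary DFT matrix: W[k, n] = exp(-2*pi*i*k*n/N) / sqrt(N)
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

# W is unitary, and W @ x equals NumPy's (unnormalized) FFT scaled by 1/sqrt(N)
is_unitary = np.allclose(W.conj().T @ W, np.eye(N))
matches_fft = np.allclose(W @ x, np.fft.fft(x) / np.sqrt(N))

# Circulant matrix M[i, j] = x[(i - j) mod N]; M @ v is the circular convolution of x and v
M = x[(k[:, None] - k[None, :]) % N]

# M is diagonalized by W, with the DFT of x on the diagonal
S = np.diag(np.fft.fft(x))
is_diagonalized = np.allclose(M, W.conj().T @ S @ W)

print(is_unitary, matches_fft, is_diagonalized)
```

Note the asymmetry this illustrates: $W$ is fixed in advance and depends only on $N$, whereas the $U, V$ of an SVD are computed from the data matrix itself.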

shaman · Answer 1 · 2020-09-10T21:10:41.333

There is indeed a connection, but with the Fourier transform, not its inverse.

PCA is also known as the Karhunen-Loève expansion. It finds the optimal basis for your data (optimal in the sense that it minimizes the Euclidean distance between the data points and their projections onto the new basis). If your data is translationally invariant (i.e. a circular shift of one data point generates another data point), then the optimal basis consists of Fourier vectors.

In other words, PCA on translationally invariant data will yield the DFT.

See https://en.wikipedia.org/wiki/Circulant_matrix but I will give a brief, handwaving explanation:

Suppose you have a translationally invariant data matrix $X$, whose rows are circular shifts of one another:

$X = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \\ \end{pmatrix} $

A circulant matrix $C_{ij} = c_{i-j}$ acting from the right will circularly permute the columns of $X$. Associated with $C$ is a permutation matrix $P$ which will rearrange the rows of $XC$ into their original positions:

$PXC = X$

We can get the right singular vectors from:

$X^T X = (PXC)^T (PXC) = C^T X^T P^T P X C = C^T X^T X C$

and

$CX^T X = X^T XC$

(here I am assuming $C$ is also a permutation matrix, so that $C C^T = I$ and the second identity follows by multiplying the first on the left by $C$; I am too lazy to prove the general case).

Since $C$ commutes with $X^T X$, they can be simultaneously diagonalized. The eigenvectors of a circulant matrix are the Fourier modes, and thus the optimal basis for translationally invariant data consists of Fourier vectors, with the singular values being the magnitudes of the discrete Fourier coefficients.
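
As a rough numerical check of this argument, here is a minimal NumPy sketch using the $3 \times 3$ matrix from above. The variable names are illustrative, and the data is deliberately left uncentered (an assumption; standard PCA would subtract the column means first, which here only removes the constant Fourier mode).

```python
import numpy as np

# Rows are all circular shifts of x, reproducing the example matrix X above
x = np.array([1.0, 2.0, 3.0])
N = len(x)
X = np.stack([np.roll(x, -s) for s in range(N)])

# Uncentered second-moment matrix X^T X (circulant for this kind of data)
G = X.T @ X

# Unitary DFT matrix; its rows are the Fourier vectors
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

# The Fourier vectors diagonalize X^T X: all off-diagonal entries vanish
D = W @ G @ W.conj().T
fourier_diagonalizes = np.allclose(D - np.diag(np.diag(D)), 0)

# The singular values of X equal the magnitudes of the DFT coefficients of x
singular_values = np.linalg.svd(X, compute_uv=False)
dft_magnitudes = np.sort(np.abs(np.fft.fft(x)))[::-1]
singular_values_match = np.allclose(singular_values, dft_magnitudes)

print(fourier_diagonalizes, singular_values_match)
```

For real data the DFT magnitudes come in equal pairs ($|X_k| = |X_{N-k}|$), so the corresponding eigenvalues are degenerate. A routine such as `np.linalg.eigh` may therefore return real cosine/sine combinations of the paired Fourier modes rather than the complex exponentials themselves; both span the same subspaces.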

Please also see https://en.wikipedia.org/wiki/Karhunen%E2%80%93Lo%C3%A8ve_theorem

This helps me a lot in understanding the problem! When performing PCA on non-translationally invariant data, is there any connection with the DFT? — uPhone, Sep 10 '20 at 21:21
Only in an abstract sense. The common theme is finding a basis for the object (vector or function). PCA finds an optimal basis in the statistical sense of maximizing variance along each basis direction. That optimal basis consists of the Fourier vectors in the case of translationally invariant data.
I.e. the DFT is a specific example of the more general KL transform. PCA usually refers to a more statistical, data-science-oriented application of this concept. — shaman, Sep 11 '20 at 21:28

score -1 · Answer 2 · answered Mar 13 '19 at 03:25

http://shawnleezx.github.io/blog/2015/06/05/connection-between-pca/

"We could unify PCA and Fourier Transform from filter bank or linear operator point of view. Actually those two point of views are just different way to say the same thing in signal processing and mathematics.

From filter bank point of view, the process of getting the PCA coefficients and Fourier coefficients could be regarded as passing the original signal to a Linear Time Invariant system. Each Fourier basis or eigenvector is a filter. All of them make up the filter bank."

The link is dead. This is a link to web archive: https://web.archive.org/web/20190916131212/http://shawnleezx.github.io/blog/2015/06/05/connection-between-pca/. — Royi, Jun 23 '24 at 18:18
