The Schönhage–Strassen multiplication algorithm works by turning multiplications of size $N$ into many multiplications of size $lg(N)$ with a number-theoretic transform, and recursing. At least I think that's what it does because there's some other cleverness and I really don't understand it well enough to summarize accurately. It finishes in $O(N \cdot lg(N) \cdot lg(lg(N)))$ time.
A number-theoretic transform is exactly like a Discrete Fourier Transform, except it's done in the finite field $F_{2^N+1}$ of integers modulo $2^N+1$. This makes the operations a lot cheaper, since the Fourier transform has a lot of multiplying by roots of unity, and $F_{2^N+1}$'s roots of unity are all powers of 2 so we can just shift! Also, integers are a lot easier to work with than floating point complex numbers.
Anyways, the thing that confuses me is that $F_{2^N+1}$ is very large. If I give you a random element from $F_{2^N+1}$, it takes $O(N)$ bits to specify it. So adding two elements should take $O(N)$ time. And the DFT does a lot of adding.
Schönhage–Strassen splits the input into $\frac{N}{lg(N)}$ groups with $lg(N)$ bits. These groups are the values of $F_{2^N+1}$ that it will transform. Each pass of the DFT will have $O(\frac{N}{lg(N)})$ additions/subtractions, and there are $O(lg(\frac{N}{lg(N)}))$ passes. So based on addition taking $O(N)$ time it seems like the cost of all those additions should be $O(N \frac{N}{lg(N)} lg(\frac{N}{lg(N)}))$, which is asymptotically the same as $O(N^2)$.
We can do a bit better than that... because the values start out so small, the additions are quite sparse. The first pass' additions really only cost $O(lg(N))$ each, and the second pass' cost $2^1 O(lg(N))$ each, and the i'th pass' cost $O(min(N, 2^i \cdot lg(N)))$ each, but that still all totals out to a terrible $\frac{N^2}{lg(N)}$.
How is Schönhage–Strassen making the additions cheap? How much do they cost overall?
Is it because the algorithm actually uses $F_{N+1}$ (with $N$ guaranteed to be a power of 2)? There's enough stacked $2^{2^{n}}$ and german in the paper that I'm really not sure. On the other hand, I don't think that guarantees enough roots of unity for things to work out.