
Quanta Magazine's April 13, 2023 article "A New Approach to Computation Reimagines Artificial Intelligence" starts with:

By imbuing enormous vectors with semantic meaning, we can get machines to reason more abstractly — and efficiently — than before.

Later on, in the explanation, are these paragraphs:

The vectors must be distinct. This distinctness can be quantified by a property called orthogonality, which means to be at right angles. In 3D space, there are three vectors that are orthogonal to each other: One in the x direction, another in the y and a third in the z. In 10,000-dimensional space, there are 10,000 such mutually orthogonal vectors.

But if we allow vectors to be nearly orthogonal, the number of such distinct vectors in a high-dimensional space explodes. In a 10,000-dimensional space there are millions of nearly orthogonal vectors.

I remember reading previous questions here in which high dimensions and dot products are discussed, and seeing comments about how easy it is to get very small or even zero dot products in high dimensions, but I've never worked outside of one-, two-, and three-dimensional problems.

Question: What definition of "nearly orthogonal" would result in "In a 10,000-dimensional space there are millions of nearly orthogonal vectors"? Would it be, for example, a dot product¹ smaller than some number like 0.1?


¹ of the presumably normalized vectors

uhoh
  • Presumably that reference defines such a critical concept, no? Anyway, it can't be the dot product alone, since we can make the dot product small just by rescaling. Maybe if you fix the norm and then use the dot product, but even then there are uncountably many. – lulu Apr 16 '23 at 21:57
  • It's a figure of speech. – Ethan Bolker Apr 16 '23 at 21:59
  • @EthanBolker but I've now asked a mathematical question here, that will have a mathematical answer. – uhoh Apr 16 '23 at 22:00
  • https://mathoverflow.net/questions/24864/almost-orthogonal-vectors – L. F. Apr 16 '23 at 22:05
  • I doubt that reference is to a precisely defined concept in the literature. Asking for what one might be is not a precise enough question for an answer. – Ethan Bolker Apr 16 '23 at 22:06
  • Since two vectors are orthogonal exactly when their dot product is zero, perhaps two vectors can be said to be “nearly orthogonal” if their dot product is less than $\epsilon$ for some chosen positive $\epsilon$, and you choose a smaller $\epsilon$ if you want your idea of “nearly” to be more restrictive and exacting. – MJD Apr 16 '23 at 22:08
  • I call bullshit, or imprecise speech. Of course already in $\mathbb R^3$ there are infinitely many vectors, and even infinitely many "directions" (they mean mod out scalar multiples), that are not just "nearly" but exactly orthogonal to $(1,0,0)$, e.g. $(0,1,1), (0,1,2), (0,1,-17\pi) ...$. – Torsten Schoeneberg Apr 16 '23 at 22:08
  • @TorstenSchoeneberg I think you have missed the point. Of course there is an infinite family of vectors orthogonal to $(1, 0, 0)$. But if you have a set $S$ of vectors from $\Bbb R^3$ which are pairwise orthogonal then $|S|$ is at most 3. And we might reasonably ask, given $\epsilon > 0$, how large can a set $S\subset \Bbb R^3$ be if $a, b\in S$ implies $\langle a, b\rangle <\epsilon$. – MJD Apr 16 '23 at 22:10
  • @MJD Thanks for clarifying. If that is what is meant (and I agree it should be), it is extraordinarily poorly phrased in the quote. – Torsten Schoeneberg Apr 16 '23 at 22:12
  • I found it initially puzzling but after a little thought I realized what was probably meant. – MJD Apr 16 '23 at 22:16
  • @TorstenSchoeneberg - Notice that nowhere does the quote refer to a "vector space". Based on the subject matter of the paper, I am 99% sure that the "space" in question is instead a lattice of vectors, not a vector space, and almost certainly a finite lattice at that. By almost orthogonal, they probably mean lattice points that differ from the orthogonal direction in each coordinate by amounts $\ll$ their length. But the criteria of interest to them are surely discussed in the paper, so you (uhoh) should really be looking there, not here, for an explanation. – Paul Sinclair Apr 18 '23 at 02:39
  • @PaulSinclair but now future Stack Exchange readers can look here and find a very nice answer, which is one of the two main reasons that we post SE questions :-) uhoh's lemmas, especially #3 – uhoh Apr 20 '23 at 22:50

2 Answers


Based on discussion and references in the comments (principally from user L.F.) it appears that, as I guessed, two vectors are considered “nearly orthogonal” if their dot product is “small”: that is, $a$ and $b$ are nearly orthogonal if, in the context of a particular value $\epsilon$, we have $$-\epsilon \le \frac{a\cdot b}{\lvert a\rvert \lvert b \rvert} \le \epsilon.\tag{$\star$}$$ The smaller the value of $\epsilon$, the stricter the requirement imposed by “nearly orthogonal”. As $\epsilon$ goes to zero, the meaning of “nearly orthogonal” approaches the actually orthogonal.
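As a concrete illustration of the test $(\star)$, here is a minimal sketch of my own (not from the linked discussion; the function name is made up):

```python
import math

def nearly_orthogonal(a, b, eps):
    """Test (*): is the normalized dot product of a and b within [-eps, eps]?"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return abs(dot) <= eps * norm_a * norm_b

print(nearly_orthogonal([1, 0, 0], [0, 1, 0], 0.0))      # True: exactly orthogonal
print(nearly_orthogonal([1, 0.05, 0], [0, 1, 0], 0.1))   # True: within eps
print(nearly_orthogonal([1, 0.05, 0], [0, 1, 0], 0.01))  # False: too far from 90 degrees
```

Note that dividing by the norms is what defeats the rescaling objection raised in the comments: the quantity tested is the cosine of the angle, which is scale-invariant.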

This Math Overflow post asks how, for given $n$ and $\epsilon$, one can find a large family of vectors from $\Bbb R^n$ that are nearly orthogonal in this sense. The top answer there, by Bill Johnson, cites the so-called Johnson-Lindenstrauss lemma and claims that you can find a family of $k$ nearly-orthogonal vectors if

$$n\ge C \epsilon^{-2} \log k$$

where $C$ is a fixed constant no larger than $8$. Bill Johnson is (or at least purports to be) one of the namesakes of the Johnson-Lindenstrauss lemma, so the answer is likely to be reliable.

Turning this around, we have that, given $n$ and $\epsilon$, one can find at least $$e^{n\epsilon^2/8}$$ nearly-orthogonal vectors. Note that the appearance of $e$ here is rather arbitrary, as its value can be absorbed into the $C$. For the specific case of $k \approx 10^6, n=10000$ that you asked about, we find that $\epsilon = 0.12$ is sufficient to find many millions of nearly-orthogonal vectors, but $\epsilon = 0.1$ may not be.
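For instance, a quick numerical check of my own (taking $C = 8$ at face value; the function name is made up):

```python
import math

def guaranteed_family_size(n, eps, C=8):
    """Lower bound e^(n * eps^2 / C) on the size of a nearly-orthogonal family."""
    return math.exp(n * eps ** 2 / C)

n = 10_000
print(round(guaranteed_family_size(n, 0.12)))  # ~6.6e7: many millions
print(round(guaranteed_family_size(n, 0.10)))  # ~2.7e5: short of a million
```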

(Beware; some of the answers seem to consider the less strict constraint that the dot product lie in $[-1, \epsilon]$ rather than in $[-\epsilon, \epsilon]$, and I have not thought carefully about how this will affect the results. For large $n$, not too much, I think.)

A reply by Timothy Gowers explains why this result is plausible: the unit vectors in $\Bbb R^n$ lie on the unit $(n-1)$-sphere, and each vector can be thought of as excluding a portion of this sphere that is proportional to $(1-\epsilon)^{n-1}$.

Separate answers by Ryan O'Donnell and by ‘kodlu’ provide a method for locating $2^{O(\epsilon^2n)}$ nearly-orthogonal unit vectors: simply select random vectors whose components are $\pm \frac1{\sqrt n}$; by a probabilistic argument these will usually be nearly-orthogonal.
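The probabilistic recipe is easy to try; the following is a small-scale sketch of my own (dimension and family size shrunk so it runs quickly), not code from the MO answers:

```python
import random

random.seed(0)      # reproducible demo
n, k = 1_000, 100   # dimension and family size, kept small for speed

# k random sign vectors; dividing the +/-1 dot products by n below is
# equivalent to using entries +/-1/sqrt(n), i.e. unit vectors.
vecs = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(k)]

worst = max(
    abs(sum(a * b for a, b in zip(vecs[i], vecs[j]))) / n
    for i in range(k) for j in range(i + 1, k)
)
print(f"max |cos angle| over {k * (k - 1) // 2} pairs: {worst:.3f}")
```

With these parameters the worst pairwise cosine typically lands near $0.1$, so all $100$ vectors are pairwise within a few degrees of orthogonal, even though $100$ exactly orthogonal vectors could never fit in far fewer than $100$ dimensions of slack.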

Disclaimer: I tried to summarize the MO discussion, but I did not think about any of it carefully, so I may have gotten the details wrong. Jelani Nelson suggests consulting Problems and Results in Extremal Combinatorics Part I, by Noga Alon, for details. This is currently available online from Professor Alon's web site at Tel Aviv University. The pertinent part seems to be section 9.

MJD
  • Considering two vectors in two-dimensional space, is there not an infinite number of pairs of orthogonal vectors? – user Apr 21 '23 at 20:54
  • @user You misunderstood. The quest is for large families of vectors such that any pair of vectors within the family are "nearly orthogonal". In your example every family has size two only. – Jyrki Lahtonen Apr 22 '23 at 04:46
  • @JyrkiLahtonen What exactly did I misunderstand? Do you claim that there are only a finite number of pairs of (nearly) orthogonal vectors in two-dimensional space? – user Apr 22 '23 at 16:51
  • You seem to have misunderstood that the quest is for large families of vectors such that any pair of vectors within the family are "nearly orthogonal". – MJD Apr 22 '23 at 20:02
  • @user In two dimensions you simply cannot have a nearly orthogonal set of three or more vectors. You are bound to get an angle either less than $60$ or more than $120$ degrees between some pair of the three vectors. But for example in a space of dimension roughly one million, my answer describes a set of $10^{18}$ vectors such that the angles between any two are between $89.7$ and $90.3$ degrees. We do need to go to ridiculously high dimensions for that to work. – Jyrki Lahtonen Apr 23 '23 at 04:07
  • @JyrkiLahtonen Possibly you are correct in some sense, which is stated neither in the question nor in the article. But in the usual sense there are uncountably many triples of vectors which are pairwise "nearly" orthogonal in three dimensions. – user Apr 23 '23 at 09:17
  • @user The way I see it, the point is exactly that we want a single set of nearly orthogonal vectors such that the size of the set well exceeds the dimension of the space. In 3 dimensions the best you can do with four vectors is the configuration of a regular tetrahedron, with all the angles between the vectors equal to 109.5 degrees. I don't think we can expect an article intended for a wider readership to get all the quantifiers in the correct places :-) – Jyrki Lahtonen Apr 23 '23 at 20:07
  • @JyrkiLahtonen Thank you for the explanation. This is probably indeed what was meant. – user Apr 23 '23 at 21:21

Coding-theoretic methods (or number-theoretic, if you prefer), most notably constructions relying on exponential sums over finite fields, are a rich source of such families of vectors. I will describe one very flexible construction, and list its relevant parameters afterwards. The gist is that in $n$-dimensional space we end up with $n^k$ unit vectors with all the pairwise inner products bounded in absolute value by $M/\sqrt n$, with the constant $M$ increasing linearly with $k$.

I will consider binary vectors (all entries $\pm1$) in what follows. We can use larger sets of complex roots of unity instead, if so desired.


Let $q=2^m$ and $F=\Bbb{F}_q$. Let $e:F\to\{\pm1\}$ be a non-trivial additive character (= a homomorphism from the additive group to the multiplicative group $\{\pm1\}$). A somewhat canonical such beast is $$e(x)=(-1)^{tr(x)},$$ where $tr:F\to\Bbb{F}_2$ is the trace map, $tr(x)=x+x^2+x^4+x^8+\cdots+x^{2^{m-1}}$.

Given any polynomial $f(x)\in F[x]$ we get a $q$-dimensional vector $w_f$ with components indexed by $x\in F$ with the recipe $$ w_f(x)=e(f(x)) $$ for all $x\in F$. So we see that $\lVert w_f\rVert^2=q$ because all the entries are $\pm1$.

A convenient tool for finding good families is to restrict the choice of the polynomial $f(x)$ by giving an upper bound to its degree. Another constraint we need is to disallow even-degree terms. This comes from the fact that for all $x\in F$ we have $tr(x)=tr(x^2)$. So let's fix a positive integer $N$, and consider the collection of polynomials $$ C(N)=\{a_0x+a_1x^3+a_2x^5+\cdots+a_{N-1}x^{2N-1}\in F[x]\mid \text{$a_i$ arbitrary elements of $F$ for all $i$}\}. $$ We see that there are $q^N$ polynomials in this set.

If $f(x)=a_0x+\cdots+a_{N-1}x^{2N-1}$ and $g(x)=b_0x+\cdots+b_{N-1}x^{2N-1}$ are distinct polynomials from $C(N)$, then the inner product is $$\langle w_f,w_g\rangle=\sum_{x\in F}e(f(x)+g(x)).\qquad(*)$$ Here the sum polynomial $f(x)+g(x)$ is a non-zero polynomial of degree at most $2N-1$ without even-degree terms. The Weil-Carlitz-Uchiyama bound then says that the character sum in $(*)$ has absolute value at most $(2N-2)\sqrt q$.

Recall that all the vectors $w_f$ have squared Euclidean norm $q$, so this bound on character sums implies that $$ |\cos(\theta(f,g))|\le \frac{2N-2}{\sqrt q}.\qquad(**) $$ The inequality $(**)$ then means that, with $q$ large enough in comparison to $2N-2$, the angles $\theta(f,g)$ between the vectors $w_f$ and $w_g$, $f\neq g$, are very close to $\pi/2$. This is why I think the construction fits.


Some example numbers.

  • With $m=14$ we have $q=16384$, so the vectors have dimension $16384$. Choosing $N=2$ we get a family of $q^N=2^{28}\approx260\,\text{million}$ vectors such that the cosines between any two of them have absolute value $\le 2/\sqrt{q}=1/64$.
  • With $m=20$ we have $q=1048576$, $\sqrt q=1024$. With $N=3$ we get a family of roughly $10^{18}$ vectors in $\{\pm1\}^{1048576}$ with cosines of the angles in between bounded from above by $4/1000$.
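For the skeptical reader, the construction is small enough to verify directly for $m=4$: below is a sanity check of my own (not library code), using the modulus $x^4+x+1$ for $\Bbb{F}_{16}$. With $N=2$ it builds all $q^N = 256$ vectors in $\{\pm1\}^{16}$ and confirms that the pairwise inner products stay within the W-C-U bound $(2N-2)\sqrt q = 8$.

```python
M, Q, MODULUS = 4, 16, 0b10011  # GF(16) = GF(2)[x]/(x^4 + x + 1)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^M), reducing by MODULUS."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return r

def tr(x):
    """Trace map tr(x) = x + x^2 + x^4 + x^8; lands in {0, 1}."""
    t = 0
    for _ in range(M):
        t ^= x
        x = gf_mul(x, x)
    return t

def w(a0, a1):
    """Vector w_f for f(x) = a0*x + a1*x^3, entries e(f(x)) = (-1)^tr(f(x))."""
    return [(-1) ** tr(gf_mul(a0, x) ^ gf_mul(a1, gf_mul(x, gf_mul(x, x))))
            for x in range(Q)]

vectors = [w(a0, a1) for a0 in range(Q) for a1 in range(Q)]  # q^N = 256 vectors
bound = (2 * 2 - 2) * int(Q ** 0.5)  # (2N-2)*sqrt(q) = 8 for N = 2
worst = max(abs(sum(u * v for u, v in zip(vectors[i], vectors[j])))
            for i in range(len(vectors)) for j in range(i + 1, len(vectors)))
print(worst, "<=", bound)  # so |cos| <= 8/16 = 1/2 for this tiny q
```

For this toy size the cosine bound $2/\sqrt q = 1/2$ is of course unimpressive; the point of the answer is that growing $m$ (hence $q$) drives the bound to zero while the family size $q^N$ explodes.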

With small values of $N$ there are number-theoretical quirks (the theory of quadratic and bilinear forms over $F$ comes to the fore) allowing a saving of a factor of $\sqrt2$ in the W-C-U bound and hence also in the upper bound on the cosines. However, those apply only when $m$ satisfies certain congruence conditions.

Jyrki Lahtonen
  • Can you refer me to Wikipedia or some other source that discusses the “trace map”? I had not heard of it before and nothing I've found seems to be related. – MJD Apr 22 '23 at 15:54
  • @MJD It is an instance of the field trace. I have used the properties of the trace map of finite fields for example here and here. Using the trace we can easily describe all the characters of the additive group of a finite field, which is how it emerges in counting problems and such. – Jyrki Lahtonen Apr 22 '23 at 16:09