9

Consider the Wikipedia proof for Caratheodory's Theorem, the statement of which I have reproduced below. In short, I am looking for some geometric intuition about the modified coefficients in the proof, something that I may have been able to "see" for myself if I were asked to prove the theorem without looking it up.

Theorem (Caratheodory). Let $X \subset \mathbb{R}^d$. Then each point of $\mathrm{conv}(X)$ can be written as a convex combination of at most $d+1$ points in $X$.

From the proof, each $y \in \mathrm{conv}(X)$ can be written as the following convex combination, where we assume $k \geq d+2$:

$$ y = \sum_{j=1}^k \lambda_j x_j \text{ with } \sum_{j=1}^k \lambda_j = 1 \text{ and } \lambda_j > 0 \quad \forall\, j=1,\dots,k $$

The resulting $k \geq d+2$ points $x_j \in \mathbb{R}^d$ are affinely dependent, so

$$ \sum_{j=1}^k \mu_j x_j = 0 \text{ with } \sum_{j=1}^k \mu_j = 0 $$

for some $\mu_j$ not all zero (in particular, at least one $\mu_j > 0$).

The remainder of the proof uses some funky manipulations of the coefficients for $y$ to show that one of the points in the convex combination for $y$ is really unnecessary. The new coefficients are:

$$ y = \sum_{j=1}^k \left(\lambda_j - \frac{\lambda_i}{\mu_i} \mu_j \right) x_j $$

where $i = \arg\min_{j \;:\; \mu_j > 0} \frac{\lambda_j}{\mu_j}$. The $i$th coefficient turns out to be zero, completing the proof. I understand why this choice of coefficients is desirable, but I do not understand why it's the "right" or "obvious" choice. My own drawings do not make the situation any clearer to me.

What do the new coefficients mean geometrically, and in particular, how can I interpret the ratio $\lambda_i/\mu_i$ geometrically? What does the $\min$ correspond to?
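
For concreteness, here is a small numerical sketch of the reduction step (NumPy, with made-up points and weights); it just checks that the modified coefficients are nonnegative, still sum to $1$, and that the $i$th one vanishes:

```python
import numpy as np

# Made-up example: k = 4 points in R^2 (so k >= d + 2) and a point y written
# as a convex combination with all coefficients strictly positive.
X = np.array([[0.0, 0.0],
              [4.0, 0.0],
              [0.0, 4.0],
              [1.0, 1.0]])             # rows are x_1, ..., x_k
lam = np.array([0.3, 0.2, 0.2, 0.3])   # lambda_j > 0, summing to 1
y = lam @ X

# Affine dependence: mu with sum_j mu_j x_j = 0 and sum_j mu_j = 0,
# i.e. a null vector of the (d+1) x k matrix [X^T; 1^T].
A = np.vstack([X.T, np.ones(len(X))])
mu = np.linalg.svd(A)[2][-1]

# i = argmin of lambda_j / mu_j over indices with mu_j > 0.
pos = mu > 1e-12
ratios = np.full(len(mu), np.inf)
ratios[pos] = lam[pos] / mu[pos]
i = int(np.argmin(ratios))

new_lam = lam - (lam[i] / mu[i]) * mu  # the modified coefficients
print(new_lam)                          # nonnegative, sum to 1, entry i is ~0
print(np.allclose(new_lam @ X, y))      # and they still represent y
```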

  • Any point in $\mathbb{R}^d$ can be considered as a vector; the core idea here is to orthogonalise those vectors (look at the Gram-Schmidt process; you should see strong similarities). The adjustments to the coefficients are just that orthogonalisation process in action. – postmortes Jun 21 '17 at 06:13
  • 2
    I like the following formulation of Caratheodory: Each point of the convex hull can be written as a convex combination of affinely independent points of $X$ (and thus, of at most $d+1$ points). – gerw Jun 21 '17 at 09:01
  • 1
    @postmortes That is a very interesting comment, but I don't recognize Gram-Schmidt in the proof outlined by gerw below. Maybe I am missing something. If you'd like to post an answer explaining your comment in more detail, I'd be interested in reading it. – littleO Jun 21 '17 at 19:54
  • @postmortes I only see a loose conceptual connection to Gram-Schmidt (we're rewriting a combination of dependent points in terms of an independent subset), but I don't see how to take the analogy further, since this result shouldn't depend on any inner product structure of $\mathbb{R}^n$. Could you elaborate? – Benjamin Bray Jun 21 '17 at 19:55

3 Answers

4

You basically add $$y = \sum_{j = 1}^k \lambda_j \, x_j$$ and $$0 = \sum_{j = 1}^k \alpha \, \mu_j \, x_j, $$ for some $\alpha \in \mathbb{R}$. This yields $$y = \sum_{j = 1}^k \underbrace{(\lambda_j + \alpha \, \mu_j)}_{=:\Lambda_j} \, x_j. $$ Since $\sum_{j=1}^k \mu_j = 0$, this directly yields $$\sum_{j=1}^k \Lambda_j = 1.$$ However, you additionally need $$\Lambda_j \ge 0 \;\forall j \qquad\text{and}\qquad \Lambda_i = 0 \text{ for some } i,$$ so that you obtain a convex combination in which one coefficient is zero.

Now, try to figure out how to choose $\alpha$ and $i$.
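
A minimal numerical sketch of the identity above, with made-up points, weights and dependence coefficients; note that the representation of $y$ and the sum $\sum_j \Lambda_j = 1$ hold for every $\alpha$, so all the freedom is in choosing $\alpha$ and $i$:

```python
import numpy as np

# Made-up data in R^2: four points x_j, a convex combination lam giving y,
# and an affine dependence mu (sum_j mu_j x_j = 0 and sum_j mu_j = 0).
X = np.array([[0., 0.], [2., 0.], [0., 2.], [1., 1.]])
lam = np.array([0.25, 0.25, 0.25, 0.25])
mu = np.array([0., 0.5, 0.5, -1.])
y = lam @ X

# For ANY alpha, Lambda = lam + alpha * mu still represents y and sums to 1.
for alpha in (-0.3, 0.0, 0.2):
    Lam = lam + alpha * mu
    assert np.isclose(Lam.sum(), 1.0)
    assert np.allclose(Lam @ X, y)

# The exercise: choose alpha so that in addition Lam >= 0 with some Lam_i == 0.
```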

gerw
  • 33,373
  • This is an awesome explanation. So clear. – littleO Jun 21 '17 at 10:21
  • This explanation definitely demystifies the ratio/max in the choice of $\Lambda_j$, but I am still hoping to find some more concrete geometric meaning. Perhaps there isn't any... – Benjamin Bray Jun 21 '17 at 19:50
0

Even more than geometrically, you can see this physically.

Take three points $A,B,C$ in 2D, so three vectors. Assign them non-negative weights, as if they were point masses.
Their weighted average, the barycenter, will be internal to the triangle, i.e. to their convex hull.
For the weighted average to be external, the weights must have different signs. You can "see" that if two of the masses are subject to gravity while the third is being pulled up.

Now take a fourth point $Q$ inside the triangle: it will be a weighted combination of $A,B,C$ with certain non-negative weights.
A further point $P$ which is a non-negative weighted average of $A,B,C,Q$ will reduce to a non-negative weighted average of just $A,B,C$, but not in general of, e.g., $A,B,Q$, because with respect to that triangle $C$ may need a negative coefficient.

--- in reply to your comment ---

If $Q$ is instead taken outside of $\triangle ABC$, then its expression in terms of $A,B,C$ will contain some negative weights.
Now a non-negative combination of $A,B,C,Q$, i.e. a point $P$ inside that quadrilateral, will not always reduce to a non-negative combination of $A,B,C$.

The conclusion is that, given $n$ points, any weighted average of them with non-negative weights, i.e. any linear combination with coefficients in $[0,1]$ summing to $1$ (a convex combination), always reduces to a convex combination of the $m \le n$ points which define the convex hull of the $n$ points.
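
A quick numerical check of the first case above (with $Q$ inside $\triangle ABC$), using made-up points and weights:

```python
import numpy as np

# Made-up 2D example: Q lies inside triangle ABC, so Q = wA*A + wB*B + wC*C
# with non-negative weights summing to 1.
A, B, C = np.array([0., 0.]), np.array([6., 0.]), np.array([0., 6.])
w = np.array([0.2, 0.4, 0.4])              # weights of Q w.r.t. A, B, C
Q = w[0]*A + w[1]*B + w[2]*C

# P is a non-negative weighted average of A, B, C, Q.
v = np.array([0.1, 0.3, 0.2, 0.4])         # weights of P w.r.t. A, B, C, Q
P = v[0]*A + v[1]*B + v[2]*C + v[3]*Q

# Substituting Q's expression absorbs its weight into A, B, C:
u = v[:3] + v[3]*w                         # new non-negative weights for A, B, C
assert np.isclose(u.sum(), 1.0)
assert np.allclose(u[0]*A + u[1]*B + u[2]*C, P)
```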

G Cab
  • 35,964
  • but in this analogy, aren't you assuming that $Q$ is in the convex hull of $A,B,C$? This isn't the case, if the convex hull has more than 3 vertices – glS Sep 02 '20 at 09:28
  • @glS: I made an addendum to my answer in reply – G Cab Sep 02 '20 at 15:08
  • I don't get how you reach the conclusion. You remark that not every vertex can be removed, which I understand, but then the point of the theorem is that you can always (provided affine dependence) find one vertex which can be removed still having the point inside expressed as a convex combination of the remaining vertices. How do you see this being the case in your analogy? – glS Sep 02 '20 at 15:22
  • @glS: in fact, if one of the points in the combination is inside the convex hull of the others it can be removed, because its contribution gets absorbed into the convex combination of the others. Should be evident from my answer. Do you know about the barycenter composition for a complex shape? – G Cab Sep 02 '20 at 23:24
  • but that's still only the trivial case. The nontrivial aspect is when you have $n$ (affinely dependent) points $x_1,...,x_n$, each one not in the convex hull of the others. You then need to show that for any $x\in\operatorname{conv}(x_1,...,x_n)$, at least one of the $x_i$ can be removed. – glS Sep 03 '20 at 00:04
0

[Not geometric intuition but just adding the proof]

Thm [Caratheodory]: Let ${ T \subseteq \mathbb{R} ^d },$ and ${ z \in \text{conv}(T) }.$ Consider the smallest ${ m }$ such that ${ z }$ can be written as a convex combination of some ${ m }$ points in ${ T }.$ Now ${ m \leq d + 1 }.$

Pf: (Link, but there is a typo with ${ \max }$ taken instead of ${ \min }$)
Since ${ z \in \text{conv}(T) }$ it can be written as ${ z = \sum _{i=1} ^{n} \lambda _i z _i }$ with each ${ \lambda _i \geq 0 ,}$ ${ \sum _{i=1} ^{n} \lambda _i = 1 }.$ If ${ n \leq d +1 }$ we are done, so suppose ${ n > d + 1 }.$ So ${ z _2 - z _1, \ldots, z _n - z _1 \in \mathbb{R} ^d }$ are linearly dependent. So ${ \sum _{i=2} ^{n} \beta _i (z _i - z _1) = 0 }$ for some ${ \beta _i }$s not all ${ 0 }.$ Equivalently ${ \sum _{i=1} ^{n} \gamma _i z _i = 0 }$ with ${ \sum _{i=1} ^{n} \gamma _i = 0 }$ and not all ${ \gamma _i }$ zero. The equations we now have are $${ z = \sum _{i=1} ^{n} \lambda _i z _i , \lambda _i \geq 0, \sum _{1} ^{n} \lambda _i = 1, }$$ $${ \sum _{i =1} ^{n} \gamma _i z _i = 0, \sum _{i=1} ^{n} \gamma _i = 0, \text{ and not all } \gamma _i \text{ are } 0 }.$$
Focus on the indices ${ \mathscr{I} = \lbrace i \in \lbrace 1 , \ldots, n \rbrace : \gamma _i > 0 \rbrace },$ a nonempty set. The quantity ${ \lambda = \min _{i \in \mathscr{I}} \frac{\lambda _i}{\gamma _i} }$ lets us write ${ z = \sum _{i =1} ^{n} (\lambda _i - \lambda \gamma _i) z _i ,}$ and notice this is a convex combination as well. This new convex combination has at least one coefficient ${ 0 }.$ So roughly, we showed whenever ${ z }$ is a convex combination of ${ n > d + 1 }$ points we can write it as a convex combination with one fewer point, as needed.
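
Since the proof is constructive, here is a rough NumPy sketch of the reduction loop it describes (the function name is just illustrative); it repeatedly removes one point while keeping ${ z }$ a convex combination of the rest:

```python
import numpy as np

def caratheodory_reduce(Z, lam, tol=1e-12):
    """Reduce a convex combination z = sum_i lam_i z_i (rows of Z, in R^d)
    to one using at most d + 1 points, following the proof above."""
    Z, lam = np.asarray(Z, float), np.asarray(lam, float)
    d = Z.shape[1]
    while len(lam) > d + 1:
        # Affine dependence: gamma in the null space of [Z^T; 1^T],
        # i.e. sum_i gamma_i z_i = 0 and sum_i gamma_i = 0.
        A = np.vstack([Z.T, np.ones(len(Z))])
        gamma = np.linalg.svd(A)[2][-1]
        if not np.any(gamma > tol):
            gamma = -gamma                    # make sure some gamma_i > 0
        pos = gamma > tol
        # lambda = min over {i : gamma_i > 0} of lam_i / gamma_i.
        t = np.min(lam[pos] / gamma[pos])
        lam = lam - t * gamma                 # still convex, one weight hits 0
        keep = lam > tol                      # drop the zero-weight point(s)
        Z, lam = Z[keep], lam[keep]
    return Z, lam

# Example: a point in R^2 given as a combination of 5 points reduces to <= 3.
Z = np.array([[0., 0.], [5., 0.], [0., 5.], [2., 1.], [1., 2.]])
lam = np.full(5, 0.2)
z = lam @ Z
Z2, lam2 = caratheodory_reduce(Z, lam)
print(len(lam2), np.allclose(lam2 @ Z2, z))   # e.g. "3 True"
```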