2

How was the determinant of matrices generalized for matrices bigger than $2 \times 2$?

I read a book a very long time ago where it said something like this:

Given a system of two equations with two unknowns:

$$ ax_1+bx_2=y_1 \\ cx_1+dx_2=y_2 $$

Multiplying the first equation by $d$, the second by $b$, and subtracting the second from the first, we get:

$$ s:(ad-cb)x_1=dy_1-by_2 $$

Then

1) $ad-cb=0 \wedge dy_1-by_2=0 \iff s \text{ has infinitely many solutions.}$

2) $ad-cb=0 \wedge dy_1-by_2\neq0 \iff s \text{ has no solutions.}$

3) $ad-cb\neq0 \iff s \text{ has a unique solution.}$

And $ad-cb$ is called the determinant of this system of equations (or matrix).
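For a concrete check of that elimination, here is a minimal symbolic sketch (using sympy; the symbol names are just placeholders of mine):

```python
import sympy as sp

a, b, c, d, x1, x2, y1, y2 = sp.symbols('a b c d x1 x2 y1 y2')

# Multiply the first equation by d, the second by b, and subtract
# the second from the first: x2 drops out.
lhs = sp.expand(d*(a*x1 + b*x2) - b*(c*x1 + d*x2))
rhs = d*y1 - b*y2

print(sp.collect(lhs, x1))  # x1*(a*d - b*c)
print(rhs)                  # d*y1 - b*y2
```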

How was this generalized for bigger systems/matrices?

YoTengoUnLCD
  • I don't know the history, but I recall reading at some point that the notion of determinants is older than that of matrices -- so it may have been first defined for solving systems of equations like those you provide. However, nowadays, we could just define it as a generalized volume of a parallelotope. –  May 25 '15 at 01:18
  • @David I may have not made myself clear, I understand how it is defined and how to calculate it. What I want to know is why does the determinant of a bigger matrix have the same properties as a 2 by 2 matrix, when it doesn't show a relation like, for example, the 2 by 2 determinant has with the number of solutions to a system of equations. – YoTengoUnLCD May 25 '15 at 01:20
  • @YoTengoUnLCD But determinants of larger matrices do tell you how many solutions there are to a system of linear equations. Just as with the $2\times 2$ case, $\det(A)=0 \implies$ infinite or no solutions and $\det(A) \ne 0 \implies$ a unique solution. –  May 25 '15 at 01:22
  • @Bye_World Yes, I understand that. I want to know why that happens, how was the determinant generalized for bigger matrices so it would have all the properties the 2 by 2 version has. – YoTengoUnLCD May 25 '15 at 01:24
  • What properties specifically? –  May 25 '15 at 01:25
  • @Bye_World That very one, $|A| \ne 0 \implies$ unique solution, for example. – YoTengoUnLCD May 25 '15 at 01:29
  • Would a geometric answer suffice or are you looking for an algebraic answer? –  May 25 '15 at 01:30
  • Any of those would work, I just want to understand the "inside of the hood" kind of reasoning. Thanks! – YoTengoUnLCD May 25 '15 at 01:33
  • There's this: http://www-history.mcs.st-and.ac.uk/HistTopics/Matrices_and_determinants.html – pjs36 May 25 '15 at 01:45
  • If you solve a generic 3 by 3 or 4 by 4 linear system by hand, the formula for the determinant when $ n $ is $3$ or $4$ will pop out. At that point, you can see the pattern and guess a formula that holds for any $ n $. – littleO May 25 '15 at 02:07

3 Answers

6

Let's define the determinant a little differently than you may have seen before. For a linear function $f: \Bbb R^n \to \Bbb R^n$, define $$f(e_1) \wedge \cdots \wedge f(e_n) = \det(f)(e_1 \wedge \cdots \wedge e_n)$$

where $e_i$ is the $i^{th}$ standard basis vector of $\Bbb R^n$.

So first I'll have to go through what the exterior product, denoted $\wedge$, is. First off, it is a product of vectors with the following properties:

  1. $\vec u \wedge \vec v = -\vec v \wedge \vec u$
  2. $(\vec u\wedge \vec v)\wedge \vec w = \vec u \wedge (\vec v \wedge \vec w)$
  3. $(k\vec u) \wedge \vec v = \vec u \wedge (k\vec v) = k(\vec u \wedge \vec v)$
  4. $\vec u \wedge (\vec v + \vec w) = \vec u \wedge \vec v + \vec u \wedge \vec w$

Given two vectors $\vec u$ and $\vec v$, the bivector $\vec u \wedge \vec v$ is the oriented area element whose magnitude is given by the area of the parallelogram with sides as the vectors $\vec u$ and $\vec v$ and with orientation given by the order of the factors in $\vec u\wedge \vec v$.

Likewise, the trivector $\vec u \wedge \vec v \wedge \vec w$ is the oriented volume element whose magnitude is given by the volume of the parallelepiped with sides as $\vec u$, $\vec v$, and $\vec w$. Higher dimensional $n$-vectors are defined analogously.

Here's a picture which might help you visualize it:

[Figure: the oriented area element $\vec u \wedge \vec v$ and the oriented volume element $\vec u \wedge \vec v \wedge \vec w$]

Thus $e_1 \wedge \cdots \wedge e_n$ is a unit $n$-vector -- that is, an $n$-dimensional box with an $n$-volume of $1$ and an orientation defined by the order of the elements $e_i$.

As you can see from the first property listed above, $\vec u \wedge \vec u = 0$ (prove this for yourself if you don't immediately see how it follows).

Using this definition of the determinant, we can see that if the vectors $f(e_1), \cdots, f(e_n)$ are linearly dependent, then one of them is a linear combination of the others; expanding the wedge product via properties $3$ and $4$ then leaves only terms containing a repeated factor, each of which vanishes. Thus $$f(e_1) \wedge \cdots \wedge f(e_n) = 0(e_1 \wedge \cdots \wedge e_n) \\ \implies \det(f) = 0$$

And from the geometric definition I gave above (and its generalization to higher dimensional $k$-vectors), we can see that if $f(e_1), \cdots, f(e_n)$ are linearly independent, then $$f(e_1) \wedge \cdots \wedge f(e_n) \ne 0(e_1 \wedge \cdots \wedge e_n) \\ \implies \det(f) \ne 0$$

Therefore $\det(f) \ne 0 \iff f(e_1), \cdots, f(e_n)$ are linearly independent. And from our knowledge of linear algebra we know that $f(e_1), \cdots, f(e_n)$ are linearly independent $\iff$ the matrix which represents $f$ in some basis is invertible.

This -- as opposed to thinking about the intersections of flats in $\Bbb R^n$ OR to thinking about it as some weird recursively defined algorithm for square matrices -- is the way I prefer to think about the determinant. It's just the $n$-volume of a box -- which is $0$ if the box isn't $n$-dimensional.
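Here is a quick numerical illustration of that picture (a sketch using numpy; the example matrices are my own):

```python
import numpy as np

# Columns are the images f(e_1), f(e_2), f(e_3) of the basis vectors.
# If they are linearly dependent, the box they span is flat, so its
# 3-volume -- the determinant -- is 0.
dependent = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 9.0],
                      [7.0, 8.0, 15.0]])  # third column = first + second
independent = np.array([[1.0, 2.0, 0.0],
                        [4.0, 5.0, 0.0],
                        [0.0, 0.0, 3.0]])

print(np.linalg.det(dependent))    # ~0, up to floating-point error
print(np.linalg.det(independent))  # -9.0: a genuinely 3-dimensional box
```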


Using this definition, the formula for the determinant is just a result of the properties of the wedge product. Let's see it in the case of $\Bbb R^2$.

Let $f(x) = Ax$ where $A=\pmatrix{a & b \\ c & d}$. Then $f(e_1) = ae_1 + ce_2$ and $f(e_2) = be_1 + de_2$. Therefore $$\begin{align}f(e_1) \wedge f(e_2) &= (ae_1 + ce_2)\wedge (be_1 +de_2) \\ &= ae_1\wedge be_1 + ae_1\wedge de_2 + ce_2 \wedge be_1 +ce_2 \wedge de_2 \\ &= (ab)e_1\wedge e_1 + (ad)e_1\wedge e_2 +(cb)e_2\wedge e_1 +(cd)e_2\wedge e_2 \\ &= 0 + (ad)e_1\wedge e_2 +(cb)(-e_1\wedge e_2) +0 \\ &= (ad-bc)e_1\wedge e_2\end{align} \\ \implies \det(f) = ad-bc$$
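If you want to sanity-check that expansion, sympy's built-in determinant produces the same polynomial (a minimal sketch; the symbols are the $a, b, c, d$ from above):

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b],
               [c, d]])
print(A.det())  # a*d - b*c, matching the wedge-product expansion
```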

  • Another +1 from me, and a note to @YoTengo: This is covered in Sergei Winitzki's book, Linear Algebra via Exterior Products (introduced within a chapter or two), and is available online, I believe legitimately, for free. – pjs36 May 25 '15 at 02:38
2

An equation of the form $a_1x_1 + \cdots + a_nx_n = d$ defines an $(n-1)$-dimensional flat (affine subspace) in $\Bbb R^n$. Remember that it requires at least $n$ of these $(n-1)$-D flats to uniquely define a point. Let's try to build some intuition for that.


Consider $\Bbb R^2$. An $(n-1)$-D flat in $\Bbb R^2$ is a line. One line isn't enough to specify a point. But if you have two lines, there are three possibilities:

$(1)$ the lines coincide. Well, two lines right on top of each other isn't really any better than $1$ line. We can't specify a unique point with this. Two lines that coincide will be scalar multiples of each other. That is, if $l_1: ax+by=c$ is one of your lines, then your other line $l_2$ must be of the form $l_2: k(ax+by) = k(c)$.

$(2)$ the lines are parallel, but don't coincide. If two lines in $\Bbb R^2$ are parallel then they never intersect, so there is no point that they share. In this case, the equations which specify these lines will have to be of the form $l_1: ax+by=c$ and $l_2: ax+by=d$, where $c\ne d$.

$(3)$ the lines intersect, but not everywhere. Then they define a unique point in $\Bbb R^2$ because two lines that don't coincide can only intersect once.
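To tie these three cases back to the determinant, here is a small numerical sketch (numpy, with example lines I chose); only the intersecting pair has a nonzero determinant:

```python
import numpy as np

# Coefficient matrices for pairs of lines in R^2 (one row per line).
systems = {
    "coincident":   np.array([[1.0, 1.0],     # x + y = 2
                              [2.0, 2.0]]),   # 2x + 2y = 4
    "parallel":     np.array([[1.0, 1.0],     # x + y = 2
                              [1.0, 1.0]]),   # x + y = 5
    "intersecting": np.array([[1.0, 1.0],     # x + y = 2
                              [1.0, -1.0]]),  # x - y = 0
}
for name, A in systems.items():
    print(name, np.linalg.det(A))
# Only "intersecting" gives det != 0, hence a unique point.
```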


Now consider $\Bbb R^3$. An $(n-1)$-D flat in $\Bbb R^3$ is a plane. One plane can't specify a unique point. Two planes can't specify a unique point either: the intersection of two planes is always either a line or, if they coincide, the entire plane. We need at least three planes to specify a unique point. If you have three planes in $\Bbb R^3$, there are $5$ possibilities:

$(1)$ At least two of the planes coincide. Again, this is basically like having only $2$ (or fewer) planes. If $\pi_1$ and $\pi_2$ are two planes that coincide then their equations will be of the form $\pi_1: ax + by +cz = d$ and $\pi_2: k(ax+by+cz) = kd$.

$(2)$ At least two of the planes are parallel. Well if two of the planes are parallel, then those two planes can't intersect each other. So each may intersect with the third plane, but those intersections will either be lines or the entire plane -- thus a unique point cannot be specified by these three planes. If $\pi_1$ and $\pi_2$ are parallel then $\pi_1: ax+by+cz = d \implies \pi_2: ax+by+cz = f$ for $f\ne d$.

$(3)$ Each of the three planes intersects the other two, but all three lines of intersection are parallel and don't coincide. Then there can't be a point of intersection. This occurs when $\pi_1: a_1x+b_1y+c_1z = d_1$, $\pi_2: a_2x+b_2y+c_2z=d_2$, and $\pi_3: (ja_1 +ka_2)x + (jb_1 +kb_2)y + (jc_1 +kc_2)z = e$ with $e \ne jd_1 + kd_2$.

$(4)$ The lines of intersection are parallel and all $3$ coincide (this is the case $e = jd_1 + kd_2$ above). Then they form a line, which doesn't specify one point. Thus $\pi_1 \cap \pi_2 = \pi_2 \cap \pi_3$, for instance.

$(5)$ None of the above. Then each of the three planes intersects the other two and the lines of intersection aren't all parallel. Then, because these planes are just $2$-D flats, we can use what we know from our consideration of $\Bbb R^2$ earlier to see that at least two of these lines intersect. Thus all three planes intersect at a single point.


Solving a system of linear equations is just finding this one point of intersection of $(n-1)$-D flats. Hopefully you get the point from the above that you need at least $n$ $(n-1)$-D flats to specify a point -- but just because you have $n$ of them doesn't mean you will.

One other thing to notice is that every time the collection of flats didn't specify a unique point, the equation of at least one of those flats was a linear combination of the others. I won't prove it, but this is a general result: the equation of one flat is a linear combination of the equations of the other $n-1$ flats $\iff$ that collection of $n$ flats does not specify a unique point.
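As a quick illustration (a numpy sketch; I chose the coefficients so the third equation is a linear combination of the first two):

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 3.0],
              [2.0, 5.0, 5.0]])  # row 3 = 2*row 1 + row 2

# The three planes cannot pin down a unique point, and indeed:
print(np.linalg.det(A))  # ~0
```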


Now let's see how the determinant arises from analyzing whether a system of $n$ linear equations in $n$ variables has a unique solution.

You've already done the $2\times 2$ case, so let's look at the $3\times 3$ case then we'll see if we can figure out how to generalize it.

Consider the following system:

$$\begin{cases}ax_1 + bx_2 + cx_3 = y_1 \\ dx_1 + ex_2 + fx_3 = y_2 \\ gx_1 + hx_2 + ix_3 = y_3\end{cases}$$

where $a\ne 0$. If your system has $a=0$ in its first equation, then rearrange your equations so that the coefficient on $x_1$ in the first equation is nonzero. If you can't, then you already know that this system can't uniquely specify a point, so you're already done.

The only operations we will do on these are the Gaussian elimination operations. Then we can see that

$$\begin{cases}ax_1 + bx_2 + cx_3 = y_1 \\ dx_1 + ex_2 + fx_3 = y_2 \\ gx_1 + hx_2 + ix_3 = y_3\end{cases} \\ \implies \begin{cases}ax_1 + bx_2 + cx_3 = y_1 \\ (ae-bd)x_2 + (af-cd)x_3 = ay_2-dy_1 \\ (ah-bg)x_2 + (ai-cg)x_3 = ay_3-gy_1\end{cases} \\ \implies \begin{cases}ax_1 + bx_2 + cx_3 = y_1 \\ (ae-bd)x_2 + (af-cd)x_3 = ay_2-dy_1 \\ [(ai-cg)(ae-bd)-(ah-bg)(af-cd)]x_3 = (ay_3-gy_1)(ae-bd)-(ay_2-dy_1)(ah-bg)\end{cases} \\ \implies \begin{cases}ax_1 + bx_2 + cx_3 = y_1 \\ (ae-bd)x_2 + (af-cd)x_3 = ay_2-dy_1 \\ a[\color{red}{a(ei-fh)-b(di-fg)+c(dh-eg)}]x_3 = a[\color{blue}{a(ey_3-hy_2)-b(dy_3-gy_2)+y_1(dh-ge)}]\end{cases} $$

Because we already assumed that $a\ne 0$, we can cancel it from both sides. Then what are we left with? Cramer's rule. That last equation just says

$$\det(A)x_3 = \det(A_3)$$

where $A_3$ is the coefficient matrix of this system where $\begin{bmatrix} y_1 \\ y_2 \\ y_3\end{bmatrix}$ takes the place of the third column.

What is this telling us? Well, the value of $\det(A_3)$ doesn't affect whether or not $x_3$ has a unique solution -- even if it is zero, it doesn't rule out a unique $x_3$. On the other hand, $\det(A)$ does. We can always solve for $x_3$ here UNLESS $\det(A)=0$. Thus if $\det(A)=0$, this system does not determine a unique value of $x_3$.

Now you should be able to see that just by performing the same Gaussian reduction methods that I used above to isolate a different variable, you'd find that a very similar formula pops up (for instance, if you tried solving for $x_2$, you'd eventually get $\det(A)x_2 = \det(A_2)$).

So in both the $2\times 2$ (look at your own question for verification) and in the $3\times 3$ case, Gaussian elimination eventually yields Cramer's rule. This continues to hold for any system of $n$ equations in $n$ unknowns. To verify this, simply find a good proof of Cramer's rule OR come up with one on your own.
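If you'd rather let a computer do the bookkeeping, here is a symbolic sketch (sympy, with my own symbol names) that checks $\det(A)x_3 = \det(A_3)$, i.e. $x_3 = \det(A_3)/\det(A)$, for the general $3\times 3$ system above:

```python
import sympy as sp

a, b, c, d, e, f, g, h, i = sp.symbols('a b c d e f g h i')
y1, y2, y3 = sp.symbols('y1 y2 y3')

A = sp.Matrix([[a, b, c], [d, e, f], [g, h, i]])
y = sp.Matrix([y1, y2, y3])

# Solve the system symbolically (valid whenever det(A) != 0) ...
sol = A.solve(y)

# ... and compare x3 with det(A_3)/det(A), where A_3 is A with its
# third column replaced by the right-hand side (Cramer's rule).
A3 = A.copy()
A3[:, 2] = y
print(sp.simplify(sol[2] - A3.det() / A.det()))  # 0
```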

1

Determinant = product of the eigenvalues (counted with multiplicity, over $\Bbb C$). That says it all. All the properties of the determinant flow from this if you understand eigenvalues.

The determinant = 0 if and only if the matrix is singular, which is true if and only if at least one eigenvalue = 0. That's very powerful right there.

Eigenvalues are your friend. If you don't know what they are, pick up a linear algebra book and find out. You can try wading through http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors , but if you don't have a strong enough background, it might make for tough sledding other than for historical information. A really nifty and very useful result is that the sum of the eigenvalues of a matrix is equal to the sum of the diagonal elements of the matrix, with this sum being known as the trace of the matrix.
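If you want to check these facts numerically, here is a minimal numpy sketch (the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

eigenvalues = np.linalg.eigvals(A)
print(np.prod(eigenvalues), np.linalg.det(A))  # equal, up to rounding
print(np.sum(eigenvalues), np.trace(A))        # equal: the trace is 9.0
```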