Exterior Algebra Notes #1: Matrices and Determinants

October 8, 2018

(This is not really an intro to the subject. I don’t have an audience in mind for this. I’ve written my notes out in an expository style because it helps me retain what I study.)

(Vector spaces are assumed to be finite-dimensional and over \(\bb{R}\) with the standard inner product unless otherwise noted.)

Exterior algebra (also known as ‘multilinear algebra’, which is arguably the better name) is an obscure and technical subject. It’s used in certain fields of mathematics, primarily abstract algebra and differential geometry, and it comes up a lot in physics, often in disguise. I think it ought to be far more widely studied, because it turns out to take a lot of the mysteriousness out of the otherwise technical and tedious subject of linear algebra. But in most of the places it turns up, it is heavily obfuscated. So my aim is to study exterior algebra and do some ‘refactoring’: to make it more explicit, so it seems like a subject worth studying in its own right.

In general I’m drawn to whatever makes computation and intuition simple, and this is it. In college I learned about determinants and matrix inverses and never really understood how they work; they were impressive constructions that I memorized and then mostly forgot. Exterior Algebra turns out to make them into simple intuitive procedures that you could rederive whenever you wanted.


1. An example problem

Suppose: you’re writing a computer graphics library. You want to draw a bunch of objects lit up by the sun, which is in a fixed direction in the sky. Each surface will be drawn with a brightness based on the angle it makes with the light. After all, a surface pointed directly at the sun should be bright, almost white, while a surface perpendicular to it won’t be illuminated at all. To represent the angle each surface faces, you store the normal vector \(\b{n}\), which points out of the surface. The brightness scaling factor \(c\) of that surface is given by the dot product of the normal with the direction of the sun:

\[c = \b{n} \cdot \b{d}_{sun}\]

And this works fine for a while. But eventually you want your illuminated object to be transformed, say, so you end up transforming its coordinates several times before doing this calculation. You store the transformation as a matrix \(A\), so a vertex \(\b{v}\) now transforms to \(A \b{v}\). When it comes time to compute how the normals transform in the lighting calculation, you go with the obvious choice: \(\b{n} \mapsto A \b{n}\). But nothing looks right – every surface is the wrong brightness. You’re confused. You give up and search online and find something like this:

There is one problem which I don’t know how to show directly so I’m going to show it in a diagram. We’re multiplying the normal by the u_world matrix to re-orient the normals. What happens if we scale the world matrix? It turns out we get the wrong normals.

I’ve never bothered to understand the solution but it turns out you can get the inverse of the world matrix, transpose it, which means swap the columns for rows, and use that instead and you’ll get the right answer.

And at that point you either shrug and say “alright” and compute \(\b{n} \mapsto (A^{-1})^T \b{n}\)… or you quit learning computer graphics entirely and go back to trying to understand exterior algebra. That’s what I did. It’s a deep rabbit hole.
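(To see the symptom concretely, here’s a minimal numpy sketch, with a transformation matrix made up for illustration: the naively-transformed normal stops being perpendicular to the transformed surface, while the inverse-transpose version stays perpendicular, since \((A^{-1})^T \b{n} \cdot A \b{t} = \b{n}^T A^{-1} A \b{t} = \b{n} \cdot \b{t} = 0\) for any tangent vector \(\b{t}\).)

```python
import numpy as np

# A made-up transform: scale z by 2 and shear x into z.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 2.0]])

n = np.array([0.0, 0.0, 1.0])   # normal of a surface lying in the xy plane
t = np.array([1.0, 0.0, 0.0])   # a tangent vector in that surface, so n . t = 0

naive   = A @ n                   # transforming the normal like any other vector
correct = np.linalg.inv(A).T @ n  # transforming it with the inverse-transpose

print(np.dot(naive,   A @ t))     # 1.0: no longer perpendicular to the surface
print(np.dot(correct, A @ t))     # 0.0: still perpendicular
```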

There are other sources where you can learn all about bivectors and multivectors and everything else. I mention this to say that this stuff isn’t just abstract formal nonsense; it’s got actual implications if you use vector mathematics in real life.

The answer, by the way, is that a normal vector that we store as \(\b{n} = n_x \b{x} + n_y \b{y} + n_z \b{z}\) is, geometrically, a bivector, since it represents a unit of area. We’re just storing it as a vector. Its true value is

\[\b{n} = n_x \b{y \^ z} + n_y \b{z \^ x} + n_z \b{x \^ y}\]

Bivectors created from elements of \(\bb{R}^3\) are elements of a vector space called \(\^^2 \bb{R}^3\), which is spanned by \(\{ \b{ x \^ y, y \^ z, z \^ x }\}\). If vectors are transformed by a matrix \(A\), then bivectors transform according to a matrix called \(A^{\^ 2}\), which maps \(\^^2 \bb{R}^3 \ra \^^2 \bb{R}^3\) and is defined by the property:

\[(A^{\^ 2}) (\b{x \^ y}) \equiv A(\b{x}) \^ A(\b{y})\]

and likewise for \(\b{y} \^ \b{z}\) and \(\b{z \^ x}\). Applying this to \(\b{n}\):

\[\begin{aligned} A^{\^ 2} [n_x \b{y \^ z} + n_y \b{z \^ x} + n_z \b{x \^ y}] = \; &n_x (A\b{y}) \^ (A\b{z}) \\ + \; &n_y (A \b{z}) \^ (A \b{x}) \\ + \; &n_z (A \b{x}) \^ (A \b{y}) \end{aligned}\]

And it turns out that in three dimensions, \(A^{\^2} = \det(A) \, (A^{-1})^T\) (once you map \(\^^2 \bb{R}^3\) back to \(\bb{R}^3\)). That differs from \((A^{-1})^T\) only by an overall factor of \(\det(A)\), and since normals get normalized afterwards anyway, the factor doesn’t matter, which is how the computer graphics people get away with it.

For a concrete example, suppose our matrix is \(A = \text{diag}(1, 1, 2)\). So it doubles the \(z\) coordinate and leaves the other directions unchanged. The surface normal of an \(xy\) plane would naively be \(\b{z}\); after transformation, this is \(2 \b{z}\). But the bivector \(\b{x \^ y}\) is unchanged: \(A^{\^2} (\b{x \^ y}) = A\b{x} \^ A \b{y} = \b{x \^ y}\), since scaling \(z\) doesn’t change areas in the \(xy\) plane. Under the identification above that’s the normal \(\b{z}\), and dividing by \(\det(A) = 2\) gives \((A^{-1})^T \b{z} = \frac{1}{2} \b{z}\); neither one doubles the way the naive answer does.
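Here’s a quick numerical check of that, as a rough sketch; the helper name wedge2 is made up. It builds the matrix of \(A^{\^2}\) in the \(\b{(y \^ z, z \^ x, x \^ y)}\) basis by wedging the columns of \(A\) (under the identification above, each wedge is just a cross product), and compares it against \(\det(A)(A^{-1})^T\):

```python
import numpy as np

def wedge2(A):
    """Matrix of A^{^2} in the (y^z, z^x, x^y) basis: each column is
    A(e_i) ^ A(e_j), which under the identification with vectors is a
    cross product of columns of A."""
    Ax, Ay, Az = A[:, 0], A[:, 1], A[:, 2]
    return np.column_stack([np.cross(Ay, Az),   # image of y^z
                            np.cross(Az, Ax),   # image of z^x
                            np.cross(Ax, Ay)])  # image of x^y

A = np.diag([1.0, 1.0, 2.0])
print(wedge2(A))                                 # diag(2, 2, 1): the x^y entry is unchanged
print(np.allclose(wedge2(A), np.linalg.det(A) * np.linalg.inv(A).T))   # True

B = np.random.rand(3, 3)                         # and for a generic invertible matrix
print(np.allclose(wedge2(B), np.linalg.det(B) * np.linalg.inv(B).T))   # True
```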

This is a fairly trivial example (you’d probably be normalizing \(\b{n}\) to a unit vector anyway!), but it shows how naively using vectors to represent areas can go wrong, and it’s not surprising that more complicated procedures will fail more disastrously.


2. Multivector Matrices

Here are some notes about how linear transformations work on multivectors, because it took me a while to understand this and I had to write it out.

Some useful notation for multivectors:

A multivector’s components are indexed by the basis elements of the space it lives in, so, just as a vector \(\b{v}\) has a \(v_x\) component, a bivector \(\b{b}\) has a \(b_{x \^ y}\) component. You could ask for the \(b_{y \^ x} = b_{- x \^ y}\) coordinate, but it’s unnecessary. It’s like writing \(v_{-x}\) for \(- v_x\).

Note that the orderings of basis multivectors like \(\b{x \^ y}\) vs \(\b{y \^ x}\) are arbitrary. It’s convenient to use \(\b{(y \^ z, z \^ x, x \^ y)}\) as a basis for \(\^^2 \bb{R}^3\), but the math doesn’t care – different orderings are a basis change away. Same as how you’re free to use \((\b{z}, \b{x}, \b{y})\) as the basis of \(\bb{R}^3\), if you want.

I like to use multi-indexes to refer to multivector components, which are labeled with capital letters like \(I\) and refer to tuples: \(I = (i_1, i_2, \ldots i_k)\) (usually it’s obvious what \(k\) is). \(\b{v}_{\^ I}\) means:

\[\b{v}_{\^ I} = \b{v}_{i_1} \^ \b{v}_{i_2} \^ \ldots \^ \b{v}_{i_k} \in \^^k V\]

I like to abbreviate multi-indexes with just their digits, so \(\b{v}_{\b{x}_1 \^ \b{x}_2}\) might be written as \(\b{v}_{12}\).


2.1 Linear Transformations on \(\^^k V\)

Suppose we have a linear transformation \(A : U \ra V\). For any \(k\), \(A\) also induces a linear transformation on \(k\)-vectors, which we write as \(A^{\^ k}\) (some people write it as \(\^^k A\)):

\[A^{\^ k} \in \^^k U \ra \^^k V\]

\(A^{\^ k}\) is defined by its action on \(k\)-vectors of the form \(\b{u}_{\^ I} = \b{u}_{i_1} \^ \b{u}_{i_2} \^ \ldots \^ \b{u}_{i_k} \in \^^k U\) (and extended by linearity):

\[A^{\^ k} (\b{u}_{\^ I}) \equiv A(\b{u}_{i_1}) \^ A(\b{u}_{i_2}) \^ \ldots \^ A(\b{u}_{i_k})\]

Since \(A^{\^ k}\) is just a linear transformation, we can write it out as a matrix. If \(\dim U = m\) and \(\dim V = n\), then \(\dim \^^k V = {n \choose k}\) and \(\dim \^^k U = {m \choose k}\), so \(A^{\^ k}\) is an \({n \choose k} \times {m \choose k}\) matrix.

The key insight for understanding matrices such as \(A^{\^ k}\) is that they are just matrices, albeit in a different vector space (\(\^^k U \ra \^^k V\)), and all of the usual matrix knowledge applies.

Each column of \(A^{\^ k}\) is the action of \(A^{\^ k}\) on a particular basis \(k\)-vector of \(\^^k U\), which results in a linear combination of basis \(k\)-vectors of \(\^^k V\). It does not matter what order we write the columns in, as long as we’re consistent about it.
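To make this concrete, here’s a sketch of how one might build the matrix of \(A^{\^ k}\) straight from the definition in numpy. The function name wedge_power is made up, and it orders the basis \(k\)-vectors lexicographically rather than in the cyclic order I use for \(\^^2 \bb{R}^3\) elsewhere, so its rows and columns can differ from later examples by a reordering and some signs:

```python
import itertools
import numpy as np
from math import comb

def wedge_power(A, k):
    """Matrix of A^{^k}: the entry in row I, column J is the k x k minor of A
    built from the rows I and the columns J, with I and J running over
    k-element subsets in lexicographic order."""
    n, m = A.shape
    row_sets = list(itertools.combinations(range(n), k))
    col_sets = list(itertools.combinations(range(m), k))
    M = np.empty((len(row_sets), len(col_sets)))
    for i, I in enumerate(row_sets):
        for j, J in enumerate(col_sets):
            M[i, j] = np.linalg.det(A[np.ix_(I, J)])
    return M

A = np.random.rand(4, 3)                                    # a map R^3 -> R^4
print(wedge_power(A, 2).shape == (comb(4, 2), comb(3, 2)))  # True: (6, 3)

# Check the defining property on e_0 ^ e_1: that column of wedge_power(A, 2)
# should be A(e_0) ^ A(e_1), component by component.
u, v = A[:, 0], A[:, 1]
direct = [u[i] * v[j] - u[j] * v[i] for i, j in itertools.combinations(range(4), 2)]
print(np.allclose(wedge_power(A, 2)[:, 0], direct))         # True
```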

Important special cases:

  • \(A^{\^ 1}\) is just \(A\).
  • If \(A\) is square, then, setting \(\dim U = \dim V = n\), we have \(A^{\^ n} = \det (A) \, \b{u}_{\^ N}^T \o \b{v}_{\^ N}\), where \(N = (1, 2, \ldots, n)\) is the full multi-index. This is the determinant of \(A\), considered as a linear transformation \(\^^n U \ra \^^n V\), meaning that it has basis vectors attached rather than being a dimensionless scalar.
    • The scalar value \(\det (A)\) is the single coefficient of \(A^{\^ n}\), which is a \(1 \times 1\) matrix.
    • Sometimes it’s useful to write that coefficient as a trace: \(\det (A) = \tr A^{\^ n}\).
    • \(A^{\^ n}\) is a linear transformation from volumes of \(U\) to volumes of \(V\).
    • Most of the properties of \(\det\) follow directly from this interpretation. (e.g.: linearly dependent columns \(\ra\) degenerate volume \(\ra\) zero determinant; composition of transformations \(\ra\) multiplicativity of determinants; see the numerical sketch after this list.)
  • If \(U=V\) then \(A^{\^ n - 1}\) is the adjugate matrix of \(A\), except that it is written in the \(\^^{n-1} V\) basis, and \(A^{\^ n - 1} / \det(A)\) is \(A^{-1}\), except that it is written in the \(\^^{n-1} V\) basis. In a later post we will elaborate on the \(\star\) operator, which is the right way to map these back to the correct bases.
  • It’s useful to define \(A^{\^ 0}\) as \(1 \in \bb{R} \simeq \^^0 V\).
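The determinant and multiplicativity bullets above are easy to check numerically; here’s a minimal sketch reusing the wedge_power function from the earlier sketch:

```python
import numpy as np

# (uses wedge_power from the earlier sketch)
A = np.random.rand(3, 3)
B = np.random.rand(3, 3)

top = wedge_power(A, 3)                              # 1 x 1, since C(3,3) = 1
print(np.allclose(top[0, 0], np.linalg.det(A)))      # True: the single coefficient is det(A)
print(np.allclose(np.trace(top), np.linalg.det(A)))  # True: det(A) = tr A^{^n}

# The wedge power respects composition: (AB)^{^k} = A^{^k} B^{^k}.
# With k = n this is exactly the multiplicativity of determinants.
k = 2
print(np.allclose(wedge_power(A @ B, k),
                  wedge_power(A, k) @ wedge_power(B, k)))  # True
```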

Example:

Consider the sloped rectangle in \(\bb{R}^3\) formed by the points \(\{ a \b{x}, a\b{x} + \b{z}, b \b{y} + \b{z}, b \b{y} \}\). Two of its sides are given by \(\b{c}_1 = \b{z}\) and \(\b{c}_2 = b \b{y} -a \b{x}\), and its area is a bivector:

\[\b{c} = \b{c}_1 \^ \b{c}_2 = \b{z} \^ (b \b{y} - a \b{x}) = a (\b{x} \^ \b{z}) - b (\b{y} \^ \b{z}) \in \^^2 \bb{R}^3\]

It’s a linear combination of some area on the \(\b{xz}\) and \(\b{yz}\) planes. Its total scalar area can be computed by taking the magnitude of this bivector (\(\| \b{c} \| =\sqrt{a^2 + b^2}\)), which shows how areas and \(k\)-volumes in general obey a version of the Pythagorean theorem (more on that in the next post).

Suppose

\[A = \begin{pmatrix} 0 & 0 & -1 \\ 2 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}\]

The action of \(A^{\^2}\) on \(\b{c}\) is:

\[\begin{aligned} A^{\^2} \b{c} &= a (A \b{x}) \^ (A \b{z}) - b (A \b{y}) \^ (A \b{z}) \\ &= a (2 \b{y}) \^ (- \b{x}) - b (3 \b{z}) \^ (-\b{x}) \\ &= 2a (\b{x \^ y}) + 3 b (\b{z \^ x}) \end{aligned}\]

We see that \(A\) does not scale all areas by a single factor: this particular area \(\| \b{c} \| = \sqrt{a^2 + b^2}\) goes to \(\| A^{\^2} \b{c} \| = \sqrt{4a^2 + 9b^2}\), a ratio which depends on the orientation of \(\b{c}\). This shows how, if we want to treat the area of this rectangle as a geometric object, we should keep it in its bivector representation rather than mapping it back to a scalar value.
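Here’s a quick numerical double check of this example (a sketch, with arbitrary values plugged in for \(a\) and \(b\); bivectors are represented by their dual vectors via cross products):

```python
import numpy as np

a, b = 1.5, 0.7                            # arbitrary values for the side parameters
A = np.array([[0.0, 0.0, -1.0],
              [2.0, 0.0,  0.0],
              [0.0, 3.0,  0.0]])

c1 = np.array([0.0, 0.0, 1.0])             # z
c2 = np.array([-a,  b,   0.0])             # b y - a x

c = np.cross(c1, c2)                       # c1 ^ c2, written as its dual vector
print(np.allclose(np.linalg.norm(c), np.hypot(a, b)))           # True: area is sqrt(a^2 + b^2)

Ac = np.cross(A @ c1, A @ c2)              # A^{^2} c, written as its dual vector
print(np.allclose(np.linalg.norm(Ac), np.hypot(2 * a, 3 * b)))  # True: new area is sqrt(4a^2 + 9b^2)
```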


2.2 Multivector Matrix Notations

In this section let \(U = V = \bb{R}^3\). Then:

\[\begin{aligned} A^{\^0} &= 1 \\ A^{\^1} &= A = (A \b{x}, A \b{y}, A \b{z}) \\ A^{\^2} &= (A^{\^2}(\b{x} \^ \b{y}), A^{\^2}(\b{y} \^ \b{z}) , A^{\^2}(\b{z} \^ \b{x}) ) \\ A^{\^3} &= A^{\^3}(\b{x \^ y \^ z}) = A \b{x} \^ A \b{y} \^ A \b{z} = \det (A) \; \b{x \^ y \^ z} \end{aligned}\]

Let’s write out \(A^{\^2}\) componentwise. It’s a \(3 \times 3\) matrix: it has three columns and each of those is a bivector with three components.

To keep track of rows and columns I am going to use superscripts for column indexes and subscripts for row indexes. So, as you go down a column, the lower (row) index changes while the upper (column) index stays the same.

Here’s the \(\b{x \^ y}\) column of \(A^{\^ 2}\):

\[A^{\^2} (\b{x \^ y}) = A(\b{x}) \^ A(\b{y}) = \begin{pmatrix} A^{\^2}(\b{x \^ y})_{\b{x \^ y}} \\[3pt] A^{\^2}(\b{x \^ y})_{\b{y \^ z}} \\[3pt] A^{\^2}(\b{x \^ y})_{\b{z \^ x}} \end{pmatrix} = \begin{pmatrix} A_x^x A_y^y - A_x^y A_y^x \\[3pt] A_y^x A_z^y - A_y^y A_z^x \\[3pt] A_z^x A_x^y - A_z^y A_x^x \end{pmatrix}\]

Here’s the \(\b{x}^T \^ \b{y}^T\) row of \(A^{\^ 2}\):

\[\begin{aligned} (\b{x}^T \^ \b{y}^T) (A^{\^2}) &= (\b{x}^T A) \^ (\b{y}^T A) \\ &= \big( (A_x^x A_y^y - A_x^y A_y^x) , \; (A_x^y A_y^z - A_x^z A_y^y) , \; (A_x^z A_y^x - A_x^x A_y^z) \big) \end{aligned}\]

With a regular matrix we can extract components by contracting with a basis vector on each side; for instance \(A_y^x = \b{y}^T A \b{x}\). We can do the same here:

\[(\b{y}^T \^ \b{z}^T) A^{\^2}(\b{x \^ y}) = (A^{\^2})^{x \^ y}_{y \^ z} = \begin{vmatrix} A_y^x & A_y^y \\ A_z^x & A_z^y \end{vmatrix} = A_y^x A_z^y - A_y^y A_z^x\]

Each component \(A_{\^ I}^{\^ J}\) of \(A^{\^ k}\) is the determinant of a \(k \times k\) minor of \(A\): the minor created from the rows \(I\) and the columns \(J\) (possibly with a minus sign, depending on how your basis \(k\)-vectors are ordered). I prefer this notation because it emphasizes exactly what that value means: it is a component of the matrix \(A^{\^ k}\).

Minors whose rows and columns are the same set are called principal minors, and the diagonal elements of \(A^{\^ k}\) are the determinants of these. For instance:

\[(\b{x \^ y})^T A^{\^2} (\b{x \^ y}) = \begin{vmatrix} A_x^x & A_x^y \\ A_y^x & A_y^y \end{vmatrix} = A_x^x A_y^y - A_y^x A_x^y\]
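As a sanity check, here’s a sketch that extracts these components from a random matrix both ways: as \(2 \times 2\) minors, and directly from the wedge \(A\b{x} \^ A\b{y}\) computed as a cross product of columns:

```python
import numpy as np

A = np.random.rand(3, 3)

# A(x) ^ A(y), written as its dual vector; its components are the
# (y^z, z^x, x^y) coordinates of the x^y column of A^{^2}.
wedge_xy = np.cross(A[:, 0], A[:, 1])

# (A^{^2})^{x^y}_{y^z}: the minor built from rows {y, z} and columns {x, y}.
minor = np.linalg.det(A[np.ix_([1, 2], [0, 1])])
print(np.allclose(wedge_xy[0], minor))        # True

# The diagonal (x^y, x^y) component is the principal minor on rows and columns {x, y}.
principal = np.linalg.det(A[np.ix_([0, 1], [0, 1])])
print(np.allclose(wedge_xy[2], principal))    # True
```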

Note that we can expand a matrix in its columns or its rows first and get the same answer. Also note that, in general, \((\^^k V)^T \simeq \^^k (V^T)\), at least in finite-dimensional spaces (I think?), so \((A^{\^ k})^T = (A^T)^{\^ k}\) and we can compute whichever side is more convenient. Either way you end up with one coordinate for each pair of basis \(k\)-vectors, one indexing the row and one the column.

This notation gets pretty confusing. I propose that we allow ourselves to just write \(A_{x \^ y}^{x \^ y}\) when we mean \((A^{\^ 2})_{x \^ y}^{x \^ y}\). It’s smoother.


2.3 Maps between spaces with different dimensions

If \(A\) is a map between two vector spaces with different dimensions, we can still talk about \(A^{\^ k}\). Intuitively, this is a linear transformation which maps areas (or \(k\)-volumes) in one space to areas (\(k\)-volumes) in the other, and there’s no requirement that they have the same dimension for that to make sense.

If \(\dim U = m\) and \(\dim V = n\) and \(A : U \ra V\), then, as mentioned above, \(A^{\^ k}\) can be written as an \({n \choose k} \times {m \choose k}\) matrix.

Suppose \(A : \bb{R}^2 \ra \bb{R}^3\), and we label the basis vectors of \(\bb{R}^2\) as \(\b{u}\) and \(\b{v}\). Then:

\[\^^2 A = \begin{pmatrix} A_{x \^ y}^{u \^ v} \\[5pt] A_{y \^ z}^{u \^ v} \\[5pt] A_{z \^ x}^{u \^ v} \end{pmatrix}\]

Concretely:

\[A = \begin{pmatrix} a & b \\ c & d \\ e & f \end{pmatrix}\] \[\^^2 A = \begin{pmatrix} ad-bc \\ cf-de \\ eb - fa \end{pmatrix}\] \[\^^2 (A^T) = (\^^2 A)^T = \begin{pmatrix} ad - bc & cf - de & eb - fa \end{pmatrix}\]

No matter what, \(A^{\^ k}\) is a map \(\^^k U \ra \^^k V\), and if \(k > \dim U\) or \(k > \dim V\) then one of those spaces is zero-dimensional and \(A^{\^ k}\) is just \(0\). In the case above, \(A^{\^ 3} = 0\).
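Here’s a quick check of this \(\bb{R}^2 \ra \bb{R}^3\) example (a sketch; computed as a cross product, the bivector components come out in the \(\b{(y \^ z, z \^ x, x \^ y)}\) order rather than the order I wrote the column in above):

```python
import numpy as np

a, b, c, d, e, f = np.random.rand(6)
A = np.array([[a, b],
              [c, d],
              [e, f]])                     # a map R^2 -> R^3, columns indexed by u, v

# ^2 A sends the single basis bivector u ^ v to A(u) ^ A(v); as a dual
# vector that's just the cross product of the two columns.
uv = np.cross(A[:, 0], A[:, 1])
print(uv)                                  # [cf - de, eb - fa, ad - bc]
print(np.allclose(uv[2], a * d - b * c))   # True: the x^y component is ad - bc

# A^{^3} is the zero map: there are no 3-vectors in R^2, so its matrix has
# C(2, 3) = 0 columns (wedge_power(A, 3) from section 2.1 has shape (1, 0)).
```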

Note that if the spaces don’t have the same dimension, then there’s no concept of a scalar ‘determinant’ for linear transformations between them.

(Technically there is nothing stopping us from discussing a linear transformation which, say, maps areas in one space to volumes in another. But those are not going to be common – they are not geometrically ‘natural’. Besides the \(\star\) operator, which maps \(\^^k V \lra \^^{n-k} V\), we mostly only use maps from \(k\)-vectors to \(k\)-vectors.)


My other articles about Exterior Algebra:

  1. Oriented Areas and the Shoelace Formula
  2. Matrices and Determinants
  3. The Inner product
  4. The Hodge Star
  5. The Interior Product
  6. EA as Linearized Set Theory?
  7. Oriented Projective Geometry
  8. Simplex Volumes and Boundaries
  9. All the Exterior Algebra Operations