<h1><a href="https://alexkritchevsky.com/2021/10/31/software-1">Meditation on Software 1</a> (2021-10-31)</h1>
<p>There is something very wrong with how we write code.</p>
<!--more-->
<hr />
<h2 id="1">1</h2>
<p>For instance:</p>
<p>It takes an unreasonable amount of effort to do anything with software. And we still don’t do anything particularly well. How many millions of person-hours are spent on fixing bugs, or understanding confusing code?</p>
<p>And every significantly-sized project has enough complexity to require a team of specialists to support it, and still there is no useful overlap with the complexity of any other project.</p>
<p>And every company and organization is duplicating each other’s work because none of the solutions can be shared.</p>
<p>And every advance in hardware efficiency is canceled out by inefficiencies in software, so that everything is barely performant enough, all the time, and gobbles up any energy we have available to give it.</p>
<p>It is all <em>working</em> – kinda – in that humanity is churning out more software every day, solving problems and making money. But this can’t be ideal. The human race is spending too much effort to make software that doesn’t work very well and doesn’t do very much.</p>
<p>I like to fantasize about how to do better.</p>
<hr />
<h2 id="2">2</h2>
<p>Really it is that there is something <em>medieval</em> about how we write code. We are still in the software dark ages, like mathematics before algebra and calculus were discovered. The way software is written in five hundred years – if we haven’t run out of breathable air or microchips or whatever by then – will, I expect, be mostly unrecognizable compared to how it’s done today, and at best we are, as a species, 20% of the way along that path. (My guess is that we’re at like 15% overall and then React pushes the number to 20% in a few places.)</p>
<p>Here’s a test for assessing how good humanity is at writing software:</p>
<p>Suppose a spaceship of 1000 colonists is traveling to another star system, light-years away. And suppose this ship has to be totally self-sufficient, including having the ability to support all of its software systems, fixing bugs and improvising solutions to whatever comes up on the journey, and likely building out whatever is needed once they get there.</p>
<p>Can the colonists confidently expect to be able to handle whatever software challenges come up?</p>
<p>The answer is definitely ‘no’. The answer needs to be ‘yes’ if we are to colonize other star systems. I don’t want to ship off to another star system only to die partway of an unfixable bug in the life-support system. There is no way you can fill out the roster of the ship with expert software engineers, and there is no way a roster of non-experts, even if they are geniuses in other fields, can be expected to understand even one part of the ship end-to-end.</p>
<p>So we have work to do. It’s probably <em>possible</em>, but it will take some serious advances to get there.</p>
<hr />
<h2 id="3">3</h2>
<p>An analogy can be made to mechanical engineering. I don’t really know how my (gas-powered) car works. But if I open the hood and look at the engine, I feel like I have at least a hope of figuring it out. Apart from… the electronics… I can clearly tell which parts interact with which other parts, and approximately what they do to each other. Presumably if I take those parts apart I can tell how they work, approximately, internally, although I may not be able to put them back together again, or machine new parts of the same quality, without a lot of specialization.</p>
<p>But the fact I can make progress at all is valuable. If I took a long road trip away from civilization with just a box of tools and spare parts I have at least a <em>hope</em> of handling whatever comes up.</p>
<p>The difference, I think, is that physical machines are constrained by fundamental requirements of <em>causality</em>. For a widget to affect a gizmo, it has to, like, <em>touch</em> it, and there has to be some motive force between the two, which I can view and manipulate myself. Its physical interaction affords it a property of <em>scrutability</em> that allows me to make progress on understanding it. And if you take the widget apart, its internal components have the same property.<sup id="fnref:engine" role="doc-noteref"><a href="#fn:engine" class="footnote" rel="footnote">1</a></sup></p>
<p>Software today has no such property. It works exactly how it works, and good luck figuring it out from the outside.</p>
<p>The best I know of is browser devtools letting you view any website and see how it’s styled, but it barely counts. I hope that someday, figuring out any software is as natural as figuring out a physical machine like an old car engine.</p>
<p>As should be clear from this comparison, it’s not enough that software is open-source (although that’s a start). It must also be conceptualized and built in a way that makes causation clear and scrutable, and it needs to be split into scrutable modules that ‘push’ and ‘pull’ on each other in a way that we can follow. Most importantly, it needs to be constructed in such a way that allows for the digital equivalent of ‘opening up the hood and looking inside’, and we need to have the tools at hand to do so.</p>
<hr />
<h2 id="4">4</h2>
<p>I don’t think any of what I’m looking for exists today, outside of, perhaps, one-off proprietary solutions. But if I had to throw out some ideas, here’s where I think progress is happening:</p>
<p>The best IDE I know of is Chrome Devtools, except for the fact that it doesn’t let you write code (or really search for it, or really modify anything in a way that doesn’t get reversed the next time a callback is triggered). But it does something the rest of them don’t, which is let you record every piece of code that’s run on a page and inspect it to see what happened. Nevermind that this process is janky and error-prone; at least it <em>exists</em>. There is no future in having to add <code class="language-plaintext highlighter-rouge">print()</code> statements to find out what your code did.</p>
<p>The most scrutable way of writing code that I know of is in React. The declarative model is the right way to reason about UI code. The React Devtools are reasonably good at looking at something while it’s running, and, in some cases, modifying it. Hooks are better than any other way I’ve ever seen to reason about side effects, although in every case the whole philosophy is hamstrung by being implemented in Javascript and having to transpile to the DOM. And data processing and externalities are, as far as I know, still an unsolved problem, despite the efforts of the Redux ecosystem.</p>
<p>(Perhaps in the not-too-distant future there is a version of React whose shadow DOM <em>is</em> the DOM, and which runs in a language that doesn’t require dependency arrays, and which has first-class types built-in instead of shimmed on top, and in which you can’t make the mistake of forgetting to bind a function to the appropriate <code class="language-plaintext highlighter-rouge">this</code>, and whose debugger lets you follow asynchronous effects that are scheduled on later render frames. Wouldn’t that be a dream!)</p>
<p>At least when it comes to UI, there is a future where React Devtools, Figma, and your IDE are the same piece of software. And I think that in this world, user-facing code no longer has anything like unit tests, because it’s a waste of time to meticulously test code when you can look at it and observe it’s correct.</p>
<p>The best shell I know of is, I guess, Python. Bash and its descendants are a disaster and the world would be better off if they were entirely replaced. In the future there is no way that we’re going to be working in languages that use $PATH variables, that pipe unformatted string data through bizarrely-named commands inflected by obscure flags, or that require strings like <code class="language-plaintext highlighter-rouge">\u001b[31m</code> to colorize text. I mean, my god. (Once upon a time I had high hopes for <a href="https://github.com/unconed/TermKit">TermKit</a> but it never really got off the ground.)</p>
<p>I am not sure what the future of type systems is, but I know three things about it: 1. Constructing natural numbers out of successor functions is an irrelevant gimmick. 2. There will be no concept of ‘undefined behavior’ that survives the typechecker, because that’s insane. 3. <a href="https://en.wikipedia.org/wiki/Refinement_type">Refinement types</a> are going to happen at some point. It will be considered antiquated to use a language that can’t specify the type of ‘integers greater than 5’ in some ergonomic way.</p>
<p>Finally, I know this: most of the code written today isn’t any good, compared to what will be possible in the future. It’s not possible in today’s ecosystems to write something scalable, maintainable, and resilient to errors. It’s up to the frameworks and paradigms to develop the art of programming to the point where it’s actually an efficient and accessible craft instead of a massive timesink for the whole human race.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:engine" role="doc-endnote">
<p>Of course this falls apart when chemistry gets involved; you actually do need some specialized knowledge to make sense of, say, the actual combustion process. And I definitely don’t know much about engines so maybe it’s way harder than I think. <a href="#fnref:engine" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1><a href="https://alexkritchevsky.com/2020/10/15/ea-operations">All the Exterior Algebra Operations</a> (2020-10-15)</h1>
<p>More exterior algebra notes. This is a reference for (almost) all of the many operations that I am aware of in the subject. I will make a point of giving explicit algorithms and an explicit example of each, in the lowest dimension that can still be usefully illustrative.</p>
<p>Warning: very long.</p>
<!--more-->
<hr />
<h2 id="background-on-terminology-and-notations">Background on terminology and notations</h2>
<p>As far as I can tell, the same ideas underlying what I call ‘exterior’ algebra have been developed at least four separate times in four notations. A rough history is:</p>
<p>Grassmann developed the original ideas in the ~1840s, particularly in his <em>Ausdehnungslehre</em>, which unfortunately was never very well popularized, particularly because linear algebra hadn’t really been developed yet. Grassmann’s goal was, roughly, to develop ‘synthetic geometry’: geometry without any use of coordinates, where all of the operations act on abstract variables.</p>
<p>Some of Grassmann’s ideas made it into projective geometry, where multivectors are called ‘flats’ (at least in one book I have, by Stolfi) and typically handled in projective coordinates (in which the point \((x,y)\) is represented by any value \((\lambda x, \lambda y, \lambda)\)). Some ideas also made it into algebraic geometry, and there is some overlap with ‘algebraic varieties’; I don’t know much about this yet.</p>
<p>Cartan and others developed the theory of differential forms in the 1920s and included a few parts of Grassmann’s exterior algebra, which got the basics included in most algebra texts thereafter. Physicists adopted the differential forms notation for handling curved spaces in general relativity, so they got used to wedge products there. But most of vector calculus was eventually based on Hamilton’s quaternions from the ~1840s, simplified into its modern form by Heaviside in the ~1880s.</p>
<p>In the 1870s Clifford combined Hamilton and Grassmann’s ideas into ‘Clifford Algebras’, but they were largely forgotten in favor of quaternions and later vector analysis. Dirac accidentally re-invented Clifford algebras in the 1920s with the Dirac/gamma matrices in relativistic QM. Hestenes eventually figured this out and did a lot of work to popularize his ‘Geometric Algebra’ starting in the 1960s, and a small but vocal group of mostly physicists has been pushing for increased use of multivectors / GA since then. More on this later.</p>
<p>Rota and his students also discovered Grassmann at some point (the 1960s as well, I think?) and developed the whole theory again as part of what they called ‘invariant theory’, in which they called multivectors ‘extensors’. They have a lot of good ideas but their notations largely suck. Rota and co. also overlapped into ‘matroid’ theory, which deals with the abstract notion of linear dependence and so ends up using a lot of the same ideas.</p>
<p>So “multivectors”, “extensors”, and “flats” (and “matroids” in the context of real vector spaces) (and “varieties” in some cases?) basically are all the same thing. “Exterior product”, “wedge product”, “progressive product”, and “join” are all the same operation.</p>
<p>For the most part I greatly prefer notations and terminology based on vector algebra, so I stick with “multivector” and translate other things where possible. However, it is undeniable that the best name for the exterior product is the <strong>join</strong>, and its dual is the <strong>meet</strong>.</p>
<p>Everyone also picks their choice of scalar coefficients differently. I always pick the one that involves the fewest factorial terms, and I don’t care about making sure the choices generalize to finite fields.</p>
<p>Unfortunately, Cartan and the vector analysis folks definitely got the symbol \(\^\) for the exterior product wrong. Projective geometers and Rota got it right: it should be \(\vee\), rather than \(\^\). Join is to vector spaces what union is to sets, and union is \(\cup\). Meet (discussed below) is analogous to \(\cap\). (And linear subspaces form a lattice, which already uses the symbols \(\^\) and \(\v\) this way, plus the terminology ‘join’ and ‘meet’!)</p>
<p>I’m going to keep using \(\^\) for join here for consistency with most of the literature, but it’s definitely wrong, so here’s an open request to the world:</p>
<p><strong>If you ever write a textbook using exterior algebra that’s going to be widely-read, please fix this notation for everyone by swapping \(\^\) and \(\v\) back. Thanks.</strong></p>
<hr />
<h2 id="note-on-duality">Note on duality</h2>
<p>Since I am mostly concerned with eventually using this stuff for physics, I can’t ignore the way physicists handle vector space duality. The inner product of vectors is defined only between a vector and its dual, and contraction is performed using a metric tensor, so \(g: V \o V^* \ra \bb{R}\). In index notation this means you always pair a lower index with an upper one: \(\b{u} \cdot \b{v} = u_i v^i\).</p>
<p>However, I think most of this should be intuitive even on plain Euclidean space with an identity metric, so I prefer first presenting each equation with no attention paid to duality, then a version with upper and lower indices. I’ll mostly avoid including a metric-tensor version for space, but it can be deduced from the index-notation version.</p>
<p>An added complication is that there is an argument to be made that use of the dual vector space to define the inner product is a <em>mistake</em>. I am not exactly qualified to say if this is correct or not, but after everything I’ve read I suspect it is. The alternative to vector space duality is to define everything in terms of the volume form, so the inner product is defined by the relation:</p>
\[\alpha \^ \star \beta = \< \alpha, \beta \> \omega\]
<p>With \(\omega\) a choice of pseudoscalar. This means that the choice of metric becomes a choice of <em>volume form field</em>, which is actually pretty compelling. \(\< \alpha, \_ \>\) <em>is</em> a linear functional \(\in V^* \simeq V \ra \bb{R}\), and so counts as the dual vector space. But this can also make it tricky to define \(\star\), since some people think it should map vectors to dual vectors and vice versa.</p>
<p>Another idea is to interpret \(V^*\) as a “-1”-graded vector space relative to \(V\), such that \(\underset{-1}{a} \^ \underset{1}{b} = \underset{0}{(a \cdot b)}\). ‘Dual multivectors’ then have negative grades in general. This often seems like a good idea but I’m not sure about it yet.</p>
<p>Rota’s Invariant Theory school uses yet another definition of the inner product. They define the wedge product in terms of another operation, called a ‘bracket’ \([, ]\), so that \(\alpha \^ \star \beta = [\alpha, \beta] \omega\), but they also seem to treat the pseudoscalar as a regular scalar and so call this an inner product. I don’t think this is the right approach because I’m not comfortable forgetting the difference between \(\^^n \bb{R}\) and \(\bb{R}\), although as above I do like the idea of the volume form as defining the inner product. (They call the whole space equipped with such a bracket a ‘Peano space’. I don’t think the name caught on.)</p>
<hr />
<h2 id="1-the-tensor-product-o">1. The Tensor Product \(\o\)</h2>
<p>We should briefly mention the tensor product first. \(\o\) is the ‘free multilinear product’ on vector spaces. Multilinear means that \(u \o v\) is linear in both arguments: \((c_1 u_1 + c_2 u_2) \o v = c_1 (u_1 \o v) + (c_2 u_2 \o v)\), etc. <a href="https://en.wikipedia.org/wiki/Free_object">Free</a> means that any other multilinear product defined on vector spaces factors through \(\o\). Skipping some technicalities, this means if we have some other operation \(\ast\) on vectors which is multilinear in its arguments, then there is a map \(f\) with \(a \ast b = f(a \otimes b)\).</p>
<p>‘Free’-ness is generally a useful concept. \(\^\) happens to be the free <em>antisymmetric</em> multilinear product, so any other antisymmetric operation on the tensor algebra factors through \(\^\). There are ‘free’-r products than \(\o\) as well, if you let go of multilinearity and associativity.</p>
<p>\(\o\) acting on \(V\) (a vector space over \(\bb{R}\)) produces the ‘tensor algebra’ consisting of \(\o V = \bb{R} \oplus V \oplus V^{\o 2} \oplus \ldots\), with \(\o\) as the multiplication operation. There is a canonical inner product on any \(V^{\o n}\) inherited from \(V\)’s: \(\< \b{a} \o \b{b}, \b{c} \o \b{d} \> = \< \b{a}, \b{c} \> \< \b{b} , \b{d} \>\).</p>
<hr />
<h2 id="2-the-exterior-product-">2. The Exterior Product \(\^\)</h2>
<p>The basic operation of discussion is the exterior product \(\alpha \^ \beta\). Its most general definition is via the quotient of the tensor algebra by the relation \(x \o x \sim 0\) for all \(x\). Specifically, the exterior <em>algebra</em> is the algebra you get under this quotient; the exterior <em>product</em> is the behavior of \(\o\) under this algebra homomorphism.</p>
<p>Given a vector space \(V\) and tensor algebra \(\o V\), we define \(I\) as the ideal generated by elements of the form \(x \o x\) (so any tensor which contains two copies of the same vector). Then:</p>
\[\^ V \equiv (\o V) / I\]
<p>Elements in this quotient space are multivectors like \(\alpha \^ \beta\), and \(\o\) maps to the \(\^\) operation. If \(\pi\) is the canonical projection \(\o V \ra (\o V)/I\):</p>
\[\pi(\alpha) \^ \pi(\beta) \equiv \pi(\alpha \o \beta)\]
<p>In practice, you compute the wedge product of multivectors by just appending them, as the product inherits associativity from \(\o\) (with \(\| \alpha \| = m, \| \beta \| = n\)):</p>
\[\alpha \^ \beta = \alpha_1 \^ \ldots \^ \alpha_{m} \^ \beta_1 \^ \ldots \^ \beta_n\]
<p>There are several standard ways to map a wedge product back to a tensor product (reversing \(\pi\), essentially, so we’ll write it as \(\pi^{-1}\) although it is not an inverse). One is to select <em>any</em> valid tensor:</p>
\[\pi^{-1} (\alpha_1 \^ \ldots \^ \alpha_n) \stackrel{?}{=} \alpha_1 \o \ldots \o \alpha_n\]
<p>More useful, however, is to map the wedge product to a totally antisymmetrized tensor:</p>
\[\pi^{-1} \alpha = K \sum_{\sigma \in S_{m}} \sgn(\sigma) \alpha_{\sigma(1)} \o \ldots \o \alpha_{\sigma(m)}\]
<p>Where \(\sigma\) ranges over the permutations of \(m\) elements. This has \(m!\) terms for a basis vector \(\in \^^m \bb{R}^n\) (a more complicated formula with \({n \choose m}\) terms is needed for general elements of \(\^^m \bb{R}^n\) – but you can basically apply the above for every component). It is impractical for algorithms but good for intuition. \(K\) is a constant that is chosen to be either \(1\), \(\frac{1}{m!}\), or \(\frac{1}{\sqrt{m!}}\), depending on the source. I prefer \(K=1\) to keep things simple. Here’s an example:</p>
\[\pi^{-1}(\b{x} \^ \b{y}) = \b{x} \o \b{y} - \b{y} \o \b{x}\]
<p>Antisymmetric tensors that appear in other subjects are usually supposed to be multivectors. Antisymmetrization is a familiar operation in Einstein notation:</p>
\[\b{a} \^ \b{b} \^ \b{c} \equiv a_{[i} b_j c_{k]} = \sum_{\sigma \in S_3} \sgn(\sigma) a_{\sigma(1)} b_{\sigma(2)} c_{\sigma(3)}\]
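<p>As a concrete sketch of the antisymmetrization map (Python with numpy; the function names are mine, not standard), taking \(K = 1\):</p>

```python
import itertools
import numpy as np

def perm_sign(sigma):
    """Sign of a permutation, computed by counting inversions."""
    sign = 1
    for i in range(len(sigma)):
        for j in range(i + 1, len(sigma)):
            if sigma[i] > sigma[j]:
                sign = -sign
    return sign

def antisymmetrize(*vectors):
    """pi^{-1}(v_1 ^ ... ^ v_m): the totally antisymmetrized tensor, with K = 1."""
    m, n = len(vectors), len(vectors[0])
    out = np.zeros((n,) * m)
    for sigma in itertools.permutations(range(m)):
        term = vectors[sigma[0]]
        for i in sigma[1:]:
            term = np.multiply.outer(term, vectors[i])  # v_{sigma(1)} o ... o v_{sigma(m)}
        out = out + perm_sign(sigma) * term
    return out

x, y = np.eye(3)[0], np.eye(3)[1]
result = antisymmetrize(x, y)  # pi^{-1}(x ^ y) = x o y - y o x
```

<p>In particular <code class="language-plaintext highlighter-rouge">antisymmetrize(x, x)</code> vanishes, reflecting \(x \^ x = 0\).</p>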
<p>Other names:</p>
<ul>
<li>“Wedge product”, because it looks like a wedge</li>
<li>“Progressive Product” (by Grassmann and Gian-Carlo Rota). ‘Progressive’ because it increases grades.</li>
<li>“Join”, in projective geometry and lattice theory. So-called because the wedge product of two vectors gives the linear subspace spanned by them, if it is non-zero.</li>
</ul>
<p>As mentioned above, the symbol for ‘join’ in other fields is \(\vee\). Exterior algebra has it backwards. It’s definitely wrong: these operations in a sense generalize set-theory operations, and \(\^\) should correspond to \(\cup\).</p>
<hr />
<h2 id="3-the-inner-product--">3. The Inner Product \(\<, \>\)</h2>
<p>The multivector inner product is written \(\alpha \cdot \beta\) or \(\< \alpha, \beta \>\), where \(\alpha\) and \(\beta\) have the same grade.</p>
<p>There are several definitions that disagree on whether it should have any scaling factors like \(\frac{1}{k!}\), depending on the definition of \(\^\). I think the only reasonable definition is that \((\b{x \^ y}) \cdot (\b{x \^ y}) = 1\). This means that this is <em>not</em> the same operation as the <em>tensor</em> inner product, applied to antisymmetric tensors:</p>
\[(\b{x \^ y}) \cdot (\b{x \^ y}) \neq (\b{x \o y} - \b{y \o x}) \cdot (\b{x \o y} - \b{y \o x}) = 2\]
<p>But it’s just too useful to normalize the magnitudes of all basis multivectors. It avoids a lot of \(k!\) factors that would otherwise appear everywhere.</p>
<p>To compute, either antisymmetrize <em>both</em> sides in the tensor representation and divide by \(k!\), or just antisymmetrize one side (either one):</p>
\[\begin{aligned}
(\b{a \^ b}) \cdot (\b{c \^ d}) &= \frac{1}{2!}(\b{a \o b} - \b{b \o a}) \cdot (\b{c \o d} - \b{d \o c}) \\
&= (\b{a \o b}) \cdot (\b{c \o d} - \b{d \o c}) \\
&= (\b{a \cdot c}) (\b{b \cdot d}) - (\b{a \cdot d}) (\b{b \cdot c})
\end{aligned}\]
<p>This also gives the coordinate form:</p>
\[(\b{a \^ b}) \cdot (\b{c \^ d}) = a_i b_j c^{[i} d^{j]} = a_i b_j (c^i d^j - c^j d^i)\]
<p>Or in general:</p>
\[\< \alpha, \beta \> = \< \bigwedge \alpha_i, \bigwedge \beta_j \> = \det(\alpha_i \cdot \beta_j)\]
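<p>This Gram-determinant form is easy to sketch in code (Python with numpy; my naming, not from any library):</p>

```python
import numpy as np

def wedge_inner(alphas, betas):
    """<a_1 ^ ... ^ a_k, b_1 ^ ... ^ b_k> = det(a_i . b_j), the normalized convention."""
    gram = np.array([[np.dot(a, b) for b in betas] for a in alphas])
    return np.linalg.det(gram)

x, y = np.eye(3)[0], np.eye(3)[1]
unit = wedge_inner([x, y], [x, y])  # (x ^ y) . (x ^ y) = 1 under this convention
```

<p>For grade 2 this determinant expands to exactly the \((\b{a} \cdot \b{c})(\b{b} \cdot \b{d}) - (\b{a} \cdot \b{d})(\b{b} \cdot \b{c})\) formula above.</p>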
<hr />
<h2 id="4-the-interior-product-cdot">4. The Interior Product \(\cdot\)</h2>
<p>The interior product is the ‘curried’ form of the inner product:</p>
\[\< \b{a} \^\alpha, \beta \> = \< \alpha, \b{a} \cdot \beta \>\]
<p>This is written as either \(\b{a} \cdot \beta\) or \(\iota_{\b{a}} \beta\). Computation is done by antisymmetrizing the side with the larger grade, then contracting:</p>
\[\b{a} \cdot (\b{b \^ c}) = \b{a} \cdot (\b{b \o c} - \b{c \o b}) = (\b{a} \cdot \b{b}) \b{c} - (\b{a} \cdot \b{c}) \b{b}\]
<p>In index notation:</p>
\[\b{a} \cdot (\b{b \^ c}) = a_i b^{[i} c^{j]} = a_i (b^{i} c^{j} - b^j c^i)\]
<p>Other names: the “contraction” or “insertion” operator, because it inserts its left argument into some of the ‘slots’ in the inner product of the right argument.</p>
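<p>For the simplest case, a vector against a bivector, the computation above fits in one line (Python with numpy; a sketch with my own naming):</p>

```python
import numpy as np

def interior_vec_bivec(a, b, c):
    """a . (b ^ c) = (a . b) c - (a . c) b: insert a into the slots of b ^ c."""
    return np.dot(a, b) * c - np.dot(a, c) * b

x, y = np.eye(3)[0], np.eye(3)[1]
interior_vec_bivec(x, x, y)  # x . (x ^ y) = y
```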
<p><strong>The two-sided interior product</strong></p>
<p>Normally, in the notation \(\alpha \cdot \beta\), it’s understood that the lower grade is on the left, and the operation isn’t defined otherwise. But some people ignore this restriction, and I’m warming up to doing away with it entirely. I can’t see any reason not to define it to work either way.</p>
<p>When tracking dual vectors we need to be careful about which side ends up ‘surviving’. To be explicit, let’s track which ones we are considering as dual vectors:</p>
\[\b{x}^* \cdot (\b{x} \^ \b{y}) = \b{y} \\
(\b{x}^* \^ \b{y}^*) \cdot \b{x} = \b{y}^*\]
<p>Note that in both cases the vectors contract <em>left-to-right</em>. One vector / dual-vector is inserted into the ‘slots’ of the other dual-vector/vector. In coordinates, these are:</p>
\[\b{a}^* \cdot (\b{b \^ c}) = a_i (b^{[i} c^{j]})\]
\[(\b{b}^* \^ \b{c}^*) \cdot \b{a} = a^i(b_{[i} c_{j]})\]
<hr />
<h2 id="5-the-hodge-star-star">5. The Hodge Star \(\star\)</h2>
<p>\(\star\) produces the ‘complementary subspace’ to the subspace denoted by a multivector. It is only defined relative to a choice of pseudoscalar \(\omega\) – usually chosen to be all of the basis vectors in lexicographic order, like \(\b{x \^ y \^ z}\) for \(\bb{R}^3\). Then:</p>
\[\star \alpha = \alpha \cdot \omega\]
<p>A more common but less intuitive definition:</p>
\[\alpha \^ (\star \beta) = \< \alpha, \beta \> \omega\]
<p>The inner product and Hodge star are defined in terms of each other in various sources. For my purposes, it makes sense to assume the form of the inner product.</p>
<p>In practice, I compute \(\star \alpha\) in my head by finding a set of basis vectors such that \(\alpha \^ \star \alpha = \omega\) (up to a scalar). Explicit example in \(\bb{R}^4\):</p>
\[\star(\b{w} \^ \b{y}) = - \b{x \^ z}\]
<p>because</p>
\[\b{(w \^ y) \^ x \^ z} = - \b{w \^ x \^ y \^ z} = - \omega\]
<p>In Euclidean coordinates, \(\omega\) is given by the Levi-Civita symbol \(\epsilon_{ijk}\), and \(\star \alpha = \alpha \cdot \omega\) works as expected:</p>
\[\star(\b{a} \^ \b{b})_k = \epsilon_{ijk} a^i b^j\]
<p>This is using the convention that the \(\star\) of a vector is a lower-index dual vector. I’ve seen both conventions: some people would additionally map it back to a vector using the metric:</p>
\[\star(\b{a} \^ \b{b})^k = \epsilon_{ij}^k a^i b^j = g^{kl} \epsilon_{ijl} a^i b^j\]
<p>Either convention seems fine as long as you keep track of what you’re doing. They’re both valid in index notation, anyway; the only difference is choosing which is meant by \(\star \alpha\).</p>
<p>It is kinda awkward that \(\omega\) is the usual symbol for the pseudoscalar object but \(\e\) is the symbol with indices. It is amusing, though, that \(\e\) looks like a sideways \(\omega\). I’ll stick with this notation here but someday I hope we could just use \(\omega\) everywhere, since \(\e\) is somewhat overloaded.</p>
<p>\(\star\) is sometimes written \(\ast\), but I think that’s uglier. In other subjects it’s written as \(\star \alpha \mapsto \alpha^{\perp}\) which I do like.</p>
<p>We need a bit of notation to handle \(\star\) in arbitrary dimensions. We index with multi-indices of whatever grade is needed – for the Levi-Civita symbol, we write \(\e_{I}\) where \(I\) ranges over the one value, \(\omega\), of \(\^^n V\) (note: this is different from ranging over <em>every</em> choice of \(I\) with \(n!\) terms. Instead, we index by a single multivector term. It’s a lot easier.) To express contraction with this, we split the index into two multi-indices: \(\e_{I \^ J}\), so \(\star \alpha\) is written like this:</p>
\[(\star \alpha)_{K} = \alpha^I \e_{I K}\]
<p>The implicit sum is over every value of \(I \in \^^{\| \alpha \|} V\).</p>
<p>Note that in general \(\star^2 \alpha = (-1)^{k(n-k)} \alpha\), so \(\star^{-1} \alpha = (-1)^{k(n-k)} \star \alpha\).</p>
<hr />
<h2 id="6-the-cross-product-times">6. The Cross Product \(\times\)</h2>
<p>The cross-product is only defined in \(\bb{R}^3\) and is given by:</p>
\[\b{a} \times \b{b} = \star (\b{a} \^ \b{b})\]
<p>Some people say there is a seven-dimensional generalization of \(\times\), but they’re misguided. \(\star(\b{a} \^ \b{b})\), on the other hand, generalizes to every dimension.</p>
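<p>A quick numerical check (Python with numpy; my naming): building \(\epsilon_{ijk}\) directly and contracting it against two vectors reproduces the ordinary cross product.</p>

```python
import itertools
import numpy as np

def levi_civita(n):
    """The rank-n epsilon symbol as a dense array (sign by counting inversions)."""
    eps = np.zeros((n,) * n)
    for sigma in itertools.permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if sigma[i] > sigma[j]:
                    sign = -sign
        eps[sigma] = sign
    return eps

def star_wedge(a, b):
    """star(a ^ b)_k = eps_{ijk} a^i b^j, in Euclidean R^3."""
    return np.einsum('ijk,i,j->k', levi_civita(3), a, b)

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
star_wedge(a, b)  # equals np.cross(a, b)
```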
<hr />
<h2 id="7-the-partial-trace-cdot_k">7. The Partial Trace \(\cdot_k\)</h2>
<p>In index notation it is common to take a ‘partial trace’ of a tensor: \(c_i^k = a_{ij} b^{jk}\), and sometimes we see a partial trace of an antisymmetric tensor:</p>
\[c_i^k = a_{[ij]} b^{[jk]} = (a_{ij} - a_{ji})(b^{jk} - b^{kj}) = a_{ij} b^{jk} - a_{ji} b^{jk} - a_{ij} b^{kj} + a_{ji} b^{kj}\]
<p>For whatever reason I have never seen a coordinate-free notation for this for multivectors. But it’s actually an important operation, because if we treat bivectors as rotation operators on vectors, it’s how they compose:</p>
\[[(a \b{x} + b \b{y}) \cdot (\b{x \^ y})] \cdot (\b{x \^ y} ) = (a \b{y} - b \b{x}) \cdot (\b{x \^ y}) = - (a \b{x} + b \b{y})\]
<p>Which means that apparently</p>
\[R_{xy}^2 = (\b{x} \^ \b{y}) \circ (\b{x} \^ \b{y}) = -(\b{x \o x} + \b{y \o y})\]
<p>Note that the result <em>isn’t</em> a multivector. In general it’s an element of \(\^ V \o \^ V\).</p>
<p>But it’s still useful. What’s the right notation, though? Tentatively, I propose we write \(\cdot_k\) to mean contracting \(k\) terms together. The choice of <em>which terms</em> is a bit tricky. The geometric product, discussed later, suggests that we should do inner-to-outer. But the way we already handle inner products suggests left-to-right. For consistency let’s go with the latter, and insert \(-1\) factors as necessary.</p>
<p>The partial trace of two multivectors is implemented like this:</p>
\[\alpha \cdot_k \beta = \sum_{\gamma \in \^^k V} (\gamma \cdot \alpha) \o (\gamma \cdot \beta) \in \^ V \o \^ V\]
<p>Where the sum is over unit-length basis multivectors \(\gamma\). Note that this use of \(\o\) is <em>not</em> the multiplication operation in the tensor algebra we constructed \(\^ V\) from; rather, it is the \(\o\) of \(\^ V \o \^ V\). This translates to:</p>
\[[\alpha \cdot_k \beta]_{J K} = \alpha_{IJ} \beta^I_{K} = \delta^{IH} \alpha_{IJ} \beta_{HK}\]
<p>(That \(\delta\) is the identity matrix; recall that indexing it by multivectors \(I, H \in \^^k V\) means to take elements of \(\delta^{\^^k}\) which is the identity matrix on \(\^^k V\).)</p>
<p>This construction gives \((\b{x \^ y})^{(\cdot_1) ^2} = (\b{x \o x + y \o y}) = I_{xy}\), because we contracted the first indices together. When used on a vector as a rotation operator, we need a rule like this:</p>
\[R_{xy}^2 = - (\b{x \^ y})^{\cdot_1 2}\]
<p>In general, contracting operators that are going to act on grade-\(k\) objects gives \(O \circ O = (-1)^k O^{\cdot 2}\). But I don’t think it’s worth thinking too hard about this: the behavior is very specific to the usage.</p>
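<p>One way to check the \(R_{xy}^2\) computation concretely: in coordinates, the contraction \(a_{ij} b^{jk}\) of two bivectors is just a matrix product of their antisymmetric component matrices (Python with numpy; a sketch, my naming):</p>

```python
import numpy as np

def bivector_matrix(a, b):
    """Component matrix of a ^ b: (a ^ b)_{ij} = a_i b_j - a_j b_i."""
    return np.outer(a, b) - np.outer(b, a)

x, y = np.eye(3)[0], np.eye(3)[1]
M = bivector_matrix(x, y)
M @ M  # = -(x o x + y o y): applying x ^ y twice negates the xy-plane
```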
<p><strong>Partial Star:</strong></p>
<p>One funny thing we can do with a partial trace is apply \(\star\) to one component of a multivector :</p>
\[\star_k \alpha = \alpha \cdot_k \omega\]
<p>Example in \(\bb{R}^3\):</p>
\[\begin{aligned}
\star_1 \b{x \^ y} &= (\star \b{x}) \o \b{y} - (\star \b{y}) \o \b{x} \\
&= (\b{y \^ z}) \o \b{y} - (\b{z \^ x}) \o \b{x}
\end{aligned}\]
<p>I would have thought this was overkill and would never be useful, but it turns out it has a usage in the next section.</p>
<p><strong>Coproduct slice:</strong></p>
<p>Prior to this section I haven’t really considered tensor powers of exterior algebras like \(\^ V \o \^ V\) in general, except for wedge powers of matrices like \(\^^2 A\). But they do come up in the literature sometimes. Rota & Co had an operation they called the “coproduct slice” of a multivector, which splits a multivector in two by antisymmetrically replacing one of the \(\^\) positions with a \(\o\), like this:</p>
\[\p_{2,1} (\b{x \^ y \^ z}) = (\b{x \^ y}) \o \b{z} + (\b{y \^ z}) \o \b{x} + (\b{z \^ x}) \o \b{y}\]
<p>This gets at the idea that any wedge product (the free antisymmetric multilinear product) factors through the tensor product (the free multilinear product), and some concepts make more sense on the tensor product. For instance, it makes more sense to me to take the trace of two tensored terms than of two wedged terms. In general I am still trying to figure out for myself whether the “quotient algebra” or “antisymmetric tensor algebra” senses of \(\^\) are more important and fundamental, and the right way to think about the two.</p>
<p>Up to a sign, the coproduct slice can be implemented by tracing over the unit basis \(k\)-vectors:</p>
\[\p_{k, n-k} \beta = \sum_{\alpha \in \^^k V} \alpha \o (\alpha \cdot \beta )\]
<hr />
<h2 id="8-the-meet-vee">8. The Meet \(\vee\)</h2>
<p>\(\star\) maps every multivector to another one. Its action on the wedge product is to produce a dual operation \(\vee\), called the <em>meet</em> (recall that the wedge product is also aptly called the ‘join’).</p>
\[(\star \alpha) \vee (\star \beta) = \star(\alpha \^ \beta)\]
<p>The result is a complete exterior algebra because it’s isomorphic to one under \(\star\). So <em>both</em> of these are valid exterior algebras obeying the exact same rules:</p>
\[\^ V = (\^, V)\]
\[\vee V = (\vee, \star V)\]
<p>All operations work the same way if a \(\star\) is attached to every argument and we replace \(\^\) with \(\vee\):</p>
\[\star (\b{a} \^ \b{b}) = (\star \b{a}) \vee (\star \b{b})\]
<p>\(\vee \bb{R}^2\) is, for instance, spanned by \((\star 1, \star \b{x}, \star \b{y}, \star (\b{x} \^ \b{y})) = (\b{x \^ y}, \b{y}, - \b{x}, 1)\).</p>
<p>Sometimes \((\^, \v, V)\) is called a ‘double algebra’: a vector space with a choice of pseudoscalar and two dual exterior algebras. It’s also called the <a href="https://en.wikipedia.org/wiki/Grassmann%E2%80%93Cayley_algebra">Grassmann–Cayley Algebra</a>. I like to write it as \(\^{ \v }V\).</p>
<p>The meet is kinda weird. It is sorta like computing the intersection of two linear subspaces:</p>
\[(\b{x \^ y}) \vee (\b{y \^ z}) = (\star\b{z}) \vee (\star\b{x}) = \star (\b{z \^ x}) = \b{y}\]
<p>But it only works if the degrees of the two arguments add up to \(\geq n\):</p>
\[\b{x} \vee \b{y} = \star(\b{y \^ z} \^ \b{z \^ x}) = 0\]
<p>A general definition is kinda awkward, but we can do it using the \(\star_k\) operation from the previous section. It looks like this:</p>
\[\alpha \vee \beta = (\star_{\| \beta \|} \alpha) \cdot \beta\]
<p>The \(\beta\) will be inner-product’d with the \(\star\)‘d terms of \(\alpha\). Recall that \(\star_k \alpha\) becomes a sum of tensor products \((\star \alpha_1) \o \alpha_2\). We end up dotting \(\beta\) with the first term:</p>
\[\alpha \vee \beta = [\sum_{\alpha_1 \^ \alpha_2 = \alpha} (\star \alpha_1) \o \alpha_2] \cdot \beta = \sum_{\alpha_1 \^ \alpha_2 = \alpha} (\star \alpha_1 \cdot \beta) \alpha_2\]
<p>(This is a sum over ‘coproduct slices’ of \(\alpha\), in one sense. This kind of sum is called ‘Sweedler Notation’ in the literature.) This is non-zero only if \(\beta\) contains all of the basis vectors <em>not</em> in \(\alpha\). It makes more sense on an example:</p>
\[\begin{aligned}
(\b{x \^ y}) \vee (\b{y} \^ \b{z}) &= \star_1 (\b{x \^ y}) \cdot (\b{y} \^ \b{z}) \\
&= [(\b{y \^ z}) \o \b{y} - (\b{z \^ x}) \o \b{x}] \cdot (\b{y \^ z}) \\
&= \b{y}
\end{aligned}\]
<p>In index notation:</p>
\[(\alpha \vee \beta)_K = \e_{IJ} \alpha^{IK} \beta^{J}\]
<p>Or we can directly translate \((\star \alpha) \vee (\star \beta) = \star(\alpha \^ \beta)\):</p>
\[(\star \alpha \vee \star \beta)_L = (\star(\alpha \^ \beta))_L = \alpha^{I} \beta^{J} \e_{IJL}\]
<p>Note: I got exhausted trying to verify the signs on this, so they might be wrong. At some point I’ll come back and fix them.</p>
<p>Note 2: remember that \(\star^{-1} = (-1)^{k(n-k)} \star \neq \star\) in some dimensions, so you need to be careful about applying the duality to compute \(\vee\): \(\alpha \vee \beta = \star(\star^{-1} \alpha \^ \star^{-1} \beta)\). Also note that, since \(\vee\) is defined in terms of \(\star\), it is explicitly dependent on the choice of \(\omega\).</p>
<p>As mentioned above, the symbols for join and meet are definitely <em>swapped</em> in a way that’s going to be really hard to fix now. It should be meet = \(\^\), join = \(\vee\), so it matches usages everywhere else, as well as usages of \(\cup\) and \(\cap\) from set / boolean algebras.</p>
<p>Since \(\vee V\) is part of another complete exterior algebra, it also has all of the other operations, including a ‘dual interior product’ \(\alpha \cdot_{\vee} \beta\). I have never actually seen it used, but it exists.</p>
<hr />
<h2 id="9-relative-vee_mu-_mu-and-star_mu">9. Relative \(\vee_\mu\), \(\^_\mu\), and \(\star_\mu\)</h2>
<p>We saw that \(\star\), and by extension \(\vee\), are defined relative to a choice of pseudoscalar \(\omega\). What if we choose differently? It turns out that this is actually occasionally useful – I saw it used in <em>Oriented Projective Geometry</em> by Jorge Stolfi, which develops basically all of exterior algebra under an entirely different set of names. We write \(\star_{\mu}\) and \(\vee_{\mu}\) for the star / meet operations relative to a ‘universe’ multivector \(\mu\):</p>
\[\star_{\mu} \alpha = \alpha \cdot \mu\]
\[(\star_\mu \alpha) \vee_{\mu} (\star_\mu \beta) = \star_{\mu} (\alpha \^ \beta)\]
<p>The regular definitions set \(\mu = \omega\). The resulting exterior algebra shows us that any subset of the basis vectors of a space forms an exterior algebra itself. In case this seems like pointless abstraction, I’ll note that it does come up, particularly when dealing with projective geometry. If \(\b{w}\) is a projective coordinate, we can write the projective \(\star_{\b{wxyz}}\) in terms of \(\star_{\b{xyz}}\):</p>
\[\star_{\b{wxyz}}( w \b{w} + x \b{x} + y \b{y} + z \b{z}) = \b{w} \^ \star_{\b{xyz}}(x\b{x} + y\b{y} +z \b{z}) + w (\b{x \^ y \^ z})\]
<p>There is also a way to define \(\^\) relative to a ‘basis’ multivector, \(\^_{\nu}\). The behavior is to join two multivectors ignoring their component along \(\nu\):</p>
\[(\nu \^ \alpha) \^_{\nu} (\nu \^ \beta) = \nu \^ (\alpha \^ \beta)\]
<p>For unit \(\nu\), this can be implemented as:</p>
\[\alpha \^_{\nu} \beta = \nu \^ (\nu \cdot \alpha) \^ (\nu \cdot \beta)\]
<p>It’s neat that for choices of \(\nu, \mu\), we can produce another exterior double algebra embedded within \((\^, \v, V)\):</p>
\[(\^_{\nu}, \v_{\mu}, \nu, \mu, V)\]
<p>Our regular choice of exterior algebra on the whole space is then given by:</p>
\[(\^, \v, V) = (\^_1, \v_\omega, 1, \omega, V)\]
<hr />
<h2 id="10-the-geometric-product-alphabeta">10. The Geometric Product \(\alpha\beta\)</h2>
<p>There is much to say about <a href="https://en.wikipedia.org/wiki/Geometric_algebra">Geometric algebra</a> and the ‘geometric product’. (Other names: “Clifford Algebra”, “Clifford Product”.)</p>
<p>GA is how I got into this stuff in the first place, but I avoid using the name for the most part because there is some social and mathematical baggage that comes with it. But its proponents deserve credit for popularizing the ideas of multivectors in the first place – I’m pretty sure we all agree that multivectors, as a concept, should be used and taught everywhere.</p>
<p>The social baggage is: the field, while perfectly credible in theory, tends to attract an unusual rate of cranks (many of them ex-physics students who want to ‘figure it all out’ – like myself! I might be a crank. I’m not sure.). The mathematical baggage is the proliferation of notations that are hard to use and not very useful.</p>
<p>The geometric product is a generalization of complex- and quaternion-multiplication to multivectors of any grade. The inputs and outputs are linear combinations of multivectors of any grade. It’s generally defined as another quotient of the tensor algebra: instead of \(x \o x \sim 0\), which defined the exterior algebra, we use \(x \o y \sim - y \o x\) for distinct orthogonal basis vectors (so we can still exchange positions of elements in a tensor) but \(x \o x \sim 1\). This means duplicate tensor terms are just replaced with \(1\) in tensor products, rather than annihilating the whole thing, like this:</p>
\[x \o x \o y \o x \o y \sim (x \o x) \o y \o (-y) \o x \sim -x\]
<p>The geometric product is the action of \(\o\) under this equivalence relation. In geometric algebra texts it is written with juxtaposition, since it generalizes scalar / complex multiplication that are written that way. I’ll do that for this section.</p>
\[(\b{xy})(\b{xyz}) = (\b{xy}) (-\b{yxz}) = -(\b{x})(\b{xz}) = -\b{z}\]
<p>It’s associative, but not commutative or anticommutative in general.</p>
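<p>The sign bookkeeping here is mechanical enough to sketch in a few lines. This is a toy implementation of the geometric product on single basis blades with a Euclidean metric (\(e_i^2 = +1\)); the blade representation and function name are my own inventions:</p>

```python
def gp(a, b):
    # Geometric product of two basis blades in a Euclidean metric (e_i^2 = +1).
    # A blade is a (coefficient, tuple-of-increasing-basis-indices) pair.
    (ca, I), (cb, J) = a, b
    idx = list(I) + list(J)
    # sign: count the transpositions needed to sort the combined index list
    swaps = sum(1 for i in range(len(idx))
                  for j in range(i + 1, len(idx)) if idx[i] > idx[j])
    sign = -1 if swaps % 2 else 1
    idx.sort()
    # cancel repeated indices in adjacent pairs, since e_i e_i = 1
    out, k = [], 0
    while k < len(idx):
        if k + 1 < len(idx) and idx[k] == idx[k + 1]:
            k += 2
        else:
            out.append(idx[k])
            k += 1
    return (ca * cb * sign, tuple(out))

xy, xyz = (1, (0, 1)), (1, (0, 1, 2))
print(gp(xy, xyz))   # (-1, (2,)): (xy)(xyz) = -z, matching the example above
print(gp(xy, xy))    # (-1, ()): (xy)^2 = -1, just like i^2
```

<p>Sorting the concatenated index list while counting transpositions implements the anticommutation rule, and canceling repeated indices in pairs implements \(x \o x \sim 1\).</p>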
<p>The primary reason to use this operation is that its implementations on \(\bb{R}^2\), \(\bb{R}^3\), and \(\bb{R}^{3,1}\) are already used:</p>
<ul>
<li>The geometric product on even-graded elements of \(\bb{R}^2\) implements complex multiplication.</li>
<li>The geometric product on even-graded elements of \(\bb{R}^3\) implements quaternion multiplication.</li>
<li>The geometric product on four elements \((\b{t, x, y, z})\) with the metric \(t^2 = 1, \; x^2 = y^2 = z^2 = -1\) is implemented by the <a href="https://en.wikipedia.org/wiki/Gamma_matrices">gamma matrices</a> \(\gamma^{\mu}\) which are used in quantum mechanics.
<ul>
<li>(I won’t discuss the alternate metric in this article, but it’s done by using \(x \o x \sim Q(x,x)\) in the quotient construction of the algebra, where \(Q\) is the symmetric bilinear form that’s providing a metric.)</li>
</ul>
</li>
</ul>
<p>Geometric algebra tends to treat the geometric product as fundamental, and then produce the operations from it. For vectors, the definitions are:</p>
\[\< \b{a}, \b{b} \> = \frac{1}{2}(\b{ab + ba})\]
\[\b{a} \^ \b{b} = \frac{1}{2}(\b{ab - ba})\]
<p>But we could also define things the other way:</p>
\[\b{ab} = \b{a \cdot b} + \b{a \^ b}\]
<p>Multivector basis elements are just written by juxtaposing the relevant basis vectors, since \(\b{xy} = \b{x \^ y}\). I like this notation and should start using it even if I avoid the geometric product; it would save a lot of \(\^\)s.</p>
<p>To define the geometric product in terms of the other operations on this page, we need to define the <strong>reversion</strong> operator, which inverts the order of the components in a geometric product (with \(k\) as the grade of the argument):</p>
\[(abcde)^{\dag} = edcba = (-1)^{k(k-1)/2} (abcde)\]
<p>This generalizes complex conjugation, since it takes \(\b{xy} \ra -\b{xy}\) in \(\bb{R}^2\) and \(\bb{R}^3\). It allows us to compute geometric products, which contract elements from inner to outer, using the operations already defined on this page, all of which contract left-to-right. The general algorithm for producing geometric products out of the previously-mentioned operations is then to project onto <em>every</em> basis multivector:</p>
\[\alpha \beta = \sum_{\gamma \in \^ V} (\gamma \cdot \alpha^\dag) \^ (\gamma \cdot \beta)\]
<p>This translates into index notation as:</p>
\[\alpha \beta = \sum_{\gamma \in \^ V} (-1)^{\| \alpha \| ( \| \alpha \| -1)/2} \gamma_I \gamma_K \alpha^{I}_{[J}\beta^{K}_{L]}\]
<p>I think we can agree that’s pretty awkward. But it’s hard to be sure what to do with it. Clearly it’s <em>useful</em>, at least in the specific cases of complex and quaternion multiplication.</p>
<p>My overall opinion on the geometric product is this:</p>
<ul>
<li>I <em>tentatively</em> think that it is mis-defined to use inner-to-outer contraction, because of the awkward signs and conjugation operations that result.
<ul>
<li>I suspect the appeal of defining contraction this way was to make \((\b{xy})^2 = -1\), in order to produce something analogous to \(i^2 = -1\). But imo it’s really much more elegant if all basis elements have \(\alpha^2 = 1\).</li>
<li>If we want to preserve the existence of a multiplication operation with \(\alpha^2 = -1\), we can <em>define</em> the geometric product as \(\alpha \beta = \alpha^{\dag} \cdot \beta\) or something like that. Maybe.</li>
<li>Associativity is really nice, though. So maybe it’s my definition of the other products that’s wrong for doing away with it.</li>
</ul>
</li>
<li>However, it works suspiciously well for complex numbers, quaternions, and gamma matrices.</li>
<li>And it works suspiciously well for producing something that acts like a multiplicative inverse (see below).</li>
<li>But I know of almost zero cases where mixed-grade multivectors are useful, except for sums of “scalars plus one grade of multivector”.</li>
<li>I can’t find any general geometric intuition for the product in general.</li>
<li>So I’m mostly reserving judgment on the subject, until I figure out what’s going on more completely.</li>
</ul>
<hr />
<p><strong>Other operations of geometric algebra</strong></p>
<p>Unfortunately geometric algebra is afflicted by way too many other unintuitive operations. Here’s most of them:</p>
<ol>
<li><strong>Grade projection</strong>: \(\< \alpha \>_k = \sum_{\gamma \in \^^k V} (\gamma \cdot \alpha) \o \gamma\) extracts the \(k\)-graded terms of \(\alpha\).</li>
<li><strong>Reversion</strong>: \((abcde)^{\dag} = edcba = (-1)^{r(r-1)/2} (abcde)\). Generalizes complex conjugation.</li>
<li><strong>Exterior product</strong>: same operation as above, but now defined \(A \^ B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{r + s}\)</li>
<li><strong>Commutator product</strong>: \(A \times B = \frac{1}{2}(AB - BA)\). I don’t know what the point of this is.</li>
<li><strong>Meet</strong>: same as above, but now defined \(A \vee B = I [(AI^{-1}) \^ (BI^{-1})]\). GA writes the pseudoscalar as \(I\) and \(AI^{-1} = \star^{-1} A\).</li>
<li><strong>Interior product</strong>: for some reason there are a bunch of ways of doing this.
<ul>
<li><strong>Left contraction</strong>: \(A ⌋ B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{r - s}\)</li>
<li><strong>Right contraction</strong>: \(A ⌊ B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{s - r}\)</li>
<li><strong>Scalar product</strong>: \(A * B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{0}\)</li>
<li><strong>Dot product</strong>: \(A \cdot B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{\| s - r \|}\)</li>
</ul>
</li>
<li>There are a few other weird ‘conjugation’ operations (see <a href="https://en.wikipedia.org/wiki/Paravector">here</a>) but I think they’re thankfully fading out of usage.</li>
</ol>
<hr />
<h2 id="11-multivector-division-alpha-1">11. Multivector division \(\alpha^{-1}\)</h2>
<p>Ideally division of multivectors would produce a multivector \(\alpha^{-1}\) that inverts \(\^\):</p>
\[\frac{\alpha \^ \beta}{\alpha} = \beta\]
<p>There are several problems with this, though. One is that \(\alpha \^ \beta\) may be \(0\). Another is that \(\^\) isn’t commutative, so presumably \(\alpha^{-1} (\alpha \^ \beta)\) and \((\alpha \^ \beta) \alpha^{-1}\) are different. Another is that \(\beta + K \alpha\) is also a solution for any \(K\):</p>
\[\alpha \^ (\beta + K \alpha) = \alpha \^ \beta\]
<p>Or for any multivector \(\gamma\) with \(\gamma \^ \alpha = 0\):</p>
\[\alpha \^ (\beta + \gamma) = \alpha \^ \beta\]
<p>So there are at least a few ways to define this.</p>
<p><strong>Multivector division 1</strong>: Use the interior product and divide out the magnitude:</p>
\[\alpha^{-1} \beta = \frac{\alpha}{\| \alpha \|^2} \cdot \beta\]
<p>This gives up on trying to find <em>all</em> inverses, and just identifies one of them. It sorta inverts the wedge product, except it extracts only the orthogonal component in the result:</p>
\[\b{a}^{-1} (\b{a} \^ \b{b}) = \frac{\b{a}}{\| \b{a} \|^2} \cdot (\b{a} \^ \b{b}) = \b{b} - \frac{\b{a} (\b{a} \cdot \b{b})}{\| \b{a} \|^2} = \b{b} - \b{b}_{\parallel \b{a}} = \b{b}_{\perp \b{a}}\]
<p>The result is the ‘rejection’ of \(\b{b}\) off of \(\b{a}\). It doesn’t quite ‘invert’ \(\^\), but it’s a pretty sensible result. It is commutative due to our definition of the two-sided interior product (both terms contract left-to-right either way). If \(\b{a \^ b} = 0\) in the first place, then this rightfully says that \(\b{b}_{\perp \b{a}} = 0\) as well, which is nice.</p>
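<p>This is easy to sanity-check numerically, using the expansion \(\b{a} \cdot (\b{a \^ b}) = \| \b{a} \|^2 \b{b} - (\b{a \cdot b}) \b{a}\) from the computation above. A sketch with numpy; in \(\bb{R}^3\) the same rejection can be cross-checked against \(-\b{a} \times (\b{a} \times \b{b}) / \| \b{a} \|^2\), and the vectors here are arbitrary choices of mine:</p>

```python
import numpy as np

a = np.array([1., 2., 2.])
b = np.array([3., 0., 1.])

# "divide" a out of a ^ b: the interior product (a / |a|^2) . (a ^ b)
rej = b - a * (a @ b) / (a @ a)

print(a @ rej)   # ~0: the rejection is orthogonal to a
# in R^3 the same thing can be computed with cross products,
# since a . (a ^ b) = -a x (a x b):
print(np.allclose(rej, -np.cross(a, np.cross(a, b)) / (a @ a)))  # True
```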
<p><strong>Multivector division 2</strong>: Allow the result to be some sort of general object, not a single value:</p>
\[\alpha^{-1} \beta = \frac{\alpha}{\| \alpha \|^2} \cdot \beta + K\]
<p>where \(K\) is “the space of all multivectors \(\gamma\) with \(\alpha \^ \gamma = 0\)”. This operation produces the true preimage of multiplication via \(\^\), at the loss of an easy way to represent the result. But I suspect this definition is good and meaningful and is sometimes necessary to get the ‘correct’ answer.</p>
<p><strong>Multivector division 3</strong>: Use the geometric product.</p>
<p>The geometric product produces something that actually <em>is</em> division on GA’s versions of complex numbers and quaternions (even-graded elements of \(\^ \bb{R}^2\) and \(\^ \bb{R}^3\)):</p>
\[a^{-1} b = \frac{a^{\dag} b}{a^{\dag} a} = \frac{a^{\dag} b}{\| a \|^2}\]
<p>This is only defined for \(\| a \| \neq 0\) (remember, since GA has elements with \(\alpha^2 = -1\), you can have \(\| 1 + i \|^2 = 1^2 + i^2 = 0\)). You can read a lot about this inverse online, such as how to use it to reflect and rotate vectors.</p>
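<p>For the complex-number case the reversion \(a^{\dag}\) is just conjugation, so this inverse reduces to the familiar \(a^{-1} = \bar{a} / \| a \|^2\). A quick check in plain Python (the particular numbers are arbitrary):</p>

```python
a = 1 + 2j
b = 3 + 4j
a_inv = a.conjugate() / abs(a)**2   # reversion = conjugation for complex numbers
print(a_inv * a)                    # 1, up to floating point
print(a_inv * b)                    # same as b / a
```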
<hr />
<p>Cut for lack of time or knowledge:</p>
<ul>
<li>Exterior derivative and codifferential</li>
<li><a href="https://en.wikipedia.org/wiki/Cap_product">Cup and cap product</a> from algebraic topology. As far as I can tell these essentially implement \(\^\) and \(\vee\) on co-chains, which are more-or-less isomorphic to multivectors.</li>
</ul>
<hr />
<p>Other articles related to Exterior Algebra:</p>
<ol start="0">
<li><a href="/2018/08/06/oriented-area.html">Oriented Areas and the Shoelace Formula</a></li>
<li><a href="/2018/10/08/exterior-1.html">Matrices and Determinants</a></li>
<li><a href="/2018/10/09/exterior-2.html">The Inner product</a></li>
<li><a href="/2019/01/26/hodge-star.html">The Hodge Star</a></li>
<li><a href="/2019/01/27/interior-product.html">The Interior Product</a></li>
<li><a href="/2020/10/15/ea-operations.html">All the Exterior Algebra Operations</a></li>
</ol>
<hr />
<h2>The essence of complex analysis</h2>
<p>2020-08-10 · <a href="https://alexkritchevsky.com/2020/08/10/complex-analysis">alexkritchevsky.com/2020/08/10/complex-analysis</a></p>
<p>Rapid-fire non-rigorous intuitions for calculus on complex numbers. Not an introduction to the subject.</p>
<!--more-->
<p>Contents:</p>
<ul id="markdown-toc">
<li><a href="#1-the-complex-plane" id="markdown-toc-1-the-complex-plane">1. The complex plane</a></li>
<li><a href="#2-holomorphic-functions" id="markdown-toc-2-holomorphic-functions">2. Holomorphic functions</a></li>
<li><a href="#3-residues" id="markdown-toc-3-residues">3. Residues</a></li>
<li><a href="#4-integral-tricks" id="markdown-toc-4-integral-tricks">4. Integral tricks</a></li>
<li><a href="#5-topological-concerns" id="markdown-toc-5-topological-concerns">5. Topological concerns</a></li>
<li><a href="#6-convergence-concerns" id="markdown-toc-6-convergence-concerns">6. Convergence concerns</a></li>
<li><a href="#7-global-laurent-series" id="markdown-toc-7-global-laurent-series">7. Global Laurent Series</a></li>
</ul>
<hr />
<h2 id="1-the-complex-plane">1. The complex plane</h2>
<p>Calculus on \(\bb{C}\) is more-or-less just calculus on \(\bb{R}^2\), under the substitutions:</p>
\[\begin{aligned}
i &\lra R \\
a + bi & \lra (a + R b) \hat{x} = a \hat{x} + b \hat{y}
\end{aligned}\]
<p>Where \(R\) is the “rotation operator”, the linear map that rotates the plane by \(90^\circ\). The identity \(\cos \theta + i \sin \theta = e^{i \theta}\) follows from applying the <a href="https://en.wikipedia.org/wiki/Exponential_map">exponential map</a> to \(R\) as the generator of rotations. If I had my way we would not use complex numbers ever and would just learn the subject as ‘calculus using rotation operators’ to avoid a proliferation of things that seem like magic.</p>
<p>The one way that \(\bb{C}\) is more than just \(\bb{R}^2\) is that there is a definition of multiplying two vectors:</p>
\[(a + b i) (c + d i) = (ac - bd) + (ad + bc) i\]
<p>The best way I know to interpret this is as follows. The correspondence \(a + bi \Ra (a + R b) \hat{x}\) suggests that we interpret a complex number as an operator that is understood to ‘act on’ the \(\hat{x}\) basis vector. In this sense both adding and multiplying complex numbers are natural operations: adding them applies both operators to \(\hat{x}\) and adds the results; multiplying them applies them sequentially.</p>
\[[(a + b R) \circ (c + d R) ](\hat{x}) = [(ac - bd) + (ad + bc) R] (\hat{x})\]
<p>This model is especially appealing because it is easy to extend to higher dimensions.</p>
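<p>Here is a small numpy sketch of that model, with \(R\) as the \(90^\circ\) rotation matrix; the helper name is made up for illustration:</p>

```python
import numpy as np

Id = np.eye(2)
R = np.array([[0., -1.],
              [1., 0.]])            # rotation by 90 degrees; plays the role of i

def as_operator(a, b):
    # the complex number a + bi as the operator a*Id + b*R
    return a * Id + b * R

# composing operators matches complex multiplication: (1+2i)(3+4i) = -5+10i
prod = as_operator(1, 2) @ as_operator(3, 4)
print(prod @ np.array([1., 0.]))    # [-5. 10.], i.e. -5 + 10i acting on x-hat
```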
<p>So we want to do calculus on these operators on \(\bb{R}^2\). We start by identifying derivatives and differential forms. The differentials of the coordinate vectors are:</p>
\[\begin{aligned}
dz &= dx + i dy \\
d\bar{z} &= dx - i dy
\end{aligned}\]
<p>The partial derivatives are for some reason given the name <a href="https://en.wikipedia.org/wiki/Wirtinger_derivatives">Wirtinger derivatives</a>:</p>
\[\begin{aligned}
\p_z &= \frac{1}{2}(\p_x - i \p_y) \\
\p_{\bar{z}} &= \frac{1}{2}(\p_x + i \p_y)
\end{aligned}\]
<p>Note that the signs are swapped, compared to the forms, and factors of \(\frac{1}{2}\) have appeared. These are necessary because of the requirement that \(\p_z (z) = \p_{\bar{z}} (\bar{z}) = 1\). In an alternate universe both sides might be given \(\frac{1}{\sqrt{2}}\) factors instead.</p>
<p>There are other parameterizations of \(\{ z, \bar{z} \}\) in terms of \(\bb{R}^2\) coordinates. The most common choice is polar coordinates: \(z = re^{i \theta}\) and \(\bar{z} = r e^{-i \theta}\). Then the forms are:</p>
\[\begin{aligned}
dz &= e^{i \theta} (dr + i r d \theta) \\
d\bar{z} &= e^{-i \theta} (dr - i r d \theta)
\end{aligned}\]
<p>Then the partial derivatives would be:</p>
\[\begin{aligned}
\p_z &= \frac{e^{-i \theta}}{2} (\p_r - \frac{i}{r} \p_\theta) \\
\p_{\bar{z}} &= \frac{e^{i \theta}}{2} (\p_r + \frac{i}{r} \p_\theta)
\end{aligned}\]
<p>Although these don’t come up very much. Note that any function that explicitly uses \(r\) or \(\theta\) has a \(\bar{z}\) dependency unless they cancel it out somehow, since both \(r\) and \(\theta\) do:</p>
\[\begin{aligned}
r &= \sqrt{z \bar{z}} \\
\theta &= - \frac{i}{2} \log \frac{z}{\bar{z}}
\end{aligned}\]
<hr />
<h2 id="2-holomorphic-functions">2. Holomorphic functions</h2>
<p>Perhaps we want to do calculus on complex numbers, and take derivatives of functions of \(z\). Being complex differentiable means that \(f(z)\) has a derivative that is itself a complex number: \((f_x, f_y) \in \bb{C}\) (when regarded as part of \(\bb{R}^2\)).</p>
<p>The <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Riemann_equations">Cauchy-Riemann equations</a> tell you when a complex function \(f(z) = u(x,y) + i v(x,y)\) is complex-differentiable:</p>
\[\begin{aligned}
u_x = v_y\\
u_y = - v_x
\end{aligned}\]
<p>In fact, the equations express the idea that \(f\) has no derivative with respect to \(\bar{z}\):</p>
\[\begin{aligned}
\p_{\bar{z}} f(z)
&= \frac{1}{2} (f_x + i f_y) \\
&\propto u_x + i v_x + i u_y - v_y \\
&= (u_x - v_y) + i (v_x + u_y) \\
&= 0 + i 0
\end{aligned}\]
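<p>This check is easy to replicate with sympy: write a function in terms of \(x\) and \(y\), apply the Wirtinger \(\p_{\bar{z}}\), and see whether it vanishes. A small sketch, with example functions of my own choosing:</p>

```python
from sympy import symbols, I, diff, simplify

x, y = symbols('x y', real=True)

def d_zbar(f):
    # the Wirtinger derivative (f_x + i f_y) / 2
    return simplify((diff(f, x) + I * diff(f, y)) / 2)

f = (x + I*y)**3        # z^3, holomorphic
g = (x - I*y)**2        # zbar^2, not holomorphic in z

print(d_zbar(f))        # 0: Cauchy-Riemann holds
print(d_zbar(g))        # nonzero: a pure zbar dependency
```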
<p>As long as \(f\) is continuous and this condition is true in a region \(D\), operations on \(f(z)\) essentially work like they would for one-variable functions in \(z\). For instance \(\p_z (z^n) = n z^{n-1}\).</p>
<p>While \(z\) seems like a 2-dimensional variable, there’s only one ‘degree of freedom’ in the derivative of a complex function. \(f'(z)\) has to be a simple complex number, which rotates and scales tangent vectors uniformly (a <a href="https://en.wikipedia.org/wiki/Conformal_map">conformal map</a>):</p>
\[f(z + dz) \approx f(z) + f'(z) dz = f(z) + re^{i\theta} dz\]
<p>Functions which are complex-differentiable at every point within a region are called <a href="https://en.wikipedia.org/wiki/Holomorphic_function">holomorphic</a> in that region for some reason. A function \(f(z)\) that is holomorphic (or ‘regular’?) in a region \(D\) is <em>extremely</em> well-behaved:</p>
<ul>
<li>\(f\) is <em>infinitely</em> complex-differentiable</li>
<li>and \(f\) is ‘complex analytic’, ie equal to its Taylor series in \(z\) throughout \(D\). The series around any particular point converges within the largest circular disk that stays within \(D\).</li>
<li>and \(f\) is locally invertible, ie \(f^{-1}(w + dw) \approx z + 1/f'(z) dw\) exists and is holomorphic in the neighborhood of \(z = f(w)\).</li>
<li>its antiderivatives exist, and its integral along any closed contour vanishes: \(\oint_C f(z) dz = 0\).</li>
<li>the data of \(f\) in \(D\) is fully determined by its values on the boundary of the region, or on any one-dimensional curve within \(D\), or on some nontrivial subregion of \(D\).</li>
</ul>
<p>The general theme is that holomorphic/analytic functions act like one-dimensional functions, and all of the calculus on them is really easy, often easier than it is for actual 1d functions.</p>
<p>If two analytic functions defined on different regions <em>agree</em> on an overlapping region, they are in a sense the ‘same function’. This lets you <a href="https://en.wikipedia.org/wiki/Analytic_continuation">analytically continue</a> a function by finding other functions which agree on a particular line or region. An easy case is to ‘glue together’ Taylor expansions around different points to go around a divergence.</p>
<p>Most 1d functions like \(e^x\) and \(\sin x\) have holomorphic complex versions like \(e^z\) and \(\sin z\) that are analytic everywhere. Discontinuous functions like \(\|z\|\) or \(\log z = \ln r + i \theta\), or functions that include an explicit or implicit \(\bar{z}\) dependency, fail to be analytic somewhere.</p>
<p>Complex differentiability fails at singularities. We categorize the types:</p>
<ul>
<li><em>poles</em> of order \(n\), around which \(f(z) \sim 1/z^n\), which are ‘well-behaved’ singularities. Around these there’s a region where \(1/f\) is analytic. ‘Zeros’ and ‘poles’ are dual in the sense that \(f \sim z^n\) at zeroes and \(f \sim 1/z^n\) at poles.</li>
<li><em>removable singularities</em>: singularities that can be removed by redefinition, probably because they’re an indeterminate form. The canonical example is \(\sin(z)/z\) which is repaired by defining \(\sin(0)/0 = 1\). In a sense these are not singularities at all.</li>
<li><em>essential singularities</em>: singularities which oscillate infinitely rapidly near a point, such that they are in a sense too complicated to handle. \(\sin(1/z)\) or \(e^{1/z}\) are the canonical examples. They all look like this, oscillating infinitely: <a href="https://en.wikipedia.org/wiki/Picard_theorem">Great Picard’s Theorem</a> (what a name) says that near an essential singularity the function takes every value infinitely many times, except possibly one.</li>
</ul>
<p>Poles are much more interesting than the other two.</p>
<hr />
<h2 id="3-residues">3. Residues</h2>
<p>No one would really care about complex analysis except for, well, <em>analysts</em>, were it not for one suspicious fact about the complex derivative:</p>
\[\p_{\bar{z}} \frac{1}{z} \neq 0\]
<p>Make sure you see that that’s a \(\bar{z}\)-derivative. For some reason, among all the \(z^n\), only \(n=-1\) has a certain kind of divergence at \(z=0\). It looks like a 2d <a href="https://en.wikipedia.org/wiki/Dirac_delta_function">delta <strike>function</strike> distribution</a>:</p>
\[\p_{\bar{z}} \frac{1}{z} = 2 \pi i \delta (z, \bar{z})\]
<p>This is intrinsically related to the fact that we’re doing calculus in 2d. In 3d the analogous property holds for \(1/r^2\), and in 1d it’s \(\p_x \log x = \frac{1}{x} + i \pi \delta(x)\) that has the delta term.</p>
<p>This is equivalent to saying that the contour integral (integral on a closed path) of \(1/z\) around the origin is non-zero:</p>
\[\begin{aligned}
\oint \frac{1}{z} dz &= \oint \frac{e^{i \theta} dr + ir e^{i\theta} d \theta }{r e^{i \theta}} \\
&= \oint \frac{dr}{r} + i d \theta \\
&= 2 \pi i
\end{aligned}\]
<p>It’s clear why this non-zero contour only holds for \(z^{-1}\): for any other \(z^n\), the antiderivative is still a periodic function of \(\theta\), so its values on the two ends of the loop cancel out. For \(n=-1\), though, the \(d \theta\) just counts the total change in angle.</p>
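<p>This is also easy to see numerically. A sketch with numpy, approximating contour integrals around circles with a plain Riemann sum (which is extremely accurate for periodic integrands); the contours and test functions here are my own choices:</p>

```python
import numpy as np

N = 4096
theta = np.arange(N) * 2 * np.pi / N

def contour_integral(f, center, radius=1.0):
    # Riemann sum of the closed integral of f(z) dz around a circle
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / N)
    return np.sum(f(z) * dz)

print(contour_integral(lambda z: 1 / z, center=0.5))   # ~2*pi*i: circle encloses 0
print(contour_integral(lambda z: 1 / z, center=3.0))   # ~0: circle misses the pole
print(contour_integral(lambda z: z**2, center=0.0))    # ~0: no 1/z term
```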
<p>The delta-function version follows from Stokes’ theorem. Since the contour integral gives the same value on any path as long as it circles \(z=0\), the divergence must be fully located at that point:</p>
\[\begin{aligned}
\oint_{\p D} \frac{1}{z} dz &= \iint_D d(\frac{dz}{z}) \\
2\pi i &= \iint_D \p_{\bar{z}} \frac{1}{z} d \bar{z} \^ dz \\
\p_{\bar{z}} \frac{1}{z} &\equiv 2 \pi i \delta(z, \bar{z})
\end{aligned}\]
<p>A function that is holomorphic except at a set of poles is called <em>meromorphic</em> (‘mero-‘ is <a href="https://www.etymonline.com/search?q=mero-">Greek</a>, meaning ‘part’ or ‘fraction’). If we integrate a meromorphic function around the boundary of a region \(D\), the result only contains contributions from the \(\frac{1}{z}\)-type terms. Around each order-1 pole at \(z_k\), \(f(z) \sim f_{-1} \frac{1}{z - z_k} + f^{*}(z)\) where \(f^{*}\) has no \((z - z_k)^{-1}\) term. The \(f_{-1}\) values at each pole are for some reason called <a href="https://en.wikipedia.org/wiki/Residue_theorem">residues</a>, and:</p>
\[\int_{\p D} f(z) dz = 2 \pi i \sum_{z_k} I(\p D, z_k) \text{Res} (f, z_k)\]
<p>Where \(I(\p D, z_k)\) gives the <a href="https://en.wikipedia.org/wiki/Winding_number">winding number</a> around each pole \(z_k\) (+1 for a single positive rotation, -1 for a single negative rotation, etc).</p>
<p>This makes integration of analytic functions around closed contours <em>really easy</em>; you can often just eyeball them:</p>
\[\oint_{\p D} \frac{1}{z-a} dz = 2\pi i 1_{a \in D}\]
<p>Multiplying and dividing powers of \((z-a)\) and then integrating around a curve containing \(a\) allows you to extract any term in the Taylor series of \(f(z)\) around \(a\):</p>
\[f_n = f^{(n)}(a) = \frac{n!}{2 \pi i} \oint \frac{f(z)}{(z-a)^{n+1}} dz\]
<p>This is called <a href="https://en.wikipedia.org/wiki/Cauchy%27s_integral_formula">Cauchy’s Integral Formula</a>. When negative terms are present the Taylor series is instead called a <a href="https://en.wikipedia.org/wiki/Laurent_series">Laurent Series</a>.</p>
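<p>As a numerical sanity check of the coefficient formula, we can extract derivatives of \(e^z\) at \(0\) by integrating around the unit circle, approximating the contour integral with a Riemann sum; the setup is my own sketch:</p>

```python
import math
import numpy as np

N = 4096
theta = np.arange(N) * 2 * np.pi / N
z = np.exp(1j * theta)              # unit circle around a = 0
dz = 1j * z * (2 * np.pi / N)

def nth_derivative_at_0(f, n):
    # f^(n)(0) = n!/(2 pi i) * closed integral of f(z) / z^(n+1) dz
    return math.factorial(n) / (2j * np.pi) * np.sum(f(z) / z**(n + 1) * dz)

for n in range(5):
    print(n, nth_derivative_at_0(np.exp, n).real)   # every derivative of e^z at 0 is 1
```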
\[\begin{aligned}
f(z) &\approx \sum f_n \frac{(z-a)^n}{n!} \\
&= \ldots + \frac{f_{-1}}{z-a} + f_0 + f_{1} (z-a) + f_2 \frac{(z-a)^2}{2!} + \ldots
\end{aligned}\]
<p>In particular the value at \(z=a\) is fully determined by the contour integral with \((z-a)^{- 1}\):</p>
\[f(a) = f_0 = \frac{1}{2 \pi i} \oint \frac{f(z)}{z-a} dz\]
<p>You can, of course, formulate this whole thing in terms of \(f(\bar{z})\) and \(\frac{d\bar{z}}{\bar{z}}\) instead. If a function isn’t holomorphic in either \(z\) or \(\bar{z}\), you can still do regular \(\bb{R}^2\) calculus in two variables \(f(z, \bar{z})\), although I’m not sure how you would deal with poles.</p>
<p>There is a duality between zeroes and poles of meromorphic functions – in the region of a pole of a function \(f\), the function behaves like \(\frac{1}{g}\) where \(g\) is an analytic function. In general a meromorphic function can be written as \(f= \frac{h}{g}\) where \(g,h\) are analytic, with the zeroes of \(g\) corresponding to the poles of \(f\).</p>
<hr />
<h2 id="4-integral-tricks">4. Integral tricks</h2>
<p>Laurent series and the ‘calculus of Residues’ gives rise to a whole slew of integration tricks.</p>
<p>Closed integrals of a function with a Laurent series can be eyeballed using the Cauchy integral formula:</p>
\[\begin{aligned}
\oint_{r=1} \frac{1}{z(z-2)} dz &= \frac{1}{2} \oint_{r=1} \frac{1}{z-2} - \frac{1}{z} dz \\
&= 2 \pi i \frac{1}{2} (-1) \\
&= - \pi i
\end{aligned}\]
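<p>sympy can do this residue bookkeeping directly; here is a quick check of the example (the function and pole location are taken from the computation above):</p>

```python
from sympy import symbols, residue, Rational, I, pi

z = symbols('z')
f = 1 / (z * (z - 2))
r = residue(f, z, 0)    # only the pole at z = 0 lies inside r = 1
print(r)                # -1/2
print(2 * pi * I * r)   # -I*pi, matching the contour integral
```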
<p>Integrals along the real line \(\int_{-\infty}^{\infty}\) can often be computed by ‘closing the contour’ with a large semicircle at \(r = \infty\). This is especially easy if the integrand vanishes as \(r \ra \infty\), but also helps if it’s just easier to integrate there.</p>
\[\begin{aligned}
\int_{-\infty}^{\infty} \frac{1}{1 + x^2} dx &= \int_{r = -\infty}^{r = \infty} \frac{dz}{1 + z^2} + \int_{\theta=0, \, r=\infty}^{\theta=\pi, \, r=\infty} \frac{dz}{1 + z^2} \\
&= \oint \frac{1}{z - i} \frac{1}{z + i} dz \\
&= 2 \pi i \text{Res}(z=i, \frac{1}{z - i} \frac{1}{z + i}) \\
&= 2\pi i \frac{1}{2i} \\
&= \pi
\end{aligned}\]
<p>Here we closed the contour around the upper half-plane, on which the integrand vanishes because it falls off like \(1/r^2\). One pole is in the upper half-plane and one is in the lower. The winding number around the upper is \(+1\) and the residue is \(\frac{1}{z+i}\) evaluated at \(z=i\), or \(1/2i\). If we had used the lower half-plane the winding number would have been \(-1\) and the residue \(-1/2i\), so the result is independent of how we closed the contour. This method gives the answer very directly without having to remember that \(\int \frac{dx}{1 + x^2} = \tan^{-1} x\) or anything like that.</p>
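<p>Both routes to this answer can be checked with sympy: the direct real integral, and the residue at \(z = i\):</p>

```python
from sympy import symbols, integrate, residue, oo, I, pi, simplify

x, z = symbols('x z')
direct = integrate(1 / (1 + x**2), (x, -oo, oo))
r = residue(1 / (1 + z**2), z, I)      # the pole in the upper half-plane
print(direct)                           # pi
print(r)                                # -I/2, i.e. 1/(2i)
print(simplify(2 * pi * I * r))         # pi again
```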
<p>Note that this wouldn’t work if the pole were <em>on</em> the path of integration, as in \(\int_{-\infty}^{+\infty} \frac{1}{x} dx\). This integral is the <a href="https://en.wikipedia.org/wiki/Cauchy_principal_value">Cauchy Principal Value</a> and is in a sense an indeterminate form like \(0/0\) whose value depends on the context. More on that another time.</p>
<p>Many other integrals are solvable by choosing contours that are amenable to integration. Often choices that keep \(r\) or \(\theta\) constant are easiest. See Wikipedia on <a href="https://en.wikipedia.org/wiki/Contour_integration">contour integration</a> for many examples.</p>
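<p>If you’d rather not choose a contour by hand, sympy can compute the residue directly; a minimal check of the \(\int \frac{dx}{1+x^2}\) example above:</p>

```python
import sympy as sp

z = sp.symbols('z')
f = 1/((z - sp.I)*(z + sp.I))   # the integrand 1/(1 + z^2), factored
res = sp.residue(f, z, sp.I)    # residue at the pole in the upper half-plane
print(2*sp.pi*sp.I*res)         # pi
```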
<hr />
<h2 id="5-topological-concerns">5. Topological concerns</h2>
<p>There are some tedious things you have to account for when considering functions of \(z\).</p>
<p>First, the \(\theta\) variable is discontinuous, since \(\theta = 0\) and \(\theta = 2\pi\) refer to the same point. This means that inverting a function of \(\theta\) will produce a <a href="https://en.wikipedia.org/wiki/Multivalued_function">multi-valued function</a>:</p>
\[\log e^{i \theta} = i \theta + 2 \pi i k_{\in \bb{Z}}\]
<p>Smoothly varying \(\theta = \int d \theta\) of course will just continue to tick up: \(2\pi, 4\pi, 6\pi\), etc. But the \(\log\) function itself appears to have a discontinuity of \(2\pi i\) at \(\theta = 0\).</p>
<p>When dealing with these multi-valued functions you can consider \(\theta = 0\) as a ‘branch point’ – a place where the function becomes multi-valued. But to be honest the whole theory of branch points isn’t very interesting if you aren’t a mathematician. I prefer to just think of all math being done modulo \(2 \pi\), or, if you need the discontinuity to count because you’re doing contour integrals, just get over the idea that functions can’t have multiple path-dependent values and don’t demand it have a unique inverse.</p>
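<p>The multi-valuedness is easy to see numerically: a principal-branch logarithm forgets how many times you wound around. A tiny Python illustration:</p>

```python
import cmath

w = cmath.exp(3j*cmath.pi/2)   # walk theta up to 3*pi/2
print(cmath.log(w))            # ≈ -i*pi/2: the principal branch reports theta - 2*pi
```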
<p>Another topological interest in \(\bb{C}\): if you ‘join together’ the points at infinity in every direction by defining a symbol \(\infty\) such that \(1/0 = \infty\), you get the “extended complex plane” or the <a href="https://en.wikipedia.org/wiki/Riemann_sphere">Riemann sphere</a>, since it is topologically shaped like a sphere. Most of the things that seem like they should be true involving \(\infty\) are true in this case. For example, the asymptotes of \(\frac{1}{z}\) on either side of \(\| z \| = 0\) really <em>do</em> connect at infinity and come back on the other side.</p>
<p>The Riemann sphere is topologically like a sphere, but acts like a <em>projective</em> plane, which is a bit unintuitive. (This corresponds rather to a half sphere where antipodal points are considered equivalent). Particularly, it kinda seems like \(+r\) and \(-r\) should be different points, rather than the ‘same’ infinity. There is probably a resolution to this using <a href="https://en.wikipedia.org/wiki/Oriented_projective_geometry">oriented projective geometry</a>, defining the back half of the sphere as a second copy of \(\bb{C}\) and conjoining the two at \(\infty e^{i \theta} \lra -\infty e^{i\theta}\), but that’s not worth pursuing further here.</p>
<p>Complex analytic functions map the Riemann sphere to itself in some way. For instance, \(z \mapsto \frac{1}{z}\) swaps \(0\) and \(\infty\) and the rest of the sphere comes along for a ride. Powers of \(z\) cause the mapping to be \(m:n\) – so \(z^2\) maps two copies of the sphere to one copy, while \(z^{1/2}\) maps one copy to two copies, hence becoming multi-valued. The <a href="https://en.wikipedia.org/wiki/M%C3%B6bius_transformation">möbius transformations</a>, functions of the form \(\frac{az + b}{cz + d}\) with \(ad-bc \neq 0\), are the invertible holomorphic transformations of the Riemann sphere. They are compositions of translations, dilations, rotations, and inversions of \(\bb{C}\).</p>
<hr />
<h2 id="6-convergence-concerns">6. Convergence concerns</h2>
<p>Although Laurent series capture the properties of complex analytic functions well, they still only work within a definite radius of convergence, given by the distance to the closest pole. Sometimes we can expand around other points to work around this. A common choice is expanding around \(z=\infty\), by creating a series in \(1/z\) instead:</p>
\[\frac{1}{1-z} = \begin{cases}
1 + z + z^2 + \ldots & \| z \| < 1 \\
-\frac{1}{z} - \frac{1}{z^2} - \frac{1}{z^3} - \ldots & \| z \| > 1
\end{cases}\]
<p>Amusingly, you can keep changing the point you expand around to ‘go around’ a pole, producing an analytic continuation outside the radius of the initial Taylor series.</p>
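<p>The two expansions of \(\frac{1}{1-z}\) above are easy to check numerically; each converges only in its own region (a small Python sketch):</p>

```python
def inner_series(z, N=200):   # 1 + z + z^2 + ...,   valid for |z| < 1
    return sum(z**n for n in range(N))

def outer_series(z, N=200):   # -1/z - 1/z^2 - ...,  valid for |z| > 1
    return -sum(z**-n for n in range(1, N + 1))

z_in, z_out = 0.5 + 0.1j, 3 - 2j
print(inner_series(z_in), 1/(1 - z_in))    # agree inside the unit circle
print(outer_series(z_out), 1/(1 - z_out))  # agree outside it
```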
<p>Complex Taylor series diverge for the same reasons that real ones do, but the radius of convergence makes a lot more sense in complex analysis than it does in real analysis: it is the distance to the closest singularity (for instance, \(\frac{1}{1 + x^2}\) around \(x=0\) has radius of convergence \(r=1\) since there are poles at \(\pm i\)).</p>
<p>The simplest way to show that a series converges is to show that the series still converges if \(z\) is replaced with \(r = \|z\|\), since</p>
\[\| f(z) \| = \| a_0 + a_1 z + a_2 z^2 + \ldots \| \leq \| a_0 \| + \| a_1 \| r + \| a_2 \| r^2 + \ldots\]
<p>After all, the phases of the \(z^n\) terms can only serve to reduce the magnitude of the sum.</p>
<p>We know that a geometric series \(1 + x + x^2 + \ldots\) converges only if \(\| x \| \lt 1\). This means that \(\sum a_n r^n\) definitely converges if the terms satisfy \(\sqrt[n]{\| a_n r^n \|} = \sqrt[n]{\| a_n \| } r \lt 1\), which gives the <a href="https://en.wikipedia.org/wiki/Root_test">root test</a> for the radius of convergence:</p>
\[R = \frac{1}{\lim \sup_{n \ra \infty} \sqrt[n]{\| a_n \| }}\]
<p>The series definitely converges if \(r \lt R\) and definitely diverges if \(r > R\); at \(r = R\) it might still converge (depending on what the phases of \(a_n\) do!). If \(R = \infty\) then the series converges for all finite \(\|z\|\) and is called an ‘entire’ function, which is a weird name.</p>
<p>The root test is the most powerful of the simple convergence tests, because it hits exactly on the property that \(\sum \| a_n \| r^n\) converges if it’s less than a geometric sum \(\sum x^n\). Other tests ‘undershoot’ this property; for instance the ratio test says that</p>
\[\| \frac{a_{n+1} r^{n+1}}{a_n r^n} \| = \| \frac{a_{n+1}}{a_n} \| r < 1\]
<p>This captures the idea that the series does converge if its successive term ratios are eventually smaller than those of a convergent geometric series, but it fails if the terms look like \(x + x + x^2 + x^2 + x^3 + x^3 + \ldots\) or something.</p>
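<p>As a concrete check of the root test, take \(\frac{1}{1-2z}\), whose pole at \(z = \frac{1}{2}\) should set the radius:</p>

```python
import numpy as np

# 1/(1 - 2z) = sum 2^n z^n; the nearest pole is at z = 1/2
n = np.arange(1, 200)
a = 2.0**n
R = 1/np.max(np.abs(a)**(1/n))   # R = 1 / limsup |a_n|^(1/n)
print(R)  # 0.5, the distance to the pole
```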
<hr />
<h2 id="7-global-laurent-series">7. Global Laurent Series</h2>
<p>This is my own idea for making divergence of Laurent series more intuitive.</p>
<p>Laurent series coefficients are derivatives of the function evaluated at a particular point, like \(f^{(n)}(z=0)\), such that a whole Laurent series is</p>
\[f(z) = \ldots + f^{(-2)}(0) \frac{2! }{z^2} - f^{(-1)}(0) \frac{1!}{z} + f(0) + f^{(1)}(0) z + f^{(2)}(0) \frac{z^2}{2!} + \ldots\]
<p>Suppose that for some reason the Cauchy forms of computing derivatives and ‘inverse’ derivatives are the ‘correct’ way to compute these values:</p>
\[\begin{aligned}
f(0) &= \frac{1}{2\pi i} \oint_{C} \frac{f(z) dz}{z} \\
\frac{f^{(n)}(0)}{n!} &= \frac{1}{2\pi i} \oint_{C} \frac{f(z) dz}{z^{n+1}} \\
(-1)^n n! f^{(-n)}(0) &= \frac{1}{2\pi i}\oint_{C} z^{n-1} f(z) dz \\
\end{aligned}\]
<p>Where \(C\) is a circle of radius \(R\) around \(z=0\). Then some handwaving leads to an alternate characterization of divergent series. For most calculations \(\p_z f(0)\) is independent of the choice of \(C\), but for a function with a pole away from the origin, it is not. Consider \(f(z) = \frac{1}{1-z}\), and let \(C\) be the positively oriented circle of fixed radius \(R\). Then:</p>
\[\begin{aligned}
f_R(0) &= \frac{1}{2\pi i}\oint_{C} \frac{1}{(z)(1-z)} dz \\
&= \frac{1}{2\pi i}\oint_{C} \frac{1}{z} + \frac{1}{1-z} dz \\
&=\text{Res}_C (z=0, \frac{1}{z} - \frac{1}{z-1}) + \text{Res}_C (z=1, \frac{1}{z} - \frac{1}{z-1}) \\
&= 1 - H(R-1) \\
\end{aligned}\]
<p>Where \(H\) is a <a href="https://en.wikipedia.org/wiki/Heaviside_step_function">step function</a> \(H(x) = 1_{x > 0}\). The value of \(f(0)\) changes depending on the radius we ‘measure’ it at. The derivative and integral terms show the same effect, after computing some partial fractions:</p>
\[\begin{aligned}
f_R'(0) &= \frac{1}{2\pi i}\oint_{C} \frac{1}{(z^2)(1-z)} dz \\
&= \frac{1}{2\pi i}\oint_{C} \frac{1}{z} + \frac{1}{z^2} - \frac{1}{z-1} dz \\
&= 1 - H(R-1) \\
f^{(-1)}_R(0) &= \frac{1}{2\pi i}\oint_{C}\frac{- 1}{z-1} dz \\
&=-H(R-1)
\end{aligned}\]
<p>In total we get, using \(H(x) = 1 - H(-x)\):</p>
\[f^{(n)}_R(0) = \begin{cases}
H(1-R) & n \geq 0 \\
- H(R-1) & n < 0
\end{cases}\]
<p>Which gives the ‘real’ Laurent series as:</p>
\[\frac{1}{1-z} = - (\; \ldots + z^{-2} + z^{-1}) H(\|z\| - 1) + (1 + z + z^2 + \ldots) H(1 - \|z\|)\]
<p>The usual entirely-local calculations of \(f'(z)\), etc miss the ‘global’ property: that the derivative calculations fail to be valid beyond \(R=1\), and a whole different set of terms become non-zero, which correspond to expansion around \(z=\infty\).</p>
<p>Which if you ask me is very elegant, and very clearly shows why the radius of convergence of the conventional expansion around \(z=0\) is the distance to the closest pole. Of course it is a bit circular, because to get this we had to choose to use circles \(C\) to measure derivatives, but that’s ok.</p>
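<p>This is also easy to check numerically: ‘measuring’ \(f(0)\) on circles of different radii really does flip from \(1\) to \(0\) as the circle crosses the pole at \(z=1\) (a Python sketch):</p>

```python
import numpy as np

def f_R_at_0(R, N=400001):
    # (1/2 pi i) * contour integral of dz/(z(1-z)) over the circle |z| = R
    t = np.linspace(0, 2*np.pi, N)
    z = R*np.exp(1j*t)
    integrand = 1/(z*(1 - z)) * 1j*z
    return np.sum(integrand[:-1]) * (t[1] - t[0]) / (2j*np.pi)

print(f_R_at_0(0.5))  # ≈ 1: only the pole at z=0 is enclosed
print(f_R_at_0(2.0))  # ≈ 0: the residue -1 at z=1 now cancels it
```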
<hr />
<p>Anyway, in summary, please use \(\bb{R}^2\) instead of \(\bb{C}\) if you can. Thanks.</p>
The essence of quantum mechanics2020-07-24T00:00:00+00:00https://alexkritchevsky.com/2020/07/24/qm<p>Here’s what I know about QM. I’m trying to learn QFT and it helps to have the prerequisites compressed into the simplest possible representation. It also helps me to write everything down in a compressed form so I can reference it more easily.</p>
<p>This will make no sense if you don’t already have a good understanding of quantum mechanics.</p>
<p>Conventions: \(c = 1\), \(g_{\mu \nu} = diag(+, -, -, -)\). I like to write \(S_{\vec{x}}\) for \(\nabla S\).</p>
<!--more-->
<hr />
<h2 id="1-wavefunction-solutions">1. Wavefunction solutions</h2>
<p>QM makes a lot more sense to me if you (a) handle everything relativistically from the start and (b) just assume the form of the wave function solutions instead of deriving them. If I had my way I’d start a quantum mechanics course with special relativity, followed by introducing the scalar wave function, like this:</p>
<p>Consider a function on spacetime with the form \(\psi(t, \vec{x}) = e^{ i S(t, \vec{x})/\hbar}\) which assigns a complex phase to every point. It is fully determined by the <strong>action</strong> \(S(\vec{x}, t)\), and in particular given an initial state \(\psi_0\), is determined by the action gradient \(dS = S_{\mu} dx^\mu\). This lets us compare quantum states by integrating over some path \(\Gamma\):</p>
\[\psi(t, \vec{x}) = e^{i/\hbar \int_{\Gamma} dS} \psi_0\]
<p>Later on when potentials are involved we will need to be specific about the path of integration, but for now we can think of \(S\) as a scalar function that determines \(\psi\) everywhere.</p>
<p>Relativistic invariance insists that \(\psi\) have the same value in any reference frame, and \(- i \hbar \p \psi = - i \hbar (\p S) \psi = (S_t, S_{\vec{x}}) \psi\) must be a covariant 4-vector. Contraction with \(\bar{\psi}\) extracts the vector components: \(\< \psi \| {- i \hbar \p} \| \psi \> = \bar{\psi} (S_t, S_{\vec{x}}) \psi = (S_t, S_{\vec{x}})\). Finally, \(\| (S_t, S_{\vec{x}}) \| = \sqrt{S_{t}^2 - S_{\vec{x}}^2}\) must be a Lorentz-invariant scalar.</p>
<p>We call \(i \hbar \p_t = \hat{E}\) and \(- i \hbar \p_x = \hat{P}\) the <strong>energy</strong> and <strong>momentum</strong> operators. The quantum mechanical operators apparently extract properties of \(S\), but because \(S\) is packed inside an exponential, they extract them as eigenvalues: \(i \hbar \p_t \psi = - S_t \psi\). Our quantum-mechanical inner product and our operators are just <em>tools for extracting properties of \(S\)</em>, since \(\psi\) is the only thing we can directly operate on. When an equation like the Schrödinger equation contains a \(\hat{P} = - i \hbar \p_x\) operator, it’s just a skew way of writing the \(p_x\) value.</p>
<p>Since quantum mechanical measurements only happen through operators like these, the exact values of \(\psi\) up to a phase, and therefore \(S\) up to a constant, are not physically observable.</p>
<p>For a free massive spinless particle the action is \(S = - p_{\mu} x^{\mu} = \int -p_\mu dx^{\mu}\), where \(p\) is the four-momentum and \(\| p \| = m\), the rest mass. In the rest frame this is simply \(S = -m \tau = - \int m d\tau\). In the absence of a potential this gives the wave function:</p>
\[\psi(x) = e^{- i/\hbar \int_{0}^{x} p_\mu dx^\mu} \psi(0) = e^{- i/\hbar p_\mu x^\mu} \psi(0) = e^{i/\hbar ( \vec{p} \cdot \vec{x} - E t)} \psi(0)\]
<p>which is a Fourier component with momentum \(p_\mu\). Time evolution via exponentiation of the Hamiltonian amounts to translating in \(t\):</p>
\[\psi(t + \Delta t) = e^{-i/\hbar \hat{H} \Delta t} \psi(t) = e^{\Delta t \p_t} \psi(t) = e^{i/\hbar (\vec{p} \cdot \vec{x} - E(t + \Delta t))} \psi_0\]
<p>(This uses the idea that exponentiating a differential operator translates in that coordinate: \(e^{a \p_x} f(x) = f(x+a)\).)</p>
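<p>That translation identity is exact for polynomials and takes a few lines to verify in sympy (truncating the exponential series at the degree of \(f\)):</p>

```python
import sympy as sp

x, a = sp.symbols('x a')
f = x**3

# e^{a d/dx} f = sum_k a^k f^(k)(x) / k!  (the sum terminates for a polynomial)
shifted = sum(a**k * sp.diff(f, x, k) / sp.factorial(k) for k in range(4))
print(sp.expand(shifted))  # equals (x + a)**3 expanded
```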
<p>When an initial state is not a pure Fourier mode with a definite momentum, we expand it as a sum of modes. For instance, if at \(t=0\) we measure an electron at \(\vec{x} = 0\), then the initial state is</p>
\[\psi(0, \vec{x}) = \delta(\vec{x}) = \int e^{i \vec{p} \cdot \vec{x}} d \vec{p}\]
<p>When potentials are involved, \(dS\) is modified. The electromagnetic field, for instance, enters as \(p \mapsto p - qA\), so \(dS = (p_{\mu} - q A_{\mu}) dx^{\mu}\). Depending on the field configuration we may no longer be able to easily integrate \(\int dS\): if \(A\) includes a current, then it contains a ‘line’ of divergence, and the path integral’s result will depend on how many times \(\Gamma\) circles this divergence. This causes the path integral to give <em>different</em> values based on the choice of path. Summing over these paths, with appropriate weighting, corresponds in QFT to summing over the number of photons that are exchanged (I think. Will work it out in detail when I get to QFT).</p>
<hr />
<h2 id="2-correspondences">2. Correspondences</h2>
<p>Many concepts in quantum mechanics follow naturally from this foundation:</p>
<p><strong>Mass</strong>: For a free particle \(S_t = E\) and \(S_{\vec{x}} = \vec{p}\), and \(m = \sqrt{E^2 - p^2}\) is the relativistic rest energy/momentum relation. The wave function looks like \(\psi = e^{i/\hbar \int \vec{p} \cdot d\vec{x} - E dt} \psi_0\). A high energy/momentum corresponds to a rapidly changing action, and thus to a wave function that is <em>quickly rotating</em> as you translate in time or space. Ultimately, the mass \(m\) corresponds to the speed of phase rotation in a particle’s rest frame, and its energy and momentum are the results of Lorentz-transforming \(dS = - m d\tau\) into other frames.</p>
<p><strong>Path integration</strong>: Relative changes in \(S\) can be found by integrating: \(S(f) - S(i) = \int_{\Gamma} dS\) along any curve \(\Gamma\) from \(i\) to \(f\), and \(\psi(f) = e^{i/\hbar (S(f) - S(i))} \psi(i)\). Thus \(e^{i/\hbar (S(f) - S(i))}\) is the ‘transition matrix’ between any two states, along a given path. The total transition amplitude is a sum over all possible paths between two states. This extends handily to QFT’s path integrals when creation/annihilation of particles is included.</p>
<p><strong>The roles of \(\hbar\) and \(i\)</strong>: \(S \mapsto e^{iS / \hbar }\) is the conversion from ‘action’ space to ‘phase’ space. \(\hbar\) changes units from action (energy \(\times\) time) to radians; if we set \(\hbar = 1\) we are declaring that we measure action in radians. The resulting space after mapping by \(e^{iS}\) is physically meaningful, because in some cases we’ll end up summing these phase factors from multiple starting states and seeing interference patterns. I suspect that the output space is the \(U(1)\) that is identified with the electromagnetic gauge field but am not sure. If so, I think it would be good to write \(R_{EM}\) instead of \(i\), in order to avoid accidentally conflating the \(i\) factors from rotations in different spaces.</p>
<p><strong>Angular momentum</strong>: The orbital angular momentum operator, \(\hat{L}_z = -i \hbar \p_{\phi}\), does the same thing as \(\hat{P} = - i \hbar \p_{\vec{x}}\) but for a wave function in spherical coordinates. The azimuthal angle term looks like \(\psi \sim e^{i/\hbar (l_z \phi - E t)}\), and \(\hat{L}_z \psi = l_z \psi\). The azimuthal quantum number \(l_z\) (often written \(m\)) measures how many times \(\psi\) oscillates in a rotation of the polar angle \(\phi\); it is quantized precisely because the \(\phi\) coordinate has a built-in periodicity. A \(z\)-angular momentum value of \(l_z\) labels the number of periods the wave makes as you rotate \(\phi\) in the \(z\)-plane.</p>
<p><strong>Spin-\(\frac{1}{2}\)</strong>: If \(l_z = 1/2\), then \(\psi_{\pm} \sim e^{i/\hbar (\pm \frac{1}{2} \phi - E t)}\) acts like a spinor (by modeling the spin as orbital angular momentum, and omitting the \(r\) and \(\theta\) components). This function appears trivially unphysical, since it has different values at \(\phi = 0\) vs \(\phi = 2 \pi\). The resolution is the fact that it’s only meaningful to use the wave function to <em>compare</em> states that are connected by a path – and for a spinor it’s correct that \(\< \psi(\phi = 2 \pi) \| \psi(\phi = 0) \> = -1\). (This is a useful mental model but isn’t the full story. My next post will be a truly exhaustive exploration of spinors.) (Much later edit: this next post got very difficult for me to finish. Hopefully I can get back to it someday.)</p>
<p><strong>Spin-\(1\)</strong>: A <em>vector</em>-valued wave function \(\vec{\psi} = (\psi_x, \psi_y, \psi_z)\), where the terms transform according to physical rotations, is a spin-1 object. To consider its \(z\)-angular momentum we change bases to a <a href="https://en.wikipedia.org/wiki/Spherical_basis">spherical basis</a> (not to be confused with spherical coordinates):</p>
\[(\hat{x}, \hat{y}, \hat{z}) \ra (\frac{\hat{x} + i \hat{y}}{\sqrt{2}}, \hat{z}, \frac{\hat{x} - i \hat{y}}{\sqrt{2}})\]
<p>Or in cylindrical coordinates, using \(\hat{x} = (\cos \phi )\hat{\rho} - (\rho \sin \phi )\hat{\phi}\) and \(\hat{y} = (\sin \phi) \hat{\rho} + (\rho \cos \phi) \hat{\phi}\):</p>
\[= (\frac{e^{i \phi} (\hat{\rho}+ i \rho \hat{\phi})}{\sqrt{2}}, \hat{z}, \frac{e^{- i \phi} (\hat{\rho} - i \rho \hat{\phi})}{\sqrt{2}})\]
<p>The coordinates of \(\vec{\psi}\) in this basis are:</p>
\[(\psi_{+1}, \psi_0, \psi_{-1}) = (\frac{\psi_x - i \psi_y}{\sqrt{2}}, \psi_z, \frac{\psi_x + i \psi_y}{\sqrt{2}})\]
<p>In the new basis, the coordinate vectors have an explicit \(\phi\)-dependence, which captures the idea that any vector-valued function has an <em>intrinsic</em> \(\phi\)-derivative, independent of reference frame, just by virtue of being a vector. (This is kinda obvious in hindsight, but it took me forever to understand.)</p>
<p>So the components of a vector wave function \(\vec{\psi}\) locally look like \(\psi_{s_z}(\phi, \rho, z, t) \sim e^{i (s_z + l_z) \phi } \psi_{s_z}(\rho, z, t)\), where \(l_z\) is its orbital angular momentum and \(s_z \in (+1, 0, -1)\) is a frame-independent contribution just from its vectorial nature. The \(s_z = 0\) component corresponds to a vector-valued wave function pointing only in the \(z\) direction. \(s_z = \pm 1\) components correspond to having \(x\) or \(y\) components, with the sign determined by their relative phase.</p>
<p>Note what it means to have spin \(1\): it’s not that it fixes the <em>value</em> of the angular momentum; rather, it specifies the different ways that the angular momentum can transform under rotation. The three choices determine whether \(\vec{\psi}\) is in the \(z\) direction \((s_z = 0)\) or whether it has a positive or negative ‘rotational’ components in the \(xy\) plane (\(s_z = \pm 1\)). Particularly, having angular momentum \(s_z = +1\) means that the \(y\) component is advanced in phase compared to the \(x\) component. This is why the ‘ladder’ operator \(\hat{S}_+ = \hat{S}_x + i \hat{S}_y\) serves to increase the angular momentum, because it includes a factor of \(e^{i \phi}\):</p>
\[\hat{L}_+ = (\hat{L}_x + i \hat{L}_y) \sim e^{i \phi}\]
<aside class="toggleable" id="angular" placeholder="<b>Aside</b>: Angular momentum calculations <em>(click to expand)</em>">
<p>Here are some calculations I did to make sure I wasn’t lying through my teeth here:</p>
<p>The angular momentum operators are \(\vec{x} \^ \hat{P} = - i \hbar (y \p_z - z \p_y, z \p_x - x \p_z, x \p_y - y \p_x)\), giving:</p>
\[\begin{aligned}
\hat{L}_z \psi = l_z \psi &= - i \hbar (x \p_y - y \p_x) \psi = (x p_y - y p_x) \psi \\
\end{aligned}\]
<p>etc. Another form is \(\hat{L}_z = -i \hbar \p_{\phi}\):</p>
\[\begin{aligned}
\hat{L}_z \psi &= - i \hbar \p_{\phi} \psi \\
&= -i \hbar (x_{\phi} \p_x + y_{\phi} \p_y) \psi \\
&= -i \hbar (-y \p_x + x \p_y) \psi \\
&= (x \hat{P}_y - y \hat{P}_x) \psi
\end{aligned}\]
<p>Thus a function of the form \(\psi = e^{i l_z \phi /\hbar}\) has \(\hat{L}_z \psi = l_z \psi\).</p>
<p>The \(\hat{L}_x\) and \(\hat{L}_y\) operators have less-pleasant forms in spherical coordinates:</p>
\[\begin{aligned}
\hat{L}_x &= -i \hbar ({- \sin} (\phi) \p_{\theta} - \cot(\theta) \cos (\phi )\p_{\phi}) \\
\hat{L}_y &= -i \hbar (\cos (\phi) \p_{\theta} - \cot(\theta) \sin (\phi )\p_{\phi}) \\
\end{aligned}\]
<p>The failure of commutation, such as \([\hat{L}_x, \hat{L}_z] \neq 0\), comes from the fact that this adds \(\phi\)-dependencies that will affect the \(l_z\) value.</p>
<p>Now look at the raising operator \(L_+\):</p>
\[\begin{aligned}
L_{+} &= L_x + i L_y \\
&= -i \hbar ((-\sin \phi + i \cos \phi) \p_{\theta} - \cot(\theta) (\cos \phi + i \sin \phi) \p_{\phi})\\
&= -i \hbar (e^{i \phi} )(i \p_{\theta} - \cot(\theta) \p_{\phi})
\end{aligned}\]
<p>Ignoring the overall coefficient this produces (I’m told it’s \(\hbar \sqrt{j(j+1) - l_z (l_z+1)}\)), the reason that it raises the \(l_z\) value is the inclusion of an \(e^{i \phi}\) term, giving \(e^{i \phi} e^{i l_z \phi} = e^{i (l_z + 1) \phi}\).</p>
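<p>A quick sympy check of this claim (with \(\hbar = 1\), applying the \(e^{i\phi}\) form of \(L_+\) to unnormalized spherical harmonics):</p>

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')

# L_+ = e^{i phi} (d_theta + i cot(theta) d_phi), the form derived above with hbar = 1
def L_plus(f):
    return sp.exp(sp.I*phi) * (sp.diff(f, theta) + sp.I*sp.cot(theta)*sp.diff(f, phi))

Y10 = sp.cos(theta)                       # Y_1^0, unnormalized
Y11 = -sp.sin(theta) * sp.exp(sp.I*phi)   # Y_1^1, unnormalized

print(sp.simplify(L_plus(Y10) / Y11))  # 1: raising Y_1^0 lands on Y_1^1
print(sp.simplify(L_plus(Y11)))        # 0: Y_1^1 is the top of the ladder
```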
<p>A constant vector function is given by (in somewhat more pleasant cylindrical coordinates \((\rho, \phi, z)\)):</p>
\[\begin{aligned}
\vec{\psi} &= \psi_x \hat{x} + \psi_y \hat{y} + \psi_z \hat{z} \\
&= \frac{1}{2} (\psi_x - i \psi_y)(\hat{x} + i \hat{y}) + \psi_z \hat{z} + \frac{1}{2} (\psi_x + i \psi_y) (\hat{x} - i \hat{y}) \\
&= \frac{1}{2} \psi_{+1} e^{i \phi} (\hat{\rho}+ i \rho \hat{\phi}) + \psi_0 \hat{z} + \frac{1}{2} \psi_{-1} e^{- i \phi} (\hat{\rho} - i \rho \hat{\phi})
\end{aligned}\]
<p>Clearly \(\hat{L}_z (\psi_{+1}, \psi_0, \psi_{-1}) = (+1 \psi_{+1}, 0 \psi_{0}, -1 \psi_{-1})\).</p>
</aside>
<p>By the way, photons are spin-1 particles, but cannot have the \(s_z = 0\) state for what I currently understand as ‘complicated technical reasons’. Roughly, it goes: because photons have no rest frame, the \(s_z = 0\) value is forbidden, as that would imply that there is a choice of \(z\) around which a photon wave function is symmetric. The remaining \(s_z = \pm 1\) states correspond to photon polarizations.</p>
<p><strong>The Schrödinger Equation</strong>: We can write \(S_t^2 - S_{\vec{x}}^2 = m^2\) as \(S_t = \sqrt{m^2 + S_x^2} = m \sqrt{1 + \frac{S_x^2}{m^2}}\) and expand as a Taylor series (note that \(\| S_x/m \| = \| p / m \| \ll 1\)) to get:</p>
\[S_t = m (1 + \frac{1}{2} \frac{S_x^2}{m^2} + O((\frac{S_x^2}{m^2})^2)) \approx m + \frac{S_x^2}{2m}\]
<p>Using our operator forms we get the free-particle Schrödinger equation:</p>
\[\hat{E} \psi \approx (m + \hat{P}^2/2m) \psi\]
<p>Interpreting, this says that the time-derivative of the action is a constant (the mass) plus a term proportional to the kinetic energy, plus higher-order terms that vanish at low momenta.</p>
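<p>The expansion itself takes one line in sympy, if you want to see the next correction term:</p>

```python
import sympy as sp

m, p = sp.symbols('m p', positive=True)
E = sp.sqrt(m**2 + p**2)
print(sp.series(E, p, 0, 6))  # m + p**2/(2*m) - p**4/(8*m**3) + O(p**6)
```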
<p>The initial \(m\) term is normally ignored in non-relativistic QM. It corresponds to a constant change in phase along any path (and adds a constant term to the Lagrangian), but it drops out of any calculation if you (a) only integrate over time and (b) never create/annihilate particles.</p>
<p><strong>Schrödinger with potential</strong>: The \(V\) term in the non-relativistic Schrödinger equation ends up next to the kinetic energy term: \(\hat{E} \psi \approx (m + \hat{P}^2/2m + V) \psi\). Working backwards through the derivation, we figure that the constraint on \(S\) must be: \(S_t - V = \sqrt{m^2 + S_x^2}\). But there is no particular reason this would have a clean relativistic form, since we treat our potential non-relativistically anyway.</p>
<p>Nevertheless we can add to our interpretation: the role of a classical scalar potential \(V\) is to modify the phase change as a wave function translates in time, such that the particle acts like it has energy \(E - V\) instead of \(E\). The role of a vector potential is to modify the momentum, \(\vec{p} \mapsto \vec{p} - q \vec{A}\).</p>
<p>The electromagnetic field uses the 4-potential \(q A = q (\phi, \vec{A})\). The electromagnetic wave function is something like \(\psi = e^{i/\hbar \int [(\vec{p} - q \vec{A}) \cdot d\vec{x} - (E - q \phi) dt] } = e^{i/\hbar \int (p_{\mu} - q A_{\mu}) dx^{\mu}}\).</p>
<p><strong>Covariant Derivatives</strong>:</p>
<p>Given the electromagnetic wave function of the form \(\psi = e^{i/\hbar \int (p_{\mu} - q A_{\mu}) dx^{\mu}}\), we can extract the \(p_{\mu}\) term with a more involved derivative operator, the ‘covariant derivative’ \(D_{\mu} = \p_{\mu} + i q A_{\mu}\), or equivalently, modifying the momentum operator to be \(\hat{P}_{\mu} = \hat{p}_{\mu} + \hbar q A_{\mu}\):</p>
\[- i \hbar D_{\mu} \psi = - i \hbar (\p_{\mu} + i q A_{\mu}) \psi = p_{\mu} \psi\]
<p>This derivative manages to extract the \(p_{\mu}\) term by itself by subtracting off the \(qA\) contribution.</p>
<p><strong>Gauge transformations</strong>:</p>
<p>Since physics is determined by an action integral like \(\int( p_\mu - q A_\mu )dx^\mu\), any exact form (some \(\Lambda_\mu dx^\mu = d\lambda\)) can be added to \(qA\) and will only affect the action in a path-independent way:</p>
\[\int_i^f (p_\mu - q A_\mu + \Lambda_\mu) dx^\mu = \int_i^f (p_\mu - q A_\mu) dx^\mu + \lambda \Big\vert_i^f\]
<p>The covariant derivative is so called because it produces a derivative operator, and thus a momentum operator, which respects this gauge-freedom by removing any explicit dependence on the value of \(A\). In my opinion, though, this is a very roundabout way to reach the conclusion: the explicit purpose of \(\hat{P}\) is to extract the value of \(p\), which is ultimately the thing that must obey \(p_{\mu} p^{\mu} = m^2\); the specific method of removing the gauge freedom is an implementation detail.</p>
<p>This performs a gauge transform that doesn’t affect the relative amplitudes of different paths between \(i \ra f\) – only the resulting phase. As such there is no way to observe this effect in a closed system, so the addition of \(d \Lambda\) is a free variable in the theory. However, it turns out to be important when considering interacting systems, in ways that I haven’t learned yet but will be essential in QFT.</p>
<p><strong>The Lagrangian</strong>: The integral \(\Delta S = \int dS\) can be parameterized by time as</p>
\[\Delta S = \int (S_{\vec{x}} \cdot d\vec{x}/dt - S_t) dt = \int L \, dt\]
<p>\(L = dS / dt\) is the source of the (single-particle) Lagrangian, and is where the elementary form \(L = T - V\) comes from. For a free particle, \(L dt = -m d\tau\), and \(\Delta S = - \int m d \tau\). In a classical scalar potential with \(S_t = E = T + V\):</p>
\[L = S_{\vec{x}} \cdot d\vec{x}/dt - S_t = \vec{p} \cdot \vec{v} - E\]
<p>In classical mechanics often \(E = T + V\) and \(\vec{p} \cdot \vec{v} = 2 T\), giving</p>
\[L = 2 T - (T + V) = T - V\]
<p>Regardless of how we parameterize \(S = \int dS\), applying stationary-action will give the classical trajectory. Feynman’s classic explanation of this is that all but the ‘stationary’ path – the choice of \(\Gamma\) such that \(\delta S / \delta \Gamma \vert_{\Gamma} = 0\) – will exhibit destructive interference in the macroscopic limit, resulting in the laws of classical physics. Quantitatively, this means that in the classical limit as \(\hbar \ra 0\), the path integral is dominated by the minimal path:</p>
\[\begin{aligned}
\lim_{\hbar \ra 0} \int d\Gamma e^{i S[\Gamma] /\hbar}
&= \lim_{\hbar \ra 0} \int d (\Delta \Gamma) e^{i S[\Gamma_{\text{min}} + \Delta \Gamma] /\hbar} \\
&\sim \lim_{\hbar \ra 0} e^{i S[\Gamma_{\text{min}}] /\hbar }
\end{aligned}\]
<p>I don’t exactly know how to make that rigorous but it makes heuristic sense: as \(\hbar \ra 0\) the function oscillates infinitely rapidly, cancelling itself out in the integral over \(\Delta \Gamma\), but at least the minimal path, where \(\delta S / \delta \Gamma = 0\), oscillates less than the rest do. We could imagine expanding \(S\) as a Taylor series \(S = S[\Gamma_{\text{min}}] + (\delta S / \delta \Gamma) \delta \Gamma + \ldots\), but I really don’t know if that’s allowed.</p>
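<p>The heuristic can at least be seen numerically: an oscillatory integral with a stationary point decays like \(\sqrt{\hbar}\), while one without decays like \(\hbar\). A rough Python sketch (the two choices of action here are just illustrative):</p>

```python
import numpy as np

def osc_integral(S, hbar, N=2_000_001):
    # integrate e^{i S(x)/hbar} over [0, 2] on a fine grid
    x = np.linspace(0.0, 2.0, N)
    vals = np.exp(1j*S(x)/hbar)
    return np.sum(vals[:-1]) * (x[1] - x[0])

hbar = 1e-3
with_stat = abs(osc_integral(lambda x: (x - 1)**2, hbar))  # stationary point at x = 1
no_stat   = abs(osc_integral(lambda x: x, hbar))           # dS/dx = 1 everywhere

print(with_stat, no_stat)  # ~ sqrt(pi*hbar) ≈ 0.056 vs O(hbar) ≈ 0.002
```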
<p><strong>Noether’s Theorem</strong>: Suppose there is some dynamical variable \(q\) that \(S\) depends on. Then we can locally approximate \(S(q + \Delta q, \ldots) = S(q, \ldots) + S_q \Delta q\), adding a phase to the wave function \(\psi \ra e^{i S_q \Delta q /\hbar} \psi\). This leaves physics unchanged if and only if \(S_q\) is a constant, such that this is a uniform global phase transformation.</p>
<p>But if \(q\) is a physical symmetry of the system, then it <em>must</em> lead to the same physics; therefore \(S_q\) is a constant throughout the system’s evolution (gauge fields notwithstanding). \(S_q\) is called the ‘Noether charge’ corresponding to the \(q\) symmetry. \(E\) is the charge associated with \(t\); \(\vec{p}\) for \(\vec{x}\), \(\vec{L}\) for \(\vec{\theta}\), etc.</p>
<hr />
<h2 id="summary">Summary</h2>
<ol>
<li>QM is easier to follow if you start from the fact that the wave function has the form \(\psi = e^{i S/\hbar}\).</li>
<li>Operators and inner products are ways to extract properties of \(S\).</li>
<li>The Schrödinger equation for a free particle is a low-energy approximation of the statement that \(\| \p S \| = m\).</li>
<li>The only free physical quantity in a wave function is the 4-vector \(\p S\), which measures which part of the variation in \(S\) is in the spatial vs timelike direction.</li>
<li>Potentials enter by modifying \(\p S\), eg \(\p S \mapsto \p S - q A\). \(\int_i^f dS = S(f) - S(i)\) may no longer hold depending on the properties of \(A\).</li>
<li>Intrinsic angular momentum is a property of what kind of object the wave function’s value is (scalar, vector, spinor, etc).</li>
</ol>
<p>Normally you have to unlearn QM to learn relativistic QM, but the relativistic version makes much more sense in the first place so why not start there?</p>
<hr />
<p>Next up, spinors.</p>
<p>Much-later edit: spinors were harder than I thought :(</p>
A possible derivation of the Born Rule?2019-12-22T00:00:00+00:00https://alexkritchevsky.com/2019/12/22/many-worlds<p>I think that the Many-Worlds Interpretation (MWI) of quantum mechanics is probably ‘correct’. There is no reason to think that the rules of atomic phenomena would stop applying at larger scales when an experimenter becomes entangled with their experiment.</p>
<p>However, MWI has the problem (shared with all the other mainstream interpretations of QM) that it does not explain why quantum randomness leads to the probabilities that we observe. The so-called <a href="https://en.wikipedia.org/wiki/Born_rule">Born Rule</a> says that if a system is in a state \(\alpha \| 0 \> + \beta \| 1 \>\), upon ‘measurement’ (in which we entangle with one or the other outcome), we measure the eigenvalue associated with the state \(\| 0 \>\) with probability</p>
\[P[0] = \| \alpha \|^2\]
<p>The Born Rule is normally included as an additional postulate in MWI, which is somewhat unsatisfying. Or at least, it is apparently difficult to justify from first principles: I’ve read a bunch of attempts, each of which talks about how there haven’t been any other satisfactory attempts. I think it would be unobjectionable to say that there is no consensus on how to motivate the Born Rule from MWI without additional assumptions.</p>
<p>Anyway, here’s an argument I came up with that I find somewhat compelling. It argues that the Born Rule can emerge from interference if you assume that every <em>measurement</em> of a probability that you’re exposed to (which I guess is a Many-Worlds-ish idea) is assigned a random, uncorrelated phase.</p>
<!--more-->
<hr />
<h2 id="1-classical-measurements-of-probability">1. Classical measurements of probability</h2>
<p>First let’s discuss a toy example of ‘measuring a probability’ in a non-quantum experiment. Suppose we’re flipping a biased coin that gets heads with probability \(P[H] = p\) and tails with probability \(P[T] = q = 1 - p\). We’ll write it in a notation suggestive of quantum mechanics: let’s call its states \(\| H \>\) and \(\| T \>\), so the results of a coin flip are written as \(p \| H \> + q \| T \>\) with \(p + q = 1\). Upon \(n\) iterations of classical coin-flipping we end up in state</p>
\[(p \| H \> + q \| T \>)^n = \sum_k {n \choose k} p^k q^{n-k} \| H^k T^{n-k} \>\]
<p>Where \(\| H^k T^{n-k} \>\) means a state in which we have observed \(k\) heads and \(n-k\) tails in any order.</p>
<p>Now suppose this whole experiment is being performed by an experimenter who’s trapped in a box or something. The experimenter does the experiment, writes down what they think the probability of heads is, and then transmits <em>only that</em> to us, outside of the box. So the only value we end up seeing is the value of their <em>measurement</em> of \(P[H] = p\), which we’ll call \(\hat{P}[H]\). The best estimate that the experimenter can give, of course, is their observed frequency \(\frac{k}{n}\), so we might say that the resulting system’s states are now identified by the probability perceived by the experimenter:</p>
\[(p \| H \> + q \| T \>)^n = \sum_k {n \choose k} p^k q^{n-k} \| \hat{P}[H] = k/n\>\]
<p>If you let \(n\) get very large, the states near where \(\hat{P}[H] = p\) will end up having the highest-magnitude amplitude, and so we expect to end up in a ‘universe’ where the measurement of the probability \(p\) converges on the true value of \(p\). This is easily seen, because for large \(n\) the binomial distribution \(B(n, p, q)\) converges to a normal distribution \(\mathcal{N}(np, npq)\) with mean \(np\). So, asymptotically, the state \(\| \hat{P}[H] = \frac{np}{n} = p \>\) becomes increasingly high-amplitude relative to all of the others. This is a way of phrasing the law of large numbers.</p>
<p>I think this is as good an explanation as any of what probability ‘is’. Instead of trying to figure out what it means for <em>us</em> to experience an infinite number of events and observe a probability, let’s just ask an experimenter who’s locked in a box to figure it out for us, and then have them send us their results! Unsurprisingly, the experimenter does a good job of recovering classical probability.</p>
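<p>This classical setup is easy to check numerically. Here is a minimal sketch in Python (the helper name <code>peak_estimate</code> is my own, not anything standard): it finds the \(k/n\) that maximizes the binomial weight \({n \choose k} p^k q^{n-k}\), which should converge on the true \(p\) as \(n\) grows.</p>

```python
import math

def peak_estimate(p, n):
    """Return the k/n maximizing the binomial weight C(n,k) p^k q^(n-k),
    i.e. the most likely probability report from the boxed experimenter."""
    q = 1 - p
    weights = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    k_star = max(range(n + 1), key=lambda k: weights[k])
    return k_star / n

print(peak_estimate(0.3, 10))    # 0.3
print(peak_estimate(0.3, 1000))  # 0.3
```

<p>The peak sits at \(k = \lfloor (n+1) p \rfloor\), so even for modest \(n\) the experimenter’s best report is already \(p\).</p>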
<hr />
<h2 id="2-the-quantum-version">2. The quantum version</h2>
<p>Now let’s try it for a qubit (a ‘quantum coin’). The individual experiment runs are now given by \(\alpha \| 0 \> + \beta \| 1 \>\) where \(\alpha, \beta\) are probability amplitudes with \(\| \alpha \|^2 + \| \beta \|^2 = 1\). Note that normalizing these to sum to 1 is just for convenience and doesn’t predetermine the probabilities – if you don’t normalize now, you just have to divide through by the normalization later instead.</p>
<p>As before we have our experimenter perform \(n\) individual measurements of the qubit and report the results to us:</p>
\[(\alpha \| 0 \> + \beta \| 1 \>)^n\]
<p>Where are things going to go differently? If we imagine our experimenter as a standalone quantum system, it seems like their measurements may pick up their own phases and possibly interfere with each other. That is, a single \(\| P = \frac{k}{n} \>\) macrostate, consisting of all the different ways they could have gotten \(k\) \(\| 1 \>\)s out of \(n\) measurements, will consist of many different ‘worlds’ that may end up with different phases themselves, and there is no reason to think that they will add up neatly. I’m not totally sure this is reasonable, but it leads to an interesting result, so let’s assume it is.</p>
<p>For an example, consider the \(n=2\) case. We’ll let each \(\| 0 \>\) state have a different phase \(\alpha_j = \| \alpha \| e^{i \theta_j}\). (We can ignore the \(\| 1 \>\) phase without loss of generality by treating it as an overall coefficient to the entire wave function.)</p>
<p>The state we generate will be:</p>
\[\begin{aligned}
&(\alpha_1 \| 0 \> + \beta \| 1 \>) (\alpha_2 \| 0 \> + \beta \| 1 \>) \\
&= \alpha_1 \alpha_2 \| 0 0 \> + \alpha_1 \beta \| 0 1 \> + \beta \alpha_2 \| 1 0 \> + \beta^2 \| 1 1 \> \\
\end{aligned}\]
<p>This is no longer a clean binomial distribution. Writing \(a = \| \alpha \|\) and \(b = \| \beta \|\) for clarity, the two-iteration wave function is:</p>
\[= e^{i (\theta_1 + \theta_2) } a^2 \| 0^2 \> + ab (e^{i \theta_1} + e^{i \theta_2}) \| 0^1 1^1 \> + b^2 \| 1^2 \>\]
<p>Note that \(ab (e^{i \theta_1} + e^{i \theta_2}) \| 0^1 1^1 \>\) only has the same magnitude as \(2ab \| 0^1 1^1 \>\), the classical value, when \(\theta_1 = \theta_2\).</p>
<p>This suggests that, if the experimenter’s different experiment outcomes can randomly interfere with each other as quantum states, then the probability of their reporting \(\| 0^1 1^1 \>\) will be suppressed compared to \(\| 0^2 \>\) or \(\| 1^2 \>\).</p>
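<p>This suppression is also easy to check numerically. A small sketch under the same assumption of independent uniform phases: the average squared magnitude of \(e^{i \theta_1} + e^{i \theta_2}\) comes out to \(2\), rather than the \(2^2 = 4\) you would get from coherent addition.</p>

```python
import cmath
import math
import random

# Average |e^{i theta_1} + e^{i theta_2}|^2 over independent uniform
# phases. Coherent addition (theta_1 = theta_2) would give 2^2 = 4;
# random phases give 2, suppressing the mixed |0 1> outcome.
rng = random.Random(0)
trials = 200_000
total = 0.0
for _ in range(trials):
    t1 = rng.uniform(0, 2 * math.pi)
    t2 = rng.uniform(0, 2 * math.pi)
    total += abs(cmath.exp(1j * t1) + cmath.exp(1j * t2)) ** 2
print(total / trials)  # ~2.0
```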
<hr />
<h2 id="3-random-walks-in-state-space">3. Random Walks in State Space</h2>
<p>Now we consider what this looks like as \(n \ra \infty\).</p>
<p>For a state with \(k\) \(\alpha\| 0 \>\) terms, we end up with a sum of exponentials with \(k\) phases in them:</p>
\[E_{k, n} = \sum_{J \in S_{k,n}} e^{i \sum_{j \in J} \theta_j}\]
<p>Here \(S_{k,n}\) is the set of \(k\)-element subsets of \(n\) elements. For instance if \(k=2, n=3\):</p>
\[E_{2, 3} = e^{i(\theta_1 + \theta_2)} + e^{i(\theta_2 + \theta_3)} + e^{i(\theta_1 + \theta_3)}\]
<p>Our wave function for \(n\) iterations of the experiment is given by</p>
\[\psi = \sum_k a^k b^{n-k} E_{k, n} \| 0^k 1^{n-k} \> = \sum_k a^k b^{n-k} E_{k, n} \| \hat{P}[0] = \frac{k}{n} \>\]
<p>The classical version of this is a binomial distribution because \(E_{k, n}\) is replaced with \({n \choose k}\). The quantum version experiences some cancellation. We want to know: as \(n \ra \infty\), what value of \(k\) dominates?</p>
<p>We don’t know anything about the phases themselves, so we’ll treat them as classical independent random variables (which turns out to be the key assumption here). This means that \(\bb{E}[e^{i \theta}] = 0\) and therefore \(\bb{E}[E_{k, n}] = 0\) for all \(k\). But the expected <em>magnitude</em> is not 0. The sum of all of these random unit vectors forms a random walk in the complex plane, and the expected squared magnitude of such a walk is <a href="http://mathworld.wolfram.com/RandomWalk2-Dimensional.html">given</a> by \(\bb{E}[ \| E_{1, n} \|^2 ] = n\).</p>
<p>Brief derivation: this comes from the fact that</p>
\[\begin{aligned}
\bb{E}[ \| E_{1, n} \|^2 ] &= \bb{E} \left[ \left( \sum_i e^{- i \theta_i} \right) \left( \sum_j e^{i \theta_j} \right) \right] \\
&= \bb{E} \sum_i \| e^{i \theta_i} \|^2 + \bb{E} \sum_{i \neq j} e^{- i \theta_i} e^{i \theta_j} \\
&= n \bb{E}[1] + \sum_{i \neq j} \bb{E}[e^{i (\theta_i - \theta_j)}] \\
&= n
\end{aligned}\]
<p>This means that the magnitude of the \(k=1\) term for our quantum coin is proportional to \(\sqrt{n}\), rather than the classical value of \(n\).</p>
<p>For \(k > 1\), the same argument applies (it’s still basically a random walk), except that there are \({ n \choose k }\) terms in the sum, so in every case we get an expected squared magnitude \(\bb{E} [ \| E_{k, n} \|^2 ] = { n \choose k }\). This makes the resulting experimenter wave function look like:</p>
\[\begin{aligned}
(e^{\hat{\theta} i} \alpha \| 0 \> + \beta \| 1 \>)^n
&\sim \sum_{k =0}^n \sqrt{ n \choose k } a^k b^{n-k} \| 0^k 1^{n-k} \text{ in some order }\> \\
&\sim \sum_{k =0}^n \sqrt{ n \choose k } a^k b^{n-k} \| \hat{P}[ 0 ] = k/n \>
\end{aligned}\]
<p>(This is not an equality because it still depends on a classical random variable \(\hat{\theta}\). But it produces the correct expected magnitudes for each term, which is what we care about.)</p>
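<p>The claim that \(\bb{E} [ \| E_{k, n} \|^2 ] = { n \choose k }\) can be checked by brute force for small \(n\). A sketch (the function name is mine), averaging over many random draws of the phases:</p>

```python
import cmath
import itertools
import math
import random

def mean_sq_magnitude(n, k, trials=20_000, seed=0):
    """Monte Carlo estimate of E[|E_{k,n}|^2], where E_{k,n} sums
    e^{i(theta_{j1} + ... + theta_{jk})} over all k-subsets of n phases."""
    rng = random.Random(seed)
    subsets = list(itertools.combinations(range(n), k))
    total = 0.0
    for _ in range(trials):
        thetas = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
        e = sum(cmath.exp(1j * sum(thetas[j] for j in s)) for s in subsets)
        total += abs(e) ** 2
    return total / trials

print(mean_sq_magnitude(6, 2), math.comb(6, 2))  # both ~15
```

<p>The cross terms vanish in expectation because any two distinct subsets differ in at least one phase, so each of the \({n \choose k}\) terms contributes \(1\) to the expected squared magnitude.</p>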
<hr />
<h2 id="4-the-born-rule">4. The Born Rule</h2>
<p>After running \(n\) experiments in their box, our experimenter tells us a number: their perceived value of \(P[\| 0 \>]\). As \(n \ra \infty\) the highest-amplitude state will dominate. To find it, we only need to compute the value of \(k\) at the peak amplitude, and we can find that using \(\| \psi \|^2\), which is easy to work with:</p>
\[\| \psi \|^2 \sim \sum {n \choose k} (a^2)^k (b^2)^{n-k}\]
<p>This is a binomial distribution \(B(n, a^2, b^2) = B(n, \|\alpha\|^2, \| \beta \|^2)\), which asymptotically looks like a normal distribution \(\mathcal{N}(n \| \alpha \|^2, n \| \alpha \|^2 \| \beta \|^2)\) with maximum at \(k = n \| \alpha \|^2\), which means that the highest-amplitude state is:</p>
\[\begin{aligned}
\| \hat{P}[0]= \frac{n \| \alpha \|^2}{n} \> = \| \hat{P}[0] = \| \alpha \|^2 \>
\end{aligned}\]
<p>Thus we conclude that the observed probability of measuring \(\| 0 \>\) when interacting with a system in state \(\alpha \| 0 \> + \beta \| 1 \>\) is centered around \(\| \alpha \|^2\), as reported by an experimenter in a box who runs the measurement many times and reports their measurement of the probability afterwards. And that’s the Born Rule.</p>
<p>Ultimately this follows from postulating that many different ways of seeing the same result interfere with each other, suppressing the amplitudes of seeing less uniform results by a factor of the square root of their multiplicity.</p>
<p>So that’s interesting. I find the argument to be suspiciously clean, and therefore compelling.</p>
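<p>The whole argument can be simulated end to end for small \(n\). A rough sketch under the same assumptions (an independent uniform phase attached to each run that comes up \(\| 0 \>\); the function name is mine): sum the amplitudes of all \(2^n\) orderings into \(\| \hat{P}[0] = k/n \>\) bins, average the squared magnitudes over phase draws, and see where the peak lands.</p>

```python
import cmath
import math
import random

def born_peak(alpha, beta, n, trials=300, seed=1):
    """Return the k/n bin with the largest average squared amplitude.
    Each run that yields |0> carries its own random phase theta_j."""
    rng = random.Random(seed)
    mean_sq = [0.0] * (n + 1)
    for _ in range(trials):
        thetas = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
        amps = [0j] * (n + 1)
        for bits in range(2 ** n):  # each ordering of n outcomes
            k, phase = 0, 0.0
            for j in range(n):
                if (bits >> j) & 1:  # run j came up |0>
                    k += 1
                    phase += thetas[j]
            amps[k] += cmath.exp(1j * phase) * alpha**k * beta**(n - k)
        for k in range(n + 1):
            mean_sq[k] += abs(amps[k]) ** 2
    return max(range(n + 1), key=lambda k: mean_sq[k]) / n

# With |alpha|^2 = 0.8, the peak lands near 0.8 rather than |alpha| ~= 0.894
print(born_peak(math.sqrt(0.8), math.sqrt(0.2), 10))
```

<p>Even at \(n = 10\) the peak sits near \(\| \alpha \|^2\), since the averaged squared amplitudes reproduce the binomial \(B(n, \|\alpha\|^2, \|\beta\|^2)\) from the argument above.</p>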
<p>As far as I can tell it also works in generalizations of the same setup:</p>
<ul>
<li>to distributions with more than two possible values.</li>
<li>to ‘nested’ experiments, where you find out the value of a measurement from multiple measurers who each got it from multiple experimenters. In this case all of the measurers are able to interfere with each other, from your perspective, so it gets flattened out to a single level of interference.</li>
<li>if the amplitudes aren’t normalized to begin with. If \(\|\alpha \|^2 + \| \beta \|^2 \neq 1\) the resulting asymptotic normal distribution will just end up having mean \(\frac{n \| \alpha \|^2}{\| \alpha \|^2 + \| \beta \|^2}\).</li>
</ul>
<p>I’m not sure I’ve correctly identified what might actually lead to the random interference in this experiment. Is it the experimental apparatus interfering with itself? Is it hidden degrees of freedom in the experiment itself? Or maybe it’s all of reality, from the point of view of an observer trying to make sense of all historical evidence for the Born Rule. And it’s unclear to me how carefully isolated an experiment would have to be for different orderings of its results to interfere with each other. Presumably the answer is “a lot”, but what if it isn’t?</p>
<p>If this is actually how nature works, I wonder if it’s detectable somehow. What if you could isolate a particular experiment so much that you could suppress the interference of histories? Can you get the probabilities to become proportional to \(\| \alpha \|\)? Or maybe there is some measurable difference between the distribution of probabilities resulting from a random walk, compared to the normal distribution of classical probability? After all, a “squared normal distribution” seems like it would fall off faster than a regular one.</p>
<p>Suffice it to say, I would love to know a) what’s wrong with this argument, or b) whether it exists or has been debunked in the literature somewhere, because I haven’t found anything (although admittedly I didn’t look very hard).</p>