Alex Kritchevsky2022-07-09T14:34:42+00:00http://alexkritchevsky.com/blogAlex Kritchevskyalex.kritchevsky@gmail.comOutcrop.py2022-07-03T00:00:00+00:00https://alexkritchevsky.com/2022/07/03/outcrop<p>I wrote a <a href="https://github.com/ajakaja/outcrop">little Python script</a> that does something useful: it takes an image file and pads it with white space (or space of whatever color) to make it have a certain aspect ratio. <!--more--> Basically it turns this:</p>
<p><img src="/assets/posts/2022-07-03/fjord-small.jpg" width="450px" style="" /></p>
<p>into this (which is 8x10):</p>
<p><img src="/assets/posts/2022-07-03/fjord-8x10-small.jpg" width="450px" style="border: 1px dashed #333333" /></p>
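<p>The core of such a script is just arithmetic: find the smallest canvas of the target ratio that contains the image, then center the image on it. Here’s a minimal sketch of that logic (a hypothetical re-implementation, not the actual script, which uses Pillow and differs in details):</p>

```python
def pad_to_ratio(w, h, ratio_w, ratio_h):
    """Smallest (W, H) canvas with W:H = ratio_w:ratio_h containing a w x h
    image, plus the (left, top) offset that centers the image on it."""
    if w * ratio_h >= h * ratio_w:
        # image is too wide for the target ratio: keep width, grow height
        W, H = w, (w * ratio_h + ratio_w - 1) // ratio_w
    else:
        # image is too tall: keep height, grow width
        W, H = (h * ratio_w + ratio_h - 1) // ratio_h, h
    return W, H, (W - w) // 2, (H - h) // 2

# A square 3000x3000 photo onto 8x10 paper:
print(pad_to_ratio(3000, 3000, 8, 10))  # (3000, 3750, 0, 375)

# With Pillow the rest would be roughly (untested sketch):
#   canvas = Image.new("RGB", (W, H), "white")
#   canvas.paste(img, (left, top))
```

<p>The integer ceiling keeps the canvas at least as large as the ratio demands; any image library can then paste the photo onto the blank canvas at the computed offset.</p>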
<p>This came about because I was trying to print some photos on photo-printing websites and was extremely annoyed that they (Shutterfly, NationsPhotoLab, etc.) will happily try to print a photo in any aspect ratio you choose, but will mindlessly crop the photo to that size and refuse to let you fit the whole thing in the frame. It’s a perfectly reasonable request to print an 8x8 photo on 8x10 paper with blank space around it! They need to get it together. (<a href="https://www.nytimes.com/wirecutter/reviews/best-online-photo-printing/">This Wirecutter article</a> even complains about the same thing, sheesh.)</p>
<p>I mention it here because: even though I write code every day for work, I only do things for personal utility, like, once a year, maybe? And it feels stupid and tragic to have a craft skill and barely use it — so when I actually do use it, even if it’s somewhat trivial, maybe it’s worth making a note of. Maybe it will be useful to somebody else, someday.</p>
<p>(In this day and age, of infinite social media posts showing crafty people doing clever things, things that you or I imagine we could be doing every day if only we had the — will? patience? quality? — it is easy to feel bad, all the time, about the days we are wasting and the skills we are squandering.</p>
<p>Indeed, I made this blog for myself basically in order to ‘save some work’, to stave off the feeling that I had produced nothing in my free time, at a time when I was spending each day dredging up obscure math papers and obsessively musing over leftover confusions from undergraduate physics.</p>
<p>Well! The best psychological move is surely to not worry about what we have and haven’t done with ourselves — and not feeling anxiety about failing to do something is, paradoxically, the best way to end up doing it — but another not-so-bad move is to do a few things, however small, and let ourselves feel good about them. Right?)</p>
Meditation on Software 12021-10-31T00:00:00+00:00https://alexkritchevsky.com/2021/10/31/software-1<p>A rant, about how something is very wrong with how we write code.</p>
<!--more-->
<hr />
<h2 id="1">1</h2>
<p>For instance:</p>
<p>It takes an unreasonable amount of effort to do anything with software, and we still don’t do anything particularly well. How many millions of person-hours are spent every year on fixing bugs, and understanding confusing code?</p>
<p>And every significantly-sized project has enough complexity to require a team of specialists to support it, and still there is no useful overlap with the complexity of any other project.</p>
<p>And every company and organization is duplicating each other’s work because almost none of the solutions can be shared.</p>
<p>And every advance in hardware efficiency is canceled out by inefficiencies in software, so that everything is barely performant enough, all the time, and gobbles up any energy we have available to give it.</p>
<p>If you work in the industry this should all be obvious. If you write code and don’t feel like most of your time is wasted dealing with problems that shouldn’t exist in the first place… I don’t know, wake up?</p>
<p>It is all <em>working</em> – kinda – in that humanity is churning out software and solving problems and making money. But it can’t be <em>ideal</em>. This human race of ours is spending too much of its human effort to make software that doesn’t work very well and doesn’t do very much, slowly.</p>
<p>I like to fantasize about how to do better.</p>
<hr />
<h2 id="2">2</h2>
<p>Really, the problem is that there is something <em>medieval</em> about how we write code. We are still in the software dark ages, like mathematics before algebra and calculus were discovered. Or perhaps thought, before philosophy. The way software is written in five hundred years – if we haven’t run out of breathable air or microchips or whatever – will, I expect, be mostly unrecognizable compared to how it’s done today, and at best we are, as a species, 20% of the way along that path. (My guess is that we’re at like 15% in most places and then React.js pushes the number to 20%. More on that another day.)</p>
<p>Here’s a test for assessing how good humanity is at writing software:</p>
<p>Suppose that humanity builds a spaceship of 1000 colonists traveling to another star system, many light-years away. And suppose this ship has to be totally self-sufficient, including having the ability to support all of its software, fixing bugs and improvising solutions to whatever comes up on the journey, and likely building out whatever is needed once they get there. Can the colonists confidently expect to be able to handle whatever software challenges come up?</p>
<p>The answer is: hah! definitely ‘no’. They will all die. RIP.</p>
<p>The answer needs to be ‘yes’ if we are to colonize other star systems. I don’t want to ship off to another star system only to die partway of an unfixable bug in the life-support system. There is no way you can fill out the roster of the ship with expert software engineers, and there is no way a roster of non-experts, even if they are geniuses in other fields, can be expected to understand even one part of the ship end-to-end.</p>
<p>So we have work to do. It’s probably <em>possible</em>, but it will take some serious advances to get there.</p>
<hr />
<h2 id="3">3</h2>
<p>An analogy can be made to mechanical engineering. I don’t really know how my (gas-powered) car works. But if I open the hood and look at the engine, I feel like I have at least a hope of figuring it out. Apart from… the electronics… I can clearly tell which parts interact with which other parts, and approximately what they do to each other.</p>
<p>Presumably if I take those parts apart I can tell how they work, approximately, internally, although I may not be able to put them back together again, or machine new parts of the same quality, without a lot of specialization. But the fact I can make progress at all is valuable. If I took a long road trip away from civilization with just a box of tools and spare parts I have at least a <em>hope</em> of handling whatever comes up.</p>
<p>The difference, I think, is that physical machines are constrained by fundamental requirements of <em>causality</em>. For a widget to affect a gizmo, it has to, like, <em>touch</em> it, and there has to be some motive force between the two, which I can view and manipulate myself. Its physical interaction affords it a property of <em>scrutability</em> that allows me to make progress on understanding it. And if you take the widget apart, its internal components have the same property. Of course this falls apart when chemistry gets involved; you actually do need some specialized knowledge to make sense of, say, the actual combustion process. But for mechanical systems, it’s a start.</p>
<p>Software today has no such property. It works exactly how it works, and good luck figuring it out from the outside. It is stable and functional like a spinning top balanced on the tip of a needle.</p>
<p>The best Scrutability I know of is web browser DevTools, which let you view any website and see how it’s organized and styled… but it barely counts. I hope that someday, figuring out how any software works is as natural as taking apart and tinkering with an old car engine.</p>
<p>As should be clear from this comparison, it’s not enough that software is open-source (although that’s a start). It must also be conceptualized and built in a way that makes the causation Scrutable. It needs to be split into Scrutable modules that ‘push’ and ‘pull’ on each other in a way that we can follow. Most importantly, it needs to be constructed in such a way that allows for the digital equivalent of ‘opening up the hood and looking inside’, and we need to have the tools at hand to do so.</p>
<hr />
<h2 id="4">4</h2>
<p>I don’t think any of what I’m looking for exists today, outside of, perhaps, one-off proprietary solutions. But if I had to throw out some ideas, here’s where I think progress is happening:</p>
<p>The most Scrutable system I know of is viewing a website in Chrome Devtools, except for the fact that it doesn’t let you write code (or really search for it, or really modify anything in a way that doesn’t get reversed the next time a callback is triggered). But it does something the rest of them don’t, which is let you record every piece of code that’s run on a page and inspect it to see what happened. Never mind that this process is janky, error-prone, and obfuscated, and the backend is invisible, and recording execution via the Performance tab is a mindfuck – at least it <em>exists</em>. There is no future in having to add <code class="language-plaintext highlighter-rouge">print()</code> statements to find out what your code did, or in reading imperative code to figure out what state a UI ended up in. You have to be able to look at everything, <em>as it exists</em>.</p>
<p>The most Scrutable way of writing code that I know of is React. The declarative model is the right way to reason about UI code. React Devtools are reasonably good at looking at something while it’s running, and, in some cases, modifying it. Hooks are better than any other way I’ve ever seen to reason about side effects, although in every case the whole philosophy is hamstrung by being implemented in Javascript and having to transpile to the DOM. And the problem of data processing and externalities is, as far as I know, still unsolved, despite the efforts of the Redux/Flux/state management ecosystem.</p>
<p>(Perhaps in the not-too-distant future there is a version of React whose shadow DOM <em>is</em> the DOM, and which runs in a language that doesn’t require dependency arrays, and which has first-class types built-in instead of shimmed on top, and in which you can’t make the mistake of forgetting to bind a function to the appropriate <code class="language-plaintext highlighter-rouge">this</code>, and whose debugger lets you follow asynchronous effects that are scheduled on later render frames. Wouldn’t that be nice!)</p>
<p>At least when it comes to UI, there is a future where React Devtools, Figma, and your IDE are the same piece of software. And I think that in this world, user-facing code no longer has anything like unit tests, because it’s a waste of time to meticulously test code when you can look at it and observe that it’s correct.</p>
<p>The best shell I know of is, I guess, Python. Bash and its descendants are a disaster and the world would be better off if they were entirely replaced. In the future there is no way that we’re going to be working in languages that use $PATH variables, that pipe unformatted string data through bizarrely-named commands inflected by obscure flags, or that require in-band signalling with strings like <code class="language-plaintext highlighter-rouge">\u001b[31m</code> to colorize text. I mean, my god. (Once upon a time I had high hopes for <a href="https://github.com/unconed/TermKit">TermKit</a> but it never really got off the ground.)</p>
<p>I am not sure what the future of type systems is, but I know three things about it: 1. Constructing natural numbers out of successor functions is an irrelevant gimmick. 2. There will be no concept of ‘undefined behavior’ that survives the typechecker, because that’s insane. 3. <a href="https://en.wikipedia.org/wiki/Refinement_type">Refinement types</a> are going to happen at some point. It will be considered antiquated to use a language that can’t specify the type of ‘integers greater than 5’ in some ergonomic way.</p>
<p>Finally, I know this: most of the code written today isn’t any good, compared to what will be possible in the future. It’s not possible in today’s ecosystems to write something scalable, maintainable, and resilient to errors. It’s up to the frameworks and paradigms to develop the art of programming to the point where it’s actually an efficient and accessible craft instead of a massive timesink for the whole human race.</p>
All the Exterior Algebra Operations2020-10-15T00:00:00+00:00https://alexkritchevsky.com/2020/10/15/ea-operations<p>More exterior algebra notes. This is a reference for (almost) all of the many operations that I am aware of in the subject. I will make a point of giving explicit algorithms and an explicit example of each, in the lowest dimension that can still be usefully illustrative.</p>
<p>Warning: very long.</p>
<!--more-->
<hr />
<h2 id="background-on-terminology-and-notations">Background on terminology and notations</h2>
<p>As far as I can tell, the same ideas underlying what I call ‘exterior’ algebra have been developed at least four separate times in four notations. A rough history is:</p>
<p>Grassmann developed the original ideas in the ~1840s, particularly in his <em>Ausdehnungslehre</em>, which unfortunately was never very well popularized, partly because linear algebra hadn’t really been developed yet. Grassmann’s goal was, roughly, to develop ‘synthetic geometry’: geometry without any use of coordinates, where all of the operations act on abstract variables.</p>
<p>Some of Grassmann’s ideas made it into projective geometry, where multivectors are called ‘flats’ (at least in one book I have, by Stolfi) and typically handled in projective coordinates (in which the point \((x,y)\) is represented by any value \((\lambda x, \lambda y, \lambda)\)). Some ideas also made it into algebraic geometry, and there is some overlap with ‘algebraic varieties’; I don’t know much about this yet.</p>
<p>Cartan and others developed the theory of differential forms in the 1920s and included a few parts of Grassmann’s exterior algebra, which got the basics included in most algebra texts thereafter. Physicists adopted the differential forms notation for handling curved spaces in general relativity, so they got used to wedge products there. But most of vector calculus was eventually based on Hamilton’s quaternions from the ~1840s, simplified into its modern form by Heaviside in the ~1880s.</p>
<p>In the 1870s Clifford combined Hamilton and Grassmann’s ideas into ‘Clifford Algebras’, but they were largely forgotten in favor of quaternions and later vector analysis. Dirac accidentally re-invented Clifford algebras in the 1920s with the Dirac/gamma matrices in relativistic QM. Hestenes eventually figured this out and did a lot of work to popularize his ‘Geometric Algebra’ starting in the 1960s, and a small but vocal group of mostly physicists has been pushing for increased use of multivectors / GA since then. More on this later.</p>
<p>Rota and his students also discovered Grassmann at some point (the 1960s as well, I think?) and developed the whole theory again as part of what they called ‘invariant theory’, in which they called multivectors ‘extensors’. They have a lot of good ideas but their notations largely suck. Rota and co. also overlapped into ‘matroid’ theory, which deals with the abstract notion of linear dependence and so ends up using a lot of the same ideas.</p>
<p>So “multivectors”, “extensors”, and “flats” (and “matroids” in the context of real vector spaces) (and “varieties” in some cases?) basically are all the same thing. “Exterior product”, “wedge product”, “progressive product”, and “join” are all the same operation.</p>
<p>For the most part I greatly prefer notations and terminology based on vector algebra, so I stick with “multivector” and translate other things where possible. However, it is undeniable that the best name for the exterior product is the <strong>join</strong>, and its dual is the <strong>meet</strong>.</p>
<p>Everyone also picks their choice of scalar coefficients differently. I always pick the one that involves the fewest factorial terms, and I don’t care about making sure the choices generalize to finite fields.</p>
<p>Unfortunately, Cartan and the vector analysis folks definitely got the symbol \(\^\) for the exterior product wrong. Projective geometers and Rota got it right: it should be \(\vee\), rather than \(\^\). Join is to vector spaces what union is to sets, and union is \(\cup\). Meet (discussed below) is analogous to \(\cap\). (And linear subspaces form a lattice, which already uses the symbols \(\^\) and \(\vee\) this way, plus the terminology ‘join’ and ‘meet’!)</p>
<p>I’m going to keep using \(\^\) for join here for consistency with most of the literature, but it’s definitely wrong, so here’s an open request to the world:</p>
<p><strong>If you ever write a textbook using exterior algebra that’s going to be widely-read, please fix this notation for everyone by swapping \(\^\) and \(\v\) back. Thanks.</strong></p>
<hr />
<h2 id="note-on-duality">Note on duality</h2>
<p>Since I am mostly concerned with eventually using this stuff for physics, I can’t ignore the way physicists handle vector space duality. The inner product of vectors is defined only between a vector and its dual, and contraction is performed using a metric tensor, so \(g: V \o V^* \ra \bb{R}\). In index notation this means you always pair a lower index with an upper one: \(\b{u} \cdot \b{v} = u_i v^i\).</p>
<p>However, I think most of this should be intuitive even on plain Euclidean space with an identity metric, so I prefer first presenting each equation with no attention paid to duality, then a version with upper and lower indices. I’ll mostly avoid including a metric-tensor version for space, but it can be deduced from the index-notation version.</p>
<p>An added complication is that there is an argument to be made that use of the dual vector space to define the inner product is a <em>mistake</em>. I am not exactly qualified to say if this is correct or not, but after everything I’ve read I suspect it is. The alternative to vector space duality is to define everything in terms of the volume form, so the inner product is defined by the relation:</p>
\[\alpha \^ \star \beta = \< \alpha, \beta \> \omega\]
<p>With \(\omega\) a choice of pseudoscalar. This means that the choice of metric becomes a choice of <em>volume form field</em>, which is actually pretty compelling. \(\< \alpha, \_ \>\) <em>is</em> a linear functional \(\in V^* \simeq V \ra \bb{R}\), and so counts as the dual vector space. But this can also make it tricky to define \(\star\), since some people think it should map vectors to dual vectors and vice versa.</p>
<p>Another idea is to interpret \(V^*\) as a “-1”-graded vector space relative to \(V\), such that \(\underset{-1}{a} \^ \underset{1}{b} = \underset{0}{(a \cdot b)}\). ‘Dual multivectors’ then have negative grades in general. This often seems like a good idea but I’m not sure about it yet.</p>
<p>Rota’s Invariant Theory school uses yet another definition of the inner product. They define the wedge product in terms of another operation, called a ‘bracket’ \([, ]\), so that \(\alpha \^ \star \beta = [\alpha, \beta] \omega\), but they also seem to treat the pseudoscalar as a regular scalar and so call this an inner product. I don’t think this is the right approach because I’m not comfortable forgetting the difference between \(\^^n \bb{R}^n\) and \(\bb{R}\), although as above I do like the idea of the volume form as defining the inner product. (They call the whole space equipped with such a bracket a ‘Peano space’. I don’t think the name caught on.)</p>
<hr />
<h2 id="1-the-tensor-product-o">1. The Tensor Product \(\o\)</h2>
<p>We should briefly mention the tensor product first. \(\o\) is the ‘free multilinear product’ on vector spaces. Multilinear means that \(u \o v\) is linear in both arguments: \((c_1 u_1 + c_2 u_2) \o v = c_1 (u_1 \o v) + c_2 (u_2 \o v)\), etc. <a href="https://en.wikipedia.org/wiki/Free_object">Free</a> means that any other multilinear product defined on vector spaces factors through \(\o\). Skipping some technicalities, this means if we have some other operation \(\ast\) on vectors which is multilinear in its arguments, then there is a map \(f\) with \(a \ast b = f(a \otimes b)\).</p>
<p>‘Free’-ness is generally a useful concept. \(\^\) happens to be the free <em>antisymmetric</em> multilinear product, so any other antisymmetric operation on the tensor algebra factors through \(\^\). There are ‘free’-r products than \(\o\) as well, if you let go of multilinearity and associativity.</p>
<p>\(\o\) acting on \(V\) (a vector space over \(\bb{R}\)) produces the ‘tensor algebra’ consisting of \(\o V = \bb{R} \oplus V \oplus V^{\o 2} \oplus \ldots\), with \(\o\) as the multiplication operation. There is a canonical inner product on any \(V^{\o n}\) inherited from \(V\)’s: \(\< \b{a} \o \b{b}, \b{c} \o \b{d} \> = \< \b{a}, \b{c} \> \< \b{b} , \b{d} \>\).</p>
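<p>On simple tensors this canonical inner product is immediate to compute. A quick sketch, with vectors as plain lists (names here are mine, for illustration):</p>

```python
def dot(u, v):
    """Euclidean inner product on V."""
    return sum(x * y for x, y in zip(u, v))

def tensor_inner(a, b, c, d):
    """<a (x) b, c (x) d> = <a, c> <b, d> on simple tensors."""
    return dot(a, c) * dot(b, d)

x, y = [1, 0], [0, 1]
print(tensor_inner(x, y, x, y))  # 1
print(tensor_inner(x, y, y, x))  # 0
```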
<hr />
<h2 id="2-the-exterior-product-">2. The Exterior Product \(\^\)</h2>
<p>The basic operation of discussion is the exterior product \(\alpha \^ \beta\). Its most general definition is via the quotient of the tensor algebra by the relation \(x \o x \sim 0\) for all \(x\). Specifically, the exterior <em>algebra</em> is the algebra you get under this quotient; the exterior <em>product</em> is the behavior of \(\o\) under this algebra homomorphism.</p>
<p>Given a vector space \(V\) and tensor algebra \(\o V\), we define \(I\) as the ideal of elements of the form \(x \o x\) (so any tensor which contains any copy of the same basis vector twice). Then:</p>
\[\^ V \equiv \o V / I\]
<p>Elements in this quotient space are multivectors like \(\alpha \^ \beta\), and \(\o\) maps to the \(\^\) operation. If \(\pi\) is the canonical projection \(\o V \mapsto \o V / I\):</p>
\[\pi(\alpha) \^ \pi(\beta) \equiv \pi(\alpha \o \beta)\]
<p>In practice, you compute the wedge product of multivectors by just appending them, as the product inherits associativity from \(\o\) (with \(\| \alpha \| = m, \| \beta \| = n\)):</p>
\[\alpha \^ \beta = \alpha_1 \^ \ldots \^ \alpha_{m} \^ \beta_1 \^ \ldots \^ \beta_n\]
<p>There are several standard ways to map a wedge product back to a tensor product (reversing \(\pi\), essentially, so we’ll write it as \(\pi^{-1}\) although it is not an inverse). One is to select <em>any</em> valid tensor:</p>
\[\pi^{-1} \alpha \stackrel{?}{=} (\alpha_1 \^ \ldots \^ \alpha_n) = \alpha_1 \o \ldots \o \alpha_n\]
<p>More useful, however, is to map the wedge product to a totally antisymmetrized tensor:</p>
\[\pi^{-1} \alpha = K \sum_{\sigma \in S_{m}} \sgn(\sigma) \alpha_{\sigma(1)} \o \ldots \o \alpha_{\sigma(m)}\]
<p>Where \(\sigma\) ranges over the permutations of \(m\) elements. This has \(m!\) terms for a basis vector \(\in \^^m \bb{R}^n\) (a more complicated formula with \({n \choose m}\) terms is needed for general elements of \(\^^m \bb{R}^n\) – but you can basically apply the above for every component). It is impractical for algorithms but good for intuition. \(K\) is a constant that is chosen to be either \(1\), \(\frac{1}{m!}\), or \(\frac{1}{\sqrt{m!}}\), depending on the source. I prefer \(K=1\) to keep things simple. Here’s an example:</p>
\[\pi^{-1}(\b{x} \^ \b{y}) = \b{x} \o \b{y} - \b{y} \o \b{x}\]
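<p>The antisymmetrization map \(\pi^{-1}\) (with \(K = 1\)) is direct to implement by summing over permutations. A sketch, representing the resulting tensor as a dict from index tuples to coefficients (the encoding is mine, for illustration):</p>

```python
from itertools import permutations, product

def parity(sigma):
    """Sign of a permutation of range(n), computed from its cycle structure."""
    sign, seen = 1, [False] * len(sigma)
    for i in range(len(sigma)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = sigma[j]
                length += 1
            if length % 2 == 0:  # even-length cycle flips the sign
                sign = -sign
    return sign

def antisymmetrize(vectors):
    """pi^{-1}(v_1 ^ ... ^ v_m) with K = 1, as {(i_1,...,i_m): coefficient}.
    Each vector is a list of components."""
    m, out = len(vectors), {}
    for sigma in permutations(range(m)):
        s = parity(sigma)
        for idx in product(*(range(len(v)) for v in vectors)):
            coeff = s
            for k in range(m):
                coeff *= vectors[sigma[k]][idx[k]]
            if coeff:
                out[idx] = out.get(idx, 0) + coeff
    return {i: c for i, c in out.items() if c != 0}

x, y = [1, 0], [0, 1]
print(antisymmetrize([x, y]))  # {(0, 1): 1, (1, 0): -1}, i.e. x (x) y - y (x) x
```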
<p>Antisymmetric tensors that appear in other subjects are usually supposed to be multivectors. Antisymmetrization is a familiar operation in Einstein notation:</p>
\[\b{a} \^ \b{b} \^ \b{c} \equiv a_{[i} b_j c_{k]} = \sum_{\sigma \in S_3} \sgn(\sigma) a_{\sigma(1)} b_{\sigma(2)} c_{\sigma(3)}\]
<p>Other names:</p>
<ul>
<li>“Wedge product”, because it looks like a wedge</li>
<li>“Progressive Product” (by Grassmann and Gian-Carlo Rota). ‘Progressive’ because it increases grades.</li>
<li>“Join”, in projective geometry and lattice theory. So-called because the wedge product of two vectors gives the linear subspace spanned by them, if it is non-zero.</li>
</ul>
<p>As mentioned above, the symbol for ‘join’ in other fields is \(\vee\). Exterior algebra has it backwards. It’s definitely wrong: these operations in a sense generalize set-theory operations, and \(\^\) should correspond to \(\cup\).</p>
<hr />
<h2 id="3-the-inner-product--">3. The Inner Product \(\<, \>\)</h2>
<p>The multivector inner product is written \(\alpha \cdot \beta\) or \(\< \alpha, \beta \>\), and is defined when \(\alpha\) and \(\beta\) have the same grade.</p>
<p>There are several definitions that disagree on whether it should have any scaling factors like \(\frac{1}{k!}\), depending on the definition of \(\^\). I think the only reasonable definition is that \((\b{x \^ y}) \cdot (\b{x \^ y}) = 1\). This means that this is <em>not</em> the same operation as the <em>tensor</em> inner product, applied to antisymmetric tensors:</p>
\[(\b{x \^ y}) \cdot (\b{x \^ y}) \neq (\b{x \o y} - \b{y \o x}) \cdot (\b{x \o y} - \b{y \o x}) = 2\]
<p>But it’s just too useful to normalize the magnitudes of all basis multivectors. It avoids a lot of \(k!\) factors that would otherwise appear everywhere.</p>
<p>To compute, either antisymmetrize <em>both</em> sides in the tensor representation and divide by \(k!\), or just antisymmetrize one side (either one):</p>
\[\begin{aligned}
(\b{a \^ b}) \cdot (\b{c \^ d}) &= \frac{1}{2!}(\b{a \o b} - \b{b \o a}) \cdot (\b{c \o d} - \b{d \o c}) \\
&= (\b{a \o b}) \cdot (\b{c \o d} - \b{d \o c}) \\
&= (\b{a \cdot c}) (\b{b \cdot d}) - (\b{a \cdot d}) (\b{b \cdot c})
\end{aligned}\]
<p>This also gives the coordinate form:</p>
\[(\b{a \^ b}) \cdot (\b{c \^ d}) = a_i b_j c^{[i} d^{j]} = a_i b_j (c^i d^j - c^j d^i)\]
<p>Or in general:</p>
\[\< \alpha, \beta \> = \< \bigwedge \alpha_i, \bigwedge \beta_j \> = \det(\alpha_i \cdot \beta_j)\]
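<p>This determinant formula makes the inner product easy to compute for blades given as lists of vectors. A sketch (with a naive cofactor-expansion determinant, which is fine at small grades):</p>

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def wedge_inner(alphas, betas):
    """<a_1 ^ ... ^ a_k, b_1 ^ ... ^ b_k> = det(a_i . b_j), K = 1 normalization."""
    return det([[sum(x * y for x, y in zip(a, b)) for b in betas] for a in alphas])

x, y, z = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(wedge_inner([x, y], [x, y]))  # 1, the (x^y).(x^y) = 1 normalization
print(wedge_inner([x, y], [y, x]))  # -1
```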
<hr />
<h2 id="4-the-interior-product-cdot">4. The Interior Product \(\cdot\)</h2>
<p>The interior product is the ‘curried’ form of the inner product:</p>
\[\< \b{a} \^\alpha, \beta \> = \< \alpha, \b{a} \cdot \beta \>\]
<p>This is written as either \(\b{a} \cdot \beta\) or \(\iota_{\b{a}} \beta\). Computation is done by antisymmetrizing the side with the larger grade, then contracting:</p>
\[\b{a} \cdot (\b{b \^ c}) = \b{a} \cdot (\b{b \o c} - \b{c \o b}) = (\b{a} \cdot \b{b}) \b{c} - (\b{a} \cdot \b{c}) \b{b}\]
<p>In index notation:</p>
\[\b{a} \cdot (\b{b \^ c}) = a_i b^{[i} c^{j]} = a_i (b^{i} c^{j} - b^j c^i)\]
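<p>For a vector against a bivector, the expansion above is all there is to it. A sketch:</p>

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def interior(a, b, c):
    """a . (b ^ c) = (a . b) c - (a . c) b, returned as a plain vector."""
    ab, ac = dot(a, b), dot(a, c)
    return [ab * ci - ac * bi for bi, ci in zip(b, c)]

x, y = [1, 0, 0], [0, 1, 0]
print(interior(x, x, y))  # [0, 1, 0]: x . (x ^ y) = y
print(interior(y, x, y))  # [-1, 0, 0]: y . (x ^ y) = -x
```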
<p>Other names: the “contraction” or “insertion” operator, because it inserts its left argument into some of the ‘slots’ in the inner product of the right argument.</p>
<p><strong>The two-sided interior product</strong></p>
<p>Normally, in the notation \(\alpha \cdot \beta\), it’s understood that the lower grade is on the left, and the operation isn’t defined otherwise. But some people ignore this restriction, and I’m warming up to doing away with it entirely. I can’t see any reason not to define it to work either way.</p>
<p>When tracking dual vectors we need to be careful about which side ends up ‘surviving’. To be explicit, let’s track which ones we are considering as dual vectors:</p>
\[\b{x}^* \cdot (\b{x} \^ \b{y}) = \b{y} \\
(\b{x}^* \^ \b{y}^*) \cdot \b{x} = \b{y}^*\]
<p>Note that in both cases the vectors contract <em>left-to-right</em>. One vector / dual-vector is inserted into the ‘slots’ of the other dual-vector/vector. In coordinates, these are:</p>
\[\b{a}^* \cdot (\b{b \^ c}) = a_i (b^{[i} c^{j]})\]
\[(\b{b}^* \^ \b{c}^*) \cdot \b{a} = a^i(b_{[i} c_{j]})\]
<hr />
<h2 id="5-the-hodge-star-star">5. The Hodge Star \(\star\)</h2>
<p>\(\star\) produces the ‘complementary subspace’ to the subspace denoted by a multivector. It is only defined relative to a choice of pseudoscalar \(\omega\) – usually chosen to be all of the basis vectors in lexicographic order, like \(\b{x \^ y \^ z}\) for \(\bb{R}^3\). Then:</p>
\[\star \alpha = \alpha \cdot \omega\]
<p>A more common but less intuitive definition:</p>
\[\alpha \^ (\star \beta) = \< \alpha, \beta \> \omega\]
<p>The inner product and Hodge star are defined in terms of each other in various sources. For my purposes, it makes sense to assume the form of the inner product.</p>
<p>In practice, I compute \(\star \alpha\) in my head by finding a set of basis vectors such that \(\alpha \^ \star \alpha = \omega\) (up to a scalar). Explicit example in \(\bb{R}^4\):</p>
\[\star(\b{w} \^ \b{y}) = - \b{x \^ z}\]
<p>because</p>
\[\b{(w \^ y) \^ x \^ z} = - \b{w \^ x \^ y \^ z} = - \omega\]
<p>In Euclidean coordinates, \(\omega\) is given by the Levi-Civita symbol \(\epsilon_{ijk}\), and \(\star \alpha = \alpha \cdot \omega\) works as expected:</p>
\[\star(\b{a} \^ \b{b})_k = \epsilon_{ijk} a^i b^j\]
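<p>In coordinates this is a short loop over the Levi-Civita symbol. A sketch for \(\bb{R}^3\) (function names are mine):</p>

```python
def levi_civita(idx):
    """epsilon_idx: the sign of idx as a permutation of range(len(idx)), else 0."""
    idx = list(idx)
    if sorted(idx) != list(range(len(idx))):
        return 0  # repeated index
    inversions = sum(1 for i in range(len(idx))
                     for j in range(i + 1, len(idx)) if idx[i] > idx[j])
    return -1 if inversions % 2 else 1

def star_wedge(a, b):
    """(star(a ^ b))_k = eps_{ijk} a^i b^j in R^3."""
    return [sum(levi_civita((i, j, k)) * a[i] * b[j]
                for i in range(3) for j in range(3))
            for k in range(3)]

x, y, z = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(star_wedge(x, y))  # [0, 0, 1] = z
print(star_wedge(y, z))  # [1, 0, 0] = x
```

<p>This is, of course, exactly the cross product; with multi-index \(\e_{IK}\) contractions the same idea computes \(\star\) in any dimension.</p>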
<p>This is using the convention that the \(\star\) of a vector is a lower-index dual vector. I’ve seen both conventions: some people would additionally map it back to a vector using the metric:</p>
\[\star(\b{a} \^ \b{b})^k = \epsilon_{ij}^k a^i b^j = g^{kl} \epsilon_{ijl} a^i b^j\]
<p>Either convention seems fine as long as you keep track of what you’re doing. They’re both valid in index notation, anyway; the only difference is choosing which is meant by \(\star \alpha\).</p>
<p>It is kinda awkward that \(\omega\) is the usual symbol for the pseudoscalar object but \(\e\) is the symbol with indices. It is amusing, though, that \(\e\) looks like a sideways \(\omega\). I’ll stick with this notation here but someday I hope we could just use \(\omega\) everywhere, since \(\e\) is somewhat overloaded.</p>
<p>\(\star\) is sometimes written \(\ast\), but I think that’s uglier. In other subjects it’s written as \(\star \alpha \mapsto \alpha^{\perp}\) which I do like.</p>
<p>We need a bit of notation to handle \(\star\) in arbitrary dimensions. We index with multi-indices of whatever grade is needed – for the Levi-Civita symbol, we write \(\e_{I}\) where \(I\) ranges over the one value, \(\omega\), of \(\^^n V\) (note: this is different than ranging over <em>every</em> choice of \(I\) with \(n!\) terms. Instead, we index by a single multivector term. It’s a lot easier.) To express contraction with this, we split the index into two multi-indices: \(\e_{I \^ J}\), so \(\star \alpha\) is written like this:</p>
\[(\star \alpha)_{K} = \alpha^I \e_{I K}\]
<p>The implicit sum is over every value of \(I \in \^^{\| \alpha \|} V\).</p>
<p>Note that in general \(\star^2 \alpha = (-1)^{k(n-k)} \alpha\), so \(\star^{-1} \alpha = (-1)^{k(n-k)} \star \alpha\).</p>
<hr />
<h2 id="6-the-cross-product-times">6. The Cross Product \(\times\)</h2>
<p>The cross-product is only defined in \(\bb{R}^3\) and is given by:</p>
\[\b{a} \times \b{b} = \star (\b{a} \^ \b{b})\]
<p>Some people say there is a seven-dimensional generalization of \(\times\), but they’re misguided. This generalizes to every dimension.</p>
<hr />
<h2 id="7-the-partial-trace-cdot_k">7. The Partial Trace \(\cdot_k\)</h2>
<p>In index notation it is common to take a ‘partial trace’ of a tensor: \(c_i^k = a_{ij} b^{jk}\), and sometimes we see a partial trace of an antisymmetric tensor:</p>
\[c_i^k = a_{[i, j]} b^{[j, k]} = (a_{ij} - a_{ji})(b^{jk} - b^{kj}) = a_{ij} b^{jk} - a_{ji} b^{jk} - a_{ij} b^{kj} + a_{ji} b^{kj}\]
<p>For whatever reason I have never seen a coordinate-free notation for this for multivectors. But it’s actually an important operation, because if we treat bivectors as rotation operators on vectors, it’s how they compose:</p>
\[[(a \b{x} + b \b{y}) \cdot (\b{x \^ y})] \cdot (\b{x \^ y} ) = (a \b{y} - b \b{x}) \cdot (\b{x \^ y}) = - (a \b{x} + b \b{y})\]
<p>Which means that apparently</p>
\[R_{xy}^2 = (\b{x} \^ \b{y}) \circ (\b{x} \^ \b{y}) = -(\b{x \o x} + \b{y \o y})\]
<p>Note that the result <em>isn’t</em> a multivector. In general it’s an element of \(\^ V \o \^ V\).</p>
<p>But it’s still useful. What’s the right notation, though? Tentatively, I propose we write \(\cdot_k\) to mean contracting \(k\) terms together. The choice of <em>which terms</em> is a bit tricky. The geometric product, discussed later, suggests that we should do inner-to-outer. But the way we already handle inner products suggests left-to-right. For consistency let’s go with the latter, and insert \(-1\) factors as necessary.</p>
<p>The partial trace of two multivectors is implemented like this:</p>
\[\alpha \cdot_k \beta = \sum_{\gamma \in \^^k V} (\gamma \cdot \alpha) \o (\gamma \cdot \beta) \in \^ V \o \^ V\]
<p>Where the sum is over unit-length basis multivectors \(\gamma\). Note that this use of \(\o\) is <em>not</em> the multiplication operation in the tensor algebra we constructed \(\^ V\) from; rather, it is the \(\o\) of \(\^ V \o \^ V\). This translates to:</p>
\[[\alpha \cdot_k \beta]_{J K} = \alpha_{IJ} \beta^I_{K} = \delta^{IH} \alpha_{IJ} \beta_{HK}\]
<p>(That \(\delta\) is the identity matrix; recall that indexing it by multivectors \(I, H \in \^^k V\) means to take elements of \(\delta^{\^^k}\) which is the identity matrix on \(\^^k V\).)</p>
<p>This construction gives \((\b{x \^ y})^{(\cdot_1) ^2} = (\b{x \o x + y \o y}) = I_{xy}\), because we contracted the first indices together. When used on a vector as a rotation operator, we need a rule like this:</p>
\[R_{xy}^2 = - (\b{x \^ y})^{(\cdot_1)^2}\]
<p>In general, contracting operators that are going to act on grade-\(k\) objects gives \(O \circ O = (-1)^k O^{\cdot 2}\). But I don’t think it’s worth thinking too hard about this: the behavior is very specific to the usage.</p>
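<p>The composition rule above is easy to verify numerically for the \(\b{x \^ y}\) case (a quick sketch; the function name is mine):</p>

```python
import numpy as np

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def dot_xy(v):
    """v . (x ^ y) = (v . x) y - (v . y) x, contracting left-to-right."""
    return (v @ x) * y - (v @ y) * x

v = np.array([3.0, 4.0])
# dotting with x ^ y twice negates the vector, matching R_{xy}^2 = -1
assert np.allclose(dot_xy(dot_xy(v)), -v)
```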
<p><strong>Partial Star:</strong></p>
<p>One funny thing we can do with a partial trace is apply \(\star\) to one component of a multivector:</p>
\[\star_k \alpha = \alpha \cdot_k \omega\]
<p>Example in \(\bb{R}^3\):</p>
\[\begin{aligned}
\star_1 \b{x \^ y} &= (\star \b{x}) \o \b{y} - (\star \b{y}) \o \b{x} \\
&= (\b{y \^ z}) \o \b{y} - (\b{z \^ x}) \o \b{x}
\end{aligned}\]
<p>I would have thought this was overkill and would never be useful, but it turns out it has a usage in the next section.</p>
<p><strong>Coproduct slice:</strong></p>
<p>Prior to this section I hadn’t really considered tensor powers of exterior algebras like \(\^ V \o \^ V\) in general, except for wedge powers of matrices like \(\^^2 A\). But they do come up in the literature sometimes. Rota &amp; Co had an operation they called the “coproduct slice” of a multivector, which splits a multivector in two by antisymmetrically replacing one of the \(\^\) positions with a \(\o\), like this:</p>
\[\p_{2,1} (\b{x \^ y \^ z}) = (\b{x \^ y}) \o \b{z} + (\b{y \^ z}) \o \b{x} + (\b{z \^ x}) \o \b{y}\]
<p>This gets at the idea that any wedge product (the free antisymmetric multilinear product) factors through the tensor product (the free multilinear product), and some concepts make more sense on the tensor product. For instance, it makes more sense to me to take the trace of two tensored terms than of two wedged terms. In general I am still trying to figure out for myself whether the “quotient algebra” or “antisymmetric tensor algebra” senses of \(\^\) are more important and fundamental, and the right way to think about the two.</p>
<p>Up to a sign, the coproduct slice can be implemented by tracing over the unit basis \(k\)-vectors:</p>
\[\p_{k, n-k} \beta = \sum_{\alpha \in \^^k V} \alpha \o (\alpha \cdot \beta )\]
<hr />
<h2 id="8-the-meet-vee">8. The Meet \(\vee\)</h2>
<p>\(\star\) maps every multivector to another one. Its action on the wedge product is to produce a dual operation \(\vee\), called the <em>meet</em> (recall that the wedge product is also aptly called the ‘join’).</p>
\[(\star \alpha) \vee (\star \beta) = \star(\alpha \^ \beta)\]
<p>The result is a complete exterior algebra because it’s isomorphic to one under \(\star\). So <em>both</em> of these are valid exterior algebras obeying the exact same rules:</p>
\[\^ V = (\^, V)\]
\[\vee V = (\vee, \star V)\]
<p>All operations work the same way if a \(\star\) is attached to every argument and we replace \(\^\) with \(\vee\):</p>
\[\star (\b{a} \^ \b{b}) = (\star \b{a}) \vee (\star \b{b})\]
<p>\(\vee \bb{R}^2\) is, for instance, spanned by \((\star 1, \star \b{x}, \star \b{y}, \star (\b{x} \^ \b{y})) = (\b{x \^ y}, \b{y}, - \b{x}, 1)\).</p>
<p>Sometimes \((\^, \v, V)\) is called a ‘double algebra’: a vector space with a choice of pseudoscalar and two dual exterior algebras. It’s also called the <a href="https://en.wikipedia.org/wiki/Grassmann%E2%80%93Cayley_algebra">Grassmann-Cayley Algebra</a>. I like to write it as \(\^{ \v }V\).</p>
<p>The meet is kinda weird. It is sorta like computing the intersection of two linear subspaces:</p>
\[(\b{x \^ y}) \vee (\b{y \^ z}) = (\star\b{z}) \vee (\star\b{x}) = \star (\b{z \^ x}) = \b{y}\]
<p>But it only works if the degrees of the two arguments add up to \(\geq n\):</p>
\[\b{x} \vee \b{y} = \star(\b{y \^ z} \^ \b{z \^ x}) = 0\]
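<p>Both examples can be checked mechanically by implementing \(\alpha \vee \beta = \star(\star^{-1} \alpha \^ \star^{-1} \beta)\) on basis blades. This is a sketch (not from the post) that assumes Euclidean \(\bb{R}^3\), where \(\star^{-1} = \star\) on every grade since \((-1)^{k(n-k)} = +1\) there:</p>

```python
def perm_sign(p):
    """Sign of the permutation p (a tuple of distinct integers)."""
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def star(I, n):
    """Hodge star of the basis blade e_I in R^n: (sign, complement indices)."""
    comp = tuple(i for i in range(n) if i not in I)
    return perm_sign(tuple(I) + comp), comp

def wedge(I, J):
    """e_I ^ e_J as (sign, sorted indices), or (0, None) if they share a vector."""
    if set(I) & set(J):
        return 0, None
    merged = tuple(I) + tuple(J)
    return perm_sign(merged), tuple(sorted(merged))

def meet(I, J, n=3):
    """star(star_inv(e_I) ^ star_inv(e_J)); star_inv = star in R^3."""
    s1, A = star(I, n)
    s2, B = star(J, n)
    sw, C = wedge(A, B)
    if sw == 0:
        return 0, None
    s3, D = star(C, n)
    return s1 * s2 * sw * s3, D

# the planes x^y and y^z meet in the line y
assert meet((0, 1), (1, 2)) == (1, (1,))
# two vectors' grades don't add up to >= 3, so their meet is 0
assert meet((0,), (1,)) == (0, None)
```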
<p>A general definition is kinda awkward, but we can do it using the \(\star_k\) operation from the previous section. It looks like this:</p>
\[\alpha \vee \beta = (\star_{\| \beta \|} \alpha) \cdot \beta\]
<p>The \(\star\)’d terms of \(\alpha\) will be inner-product’d with \(\beta\). Recall that \(\star_k \alpha\) becomes a sum of tensor products \(\alpha_1 \o \alpha_2\). We end up dotting the first term of each with \(\beta\):</p>
\[\alpha \vee \beta = [\sum_{\alpha_1 \^ \alpha_2 = \alpha} (\star \alpha_1) \o \alpha_2] \cdot \beta = \sum_{\alpha_1 \^ \alpha_2 = \alpha} (\star \alpha_1 \cdot \beta) \alpha_2\]
<p>(This is a sum over ‘coproduct slices’ of \(\alpha\), in one sense. This kind of sum is called ‘Sweedler Notation’ in the literature.) This is non-zero only if \(\beta\) contains all of the basis vectors <em>not</em> in \(\alpha\). It makes more sense on an example:</p>
\[\begin{aligned}
(\b{x \^ y}) \vee (\b{y} \^ \b{z}) &= \star_1 (\b{x \^ y}) \cdot (\b{y} \^ \b{z}) \\
&= [(\b{y \^ z}) \o \b{y} - (\b{z \^ x}) \o \b{x}] \cdot (\b{y \^ z}) \\
&= \b{y}
\end{aligned}\]
<p>In index notation:</p>
\[(\alpha \vee \beta)_K = \alpha_{IK} \e^{IJ} \beta_J\]
<p>Or we can directly translate \((\star \alpha) \vee (\star \beta) = \star(\alpha \^ \beta)\):</p>
\[(\star \alpha \vee \star \beta)_L = \alpha^I \beta^J \e_{IJL}\]
<p>Note: I got exhausted trying to verify the signs on this, so they might be wrong. At some point I’ll come back and fix them.</p>
<p>Note 2: remember that \(\star^{-1} = (-1)^{k(n-k)} \star \neq \star\) in some dimensions, so you need to be careful about applying the duality to compute \(\vee\): \(\alpha \vee \beta = \star(\star^{-1} \alpha \^ \star^{-1} \beta)\). Also note that, since \(\vee\) is defined in terms of \(\star\), it is explicitly dependent on the choice of \(\omega\).</p>
<p>As mentioned above, the symbols for join and meet are definitely <em>swapped</em> in a way that’s going to be really hard to fix now. It should be meet = \(\^\), join = \(\vee\), so it matches usages everywhere else, as well as usages of \(\cup\) and \(\cap\) from set / boolean algebras.</p>
<p>Since \(\vee V\) is part of another complete exterior algebra, it also has all of the other operations, including a ‘dual interior product’ \(\alpha \cdot_{\vee} \beta\). I have never actually seen it used, but it exists.</p>
<hr />
<h2 id="9-relative-vee_mu-_mu-and-star_mu">9. Relative \(\vee_\mu\), \(\^_\mu\), and \(\star_\mu\)</h2>
<p>We saw that \(\star\), and by extension \(\vee\), are defined relative to a choice of pseudoscalar \(\omega\). What if we choose differently? It turns out that this is actually occasionally useful – I saw it used in <em>Oriented Projective Geometry</em> by Jorge Stolfi, which develops basically all of exterior algebra under an entirely different set of names. We write \(\star_{\mu}\) and \(\vee_{\mu}\) for the star / meet operations relative to a ‘universe’ multivector \(\mu\):</p>
\[\star_{\mu} \alpha = \alpha \cdot \mu\]
\[(\star_\mu \alpha) \vee_{\mu} (\star_\mu \beta) = \star_{\mu} (\alpha \^ \beta)\]
<p>The regular definitions set \(\mu = \omega\). The resulting exterior algebra shows us that any subset of the basis vectors of a space generates an exterior algebra of its own. In case this seems like pointless abstraction, I’ll note that it does come up, particularly when dealing with projective geometry. If \(\b{w}\) is a projective coordinate, we can write the projective \(\star_{\b{wxyz}}\) in terms of \(\star_{\b{xyz}}\):</p>
\[\star_{\b{wxyz}}( w \b{w} + x \b{x} + y \b{y} + z \b{z}) = \b{w} \^ \star_{\b{xyz}}(x\b{x} + y\b{y} +z \b{z}) + w (\b{x \^ y \^ z})\]
<p>There is also a way to define \(\^\) relative to a ‘basis’ multivector, \(\^_{\nu}\). The behavior is to join two multivectors ignoring their component along \(\nu\):</p>
\[(\nu \^ \alpha) \^_{\nu} (\nu \^ \beta) = \nu \^ (\alpha \^ \beta)\]
<p>For unit \(\nu\), this can be implemented as:</p>
\[\alpha \^_{\nu} \beta = \nu \^ (\nu \cdot \alpha) \^ (\nu \cdot \beta)\]
<p>It’s neat that for choices of \(\nu, \mu\), we can produce another exterior double algebra embedded within \((\^, \v, V)\):</p>
\[(\^_{\nu}, \v_{\mu}, \nu, \mu, V)\]
<p>Our regular choice of exterior algebra on the whole space is then given by:</p>
\[(\^, \v, V) = (\^_1, \v_\omega, 1, \omega, V)\]
<hr />
<h2 id="10-the-geometric-product-alphabeta">10. The Geometric Product \(\alpha\beta\)</h2>
<p>There is much to say about <a href="https://en.wikipedia.org/wiki/Geometric_algebra">Geometric algebra</a> and the ‘geometric product’. (Other names: “Clifford Algebra”, “Clifford Product”.)</p>
<p>GA is how I got into this stuff in the first place, but I avoid using the name for the most part because there is some social and mathematical baggage that comes with it. But its proponents deserve credit for popularizing the ideas of multivectors in the first place – I’m pretty sure we all agree that multivectors, as a concept, should be used and taught everywhere.</p>
<p>The social baggage is: the field, while perfectly credible in theory, tends to attract an unusual rate of cranks (many of them ex-physics students who want to ‘figure it all out’ – like myself! I might be a crank. I’m not sure.). The mathematical baggage is the proliferation of notations that are hard to use and not very useful.</p>
<p>The geometric product is a generalization of complex- and quaternion-multiplication to multivectors of any grade. The inputs and outputs are linear combinations of multivectors of any grade. It’s generally defined as another quotient of the tensor algebra: instead of \(x \o x \sim 0\), as in the exterior algebra, we keep \(x \o y \sim - y \o x\) for orthogonal \(x, y\) (so we can still exchange positions of elements in a tensor), but set \(x \o x \sim 1\). This means duplicate tensor terms are just replaced with \(1\) in tensor products, rather than annihilating the whole thing, like this:</p>
\[x \o x \o y \o x \o y \sim (x \o x) \o y \o (-y) \o x \sim -x\]
<p>The geometric product is the action of \(\o\) under this equivalence relation. In geometric algebra texts it is written with juxtaposition, since it generalizes scalar / complex multiplication that are written that way. I’ll do that for this section.</p>
\[(\b{xy})(\b{xyz}) = (\b{xy}) (-\b{yxz}) = -(\b{x})(\b{xz}) = -\b{z}\]
<p>It’s associative, but not commutative or anticommutative in general.</p>
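<p>On basis blades the geometric product has a compact implementation using bitmasks (a standard technique, not something from the post): each blade is an integer whose set bits mark which basis vectors appear, the product of two blades is their XOR, and the sign counts how many swaps are needed to sort the factors:</p>

```python
def blade_product(a, b, n=3):
    """Geometric product of basis blades in Euclidean R^n (every e_i^2 = +1).
    Blades are bitmasks (bit i set = factor of e_i); returns (sign, blade)."""
    sign = 1
    for i in range(n):          # pull each factor e_i of b leftward into a
        if b & (1 << i):
            # each factor of a above e_i must be swapped past it: flip the sign
            if bin(a >> (i + 1)).count("1") % 2:
                sign = -sign
            a ^= 1 << i         # e_i e_i = 1 cancels; otherwise e_i joins a
    return sign, a

X, Y, Z = 0b001, 0b010, 0b100
assert blade_product(X | Y, X | Y | Z) == (-1, Z)   # (xy)(xyz) = -z
assert blade_product(X | Y, X | Y) == (-1, 0)       # (xy)^2 = -1
assert blade_product(X, Y) == (1, X | Y)            # xy = x ^ y
assert blade_product(Y, X) == (-1, X | Y)           # yx = -xy
```

<p>The first assertion is exactly the worked example above.</p>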
<p>The primary reason to use this operation is that its implementations on \(\bb{R}^2\), \(\bb{R}^3\), and \(\bb{R}^{3,1}\) are already used:</p>
<ul>
<li>The geometric product on even-graded elements of \(\bb{R}^2\) implements complex multiplication.</li>
<li>The geometric product on even-graded elements of \(\bb{R}^3\) implements quaternion multiplication.</li>
<li>The geometric product on four elements \((\b{t, x, y, z})\) with \(x^2 = y^2 = z^2 = -1\) (and \(t^2 = +1\)) is implemented by the <a href="https://en.wikipedia.org/wiki/Gamma_matrices">gamma matrices</a> \(\gamma^{\mu}\) which are used in quantum mechanics.
<ul>
<li>(I won’t discuss the alternate metric in this article, but it’s done by using \(x \o x \sim Q(x,x)\) in the quotient construction of the algebra, where \(Q\) is the symmetric bilinear form that’s providing a metric.)</li>
</ul>
</li>
</ul>
<p>Geometric algebra tends to treat the geometric product as fundamental, and then produce the operations from it. For vectors, the definitions are:</p>
\[\< \b{a}, \b{b} \> = \frac{1}{2}(\b{ab + ba})\]
\[\b{a} \^ \b{b} = \frac{1}{2}(\b{ab - ba})\]
<p>But we could also define things the other way:</p>
\[\b{ab} = \b{a \cdot b} + \b{a \^ b}\]
<p>Multivector basis elements are just written by juxtaposing the relevant basis vectors, since \(\b{xy} = \b{x \^ y}\). I like this notation and should start using it even if I avoid the geometric product; it would save a lot of \(\^\)s.</p>
<p>To define the geometric product in terms of the other operations on this page, we need to define the <strong>reversion</strong> operator, which inverts the order of the components in a geometric product (with \(k\) as the grade of the argument):</p>
\[(abcde)^{\dag} = edcba = (-1)^{k(k-1)/2} (abcde)\]
<p>This generalizes complex conjugation, since it takes \(\b{xy} \ra -\b{xy}\) in \(\bb{R}^2\) and \(\bb{R}^3\). It allows us to compute geometric products, which contract elements from inner to outer, using the operations already defined on this page, which I have defined as contracting left-to-right in every case. The general algorithm for producing geometric products out of previously-mentioned operations is then to try projecting onto <em>every</em> basis multivector:</p>
\[\alpha \beta = \sum_{\gamma \in \^^ V} (\gamma \cdot \alpha^\dag) \^ (\gamma \cdot \beta)\]
<p>This translates into index notation as:</p>
\[[\alpha \beta]_{JL} = \sum_{\gamma \in \^^ V} (-1)^{\| \alpha \| ( \| \alpha \| -1)/2} \gamma_I \gamma_K \alpha^{I}_{[J}\beta^{K}_{L]}\]
<p>I think we can agree that’s pretty awkward. But it’s hard to be sure what to do with it. Clearly it’s <em>useful</em>, at least in the specific cases of complex and quaternion multiplication.</p>
<p>My overall opinion on the geometric product is this:</p>
<ul>
<li>I <em>tentatively</em> think that it is mis-defined to use inner-to-outer contraction, because of the awkward signs and conjugation operations that result.
<ul>
<li>I suspect the appeal of defining contraction this way was to make \((\b{xy})^2 = -1\), in order to produce something analogous to \(i^2 = -1\). But imo it’s really much more elegant if all basis elements have \(\alpha^2 = 1\).</li>
<li>If we want to preserve the existence of a multiplication operation with \(\alpha^2 = -1\), we can <em>define</em> the geometric product as \(\alpha \beta = \alpha^{\dag} \cdot \beta\) or something like that. Maybe.</li>
<li>Associativity is really nice, though. So maybe it’s my definition of the other products that’s wrong for doing away with it.</li>
</ul>
</li>
<li>However, it works suspiciously well for complex numbers, quaternions, and gamma matrices.</li>
<li>And it works suspiciously well for producing something that acts like a multiplicative inverse (see below).</li>
<li>But I know of almost zero cases where mixed-grade multivectors are useful, except for sums of “scalars plus one grade of multivector”.</li>
<li>I can’t find any general geometric intuition for the product in general.</li>
<li>So I’m mostly reserving judgment on the subject, until I figure out what’s going on more completely.</li>
</ul>
<hr />
<p><strong>Other operations of geometric algebra</strong></p>
<p>Unfortunately geometric algebra is afflicted by way too many other unintuitive operations. Here’s most of them:</p>
<ol>
<li><strong>Grade projection</strong>: \(\< \alpha \>_k = \sum_{\gamma \in \^^k V} (\gamma \cdot \alpha) \o \gamma\) extracts the \(k\)-graded terms of \(\alpha\).</li>
<li><strong>Reversion</strong>: \((abcde)^{\dag} = edcba = (-1)^{r(r-1)/2} (abcde)\). Generalizes complex conjugation.</li>
<li><strong>Exterior product</strong>: same operation as above, but now defined \(A \^ B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{r + s}\)</li>
<li><strong>Commutator product</strong>: \(A \times B = \frac{1}{2}(AB - BA)\). I don’t know what the point of this is.</li>
<li><strong>Meet</strong>: same as above, but now defined \(A \vee B = I[(AI^{-1}) \^ (BI^{-1})]\). GA writes the pseudoscalar as \(I\) and \(AI^{-1} = \star^{-1} A\).</li>
<li><strong>Interior product</strong>: for some reason there are a bunch of ways of doing this.
<ul>
<li><strong>Left contraction</strong>: \(A ⌋ B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{r - s}\)</li>
<li><strong>Right contraction</strong>: \(A ⌊ B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{s - r}\)</li>
<li><strong>Scalar product</strong>: \(A * B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{0}\)</li>
<li><strong>Dot product</strong>: \(A \cdot B = \sum_{r,s} \< \< A \>_r \< B \>_s \>_{\| s - r \|}\)</li>
</ul>
</li>
<li>There are a few other weird ‘conjugation’ operations (see <a href="https://en.wikipedia.org/wiki/Paravector">here</a>) but I think they’re thankfully fading out of usage.</li>
</ol>
<hr />
<h2 id="11-multivector-division-alpha-1">11. Multivector division \(\alpha^{-1}\)</h2>
<p>Ideally division of multivectors would produce a multivector \(\alpha^{-1}\) that inverts \(\^\):</p>
\[\frac{\alpha \^ \beta}{\alpha} = \beta\]
<p>There are several problems with this, though. One is that \(\alpha \^ \beta\) may be \(0\). Another is that \(\^\) isn’t commutative, so presumably \(\alpha^{-1} (\alpha \^ \beta)\) and \((\alpha \^ \beta) \alpha^{-1}\) are different. Another is that \(\beta + K \alpha\) is also a solution for any \(K\):</p>
\[\alpha \^ (\beta + K \alpha) = \alpha \^ \beta\]
<p>Or for any multivector \(\gamma\) with \(\gamma \^ \alpha = 0\):</p>
\[\alpha \^ (\beta + \gamma) = \alpha \^ \beta\]
<p>So there are at least a few ways to define this.</p>
<p><strong>Multivector division 1</strong>: Use the interior product and divide out the magnitude:</p>
\[\alpha^{-1} \beta = \frac{\alpha}{\| \alpha \|^2} \cdot \beta\]
<p>This gives up on trying to find <em>all</em> inverses, and just identifies one of them. It sorta inverts the wedge product, except it extracts only the orthogonal component in the result:</p>
\[\b{a}^{-1} (\b{a} \^ \b{b}) = \frac{\b{a}}{\| \b{a} \|^2} \cdot (\b{a} \^ \b{b}) = \b{b} - \frac{\b{a} (\b{a} \cdot \b{b})}{\| \b{a} \|^2} = \b{b} - \b{b}_{\parallel \b{a}} = \b{b}_{\perp \b{a}}\]
<p>The result is the ‘rejection’ of \(\b{b}\) off of \(\b{a}\). It doesn’t quite ‘invert’ \(\^\), but it’s a pretty sensible result. It is commutative due to our definition of the two-sided interior product (both arguments contract left-to-right either way). If \(\b{a \^ b} = 0\) in the first place, then this rightfully says that \(\b{b}_{\perp \b{a}} = 0\) as well, which is nice.</p>
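<p>A minimal numerical sketch of this division (the names are mine, and it computes the final formula directly rather than going through the interior product machinery):</p>

```python
import numpy as np

def divide_into_wedge(a, b):
    """a^{-1} . (a ^ b) for vectors a, b, expanded to b - a (a.b)/|a|^2:
    the rejection of b off of a."""
    return b - a * (a @ b) / (a @ a)

a = np.array([1.0, 1.0, 0.0])
b = np.array([2.0, 0.0, 5.0])
r = divide_into_wedge(a, b)
assert abs(a @ r) < 1e-12                          # orthogonal to a
assert np.allclose((a @ b) * a / (a @ a) + r, b)   # parallel + perp = b
```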
<p><strong>Multivector division 2</strong>: Allow the result to be some sort of general object, not a single value:</p>
\[\alpha^{-1} \beta = \frac{\alpha}{\| \alpha \|^2} \cdot \beta + K\]
<p>where \(K\) is “the space of all multivectors \(\gamma\) with \(\alpha \^ \gamma = 0\)”. This operation produces the true preimage of multiplication via \(\^\), at the loss of an easy way to represent the result. But I suspect this definition is good and meaningful and is sometimes necessary to get the ‘correct’ answer.</p>
<p><strong>Multivector division 3</strong>: Use the geometric product.</p>
<p>The geometric product produces something that actually <em>is</em> division on GA’s versions of complex numbers and quaternions (even-graded elements of \(\^ \bb{R}^2\) and \(\^ \bb{R}^3\)):</p>
\[a^{-1} b = \frac{ab}{aa} = \frac{ab}{\| a \|^2}\]
<p>This is only defined for \(\| a \| \neq 0\) (remember, since GA has elements with \(\alpha^2 = -1\), you can have \(\| 1 + i \|^2 = 1^2 + i^2 = 0\)). You can read a lot about this inverse online, such as how to use it to reflect and rotate vectors.</p>
<hr />
<p>Cut for lack of time or knowledge:</p>
<ul>
<li>Exterior derivative and codifferential</li>
<li><a href="https://en.wikipedia.org/wiki/Cap_product">Cup and cap product</a> from algebraic topology. As far as I can tell these essentially implement \(\^\) and \(\vee\) on co-chains, which are more-or-less isomorphic to multivectors.</li>
</ul>
<hr />
<p>Other articles related to Exterior Algebra:</p>
<ol start="0">
<li><a href="/2018/08/06/oriented-area.html">Oriented Areas and the Shoelace Formula</a></li>
<li><a href="/2018/10/08/exterior-1.html">Matrices and Determinants</a></li>
<li><a href="/2018/10/09/exterior-2.html">The Inner product</a></li>
<li><a href="/2019/01/26/hodge-star.html">The Hodge Star</a></li>
<li><a href="/2019/01/27/interior-product.html">The Interior Product</a></li>
<li><a href="/2020/10/15/ea-operations.html">All the Exterior Algebra Operations</a></li>
</ol>
The essence of complex analysis (2020-08-10) https://alexkritchevsky.com/2020/08/10/complex-analysis
<p>Rapid-fire non-rigorous intuitions for calculus on complex numbers. Not an introduction, but if you find/found the subject hopelessly confusing, this should help.</p>
<!--more-->
<p>Contents:</p>
<ul id="markdown-toc">
<li><a href="#1-the-complex-plane" id="markdown-toc-1-the-complex-plane">1. The complex plane</a></li>
<li><a href="#2-holomorphic-functions" id="markdown-toc-2-holomorphic-functions">2. Holomorphic functions</a></li>
<li><a href="#3-residues" id="markdown-toc-3-residues">3. Residues</a></li>
<li><a href="#4-integral-tricks" id="markdown-toc-4-integral-tricks">4. Integral tricks</a></li>
<li><a href="#5-topological-concerns" id="markdown-toc-5-topological-concerns">5. Topological concerns</a></li>
<li><a href="#6-convergence-concerns" id="markdown-toc-6-convergence-concerns">6. Convergence concerns</a></li>
<li><a href="#7-global-laurent-series" id="markdown-toc-7-global-laurent-series">7. Global Laurent Series</a></li>
</ul>
<hr />
<h2 id="1-the-complex-plane">1. The complex plane</h2>
<p>Calculus on \(\bb{C}\) is more-or-less just calculus on \(\bb{R}^2\), under the substitutions:</p>
\[\begin{aligned}
i &\lra R \\
a + bi & \lra (a + R b) \hat{x} = a \hat{x} + b \hat{y}
\end{aligned}\]
<p>Where \(R\) is the “rotation operator”. Yes, it is strange that you can do calculus with a rotation operator, but keep an open mind.</p>
<p>It is also possible to consider complex numbers as 2x2 matrices, using \(i = R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\). Then in general</p>
\[a + bi = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}\]
<p>The identity \(\cos \theta + i \sin \theta = e^{i \theta}\) follows from applying the <a href="https://en.wikipedia.org/wiki/Exponential_map">exponential map</a> to \(R\). If I had my way we would not use complex numbers ever and would just learn the subject as ‘calculus using rotation operators’ to avoid a proliferation of things that seem like magic, although a bit of work is needed to make it pedagogically sound. Certainly the words “complex” and “imaginary” aren’t doing anybody any favors.</p>
<p>The one way that \(\bb{C}\) is more interesting than plain \(\bb{R}^2\) is that there is a definition of multiplying two vectors:</p>
\[(a + b i) (c + d i) = (ac - bd) + (ad + bc) i\]
<p>The best way I know to interpret this is like so: the correspondence \(a + bi \Ra (a + R b) \hat{x}\) suggests that we interpret a complex number as an operator that is understood to ‘act on’ the \(\hat{x}\) basis vector. In this sense both adding and multiplying complex numbers are natural operations: adding them applies both operations to \(\hat{x}\) and adds the result; multiplying them applies them sequentially.</p>
\[[(a + b R) \circ (c + d R) ](\hat{x}) = [(ac - bd) + (ad + bc) R] (\hat{x})\]
<p>This model is especially appealing because it is easy to extend to higher dimensions. (This generalizes to ‘geometric algebra’ which works in every dimension, <em>not</em> quaternions, which only work in \(\bb{R}^3\).)</p>
<p>So we want to do calculus on these operators on \(\bb{R}^2\). We start by identifying derivatives and differential forms. The differentials of the coordinate vectors are:</p>
\[\begin{aligned}
dz &= dx + i dy \\
d\bar{z} &= dx - i dy
\end{aligned}\]
<p>The partial derivatives are for some reason called “<a href="https://en.wikipedia.org/wiki/Wirtinger_derivatives">Wirtinger derivatives</a>”:</p>
\[\begin{aligned}
\p_z &= \frac{1}{2}(\p_x - i \p_y) \\
\p_{\bar{z}} &= \frac{1}{2}(\p_x + i \p_y)
\end{aligned}\]
<p>Note that the signs are swapped, compared to the forms, and factors of \(\frac{1}{2}\) have appeared. These are necessary because of the requirement that \(\p_z (z) = \p_{\bar{z}} (\bar{z}) = 1\). In an alternate universe both sides might be given \(\frac{1}{\sqrt{2}}\) factors instead, but they weren’t.</p>
<p>There are other parameterizations of \(\{ z, \bar{z} \}\) in terms of \(\bb{R}^2\) coordinates. The most common choice is polar coordinates: \(z = re^{i \theta}\) and \(\bar{z} = r e^{-i \theta}\). Then the forms are:</p>
\[\begin{aligned}
dz &= e^{i \theta} (dr + i r d \theta) \\
d\bar{z} &= e^{-i \theta} (dr - i r d \theta)
\end{aligned}\]
<p>Then the partial derivatives would be:</p>
\[\begin{aligned}
\p_z &= \frac{e^{-i \theta}}{2} (\p_r - \frac{i}{r} \p_\theta) \\
\p_{\bar{z}} &= \frac{e^{i \theta}}{2} (\p_r + \frac{i}{r} \p_\theta)
\end{aligned}\]
<p>Although these don’t come up very much. Note that any function that explicitly uses \(r\) or \(\theta\) has a \(\bar{z}\) dependency unless it cancels out somehow, since both \(r\) and \(\theta\) depend on \(\bar{z}\):</p>
\[\begin{aligned}
r &= \sqrt{z \bar{z}} \\
\theta &= - \frac{i}{2} \log \frac{z}{\bar{z}}
\end{aligned}\]
<hr />
<h2 id="2-holomorphic-functions">2. Holomorphic functions</h2>
<p>Complex analysis is mostly concerned with doing calculus on functions in \(\bb{C}\), so we are interested in differentiable functions of \(z\). Being complex differentiable means that \(f(z)\) has a derivative that is itself a complex number (when regarded as part of \(\bb{R}^2\)): \((f_x, f_y) \in \bb{C}\).</p>
<p>The <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Riemann_equations">Cauchy-Riemann equations</a> tell you when a complex function \(f(z) = u(x+iy) + i v(x + iy)\) is complex-differentiable:</p>
\[\begin{aligned}
u_x = v_y\\
u_y = - v_x
\end{aligned}\]
<p>This really just expresses the idea that \(f\) has no derivative with respect to \(\bar{z}\):</p>
\[\begin{aligned}
\p_{\bar{z}} f(z)
&= \frac{1}{2} (f_x + i f_y) \\
&= \frac{1}{2} (u_x + i v_x + i u_y - v_y) \\
&= \frac{1}{2} [(u_x - v_y) + i (v_x + u_y)] \\
&= 0 + i 0
\end{aligned}\]
<p>(\(\p_{\bar{z}} f(z) = 0\) is a much better way to write this. The Cauchy-Riemann version should be deprecated.)</p>
<p>As long as \(f\) is continuous and this condition is true in a region \(D\), operations on \(f(z)\) essentially work like they would for one-variable functions in \(z\). For instance \(\p_z (z^n) = n z^{n-1}\).</p>
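<p>The condition \(\p_{\bar{z}} f = 0\) can be checked numerically with finite differences (an illustrative sketch; <code>d_dzbar</code> is my name for it):</p>

```python
import numpy as np

def d_dzbar(f, z, h=1e-5):
    """Numerical Wirtinger derivative (1/2)(d/dx + i d/dy) of f at z,
    using central differences along the real and imaginary axes."""
    fx = (f(z + h) - f(z - h)) / (2 * h)
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (fx + 1j * fy)

z0 = 0.3 + 0.4j
assert abs(d_dzbar(lambda z: z**3, z0)) < 1e-8            # holomorphic: 0
assert abs(d_dzbar(np.conj, z0) - 1) < 1e-8               # d(zbar)/d(zbar) = 1
assert abs(d_dzbar(lambda z: abs(z)**2, z0) - z0) < 1e-8  # d(z zbar)/d(zbar) = z
```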
<p>While \(z\) seems like a 2-dimensional variable, there’s only one ‘degree of freedom’ in the derivative of a complex function. \(f'(z)\) has to be a simple complex number, which rotates and scales tangent vectors uniformly (a <a href="https://en.wikipedia.org/wiki/Conformal_map">conformal map</a>):</p>
\[f(z + dz) \approx f(z) + f'(z) dz = f(z) + re^{i\theta} dz\]
<p>Functions which are complex-differentiable at every point within a region are called <a href="https://en.wikipedia.org/wiki/Holomorphic_function">holomorphic</a> (‘holo-’ is Greek for ‘whole’) or, for some reason, ‘regular’ in that region. A function \(f(z)\) that is holomorphic in a region \(D\) is extremely well-behaved in that region:</p>
<ul>
<li>\(f\) is <em>infinitely</em> complex-differentiable in \(D\)</li>
<li>and \(f\) is ‘complex analytic’, ie equal to its Taylor series in \(z\) throughout \(D\). The series around any particular point converges within the largest circular disk that stays within \(D\).</li>
<li>and \(f\) is locally invertible, ie \(f^{-1}(w + dw) \approx z + (f'(z))^{-1} dw\) exists and is holomorphic in the neighborhood of \(w = f(z)\).</li>
<li>its antiderivatives exist, and its integral along any closed contour \(C\) inside \(D\) vanishes: \(\oint_C f(z) dz = 0\) (at least when \(D\) is simply connected).</li>
<li>the data of \(f\) in \(D\) is fully determined by its values on the boundary of the region, or on any one-dimensional curve within \(D\), or on any nontrivial subregion of \(D\), in the sense that its Taylor series can be computed on a subset of the space and then will give the correct value throughout \(D\) (possibly via <a href="https://en.wikipedia.org/wiki/Analytic_continuation">analytic continuation</a>).</li>
</ul>
<p>The general theme is that holomorphic/analytic functions act like one-dimensional functions and all of the calculus is really easy on them; in fact it is often easier than actual 1d calculus.</p>
<p>If two analytic functions defined on <em>different</em> regions nonetheless agree on an overlapping region, they are in a sense the ‘same function’. This means that you can “analytically continue” a function by finding other functions which agree on an overlapping line or region. A simple use of this is to ‘glue together’ Taylor expansions around different points to go around a divergence. The <a href="https://en.wikipedia.org/wiki/Riemann_zeta_function">Riemann Zeta function</a> is a famous example of a function which has an interesting analytic continuation: the function is easily defined on the positive real axis where \(x>1\), but the famous <a href="https://en.wikipedia.org/wiki/Riemann_hypothesis">Riemann Hypothesis</a> concerns zeroes of its analytic continuation elsewhere on \(\bb{C}\).</p>
<p>Most 1d functions like \(e^x\) and \(\sin x\) have holomorphic complex versions like \(e^z\) and \(\sin z\) that are analytic everywhere. Discontinuous functions like \(\|z\|\) or \(\log z = \ln r + i \theta\), or functions that include an explicit or implicit \(\bar{z}\) dependency, fail to be analytic somewhere.</p>
<p>Complex differentiability fails at singularities. We categorize the types:</p>
<ul>
<li><em>poles</em> of order \(n\), around which \(f(z) \sim 1/z^n\), which are ‘well-behaved’ singularities. Around these there’s a region where \(1/f\) is analytic. ‘Zeros’ and ‘poles’ are dual in the sense that \(f \sim z^n\) at zeroes and \(f \sim 1/z^n\) at poles.</li>
<li><em>removable singularities</em>: singularities that can be removed by redefinition, probably because they’re an indeterminate form. The canonical example is \(\sin(z)/z\) which is repaired by defining \(\sin(0)/0 = 1\). In a sense these are not singularities at all; they’re just poorly handled by our notation.</li>
<li><em>essential singularities</em>: singularities which oscillate infinitely rapidly near a point, such that they are in a sense too complicated to handle by normal methods. \(\sin(1/z)\) or \(e^{1/z}\) are the canonical examples. <a href="https://en.wikipedia.org/wiki/Picard_theorem">Great Picard’s Theorem</a> (what a name) says that near an essential singularity the function takes every value infinitely many times, except possibly one.</li>
</ul>
<p>Poles are much more interesting than the other two.</p>
<hr />
<h2 id="3-residues">3. Residues</h2>
<p>No one would really care about complex analysis except for, well, analysts, were it not for one suspicious fact about the complex derivatives:</p>
\[\p_{\bar{z}} \frac{1}{z} \neq 0\]
<p>(Make sure you see that that’s a \(\bar{z}\)-derivative.)</p>
<p>For some reason, for <em>only</em> \(n=-1\), \(z^n\) has a certain kind of divergence at \(z=0\). It looks like a 2d <a href="https://en.wikipedia.org/wiki/Dirac_delta_function">delta <strike>function</strike> distribution</a>:</p>
\[\p_{\bar{z}} \frac{1}{z} = 2 \pi i \delta (z)\]
<p>Meaning that \(\p_{\bar{z}} \frac{1}{z} = 0\) away from \(z = 0\), but its integral over any region containing \(z = 0\) is \(2 \pi i\).</p>
<p>[By the way, this is intrinsically related to the fact that we’re doing calculus in 2d. It is really a skew way of writing the more fundamental fact that \(\oint d \theta = 2 \pi\) if you integrate around the origin, combined with the fact that \(\frac{1}{z} dz\) cancels out its own \(\theta\) dependence. It’s related to the 1-dimensional formula \(\p_x \log x = \frac{1}{x} + i \pi \delta(x)\), and there are versions in higher dimensions as well. Physicists are familiar with the 3d case without always realizing it: the Maxwell equation \(\nabla \cdot E = \rho\) applied to a point charge only works if \(\nabla \cdot \frac{\hat{r}}{r^2} = 4 \pi \delta(r)\). More on that another time, I hope.]</p>
<p>This is equivalent to saying that the contour integral (integral on a closed path) of \(1/z\) around the origin is non-zero:</p>
\[\begin{aligned}
\oint \frac{1}{z} dz &= \oint \frac{e^{i \theta} dr + ir e^{i\theta} d \theta }{r e^{i \theta}} \\
&= \oint \frac{dr}{r} + i d \theta \\
&= 2 \pi i
\end{aligned}\]
<p>It’s clear why this non-zero contour integral only shows up for \(z^{-1}\): for any other \(z^n\), the \(d \theta\) term is still a non-constant periodic function of \(\theta\), so its contributions over a full loop cancel out. For \(n=-1\), though, the \(d \theta\) just counts the total change in angle.</p>
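<p>This is easy to check numerically. Here’s a quick Python sketch of my own (standard library only) that approximates \(\oint z^n dz\) around the unit circle with a Riemann sum and confirms that only \(n = -1\) picks up \(2 \pi i\):</p>

```python
import cmath
import math

def contour_integral(f, n_points=10_000, radius=1.0):
    """Riemann-sum approximation of the integral of f(z) dz around a circle."""
    total = 0j
    for k in range(n_points):
        theta = 2 * math.pi * k / n_points
        z = radius * cmath.exp(1j * theta)
        dz = 1j * z * (2 * math.pi / n_points)  # dz = i z dθ on the circle
        total += f(z) * dz
    return total

for n in (-3, -2, -1, 0, 1, 2):
    value = contour_integral(lambda z: z**n)
    expected = 2j * math.pi if n == -1 else 0
    assert abs(value - expected) < 1e-9, (n, value)
```

<p>(For a periodic integrand the equally-spaced Riemann sum is spectrally accurate, so the agreement is essentially to machine precision.)</p>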
<p>The delta-function version follows from Stokes’ theorem. Since the contour integral gives the same value on any path as long as it circles \(z=0\), the divergence must be fully located at that point:</p>
\[\begin{aligned}
\oint_{\p D} \frac{1}{z} dz &= \iint_D d(\frac{dz}{z}) \\
2\pi i &= \iint_D \p_{\bar{z}} \frac{1}{z} d \bar{z} \^ dz \\
2 \pi i \iint_D \delta(z, \bar{z}) d\bar{z} \^ dz &= \iint_D \p_{\bar{z}} \frac{1}{z} d \bar{z} \^ dz \\
\p_{\bar{z}} \frac{1}{z} &\equiv 2 \pi i \delta(z, \bar{z})
\end{aligned}\]
<hr />
<p>A function that is holomorphic except at a set of poles is called <em>meromorphic</em> (‘mero-‘ is <a href="https://www.etymonline.com/search?q=mero-">Greek</a>, meaning ‘part’ or ‘fraction’). If we integrate a meromorphic function around a region \(D\) the result only contains contributions from the \(\frac{1}{z}\)-type terms. Around each order-1 pole at \(z_k\), \(f\) has a series expansion that looks like \(f(z) \sim \frac{f_{-1}}{z - z_k} + f^{*}(z)\) where \(f^{*}\) is analytic near \(z_k\). A clever calculist then realizes that a contour integral around a region can be computed <em>only</em> from the \(f_{-1}\) coefficients at each pole.</p>
<p>The \(f_{-1}\) series coefficients at each pole \(z_k\) are for some reason called <a href="https://en.wikipedia.org/wiki/Residue_theorem">residues</a> and are written as \(\text{Res}(f, z_k)\). Thus we can transform a contour integral like this:</p>
\[\int_{\p D} f(z) dz = 2 \pi i \sum_{z_k} I(\p D, z_k) \text{Res} (f, z_k)\]
<p>Where \(I(\p D, z_k)\) gives the <a href="https://en.wikipedia.org/wiki/Winding_number">winding number</a> of the contour around each pole \(z_k\) (+1 for a single positive rotation, -1 for a single negative rotation, etc).</p>
<p>This makes integration of analytic functions around closed contours <em>really easy</em>. You can often just eyeball them:</p>
\[\oint_{\p D} \frac{1}{z-a} dz = (2\pi i) 1_{a \in D}\]
<p>(\(1_{a \in D}\) is an <a href="https://en.wikipedia.org/wiki/Indicator_function">indicator function</a> which equals \(1\) if \(a \in D\) and \(0\) otherwise.)</p>
<p>Multiplying and dividing powers of \((z-a)\) and then integrating around a curve containing \(a\) allows you to extract any term in the Taylor series of \(f(z)\) around \(a\):</p>
\[f_n = f^{(n)}(a) = \frac{n!}{2 \pi i} \oint \frac{f(z)}{(z-a)^{n+1}} dz\]
<p>This is called <a href="https://en.wikipedia.org/wiki/Cauchy%27s_integral_formula">Cauchy’s integral formula</a>. When negative terms are present the Taylor series is instead called a <a href="https://en.wikipedia.org/wiki/Laurent_series">Laurent series</a>.</p>
\[\begin{aligned}
f(z) &\approx \sum f_n \frac{(z-a)^n}{n!} \\
&= \ldots + \frac{f_{-1}}{z-a} + f_0 + f_{1} (z-a) + f_2 \frac{(z-a)^2}{2!} + \ldots
\end{aligned}\]
<p>In particular the value at \(z=a\) is fully determined by the contour integral with \((z-a)^{- 1}\):</p>
\[f(a) = f_0 = \frac{1}{2 \pi i} \oint \frac{f(z)}{z-a} dz\]
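<p>Numerically, this formula says that \(f(a)\) is the <em>average</em> of \(f\) over any circle centered at \(a\): substitute \(z = a + re^{i\theta}\) and the \(\frac{1}{z-a}\) cancels against \(dz = i(z-a) d\theta\). A little Python sketch of my own to check:</p>

```python
import cmath
import math

def cauchy_value(f, a, radius=1.0, n_points=2_000):
    """Recover f(a) as (1/2πi) ∮ f(z)/(z-a) dz over a circle centered at a.

    With z = a + r e^{iθ}, dz = i (z - a) dθ, so the integral is just the
    average of f over the circle."""
    total = 0j
    for k in range(n_points):
        theta = 2 * math.pi * k / n_points
        total += f(a + radius * cmath.exp(1j * theta))
    return total / n_points

# f(0.3) recovered without ever evaluating f at 0.3:
assert abs(cauchy_value(cmath.exp, 0.3) - math.exp(0.3)) < 1e-9
```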
<p>You can, of course, formulate this whole thing in terms of \(f(\bar{z})\) and \(d\bar{z}\) instead. If a function isn’t holomorphic in either \(z\) or \(\bar{z}\), you can still do regular \(\bb{R}^2\) calculus in two variables \(f(z, \bar{z})\), although I’m not sure how you would deal with poles.</p>
<p>By the way, there is a remarkable duality between zeroes and poles. In the region of a pole of a function \(f\), the function behaves like \(\frac{1}{g}\) where \(g\) is an analytic function. In general a meromorphic function can be written as \(f= \frac{h}{g}\) where \(g,h\) are analytic, with the zeroes of \(g\) corresponding to the poles of \(f\).</p>
<hr />
<h2 id="4-integral-tricks">4. Integral tricks</h2>
<p>If you stare at the “calculus of residues” long enough you’ll realize that, although it deals with complex-valued functions, you can pull some tricks that allow it to be used to solve real-valued integrals. Even if you never look at complex analysis again, you’ll still occasionally see complex analytic trickery come up in solving <a href="https://math.stackexchange.com/questions/562694/integral-int-11-frac1x-sqrt-frac1x1-x-ln-left-frac2-x22-x1/563063">otherwise annoying integrals</a>.</p>
<p>For starters, note that closed integrals of a function with a Laurent series can be eyeballed using the Cauchy integral formula:</p>
\[\begin{aligned}
\oint_{r=1} \frac{1}{z(z-3)} dz &= \oint_{r=1} \frac{1}{3} \frac{1}{z-3} - \frac{1}{3}\frac{1}{z} dz \\
&= 2 \pi i \frac{1}{3} (-1) \\
&= - \frac{2}{3} \pi i
\end{aligned}\]
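<p>Checking that numerically, in a quick Python sketch of my own:</p>

```python
import cmath
import math

def contour_integral(f, radius, n_points=20_000):
    """Riemann-sum approximation of ∮ f(z) dz on a circle of the given radius."""
    total = 0j
    for k in range(n_points):
        theta = 2 * math.pi * k / n_points
        z = radius * cmath.exp(1j * theta)
        total += f(z) * 1j * z * (2 * math.pi / n_points)  # dz = i z dθ
    return total

# Only the z=0 pole is inside r=1, so the answer is 2πi times its residue, -1/3.
value = contour_integral(lambda z: 1 / (z * (z - 3)), radius=1.0)
assert abs(value - (-2j * math.pi / 3)) < 1e-9
```

<p>(Enlarging the circle past \(r=3\) brings in the other pole, whose residue \(+\frac{1}{3}\) cancels this one, and the integral drops to \(0\).)</p>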
<p>Now how can we apply that to an integral in \(\bb{R}\)?</p>
<p>Integrals along the real line \(\int_{-\infty}^{\infty}\) can often be computed by ‘closing the contour’ at \(r = \infty\). This is especially easy if the integrand vanishes at \(r=\infty\), because the whole term just drops out, but it’s also enough if it’s just easy to compute there.</p>
<p>For instance, in this integral we deduce that the integral from \((-\infty, \infty)\) equals the integral on the closed contour that adds in a section from \(r = +\infty\) to \(r = -\infty\) by varying \(\theta \in (0, \pi)\), because the integrand is \(0\) on that whole arc:</p>
\[\begin{aligned}
\int_{-\infty}^{\infty} \frac{1}{1 + x^2} dx &= \int_{r = -\infty}^{r = \infty} \frac{dz}{1 + z^2} + \int_{\theta=0, \, r=\infty}^{\theta=\pi, \, r=\infty} \frac{dz}{1 + z^2} \\
&= \oint \frac{1}{z - i} \frac{1}{z + i} dz \\
&= (2 \pi i) \text{Res}(z=i, \frac{1}{z - i} \frac{1}{z + i}) \\
&= 2\pi i \frac{1}{2i} \\
&= \pi
\end{aligned}\]
<p>Here we closed the contour around the upper half-plane, on which the integrand vanishes because it falls off like \(1/r^2\) as \(r \ra \infty\). One pole is in the upper half-plane and one is in the lower. The winding number around the upper is \(+1\) and the residue is \(\frac{1}{z+i}\) evaluated at \(z=i\), or \(\frac{1}{2i}\). If we had used the lower half-plane the winding number would have been \(-1\) and the residue \(-\frac{1}{2i}\), so the result is independent of how we closed the contour. This method gives the answer very directly without having to remember that \(\int \frac{dx}{1 + x^2} = \tan^{-1} x\) or anything like that.</p>
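<p>As a sanity check, a brute-force sum over the real line agrees with the contour answer (a quick sketch of my own; the constants are arbitrary):</p>

```python
import math

# Midpoint rule for ∫ dx/(1+x²) over [-L, L]; the tail beyond ±L contributes
# about 2/L, so L = 200 already pins down the answer to a couple of decimals.
L, n = 200.0, 400_000
h = 2 * L / n
total = h * sum(1 / (1 + (-L + (k + 0.5) * h) ** 2) for k in range(n))
assert abs(total - math.pi) < 0.02  # matches π up to the ~0.01 truncated tail
```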
<p>(Note that this wouldn’t work if the pole was <em>on</em> the path of integration, as in \(\int_{-\infty}^{+\infty} \frac{1}{x} dx\). This integral is the <a href="https://en.wikipedia.org/wiki/Cauchy_principal_value">Cauchy Principal Value</a> and is in a sense an indeterminate form like \(0/0\) whose value depends on the context. More on that another time.)</p>
<p>Many other integrals are solvable by choosing contours that are amenable to integration. Often choices that keep \(r\) or \(\theta\) constant are easiest. See Wikipedia on <a href="https://en.wikipedia.org/wiki/Contour_integration">contour integration</a> for many examples.</p>
<hr />
<h2 id="5-topological-concerns">5. Topological concerns</h2>
<p>There are some tedious things you have to account for when considering functions of \(z\).</p>
<p>First, the \(\theta\) variable is discontinuous, since \(\theta = 0\) and \(\theta = 2\pi\) refer to the same point. This means that inverting a function of \(\theta\) will produce a <a href="https://en.wikipedia.org/wiki/Multivalued_function">multi-valued function</a>:</p>
\[\log e^{i \theta} = i \theta + 2 \pi i k, \quad k \in \bb{Z}\]
<p>Smoothly varying \(\theta = \int d \theta\) of course will just continue to tick up based on the path of integration: \(2\pi, 4\pi, 6\pi\), etc. But the \(\log\) function itself appears to have a discontinuity of \(2\pi i\) at \(\theta = 0\).</p>
<p>When dealing with these multi-valued functions you can consider \(\theta = 0\) as a ‘branch point’ — a place where the function becomes multi-valued. But to be honest the whole theory of branch points isn’t very interesting. I prefer to just think of all math being done modulo \(2 \pi\); that is, the imaginary part of \(\log z\) is topologically a circle, rather than a number on the real line.</p>
<p>Another topological interest in \(\bb{C}\): if you ‘join together’ the points at infinity in every direction by defining a symbol \(\infty\) such that \(1/0 = \infty\), you get the “extended complex plane” or the <a href="https://en.wikipedia.org/wiki/Riemann_sphere">Riemann sphere</a>, since it is topologically shaped like a sphere. Most of the things that seem like they should be true involving \(\infty\) are true in this case. For example, the asymptotes of \(\frac{1}{z}\) on either side of \(\| z \| = 0\) really <em>do</em> connect at infinity and come back on the other side.</p>
<p>The Riemann sphere is topologically like a sphere, but acts like a <em>projective</em> plane, which is a bit unintuitive. (It turns out to correspond rather to a half sphere where antipodal points are considered equivalent). Particularly, it kinda seems like \(+r\) and \(-r\) should be different points, rather than the ‘same’ infinity. There is probably a resolution to this using <a href="https://en.wikipedia.org/wiki/Oriented_projective_geometry">oriented projective geometry</a>, defining the back half of the sphere as a second copy of \(\bb{C}\) and conjoining the two at \(\infty e^{i \theta} \lra -\infty e^{i\theta}\), but that’s not worth discussing further here, and it’s usually not mentioned in complex-analysis treatments.</p>
<p>Complex analytic functions map the Riemann sphere to itself in some way. For instance, \(z \mapsto \frac{1}{z}\) swaps \(0\) and \(\infty\) and the rest of the sphere comes along for the ride. Powers of \(z\) cause the mapping to be \(m:n\) — so \(z^2\) maps two copies of the sphere to one copy, while \(z^{1/2}\) maps one copy to two copies, hence becoming multi-valued. The <a href="https://en.wikipedia.org/wiki/M%C3%B6bius_transformation">Möbius transformations</a>, functions of the form \(\frac{az + b}{cz + d}\) with \(ad-bc \neq 0\), are the invertible holomorphic transformations of the Riemann sphere; they are compositions of translations, dilations, rotations, and inversions of \(\bb{C}\).</p>
<hr />
<h2 id="6-convergence-concerns">6. Convergence concerns</h2>
<p>Just like in real-variable calculus, Taylor series in \(\bb{C}\) do not always converge, and the region of convergence is determined by the distance to the closest pole. Unlike real-variable calculus, this makes a lot more sense — since in \(\bb{R}\) you still had to account for convergence failures due to poles that were on the complex part of the plane! Many of the ‘convergence tests’ that are learned in intro calculus make a lot more sense in \(\bb{C}\) as well.</p>
<p>In short, a series \(f(z-a) \approx f_0 + f_1 (z-a) + f_2 \frac{(z-a)^2}{2!} + \ldots\) only converges within circles around \(a\) that do not contain any poles, and thus what we call the ‘radius of convergence’ is the distance to the nearest pole in \(\bb{C}\). For instance, \(\frac{1}{1 + x^2}\) around \(x=0\) has radius of convergence \(r=1\) since there are poles at \(\pm i\).</p>
<p>Sometimes we can expand a series around different points to work around this. Amusingly, you can keep changing the point you expand around to ‘go around’ a pole, producing an analytic continuation outside the radius of the initial Taylor series.</p>
<p>It is occasionally useful to expand a function as a Taylor series around \(z=\infty\), by creating a series in \(1/z\) instead:</p>
\[\frac{1}{1-z} = \begin{cases}
1 + z + z^2 + \ldots & \| z \| < 1 \\
-\frac{1}{z} - \frac{1}{z^2} - \frac{1}{z^3} - \ldots & \| z \| > 1
\end{cases}\]
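<p>A quick numerical check of my own that each expansion converges in its own region:</p>

```python
# Partial sums of the two expansions of 1/(1-z).
def inside(z, terms=200):
    """1 + z + z² + ..., valid for |z| < 1."""
    return sum(z**k for k in range(terms))

def outside(z, terms=200):
    """-1/z - 1/z² - ..., the expansion 'around z=∞', valid for |z| > 1."""
    return -sum(z**-k for k in range(1, terms + 1))

exact = lambda z: 1 / (1 - z)

z_in, z_out = 0.5 + 0.2j, 2.0 - 1.0j
assert abs(inside(z_in) - exact(z_in)) < 1e-9    # converges inside the circle
assert abs(outside(z_out) - exact(z_out)) < 1e-9  # converges outside it
```

<p>(Swap the two test points and the partial sums blow up instead of converging.)</p>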
<p>The simplest way to show that a series converges is to show that the series still converges if \(z\) is replaced with \(r = \|z\|\), since</p>
\[\| f(z) \| = \| a_0 + a_1 z + a_2 z^2 + \ldots \| \leq \| a_0 \| + \| a_1 \| r + \| a_2 \| r^2 + \ldots\]
<p>After all, the phases of the \(z\) terms can only serve to reduce the sums of the magnitudes.</p>
<p>We know that geometric series \(1 + x + x^2 + \ldots\) converge if and only if \(\|x\| < 1\). This lets us quickly concoct the ‘best’ of the convergence tests: a series converges in regions where the magnitudes of its terms converge like a geometric series. That is, \(\sum a_n r^n\) definitely converges if, at least for large \(n\), the terms look like \(\sqrt[n]{\| a_n r^n \|} = \sqrt[n]{\| a_n \| } r \lt 1\). This gives the <a href="https://en.wikipedia.org/wiki/Root_test">root test</a> for convergence:</p>
\[R = \frac{1}{\limsup_{n \ra \infty} \sqrt[n]{\| a_n \| }}\]
<p>If \(r = R\) the series still might converge, depending on what the phases of \(a_n\) do! — if they all point the same way it doesn’t, but if they point every which way it might. If \(r > R\) it definitely doesn’t. If \(R = \infty\) then the series converges for all finite \(\|z\|\) and is called an ‘entire’ function, which is a weird name but there you go.</p>
<p>The root test is the most powerful of the simple convergence tests, because it hits <em>exactly</em> on the property that \(\sum \| a_n \| r^n\) converges if its magnitudes are a convergent geometric sum like \(\sum x^n\). The other tests that you might have heard of all ‘undershoot’ this property: for instance the “ratio test” says that</p>
\[\| \frac{a_{n+1} r^{n+1}}{a_n r^n} \| = \| \frac{a_{n+1}}{a_n} \| r < 1\]
<p>This captures the idea that the series converges if its successive ratios are eventually smaller than those of some convergent geometric series, but it fails, for example, if the terms look like \(x + x + x^2 + x^2 + x^3 + x^3 + \ldots\), where the ratios alternate between \(1\) and \(x\).</p>
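<p>Here’s a small illustration of the root test in Python (a sketch of my own; I work with \(\log \| a_n \|\) so that large coefficients don’t overflow). For \(\frac{1}{1-2z} = \sum 2^n z^n\) the pole at \(z = \frac{1}{2}\) should show up as \(R = \frac{1}{2}\):</p>

```python
import math

def radius_of_convergence(log_abs_a, n_lo=1_000, n_hi=2_000):
    """Estimate R = 1/limsup |a_n|^(1/n), approximating the limsup by the
    sup of |a_n|^(1/n) = exp(log|a_n| / n) over a large tail of n."""
    limsup = max(math.exp(log_abs_a(n) / n) for n in range(n_lo, n_hi))
    return 1.0 / limsup

# a_n = 2^n (the coefficients of 1/(1-2z)):  R = 1/2
assert abs(radius_of_convergence(lambda n: n * math.log(2)) - 0.5) < 1e-9
# a_n = 1 (the coefficients of 1/(1-z)):     R = 1
assert abs(radius_of_convergence(lambda n: 0.0) - 1.0) < 1e-9
```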
<hr />
<h2 id="7-global-laurent-series">7. Global Laurent Series</h2>
<p>This is my own idea for making divergence of Laurent series more intuitive.</p>
<p>Laurent series coefficients are derivatives of the function evaluated at a particular point, like \(f^{(n)}(z=0)\), such that a whole Laurent series is</p>
\[f(z) = \ldots + f^{(-2)}(0) \frac{2! }{z^2} - f^{(-1)}(0) \frac{1!}{z} + f(0) + f^{(1)}(0) z + f^{(2)}(0) \frac{z^2}{2!} + \ldots\]
<p>Suppose that for some reason the Cauchy forms of computing derivatives and ‘inverse’ derivatives are the ‘correct’ way to compute these values:</p>
\[\begin{aligned}
f(0) &= \frac{1}{2\pi i} \oint_{C} \frac{f(z) dz}{z} \\
\frac{f^{(n)}(0)}{n!} &= \frac{1}{2\pi i} \oint_{C} \frac{f(z) dz}{z^{n+1}} \\
(-1)^n n! f^{(-n)}(0) &= \frac{1}{2\pi i}\oint_{C} z^{n-1} f(z) dz \\
\end{aligned}\]
<p>Where \(C\) is a circle of radius \(R\) around \(z=0\). Then some handwaving leads to an alternate characterization of divergent series. For most functions these values are independent of the choice of \(C\), but for a function with a pole away from the origin they are not. Consider \(f(z) = \frac{1}{1-z}\), and let \(C\) be the positively oriented circle of fixed radius \(R\). Then:</p>
\[\begin{aligned}
f_R(0) &= \frac{1}{2\pi i}\oint_{C} \frac{1}{(z)(1-z)} dz \\
&= \frac{1}{2\pi i}\oint_{C} \frac{1}{z} + \frac{1}{1-z} dz \\
&=\text{Res}_C (z=0, \frac{1}{z} - \frac{1}{z-1}) + \text{Res}_C (z=1, \frac{1}{z} - \frac{1}{z-1}) \\
&= 1 - H(R-1) \\
\end{aligned}\]
<p>Where \(H\) is a <a href="https://en.wikipedia.org/wiki/Heaviside_step_function">step function</a> \(H(x) = 1_{x > 0}\). The value of \(f(0)\) changes depending on the radius we ‘measure’ it at. The derivative and integral terms show the same effect, after computing some partial fractions:</p>
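<p>This effect is easy to see numerically. Substituting \(z = Re^{i\theta}\) turns \(\frac{1}{2\pi i}\oint \frac{f(z)}{z} dz\) into the plain average of \(f\) over the circle, so (in a quick Python sketch of my own):</p>

```python
import cmath
import math

def f_R(radius, n_points=20_000):
    """(1/2πi) ∮ f(z)/z dz for f(z) = 1/(1-z), over the circle |z| = radius.

    With z = R e^{iθ}, dz = i z dθ, so this is just the average of f
    over the circle."""
    total = 0j
    for k in range(n_points):
        z = radius * cmath.exp(2j * math.pi * k / n_points)
        total += 1 / (1 - z)
    return total / n_points

assert abs(f_R(0.5) - 1) < 1e-6  # inside the pole at z=1: the usual f(0) = 1
assert abs(f_R(2.0) - 0) < 1e-6  # outside it: the 'measured value' drops to 0
```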
\[\begin{aligned}
f_R'(0) &= \frac{1}{2\pi i}\oint_{C} \frac{1}{(z^2)(1-z)} dz \\
&= \frac{1}{2\pi i}\oint_{C} \frac{1}{z} + \frac{1}{z^2} - \frac{1}{z-1} dz \\
&= 1 - H(R-1) \\
- f^{(-1)}_R(0) &= \frac{1}{2\pi i}\oint_{C}\frac{- 1}{z-1} dz \\
&=-H(R-1)
\end{aligned}\]
<p>In total, using \(H(x) = 1 - H(-x)\), the Laurent coefficient of each power \(z^n\) comes out to:</p>
\[c_n = \begin{cases}
H(1-R) & n \geq 0 \\
- H(R-1) & n < 0
\end{cases}\]
<p>Which gives the ‘real’ Laurent series as:</p>
\[\frac{1}{1-z} = - (\; \ldots + z^{-2} + z^{-1}) H(\|z\| - 1) + (1 + z + z^2 + \ldots) H(1 - \|z\|)\]
<p>The usual entirely-local calculations of \(f'(z)\), etc miss the ‘global’ property: that the derivative calculations fail to be valid beyond \(R=1\), and a whole different set of terms become non-zero, which correspond to expansion around \(z=\infty\).</p>
<p>Which if you ask me is very elegant, and very clearly shows why the radius of convergence of the conventional expansion around \(z=0\) is the distance to the closest pole. Of course it is a bit circular, because to get this we had to choose to use circles \(C\) to measure derivatives, but that’s ok.</p>
<hr />
<p>Anyway, in summary, please use \(\bb{R}^2\) instead of \(\bb{C}\) if you can. Thanks.</p>
<hr />
<h1>The essence of quantum mechanics</h1>
<p><em>2020-07-24 · <a href="https://alexkritchevsky.com/2020/07/24/qm">alexkritchevsky.com/2020/07/24/qm</a></em></p>
<p>Here’s what I know about QM. I’m trying to learn QFT and it helps to have the prerequisites compressed into the simplest possible representation. It also helps me to write everything down in a compressed form so I can reference it more easily.</p>
<p>This will make no sense if you don’t already have a good understanding of quantum mechanics.</p>
<p>Conventions: \(c = 1\), \(g_{\mu \nu} = \text{diag}(+, -, -, -)\). I like to write \(S_{\vec{x}}\) for \(\nabla S\).</p>
<!--more-->
<hr />
<h2 id="1-wavefunction-solutions">1. Wavefunction solutions</h2>
<p>QM makes a lot more sense to me if you (a) handle everything relativistically from the start and (b) just assume the form of the wave function solutions instead of deriving them. If I had my way I’d start a quantum mechanics course with special relativity, followed by introducing the scalar wave function, like this:</p>
<p>Consider a function on spacetime with the form \(\psi(t, \vec{x}) = e^{ i S(t, \vec{x})/\hbar}\) which assigns a complex phase to every point. It is fully determined by the <strong>action</strong> \(S(t, \vec{x})\), and in particular, given an initial state \(\psi_0\), by the action gradient \(dS = S_{\mu} dx^\mu\). This lets us compare quantum states by integrating over some path \(\Gamma\):</p>
\[\psi(t, \vec{x}) = e^{i/\hbar \int_{\Gamma} dS} \psi_0\]
<p>Later on when potentials are involved we will need to be specific about the path of integration, but for now we can think of \(S\) as a scalar function that determines \(\psi\) everywhere.</p>
<p>Relativistic invariance insists that \(\psi\) have the same value in any reference frame, and \(- i \hbar \p \psi = (\p S) \psi = (S_t, S_{\vec{x}}) \psi\) says that \((S_t, S_{\vec{x}})\) must be a covariant 4-vector. Contraction with \(\bar{\psi}\) extracts the vector components: \(\< \psi \| {- i \hbar \p} \| \psi \> = \bar{\psi} (S_t, S_{\vec{x}}) \psi = (S_t, S_{\vec{x}})\). Finally, \(\| (S_t, S_{\vec{x}}) \| = \sqrt{S_{t}^2 - S_{\vec{x}}^2}\) must be a Lorentz-invariant scalar.</p>
<p>We call \(i \hbar \p_t = \hat{E}\) and \(- i \hbar \p_x = \hat{P}\) the <strong>energy</strong> and <strong>momentum</strong> operators. The quantum mechanical operators apparently extract properties of \(S\), but because \(S\) is packed inside an exponential, they extract them as eigenvalues: \(i \hbar \p_t \psi = - S_t \psi\). Our quantum-mechanical inner product and our operators are just <em>tools for extracting properties of \(S\)</em>, since \(\psi\) is the only thing we can directly operate on. When an equation like the Schrödinger equation contains a \(\hat{P} = - i \hbar \p_x\) operator, it’s just a skew way of writing the \(p_x\) value.</p>
<p>Since quantum mechanical measurements only happen through operators like these, the exact values of \(\psi\) up to a phase, and therefore \(S\) up to a constant, are not physically observable.</p>
<p>For a free massive spinless particle the action is \(S = - p_{\mu} x^{\mu} = \int -p_\mu dx^{\mu}\), where \(p\) is the four-momentum and \(\| p \| = m\), the rest mass. In the rest frame this is simply \(S = -m \tau = - \int m d\tau\). In the absence of a potential this gives the wave function:</p>
\[\psi(x) = e^{- i/\hbar \int_{0}^{x} p_\mu dx^\mu} \psi(0) = e^{- i/\hbar p_\mu x^\mu} \psi(0) = e^{i/\hbar ( \vec{p} \cdot \vec{x} - E t)} \psi(0)\]
<p>which is a Fourier component with momentum \(p_\mu\). Time evolution via exponentiation of the Hamiltonian amounts to translating in \(t\):</p>
\[\psi(t + \Delta t) = e^{-i \hat{H} \Delta t/\hbar} \psi(t) = e^{\Delta t \p_t} \psi(t) = e^{i/\hbar (\vec{p} \cdot \vec{x} - E(t + \Delta t))} \psi_0\]
<p>(This uses the idea that exponentiating a differential operator translates in that coordinate: \(e^{a \p_x} f(x) = f(x+a)\).)</p>
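<p>That translation-operator identity is fun to verify directly. Here’s a Python sketch of my own that applies a truncated \(e^{a \p_x}\) to \(\sin\), whose derivatives conveniently cycle through \((\sin, \cos, -\sin, -\cos)\):</p>

```python
import math

def translated_sin(x, a, terms=40):
    """Apply e^{a d/dx} to sin(x) as the series Σ aᵏ/k! · (dᵏ/dxᵏ sin)(x)."""
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    return sum(a**k / math.factorial(k) * derivs[k % 4] for k in range(terms))

x, a = 0.7, 1.3
assert abs(translated_sin(x, a) - math.sin(x + a)) < 1e-12  # it really translates
```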
<p>When an initial state is not a pure Fourier mode with a definite momentum, we expand it as a sum of modes. For instance, if at \(t=0\) we measure an electron at \(\vec{x} = 0\), then the initial state is</p>
\[\psi(0, \vec{x}) = \delta(\vec{x}) \propto \int e^{i \vec{p} \cdot \vec{x}/\hbar} d \vec{p}\]
<p>When potentials are involved, \(dS\) is modified. The electromagnetic field, for instance, enters as \(p \mapsto p - qA\), so \(dS = (p_{\mu} - q A_{\mu}) dx^{\mu}\). Depending on the field configuration we may no longer be able to easily integrate \(\int dS\): if \(A\) includes a current, then it contains a ‘line’ of divergence, and the path integral’s result will depend on how many times \(\Gamma\) circles this divergence. This causes the path integral to give <em>different</em> values based on the choice of path. Summing over these paths, with appropriate weighting, corresponds in QFT to summing over the number of photons that are exchanged (I think. Will work it out in detail when I get to QFT).</p>
<hr />
<h2 id="2-correspondences">2. Correspondences</h2>
<p>Many concepts in quantum mechanics follow naturally from this foundation:</p>
<p><strong>Mass</strong>: For a free particle \(S_t = E\) and \(S_{\vec{x}} = \vec{p}\), and \(m = \sqrt{E^2 - p^2}\) is the relativistic rest energy/momentum relation. The wave function looks like \(\psi = e^{i/\hbar \int \vec{p} \cdot d\vec{x} - E dt} \psi_0\). A high energy/momentum corresponds to a rapidly changing action, and thus to a wave function that is <em>quickly rotating</em> as you translate in time or space. Ultimately, the mass \(m\) corresponds to the speed of phase rotation in a particle’s rest frame, and its energy and momentum are the results of Lorentz-transforming \(dS = - m d\tau\) into other frames.</p>
<p><strong>Path integration</strong>: Relative changes in \(S\) can be found by integrating: \(S(f) - S(i) = \int_{\Gamma} dS\) along any curve \(\Gamma\) from \(i\) to \(f\), and \(\psi(f) = e^{i/\hbar (S(f) - S(i))} \psi(i)\). Thus \(e^{i/\hbar (S(f) - S(i))}\) is the ‘transition matrix’ between any two states, along a given path. The total transition amplitude is a sum over all possible paths between two states. This extends handily to QFT’s path integrals when creation/annihilation of particles is included.</p>
<p><strong>The roles of \(\hbar\) and \(i\)</strong>: \(S \mapsto e^{iS / \hbar }\) is the conversion from ‘action’ space to ‘phase’ space. \(\hbar\) changes units from action (energy \(\times\) time) to radians; if we set \(\hbar = 1\) we are declaring that we measure action in radians. The resulting space after mapping by \(e^{iS}\) is physically meaningful, because in some cases we’ll end up summing these phase factors from multiple starting states and seeing interference patterns. I suspect that the output space is the \(U(1)\) that is identified with the electromagnetic gauge field but am not sure. If so, I think it would be good to write \(R_{EM}\) instead of \(i\), in order to avoid accidentally conflating the \(i\) factors from rotations in different spaces.</p>
<p><strong>Angular momentum</strong>: The orbital angular momentum operator, \(\hat{L}_z = -i \hbar \p_{\phi}\), does the same thing as \(\hat{P} = - i \hbar \p_{\vec{x}}\) but for a wave function in spherical coordinates. The azimuthal angle term looks like \(\psi \sim e^{i/\hbar (l_z \phi - E t)}\), and \(\hat{L}_z \psi = l_z \psi\). The azimuthal quantum number \(l_z\) (often written \(m\)) measures how many times \(\psi\) oscillates in a rotation of the azimuthal angle \(\phi\); it is quantized precisely because the \(\phi\) coordinate has a built-in periodicity. A \(z\)-angular momentum value of \(l_z\) labels the number of periods the wave makes as you rotate \(\phi\) around the \(z\)-axis.</p>
<p><strong>Spin-\(\frac{1}{2}\)</strong>: If \(l_z = 1/2\), then \(\psi_{\pm} \sim e^{i/\hbar (\pm \frac{1}{2} \phi - E t)}\) acts like a spinor (by modeling the spin as orbital angular momentum, and omitting the \(r\) and \(\theta\) components). This function appears trivially unphysical, since it has different values at \(\phi = 0\) vs \(\phi = 2 \pi\). The resolution is the fact that it’s only meaningful to use the wave function to <em>compare</em> states that are connected by a path – and for a spinor it’s correct that \(\< \psi(\phi = 2 \pi) \| \psi(\phi = 0) \> = -1\). (This is a useful mental model but isn’t the full story. My next post will be a truly exhaustive exploration of spinors.) (Much later edit: this next post got very difficult for me to finish. Hopefully I can get back to it someday.)</p>
<p><strong>Spin-\(1\)</strong>: A <em>vector</em>-valued wave function \(\vec{\psi} = (\psi_x, \psi_y, \psi_z)\), where the terms transform according to physical rotations, is a spin-1 object. To consider its \(z\)-angular momentum we change bases to a <a href="https://en.wikipedia.org/wiki/Spherical_basis">spherical basis</a> (not to be confused with spherical coordinates):</p>
\[(\hat{x}, \hat{y}, \hat{z}) \ra (\frac{\hat{x} + i \hat{y}}{\sqrt{2}}, \hat{z}, \frac{\hat{x} - i \hat{y}}{\sqrt{2}})\]
<p>Or in cylindrical coordinates, using \(\hat{x} = (\cos \phi )\hat{\rho} - (\rho \sin \phi )\hat{\phi}\) and \(\hat{y} = (\sin \phi) \hat{\rho} + (\rho \cos \phi) \hat{\phi}\):</p>
\[= (\frac{e^{i \phi} (\hat{\rho}+ i \rho \hat{\phi})}{\sqrt{2}}, \hat{z}, \frac{e^{- i \phi} (\hat{\rho} - i \rho \hat{\phi})}{\sqrt{2}})\]
<p>The coordinates of \(\vec{\psi}\) in this basis are:</p>
\[(\psi_{+1}, \psi_0, \psi_{-1}) = (\frac{\psi_x - i \psi_y}{\sqrt{2}}, \psi_z, \frac{\psi_x + i \psi_y}{\sqrt{2}})\]
<p>In the new basis, the coordinate vectors have an explicit \(\phi\)-dependence, which captures the idea that any vector-valued function has an <em>intrinsic</em> \(\phi\)-derivative, independent of reference frame, just by virtue of being a vector. (This is kinda obvious in hindsight, but it took me forever to understand.)</p>
<p>So the components of a vector wave function \(\vec{\psi}\) locally look like \(\psi_{s_z}(\phi, \rho, z, t) \sim e^{i (s_z + l_z) \phi } \psi_{s_z}(\rho, z, t)\), where \(l_z\) is its orbital angular momentum and \(s_z \in (+1, 0, -1)\) is a frame-independent contribution just from its vectorial nature. The \(s_z = 0\) component corresponds to a vector-valued wave function pointing only in the \(z\) direction. The \(s_z = \pm 1\) components correspond to having \(x\) or \(y\) components, with the sign determined by their relative phase.</p>
<p>Note what it means to have spin \(1\): it’s not that it fixes the <em>value</em> of the angular momentum; rather, it specifies the different ways that the angular momentum can transform under rotation. The three choices determine whether \(\vec{\psi}\) is in the \(z\) direction \((s_z = 0)\) or whether it has a positive or negative ‘rotational’ components in the \(xy\) plane (\(s_z = \pm 1\)). Particularly, having angular momentum \(s_z = +1\) means that the \(y\) component is advanced in phase compared to the \(x\) component. This is why the ‘ladder’ operator \(\hat{S}_+ = \hat{S}_x + i \hat{S}_y\) serves to increase the angular momentum, because it includes a factor of \(e^{i \phi}\):</p>
\[\hat{L}_+ = (\hat{L}_x + i \hat{L}_y) \sim e^{i \phi}\]
<aside class="toggleable" id="angular" placeholder="<b>Aside</b>: Angular momentum calculations <em>(click to expand)</em>">
<p>Here are some calculations I did to make sure I wasn’t lying through my teeth here:</p>
<p>The angular momentum operators are \(\vec{x} \^ \hat{P} = - i \hbar (y \p_z - z \p_y, z \p_x - x \p_z, x \p_y - y \p_x)\), giving:</p>
\[\begin{aligned}
\hat{L}_z \psi = l_z \psi &= - i \hbar (x \p_y - y \p_x) \psi = (x p_y - y p_x) \psi \\
\end{aligned}\]
<p>etc. Another form is \(\hat{L}_z = -i \hbar \p_{\phi}\):</p>
\[\begin{aligned}
\hat{L}_z \psi &= - i \hbar \p_{\phi} \psi \\
&= -i \hbar (x_{\phi} \p_x + y_{\phi} \p_y) \psi \\
&= -i \hbar (-y \p_x + x \p_y) \psi \\
&= (x \hat{P}_y - y \hat{P}_x) \psi
\end{aligned}\]
<p>Thus a function of the form \(\psi = e^{i l_z \phi /\hbar}\) has \(\hat{L}_z \psi = l_z \psi\).</p>
<p>The \(\hat{L}_x\) and \(\hat{L}_y\) operators have less-pleasant forms in spherical coordinates:</p>
\[\begin{aligned}
\hat{L}_x &= -i \hbar ({- \sin} (\phi) \p_{\theta} - \cot(\theta) \cos (\phi )\p_{\phi}) \\
\hat{L}_y &= -i \hbar (\cos (\phi) \p_{\theta} - \cot(\theta) \sin (\phi )\p_{\phi}) \\
\end{aligned}\]
<p>The failure of commutation, such as \([\hat{L}_x, \hat{L}_z] \neq 0\), comes from the fact that this adds \(\phi\)-dependencies that will affect the \(l_z\) value.</p>
<p>Now look at the raising operator \(L_+\):</p>
\[\begin{aligned}
L_{+} &= L_x + i L_y \\
&= -i \hbar ((-\sin \phi + i \cos \phi) \p_{\theta} - \cot(\theta) (\cos \phi + i \sin \phi) \p_{\phi})\\
&= -i \hbar (e^{i \phi} )(i \p_{\theta} - \cot(\theta) \p_{\phi})
\end{aligned}\]
<p>Ignoring the coefficient this produces (I’m told it’s \(\hbar \sqrt{j(j+1) - l_z (l_z+1)}\)), the reason that it raises the \(l_z\) value is the inclusion of an \(e^{i \phi}\) term, giving \(e^{i \phi} e^{i l_z \phi} = e^{i (l_z + 1) \phi}\).</p>
<p>A constant vector function is given by (in somewhat more pleasant cylindrical coordinates \((\rho, \phi, z)\)):</p>
\[\begin{aligned}
\vec{\psi} &= \psi_x \hat{x} + \psi_y \hat{y} + \psi_z \hat{z} \\
&= \frac{1}{2} (\psi_x - i \psi_y)(\hat{x} + i \hat{y}) + \psi_z \hat{z} + \frac{1}{2} (\psi_x + i \psi_y) (\hat{x} - i \hat{y}) \\
&= \frac{1}{2} \psi_{+1} e^{i \phi} (\hat{\rho}+ i \rho \hat{\phi}) + \psi_0 \hat{z} + \frac{1}{2} \psi_{-1} e^{- i \phi} (\hat{\rho} - i \rho \hat{\phi})
\end{aligned}\]
<p>Clearly \(\hat{L}_z (\psi_{+1}, \psi_0, \psi_{-1}) = (+1 \psi_{+1}, 0 \psi_{0}, -1 \psi_{-1})\).</p>
</aside>
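<p>One more check, since the sign conventions here are easy to fumble: with the standard Cartesian spin-1 matrix \((S_z)_{jk} = -i \epsilon_{zjk}\), the spherical basis vectors really are eigenvectors with eigenvalues \(+1, 0, -1\). A quick Python sketch of my own (standard library only):</p>

```python
# Spin-1 z-angular-momentum matrix in the Cartesian basis: (S_z)_jk = -i ε_zjk.
S_z = [[0, -1j, 0],
       [1j,  0, 0],
       [0,   0, 0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

r2 = 2 ** 0.5
spherical_basis = {
    +1: [1 / r2, 1j / r2, 0],   # (x̂ + iŷ)/√2
     0: [0, 0, 1],              # ẑ
    -1: [1 / r2, -1j / r2, 0],  # (x̂ - iŷ)/√2
}

for s_z, v in spherical_basis.items():
    Sv = matvec(S_z, v)
    # each spherical basis vector is an eigenvector with eigenvalue s_z
    assert all(abs(Sv[i] - s_z * v[i]) < 1e-12 for i in range(3)), s_z
```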
<p>By the way, photons are spin-1 particles, but cannot have the \(s_z = 0\) state for what I currently understand as ‘complicated technical reasons’. Roughly, it goes: because photons have no rest frame, the \(s_z = 0\) value is forbidden, as that would imply that there is a choice of \(z\) around which a photon wave function is symmetric. The remaining \(s_z = \pm 1\) states correspond to photon polarizations.</p>
<p><strong>The Schrödinger Equation</strong>: We can write \(S_t^2 - S_{\vec{x}}^2 = m^2\) as \(S_t = \sqrt{m^2 + S_x^2} = m \sqrt{1 + \frac{S_x^2}{m^2}}\) and expand as a Taylor series (note that \(\| S_x/m \| = \| p / m \| \ll 1\)) to get:</p>
\[S_t = m \left(1 + \frac{1}{2} \frac{S_x^2}{m^2} + O \left( \left( \frac{S_x^2}{m^2} \right)^2 \right) \right) \approx m + \frac{S_x^2}{2m}\]
<p>Using our operator forms we get the free-particle Schrödinger equation:</p>
\[\hat{E} \psi \approx (m + \hat{P}^2/2m) \psi\]
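And a plane wave \(e^{i(px - Et)}\) with the dispersion relation \(E = m + p^2/2m\) does satisfy this equation; here's a one-dimensional sympy check with \(\hbar = 1\):

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
m, p = sp.symbols('m p', positive=True)

E = m + p**2 / (2 * m)                  # the low-energy dispersion relation
psi = sp.exp(sp.I * (p * x - E * t))    # plane wave, hbar = 1

lhs = sp.I * sp.diff(psi, t)                    # E-hat psi
rhs = m * psi - sp.diff(psi, x, 2) / (2 * m)    # (m + P-hat^2 / 2m) psi
assert sp.simplify(lhs - rhs) == 0
```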
<p>Interpreting: this says that the time derivative of the action is a constant (the mass), plus a term proportional to the kinetic energy, plus higher-order terms that vanish at low momenta.</p>
<p>The initial \(m\) term is normally ignored in non-relativistic QM. It corresponds to a constant change in phase along any path (and adds a constant term to the Lagrangian), but it drops out of any calculation if you (a) only integrate over time and (b) never create/annihilate particles.</p>
<p><strong>Schrödinger with potential</strong>: The \(V\) term in the non-relativistic Schrödinger equation ends up next to the kinetic energy term: \(\hat{E} \psi \approx (m + \hat{P}^2/2m + V) \psi\). Working backwards through the derivation, we figure that the constraint on \(S\) must be \(S_t - V = \sqrt{m^2 + S_x^2}\). But there is no particular reason this would have a clean relativistic form, since we treat our potential non-relativistically anyway.</p>
<p>Nevertheless we can add to our interpretation: the role of a classical scalar potential \(V\) is to modify the phase change as a wave function translates in time, such that the particle acts like it has energy \(E - V\) instead of \(E\). The role of a vector potential is to modify the momentum, \(\vec{p} \mapsto \vec{p} - \vec{A}\).</p>
<p>The electromagnetic field uses the 4-potential \(q A = q (\phi, \vec{A})\). The electromagnetic wave function is something like \(\psi = e^{i/\hbar \int [(\vec{p} - q \vec{A}) \cdot d\vec{x} - (E - q \phi) dt] } = e^{i/\hbar \int (p_{\mu} - q A_{\mu}) dx^{\mu}}\).</p>
<p><strong>Covariant Derivatives</strong>:</p>
<p>Given the electromagnetic wave function of the form \(\psi = e^{i/\hbar \int (p_{\mu} - q A_{\mu}) dx^{\mu}}\), we can extract the \(p_{\mu}\) term with a more involved derivative operator, the ‘covariant derivative’ \(D_{\mu} = \p_{\mu} + i q A_{\mu}\), or equivalently, by modifying the momentum operator to be \(\hat{P}_{\mu} = \hat{p}_{\mu} + \hbar q A_{\mu}\):</p>
\[- i \hbar D_{\mu} \psi = - i \hbar (\p_{\mu} + i q A_{\mu}) \psi = p_{\mu} \psi\]
<p>This derivative manages to extract the \(p_{\mu}\) term by itself by subtracting off the \(qA\) contribution.</p>
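A one-dimensional sympy sketch of this subtraction, with \(\hbar = 1\) and an arbitrary unspecified \(A(x)\):

```python
import sympy as sp

x = sp.symbols('x', real=True)
p, q = sp.symbols('p q', real=True)
A = sp.Function('A')

# 1D electromagnetic wave function psi = exp(i * integral of (p - qA) dx), hbar = 1
psi = sp.exp(sp.I * sp.integrate(p - q * A(x), x))

D = lambda w: sp.diff(w, x) + sp.I * q * A(x) * w   # covariant derivative
extracted = sp.simplify(-sp.I * D(psi) / psi)

# the qA contributions cancel, leaving p by itself
assert sp.simplify(extracted - p) == 0
```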
<p><strong>Gauge transformations</strong>:</p>
<p>Since physics is determined by an action integral like \(\int( p_\mu - q A_\mu )dx^\mu\), any closed form (some \(\Lambda\) with \(d \Lambda = 0\), so that locally \(\Lambda = d\lambda\) for some scalar \(\lambda\)) can be added to \(A\) and will only affect the action in a path-independent way:</p>
\[\int_i^f (p_\mu - q A_\mu + \Lambda_\mu) dx^\mu = P_i^f + \Lambda_i^f - q \int_i^f A_\mu dx^\mu\]
<p>The covariant derivative is so called because it produces a derivative operator, and thus a momentum operator, which respects this gauge freedom by removing any explicit dependence on the value of \(A\). In my opinion, though, this is a very roundabout way to reach the conclusion: the explicit purpose of \(\hat{P}\) is to extract the value of \(p\), which is ultimately the thing that must obey \(p_{\mu} p^{\mu} = m^2\); the specific method of removing the gauge freedom is an implementation detail.</p>
<p>This performs a gauge transform that doesn’t affect the relative amplitudes of different paths between \(i \ra f\) – only the resulting phase. As such there is no way to observe this effect in a closed system, so the addition of \(\Lambda\) is a free variable in the theory. However, it turns out to be important when considering interacting systems, in ways that I haven’t learned yet but will be essential in QFT.</p>
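To see the path-independence concretely, here's a sympy check with an arbitrary-but-concrete choice \(\lambda = x^2 y\) and \(\Lambda = d\lambda\), integrated along two different paths from \((0,0)\) to \((1,1)\):

```python
import sympy as sp

t, xs, ys = sp.symbols('t x y')

lam = xs**2 * ys                               # a concrete choice of lambda
Lam = (sp.diff(lam, xs), sp.diff(lam, ys))     # Lambda = d lambda = (2xy, x^2)

def action_shift(xp, yp):
    """Integral of Lambda_mu dx^mu along the path (xp(t), yp(t)), t in [0, 1]."""
    integrand = (Lam[0].subs({xs: xp, ys: yp}) * sp.diff(xp, t)
                 + Lam[1].subs({xs: xp, ys: yp}) * sp.diff(yp, t))
    return sp.integrate(integrand, (t, 0, 1))

# two different paths from (0,0) to (1,1) shift the action by the same amount,
# namely lambda(1,1) - lambda(0,0) = 1
assert action_shift(t, t) == 1       # straight line
assert action_shift(t, t**2) == 1    # parabola
```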
<p><strong>The Lagrangian</strong>: The integral \(\Delta S = \int dS\) can be parameterized by time as</p>
\[\Delta S = \int (S_{\vec{x}} \cdot d\vec{x}/dt - S_t) dt = \int L \, dt\]
<p>\(L = dS / dt\) is the source of the (single-particle) Lagrangian, and is where the elementary form \(L = T - V\) comes from. For a free particle, \(L dt = -m d\tau\), and \(\Delta S = - \int m d \tau\). In a classical scalar potential with \(S_t = E = T + V\):</p>
\[L = S_{\vec{x}} \cdot d\vec{x}/dt - S_t = \vec{p} \cdot \vec{v} - E\]
<p>In classical mechanics often \(E = T + V\) and \(\vec{p} \cdot \vec{v} = 2 T\), giving</p>
\[L = 2 T - (T + V) = T - V\]
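(This last step is trivial, but for completeness it checks out symbolically:)

```python
import sympy as sp

m, v, V = sp.symbols('m v V', positive=True)

T = m * v**2 / 2      # kinetic energy
p = m * v             # momentum, so p*v = 2T
E = T + V
L = p * v - E         # the Lagrangian as defined above
assert sp.expand(L - (T - V)) == 0
```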
<p>Regardless of how we parameterize \(S = \int dS\), applying the principle of stationary action gives the classical trajectory. Feynman’s classic explanation of this is that all but the ‘stationary’ path – the choice of \(\Gamma\) such that \(\delta S / \delta \Gamma \vert_{\Gamma} = 0\) – will exhibit destructive interference in the macroscopic limit, resulting in the laws of classical physics. Quantitatively, this means that in the classical limit as \(\hbar \ra 0\), the path integral is dominated by the minimal path:</p>
\[\begin{aligned}
\lim_{\hbar \ra 0} \int d\Gamma e^{i S[\Gamma] /\hbar}
&= \lim_{\hbar \ra 0} \int d (\Delta \Gamma) e^{i S[\Gamma_{\text{min}} + \Delta \Gamma] /\hbar} \\
&\sim \lim_{\hbar \ra 0} e^{i S[\Gamma_{\text{min}}] /\hbar }
\end{aligned}\]
<p>I don’t exactly know how to make that rigorous but it makes heuristic sense: as \(\hbar \ra 0\) the function oscillates infinitely rapidly, cancelling itself out in the integral over \(\Delta \Gamma\), but at least the minimal path, where \(\delta S / \delta \Gamma = 0\), oscillates less than the rest do. We could imagine expanding \(S\) as a Taylor series \(S = S[\Gamma_{\text{min}}] + (\delta S / \delta \Gamma) \delta \Gamma + \ldots\), but I really don’t know if that’s allowed.</p>
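The heuristic is easy to see numerically, though. Here's a crude Riemann-sum sketch (with a made-up action \(S(x) = (x-1)^2\)) comparing \(\int e^{i S(x)/\hbar} dx\) over an interval containing the stationary point against one that doesn't; the former dominates as \(\hbar\) shrinks:

```python
import numpy as np

def osc_integral(S, a, b, hbar, n=400_000):
    """Riemann-sum approximation of the integral of e^{i S(x)/hbar} over [a, b]."""
    x = np.linspace(a, b, n)
    dx = x[1] - x[0]
    return np.sum(np.exp(1j * S(x) / hbar)) * dx

S = lambda x: (x - 1.0) ** 2          # stationary point at x = 1

for hbar in (1e-1, 1e-2, 1e-3):
    near = abs(osc_integral(S, 0.0, 2.0, hbar))   # interval contains x = 1
    far = abs(osc_integral(S, 2.0, 4.0, hbar))    # no stationary point
    print(f"hbar={hbar:g}  |near|={near:.4f}  |far|={far:.5f}")
```

The ‘near’ integral shrinks like \(\sqrt{\hbar}\) (the Fresnel integral), while the ‘far’ one shrinks like \(\hbar\), so the stationary region dominates more and more as \(\hbar \ra 0\).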
<p><strong>Noether’s Theorem</strong>: Suppose there is some dynamical variable \(q\) that \(S\) depends on. Then we can locally approximate \(S(q + \Delta q, \ldots) \approx S(q, \ldots) + S_q \Delta q\), adding a phase to the wave function \(\psi \ra e^{i S_q \Delta q /\hbar} \psi\). This leaves physics unchanged if and only if \(S_q\) is a constant, such that this is a uniform global phase transformation.</p>
<p>But if \(q\) is a physical symmetry of the system, then it <em>must</em> lead to the same physics; therefore \(S_q\) is a constant throughout the system’s evolution (gauge fields notwithstanding). \(S_q\) is called the ‘Noether charge’ corresponding to the \(q\) symmetry. \(E\) is the charge associated with \(t\); \(\vec{p}\) for \(\vec{x}\), \(\vec{L}\) for \(\vec{\theta}\), etc.</p>
<hr />
<h2 id="summary">Summary</h2>
<ol>
<li>QM is easier to follow if you start from the fact that the wave function has the form \(\psi = e^{i S/\hbar}\).</li>
<li>Operators and inner products are ways to extract properties of \(S\).</li>
<li>The Schrödinger equation for a free particle is a low-energy approximation of the statement that \(\| \p S \| = m\).</li>
<li>The only free physical quantity in a wave function is the 4-vector \(\p S\), which measures which part of the variation in \(S\) is in the spatial vs timelike direction.</li>
<li>Potentials enter by modifying \(\p S\), e.g. \(\p S \mapsto \p S - q A\). \(\int_i^f dS = S(f) - S(i)\) may no longer hold depending on the properties of \(A\).</li>
<li>Intrinsic angular momentum is a property of what kind of object the wave function’s value is (scalar, vector, spinor, etc).</li>
</ol>
<p>Normally you have to unlearn QM to learn relativistic QM, but the relativistic version makes much more sense in the first place so why not start there?</p>
<hr />
<p>Next up, spinors.</p>
<p>Much-later edit: spinors were harder than I thought :(</p>