<p><em>Divergences and Delta Functions, by Alex Kritchevsky (2023-10-24)</em></p>
<p>There’s an identity in electromagnetism which has been bugging me since college.</p>
<p>Gauss’s law says that the divergence of the electric field is equivalent to the charge distribution: \(\del \cdot \b{E} = \rho\). But in order to use this for a point charge—which is the most basic example in the subject!—we already don’t have the mathematical objects we need to calculate the divergence on the left or to represent the charge distribution on the right.</p>
<p>After all, the field of a point charge has to be \(\b{E} = q \hat{\b{r}}/4 \pi r^2\), and since its charge is concentrated at a point, the charge distribution has to be a delta function: \(\del \cdot (q \hat{\b{r}}/4 \pi r^2) = q \delta(\b{x})\). In your multivariable-calculus-based E&M class this might get mentioned briefly, at best. Yet it is… kinda weird? And important? It feels like it should be a basic fact that lives inside a larger intuitive framework of divergences and delta functions and everything else.</p>
<!--more-->
<p>Why, in the first place, are we using this divergence operator that we didn’t know how to actually calculate—are we missing something? Are there <em>other</em> divergences that we don’t know how to calculate? Does it work the same way in other dimensions? What about other powers of \(\frac{1}{r}\)? Are there other derivative <em>operators</em> we don’t know about that do similar tricks? Is there an equivalent version for the curl and by extension the magnetic field? Is there an equivalent version for dipoles, or multipoles? Etc. (The answer to all of these questions is ‘yes’, by the way.)</p>
<p>Not only is it unsatisfying, it’s also hard to learn about. For years I’ve been referring back to this one <a href="https://www.physicsforums.com/threads/divergence-of-the-e-field-at-a-theoretical-point-charge.956012/">rather confusing physicsforums.com post</a>, and I’m pretty tired of reading that. It’s not even good! Griffiths and other E&M textbooks also mention it, but their treatments are obscured by pedagogy and most of the interesting parts are left as exercises… and even then they don’t have much to say. Meanwhile venerable Wikipedia’s treatment is very slim and spread out over many hard-to-navigate articles; the best one is probably <a href="https://en.wikipedia.org/wiki/Green%27s_function_for_the_three-variable_Laplace_equation">here</a> but it’s still not great.</p>
<p>So today’s the day: I’m going to figure this out in all the generality I want and write myself the reference I have always wanted, so I never have to visit that forum post, or that one page of Griffiths, ever again.</p>
<hr />
<h2 id="1-the-basic-argument">1. The Basic Argument</h2>
<p>The first thing we learn in electrostatics is that the electric field of a point particle is</p>
\[\b{E} = \frac{q \hat{\b{r}}}{4 \pi r^2}\]
<p>That is, the field points radially out in every direction from the ‘infinitely concentrated’ point charge, and the magnitude falls off in inverse proportion to \(4 \pi r^2\). Non-coincidentally, \(4\pi r^2\) is the formula for the surface area of a sphere of radius \(r\). Evidently electric flux lines get weaker exactly in proportion to how much they “spread out”. It is as though you had a pipe whose input has to be equal to its output, except the input is at the origin and the output is “every direction at once”. Put differently, an electric charge is the source of a flux and then that flux fluxes around in exactly the way a flux has to flux around, which is: conservatively. A source of nonzero electric flux is what a charge <em>is</em>.</p>
<p>Which means that you can detect the presence of charges by measuring the flux around a volume. This is Gauss’s Law: that summing the electric flux through any closed surface measures the total charge contained within it.</p>
\[\oiint_{S} \b{E} \cdot d \b{A} = q_{\text{enclosed}}\]
<p>The divergence theorem turns Gauss’s Law into</p>
\[\iiint_V \del \cdot \b{E} \; dV = q_{\text{enclosed}}\]
<p>We also learn the differential form of Gauss’s Law, which says that the divergence \(\del \cdot \b{E}\) equals the charge distribution \(\rho(\b{x})\). For a point particle the integral’s value is entirely concentrated at the origin, so \(\rho(\b{x})\) has to be a delta function:</p>
\[\rho(\b{x}) = q \delta(\b{x})\]
<p>But we also know the functional form of \(\b{E}\) for a point charge: it’s \(q \hat{\b{r}} /4 \pi r^2\). Hence at least in \(\bb{R}^3\) it must be true that:</p>
\[\del \cdot \frac{\hat{\b{r}}}{r^2} = 4 \pi \delta^3(\b{x})\]
<p>Equivalently:<sup id="fnref:laplacian" role="doc-noteref"><a href="#fn:laplacian" class="footnote" rel="footnote">1</a></sup></p>
\[- \del^2 \frac{1}{r} = 4 \pi \delta^3(\b{x})\]
<p>We can also write this delta function in terms of \(r\):<sup id="fnref:spherical" role="doc-noteref"><a href="#fn:spherical" class="footnote" rel="footnote">2</a></sup></p>
\[4 \pi \delta^3(\b{x}) = - \del^2 \frac{1}{r} = \del \cdot \frac{\hat{\b{r}}}{r^2} =\frac{\delta(r)}{r^2}\]
<p>Which is neat, and also rather suspicious-looking. Seems like the more interesting identity here is that \(\delta^3 (\b{x}) = \delta(r) / 4 \pi r^2\), where the denominator is the surface area of a \(2\)-sphere.</p>
<p>It’s pleasing (since it’s pleasing when any integral is easy) that you can simply plug that into the equation for the electric field of an arbitrary charge distribution and recover Gauss’s law:</p>
\[\begin{aligned}
\b{E}(\b{x}) &= \frac{1}{4 \pi} \int \frac{\b{x} - \b{x}'}{\|\b{x} - \b{x}' \|^3} \rho(\b{x}') \d \b{x}' \\
\nabla \cdot \b{E}(\b{x}) &= \frac{1}{4 \pi} \int [\del \cdot \frac{\b{x} - \b{x}'}{\|\b{x} - \b{x}' \|^3} ] \rho(\b{x}') \d \b{x}' \\
&= \frac{1}{4 \pi} \int [4 \pi \delta^3(\b{x} - \b{x}')] \rho(\b{x}') \d \b{x}' \\
&= \int \delta^3(\b{x} - \b{x}') \rho(\b{x}') \d \b{x}' \\
&= \rho(\b{x})
\end{aligned}\]
<hr />
<h2 id="2-the-other-definition-of-divergence">2. The Other Definition of Divergence</h2>
<p>Producing this result by working backwards from physics is good enough for most purposes, but it’s a bit perplexing. Maybe there’s a cleaner derivation?</p>
<p>I’ve looked around and there are some formal-ish <a href="https://math.stackexchange.com/questions/1335591/divergence-of-vecf-frac-hat-mathrmrr2">ways</a> to do it, by a procedure they call ‘regularizing’ \(\hat{\b{r}}/r^2 = \b{r}/r^3\) as a limit of a more complicated function like \(\b{r} /(r^2 + a^2)^{\frac{3}{2}}\), which is a way of producing distributions as a limit of non-distributions. I guess it’s rigorous, but I don’t want to do it. It doesn’t teach me anything new about divergences or delta functions at all. Plus it just feels unnecessary.<sup id="fnref:delta" role="doc-noteref"><a href="#fn:delta" class="footnote" rel="footnote">3</a></sup></p>
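<p>That said, if you do want the regularization route, the computation it rests on is easy to check symbolically. Here’s a sketch with sympy (variable names are my own): the divergence of the regularized field is a smooth bump whose total integral is \(4\pi\) for every \(a\), so it concentrates into a delta function of weight \(4\pi\) as \(a \ra 0\).</p>

```python
import sympy as sp

r, a = sp.symbols("r a", positive=True)

# radial component of the regularized field r / (r² + a²)^(3/2), which → r̂/r² as a → 0
F = r / (r**2 + a**2) ** sp.Rational(3, 2)

# radial divergence formula: (1/r²) ∂r (r² F)
div = sp.simplify(sp.diff(r**2 * F, r) / r**2)
assert sp.simplify(div - 3 * a**2 / (r**2 + a**2) ** sp.Rational(5, 2)) == 0

# a smooth bump whose integral over space is 4π for every a:
# a nascent delta function of weight 4π
total = sp.integrate(div * 4 * sp.pi * r**2, (r, 0, sp.oo))
assert sp.simplify(total - 4 * sp.pi) == 0
```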
<p>Others <a href="https://math.stackexchange.com/questions/2136837/divergence-of-vecf-frac1r2-hatr">claim</a> that the divergence of \(\b{r}/r^2\) “is” undefined according to the usual definition, and that we’re just assigning a value to make the divergence theorem work. They’re obviously wrong: we’re not <em>inventing</em> a value; we’re <em>discovering</em> the actual value and it just requires delta functions to express. For the purposes of physics we don’t care at all about confining the space of objects we consider to just the standard-issue smooth functions. Evidently multivariable calculus <em>wants</em> distributions to get involved; we may as well let it happen.</p>
<p>The most satisfying explanation, in my opinion, is based on a different definition of divergence which isn’t used as much:</p>
<p>Recall that in multivariable calculus class we initially define divergence as a sum of partial derivatives \(\p_x \hat{\b{x}}+ \p_y \hat{\b{y}}+ \p_z \hat{\b{z}}\) (or whatever it becomes in other coordinate systems). But there’s another definition which is really a more direct extension of the one-variable derivative<sup id="fnref:derivative" role="doc-noteref"><a href="#fn:derivative" class="footnote" rel="footnote">4</a></sup>. It looks like this:</p>
\[\del \cdot \b{F} = \lim_{V \ra 0} \frac{1}{\| V \|} \oint_{\p V} \b{F} \cdot d\b{n}\]
<p>That is, it’s a ratio of the flux through a volume surrounding the point divided by the volume itself, as the volume goes to zero. It’s actually a standard definition and is at the top of the Wikipedia page on divergence, but for whatever reason it doesn’t come up as often. To use it, you compute the volume in the denominator as a sphere or a cube or whatever you want. For instance if \(\b{F} = x \hat{\b{x}} + y \hat{\b{y}} + z \hat{\b{z}}= r \hat{\b{r}}\) and we integrate over a sphere, then</p>
\[\begin{aligned}
\del \cdot \b{F} &= \frac{1}{4/3 \pi r^3} \oint (r \hat{\b{r}}) \cdot \hat{\b{r}} \, (r^2 \sin \theta \, d \theta \, d \phi) \\
&= \frac{4 \pi r^3}{4/3 \pi r^3} \\
&= 3 \\
&= (\p_x \hat{\b{x}}+ \p_y \hat{\b{y}}+ \p_z \hat{\b{z}}) \cdot (x \hat{\b{x}} + y \hat{\b{y}} + z \hat{\b{z}})
\end{aligned}\]
<p>In many ways this is more intuitive! On the other hand I have no idea how to prove that it’s equivalent to \(\p_x \hat{\b{x}}+ \p_y \hat{\b{y}}+ \p_z \hat{\b{z}}\) in general, and it’s hard to google for because you just get results about proving the divergence theorem. Sigh. But it makes some sense. \(\del \cdot \b{F} = (\p_x, \p_y, \p_z) \cdot \b{F}\) acts like the same formula but implemented on a cube instead of a sphere.</p>
<p>Using this definition we can derive the weird equation from E&M as follows. The flux of \(\hat{\b{r}}/r^2\) through a sphere of radius \(\e\) is \(4 \pi \e^2 / \e^2 = 4 \pi\); in fact the flux through <em>any</em> surface enclosing the origin is \(4 \pi\). Therefore the limit is \(\del \cdot \hat{\b{r}} / r^2 = \lim_{V \ra 0} 4 \pi 1_{\mathcal{O} \in V}/\| V \|\), which, if you integrate it against test functions, acts like \(4 \pi \delta(\b{x})\). Something like that.</p>
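<p>To make the flux-over-volume picture concrete, here’s a small numerical sketch (the function names and grid resolution are my own choices): the flux of \(\hat{\b{r}}/r^2\) through a sphere comes out to \(4\pi\) at every radius, while for the smooth field \(r \hat{\b{r}}\) from before, the flux-to-volume ratio settles at the ordinary divergence, \(3\).</p>

```python
import numpy as np

def flux_through_sphere(F, R, n=400):
    """Integrate F · r̂ dA over the sphere of radius R (midpoint rule in θ, φ)."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    phi = (np.arange(2 * n) + 0.5) * np.pi / n
    T, P = np.meshgrid(theta, phi, indexing="ij")
    rhat = np.stack([np.sin(T) * np.cos(P), np.sin(T) * np.sin(P), np.cos(T)])
    dA = R**2 * np.sin(T) * (np.pi / n) ** 2
    return np.sum(np.sum(F(R * rhat) * rhat, axis=0) * dA)

# the point-charge field r̂/r², in cartesian components x/|x|³
coulomb = lambda x: x / np.sum(x**2, axis=0) ** 1.5

# the smooth field r r̂ = (x, y, z), with ordinary divergence 3
linear = lambda x: x

for R in [1.0, 0.1, 0.01]:
    volume = 4 / 3 * np.pi * R**3
    print(flux_through_sphere(coulomb, R))           # ≈ 4π at every radius
    print(flux_through_sphere(linear, R) / volume)   # ≈ 3 at every radius
```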
<hr />
<h2 id="3-dealing-with-deltar-in-other-dimensions">3. Dealing with \(\delta(r)\) in other dimensions</h2>
<p>One nice thing about the integral definition is that it makes generalizations of the delta function divergence to other dimensions very natural: just integrate over different types of objects. In each case the coefficient is given by the surface area of an \((n-1)\)-sphere of radius \(R=1\), which you can <a href="https://en.wikipedia.org/wiki/N-sphere">look up</a>.</p>
\[\begin{aligned}
\del \cdot \frac{\hat{\b{r}}}{r^{n-1}} &= S_{n-1} \delta(\b{x}) \\
&= \frac{\delta(r)}{r^{n-1}}
\end{aligned}\]
<p>e.g. in \(\bb{R}^2\) with polar coordinates (so \(\b{r}_{xy} = x \hat{\b{x}} + y \hat{\b{y}}\)):</p>
\[\del \cdot \frac{\hat{\b{r}}_{xy}}{r_{xy}} = 2 \pi \delta(x,y) = \frac{\delta(r_{xy})}{r_{xy}}\]
<p>Note that you can totally compute a 2-divergence in a plane in \(\bb{R}^3\), or a 3-divergence in \(\bb{R}^4\), etc. I guess we could write it as \(\del_{xy} \cdot \b{F}\). I do vaguely recall seeing that object in formulas in my life but can’t remember where.</p>
<p>In fact this construction works in \(\bb{R}^1\) also, but it’s kinda weird: the 1d version of \(\hat{\b{r}}/r^2\) in \(\bb{R}^3\) and \(\hat{\b{r}}_{xy}/r_{xy}\) in \(\bb{R}^2\) is \(\hat{\b{r}}_{x}\), the “one dimensional radius function”, also written less strangely as \(\sgn(x) \hat{\b{x}}\). That is, it’s a unit vector pointing in the \(+\b{x}\) direction in the positive numbers and the \(-\b{x}\) direction in the negative numbers. Then:</p>
\[\begin{aligned} \del \cdot (\hat{\b{r}}_x ) &= (\p_x \hat{\b{x}}) \cdot (\sgn(x) \hat{\b{x}}) \\
&= \p_x \sgn(x) \\
&= 2 \delta(x) \\
&= \delta(r_x)
\end{aligned}\]
<p>The factor of \(2\) can be regarded as the “surface area” of a \(0\)-sphere, that is, of the two endpoints of a line segment. Admittedly it’s kinda weird to write \(2 \delta(x) = \delta(r_x)\). One way of thinking about it: a step function \(\theta(x)\) covers only half the displacement at the origin (it jumps from \(0\) to \(1\), so \(\p_x \theta(x) = \delta(x)\)), whereas \(\sgn(x)\) jumps from \(-1\) to \(+1\) and covers the full displacement, so \(\p_x \sgn(x) = 2 \delta(x)\). Hence the factor of \(2\).</p>
<p>Yes, that sounds weird and made up. I’m happy with it mostly because I realized that it gives a satisfying result in \(\bb{R}^3\) as well: recall that in spherical coordinates the radial term of the divergence looks like \(\del \cdot \b{f} = \frac{1}{r^2} \p_r (r^2 f_r)\). Well, suppose \(\b{f} = \hat{\b{r}} \theta(r) /r^2\), where once again we imagine that we need the \(\theta(r)\) in there to deal with how \(r\) switches signs at the origin. Then \(\del \cdot \b{f} = \frac{1}{r^2} \p_r [r^2 \frac{\theta(r)}{r^2}] = \frac{1}{r^2} \p_r \theta(r) = \delta(r)/r^2\) is the right value. Not bad, eh?</p>
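<p>As a sanity check on \(\p_x \sgn(x) = 2 \delta(x)\): pairing it against a test function by moving the derivative over, \(\< \p_x \sgn, f \> = -\int \sgn(x) f'(x) \, dx\), should produce \(2 f(0)\). A numerical sketch (the test function and grid are arbitrary choices of mine):</p>

```python
import numpy as np

f = lambda t: np.exp(-t**2)            # a test function with f(0) = 1
fp = lambda t: -2 * t * np.exp(-t**2)  # its derivative

# midpoint grid on [-10, 10], symmetric about the origin
n = 400_000
h = 20.0 / n
x = -10 + (np.arange(n) + 0.5) * h

# ⟨∂x sgn, f⟩ = -⟨sgn, f'⟩ = -∫ sgn(x) f'(x) dx
pairing = -np.sum(np.sign(x) * fp(x)) * h
print(pairing)  # ≈ 2 = 2 f(0), i.e. sgn′ acts like 2 δ(x)
```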
<p>By the way, there is some information about all of this on the Wikipedia article for <a href="https://en.wikipedia.org/wiki/Newtonian_potential">Newtonian potential</a>. They call the function which is the fundamental solution to \(\del^2 f = \delta\) in \(\bb{R}^d\) the “Newtonian Kernel” \(\Gamma\), and write</p>
\[\Gamma(x) = \begin{cases}
\frac{1}{2 \pi} \log r & d = 2 \\
\frac{1}{d(2-d) V_d} r^{2 - d} & d \neq 2
\end{cases}\]
<p>where \(V_d\) is the <em>volume</em> of the unit \(d\)-ball. That’s a bit confusing. It’s easier to follow with the identity \(V_d = \frac{S_{d-1}}{d}\), where \(S_{d-1}\) is the surface area of the unit \((d-1)\)-sphere, matching the \(S_{n-1}\) from before. Then this is really</p>
\[\Gamma(x) = \begin{cases}
\frac{1}{2 \pi} \log r & d = 2 \\
\frac{1}{(2-d) S_{d-1}} r^{2 - d} & d \neq 2 \\
\end{cases}\]
<p>And its gradient is given by the same formula in all dimensions:</p>
\[\del \Gamma(x) = \frac{1}{S_{d-1}} \frac{\hat{\b{r}}}{r^{d-1}}\]
<p>This agrees with what we wrote above, and even works in \(d=1\) if you take the “surface area of the \(0\)-sphere” (two points) to be \(S_0 = 2\).</p>
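<p>For \(d \neq 2\) this is easy to verify symbolically; here’s a sketch with sympy, treating \(S\) (standing in for \(S_{d-1}\)) as an opaque constant: the radial gradient of \(\Gamma\) comes out to \(\frac{1}{S_{d-1} r^{d-1}}\), and the radial Laplacian \(\frac{1}{r^{d-1}} \p_r (r^{d-1} \p_r \Gamma)\) vanishes away from the origin.</p>

```python
import sympy as sp

r, d = sp.symbols("r d", positive=True)
S = sp.Symbol("S", positive=True)  # stands for S_{d-1}, the unit sphere's surface area

Gamma = r ** (2 - d) / ((2 - d) * S)      # Newtonian kernel for d ≠ 2

grad = sp.diff(Gamma, r)                  # radial component of ∇Γ
assert sp.simplify(grad - 1 / (S * r ** (d - 1))) == 0

# radial Laplacian: Γ is harmonic away from r = 0
laplacian = sp.diff(r ** (d - 1) * grad, r) / r ** (d - 1)
assert sp.simplify(laplacian) == 0
```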
<hr />
<h2 id="4-other-shapes-and-multipoles">4. Other Shapes and Multipoles</h2>
<p>In fact every example charge distribution from elementary E&M has an expression as a delta function. It’s just that we’re not very… good… at using delta functions so we don’t normally write them that way.</p>
<p>Here are two classic examples from intro E&M, plus the electric fields that you get from applying Gauss’s law to their symmetries:</p>
<ul>
<li>an infinite line of charge in the \(z\)-direction with linear charge density \(\mu\) has electric field \(\b{E}(\b{x}) = \mu \hat{\b{r}}_{xy} / (2 \pi r_{xy})\).</li>
<li>an infinite plane of charge in the \(xy\) plane with area charge density \(\sigma\) has constant electric field \(\b{E}(\b{x}) = \sigma \sgn(z) \hat{\b{z}}/ 2\). The \(\sgn(z)\) makes this valid on both sides of the plane.</li>
</ul>
<p>In each case it should be that \(\del \cdot \b{E} = \rho(\b{x})\). Evidently:</p>
\[\begin{aligned}
\rho_{\text{line}}(\b{x}) &= \mu \delta(x, y) \\
&= \mu \frac{\delta(r_{xy})}{2 \pi r_{xy}} \\
\rho_{\text{plane}}(\b{x}) &= \sigma \delta(z) \\
&= \sigma \frac{\delta(r_z)}{2} \\
\end{aligned}\]
<p>Here are the forms of \(V\), \(\b{E}\), and \(\rho\) side-by-side:</p>
\[\begin{aligned}
\rho_{\text{line}}(\b{x}) &= \mu \delta(x, y) \\
\b{E}_{\text{line}}(\b{x}) &= \frac{\mu}{2 \pi} \frac{\hat{\b{r}}_{xy}}{ r_{xy}} \\
V_{\text{line}}(\b{x}) &= -\frac{\mu}{2\pi} \ln {r_{xy}} \\
&\\
\rho_{\text{plane}}(\b{x}) &= \sigma \delta(z) \\
\b{E}_{\text{plane}}(\b{x}) &= \frac{\sigma}{2} \sgn(z) \hat{\b{z}} \\
V_{\text{plane}}(\b{x}) &= -\frac{\sigma}{2} \| z \| \\
\end{aligned}\]
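<p>The plane column can be checked mechanically, since sympy knows how to differentiate \(\| z \|\) through \(\sgn\) down to a delta function. A sketch (note the minus sign on the potential, since \(\b{E} = -\del V\)):</p>

```python
import sympy as sp

z = sp.symbols("z", real=True)
sigma = sp.symbols("sigma", positive=True)

V = -sigma / 2 * sp.Abs(z)    # plane potential; the sign makes E = -∂z V come out right
E = -sp.diff(V, z)            # sympy: ∂z |z| = sign(z), so E = sigma/2 * sign(z)
assert E == sigma * sp.sign(z) / 2

# differentiate once more (via the Heaviside form, whose derivative is δ):
rho = sp.diff(E.rewrite(sp.Heaviside), z)
assert sp.simplify(rho - sigma * sp.DiracDelta(z)) == 0
```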
<p>How about some other interesting charge distributions?</p>
<p>A perfect <a href="https://en.wikipedia.org/wiki/Electric_dipole_moment">electric dipole</a> is the limiting case of a positive and negative charge next to each other, so that their net charge is zero but there is a nonzero dipole moment \(\b{p}\) along a certain axis. The potential, electric field, and charge distribution of a dipole are given by the limit as we press two point charges together while keeping the product \(q \b{d} = \b{p}\) fixed. But in fact this limit is just a directional derivative:</p>
\[\rho_{\text{dipole}}(\b{x}) = \lim_{d \ra 0, \, q \b{d} = \b{p}} [(+ q)\delta(\b{x} - \b{d}/2) + (- q)\delta(\b{x} + \b{d}/2)] = -\b{p} \cdot \p [ \delta(\b{x})] = -\p_{\b{p}} \delta(\b{x})\]
<p>So the charge distribution of a dipole is the gradient of a delta-function. That makes sense: the net charge is zero, but there’s two infinite spikes at the origin infinitesimally close to each other, which is what \(\delta'\) looks like.</p>
<p>We can immediately write down the electric field and potential also:</p>
\[\begin{aligned}
V_{\text{dipole}}(\b{x}) &= -\p_{\b{p}} [ \frac{1}{4 \pi r}] = \b{p} \cdot [\frac{\hat{\b{r}}}{4 \pi r^2} ]\\
\b{E}_{\text{dipole}}(\b{x}) = -\del V_{\text{dipole}}(\b{x}) &= -\p_{\b{p}}[ \frac{ \hat{\b{r}}}{4 \pi r^2}] = \frac{3 (\b{p} \cdot \hat{\b{r}}) \hat{\b{r}} - \b{p}}{4 \pi r^3} \\
\rho_{\text{dipole}}(\b{x}) = \del \cdot \b{E}_{\text{dipole}}(\b{x}) &= -\p_{\b{p}} [\delta(\b{x})] \\
\end{aligned}\]
<p>(Although see the next aside for a correction to \(\b{E}\): there’s apparently supposed to be a delta function term there also.)</p>
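<p>The limiting process itself is easy to watch numerically. This sketch (the observation point and spacings are arbitrary choices of mine) compares the potential of a physical \(\pm q\) pair at separation \(d\), holding \(qd = 1\), against the ideal dipole formula:</p>

```python
import numpy as np

def V_pair(x, q, d):
    """Potential of +q at +d/2 ẑ and -q at -d/2 ẑ (units where the prefactor is 1/4π)."""
    zhat = np.array([0.0, 0.0, 1.0])
    return (q / (4 * np.pi)) * (1 / np.linalg.norm(x - d / 2 * zhat)
                                - 1 / np.linalg.norm(x + d / 2 * zhat))

def V_dipole(x, p):
    """Ideal dipole potential p·r̂ / (4π r²)."""
    r = np.linalg.norm(x)
    return np.dot(p, x) / (4 * np.pi * r**3)

x = np.array([0.3, -0.2, 0.5])   # an arbitrary observation point
p = np.array([0.0, 0.0, 1.0])    # dipole moment p = q d, held fixed
for d in [0.1, 0.01, 0.001]:
    q = 1.0 / d                  # keep q d = 1
    print(V_pair(x, q, d), V_dipole(x, p))  # first column converges to the second
```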
<aside id="dipole" class="toggleable" placeholder="<b>Aside</b>: The Dipole Field Discrepancy <em>(click to expand)</em>">
<p>By the way. While we’re talking about dipoles and delta functions. Remember how the dipole term in \(\b{E}\) was the second derivative of \(\frac{1}{4 \pi r}\)?</p>
\[\b{E}_{\text{dipole}} = \frac{3 (\b{p} \cdot \hat{\b{r}}) \hat{\b{r}} - \b{p}}{4 \pi r^3}\]
<p>It turns out there is some debate in the physics world about whether this should have a delta function term attached to it and what the coefficient should be:</p>
\[\b{E}_{\text{dipole (corrected?)}} \stackrel{?}{=} \frac{1}{4 \pi } [ \frac{3 (\b{p} \cdot \hat{\b{r}}) \hat{\b{r}} - \b{p}}{ r^3}] - \frac{1}{3} \b{p} \delta(\b{x})\]
<p>Griffiths and Jackson, the pre-eminent textbooks, both say it should look like that. The argument is that if you integrate the electric field \(\int \b{E}(\b{x}) d^3 \b{x}\) over a region containing a dipole, it is off: you should get that the total field is \(-\frac{1}{3} \b{p}\), but instead you get that it’s zero as long as you exclude the origin. (The \(\frac{1}{3}\) is really \(\frac{1}{4 \pi} \times \frac{4 \pi}{3}\), the second term being the volume of a unit sphere.)</p>
<p>But when you go looking to read about this correction, people are pretty polarized (no pun intended). <a href="https://iopscience.iop.org/article/10.1088/0143-0807/28/2/012/meta">This</a> delightful paper by Andre Gsponer (not a typo) argues that the problem is that nobody is very good at using the \(r = \| \b{r} \|\) variable, which (as I also noticed earlier) has a derivative of \(\sgn(r)\) at \(r = 0\); hence, its second derivative produces a delta function at the origin. In particular, they argue that the actual potential of a point charge goes as</p>
\[V(\b{x}) = \frac{1}{4 \pi r} \sgn(r)\]
<p>Or equivalently:</p>
\[V(\b{x}) = \frac{1}{4 \pi \| r \|}\]
<p>since \(r \, \sgn (r) = \frac{r}{\sgn (r)} = \| r \|\). The \(\sgn(r)\) hangs out even though it’s always positive in order to give a correct derivative later.</p>
\[\begin{aligned}
\del \frac{1}{\| r \|} &= \hat{\b{r}} \, \p_r (\frac{1}{r} \, \sgn (r)) \\
&= - \frac{\hat{\b{r}}}{r^2} \sgn(r) + 2 \frac{\hat{\b{r}}}{r} \delta(r) \\
\del^2 \frac{1}{\| r \|} &= \frac{1}{r^2} \p_r [r^2 (- \frac{1}{r^2} \sgn(r) + 2 \frac{1}{r} \delta(r))] \\
&= \frac{1}{r^2} \p_r [- \sgn(r) + 2 r \delta(r)] \\
&= \frac{1}{r^2} [- \delta(r) + \cancel{2 \delta(r) + 2 r \delta'(r)}] \\
&= - \frac{1}{r^2} \delta(r)
\end{aligned}\]
<p>(Note that the radial part of the divergence is given by \(\del \cdot f = \frac{1}{r^2} \p_r(r^2 f_r)\), and also that \(x \delta'(x) = - \delta(x)\).)</p>
<p>The dipole version is:</p>
\[\begin{aligned}
\del \p_{\b{p}} \frac{1}{\| r \|} &= \del [ - \frac{\b{p} \cdot \b{\hat{r}}}{r^2} \sgn (r)] \\
&= \frac{3 (\b{p} \cdot \b{r})(\b{r}) - r^2\b{p} }{r^5} \sgn(r) - \frac{(\b{p} \cdot \b{r}) \b{\hat{r}}}{r^3} \delta(r) \\
&= \frac{3 (\b{p} \cdot \b{r})(\b{r}) - r^2\b{p} }{r^5} \sgn(r) - \frac{(\b{p} \cdot \hat{\b{r}}) \hat{\b{r}}}{r^2} \delta(r) \\
\end{aligned}\]
<p>It’s that last term \(- \frac{(\b{p} \cdot \hat{\b{r}}) \hat{\b{r}}}{r^2} \delta(r)\) which gives the discrepancy: when integrated over a sphere, the \(1/r^2\) cancels out the \(r^2\) integration factor, and the angular integral \(\int (\b{p} \cdot \hat{\b{r}}) \hat{\b{r}} \, d\Omega = \frac{4 \pi}{3} \b{p}\) produces the volume of the unit sphere, \(\frac{4 \pi}{3}\), leading to \(-\frac{\b{p}}{3}\) once the overall \(\frac{1}{4\pi}\) is included. So there you go. Apparently there should be delta functions on \(\b{E}\) fields also, and it’s the missing \(\sgn(r)\)s that are causing us to lose track of our deltas. Who knew?</p>
<p>Also, fun fact: apparently Jackson, who wrote that one textbook everyone knows, also published a <a href="http://cds.cern.ch/record/118393?ln=en">paper</a> arguing that the fact that <em>intrinsic</em> dipoles have a different delta function term (\(+ \frac{8 \pi}{3}\) instead of \(- \frac{4 \pi}{3}\), he says) compared to dipoles that are the limit of two monopoles shows that distant stars must have magnetic dipoles (that is, circulating electric currents) rather than magnetic monopoles in them, or they’d have a 42cm spectral line instead of a 21cm spectral line. Weird. I didn’t really follow it.</p>
<p>There are some other weird papers around the subject:</p>
<ul>
<li><a href="https://arxiv.org/pdf/1604.01121.pdf">This</a> paper by Edward Parker discusses various ways to get the terms in Jackson’s argument.</li>
<li><a href="https://pubs.aip.org/aapt/ajp/article-abstract/51/9/826/1043129/Some-novel-delta-function-identities?redirectedFrom=fulltext">Some novel delta‐function identities</a> by Charles Frahm derives some of these equations with explicit calculations in indexes.</li>
<li><a href="https://arxiv.org/abs/1001.1530">Comment on “Some novel delta-function identities”</a> by Jerrold Franklin thinks that Frahm did it wrong and does it a different way. They do explicitly claim that \(-\p^2 (\frac{1}{r}) = 4 \pi \hat{\b{x}}^{\o 2}\delta(\b{x})\), though, and that everyone else has been integrating over the angular dependence implicitly.</li>
<li>And then there’s <a href="https://arxiv.org/abs/1308.2262">Comment on “Comment on `Some novel delta-function identities”</a> by Yunyun Yang and Ricardo Estrada… but unfortunately ArXiv doesn’t have the pdf. I think they took it down because it was an older version and they changed the name later: the actual paper is called <a href="https://repository.lsu.edu/cgi/viewcontent.cgi?article=1282&context=mathematics_pubs">Distributions in spaces with thick points</a>, which deals with everything more rigorously than I care for and honestly gets crazy in how complex it is, defining distributions on certain surfaces and a new kind of “thick” delta functions. Why is figuring out what happens at \(r=0\) in \(\bb{R}^3\) so hard?</li>
</ul>
<p>Math is horrifying, but this chain of commentaries is kinda funny. Out of all of these I think the \(\frac{1}{r} \ra \frac{1}{r} \sgn(r)\) trick is the most usable. Probably best to stay away from “thick distributions” for now.</p>
<p>In summary:</p>
\[\b{E}_{\text{dipole}} \stackrel{?}{=} \frac{1}{4 \pi } [ \frac{3 (\b{p} \cdot \hat{\b{r}}) \hat{\b{r}} - \b{p}}{ r^3}] + \begin{cases} - \frac{1}{3} \b{p} \delta(\b{x}) & \\ + \frac{2}{3} \b{p} \delta(\b{x}) \end{cases} \text{ (depending on who you ask)}\]
</aside>
<p>Another way of looking at dipoles is to consider manually placing a bunch of charges at a distance \(h\) apart and then taking \(h \ra 0\). Write \(\Delta_\b{v}\) for a finite difference at a distance \(h\) along the direction \(\b{v}\): \(\Delta_{\b{v}} f(\b{x}) = f(\b{x} + \b{v} h) - f(\b{x})\). Note that \(\p_\b{v} f(\b{x}) = \lim_{h \ra 0} \frac{1}{h} \Delta_\b{v} f(\b{x})\). Also, we can write \(T_\b{z} f \equiv f(\b{x} + \b{z} h)\) for the translation operator, such that \(\Delta_\b{z} = T_\b{z} - 1\) and \(\Delta_\b{z} f = (T_\b{z} - 1) f = T_\b{z} f - f\).</p>
<p>Then a “physical” dipole (where the charges are a small but finite distance apart) is proportional to</p>
\[\Delta_\b{z} \delta(\b{x}) = \delta(\b{x} + \b{z} h) - \delta(\b{x})\]
<p>Then the infinitesimal dipole charge distribution is given by \(\rho(\b{x}) = - q \Delta_\b{z} \delta(\b{x})\), which in the limit where \(h \ra 0\) with \(hq = p\) fixed gives</p>
\[\rho_{\text{dipole}}(\b{x}) = (- p ) \p_\b{z} \delta(\b{x}) = (-p \b{z}) \cdot \p \delta(\b{x})\]
<p>A physical quadrupole is given by the “second finite difference” (so, second derivative). We can consider the case along a single axis:</p>
\[\begin{aligned}
\Delta_\b{z}^2 \delta &= (T_\b{z} - 1)^2 \delta \\
&= T_\b{z}^2 \delta - 2 T_\b{z} \delta + \delta \\
&\equiv \delta(\b{x} + 2h \b{z}) - 2 \delta(\b{x} + h \b{z}) + \delta(\b{x})
\end{aligned}\]
<p>In the limit we take \(h \ra 0\) while holding \(q h^2 = Q\) and get \(\rho_{\b{z}\b{z}\text{-quadrupole}} = Q \p_\b{z}^2 \delta = \hat{Q} \cdot \p^2 \delta\) (where \(\hat{Q}\) is a quadrupole tensor which only has a \(\b{z}\b{z}\) component). Or, we can do a \(\b{y}\)-\(\b{z}\) quadrupole:</p>
\[\begin{aligned}
\Delta_\b{y} \Delta_\b{z} \delta (\b{x}) &= (T_\b{y} - 1)(T_\b{z} - 1) \delta \\
&= T_\b{y} T_\b{z} \delta - T_\b{y} \delta - T_\b{z} \delta + \delta \\
&\equiv \delta(\b{x} + h(\b{z} + \b{y})) - \delta(\b{x} + h \b{y}) - \delta(\b{x} + h \b{z}) + \delta(\b{x})
\end{aligned}\]
<p>The limit with \(q h^2 = Q\) is \(\rho_{yz\text{-quadrupole}} = Q \p_y \p_z \delta(\b{x}) = \hat{Q} \cdot \p^2 \delta\) (where \(\hat{Q}\) is now a quadrupole tensor which only has a \(yz\) component).</p>
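<p>The claim that these stencils converge to derivatives is just ordinary finite-difference calculus, which is quick to check numerically on a smooth test function (the function and point here are arbitrary choices of mine):</p>

```python
import numpy as np

# a smooth test function and its exact mixed partial ∂y ∂z f at a chosen point
f = lambda x, y, z: np.sin(x) * np.exp(y) * z**2
exact = np.sin(0.7) * np.exp(0.2) * 2 * 0.4   # ∂y∂z f at (0.7, 0.2, 0.4)

def quadrupole_stencil(f, pt, h):
    """(T_y - 1)(T_z - 1) f / h²: the yz-quadrupole pattern of four points."""
    x, y, z = pt
    return (f(x, y + h, z + h) - f(x, y + h, z)
            - f(x, y, z + h) + f(x, y, z)) / h**2

for h in [0.1, 0.01, 0.001]:
    print(quadrupole_stencil(f, (0.7, 0.2, 0.4), h))  # → exact as h → 0
```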
<p>This construction is nicely easy to generalize, for instance to any charge distribution that’s a mix of points and multipoles at any separation from each other.</p>
<p>We can also make lines and planes and other shapes out of multipoles. For instance a “line of dipoles” looks like a positively-charged line infinitesimally close to a negatively-charged line. The result is just that we take an additional \(-\p\) of every term, which is equivalent to forcing the two charged surfaces to be \(d \ra 0\) apart with opposite signs while holding the product \(\mu d\) constant. For instance a line of charge on the \(z\) axis had charge distribution \(\rho_{\text{line}} = \mu \delta(x,y)\). A dipole along the \(x\)-axis made out of two of these has charge distribution</p>
\[\rho_{\text{line of dipoles}} = \lim_{h \ra 0, h\mu = p} [\rho_{\text{line}}(\b{x} + h \hat{x}) - \rho_{\text{line}}(\b{x})] = p \p_x \delta(x,y)\]
<p>More generally, the <a href="https://en.wikipedia.org/wiki/Multipole_expansion">multipole expansion</a> of a potential gives a Taylor expansion of \(V\) away from the charges, in terms of the increasingly higher-order moments \((q, \b{p}, \hat{Q}, \ldots)\) of the underlying charge distribution.<sup id="fnref:quad" role="doc-noteref"><a href="#fn:quad" class="footnote" rel="footnote">5</a></sup></p>
\[\begin{aligned}
V(\b{x})_{\text{multipole}} &= \frac{1}{4 \pi} [\frac{q}{r} + \b{p} \cdot \frac{\hat{\b{r}}}{r^2} + \frac{1}{2} \hat{Q} \cdot \frac{\hat{\b{r}}^{\o 2} }{r^3} + \ldots] \\
&=\frac{1}{4 \pi} [q + \b{p} \cdot (-\p) + \frac{1}{2} \hat{Q} \cdot (-\p)^2 + \ldots] \frac{1}{r}\\
\end{aligned}\]
<p>Taking \(-\del^2\) of each of these terms gives a delta-function derivative of some order.</p>
\[\begin{aligned}
-\del V = \b{E}(\b{x})_{\text{multipole}} &= [q + \b{p} \cdot (-\p) + \frac{1}{2} \hat{Q} \cdot (-\p)^{2} + \ldots] \frac{\hat{\b{r}}}{4 \pi r^2} \\
-\del^2 V = \rho(\b{x})_{\text{multipole}} &= [q + \b{p} \cdot (-\p) + \frac{1}{2} \hat{Q} \cdot (-\p)^{2} + \ldots] \delta^3(\b{x})
\end{aligned}\]
<hr />
<h2 id="5-other-powers-of-r">5. Other powers of \(r\)</h2>
<p>The multipole examples imply that in general, there are lots of objects in \(\bb{R}^3\) that have delta function divergences and it’s not just \(\hat{\b{r}}/r^2\), but the results are going to involve <em>derivatives</em> of delta functions instead… which are even harder to detect with the usual implementations of divergence.</p>
<p>For instance we can compute \(\del \cdot \hat{\b{r}}/r^3\) in two ways. Everywhere except the origin, we can use \(\del \cdot (f_r \hat{\b{r}}) = \frac{1}{r^2} \p_r (r^2 f_r)\) to get</p>
\[\del \cdot \frac{\hat{\b{r}}}{r^3} = - \frac{1}{r^4}\]
<p>And around the origin we use the integral definition of divergence:</p>
\[\begin{aligned}
\del \cdot \frac{\hat{\b{r}}}{r^3} &= \lim_{R \ra 0} \frac{1}{\| V \|} \oint_{r = R} \frac{1}{r} \frac{d A}{ r^2} \\
&= \frac{4 \pi }{r} \delta^3(\b{x}) \\
&= \frac{\delta(r)}{r^3} \\
\end{aligned}\]
<p>It seems like we should be able to make the delta function into a derivative, similar to what showed up in the multipole distribution. But it’s a little weird. Normally we can replace \(\delta/x^n\) with \(\frac{(-1)^n}{n!} \delta^{(n)}(x)\). But it seems like the identity is probably a little bit different in radial coordinates, since after all we expect this to be true:</p>
\[\frac{4 \pi }{r} \delta^3(\b{x}) = \frac{1 }{r} \frac{\delta(r)}{ r^2} = \frac{- \delta'(r)}{r^2}\]
<p>That is, \(\frac{1}{r^3}\) should give a <em>first</em> derivative, not a <em>third</em> derivative:</p>
\[\frac{4 \pi }{r} \delta^3(\b{x}) = \frac{1 }{r} \frac{\delta(r)}{ r^2} \stackrel{!}{\neq} \frac{- \delta^{(3)}(r)}{3!}\]
<p>The problem, I presume, is basically that \(\p_r \delta(r)\) is a weird object, because in an integral against a test function \(\< -\p_r \delta(r), f \>\), the normal integration-by-parts that lets us move the derivative across doesn’t work: \(\< -\p_r \delta(r), f \> \neq \< \delta, \p_r f \>\), since the radial integral has bounds \((0,\infty)\) and we <em>can’t</em> ignore the boundary. This integration by parts is what justifies \(\delta(x)/x = -\p_x \delta(x)\) normally, since \(\< \delta(x)/x, f \> = \< \delta, f/x \> = f'(0)\) (in a principal-value sense?). Therefore it is <em>probably</em> best to leave \(\delta(r) / r^3\) as-is instead of trying to turn it into a radial derivative.</p>
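<p>At least the underlying identity \(x \delta'(x) = -\delta(x)\) is uncontroversial, and sympy will confirm it by pairing both sides against a concrete test function (the test function is an arbitrary choice of mine):</p>

```python
import sympy as sp

x = sp.symbols("x", real=True)
f = (1 + x) * sp.exp(-x**2)   # an arbitrary smooth test function, f(0) = 1

pair = lambda dist: sp.integrate(dist * f, (x, -sp.oo, sp.oo))

lhs = pair(x * sp.DiracDelta(x, 1))   # ⟨x δ', f⟩ = -(x f)'(0) = -f(0)
rhs = pair(-sp.DiracDelta(x))         # ⟨-δ, f⟩  = -f(0)
assert lhs == rhs == -1
```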
<p>Nevertheless I’m pretty sure there are ways to do it, but it’s a lot more than I want to figure out right now. Roughly speaking, though, we can expect that a term like \(\delta(r)/r^k\) is going to turn into a delta-function derivative that is comparable to \(\frac{1}{r^2} \frac{(-1)^{k-2}}{(k-2)!} \delta^{(k-2)}\): it will act like a delta-derivative, scaled as though by a factor of \(\frac{1}{r^{k-2}}\). But I hope to figure out the actual details in a future article.</p>
<hr />
<h2 id="6-curl-and-magnetic-fields">6. Curl and Magnetic Fields</h2>
<p>One last question. How does this work for magnetism and curl?</p>
<p>The equivalent Maxwell equation is Ampère’s law, which establishes that the curl of the <em>magnetic</em> field is proportional to the current density (in units with \(\mu_0 = 1\)):</p>
\[\del \times \b{B} = \b{J}\]
<p>The integral form is:</p>
\[\oint_{\p A} \b{B} \cdot d \ell = \iint_{A} \b{J} \cdot d\b{A}\]
<p>Like divergence, there’s an integral form for the curl, which is basically the same idea except that it is computed in each plane instead of over a volume. The component of the curl along the normal \(\hat{\b{u}}\) to a plane is given by:</p>
\[(\del \times \b{F} )\cdot \hat{\b{u}} = \lim_{A \ra 0} \frac{1}{\| A \|} \oint_{\p A} \b{F} \cdot d \ell\]
<p>We can use this to justify delta-function formulas for currents on various surfaces, although I’m going to skip most of the steps. The equivalent identity is going to be for a current which is entirely concentrated in a line, which we’ll assume is at the origin of the \((x,y)\) plane and directed up the \(z\) axis.</p>
\[\b{J} = j \hat{\b{z}} \delta(x,y) = j \hat{\b{z}} \frac{\delta(r_{xy})}{2 \pi r_{xy}}\]
<p>(That’s in \((r_{xy}, \theta, z)\) cylindrical coordinates; the \(\frac{\delta(r_{xy})}{2 \pi r_{xy}}\) form is the same as the one for the 2d divergence up above.)</p>
<p>Of course the magnetic field due to a infinitely thin wire is a basic textbook example, so we know immediately what function has this as its curl:</p>
\[\begin{aligned}\b{B} &= j\frac{\hat{\theta}}{2 \pi r_{xy}} \\
\b{J} = \del \times \b{B} &= j \hat{\b{z}} \, \delta(x, y) \end{aligned}\]
<p>The \(\hat{\theta}/r_{xy}\) vector field is the classic ‘twist around the origin’ vector field that points along \(\hat{\theta}\) and at a right angle to \(\hat{\b{r}}_{xy}\) everywhere. It might look more familiar in cartesian coordinates:</p>
\[\hat{\theta} = \frac{x \hat{\b{y}} - y \hat{\b{x}}}{\sqrt{x^2 + y^2}} = \frac{x \hat{\b{y}} - y \hat{\b{x}}}{r_{xy}}\]
<p>Then its curl is:</p>
\[\del \times \frac{\hat{\theta}}{r_{xy}} = 2 \pi \hat{\b{z}} \delta(x, y) = \hat{\b{z}} \frac{ \delta(r_{xy})}{r_{xy}}\]
<p>That’s neat, I guess.</p>
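<p>Actually, it’s neat enough to sanity-check numerically. Here’s a quick sketch (my addition, assuming numpy; the radii and centers are arbitrary): the circulation of \(\b{B} = \hat{\theta}/2 \pi r_{xy}\) (taking \(j = 1\)) around any loop should equal the enclosed current, regardless of the loop’s size, and should vanish for loops that don’t enclose the wire.</p>

```python
import numpy as np

def circulation(center, R, n=20000):
    """Line integral of B = (-y, x) / (2 pi (x^2 + y^2)) around a circle (j = 1)."""
    t = (np.arange(n) + 0.5) * 2 * np.pi / n
    x = center[0] + R * np.cos(t)
    y = center[1] + R * np.sin(t)
    dx = -R * np.sin(t) * (2 * np.pi / n)   # tangential step d(ell)
    dy = R * np.cos(t) * (2 * np.pi / n)
    r2 = x**2 + y**2
    Bx, By = -y / (2 * np.pi * r2), x / (2 * np.pi * r2)
    return np.sum(Bx * dx + By * dy)

print(circulation((0, 0), 1.0))   # ~1: the loop encloses the wire
print(circulation((0, 0), 17.0))  # ~1: independent of radius
print(circulation((5, 0), 1.0))   # ~0: the curl vanishes away from the origin
```

<p>The first two answers match because all of the curl is concentrated in the delta function at the origin; the third loop misses it entirely.</p>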
<p>We can also do the magnetic field due to a single magnetic dipole (an infinitesimal magnet, loop of current, or classically-interpreted particle with spin) with magnetic dipole moment \(\b{m}\). We’ll use Gsponer’s notation of including a \(\sgn(r)\) term and see if it gives us some good delta-function terms. The vector potential is:</p>
\[\b{A} = \frac{1}{4\pi} \frac{\b{m} \times \b{r}}{r^3} \sgn(r)\]
<p>The magnetic field is (yes, definitely had to look up some identities for this):</p>
\[\begin{aligned}
4 \pi \b{B} = 4 \pi \del \times \b{A} &= \del \times [(\b{m} \times \frac{\b{r}}{r^3}) \sgn(r)] \\
&= [(- \b{m} \cdot \del) \frac{\b{r}}{r^3} + \b{m} (\cancel{\del \cdot \frac{\b{r}}{r^3}})] \sgn(r) - \frac{\b{m} \times \b{r}}{r^3} \times \del \sgn(r) \\
&= [\b{m} \cdot (-\frac{1}{r^3} + \frac{3 \b{r}^{\o 2}}{r^5}) ] \sgn(r) - \frac{\b{m} \times \b{r}}{r^3} \times \hat{\b{r}} \delta(r) \\
\b{B} &= \frac{3 (\b{m} \cdot \b{r}) \b{r} - r^2 \b{m}}{4 \pi r^5} \sgn(r) + (\hat{\b{r}} \times \b{m} \times \hat{\b{r}}) \frac{\delta(r)}{4 \pi r^2}
\end{aligned}\]
<p>(Since we included the \(\sgn\) term that tracks the delta functions for us, the \(\del \cdot \frac{\b{r}}{r^3} = \del \cdot \frac{\hat{\b{r}}}{r^2}\) term can safely be cancelled now.) The latter term is the delta-function correction to the magnetic dipole field. The internet tells me that its integral over space is \(+\frac{8 \pi}{3} \b{m}\), compared to \(-\frac{4 \pi}{3} \b{p}\) for the scalar dipole field, and, as mentioned earlier, Jackson says this is responsible for the specific wavelength in the hyperfine splitting of hydrogen. Weird.</p>
<hr />
<h2 id="7-summary">7. Summary</h2>
<p>Most of these equations are the same toy examples from an intro electromagnetism course, written in a different way. But it is satisfying to see them written “explicitly”, which is what the delta functions let us do, instead of “working around” the delta function formulation by computing with e.g. Gauss’s law. I think things would have been easier to learn, back then, if the delta-function forms of these objects were made explicit from the start.</p>
<p>For posterity here’s a summary of the identities we’ve talked about:</p>
<p><strong>Delta Functions in Radial Coordinates</strong></p>
\[\begin{aligned}
\delta^n(\b{x}) &= \delta(r_n)/ (S_{n-1} r^{n-1}) \\
\delta^3(\b{x}) &= \delta(r_3)/4 \pi r^2 \\
\delta^2(\b{x}) &= \delta(r_2)/2 \pi r \\
\delta(x) &= \delta(r_1)/2 \\
\end{aligned}\]
<p>For the \(1d\) case, recall that the “0-sphere” is a line segment whose “surface” area is usefully understood to be \(S_0 = 2\), giving \(\delta(x) = \frac{1}{2}\delta(r)\). That does seem a bit weird—why are we chopping our delta function in two?—but it does seem to work.</p>
<p><strong>Integral Form of Divergence and Curl</strong></p>
<p>Divergence in general is given by</p>
\[\del \cdot \b{F} = \lim_{V \ra 0} \frac{1}{\| V \|} \oint_{\p V} \b{F} \cdot d \b{n}\]
<p>Curl is given by</p>
\[(\del \times \b{F} )\cdot \hat{\b{u}} = \lim_{A \ra 0} \frac{1}{\| A \|} \oint_{\p A} \b{F} \cdot d \ell\]
<p>for any plane with normal \(\b{u}\); choose \(\b{u} = \{ \b{x}, \b{y}, \b{z} \}\) to get the usual vector projections.</p>
<p><strong>Functions whose divergence/curl/exterior derivative are delta functions</strong></p>
<p>In \(\bb{R}^n\) the divergence’s integrand includes a factor of \(r^{n-1}\) from the coordinates, while all the angular coordinates naturally integrate to \(S_{n-1}\). Therefore it’s \(1/r^{n-1}\) that cancels the radial part out and produces a delta function at the origin:</p>
\[\del \cdot \frac{\hat{\b{r}}}{r^{n-1}} = S_{n-1} \delta^n (\b{x}) = \frac{\delta(r)}{r^{n-1}}\]
<p>Which in \(\bb{R}^3\) is</p>
\[\del \cdot \frac{\hat{\b{r}}}{r^{2}} = 4 \pi \delta^3 (\b{x}) = \frac{\delta(r)}{r^2}\]
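<p>This one is easy to check numerically, too: the flux of \(\hat{\b{r}}/r^2\) through a sphere should come out to \(4\pi\) whenever the sphere contains the origin, whatever its radius or center, and \(0\) otherwise. A sketch (my addition; numpy and a crude midpoint quadrature assumed):</p>

```python
import numpy as np

def flux_through_sphere(center, R, n=400):
    """Midpoint-rule flux of F(p) = p / |p|^3 through a sphere of radius R at `center`."""
    th = (np.arange(n) + 0.5) * np.pi / n          # polar angle midpoints
    ph = (np.arange(2 * n) + 0.5) * np.pi / n      # azimuth midpoints, covering [0, 2 pi)
    TH, PH = np.meshgrid(th, ph, indexing="ij")
    nx, ny, nz = np.sin(TH) * np.cos(PH), np.sin(TH) * np.sin(PH), np.cos(TH)
    px, py, pz = center[0] + R * nx, center[1] + R * ny, center[2] + R * nz
    r3 = (px**2 + py**2 + pz**2) ** 1.5
    # F . n dA, with dA = R^2 sin(theta) dtheta dphi
    integrand = (px * nx + py * ny + pz * nz) / r3 * R**2 * np.sin(TH)
    return integrand.sum() * (np.pi / n) ** 2

print(flux_through_sphere((0, 0, 0), 1.0) / np.pi)    # ~4: any radius works
print(flux_through_sphere((0.3, 0, 0), 2.0) / np.pi)  # ~4: origin still inside
print(flux_through_sphere((5, 0, 0), 1.0) / np.pi)    # ~0: no "charge" enclosed
```
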
<p>Meanwhile if curl is integrated around a loop in e.g. the \(\b{xy}\) plane, then the integrand includes a factor of the radius \(\rho\) in that plane and is therefore canceled out by \(\rho^{-1}\).</p>
\[\del \times \frac{\hat{\theta}}{\rho} = (0, 0, 2 \pi \delta(x, y)) = (0, 0, \frac{\delta(\rho)}{\rho})\]
<p>The analogs of \(n\)-spheres in other dimensions can be used to generalize these to higher dimensions, or to charge or current distributions that take lower-dimensional forms like lines or planes of charge.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:laplacian" role="doc-endnote">
<p>Sorry, but I am stubbornly opposed to the Laplacian symbol \(\Delta = \del^2\). <a href="#fnref:laplacian" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:spherical" role="doc-endnote">
<p>By the way (because I definitely didn’t know this off the top of my head) you can’t just replace \(\delta(\b{x})\) with \(\delta(r)\). Translating \(\delta(\b{x})\) to spherical coordinates requires some extra care, because it has to be true that \(\int_V \delta(x,y,z) d^3 \b{x} = \int_V \delta^3(\b{x}) (r^2 \sin \theta) \, dr \, d \theta \, d\phi = 1\). The two angular integrals contribute a factor of \(4 \pi\), so the radial form has to be \(\frac{\delta(r)}{4 \pi r^2}\) to cancel everything out. <a href="#fnref:spherical" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:delta" role="doc-endnote">
<p>In general there are two ways of defining delta functions: either you make them out of a limit of functions you know how to do analysis on and prove that the limit is well-defined, which is this regularization procedure… or you define them to have certain properties by fiat, and then show that they exist. The latter, IMO, is the “right” way. I think the approximations are only to satisfy people who are unnecessarily fixated on classical functions that have definite values at points. (There’s a rather nice book called “Theory of Distributions: A Non-Technical Introduction” by Richards & Youn which I like because of how much it commits to the better approach.) <a href="#fnref:delta" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:derivative" role="doc-endnote">
<p>The sense in which this is a 1d derivative: the \(\frac{1}{\| V \|}\) factor can be written as \(\int dV\), with the integral over a ball of radius \(\e\). In the numerator it’s over the boundary of that ball. So divergence is \(\underset{\e \ra 0}{\lim} [\int_{ \p B_\e} f \, dA] / (\int_{B_\e} dV) .\) In one dimension a ball of radius \(\e\) is just a line segment, so this is literally the symmetric version of the 1d derivative: \(\lim_{\e \ra 0} \frac{f(x + \e) - f(x - \e)}{2 \e}\). <a href="#fnref:derivative" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:quad" role="doc-endnote">
<p>\(\hat{Q}\) here is the rank-2 <a href="https://en.wikipedia.org/wiki/Quadrupole">quadrupole tensor</a>. Equations using it and higher-order multipoles are best unpacked in index notation: \(\p^2_{ \hat{Q}} \delta(\b{x}) = Q^{ij} \p_i \p_j \delta(\b{x})\). By the way, I haven’t learned a ton about \(\hat{Q}\), and I’m a bit confused about when it ought to have a factor of \(1/2\) or not. It might be a convention. Definitely double-check before using this. <a href="#fnref:quad" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Delta Function Miscellanea2023-10-14T00:00:00+00:00https://alexkritchevsky.com/2023/10/14/delta-fns<p>Here’s some stuff about delta functions I keep needing to remember, including:</p>
<ul>
<li>the best way to define them</li>
<li>how \(\delta(x)/x = - \delta'(x)\)</li>
<li>possible interpretations of \(x \delta(x)\)</li>
<li>some discussion of the \(\delta(g(x))\) rule</li>
<li>how \(\delta(x)\) works in curvilinear coordinates.</li>
</ul>
<!--more-->
<hr />
<h2 id="1-definitional-stuff">1. Definitional Stuff</h2>
<p><strong>Quibbles about Definitions</strong></p>
<p>I don’t like the way most books introduce delta functions. IMO, if all the ways of defining something give rise to the same properties, then that object “exists” and you don’t really need to define it in terms of another object. Sure, you can construct a delta (distribution) as a limit of Gaussians with a fixed integral or whatever, but why would you? \(\int \delta(x) f(x) \, dx = f(0)\) is just fine. (Well, you do have to include some other properties to ensure that \(\< \delta', f \> = - \< \delta, f' \>\), but that’s not important.)</p>
<p>The most common definition in physics is the definition in terms of the Fourier transform:</p>
\[\delta(k) = \frac{1}{2 \pi}\int e^{-ikx} dx\]
<p>And I would emphasize that that is just an identity, not a definition, similar to \(\sin^2 + \cos^2 = 1\).</p>
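<p>One way to see the identity concretely (a numerical sketch of my own, not anything rigorous): truncating the divergent integral at \(\| x \| < L\) gives \(\frac{\sin(Lk)}{\pi k}\), which acts like a nascent delta function as \(L \ra \infty\). The cutoff and test function below are arbitrary.</p>

```python
import numpy as np

L = 50.0                                        # cutoff |x| < L for the divergent integral
k = np.linspace(-20, 20, 400_001)
dk = k[1] - k[0]
# (1/2pi) * int_{-L}^{L} e^{-ikx} dx = sin(L k) / (pi k); np.sinc handles k = 0 safely
kernel = (L / np.pi) * np.sinc(L * k / np.pi)

f = np.exp(-k**2 / 2)                           # smooth test function with f(0) = 1
approx = np.sum(kernel * f) * dk
print(approx)                                   # ~1.0 = f(0), as a delta function should give
```
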
<p>I also don’t care at all about the use of the word “function” vs. “generalized function” vs. “distribution”. For my purposes, everything is a distribution and demanding a value at a point is (possibly) a mistake. I imagine that in the far future we will use the word “function” for all of these things starting in high school and nobody will care.</p>
<hr />
<p><strong>Fourier Transform Interpretation</strong></p>
<p>The Fourier transform of \(f(x)\) is given by:</p>
\[\hat{f}(k) = \int f(x) e^{-ikx} dx\]
<p>One interpretation of the Fourier transform is something like:</p>
<blockquote>
<p>\(e^{-ikx}\) is an orthogonal basis for frequency-space functions. We write \(f\) in this basis as \(\hat{f}(k)\) by projecting \(f(x)\) onto each basis element by taking an inner product \(\< f(x), e^{-ikx}\> = \int f(x) e^{-ikx} dx\).</p>
</blockquote>
<p>That’s a pretty good definition, and one that I hold dearly because it took me a while to figure out in college, but I think there’s an even better interpretation waiting in the wings. Something like:</p>
<blockquote>
<p>A function \(f\) is a generic object that doesn’t know anything about our choices of bases. The position-space implementation \(f\) is simply \(f\) written out in the position basis. The Fourier Transform of \(f(x)\) is \(f\) evaluated at \(\hat{k}\), where \(\hat{k}\) is a frequency-value rather than a position value, but the two bases live on equal footing and we can treat either as fundamental.</p>
</blockquote>
<p>It just so happens that it’s implemented as an integral transform. In particular, the transform is kinda like computing \(f \ast \delta(\hat{k})\), where the convolution acts like an operation that projects objects into different bases, whatever that means. We could imagine expressing both \(f\) and \(\delta(\hat{k})\) in a <em>third</em> basis, neither position nor frequency, and that operation should still make sense.</p>
<hr />
<h2 id="2-derivatives-of-delta-act-like-division">2. Derivatives of \(\delta\) act like division</h2>
<p>I always end up needing to look this up.</p>
<p>The rules for derivatives of delta functions are most easily found by comparing their Fourier transforms. Since we know that \(\p_x x^n = n x^{n-1}\), we can compare the transforms of both sides, using \(\F(x f) = i \p_k \hat{f}\) and \(\F(\p_x f ) = i k \hat{f}\):</p>
\[\begin{aligned}
\F(\p_x x^n ) &= \F(n x^{n-1} ) \\
(ik) (i \p_k)^{n} \delta_k &= n (i \p_k)^{n-1} \delta_k \\
- k \delta_k^{(n)} &= n \delta_k^{(n-1)} \\
\end{aligned}\]
<p>This shows us the relationship between \(\delta_k^{(n)}\) and \(\delta_k^{(n-1)}\). Evidently they differ by a factor of \(-\frac{k}{n}\). Repeating the process (and switching the variables back to \(x\) since we don’t need the Fourier transforms anymore) gives</p>
\[\begin{aligned}
- x \delta^{(n)} &= n \delta^{(n-1)} \\
(-x)^2 \delta^{(n)} &= n (n-1) \delta^{(n-2)} \\
(-x)^3 \delta^{(n)} &= n (n-1) (n-2) \delta^{(n-3)} \\
& \vdots \\
(-x)^n \delta^{(n)} &= (n!) \delta^{(0)} \\
\end{aligned}\]
<p>Rearranging things, we get a bunch of useful identities:</p>
\[\begin{aligned}
\delta' &= - \frac{1}{x} \delta \\
\delta^{(2)} &= \frac{2}{x^2} \delta \\
&\vdots \\
\delta^{(n)} &= \frac{ n!}{(-x)^n} \delta
\end{aligned}\]
<p>And also:</p>
\[\begin{aligned}
\frac{\delta}{x} &= - \delta' \\
\frac{\delta}{x^2} &= \frac{\delta^{(2)}}{2} \\
& \vdots \\
\frac{\delta}{x^n} &= \frac{(-1)^n}{n!} \delta^{(n)}
\end{aligned}\]
<p>Etc. I write this all out because it is easy to get confused by the factorials in there (and as a reference for myself…). Note that if \(\delta(x)\) is replaced with something like \(\delta(x - a)\), then all those \((-x)^n\) denominators become \((-(x-a))^n\).</p>
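<p>If you don’t trust the signs and factorials (I never do), the pairing that underlies all of these, \(\< \delta^{(n)}, f \> = (-1)^n f^{(n)}(0)\), is easy to check with a CAS. A sympy sketch (my addition; the polynomial test function is arbitrary):</p>

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = (x + 2) ** 3   # arbitrary test function: f(0)=8, f'(0)=12, f''(0)=12, f'''(0)=6

results = []
for n in range(4):
    d = sp.DiracDelta(x, n) if n > 0 else sp.DiracDelta(x)  # n-th delta derivative
    lhs = sp.integrate(d * f, (x, -sp.oo, sp.oo))           # <delta^(n), f>
    rhs = (-1) ** n * sp.diff(f, x, n).subs(x, 0)           # (-1)^n f^(n)(0)
    results.append((lhs, rhs))
    print(n, lhs, rhs)  # the two columns agree for each n
```
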
<p>When you actually go to integrate these against a test function, it reveals an interesting relationship between delta functions and derivatives.</p>
\[\begin{aligned}
\int \frac{ n!}{(-x)^n} \delta f \d x &= \int \delta^{(n)} f \d x \\
&= \int (-1)^n \delta f^{(n)} \d x \\
\frac{n!}{(-0)^n} f(0) &\stackrel{?}{=} (-1)^n f^{(n)}(0) \\
\frac{n!}{0^n} f(0) &\stackrel{?}{=} f^{(n)}(0)
\end{aligned}\]
<p>The left side, of course, is really a <a href="https://en.wikipedia.org/wiki/Principal_value">principal value</a> \(\P \int \frac{ n!}{(-x)^n} \delta f \d x\), which we imagine to mean, basically, “evaluate this at zero but very carefully”. To see what this could mean, imagine that \(f(x)\) has a Taylor series \(f(x) = f_0 + f_1 x + \frac{x^2}{2!} f_2 + \ldots\). Then the left side <em>sorta</em> extracts the \(f_n\) term, because all the lower-order terms like \(\frac{0}{0^n}\) go to infinity (which we ignore?) and all the higher-order terms like \(\frac{0^{n + m}}{0^n} = 0^m\) go to zero.</p>
<p>Somehow this hints at the true magic of delta functions but I don’t quite see it yet.</p>
<hr />
<h2 id="3-multiplications-of-delta-act-like-integrals">3. Multiplications of \(\delta\) act like integrals?</h2>
<p>What about \(x^n \delta(x)\) where \(n > 0\)?</p>
<p>According to the actual rigorous theory of distributions, \(x^n \delta(x) = 0\) for any \(n \geq 1\), because its integral against a test function is zero. But I don’t believe them. I think there’s more going on here.</p>
<p>To illustrate this point, consider extending the argument of the last section to a function with a Laurent series (a finite number of negative-power terms):</p>
\[f(x) = \ldots + f_{-2} \frac{2!}{x^2} + f_{-1} \frac{1}{x} + f_0 + f_1 x + f_2 \frac{x^2}{2!} + \ldots\]
<p>Then it is fairly clear that we could extract the negative-power terms in the same way:</p>
\[f_{-n} = \P \int \frac{x^n}{n!} \delta f \d x\]
<p>Assuming that, once again, all the powers of zero other than \(0^0 = 1\) “cancel out” somehow. So I would argue that \(\frac{x^n}{n!} \delta(x)\) is extracting <em>residues</em> the same way that \(\frac{n!}{x^n} \delta(x)\) extracts <em>derivatives</em>. It’s very nicely symmetric, if you’re willing to allow that \(x \delta \neq 0\).</p>
<p>What does it mean to extract a residue with a delta function? Well, it means that \(\P \int x \delta f d x\) is zero (or some other value we pretend to equal zero) unless \(f(x) \sim \frac{f_{-1}}{x}\) at that point, in which case it extracts that coefficient \(f_{-1}\). Residues aren’t quite the same thing as integrals, but what seems to happen is that, <em>when</em> you close your integration contours, residues are the only thing that’s left — like how in \(\bb{C}\), a closed integral picks up only the residues inside the integration boundary.</p>
<p>I guess this is useful in two ways. One, it’s the same idea of a residue that you get in complex analysis using the Cauchy integral formula:</p>
\[f_{-1}(z) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z} \d z\]
<p>but it’s extracted in a much more intuitive way. I have <a href="/2020/08/10/complex-analysis.html">written before</a> about how the Cauchy integral formula works. The short version is that if you apply Stokes’ theorem it turns into \(\iint \delta(\bar{z}, z) \d \bar{z} \^ \d z\), which relies on the fact that, for mysterious reasons, \(\p_{\bar{z}} \frac{1}{z} = 2 \pi i \delta(z, \bar{z})\).</p>
<p>Two, it makes it a lot easier to see how you would generalize the Cauchy integral formula, the concept of residues, and Laurent series to higher dimensions. Integrating against a delta function in the coordinate you care about — easy. Concocting a whole theory of contour integration — super weird and hard. Works for me.</p>
<p>The one weakness, of course, is that it’s rather unclear what to do with the fact that, evidently, \(\P \int \frac{x}{x^2} \delta \d x = 0\). Shouldn’t \(\frac{0}{0^2} = \infty\), or something like that? Not sure. But this is just one out of very many instances where it seems like math doesn’t handle dividing by zero correctly, so I guess we can file it away in that category and not worry about it for a while.</p>
<p>To summarize, we claim with some handwaving that:</p>
\[\< \frac{x^n}{n!} \delta, f \> = f^{(-n)}(0)\]
<p>Where the meaning of a “negative derivative” of a function is that it is a residue, i.e. the \((-n)\)th term in the Laurent series of \(f\) around \(x=0\).</p>
<hr />
<h2 id="4-deltagx-becomes-a-sum-over-poles">4. \(\delta(g(x))\) becomes a sum over poles</h2>
<p>I always end up needing to look this up too.</p>
<p>Since \(\delta(x)\) integrates \(f(x)\) to \(f(0)\) at <em>every</em> zero of the \(\delta\), we of course have, via \(u = g(x)\) substitution:</p>
\[\begin{aligned}
\int \delta(g(x)) f(g(x)) \, dg(x) &= \int \delta(u) f(u) \, du \\
&=f(u)_{u = 0} \\
&=f(g^{-1}(0)) \\
\int \delta(g(x)) f(g(x)) \| g'(x) \| dx &= f(g^{-1}(0)) \\
\delta(g(x)) &= \sum_{x_0 \in g^{-1}(0)} \frac{\delta(x - x_0)}{\| g'(x_0) \|} \\
\end{aligned}\]
<p>For instance:</p>
\[\delta(x^2 - a^2) = \frac{\delta(x - a)}{2\| a \|} + \frac{\delta(x + a)}{2\| a \|}\]
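<p>That example can be verified numerically by standing in a narrow Gaussian for \(\delta\) (a sketch I’ve added; the width \(10^{-3}\) and the grid are arbitrary choices):</p>

```python
import numpy as np

def delta_eps(u, eps=1e-3):
    """Narrow Gaussian standing in for delta(u)."""
    return np.exp(-u**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

a = 2.0
x = np.linspace(-5, 5, 2_000_001)
dx = x[1] - x[0]

approx = np.sum(delta_eps(x**2 - a**2) * np.cos(x)) * dx
exact = (np.cos(a) + np.cos(-a)) / (2 * a)   # one term per root, each weighted by 1/|g'|
print(approx, exact)                         # the two agree closely
```
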
<p>Somewhat harder to remember is the multivariable version:</p>
\[\begin{aligned}\int f(g(\b{x})) \delta(g( \b{x})) \| \det \del g(\b{x}) \| d^n \b{x}
&= \int_{g(\bb{R})} \delta(\b{u}) f(\b{u}) d \b{u} \\
\int f(\b{x}) \delta(g (\b{x})) d \b{x} &= \int_{\sigma = g^{-1}(0)} \frac{f(\b{x})}{\| \del g(\b{x}) \|} d\sigma(\b{x}) \\
\end{aligned}\]
<p>Where the final integral is in some imaginary coordinates on the zeroes of \(g(\b{x})\).</p>
<p>In general there is a giant model of “delta functions for surface integrals” which I’ve never quite wrapped my head around, but intend to tackle in a later article. Basically there’s a sense in which every line and surface integral, etc., can be modeled as an appropriate delta function. Wikipedia doesn’t talk about it much. There’s a couple of lines on the delta function page, but there’s quite a bit more, for some reason, on the page for <a href="https://en.wikipedia.org/wiki/Laplacian_of_the_indicator">Laplacian of the Indicator</a>.</p>
<p>I’d also love to understand the version of this for vector- or tensor-valued functions as well. What goes in the denominator? Some kind of non-scalar object? Weird.</p>
<p>By the way, there is a cool trick which I found in a paper called <a href="https://www.reed.edu/physics/faculty/wheeler/documents/Miscellaneous%20Math/Delta%20Functions/Simplified%20Dirac%20Delta.pdf">Simplified Production of Dirac Delta Function Identities</a> by Nicholas Wheeler<sup id="fnref:wheeler" role="doc-noteref"><a href="#fn:wheeler" class="footnote" rel="footnote">1</a></sup> to derive \(\delta(ax) = \frac{\delta(x)}{\| a \|}\). We observe that \(\theta(ax) = \theta(x)\) if \(a > 0\) and \(\theta(ax) = 1 - \theta(x)\) if \(a < 0\). So we can compute \(\p_x \theta(ax)\) in two different ways:</p>
\[\begin{aligned}
\p_x \theta(ax) &= \p_x \theta(ax) \\
a \theta'(ax) &= \sgn(a) \theta'(x) \\
\delta(ax) &= \frac{1}{\| a \|} \delta(x)
\end{aligned}\]
<p>That paper also observes another property I hadn’t thought about, which is that</p>
\[\delta'(ax) = \frac{1}{a \| a \|} \delta'(x)\]
<p>Basically, the funny “absolute value” business only happens in the derivative of \(\theta(ax)\), not the rest of the chain. There are also ways of deriving the more general properties like the form of \(\delta(g(x))\) by starting from derivatives of \(\theta(g(x))\).</p>
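<p>Both scaling rules can be checked the same way, by using a narrow Gaussian and its derivative as stand-ins for \(\delta\) and \(\delta'\). A numpy sketch (my addition; the test function and the values \(a = \pm 3\) are arbitrary):</p>

```python
import numpy as np

eps = 1e-3
x = np.linspace(-1, 1, 400_001)
dx = x[1] - x[0]
fx = (x + 2) ** 2                              # test function: f(0) = 4, f'(0) = 4

def d_eps(u):
    """Nascent delta: a narrow Gaussian."""
    return np.exp(-u**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

def dprime_eps(u):
    """Its derivative, standing in for delta'."""
    return -u / eps**2 * d_eps(u)

for a in (3.0, -3.0):
    i0 = np.sum(d_eps(a * x) * fx) * dx        # expect  f(0)/|a|      =  4/3
    i1 = np.sum(dprime_eps(a * x) * fx) * dx   # expect -f'(0)/(a|a|)  = -4/(a|a|)
    print(a, i0, i1)
```

<p>Note that \(i_0\) doesn’t care about the sign of \(a\), while \(i_1\) flips — which is exactly the “absolute value business” above.</p>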
<hr />
<h2 id="5-delta-is-weird-in-other-coordinate-systems">5. \(\delta\) is weird in other coordinate systems</h2>
<p>I am often reminding myself how \(\delta\) acts in spherical coordinates.</p>
<p>It is useful to think about \(\delta\) as being defined like this:</p>
\[\delta(x) = \frac{1_{x =0 }}{dx}\]
<p>In the sense that it is designed to perfectly cancel out \(dx\) terms in an integral. It’s \(0\) everywhere, except at the origin where it perfectly cancels out the \(dx\). Point is, \(\delta\) always transforms like the <em>inverse</em> of how \(dx\) transforms. If you write \(dg(x) = \| g'(x) \| dx\), then of course \(\delta\) transforms as</p>
\[\delta(g(x)) = \frac{\delta(x - g^{-1}(0))}{\| g'(x) \|}\]
<p>This, at least, makes it easy to figure out what happens in other coordinate systems.</p>
<p>By the way. The notation \(\delta^3(\b{x})\) customarily means that the function is <em>separable</em> into all the individual variables: \(\delta^3(\b{x}) = \delta(x) \delta(y) \delta(z)\). In other coordinate systems this <em>doesn’t</em> work: separating it requires introducing coefficients, as we’re about to see.</p>
<p>Here’s spherical coordinates:</p>
\[\begin{aligned}
\iiint_{\bb{R}^3} \delta^3(\b{x}) f(\b{x}) d^3 \b{x} &= f(x=0,y=0,z=0) \\
f(r=0, \theta=0, \phi=0)
&= \int_0^{2 \pi} \int_{0}^\pi \int_0^\infty \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} f(r, \theta, \phi) r^2 \sin \theta \, dr \, d \theta \, d \phi \\
&= \int_{0}^\pi \int_0^\infty \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} f(r, \theta, 0) ( 2 \pi r^2 \sin \theta) \, dr \, d \theta \\
&= \int_0^\infty \frac{\delta(r)}{4 \pi r^2 } f(r, 0, 0) ( 4 \pi r^2 ) \, dr \\
&= f(0,0,0)
\end{aligned}\]
<p>So:</p>
\[\begin{aligned}
\delta(x, y, z) &= \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} \\
&= \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} \\
&= \frac{\delta(r)}{4 \pi r^2 }
\end{aligned}\]
<p>There is some trickiness to all this, though. Be careful: the \(r\) integral is over \((0, \infty)\) instead of the conventional \((-\infty, \infty)\). Sometimes identities that you’re used to won’t work the same way when you’re dealing with \(\delta(r)\) as a result. Also, it’s very unusual, but not impossible I suppose, to have functions that have a non-trivial \(\theta\) dependence even as \(r \ra 0\). I have no idea what that would mean and I don’t know how to handle it with a delta function.</p>
<p>I’ve occasionally also seen it written in this weird way, where using \(\cos \theta\) as the variable causes the \(\sin \theta\) in the denominator to disappear.</p>
\[\delta(x,y,z) = \frac{\delta(r, \cos \theta, \phi)}{r^2}\]
<p>Here’s the polar / cylindrical coordinate version:</p>
\[\delta^2(x, y) = \frac{\delta(r, \theta)}{r} = \frac{\delta(r)}{2 \pi r}\]
<p>Evidently in \(\bb{R}^n\), the numerators are related to the surface areas of <a href="https://en.wikipedia.org/wiki/N-sphere">n-spheres</a>.</p>
<hr />
<h2 id="6-the-indicator-function-i_x--x-deltax">6. The Indicator Function \(I_x = x \delta(x)\)</h2>
<p>Related to \(x^n \delta(x)\) up above…</p>
<p>Since \(\int \delta(x) f(x) \, dx = f(0)\), we could flip this around and say that this is the <em>definition</em> of evaluating \(f\) at \(0\). Or, more generally, we could say that integrating against \(\delta(x-y) \, dx\) is “what it means” to evaluate \(f(y)\).</p>
<p>This is a bit strange though. Why does evaluation require an integral? Maybe we need to define a new thing, the indicator function, which requires no integral:</p>
\[I_x f = f(x)\]
<p>The definition is</p>
\[I_x = \begin{cases}
1 & x = 0 \\
0 & \text{otherwise}
\end{cases}\]
<p>But that probably masks its distributional character. A better definition is that it’s just</p>
\[I_x = x \delta(x)\]
<p>Whereas \(\delta_x\) is infinite at the origin and is defined to <em>integrate</em> to \(1\), the \(I_x\) function is just required to <em>equal</em> one at the origin. Of course, its integral is \(0\). It could also be constructed like this:</p>
\[I_x = \lim_{\e \ra 0^{+}} \theta(x + \e) - \theta(x - \e)\]
<p>(Compare to the \(\delta\) version: \(\delta_x = \lim_{\e \ra 0^{+}} \frac{1}{x} [ \theta(x + \e) - \theta(x - \e)] \stackrel{?}{=} \P(\frac{I_x}{x})\).)</p>
<p>By either definition, \(I_x\) has zero derivative everywhere:</p>
\[\p_x I_x = \delta(x) - \delta(-x) = 0 \\
\p_x (x \delta(x)) = \delta(x) + x \delta'(x) = \delta(x) - \delta(x) = 0\]
<p>Compare to \(\p_x \delta(x) = \delta'(x) = -\frac{\delta(x)}{x}\). It sorta seems like there might be some even <em>further</em> generalization of functions which could distinguish this derivative from \(0\), since obviously \(x \delta(x)\) is not, in fact, constant at \(x = 0\). The derivative would be some distribution-like object which has the property that \(I'(x, dx) = \begin{cases} 1 & x + dx = 0 \\ 0 & \text{otherwise} \end{cases}\) … which is weird.</p>
<p>\(I_x\) also has this delta-function-like property:</p>
\[\int I_x \frac{f(x)}{x} dx = f(0)\]
<p>(in a “principal value” sense, of course.) It seems natural to consider a whole family of these with any power, \(x^k \delta(x)\). As we know, dividing \(\delta\) by powers of \(x\) produces derivatives like \(\delta^{(n)}(x)\): for instance \(\int (\delta(x)/x) f(x) \, dx = -f'(0)\). So I would guess that these positive-power \(x^n \delta(x)\) functions produce… integrals? But normally (cf. contour integration) integrals add up contributions from (a) boundaries and (b) poles (and really poles are a kind of boundary, topologically). These \(x^n \delta(x)\) terms only add up the contributions from poles but do nothing at infinity. Maybe that’s because in some sense the \(x^{-n} \delta(x) \propto \delta^{(n)}(x)\) terms are the ones that deal with poles at infinity?</p>
<p>I like this \(I_x\) object. It seems fundamental. Maybe we should just write \(f(x) = I_x f\) all the time.</p>
<hr />
<h2 id="7-miscellaneous-breadcrumbs">7. Miscellaneous Breadcrumbs</h2>
<p>Things I want to remember but don’t have much to say about:</p>
<p>There is a thing called a <a href="https://en.wikipedia.org/wiki/Wave_front_set">Wavefront Set</a> that comes from the subfield of “microlocal analysis”. It allows ‘characterizing’ singularities in a way that, for instance, would extract which dimensions a delta function product like \(\delta(x) \delta(y)\) is acting in.</p>
<p>Among other things, the Wavefront Set allows you to say when multiplying distributions together is well-behaved: as I understand it, they have to not have singularities “in the same directions”. \(\delta(x) \delta(x)\), for instance, is not allowed. (I bet \((x \delta)^2\) is, though.) Here’s a <a href="https://arxiv.org/abs/1404.1778">nice paper on the subject</a>.</p>
<p>There are further generalizations of functions called <a href="https://en.wikipedia.org/wiki/Hyperfunction">hyperfunctions</a>, which are instead defined in terms of “the difference of two holomorphic functions on a line” (which can be e.g. a pole at the origin). Gut reaction: relies on complex analysis, which sounds annoying.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Current_(mathematics)">current</a> is a differential-form distribution on a manifold. Some day I’m going to have to learn about those, but for now, nah, I’m good.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:wheeler" role="doc-endnote">
<p>This paper also has some interesting, if not entirely comprehensible, things to say about the existence of forward- and backwards-time propagators in QFT wave equation solutions. <a href="#fnref:wheeler" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
How To Invert Everything By Dividing By Zero2023-09-25T00:00:00+00:00https://alexkritchevsky.com/2023/09/25/inverses<p>For a generic linear equation like \(ax = b\) the solutions, if there are any, seem to always be of the form</p>
\[x = a_{\parallel}^{-1} (b) + a_{\perp}^{-1} (0)\]
<p>regardless of whether \(a\) is “invertible”. Here \(a_{\parallel}^{-1}\) is a sort of “parallel inverse”, in some cases called the “pseudoinverse”, which is the invertible part of \(a\). \(a_{\perp}^{-1}\) is the “orthogonal inverse”, called either the nullspace or kernel of \(a\) depending on what field you’re in, but either way it’s the objects for which \(ax = 0\). Clearly \(ax = a (a_{\parallel}^{-1} (b) + a_{\perp}^{-1} (0)) = a a_{\parallel}^{-1} (b)\), and that’s the solution if one exists.</p>
<p>This pattern shows up over and over in different fields, but I’ve never seen it really discussed as a general phenomenon. But really, it makes sense: why shouldn’t <em>any</em> operator be invertible, as long as you are willing to have the inverse (a) live in some larger space of objects and (b) possibly become multi-valued?</p>
<p>Here are some examples. Afterwards I’ll describe how this phenomenon gestures towards a general way of dividing by zero.</p>
<!--more-->
<hr />
<h2 id="1-the-pattern">1. The Pattern</h2>
<h3 id="matrices">Matrices</h3>
<p>Consider a linear system of equations that’s represented by matrix multiplication as</p>
\[A \b{x} = \b{b}\]
<p>If \(A\) is square and invertible, then the unique solution to this equation is given via left-multiplication by \(A^{-1}\):</p>
\[\b{x} = A^{-1} \b{b}\]
<p>If \(A\) is not square and invertible, then the system of equations is either underspecified (has a non-trivial linear subspace of possible solutions) or overspecified (is unsolvable because there are more linearly-independent constraints than degrees of freedom).</p>
<p>The matrix \(A\) partitions its domain and codomain into some subspaces:</p>
<ul>
<li>the row space \(\text{row}(A)\) is the span of \(A\)’s rows, i.e. the orthogonal complement of the kernel, so it’s the input vectors that \(A\) produces something interesting from.</li>
<li>the kernel \(\text{ker}(A)\) is all \(x\) such that \(Ax = 0\), so it’s the input vectors that \(A\) ignores.</li>
<li>the column space \(\text{col}(A)\) is all \(y\) such that \(Ax = y\) for some \(x\), so it’s the output vectors that \(A\) can produce.</li>
<li>the cokernel \(\text{coker}(A)\) is the orthogonal complement of the column space, so it’s the output vectors that \(A\) can’t produce.</li>
</ul>
<p>\(A\b{x} = \b{b}\) is solvable if \(\b{b}\) lies in \(\text{col} (A)\). If so, \(\b{x}\) will consist of a sum of elements from \(\text{row}(A)\), but can also include any element from \(\text{ker}(A)\) freely since it doesn’t change the value of \(A \b{x}\). That is, it takes the form \(\b{x} = \b{x}_{\text{one}} + \b{x}_{\text{free}}\), where the first term is <em>one</em> solution and the second term produces all the others. In fact the explicit form is given in terms of the <a href="https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse">Moore-Penrose Pseudoinverse</a> \(A^{+}\):</p>
\[\b{x} = A^+ (\b{b}) + (I - A^+ A) \vec{\lambda}\]
<p>where \(\vec{\lambda}\) is a vector of free parameters corresponding to the number of degrees of freedom in the solution space.</p>
<p>\(A^{+}\) acts like an inverse, but only between \(\text{row}(A)\) and \(\text{col}(A)\) (which necessarily have the same dimension). Instead of the condition \(A A^{-1} = I\) the pseudoinverse obeys the weaker condition \(A A^{+} A = A\), which is why the second term above cancels out: \(A(I - A^+ A) = A - A A^+ A = A- A = 0\). Meanwhile \(A^+ \b{b}\) maps \(\b{b}\) from the column space back to the row space, if possible (or gets as close as possible, if not).</p>
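<p>This is easy to play with in numpy, where \(A^{+}\) is <code>np.linalg.pinv</code>. A sketch (my addition; the random underdetermined system is arbitrary) showing that every choice of free parameters gives a valid solution:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))    # underdetermined: 3 equations, 5 unknowns
b = rng.standard_normal(3)

A_pinv = np.linalg.pinv(A)         # the Moore-Penrose pseudoinverse A^+
P_null = np.eye(5) - A_pinv @ A    # projector onto ker(A)

for _ in range(3):
    lam = rng.standard_normal(5)   # arbitrary free parameters
    x = A_pinv @ b + P_null @ lam
    print(np.allclose(A @ x, b))   # True: every such x solves A x = b
```
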
<p>Anyway, of the instances of this “generalized inverse” thing, this matrix pseudoinverse is the most refined and easiest to understand. But it seems to make sense in a lot of other settings.</p>
<hr />
<h3 id="scalars">Scalars</h3>
<p>Once again, if an equation of the form \(ax = b\) can be inverted, it usually looks like this:</p>
\[x = a_{\parallel}^{-1} (b) + a_{\perp}^{-1}(0)\]
<p>Where the first term is a single solution that satisfies the constraint, and the second term is any displacement ‘orthogonal’ to \(a\) within the solution space that has \(a (a_{\perp}^{-1}) = 0\).</p>
<p>For instance here is how you invert “scalar multiplication”, that is, how you divide by zero.</p>
\[\begin{aligned}
ax &= b \\
x &= b/a + \lambda 1_{a=0}
\end{aligned}\]
<p>\(\lambda\) is a free parameter and \(1_{a=0}\) an indicator function. If \(a=0\) and \(b=0\) then this works as long as we interpret \(b/a = 0/0\) to equal, say, \(1\). If \(a=0\) but \(b \neq 0\) then the answer is infinite, which makes sense: there’s really no solution to \(0x = b\) unless \(x\) is an object that acts like \(b/0\). So we can write the solutions as:</p>
\[x = \begin{cases}
b/a & a \neq 0 \\
b \times \infty & a = 0, b \neq 0 \\
\text{anything} & a = 0, b = 0 \\
\end{cases}\]
<p>Not that this is especially useful on its own, but it’s one instance of a pattern. In a sense it is the <em>right</em> answer, in that it contains all the information that the true inverse of this operation has to contain — just, represented in a way we don’t really know how to do math with. Also note that this is basically the \(1 \times 1\) version of the matrix pseudoinverse.</p>
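If you want this as (trivial) code, the case analysis looks like this. It’s a sketch: the “infinite” and “anything” branches are reported symbolically, since we don’t have a number system that contains them.

```python
def solve_scalar(a, b):
    """Solve a * x == b over the reals, reporting the whole solution set.

    Returns ("unique", b/a), ("none", None) when 0*x == b has no solution,
    or ("any", None) when every x works.
    """
    if a != 0:
        return ("unique", b / a)
    if b != 0:
        return ("none", None)     # would need x ~ b/0, i.e. "b times infinity"
    return ("any", None)          # 0 * x == 0: one free parameter lambda

assert solve_scalar(2, 6) == ("unique", 3.0)
assert solve_scalar(0, 5) == ("none", None)
assert solve_scalar(0, 0) == ("any", None)
```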
<hr />
<h3 id="vectors">Vectors</h3>
<p>Here’s the \(1 \times N\) version, for a more interesting example.</p>
\[\b{a} \cdot \b{x} = b\]
<p>Then</p>
\[\begin{aligned}
\b{x} &= \b{a}_{\parallel}^{-1} (b) + \b{a}_{\perp}^{-1}(0) \\
&= \frac{\b{a}}{\| \b{a} \|^2} (b) + \vec{\lambda} \cdot \b{a}_{\perp}
\end{aligned}\]
<p>The first term again uses the pseudoinverse of \(\b{a}\) (naturally, since \(\b{a}\) is a \(1 \times N\) matrix). For vectors we can write it easily: it’s \(\b{a}^{+} = \b{a}_{\parallel}^{-1} = \frac{\b{a}}{\| \b{a} \|^2}\). It only inverts the subspace in which \(\b{a}\) is invertible, which for a vector is just one-dimensional. The second term is the “orthogonal inverse” of \(\b{a}\), which is any element of the \((N-1)\)-dimensional nullspace which is orthogonal to \(\b{a}\). \(\vec{\lambda}\) here is a meaningless vector of imagined free parameters which select a particular element out of \(\b{a}_{\perp}\).</p>
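Again this is easy to check numerically; here’s a sketch with an arbitrary made-up \(\b{a}\) and \(b\):

```python
import numpy as np

# Invert a . x = b: one particular solution along a, plus anything
# orthogonal to a (the (N-1)-dimensional nullspace).
a = np.array([3.0, 0.0, 4.0])
b = 10.0

x_par = a * b / np.dot(a, a)                      # a^+ b = a b / ||a||^2
assert np.isclose(np.dot(a, x_par), b)

lam = np.array([1.0, 2.0, 3.0])                   # arbitrary free parameters
x_perp = lam - a * np.dot(a, lam) / np.dot(a, a)  # project out the a-component
assert np.isclose(np.dot(a, x_perp), 0.0)

# The whole family x_par + x_perp solves the equation:
assert np.isclose(np.dot(a, x_par + x_perp), b)
```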
<hr />
<h3 id="functions">Functions</h3>
<p>Here’s an example with functions.</p>
\[x f(x) = 1\]
<p>The solution is</p>
\[f(x) = \begin{cases}
1/x & x \neq 0 \\
\lambda & x = 0
\end{cases}\]
<p>\(\lambda\) is again a free parameter. We’d kinda like to write these as one equation, and the way we can do it is like this:</p>
\[f(x) = \mathcal{P}(\frac{1}{x}) + \lambda \delta(x)\]
<p>Where \(\mathcal{P}\) is the <a href="https://en.wikipedia.org/wiki/Cauchy_principal_value">Cauchy Principal Value</a> and \(\delta(x)\) is a delta distribution. Whereas vectors and matrices forced us to “kick up into” a space with free parameters, functions require us to also “kick up into” the space of distributions. Neither \(\P\) nor \(\d\) gives a useful value on its own at \(x=0\), but when used as a distribution (e.g. integrated against a test function) this object really does behave like the solution to \(x f(x) = 1\). This is a pattern: when we invert non-invertible things, we often have to (a) introduce free parameters and (b) “leave the space we’re in” for a larger space that can contain the solution. In this case we leave “function space” for “distribution space” because \(\frac{1}{x}\) has no value at \(x=0\).<sup id="fnref:step" role="doc-noteref"><a href="#fn:step" class="footnote" rel="footnote">1</a></sup></p>
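You can even check this numerically, by implementing the principal value as a symmetric quadrature that skips over \(x = 0\). This is a rough sketch; the test function \(\varphi\) and the grid parameters are arbitrary choices:

```python
import math

def pv_integral(g, h=1e-3, R=50.0):
    """Principal-value integral of g over [-R, R]: sample at +/-x in pairs,
    offset by h/2, so the symmetric divergence at 0 cancels and we never
    actually evaluate at x = 0."""
    total, x = 0.0, h / 2
    while x < R:
        total += (g(x) + g(-x)) * h
        x += h
    return total

def pair(lam, test):
    """Pair f = P(1/x) + lam * delta(x) against a test function."""
    return pv_integral(lambda x: test(x) / x) + lam * test(0.0)

phi = lambda x: math.exp(-(x - 1.0) ** 2)   # an arbitrary test function

# f solves x f(x) = 1 distributionally: pairing f against x * phi(x)
# must equal the plain integral of phi, for ANY value of lam.
target = pv_integral(phi)
for lam in (0.0, 1.0, -7.5):
    assert abs(pair(lam, lambda x: x * phi(x)) - target) < 1e-6

# But the lam * delta term does show up when pairing against phi itself:
assert abs(pair(3.0, phi) - pair(0.0, phi) - 3.0 * phi(0.0)) < 1e-9
```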
<p>What if we have more powers of \(x\)?</p>
\[x^n f(x) = 1\]
<p>We should get something like</p>
\[f(x) = \mathcal{P}(\frac{1}{x^n}) + \lambda_0 \delta(x) + \lambda_1 \delta'(x) + \ldots + \lambda_{n-1} \delta^{(n-1)}(x)\]
<p>This time the homogeneous terms are <em>derivatives</em> of the delta function: \(x^n \delta^{(k)}(x) = 0\) whenever \(k \lt n\), so each of those terms is killed by \(x^n\) as well. (Powers of \(x\) times \(\delta(x)\) don’t give anything new, since \(x \delta(x) = 0\) identically; the \(n\) derivatives \(\delta^{(k)}\) are the independent things that \(x^n\) sets to zero.)</p>
<hr />
<h3 id="differential-operators">Differential Operators</h3>
<p>Here’s another example:</p>
\[D f = g\]
<p>The natural solution is</p>
\[f = D^{-1}_{\parallel}(g) + D^{-1}_{\perp}(0)\]
<p>What do these terms mean for a differential operator? The second term, again, is an inverse operator that produces the “nullspace” of \(D\), which is to say, produces solutions to the homogeneous equation \(D f = 0\). These are “free wave” solutions to the operator, which come packaged with a (probably) infinite number of free parameters. In general there is also a set of boundary conditions on the solution \(f\), so we’ll need to pick from these free parameters to satisfy the boundary conditions.</p>
<p>Or, we could solve it using a Green’s function for \(D\), by computing \(G = D^{-1}_{\parallel}(\delta)\) and then \(f = G \ast g\).</p>
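(In one dimension this is concrete enough to verify on a grid. The Green’s function for \(\p_x^2\) is \(G(x) = | x | / 2\), since \(G'' = \delta\); here’s a numpy sketch with an arbitrary Gaussian source:)

```python
import numpy as np

# 1-D Poisson equation f'' = g. The Green's function with G'' = delta(x)
# is G(x) = |x|/2, so one particular solution is f = G * g.
h = 0.01
x = np.arange(-5, 5, h)
g = np.exp(-x**2)                                  # an arbitrary source term

f = (np.abs(x[:, None] - x[None, :]) / 2) @ g * h  # f(x) = int G(x - y) g(y) dy

# The discrete second derivative recovers the source in the interior:
f_xx = (f[2:] - 2 * f[1:-1] + f[:-2]) / h**2
assert np.allclose(f_xx, g[1:-1], atol=1e-6)

# Adding a homogeneous solution (here a + b*x, since (a + b*x)'' = 0)
# gives another, equally valid, f:
f2 = f + 3.0 + 2.0 * x
f2_xx = (f2[2:] - 2 * f2[1:-1] + f2[:-2]) / h**2
assert np.allclose(f2_xx, g[1:-1], atol=1e-6)
```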
<p>Or, we could invert the operator by directly inverting in Fourier space, which will end up being the function inversion from the previous section:</p>
\[\hat{f} = \hat{D}_{\parallel}^{-1}(\hat{g}) + \hat{D}_{\perp}^{-1}(0)\]
<p>The \(\hat{D}_{\perp}^{-1}(0)\) term will produce the homogeneous solutions to the differential operator when un-Fourier-transformed.</p>
<p>Example: the Poisson equation \(\del^2 f(\b{x}) = g(\b{x})\) is Fourier transformed into \((i k)^2 \hat{f}(\b{k}) = \hat{g}(\b{k})\) so \(\hat{f}(\b{k}) = \frac{1}{(ik)^2} \hat{g}(\b{k}) + \lambda_1 \delta(k) + O(\frac{1}{k}) \ldots\).<sup id="fnref:radial" role="doc-noteref"><a href="#fn:radial" class="footnote" rel="footnote">2</a></sup> The third term can be \(\frac{1}{k} \delta(k)\), \(\frac{A\b{k}}{k^2}\), or higher-order tensor combinations as long as the net magnitude is \(O(\frac{1}{k})\).</p>
<p>When untransformed we <em>should</em> get</p>
\[\begin{aligned}
f(\b{x}) &= \mathcal{F}^{-1} [\frac{1}{(ik)^2} \hat{g}( \b{k}) + \lambda_1 \delta(k) + O(\frac{1}{k}) \lambda_2 \delta(k)] \\
&= - \int \frac{1}{4 \pi \| \b{x} - \b{x}_0 \|} g(\b{x}_0) d \b{x}_0 + \text{<the harmonic functions>}
\end{aligned}\]
<p>Although I’m not quite sure how to take those Fourier transforms so I can’t promise it.</p>
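I can at least check a discrete version of the Fourier-space inversion, though. On a periodic grid the \(k = 0\) mode is exactly the division-by-zero: that Fourier coefficient has to be chosen freely, and the equation is only solvable if the source has zero mean. A numpy sketch (the grid and source are arbitrary choices):

```python
import numpy as np

# Solve f'' = g on a periodic grid by dividing by (i k)^2 in Fourier space.
N, L = 256, 2 * np.pi
x = np.linspace(0, L, N, endpoint=False)
g = np.sin(3 * x) + np.cos(5 * x)              # a zero-mean source

k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)     # integer wavenumbers here
g_hat = np.fft.fft(g)

# The k = 0 mode is the division by zero: (i*0)^2 = 0, so that Fourier
# coefficient is a free parameter (the lambda * delta(k) term). Pick 0.
f_hat = np.zeros_like(g_hat)
f_hat[k != 0] = g_hat[k != 0] / (1j * k[k != 0]) ** 2

f = np.fft.ifft(f_hat).real
expected = -np.sin(3 * x) / 9 - np.cos(5 * x) / 25
assert np.allclose(f, expected, atol=1e-8)
```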
<p>Anyway, point is, whatever our generalized inverse operation looks like on derivative operators, it should be related via Fourier transforms to the generalized inverse on functions.</p>
<hr />
<h1 id="2-inverting-matrices-by-dividing-by-zero">2. Inverting Matrices By Dividing By Zero</h1>
<p>There is an interesting way of looking at the procedure for solving a matrix equation \(A\b{x} = \b{b}\). We recast the problem into <a href="/2019/02/23/exterior-6.html">projective geometry</a> by adding a single “homogenous” coordinate to the vector space, and then rewriting the equation like this:</p>
\[(A, -\vec{b}) \cdot (\b{x}, 1) = 0\]
<p>\((A, -\b{b})\) is a single \(N \times (N+1)\)-dimensional matrix, and now we’re solving for an \((N+1)\)-component vector instead. That vector will come to us with the last coordinate \(\neq 1\), of course, so we divide \((\lambda \b{x}, \lambda)\) through by \(\lambda\) at the end to find the true value of \(\b{x}\). That division won’t work if \(\lambda = 0\), of course, but in that case we interpret “dividing by zero” to equal “infinity”, so that \((\b{x}, 0) \sim (\infty \b{x}, 0)\). In projective geometry these infinite objects are completely meaningful and useful; for instance the intersection of two parallel lines is a “point at infinity” and this means that the statement “any two lines intersect in a point” becomes true without any disclaimers. As such the division-by-zero is a feature, not a bug.</p>
<p>The nice thing about this form is that there are no longer two heterogeneous terms, \(a^{-1}_{\parallel}\) and \(a^{-1}_{\perp}\), in the inverse. Writing \(B = (A, -\b{b})\), the new equation is now</p>
\[B (\b{x}, 1) = 0\]
<p>And the <em>only</em> thing we’re concerned with is the nullspace of that matrix. Of course, the matrix is no longer square, so it’s going to have a degree of freedom in its output (which cancels out with the new coordinate that was forced to equal \(1\)). But that’s not a problem: we want to deal with degrees of freedom in a generic way in the output anyway. We’ll just expect to get at least one.</p>
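Numerically, this “find the nullspace of \((A, -\b{b})\) and dehomogenize” procedure works exactly as advertised. Here’s a sketch using the SVD to get the nullspace (the example values are made up):

```python
import numpy as np

# Solve A x = b by finding the nullspace of the augmented matrix B = (A, -b):
# the null vector is (lam * x, lam), and we divide by lam at the end.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

B = np.hstack([A, -b[:, None]])        # the 2 x 3 homogeneous system

_, _, Vt = np.linalg.svd(B)
null = Vt[-1]                          # right-singular vector for the 0 value
x = null[:-1] / null[-1]               # dehomogenize: divide out lam

assert np.allclose(A @ x, b)
assert np.allclose(x, np.linalg.solve(A, b))
```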
<p>The solution to this is best understood in terms of exterior algebra. First let’s walk through the case where \(A\) is invertible, so that \(B^{\^n} \neq \vec{0}\). Then there is a multivector \(B^{\^n}\) which represents every dimension in \(B\), i.e., every dimension it does <em>not</em> set to \(0\). Its complement \(B_{\perp} = \star B^{\^ n}\) represents every dimension \(B\) <em>does</em> set to zero. Since \(A\) is invertible (for now), this complement is one-dimensional, and we can compute it:</p>
\[\begin{aligned}
B_{\perp} &= \star B^{\^ n} \\
&= \star [(A, -\b{b})^{\^ n}] \\
&= \star (A^{\^(n-1)} \^ (-\b{b}), A^{\^ n})
\end{aligned}\]
<p>Now note that \(\star A^{\^(n-1)} = - \text{adj}(A)\), the <a href="https://en.wikipedia.org/wiki/Adjugate_matrix">adjugate matrix</a> of \(A\), \(\star [A^{\^(n-1)} \^ (-\b{b})] = \text{adj}(A) \b{b}\), and \(\star A^{\^n} = \det(A)\), the determinant. So that gets us to the inverse:</p>
\[\begin{aligned}
\lambda (\b{x}, 1) &= \star (\star -\text{adj}(A) \cdot (-\b{b}), \star \det(A)) \\
\b{x} &= \frac{\text{adj}(A)}{\det(A)} \cdot (-\b{b}) \\
&= A^{-1} \b{b}
\end{aligned}\]
<p>… Up to some signs that I’m <em>really</em> not sure about. It is very hard to keep track of how signs should work when you try to connect exterior algebra objects to regular linear algebra objects with \(\star\). At some point I’m going to hash it all out for myself so I can stop leaving these disclaimers but I haven’t gotten to it yet. See the aside for some justification, though.</p>
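For what it’s worth, the adjugate-over-determinant formula itself is easy to verify numerically, signs and all. A sketch (`adjugate` here is a helper built from cofactors, not a numpy built-in):

```python
import numpy as np

def adjugate(A):
    """Adjugate via cofactors:
    adj(A)[i, j] = (-1)^(i+j) * det(A with row j and column i deleted)."""
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])

# The defining identity adj(A) A = det(A) I, with no signs left to chance:
assert np.allclose(adjugate(A) @ A, np.linalg.det(A) * np.eye(3))

# So x = adj(A) b / det(A) solves A x = b whenever det(A) != 0:
b = np.array([1.0, 2.0, 3.0])
x = adjugate(A) @ b / np.linalg.det(A)
assert np.allclose(A @ x, b)

# adj(A) still exists when det(A) = 0; only the division fails.
```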
<aside class="toggleable" id="id" placeholder="<em>Aside</em>: Justification for signs">
<p>Until I get a chance to really figure out the rules for \(\star\), here’s a calculation I did to make sure I’m right.</p>
<p>Suppose \(B = (A_1, A_2, A_3, -\b{b})\) where each vector has three components. We’ll call the fourth dimension \(\b{w}\) and declare the unit pseudoscalar to be \(\b{x} \^ \b{y} \^ \b{z} \^ \b{w}\). The rule for computing \(\star\) is:</p>
\[\alpha \^ (\star \alpha) = \| \alpha \|^2 \omega\]
<p>So e.g. \(\star (\b{x\^y\^w}) = - \b{z}\), because \(\b{x \^ y \^ w} \^ (-\b{z}) = \omega\) (because we have to move the \(\b{z}\) one spot over to get back to \(\omega\).)</p>
<p>Now, the easiest way to think about \(B^{\^ 3}\) is to just write down its four terms in <em>any order</em>, but keep track of their basis components:</p>
\[B^{\^3} = \underbrace{A_1 \^ A_2 \^ (-\b{b})}_{\b{x \^ y \^ w}} + \underbrace{A_2 \^ A_3 \^ (-\b{b})}_{\b{y \^ z \^ w}} + \underbrace{A_3 \^ A_1 \^ (-\b{b})}_{\b{z \^ x \^ w}} + \underbrace{A_1 \^ A_2 \^ A_3}_{\b{x \^ y \^ z}}\]
<p>When we compute \(\star B^{\^ 3}\), the signs follow these components, so we get</p>
\[\star B^{\^ 3} = \underbrace{A_1 \^ A_2 \^ (-\b{b})}_{- \b{z}} + \underbrace{A_2 \^ A_3 \^ (-\b{b})}_{- \b{x}} + \underbrace{A_3 \^ A_1 \^ (-\b{b})}_{- \b{y}} + \underbrace{A_1 \^ A_2 \^ A_3}_{+ \b{w}}\]
<p>Then you just shuffle the coordinates into whatever order you wanted your original matrix in, which is probably \((\b{x}, \b{y}, \b{z}, \b{w})\).</p>
\[\star B^{\^ 3} = (- \text{adj}(A) \cdot (-\b{b}), \det (A))\]
<p>It’s not elegant but it’ll have to do, at least until I can find or construct the <em>forbidden calculus of noncommutative tensorial exterior algebra</em>.</p>
</aside>
<p>I think of this object as a “negative \(1\)-th wedge power”:</p>
\[A^{-\^ 1} = \frac{ A^{\^(n-1)}}{A^{\^n}}\]
<p>Which is to say, it’s like the \((n-1)\)-th wedge power, but divided through by the \(n\)-th, giving a total “grade” of \(-1\). It’s an object that becomes the inverse matrix if we translate it back into matrix form with \(\star\), but is still meaningful… sorta… for non-invertible matrices.</p>
<hr />
<p>The step of this equation that fails if \(A\) is not invertible is that \(A^{\^n} = 0\), hence, the problem of inverting matrices is the same thing as the problem of dividing by zero, and a theory of dividing by zero correctly should definitely be a theory of inverting non-invertible matrices also!</p>
<p>What happens when we do that division? We get something like… I guess… that the solution to \(Ax = 1\) is something like \(x = \P ( \frac{1}{A}) + \vec{\lambda} \cdot A_{\perp} \delta(\det A) + \ldots\), a series of delta functions of different orders. But how many \(\lambda\) components should we end up with? Obviously it should come from the <em>actual</em> grade of the solution multivector \(A_{\perp}\). So whatever system of dividing by zero we come up with is going to have to be able to “divide by a zero multivector” also.</p>
<p>The number of terms we end up with is related (compare the Moore-Penrose pseudoinverse formula) to the multiplicity of the zero in \(\det(A)\). The more “copies of zero”, the more degrees of freedom in the output space that we’ll have to produce. That means that \(\text{diag}(0,1)\) and \(\text{diag}(0, 0)\) should be different objects: the former is \(\propto 0\) while the latter is \(\propto 0^2\), so that we know how many degrees of freedom are produced in the inverse. The number of copies of zero is going to be the same as the number of \(0\) singular values in the matrix’s SVD.</p>
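That “count the copies of zero with the SVD” claim is at least checkable in the ordinary matrix setting: the number of zero singular values is the dimension of the kernel, i.e. the number of free parameters the inverse has to produce. A quick sketch:

```python
import numpy as np

# The number of zero singular values counts the "copies of zero" in a matrix,
# which is the number of degrees of freedom its inverse has to produce.
for A in [np.diag([0.0, 1.0]), np.diag([0.0, 0.0]), np.diag([2.0, 3.0])]:
    s = np.linalg.svd(A, compute_uv=False)
    dof = int(np.sum(s < 1e-12))
    # dof = dim ker(A) = size - rank:
    assert dof == A.shape[0] - np.linalg.matrix_rank(A)

# So diag(0, 1) and diag(0, 0) really are different kinds of zero: their
# inverses would need 1 and 2 free parameters respectively.
```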
<p>The value we end up getting should in some sense be the same object as the Moore-Penrose pseudoinverse. So that means that this</p>
\[\b{x} = A^+ (\b{b}) + (I - A^+ A) \vec{\lambda}\]
<p>is definitely another way of writing this</p>
\[\b{x} = A^{- \^ 1} \cdot (-\b{b})\]
<p>which should, when properly formulated, be a formula that solves linear systems of equations “in full generality”.</p>
<hr />
<p>So, the “inverse of zero” is “a degree of freedom.” Division by a multivector necessarily produces extra \(\lambda\) terms, proportional to “how zero it is”.</p>
<p>By the way: the only reason we moved <em>into</em> projective coordinates was to deal with the problems of “dividing by zero” and dealing with infinities in a way our math system could handle. But if our math system could <em>handle</em> dividing by zero and producing degrees of freedom correctly on its own, then we wouldn’t have needed any of the projective coordinate stuff. This suggests a general principle: projective coordinates are being used to work around a problem in the underlying calculus.</p>
<p>There’s a lot more to say about this and I hope to investigate it later (and actually go learn algebraic geometry so I’m not cluelessly ignorant! Hmm.) But instead, let’s go look at differential equations again.</p>
<hr />
<h1 id="3-integration-constants-and-boundary-conditions">3. Integration Constants and Boundary Conditions</h1>
<p>The way we solved \(A \b{x} = \b{b}\) in projective coordinates for matrices can also give us intuition for inverting differential operators like \(D f = g\).</p>
<p>Consider how differential equations’ inverses produce degrees of freedom in their solutions. This is not something I know at any particular theoretical level, but over the years I’ve picked up a sense of it. For starters, obviously an \(N\)th-order differential equation can end up with \(N\) integration constants:</p>
\[\p_x^N (f(x) + c_1 + c_2 x + \ldots + c_N x^{N-1}) = \p_x^N f(x)\]
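(Concretely: on a grid, adding anything in the span of \(\{1, x\}\) is invisible to the discrete second derivative. A small numpy sketch, with an arbitrary \(f\):)

```python
import numpy as np

h = 0.01
x = np.arange(0.0, 1.0, h)
f = np.sin(2 * x)                       # some arbitrary function

def second_diff(u):
    """Discrete d^2/dx^2 on the interior of the grid."""
    return (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2

# Adding anything in the kernel of d^2/dx^2 -- i.e. c1 + c2*x -- is
# invisible to the second derivative:
g = f + 3.0 - 7.0 * x
assert np.allclose(second_diff(f), second_diff(g), atol=1e-5)
```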
<p>But other operators end up with infinite series of solutions corresponding to their infinite space of eigenvalues:</p>
\[\begin{aligned}
( \p_x^2 + \lambda^2) f(x) &= 0 \\
f(x) &= \sum a_n \sin \lambda x + b_n \cos \lambda x \\
&= \sum c_n e^{i \lambda x} + d_n e^{-i \lambda x}
\end{aligned}\]
<p>Just from these examples it’s clear that, once again, the space of solutions is proportional to the space of things that the operator sets to zero. \(\p_x^2\) sets only two polynomial terms (\(1\) and \(x\)) to zero, so its solution space has two degrees of freedom. \((\p_x^2 + \lambda^2)\), on the other hand, has a whole \(\bb{Z}^2\) worth of values it sets to zero, which is why it ends up with so many values in its solution set. Presumably these zero sets are quantifiable in exactly the same way as the inverse determinants from the previous section.</p>
<p>Here’s another thing. Our solution set will once again be parameterized by some perpendicular-inverse <em>object</em>, times another <em>object</em> of free parameters</p>
\[f(x) = D_{\parallel}^{-1} g(x) + D_{\perp}^{-1} (\Lambda)\]
<p>For instance for the inverse of \(\p_x^N\), the free-parameter object is still a vector \((c_1, c_2, \ldots c_N)\). For the infinite sines and cosines it’s a matrix \(\begin{pmatrix} a_1 &amp; a_2 &amp; \ldots \\ b_1 &amp; b_2 &amp; \ldots \end{pmatrix}\). I would not be surprised to find out that there are equations whose solutions are parameterized by tensors or even manifolds (like, “there’s one solution per point on a sphere”).</p>
<p>Now look what happens when we add boundary conditions. In many cases we can include the boundary conditions as an additional term in a “homogenized” equation:</p>
\[\begin{cases} D f &= g \\
B f &= 0 \\
\end{cases}\]
<p>That is:</p>
\[\begin{pmatrix} D & -g \\ B & 0 \end{pmatrix} \begin{pmatrix} f \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}\]
<p>Let’s assign \(Q = \begin{pmatrix} D & -g \\ B & 0 \end{pmatrix}\), so that</p>
\[Q\cdot (f, 1) = 0\]
<p>Then it feels like a good guess that <em>in some sense</em> this equation should be meaningful:</p>
\[(f, 1) = Q^{-1}(0) = \star [Q^{\^ n}]\]
<p>Just as in the matrix case. Although I don’t have any idea how you compute the \(n\)-th wedge power of a <em>block matrix of differential operators</em>. But if you could, it seems like you would get the correct form again:</p>
\[f = Q^{- \^1}(0) = Q_{\parallel}^{-1} (0) + Q_{\perp}^{-1} (\lambda)\]
<p>only now \(Q\) is an object that has <em>the boundary conditions encoded in it</em>. The result had better be: \(Q_{\perp}\) is smaller than it was for \(D\), because of the need to reduce the solution space to compensate for the boundary conditions, and the exact degree to which it is smaller corresponds to <em>how many degrees of freedom it took to express the boundary conditions in terms of the kernel of \(D\)</em>.</p>
<p>If that sounds weird, it’s not. It’s a thing we already do when solving differential equations in QFT. When solving for the behavior of a field of charges and photons, you find a solution that has the form “field contribution due to charges in the region” plus “free parameters corresponding to photons that are passing through the area”. In the absence of boundary conditions, any number of photons can be passing through and un-interacted with. But when we specify a boundary condition, we have to find a way to write the boundary condition <em>as a sum of photon waves</em>, which are the free-field terms, so that we can express it <em>entirely in objects that have \(Df = 0\)</em>. In the case of photons of course that means taking the Fourier transform of the boundary.<sup id="fnref:link" role="doc-noteref"><a href="#fn:link" class="footnote" rel="footnote">3</a></sup></p>
<p>Basically, I expect that there is a way to give meaning to the object \(Q^{-\^1}(0)\) such that this expression makes sense:</p>
\[\begin{aligned}
Q^{-\^1}(0) &= \begin{pmatrix} D & -g \\ B & 0 \end{pmatrix}^{-\^1} (0) \\
&= (\text{the solution due to the charges}) \\
&+ (\text{the boundary condition-preserving free field terms}) \\
&* (\text{a point in a manifold of free parameters})
\end{aligned}\]
<p>… but in a non-physics-related sense, where this is just how you solve equations in generality. Wouldn’t that be cool?</p>
<p>Of course, I wouldn’t know where to start with making \(Q^{-\^1}\) mean something right now; that’s a matrix of operators you’re talking about! But, heck, give me the right non-commutative exterior algebra and maybe it just works? It’s too clean not to.</p>
<p>(Aside: I wonder if it’s possible for a differential equation to have a disconnected manifold worth of free parameters. Probably yes? I wouldn’t be surprised at all. In that case what kind of object does \(\lambda\) end up being? I think it’s fine for it to be a “disconnected manifold”; it would be the equation solver’s job to find an actual coordinate system when they want to <em>express</em> the solutions.)</p>
<hr />
<h1 id="4-conclusions">4. Conclusions?</h1>
<p>As you can see, if you followed any of that: signs exist that there is some kind of general calculus for inverting all kinds of linear objects which applies the same way to scalars, matrix equations, differential equations, and maybe more. Somehow it involves dividing by zero in a consistent way and being able to produce free parameters in arbitrary manifolds in the solutions to equations.</p>
<p>The basic procedure which seems to happen over and over is:</p>
<ul>
<li>To solve \(ax = b\),</li>
<li>Produce \(x = (a, -b)^{-\^ 1}\).</li>
<li>Which also has the form \(x = a_{\parallel}^{-1}(b) + a_{\perp}^{-1} \lambda\)</li>
<li>Which by definition solves the equation, and is parameterized by a free point \(\lambda\) in the solution manifold.</li>
</ul>
<p>The trick is figuring out what any of that means, lol.</p>
<p>I don’t see how it works at all, but the symmetry is so good that it’s agonizing. Of course everything gets more and more abstract the further we go into differential equations or non-linearity — but maybe there’s somebody out there who can piece it all together? And maybe, somehow, we can figure out how it <em>has</em> to work by mastering all the subjects, comparing all the examples, and poking in at the edges… until we find its true form?</p>
<div class="triangles">
<svg class="trianglesvg" xmlns="http://www.w3.org/2000/svg" height="20" width="20">
<polygon class="triangle" style="cursor:auto;" fill="#c3e281" stroke="#c3e281" stroke-width="2" points="6,4 6,16 16.39,10" />
</svg>
<svg class="trianglesvg" xmlns="http://www.w3.org/2000/svg" height="20" width="20">
<polygon class="triangle" style="cursor:auto;" fill="#c3e281" stroke="#c3e281" stroke-width="2" points="6,4 6,16 16.39,10" />
</svg>
<svg class="trianglesvg" xmlns="http://www.w3.org/2000/svg" height="20" width="20">
<polygon class="triangle" style="cursor:auto;" fill="#c3e281" stroke="#c3e281" stroke-width="2" points="6,4 6,16 16.39,10" />
</svg>
</div>
<hr />
<p>My other articles about Exterior Algebra:</p>
<ol start="0">
<li><a href="/2018/08/06/oriented-area.html">Oriented Areas and the Shoelace Formula</a></li>
<li><a href="/2018/10/08/exterior-1.html">Matrices and Determinants</a></li>
<li><a href="/2018/10/09/exterior-2.html">The Inner product</a></li>
<li><a href="/2019/01/26/exterior-3.html">The Hodge Star</a></li>
<li><a href="/2019/01/27/exterior-4.html">The Interior Product</a></li>
<li><a href="/2019/02/13/exterior-5.html">EA as Linearized Set Theory?</a></li>
<li><a href="/2019/02/23/exterior-6.html">Oriented Projective Geometry</a></li>
<li><a href="/2019/03/02/exterior-7.html">Simplex Volumes and Boundaries</a></li>
<li><a href="/2020/10/15/ea-operations.html">All the Exterior Algebra Operations</a></li>
</ol>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:step" role="doc-endnote">
<p>In the context of any <em>particular</em> problem, \(\lambda\) may have a definite value. For instance when computing the Fourier transform of \(\p_x \theta(x) = \delta(x)\) (where \(\theta\) is the step function), one ends up with \((ik) \hat{\theta}(k) = 1\). Upon dividing through by \((ik)\) the result is \(\frac{1}{ik} + \pi \delta(k)\), where \(\pi\) is the right answer; it’s actually \((2 \pi) (\frac{1}{2})\), where \(\frac{1}{2}\) is the average value of \(\theta(x)\), hence it’s the right coefficient at \(k=0\). <a href="#fnref:step" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:radial" role="doc-endnote">
<p>If you go to solve this, keep in mind that \(\delta(k) = \frac{\delta(\b{k})}{4 \pi k^2}\). <a href="#fnref:radial" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:link" role="doc-endnote">
<p>I’ll include a link for this when I find one. Can’t remember where I learned it. <a href="#fnref:link" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<hr />
<h1 id="data-science-mostly-isnt">Data Science Mostly... Isn’t</h1>
<p><em>2023-05-01, <a href="https://alexkritchevsky.com/2023/05/01/data-science">https://alexkritchevsky.com/2023/05/01/data-science</a></em></p>
<!--more-->
<p>A classic use of data science at a technology company is a feature A/B test to improve revenue or engagement metrics.</p>
<p>It goes like this:</p>
<ol>
<li>Ask a question: can we drive more users to pay for our product by changing, say, the color of the sign-up button?</li>
<li>Formulate a hypothesis: what if we make it a big grey button? A big red button? A big <em>flashing</em> red button?</li>
<li>Do an experiment: implement each of the variants and randomly assign some visitors to each one, measuring the signup rate under each hypothesis.</li>
<li>Analyse the results: perform a mathematical ritual which determines if the changes made a “statistically significant” difference in sign-up rates. Interpret this to mean that the resulting sign-up rate is not simply due to random variation, and identify whether the winning variant actually caused more sign-ups to happen.</li>
<li>Conclude things: if one of the variants clearly emerged as a winner, commit to that result and remove the others from code. Don’t forget to also mass-email folks and show off the results of your experiment and claim credit for driving X% signups / revenue / whatever. Good on you, promotions all around. Stick it on your resume.</li>
</ol>
<p>We might call this the Data Science Scientific Method. It certainly looks like we did science—after all, isn’t this how science worked on those triptychs in middle school?</p>
<p><img src="/assets/posts/2023-05-01/science.png" width="600px" style="" /></p>
<p>Everybody knows that Science is Good. Why is it good? Because it’s the only way to be <em>right</em>, which means systematically avoiding bias, guesswork, and blind spots. Fortunately everyone at the company also knows that everybody <em>else</em> knows that Science is Good, so there’s no need to even discuss it. Instead we can carry on with doing science, being right, and making money. We know it’s right because it’s working: science is getting good results, converting customers, earning promotions, and making lots of money. Everyone can be very pleased that they’re part of a big rational money-making machine which is smarter than any individual.</p>
<p>…</p>
<p>Two problems with this:</p>
<ol>
<li>That’s not actually what science is.</li>
<li>That big rational machine is somehow dumber than a single person.</li>
</ol>
<hr />
<p>You know what data science “experiment” I’ve never seen anyone do?</p>
<p>Test the question “can one person, using their intuition and maybe asking around a bit, make better decisions than our whole data-driven experimentation ritual?” Cause if so, then maybe you should just give that person all the decision-making power and do whatever they say for a while. You’ll save a lot of money on compute alone.</p>
<p>Or, do better science. Science is, indeed, about being <em>right</em>. Just because you did a thing that you called Science, and which everyone thinks is Science at a New-York-Times-reader level of understanding, doesn’t mean you’re <em>right</em>. The fact that you made money successfully doesn’t even make you right! At best you were provably right at the small question of “which of these colors of button gets more clicks over the next little while?”; probably not at the bigger question of “what’s the optimal way to make more money?” and definitely not on the even-bigger question of “what’s the best thing to do for the business to succeed?”</p>
<hr />
<h2 id="what-science-is">What Science Is</h2>
<p>It’s not this:</p>
<ol>
<li>Ask a question</li>
<li>Formulate a hypothesis</li>
<li>Do an experiment</li>
<li>Analyse the results</li>
<li>Conclude things</li>
</ol>
<p>Science is not a formulaic process anyway, at least not really, but if it was, the most important aspect of the formula is that it has the property of <em>converging on better understanding over time</em>. Anyone who becomes knowledgeable and correct and efficacious over time is doing something scientific, no matter the setting or their precise process. That is both the reason that science is usually more correct than random idiots, and also the reason why it’s not always correct. Science isn’t right by fiat—but on average it does <em>get righter</em> whereas random idiots do not.</p>
<p>So the formula we come up with better involve feeding back into itself and doing a better job with each iteration. With that in mind, here’s a better list of steps:</p>
<ol>
<li>Learn a lot. Observe and study the world.</li>
<li>Conceive of a model of the world that makes sense from what you know.</li>
<li>Ask a question (that your model has something interesting to say about…)</li>
<li>Formulate a hypothesis (about what answer your model would give if it were correct that other models would not give)</li>
<li>Do an experiment</li>
<li>Analyse the results</li>
<li>Conclude things (about the accuracy of your model)</li>
<li>Update your confidence in your model based on the result.</li>
<li>Go back to step (2) until your model is good enough for your purposes.</li>
<li>Go and do things in the world using your model now that you know it’s good.</li>
</ol>
<p>If you don’t formulate a model as you go, or use your experiments to test and improve that model, you’re not really doing anything that deserves to be called Science. You’re just doing… experiments, in a loop, without ever learning anything? You’re certainly not <em>improving</em> on your model over time.</p>
<p>As I see it, corporate data science isn’t ‘science’, because it isn’t really about building models or being right in the first place. Its real job is to make <em>decisions</em>. Or more precisely, to <em>launder</em> decisions: to make the decision-making process so benign and agreeable that nobody risks blame for making mistakes or missing opportunities. You want me to understand how users’ minds work and then <em>pick</em> what color the button should be? But what if I’m… wrong? Better to let an experiment decide for us—nobody ever got fired for making the button whichever color people clicked on the most for two weeks last June.</p>
<p>Science is—has to be—the process of studying things in a way that necessarily becomes more correct over time. Asymptotically it should figure out the true workings of whatever slice of reality you’re looking at. This always involves building up theories of how reality works, and typically should require validating those theories with experiments to see how good they are, although, strictly speaking, that’s only useful inasmuch it gets you closer to correctness; you can get away without it sometimes.</p>
<p>In business, the theories are answers to the question “how should we run our company?”. That is, “what should all these people do?”. That experiment to see what color the button should be? It assumes, but never checks, that the answer to that question is this: “we should run experiments like this one all day and then do whatever they say into the future.” You could also do some science on <em>that</em> hypothesis, if you wanted, but don’t bother; it’s pretty obviously wrong. We have plenty of evidence already: everywhere you look, data-science-driven companies are doing fantastically dumb things, and everybody who works at them knows it (and will happily tell you, with pained resignation!). And yet these companies carry on with their foolishness anyway—because their collective understanding is that they’re supposed to stay carefully on the sacred data-driven path.</p>
<p>Why are they able to go on being so wrong?</p>
<p>I think that in the end the problem is a near-complete absence of leadership.</p>
<hr />
<h2 id="leadership">Leadership</h2>
<p>How do you do a better job at running an organization than a bunch of make-believe science experiments?</p>
<p>For starters, write down and aggregate all the results of your little experiments in a way that can be used for future decisions, and then use that knowledge to <em>actually make future decisions</em>—in lieu of further experimentation. If every “no thanks I don’t want to buy your shit” button should be grey and invisible to press the psychological buttons that make you more money, and if doing so passes your own ethics (treating your customers with respect is, perhaps, not relevant here…), you don’t really need to test the next one. You already know! It will work! That’s what the science was for!</p>
<p>“That sounds nice,” you think, and yet… a few quarters later… somehow you’re back where you were, doing that same experiment again, testing variants of buttons and user flows and whatever, even though it’s obvious how to make improvements without testing them. Why?</p>
<p>Well, it turns out there’s another reason people do this “science”, and it’s probably the main reason: it’s that they need to be able to take <em>credit</em>. If you knew the button color would help the business and your job was just a regular job where your responsibility is to do a good job at something, you’d just change the color to the color it should be and that would be it. But if you need to argue about getting a promotion, you need to <em>attribute</em> that revenue to something you did, because your annual review is gonna say “drove X% revenue growth via &lt;a series of changes&gt;” and if you don’t know how much you drove you can’t put it in there. Even if, and this is the super-aggravating part, the thing you did was <em>obvious</em> and the experiment was <em>a total waste of time and money</em>.</p>
<p>The only people who are really in a position to put a stop to this are the people who are doing the evaluations of other people’s work. They should be smarter: smart enough to see that the experiments were a waste of time, smart enough to see that the conclusions reached were foregone and unhelpful, and especially smart enough to see that “doing good work” isn’t the same as “getting good measurable results” and is in fact probably the opposite of it. In fact the best work most often consists of what you build, grow, and polish using your volition and grit, and those are all (a) easy to see and (b) hard to quantify.</p>
<p>But as long as the evaluators don’t <em>believe</em> this is their job, we’re stuck. They’re trapped in the same cycle: they’re accountable to someone else who wants everything quantified in numbers also, and they’re beholden to some rubric for evaluation, and even if they wanted to go rogue, there’s no actual culture in place of evaluating things on their <em>merits</em>, and in fact they’ll get ridiculed and reprimanded if they try. Their boss? Same thing. The problem infects everything, and unfortunately ends up landing on the least accountable group of all: the shareholders, an amorphous machine that only cares about everyone <em>else</em> believing the numbers are real for a while so they can sell at a profit.</p>
<p>If you keep following this train of thought you end up in a weird subterranean viewing chamber where you gaze into the obsidian boulder and see reflected in it one of the dark truths of corporate capitalism: that everything, products and performance reviews and promotions, design and engineering and marketing and the rest, is about avoiding making any kind of decision whatsoever, or taking any personal risk at anything. Everybody has offloaded all responsibility… for everything… onto <em>math</em>.</p>
<p>It is a farce. Is this person a good manager? No idea! Did they help people grow, or do more than the bare minimum at every step, or build something that will last? Can’t remember! Does it matter how good their management was? Who cares, what we want to know is, did they do measurable things to benefit the company? Heck yes, and we proved it with science! Nice. Bonus secured.</p>
<hr />
<h2 id="the-question-of-fallibility">The Question of Fallibility</h2>
<p>A common reaction is: well, the reason they’re doing all this math is that people are biased and fallible and math has at least a hope of avoiding that. In fact in the past they did all sorts of things without math and it was even worse. Right?</p>
<p>And I agree. Don’t get me wrong: if these same people one day started “making decisions”, leaving everything else unchanged, it would be disastrous. People can be really, really dumb. Especially people who have never had to do a good job at decision-making before. The people in charge today largely got where they are without having to prove themselves by making lots of smaller decisions first. It was rarely a requirement to be any good at <em>leadership</em> at all.</p>
<p>But I don’t think it’s impossible to do better. What’s missing is leaders being accountable for their <em>competency as leaders</em>. That’s why, among other things, it’s absolutely necessary to support unionization and internal protests and anything else which involves taking power away from leaders who don’t deserve it. A leader should have to <em>earn</em> the right to lead at any level and should have to <em>keep earning</em> it to continue leading, which means that the people they’re leading need to have the right to take it away from them. For bizarre reasons Western culture seems to think that people should get to stay in charge of something because they built it. I disagree. Once you become responsible for a bunch of other people’s life’s work, we have a right (not yet enshrined in law, of course) to depose you if you don’t do a really good job of it.</p>
<p>If anything, the modern cargo-cult data-science is probably a reaction to the <em>last</em> age, a time when tyrannical executives roamed the earth, doing whatever insanely dumb shit they wanted, checked only by the whims of their also-tyrannical superiors. Words like “systematic bias” or “logical fallacies” hadn’t been discovered yet, so there wasn’t even language to explain why they were so wrong. And they had those positions not because they were the best at what they did, not because they earned it with hard work, but because they were, like, the right type of well-groomed white dude who looks good in a sportcoat. I’ve heard it’s still like that in many parts of the world. But, you know, not in modern corporate America—we’re better than that. We use data now!</p>
<p>In a way, handing decision-making over to what is essentially an algorithm is a way of taking power away from powerful people and getting them to <em>somehow go along with it</em>. Which doesn’t sound so bad, really. If anything it’s the theme of the last century of progressivism.</p>
<p>Yet the result is, in a way, shameful. The organization is <em>still</em> clueless, only now it can’t as easily blame anyone for its clueless floundering. This arrangement can actually afford the people in power <em>more</em> protection: they’re less culpable for the consequences of their actions, because they don’t <em>take</em> actions. They still fail… but everything was decided by the data science. In fact everybody, in power or not, can say this: we just did what the data told us; we did everything right! Especially in tech, where money and therefore validation just blows in your door if you leave it open, these organizations can’t even tell how dumb they’re being. It all seems to be working! They’re still getting conversions; people are clicking on the ads; the stock price is good. When in fact the actual test of your hypothesis about how to run a business happens on far too slow a scale to get any feedback—decades, I guess. However long it takes a company to rot at its core, wither for a while, and finally go bankrupt. As long as the economy is good and the industry is ascendant, everybody gets plenty rich from mediocrity.</p>
<p>Well, fine. I, for one, don’t enjoy being part of this money-grubbing idiot machine at all. It’s no fun to spend your one life’s effort on stupid work. And even if you somehow avoid this fake science in your job, every product you use has already been ruined by it anyway (look no further than “every recommendation algorithm”). You would have to go live by yourself in the woods to get away from being affected by decisions that are obviously bad to everyone but looked good enough in the metrics. (Don’t even get me started on things that are actually good ways to make money but <s>are totally evil</s> you’d be ashamed telling your kids about. “The data told me to do it!” I bet it did.)</p>
<p>But hey. Maybe it’s a good thing that we’re already getting used to the idea of offloading all responsibility onto a machine we built to rule us. That might turn out to be serendipitous.</p>
<p>:thinking-face:</p>
An Interesting List of React Mistakes2023-04-25T00:00:00+00:00https://alexkritchevsky.com/2023/04/25/react-mistakes<p>Look, a list of simple things that you and your colleagues should know to avoid, but which you will still do by accident from time to time and eventually spend months of your life, in total, tracking down.</p>
<p>Don’t worry, these are all a bit more interesting than React 101 stuff like “don’t write conditional hooks”.</p>
<!--more-->
<p>(Apologies for the syntax highlighter’s occasionally bizarre understanding of TSX. Maybe I’ll fix it someday.)</p>
<hr />
<h2 id="1-useeffect-with-no-deps-array">1. <code class="language-plaintext highlighter-rouge">useEffect</code> with no deps array</h2>
<p>As you may know, <code class="language-plaintext highlighter-rouge">useEffect</code> with an <em>empty</em> deps array runs once, on mount.</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">useEffect</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// I'll run once, on mount.</span>
<span class="k">return</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// and I'll run once, on unmount</span>
<span class="p">};</span>
<span class="p">},</span> <span class="p">[]);</span>
</code></pre></div></div>
<p>And <code class="language-plaintext highlighter-rouge">useEffect</code> with <em>no</em> deps array runs on every render:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">useEffect</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// I'll run every time you get re-rendered. Why? Nobody knows.</span>
<span class="p">});</span>
</code></pre></div></div>
<p>But even if you know this (you’ve read the docs like a good engineer), you’re going to mess it up occasionally, and some day it will come back to haunt you. The no-deps-array form should be banned; it is a huge mistake in the API that it exists.</p>
<p>I’d bet money that in ten years <code class="language-plaintext highlighter-rouge">useEffect</code> will be called <code class="language-plaintext highlighter-rouge">DEPRECATED_useEffect()</code> after having been split up into a few smaller hooks that have fewer footguns. Mainly this one, but also for a few other reasons…</p>
<hr />
<h2 id="2-useeffect-with-listeners">2. <code class="language-plaintext highlighter-rouge">useEffect</code> with listeners</h2>
<p>A common use of <code class="language-plaintext highlighter-rouge">useEffect</code> is to set up some listener manually.</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">useEffect</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">keydown</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span> <span class="cm">/* whatever */</span> <span class="p">};</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">keydown</span><span class="dl">'</span><span class="p">,</span> <span class="nx">keydown</span><span class="p">);</span>
<span class="k">return</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">removeEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">keydown</span><span class="dl">'</span><span class="p">,</span> <span class="nx">keydown</span><span class="p">);</span>
<span class="p">};</span>
<span class="p">},</span> <span class="p">[]);</span>
</code></pre></div></div>
<p>(You might prefer <code class="language-plaintext highlighter-rouge">useMemo</code> here, since you often don’t care that this happens after render, but <code class="language-plaintext highlighter-rouge">useMemo</code> doesn’t take a cleanup function, so you can’t.)</p>
<p>But what if you want the listener to depend on some local variables?</p>
<p>The React linter will tell you to make sure they’re all in the deps array, like this:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">useEffect</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">keydown</span> <span class="o">=</span> <span class="nx">dispatch</span><span class="p">(</span><span class="nx">someAction</span><span class="p">(</span><span class="nx">someId</span><span class="p">));</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">keydown</span><span class="dl">'</span><span class="p">,</span> <span class="nx">keydown</span><span class="p">);</span>
<span class="k">return</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">removeEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">keydown</span><span class="dl">'</span><span class="p">,</span> <span class="nx">keydown</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">},</span> <span class="p">[</span><span class="nx">dispatch</span><span class="p">,</span> <span class="nx">someId</span><span class="p">]);</span>
<span class="c1">// ^^^^ silly, but at least we appeased the linter?</span>
</code></pre></div></div>
<p>Now the listener is constantly being un-registered and re-registered whenever <code class="language-plaintext highlighter-rouge">someId</code> changes, even though that’s completely pointless. In this case it is fairly innocuous, but it can combine with other weird situations and cause real bugs. Such as: needing listeners registered in a particular order for bubbling to work correctly, or the callback being registered in some other module that does non-trivial work on subscription (such as a server call for a subscription token).</p>
<p>Several solutions:</p>
<ul>
<li>live with it (gross)</li>
<li>save your variables on refs (gross)</li>
<li>use a <code class="language-plaintext highlighter-rouge">useStableRef</code> hook to save variables on refs more ergonomically (less gross, but still gross)</li>
<li>invent a <code class="language-plaintext highlighter-rouge">useMemoizedEffect()</code> hook that takes two deps arrays and does the tracking under the hood (gross but cool I guess)</li>
<li>ignore the rule ← do this. The rule is dumb.</li>
</ul>
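<p>For completeness, the <code class="language-plaintext highlighter-rouge">useStableRef</code> option is roughly the following. This is a hypothetical helper, not a React API; in a real component it would be a two-line wrapper around <code class="language-plaintext highlighter-rouge">useRef</code>, and the tiny stand-in used here exists only so the sketch is self-contained and runnable:</p>

```typescript
// Sketch of the useStableRef idea (hypothetical helper, not a React API).
// The real hook body would just be:
//   const ref = useRef(value); ref.current = value; return ref;
// Here a closure stands in for useRef's per-component storage.
type Ref<T> = { current: T };

function makeUseStableRef<T>(): (value: T) => Ref<T> {
  let ref: Ref<T> | undefined;
  return function useStableRef(value: T): Ref<T> {
    if (ref === undefined) {
      ref = { current: value }; // first "render": create the ref
    } else {
      ref.current = value; // later "renders": update in place
    }
    return ref; // same object identity every time, so deps never change
  };
}
```

<p>A long-lived listener then reads <code class="language-plaintext highlighter-rouge">ref.current</code> to see the latest value, and the effect that registered it never has to re-run.</p>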
<p>Basically, <code class="language-plaintext highlighter-rouge">useEffect</code>’s deps array concept is kinda broken. It really ought to let you manually tell it when to re-run, but it doesn’t, so you have to work around it somehow.</p>
<p>My preferred solution is to completely ignore the deps-array linter – either turn the rule off entirely, or turn it off on one line with <code class="language-plaintext highlighter-rouge">// eslint-disable-next-line react-hooks/exhaustive-deps</code>, or just ignore the warning. In practice it seems like half of the deps arrays I write are intentionally different from what the linter wants, because once you’re a moderately sophisticated hooks user, you <em>know</em> when you want every hook to re-run, and it is not exactly the same as when their syntactic dependencies change.</p>
<p>Of course, the trick is making sure everyone on your team is okay with this, and does it correctly. All of this suggests that the concept is fundamentally broken. I love hooks, in general (they’re better than any <em>other</em> way of writing UI), but there is something glaringly wrong with them still, and the solution is something that hasn’t come yet.</p>
<hr />
<h2 id="3-useeffect-in-general">3. <code class="language-plaintext highlighter-rouge">useEffect</code> in general</h2>
<p>It was never quite clear, anyway, what <code class="language-plaintext highlighter-rouge">useEffect</code> is supposed to do.</p>
<p>Yes, it’s something that should happen when a component re-renders, a ‘side-effect’, hence the name. But the API and the Linter together make it clear that the intention is that effects should happen whenever the relevant props change, so it’s expressing the concept of “a side-effect <em>of props changing</em>”, which is a code-level construct, an implementation detail. And then there’s the no-deps-array version that runs on every render, a “side-effect of rendering”.</p>
<p>Both of those are distinctly different from the thing you usually need: “a side-effect in <em>business-logic</em>”. That is: “when <em>certain</em> props <code class="language-plaintext highlighter-rouge">[x,y]</code> change, do a particular thing (whose specification has nothing to do with that <code class="language-plaintext highlighter-rouge">x</code> or <code class="language-plaintext highlighter-rouge">y</code>, necessarily)”. In this case the “exhaustive deps” lint rule is basically always wrong, and, as above, I think the solution is to ignore it. You should understand which type of effect you’re writing and then write it whole-heartedly, and if that means disabling the linter, so be it.</p>
<p>The real Gotcha here is that you will waste a bunch of time <em>arguing</em> about what a proper <code class="language-plaintext highlighter-rouge">useEffect</code> is, and whether the exhaustive-deps lint rule is correct, and whether some particularly egregious violation of it is okay. When in fact <code class="language-plaintext highlighter-rouge">useEffect</code> is a badly-conceived API and you should just ignore it and do exactly what you need it to do, with the understanding that it is one API trying to be three things at once.</p>
<hr />
<h2 id="4-forgetting-displayname">4. Forgetting <code class="language-plaintext highlighter-rouge">displayName</code></h2>
<p>Set <code class="language-plaintext highlighter-rouge">displayName</code> on Function components so they have useful names in React Devtools:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">Fc</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FC</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span> <span class="cm">/* whatever */</span> <span class="p">};</span>
<span class="nx">Fc</span><span class="p">.</span><span class="nx">displayName</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">Fc</span><span class="dl">"</span><span class="p">;</span>
</code></pre></div></div>
<p>More importantly: do this before you need it. It’s nice when everything in devtools has a name. In particular, it’s nice if things have names in prod because there will eventually be a bug that you can’t reproduce easily on your non-minified code, so you’ll want to point React Devtools at Prod, and it will be <em>really</em> nice if this is already done.</p>
<p>If you write higher-order components (which you shouldn’t have to do because we have hooks now, but whatever) you should give them good names also:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">SomeHOC</span> <span class="o">=</span> <span class="p">(</span><span class="nx">component</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">wrapped</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span> <span class="cm">/* whatever */</span> <span class="p">};</span>
<span class="nx">wrapped</span><span class="p">.</span><span class="nx">displayName</span> <span class="o">=</span> <span class="s2">`SomeHOC(</span><span class="p">${</span><span class="nx">component</span><span class="p">.</span><span class="nx">displayName</span><span class="p">}</span><span class="s2">)`</span><span class="p">;</span>
  <span class="k">return</span> <span class="nx">wrapped</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
<p>By the way, you could avoid needing <code class="language-plaintext highlighter-rouge">displayName</code> by defining your components with <code class="language-plaintext highlighter-rouge">function</code> instead of as <code class="language-plaintext highlighter-rouge">const</code> arrow functions, because JavaScript has an, um, easter egg that <code class="language-plaintext highlighter-rouge">function foo()</code> becomes an object which, besides being callable, also has a field called <code class="language-plaintext highlighter-rouge">name</code> on it with value <code class="language-plaintext highlighter-rouge">'foo'</code>. Weird, right? But we don’t prefer to do that because (a) we prefer to just always use arrow functions because regular functions should be deleted from the language probably, and (b) you can’t specify the <code class="language-plaintext highlighter-rouge">React.FC&lt;Props&gt;</code> type on the same line, and those types are very useful for e.g. catching that you forgot to return a valid value (<code class="language-plaintext highlighter-rouge">ReactElement | null</code>) from a branch in the function body.</p>
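<p>The easter egg itself is easy to check; this is plain JavaScript behavior, nothing React-specific:</p>

```typescript
// A function declaration carries its declared name on the .name property,
// which is what devtools falls back on when displayName is unset.
function MyWidget() {
  return null;
}

const widgetName = MyWidget.name; // "MyWidget"
```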
<hr />
<h2 id="5-usestate-with-callbacks">5. <code class="language-plaintext highlighter-rouge">useState</code> with callbacks</h2>
<p>Every once in a while you find yourself needing to store a callback in local state. You’ll screw it up the first time, though:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">type</span> <span class="nx">CallbackType</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="k">void</span><span class="p">;</span>
<span class="kd">const</span> <span class="p">[</span><span class="nx">callback</span><span class="p">,</span> <span class="nx">setCallback</span><span class="p">]</span> <span class="o">=</span> <span class="nx">useState</span><span class="o"><</span><span class="nx">CallbackType</span> <span class="o">|</span> <span class="kc">null</span><span class="o">></span><span class="p">(</span><span class="kc">null</span><span class="p">);</span>
<span class="c1">// lalala, writing code on autopilot</span>
<span class="nx">setCallback</span><span class="p">(()</span> <span class="o">=></span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">callback</span><span class="dl">"</span><span class="p">));</span>
</code></pre></div></div>
<p>See the problem?</p>
<p>Allow me to remind you that the second element of <code class="language-plaintext highlighter-rouge">useState</code>’s return value is a function with signature</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nx">update</span><span class="p">:</span> <span class="nx">T</span> <span class="o">|</span> <span class="p">(</span><span class="nx">prev</span><span class="p">:</span> <span class="nx">T</span><span class="p">)</span> <span class="o">=></span> <span class="nx">T</span><span class="p">)</span> <span class="o">=></span> <span class="k">void</span>
</code></pre></div></div>
<p>That is, it takes <em>either</em> a <code class="language-plaintext highlighter-rouge">T</code> or a reducer function (which takes an old <code class="language-plaintext highlighter-rouge">T</code> and produces a new <code class="language-plaintext highlighter-rouge">T</code>). So you actually need to write</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">setCallback</span><span class="p">(()</span> <span class="o">=></span> <span class="p">()</span> <span class="o">=></span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">callback</span><span class="dl">"</span><span class="p">));</span>
</code></pre></div></div>
<p>Even though it’s obvious once you think about it, it is super easy to forget in practice. When it does come up, I’d recommend that you either:</p>
<ul>
  <li>wrap <code class="language-plaintext highlighter-rouge">setCallback</code> in another function which ensures this is done and makes the type unambiguous</li>
<li>or write a new hook that’s a thin wrapper around <code class="language-plaintext highlighter-rouge">useState</code> and ensures you don’t screw it up.</li>
</ul>
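<p>The underlying dispatch logic is easy to sketch. This is a stand-in for what <code class="language-plaintext highlighter-rouge">useState</code>’s setter does with its argument, not React’s actual implementation:</p>

```typescript
// Stand-in for useState's setter dispatch: a function argument is
// treated as a reducer and *called*; only its return value is stored.
type Updater<T> = T | ((prev: T) => T);

function dispatch<T>(prev: T, update: Updater<T>): T {
  return typeof update === 'function'
    ? (update as (prev: T) => T)(prev)
    : update;
}

type Callback = () => string;

// The double-arrow form: the outer function is the reducer, and the
// inner function is the value that actually gets stored.
const stored = dispatch<Callback | null>(null, () => () => 'callback');
```

<p>With the single-arrow form, the callback itself would be treated as the reducer and called immediately, which is exactly the bug above.</p>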
<hr />
<h2 id="6-reactmemo-and-prop-spreads">6. <code class="language-plaintext highlighter-rouge">React.memo</code> and prop spreads</h2>
<p>Sometimes you have a bucket of props which you want to spread onto a subcomponent:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">type</span> <span class="nx">WrapperProps</span> <span class="o">=</span> <span class="nx">ChildProps</span> <span class="o">&</span> <span class="p">{</span> <span class="na">important</span><span class="p">:</span> <span class="kr">string</span><span class="p">;</span> <span class="p">}</span>
<span class="kd">const</span> <span class="nx">Wrapper</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FC</span><span class="o"><</span><span class="nx">WrapperProps</span><span class="o">></span> <span class="o">=</span> <span class="p">(</span><span class="nx">props</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="p">{</span><span class="nx">important</span><span class="p">,</span> <span class="p">...</span><span class="nx">rest</span><span class="p">}</span> <span class="o">=</span> <span class="nx">props</span><span class="p">;</span>
<span class="k">return</span> <span class="p"><</span><span class="nc">Child</span> <span class="si">{</span><span class="p">...</span><span class="nx">rest</span><span class="si">}</span> <span class="p">/>;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And perhaps the <code class="language-plaintext highlighter-rouge">&lt;Child&gt;</code> component is memoized:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">Child</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FC</span><span class="o"><</span><span class="nx">ChildProps</span><span class="o">></span> <span class="o">=</span> <span class="nx">React</span><span class="p">.</span><span class="nx">memo</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span> <span class="cm">/* .. */</span> <span class="p">});</span>
</code></pre></div></div>
<p>This can break in a weird and subtle way.</p>
<p>It turns out that, <em>only</em> when spreading props, TypeScript/TSX will allow you to put props onto a child that it doesn’t actually have in its prop type. And it turns out that, although these props are ignored, they <em>do</em> break memoization. Sigh.</p>
<p>So what can happen is this unfortunate sequence of events:</p>
<ol>
<li>Originally, the child props list is small and immutable, so the child never updates.</li>
<li>Later on, the wrapper is given a new prop – perhaps <code class="language-plaintext highlighter-rouge">unimportant: string</code> – that it doesn’t unwrap in the spread. (Or perhaps the <code class="language-plaintext highlighter-rouge">important</code> prop is no longer used but someone forgets to remove it from the type. This happens more easily if the wrapper is conforming to some shared type for an injectable component, so usually it’s going to be in a larger and more abstractified codebase than your React starter app).</li>
  <li>It will then include that prop on <code class="language-plaintext highlighter-rouge">&lt;Child&gt;</code>, and this will typecheck, and it will look totally fine.</li>
  <li>…but now <code class="language-plaintext highlighter-rouge">&lt;Child&gt;</code> can rerender constantly based on changes to a prop that it doesn’t even <em>have in its definition</em>.</li>
</ol>
<p>And sometimes that <code class="language-plaintext highlighter-rouge">React.memo()</code> call is doing a lot of work, such as when the spread is a long list of mostly unchanging props, and when it suddenly fails it might trigger some catastrophic re-rendering.</p>
<p>It sounds like a rare case, and it is, but it’s so elusive when it happens that it’s worth being aware of. In a way it goes to show the danger of spreading props (and of using weird component abstractions like this… but sometimes it is the best choice). It is often better, if you can, to include the <code class="language-plaintext highlighter-rouge">childProps</code> as a <em>single</em> prop rather than part of the top-level props of <code class="language-plaintext highlighter-rouge">Wrapper</code>:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">type</span> <span class="nx">WrapperProps</span> <span class="o">=</span> <span class="p">{</span>
<span class="na">important</span><span class="p">:</span> <span class="kr">string</span><span class="p">;</span>
<span class="nl">childProps</span><span class="p">:</span> <span class="nx">ChildProps</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
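<p>To see why extra spread-through props break memoization, here is roughly what <code class="language-plaintext highlighter-rouge">React.memo</code>’s default comparison does (a sketch of the behavior, not React’s actual source):</p>

```typescript
// Sketch of React.memo's default comparison: shallow equality over
// *every* prop key, including props the child never actually reads.
function shallowEqual(
  a: Record<string, unknown>,
  b: Record<string, unknown>
): boolean {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every((key) => a[key] === b[key]);
}

// The child only reads `label`, but the spread also carried `unimportant`:
const prevProps = { label: 'hi', unimportant: 1 };
const nextProps = { label: 'hi', unimportant: 2 };
const childWouldSkipRender = shallowEqual(prevProps, nextProps); // false
```

<p>The unread prop changed, so the shallow comparison fails and the memoized child re-renders anyway.</p>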
<hr />
<h2 id="7-accidentally-redefining-components">7. Accidentally redefining components</h2>
<p>Spot the bug:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">SomeComponent</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FC</span> <span class="o">=</span> <span class="p">({</span><span class="nx">userId</span><span class="p">,</span> <span class="p">...</span><span class="nx">rest</span><span class="p">})</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">PartialChild</span> <span class="o">=</span> <span class="p">(</span><span class="nx">rest</span><span class="p">)</span> <span class="o">=></span> <span class="p"><</span><span class="nc">Child</span> <span class="na">userId</span><span class="p">=</span><span class="si">{</span><span class="nx">userId</span><span class="si">}</span> <span class="si">{</span><span class="p">...</span><span class="nx">rest</span><span class="si">}</span> <span class="p">/>;</span>
<span class="k">return</span> <span class="p"><</span><span class="nc">SomeWrapper</span> <span class="na">childComponent</span><span class="p">=</span><span class="si">{</span><span class="nx">PartialChild</span><span class="si">}</span> <span class="p">/></span>
<span class="p">}</span>
</code></pre></div></div>
<p>This was an attempt to construct a component, to be passed into a child for rendering, that had some of its props filled out (in the example, <code class="language-plaintext highlighter-rouge">userId</code>). It’s a rare pattern but not entirely unheard of: sometimes a library will take a component that it’s supposed to use to render some UI, and it has a certain props list it expects, but you need it to depend on some other stuff also. But this is not the right way to do it.</p>
<p>The problem is that, any time this parent component updates, everything that uses <code class="language-plaintext highlighter-rouge">PartialChild</code> will re<em>mount</em> rather than re<em>render</em>, because <code class="language-plaintext highlighter-rouge">PartialChild</code> is a brand-new function object on every render. The way React determines whether a component “is the same component” and should rerender instead of remounting is by checking both:</p>
<ul>
<li>if the <code class="language-plaintext highlighter-rouge">key</code> is unchanged, where <code class="language-plaintext highlighter-rouge">key</code> is the thing we all know from making list components.</li>
<li>if the <code class="language-plaintext highlighter-rouge">type</code> is unchanged, where <code class="language-plaintext highlighter-rouge">type</code> is “the class or function or string (for DOM elements) which defines the JSX tag”.</li>
</ul>
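<p>As a toy sketch (made-up names, not React’s actual source), the sameness check behaves roughly like this:</p>

```typescript
// Toy model of React's reconciliation "sameness" check. An element's `type`
// is the function/class/string tag it was created with.
type ToyElement = { type: unknown; key: string | null };

function isSameComponent(prev: ToyElement, next: ToyElement): boolean {
  return prev.key === next.key && prev.type === next.type;
}

// A top-level component: one function object, shared by every render.
const Child = () => null;
isSameComponent({ type: Child, key: null }, { type: Child, key: null }); // same type → rerender

// A component recreated inside a parent's render: a new function object each time.
const makeInlineChild = () => () => null;
isSameComponent(
  { type: makeInlineChild(), key: null },
  { type: makeInlineChild(), key: null }
); // different types → remount
```

The `type` comparison is by reference, which is why redeclaring a function component on every render defeats it.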
<p>Function components are normally top-level declarations, so there’s only one function object for them, ever, and it’s re-used everywhere the component is used. But if you declare the component inside <em>another</em> component, it will be a brand new component whenever that definition is recomputed, so it will re-mount, from scratch, every time that code runs. It should go without saying that remounting when you meant to rerender is bad, for simple reasons like “it erases all your state”.</p>
<p>You also can’t totally avoid this by memoizing it:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">Component</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FC</span> <span class="o">=</span> <span class="p">({</span><span class="nx">userId</span><span class="p">,</span> <span class="p">...</span><span class="nx">rest</span><span class="p">})</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">MemoizedChild</span> <span class="o">=</span> <span class="nx">React</span><span class="p">.</span><span class="nx">useMemo</span><span class="p">(</span>
<span class="p">()</span> <span class="o">=></span> <span class="p">(</span><span class="nx">rest</span><span class="p">)</span> <span class="o">=></span> <span class="p"><</span><span class="nc">Child</span> <span class="na">userId</span><span class="p">=</span><span class="si">{</span><span class="nx">userId</span><span class="si">}</span> <span class="si">{</span><span class="p">...</span><span class="nx">rest</span><span class="si">}</span> <span class="p">/>,</span>
<span class="p">[</span><span class="nx">userId</span><span class="p">]</span>
<span class="p">);</span>
<span class="k">return</span> <span class="p"><</span><span class="nc">SomeWrapper</span> <span class="na">childComponent</span><span class="p">=</span><span class="si">{</span><span class="nx">MemoizedChild</span><span class="si">}</span> <span class="p">/>;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This has the same problem, although now only when <code class="language-plaintext highlighter-rouge">userId</code> changes, which is an improvement. But in practice it will often depend on more props that change more frequently.</p>
<p>The real solution is to zoom out a bit. You almost never want to remount a component of the same type in the same place, and when you genuinely do, you should signal it explicitly by changing the <code class="language-plaintext highlighter-rouge">key</code>. The solution, if you have the option of changing <code class="language-plaintext highlighter-rouge"><SomeWrapper></code>, is to pass a regular lower-cased function that’s not a component:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">Component</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FC</span> <span class="o">=</span> <span class="p">({</span><span class="nx">userId</span><span class="p">,</span> <span class="p">...</span><span class="nx">rest</span><span class="p">})</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">partialChild</span> <span class="o">=</span> <span class="p">(</span><span class="nx">rest</span><span class="p">)</span> <span class="o">=></span> <span class="p"><</span><span class="nc">Child</span> <span class="na">userId</span><span class="p">=</span><span class="si">{</span><span class="nx">userId</span><span class="si">}</span> <span class="si">{</span><span class="p">...</span><span class="nx">rest</span><span class="si">}</span> <span class="p">/>;</span>
<span class="k">return</span> <span class="p"><</span><span class="nc">SomeWrapper</span> <span class="na">childComponent</span><span class="p">=</span><span class="si">{</span><span class="nx">partialChild</span><span class="si">}</span> <span class="p">/>;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This needs no memoization at all: the elements the function returns always have <code class="language-plaintext highlighter-rouge">Child</code> as their <code class="language-plaintext highlighter-rouge">type</code>, so React rerenders instead of remounting. But <code class="language-plaintext highlighter-rouge"><SomeWrapper></code> will have to be updated internally to call <code class="language-plaintext highlighter-rouge">child(props)</code> instead of rendering <code class="language-plaintext highlighter-rouge"><Child {...props} /></code>.</p>
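<p>For illustration, here is a sketch of the internal difference (a hypothetical <code class="language-plaintext highlighter-rouge">SomeWrapper</code>, using a toy <code class="language-plaintext highlighter-rouge">createElement</code> to stay dependency-free; in real JSX the two branches would be <code class="language-plaintext highlighter-rouge"><ChildComponent {...props} /></code> versus <code class="language-plaintext highlighter-rouge">childComponent(props)</code>):</p>

```typescript
// Toy createElement standing in for React.createElement, so this runs
// without React. All names here are hypothetical.
type Props = Record<string, unknown>;
type RenderFn = (props: Props) => { type: unknown; props: Props };
const createElement = (type: unknown, props: Props) => ({ type, props });

const Child = (props: Props) => createElement("div", props);

// Before: the prop is used as a component type. Its identity changes on
// every parent render, so element.type changes → remount.
const renderAsComponent = (ChildComponent: RenderFn) =>
  createElement(ChildComponent, { foo: "bar" });

// After: the prop is called as a plain function. The element it returns
// always has type `Child`, so React sees a stable type → rerender.
const renderAsFunction = (childComponent: RenderFn) =>
  childComponent({ foo: "bar" });
```

<p>The identity of the render function no longer matters, because it never becomes an element <code class="language-plaintext highlighter-rouge">type</code> itself.</p>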
<p>As you can see, passing component definitions around as props should be considered an antipattern. Nevertheless I’ve come across it occasionally, especially in older React code before the modern idioms became established, or in code written by otherwise talented engineers who were new to React.</p>
<hr />
<h2 id="8-rerendering-during-animations">8. Rerendering during animations</h2>
<p>You should generally not trigger component rerenders on every frame of an animation. You may get away with it in small hobby apps and so think it is fine, and sometimes React tutorials will even do it to demonstrate how things work. But in a, uh, production-quality codebase, it’s a bad idea. It will fry the performance of anything else that is going on at the same time, and it will look a <em>lot</em> worse on your users’ weak machines than it looks on yours. So don’t do it.</p>
<p>Take a look at DevTools’ performance traces during an animation. Triggering React re-renders on every animation frame causes a <em>lot</em> of code to get run, all at once, at a time when you would specifically like the animation to proceed smoothly. React developers are often working on very fast machines, like M1/M2 Macbooks, so the animation will seem perfectly smooth <em>to them</em> … and then when you see the same code run on other people’s computers, the animation will stutter and glitch around, and their CPU fan will start spinning, because the React render loop is running so much code that it’s basically maxing out the CPU for the entire 16ms of each frame. So, again, just don’t do it.</p>
<p>If possible, animation should be entirely handled via CSS. But when it can’t be, it’s still okay to do it in JS. The trick is to do it all imperatively, without ever triggering a React state update. The basic technique looks like this:</p>
<ol>
<li>Set up a <code class="language-plaintext highlighter-rouge">ref</code> for any component that is going to be animated.</li>
<li>Manually update <code class="language-plaintext highlighter-rouge">ref.current.style</code> on each frame.</li>
<li>Perform state updates only at the start and end of the animation.</li>
<li>If the style updates get complicated, or involve many sub-components, use <code class="language-plaintext highlighter-rouge">useImperativeHandle</code> to abstract out functionality on each of the relevant subcomponents so they can handle their own animated styling.</li>
</ol>
<p>This is all a pain, of course. I think that in a perfect world you would be able to smoothly animate via React rerenders, but in my experience it really just doesn’t work right now. Maybe there are tricks I don’t know about or something.</p>
<p>By the way, don’t forget that this applies also to user-triggered events that run on every frame, such as:</p>
<ul>
<li>dragging components</li>
<li>triggering <code class="language-plaintext highlighter-rouge">mousemove</code> events</li>
<li>resizing the window, or resizing components</li>
<li>arguably, typing, but you might get away with it.</li>
</ul>
<p>None of these should ever trigger state updates on every frame. State updates at specific breakpoints, such as when the window is resized below a certain width or the mouse moves into a certain region, are fine, as long as they can’t thrash (e.g. an update that fires whenever the mouse is <em>on</em> the border will fire in a loop for as long as the user leaves the mouse there).</p>
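<p>The breakpoint case can be sketched like this (hypothetical names): the callback fires only when the value crosses the threshold, so a value sitting exactly at the boundary can’t loop:</p>

```typescript
// Sketch: turn a continuous per-frame value (window width, mouse position)
// into a state update that fires only on breakpoint transitions.
function makeBreakpointWatcher(
  breakpoint: number,
  onChange: (isBelow: boolean) => void
) {
  let wasBelow: boolean | null = null;
  return (value: number) => {
    const isBelow = value < breakpoint;
    if (isBelow !== wasBelow) { // transitions only — never fires in a loop
      wasBelow = isBelow;
      onChange(isBelow);
    }
  };
}
```

<p>The raw event handler stays imperative and cheap; only the rare transition reaches React state.</p>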
<p>Look, you don’t have to listen to me. Script your animations in React state updates for all I care. But I will just say: in engineering, the point of wisdom is to fix bugs by never writing them in the first place. So you may as well do the gritty work now instead of later.</p>
<hr />
<h2 id="9-redux">9. Redux</h2>
<p>Don’t use Redux. Use Recoil or Jotai or something. Thank me later.</p>
<p>Why? Well, this is contentious, I guess. But I don’t think Redux scales correctly for large multi-module codebases.</p>
<p>The problem is that it becomes unclear whose job it is to set up or populate parts of the store. You run into cases where components mount that expect a slice of the store to (a) exist, (b) have its reducers and middleware already set up, and (c) have its state pre-populated with the correct values. But the components don’t have a way of <em>forcing</em> that to happen, so they instead have to just assume it’s been done already… and usually it has, except when it occasionally hasn’t, and then you’ve managed to reinvent race conditions in a single-threaded language.</p>
<p>Whatever the right idiom for safe Redux usage is, it has to basically involve components atomically setting up whatever parts of the store they need if it’s not already there, so that everything can render whenever it feels like without worrying about what other code has already run. I’m not aware of a library that does this.</p>
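<p>That idiom might look something like this hypothetical <code class="language-plaintext highlighter-rouge">ensureSlice</code> helper (a sketch, not any real library’s API): registration is idempotent, so whichever component mounts first does the setup and everyone else no-ops:</p>

```typescript
// Hypothetical sketch of the missing idiom: components idempotently register
// the slice they need instead of assuming some earlier code already did.
type Slice<S> = { state: S; reducer: (state: S, action: { type: string }) => S };

const slices = new Map<string, Slice<any>>();

function ensureSlice<S>(
  name: string,
  initial: S,
  reducer: Slice<S>["reducer"]
): Slice<S> {
  if (!slices.has(name)) {
    // First caller wins; later callers get the existing slice untouched.
    slices.set(name, { state: initial, reducer });
  }
  return slices.get(name) as Slice<S>;
}
```

<p>A component would call this on mount and can then rely on the slice existing, regardless of what other code has run.</p>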
<p>The other problem with Redux is that it puts developers in an “all you have is a hammer so everything looks like a nail” situation, vis-à-vis state management. Since it’s so easy to put any moderately global state into Redux, you do, and you end up with things that should be local to a particular subtree or ‘slice’ of the app going through the global reducer chain and triggering every <code class="language-plaintext highlighter-rouge">useSelector()</code> subscription in the whole app to re-run. Using <code class="language-plaintext highlighter-rouge">React.Context</code>, which will only update its explicitly subscribed children when it explicitly changes, will be much more efficient for anything that updates quickly or needs fast feedback.</p>
<p>(If you’re still using class components, keep in mind that <code class="language-plaintext highlighter-rouge">mapStateToProps</code> is run on every subscribed component on every store update. In large apps there can be easily hundreds of store updates on loading (which is a bad idea but what can you do), so if <code class="language-plaintext highlighter-rouge">mapStateToProps</code> does anything non-trivial you pay the cost hundreds of times. So ideally you might want this to just do something dumb (‘extract a few props’) rather than something complicated (‘massage every prop into the form you want it in’). Although, tbh, if you try to memoize class components with Redux involved you’re just going to have a bad time no matter what.)</p>
<p>Especially egregious is using Redux to orchestrate UI feedback that happens on the frame-level, such as animation or just having buttons be very responsive. If you want something to be snappy, don’t call <code class="language-plaintext highlighter-rouge">dispatch()</code> and wait for it to finish. You’d be running potentially many thousands of lines of code to do something quickly! Why?! Stop it!</p>
<p>IMO a good rule of thumb is that Redux actions should be 1:1 with state changes that rebuild the UI in a non-trivial way, since those <em>necessarily</em> involve many components being aware of the update and having the chance to respond to it. For everything else, you can use Contexts.</p>
<hr />
<h2 id="10-logging-in-redux">10. Logging in Redux</h2>
<p>While I’m on the subject: if you do use Redux, definitely <em>don’t</em> do your logging with it. It will never be the case that the list of Redux actions is the same as, or even similar to, the list of events you want to log. Even if it was in the Redux sample app. Logging in Redux just pollutes everything – the Redux action list, the reducer list, the number of Redux actions that you might have to step through in the debugger – for no benefit, really. It takes something that’s otherwise easy to read and splays it out into a mess of indirection (although arguably that’s what Redux always does, lol).</p>
<p>Instead, either pass a <code class="language-plaintext highlighter-rouge">Logger</code> object around anywhere you need it, or set up a global <code class="language-plaintext highlighter-rouge">React.Context</code> with your logger in it and let any component grab it from there. I generally think you shouldn’t use middleware at all, just a store + reducers, but some people are bound to disagree.</p>
<hr class="thick" />
<p>Okay, that’s my list.</p>
<p>After all that, I should probably mention that I do love React. React is great. It would be foolish not to use it, or something like it, for web development in 2023. I don’t think it’s the final library we’ll be using to build interfaces on starships in 100 years, or whatever, but it’s better than everything that came before it, and the future of application development will probably look <em>more</em> Reacty, not less. Or rather, it’ll come in the form of delivering on React’s promises better than React itself does, for instance by not having these landmines everywhere.</p>
<div class="triangles">
<svg class="trianglesvg" xmlns="http://www.w3.org/2000/svg" height="20" width="20">
<polygon class="triangle" style="cursor:auto;" fill="#c3e281" stroke="#c3e281" stroke-width="2" points="6,4 6,16 16.39,10" />
</svg>
<svg class="trianglesvg" xmlns="http://www.w3.org/2000/svg" height="20" width="20">
<polygon class="triangle" style="cursor:auto;" fill="#c3e281" stroke="#c3e281" stroke-width="2" points="6,4 6,16 16.39,10" />
</svg>
<svg class="trianglesvg" xmlns="http://www.w3.org/2000/svg" height="20" width="20">
<polygon class="triangle" style="cursor:auto;" fill="#c3e281" stroke="#c3e281" stroke-width="2" points="6,4 6,16 16.39,10" />
</svg>
</div>
<hr />
<p>My other articles about React:</p>
<ol>
<li><a href="/2022/09/17/react.html">The Zen of React</a></li>
<li><a href="/2022/10/12/react-2.html">The Gist of Hooks</a></li>
<li><a href="/2023/04/25/react-mistakes.html">React Mistakes</a></li>
</ol>