Delta Function Miscellanea
Here’s some stuff about delta functions I keep needing to remember, including:
 the best way to define them
 how \(\delta(x)/x =  \delta'(x)\)
 possible interpretations of \(x \delta(x)\)
 some discussion of the \(\delta(g(x))\) rule
 how \(\delta(x)\) works in curvilinear coordinates.
1. Definitional Stuff
Quibbles about Definitions
I don’t like the way most books introduce delta functions. IMO, if all the ways of defining something give rise to the same properties, then that object “exists” and you don’t really need to define it in terms of another object. Sure, you can construct a delta (distribution) as a limit of Gaussians fixed a fixed integral or whatever, but why would you? \(\int \delta(x) f(x) \, dx = f(0)\) is just fine. (Well, you do have to include some other properties to ensure that \(\< \delta', f \> =  \< \delta, f \>\), but that’s not important.)
The most common definition in physics is the definition in terms of the Fourier transform:
\[\delta(k) = \frac{1}{2 \pi}\int e^{ikx} dx\]And I would emphasize that that is just an identity, not a definition, similar to \(\sin^2 + \cos^2 = 1\).
I also don’t care at all about the use of the word “function” vs. “generalized function” vs. “distribution”. For my purposes, everything is a distribution and demanding a value at a point is (possibly) a mistake. I imagine that in the far future we will use the word “function” for all of these things starting in high school and nobody will care.
Fourier Transform Interpretation
The Fourier transform of \(f(x)\) is given by:
\[\hat{f}(k) = \int f(x) e^{ikx} dx\]One interpretation of the Fourier transform is something like:
\(e^{ikx}\) is an orthogonal basis for frequencyspace functions. We write \(f\) In this basis \(\hat{f}(k)\) by projecting \(f(x)\) onto each component of basis by taking an inner product \(\< f(x), e^{ikx}\> = \int f(x) e^{ikx} dx\).
That’s a pretty good definition, and one that I hold dearly because it took me a while to figure out in college, but I think there’s an even better interpretation waiting in the wings. Something like:
A function \(f\) is a generic object that doesn’t know anything about our choices of bases. The positionspace implementation \(f\) is simply \(f\) written out in the position basis. The Fourier Transform of \(f(x)\) is \(f\) evaluated at \(\hat{k}\), where \(\hat{k}\) is a frequencyvalue rather than a position value, but the two bases live on equal footing and we can treat either as fundamental.
It just so happens that it’s implemented as an integral transform. In particular, the transform is kinda like computing \(f \ast \delta(\hat{k})\), where the convolution acts like an operation that projects objects into different bases. whatever that means. We could imagine expressing both \(f\) and \(\delta(\hat{k})\) in a third basis, neither position nor frequency, and that operation should still make sense.
2. Derivatives of \(\delta\) act like division
I always end up needing to look this up.
The rule for derivatives of delta functions are most easily found by comparing their Fourier transforms. Since we know the that \(\p_x^n = n x^{n1}\) we can compare, using \(\F(x f) = i \p_k \hat{f}\) and \(\F(\p_x f ) = i k \hat{f}\):
\[\begin{aligned} \F(\p_x x^n ) &= \F(n x^{n1} ) \\ (ik) (i \p_k)^{n} \delta_k &= n (i \p_k)^{n1} \delta_k \\  k \delta_k^{(n)} &= n \delta_k^{(n1)} \\ \end{aligned}\]This shows us the relationship between \(\delta_k^{(n)}\) and \(\delta_k^{(n1)}\). Evidently they differ by a factor of \(\frac{k}{n}\). Repeating the process (and switching the variables back to \(x\) since we don’t need the Fourier transforms anymore) gives
\[\begin{aligned}  x \delta^{(n)} &= n \delta^{(n1)} \\ (x)^2 \delta^{(n)} &= n (n1) \delta^{(n2)} \\ (x)^3 \delta^{(n)} &= n (n1) (n2) \delta^{(n3)} \\ & \vdots \\ (x)^n \delta^{(n)} &= (n!) \delta^{(0)} \\ \end{aligned}\]Rearranging things, we get a bunch of useful identites:
\[\begin{aligned} \delta' &=  \frac{1}{x} \delta \\ \delta^{(2)} &= 2\frac{}{x^2} \delta \\ &\vdots \\ \delta^{(n)} &= \frac{ n!}{(x)^n} \delta \end{aligned}\]And also:
\[\begin{aligned} \frac{\delta}{x} &=  \delta' \\ \frac{\delta}{x^2} &= \frac{\delta^{(2)}}{2} \\ & \vdots \\ \frac{\delta}{x^n} &= \frac{(1)^n}{n!} \delta^{(n)} \end{aligned}\]Etc. I write this all out because it is easy to get confused by the factorials in there (and as a reference for myself…). Note that if \(\delta(x)\) is replaced with something like \(\delta(x  a)\), then all those denominators become \(\frac{1}{(xa)^n}\).
When you actually go to integrate these against a test function, it reveals an interesting relationship between delta functions and derivatives.
\[\begin{aligned} \int \frac{ n!}{(x)^n} \delta f \d x &= \int \delta^{(n)} f \d x \\ &= \int (1)^n \delta f^{(n)} \d x \\ \frac{n!}{(0)^n} f(0) &\stackrel{?}{=} (1)^n f^{(n)}(0) \\ \frac{n!}{0^n} f(0) &\stackrel{?}{=} f^{(n)}(0) \end{aligned}\]The left side, of course, is really a principal value \(\P \int \frac{ n!}{(x)^n} \delta f \d x\), which we imagine to mean, basically, “evaluate this at zero but very carefully”. To see what this could mean, imagine that \(f(x)\) has a Taylor series \(f(x) = f_0 + f_1 x + \frac{x^2}{2!} f_2 + \ldots\). Then the left side sorta extracts the \(f_n\) term, because all the lowerorder terms like \(\frac{0}{0^n}\) go to infinity (which we ignore?) and all the higherorder terms like \(\frac{0^{n + m}}{0^n} = 0^m\) go to zero.
Somehow this hints at the true magic of delta functions but I don’t quite see it yet.
3. Multiplications of \(\delta\) act like integrals?
What about \(x^n \delta(x)\) where \(n > 0\)?
According to the actual rigorous theory of distributions, \(x^n \delta(x) = 0\) for any \(n > 1\), because its integral against a test function is zero. But I don’t believe them. I think there’s more going on here.
To illustrate this point, consider extending the argument of the last section to a function with a Laurent series (a finite number of negativepower terms):
\[f(x) = \ldots + f_{2} \frac{2!}{x^2} + f_{1} \frac{1}{x} + f_0 + f_1 + f_2 \frac{x^2}{2!} + \ldots\]Then it is fairly clear that we could extract the negativepower terms in the same way:
\[f_{n} = \P \int \frac{x^n}{n!} \delta f \d x\]Assuming that, once again, all the powers of zero other than \(0^0 = 1\) “cancel out” somehow. So I would argue that \(\frac{x^n}{n!} \delta(x)\) is extracting residues the same way that \(\frac{n!}{x^n} \delta(x)\) extracts derivatives. It’s very nicely symmetric, if you’re willing to allow that \(x \delta \neq 0\).
What does it mean to extract a residue with a delta function? Well, it means that \(\P \int x \delta f d x\) is zero (or some other value we pretend to equal zero) unless \(f(x) \sim \frac{f_{1}}{x}\) at that point, in which case it extracts that coefficient \(f_{1}\). Residues aren’t quite the same thing as integrals, but what seems to happen is that, when you close your integration contours, residues are the only thing that’s left — like how in \(\bb{C}\), a closed integral picks up only the residues inside the integration boundary.
I guess this is useful in two ways. One, it’s the same idea of a residue that you get in complex analysis using the Cauchy integral formula:
\[f_{1}(z) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z} \d z\]but it’s extracted in a much more intuitive way. I have written before about how the Cauchy integral formula works. The short version is that if you apply Stoke’s theorm it turns into \(\iint \delta(\bar{z}, z) \d \bar{z} \^ \d z\), which relies on the fact that, for mysterious reasons, \(\p_{\bar{z}} \frac{1}{z} = 2 \pi i \delta(z, \bar{z})\).
Two, it makes it a lot easier to see how you would generalize the Cauchy integral formula, the concept of residues, and Laurent series to higher dimensions. Integrating against a delta function in the coordinate you care about — easy. Concocting a whole theory of contour integration — super weird and hard. Works for me.
The one weakness, of course, is that it’s rather unclear what to do with the fact that, evidently, \(\P \int \frac{x}{x^2} \delta \d x = 0\). Shouldn’t \(\frac{0}{0^2} = \infty\), or something like that? Not sure. But this is just one out of very many instances where it seems like math doesn’t handle dividing by zero correctly, so I guess we can file it away in that category and not worry about it for a while.
To summarize, we claim with some handwaving that:
\[\< \frac{x^n}{n!} \delta, f \> = f^{(n)}(0)\]Where the meaning of a “negative derivative” of a function is that it is a residue, ie, the \((n)\)‘th term in the Laurent series of \(f\) around \(x=0\).
4. \(\delta(g(x))\) becomes a sum over poles
I always end up needing to look this up too.
Since \(\delta(x)\) integrates \(f(x)\) to \(f(0)\) at every zero of the \(\delta\), we of course have, via \(u = g(x)\) substitution:
\[\begin{aligned} \int \delta(g(x)) f(g(x)) \, dg(x) &= \int \delta(u) f(u) \, du \\ &=f(u)_{u = 0} \\ &=f(g^{1}(0)) \\ \int \delta(g(x)) f(g(x)) \ g'(x) \ dx &= f(g^{1}(0)) \\ \delta(g(x)) &= \sum_{x_0 \in g^{1}(0)} \frac{\delta(x  x_0)}{\ g'(x_0) \} \\ \end{aligned}\]For instance:
\[\delta(x^2  a^2) = \frac{\delta(x  a)}{2\ a \} + \frac{\delta(x + a)}{2\ a \}\]Somewhat harder to remember is the multivariable version:
\[\begin{aligned}\int f(g(\b{x})) \delta(g( \b{x})) \ \det \del g(\b{x}) \ d^n \b{x} &= \int_{g(\bb{R})} \delta(\b{u}) f(\b{u}) d \b{u} \\ \int f(\b{x}) \delta(g (\b{x})) d \b{x} &= \int_{\sigma = g^{1}(0)} \frac{f(\b{x})}{\ \nabla g(\b{x}) \} d\sigma(\b{x}) \\ \end{aligned}\]Where the final integral is in some imaginary coordinates on the zeroes of \(g(\b{x})\).
In general there is a giant model of “delta functions for surface integrals” which I’ve never quite wrapped my head around, but intend to tackle in a later article. Basically there’s a sense in which every line and surface integral, etc, can be modeled as an appropriate delta function. Wikipedia doesn’t talk about it much. There’s a couple lines on the delta function page, but there’s quite a more for some reason on the page for Laplacian of the Indicator.
I’d also love to understand the version of this for vector or tensorvalued functions as well. What goes in the denominator? Some kind of nonscalar object? Weird.
By the way, there is a cool trick which I found in a paper called Simplified Production of Dirac Delta Function Identities by Nicholas Wheeler^{1} to derive \(\delta(ax) = \frac{\delta(x)}{\ a \}\). We observe that \(\theta(ax) = \theta(x)\) if \(a > 0\) and \(\theta(ax) = 1  \theta(x)\) if \(a < 0\). So we can compute \(\p_x \theta(ax)\) in two different ways:
\[\begin{aligned} \p_x \theta(ax) &= \p_x \theta(ax) \\ a \theta'(ax) &= \sgn(a) \theta'(x) \\ \delta(ax) &= \frac{1}{\ a \} \delta(x) \end{aligned}\]That paper also observes another property I hadn’t thought about, which is that
\[\delta'(ax) = \frac{1}{a \ a \} \delta(x)\]Basically, the funny “absolute value” business only happens in the derivative of \(\theta(ax)\), not the rest of the chain. There are also ways of deriving the more general properties like the form of \(\delta(g(x))\) by starting from derivatives of \(\theta(g(x))\).
5. \(\delta\) is weird in other coordinate systems
I am often reminding myself how \(\delta\) acts in spherical coordinates.
It is useful to think about \(\delta\) as being defined like this:
\[\delta(x) = \frac{1_{x =0 }}{dx}\]In the sense that it is designed to perfectly cancel out \(dx\) terms in integral. It’s \(0\) everywhere, except at the origin where it perfectly cancels the out \(dx\). Point is, \(\delta\) always transforms like the inverse of how \(dx\) transforms. If you write \(dg(x) = \ g'(x) \ dx\), then of course \(\delta\) transforms as
\[\delta(g(x)) = \frac{\delta(x  g^{1}(0))}{\ g'(x) \}\]This, at least, makes it easy to figure out what happens in other coordinate systems.
By the way. The notation \(\delta^3(\b{x})\) customarily means that the function is separable into all the individual variables: \(\delta^3(\b{x}) = \delta(x) \delta(y) \delta(z)\). In other coordinate systems this doesn’t work: separating it requires introducing coefficients, as we’re about to see.
Here’s spherical coordinates:
\[\begin{aligned} \iiint_{\bb{R}^3} \delta^3(\b{x}) f(\b{x}) d^3 \b{x} &= f(x=0,y=0,z=0) \\ f(r=0, \theta=0, \phi=0) &= \int_0^{2 pi} \int_{\pi}^\pi \int_0^\infty \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} f(r, \theta, \phi) r^2 \sin \theta \, dr \, d \theta \, d \phi \\ &= \int_{\pi}^\pi \int_0^\infty \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} f(r, \theta, 0) ( 2 \pi r^2 \sin \theta) \, dr \, d \theta \\ &= \int_0^\infty \frac{\delta(r)}{4 \pi r^2 } f(r, 0, 0) ( 4 \pi r^2 ) \, dr \\ &= f(0,0,0) \end{aligned}\]So:
\[\begin{aligned} \delta(x, y, z) &= \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} \\ &= \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} \\ &= \frac{\delta(r)}{4 \pi r^2 } \end{aligned}\]There is some trickiness to all this, though. Be careful: the \(r\) integral is from \((0, \infty)\) instead of the conventional \((\infty, \infty)\). Sometimes identities that you’re used to working won’t work the same way if you are dealing with \(\delta(r)\) as a result. Also, it’s very unusual, but not impossible I suppose, to have functions that have a nontrivial \(\theta\) dependence even as \(r \ra 0\). I have no idea what that would be mean and I don’t know how to handle it with a delta function.
I’ve occasionally also seen it written in this weird way, where the \(\cos \theta\) factor causes the \(\sin \theta\) in the denominator to disappear.
\[\delta(x,y,z) = \frac{\delta(r, \cos \theta, \phi)}{r^2}\]Here’s the polar / cylindrical coordinate version:
\[\delta^2(x, y) = \frac{\delta(r, \theta)}{r} = \frac{\delta(r)}{2 \pi r}\]Evidently in \(\bb{R}^n\), the numerators are related to the surface areas of nspheres.
6. The Indicator Function \(I_x = x \delta(x)\)
Related to \(x^n \delta(x)\) up above…
Since \(\int \delta(x) f(x) \, dx = f(0)\), we could flip this around and say that this is the definition of evaluating \(f\) at \(0\). Or, more generally, we could say that integrating against \(\delta(xy) \, dx\) is “what it means” to evaluate \(f(y)\).
This is a bit strange though. Why does evaluation require an integral? Maybe we need to define a new thing, the indicator function, which requires no integral:
\[I_x f = f(x)\]The definition is
\[I_x = \begin{cases} 1 & x = 0 \\ 0 & \text{otherwise} \end{cases}\]But that probably masks its distributional character. A better definition is that it’s just
\[I_x = x \delta(x)\]Whereas \(\delta_x\) is infinite at the origin and is defined to integrate to \(1\), the \(I_x\) function is just required to equal one at the origin. Of course, its integral is \(0\). It could also be constructed like this:
\[I_x = \lim_{\e \ra 0^{+}} \theta(x  \e)  \theta(x + \e)\](Compare to the \(\delta\) version: \(\delta_x = \lim_{\e \ra 0^{+}} \frac{1}{x} [ \theta(x  \e)  \theta(x + \e)] \stackrel{?}{=} \P(\frac{I_x}{x})\).)
By either definition, \(I_x\) has zero derivative everywhere:
\[\p_x I_x = \delta(x)  \delta(x) = 0 \\ \p_x (x \delta(x)) = \delta(x) + x \delta'(x) = \delta(x)  \delta(x) = 0\]Compare to \(\p_x \delta(x) = \delta'(x) = \frac{\delta(x)}{x}\). It sorta seems like there might be some even further generalization of functions which could distinguish this derivative from \(0\), since obviously \(x \delta(x)\) is not, in fact, constant at \(x = 0\). The derivative would be some distributionlike object which has the property that \(I'(x, dx) = \begin{cases} 1 & x + dx = 0 \\ 0 & \text{otherwise} \end{cases}\) … which is weird.
\(I_x\) also has this deltafunctionlike property:
\[\int I_x \frac{f(x)}{x} dx = f(0)\](in a “principal value” sense, of course) It seems natural to consider a whole family of these with any power, \(x^k \delta(x)\). As we know, dividing \(\delta\) by powers of \(x\) like \(\delta^{(n)}(x)\) produces derivatives: \((\delta(x)/x )f(x) = f'(0)\). So I would guess that these positivepower \(x^n \delta(x)\) functions produce… integrals? But normally (cf contour integration) integrals add up contributions from (a) boundaries and (b) poles (and really poles are a kind of boundary, topologically). These \(x^n \delta(x)\) terms only add up the constributions from poles but do nothing at infinity. Maybe that’s because in some sense the \(x^{n} \delta(x) \propto \delta^{(n)}(x)\) terms are the ones that deal with poles at infinity?
I like this \(I_x\) object. It seems fundamental. Maybe we should just write \(f(x) = I_x f\) all the time.
7. Miscellaneous Breadcrumbs
Things I want to remember but don’t have much to say about:
There is a thing called a Wavefront Set that comes from the subfield of “microlocal analysis”. It allows ‘characterizing’ singularities in a way that, for instance, would extract which dimensions a delta function product like \(\delta(x) \delta(y)\) is acting in.
Among other things, the Wavefront Set allows you to say when multiplying distributions together is wellbehaved: as I understand it, they have to not have singularities “in the same directions”. \(\delta(x) \delta(x)\), for instance, is not allowed. (I bet \((x \delta)^2\) is, though.) Here’s a nice paper on the subject.
There are further generalizations of functions called hyperfunctions, which are instead defined in terms of “the difference of two holomorphic functions on a line” (which can be e.g. a pole at the origin). Gut reaction: relies on complex analysis, which sounds annoying.
A current is a differentialform distribution on a manifold. Some day I’m going to have to learn about those, but for now, nah, I’m good.

This paper also has something interesting, if not entirely comprehensible, things to say about the existence of forward and backwards time propagators in QFT wave equation solutions. ↩