Delta Function Miscellanea

October 14, 2023

I keep needing to remember some of the strange properties of Delta Functions, so here’s a reference.


1. Definitional Stuff

Quibbles about Definitions

IMO: if all the ways of defining something give rise to the same properties, then that object “exists” and you don’t really need to define it in terms of another object. Sure, you can construct a delta (distribution) as a limit of another sequence, but why would you? \(\int \delta(x) f(x) \, dx = f(0)\) is just as good as everything else. (Well, you do have to include some other properties to ensure that \(\< \delta', f \> = - \< \delta, f' \>\), but that’s not important.)

The most common definition in physics is the definition in terms of the Fourier transform:

\[\delta(k) = \frac{1}{2 \pi}\int e^{-ikx} dx\]

But I still prefer to just claim that \(\delta(x)\) exists without any justification. Those are just implementations of it.
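
For what it’s worth, the Fourier implementation is easy to sanity-check numerically. Here’s a minimal sketch (the truncation, test function, and integration limits are my own choices, nothing canonical): cutting the integral off at \(\| x \| < L\) gives the Dirichlet kernel \(\sin(Lk)/\pi k\), and integrating that against a test function approaches \(f(0)\) as \(L\) grows.

```python
import numpy as np
from scipy.integrate import quad

# Truncating (1/2pi) * int e^{-ikx} dx at |x| < L gives the Dirichlet
# kernel sin(L k) / (pi k), a "nascent" delta function in k.
def dirichlet(k, L):
    # np.sinc(t) = sin(pi t) / (pi t), so this is sin(L k) / (pi k)
    return (L / np.pi) * np.sinc(L * k / np.pi)

f = lambda k: np.exp(-k**2)            # arbitrary test function, f(0) = 1

for L in (1, 5, 20):
    val, _ = quad(lambda k: dirichlet(k, L) * f(k), -10, 10, limit=500)
    print(L, val)                      # -> approaches f(0) = 1 as L grows
```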

I also don’t care at all about the use of the word “function” vs. “generalized function” vs. “distribution”. For my purposes, everything is a distribution and demanding a value at a point is (possibly) a mistake. Someday in the future (fingers crossed) we’ll teach distributions in high school calculus and we’ll call them functions and no one will care.

I’m sure I’m wrong about all my quibbles, but that’s okay. I’m just trying to compress this stuff into a very small and easy-to-use part of my brain, and that’s only possible if I can ignore all the technical details.


Fourier Transform Interpretation

The Fourier transform of \(f(x)\) is of course given by:

\[\hat{f}(k) = \int f(x) e^{-ikx} dx\]

One interpretation of the Fourier transform equation is something like:

The functions \(e^{-ikx}\), one for each frequency \(k\), form an orthogonal basis, and we construct \(\hat{f}(k)\) by projecting \(f(x)\) onto that basis, computing each projection via an inner product \(\< f(x), e^{-ikx}\>\).

And that’s fine, but I think there’s an even better interpretation:

\(f\) is a generic object that doesn’t know anything about our choices of bases. The Fourier Transform of \(f(x)\) is \(f\) evaluated at \(\hat{k}\), where \(\hat{k}\) is a frequency-value rather than a position value, but the two bases live on equal footing and we can treat either as fundamental.

It just so happens that it’s implemented as an integral transform. In particular, the transform is kinda like computing \(f \ast \delta(\hat{k})\), a sort of “convolution-like” operation between objects in different bases, whatever that means. We could imagine expressing both \(f\) and \(\delta(\hat{k})\) in a third basis, neither position nor frequency, and that operation should still make sense.
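
To make the “\(\hat{f}(k)\) is just an inner product” reading concrete, here’s a small sketch (the Gaussian test function and the grid are my choices) that computes \(\hat{f}(k)\) as a discretized inner product with \(e^{-ikx}\) and compares it to the known transform of a Gaussian, \(\int e^{-x^2} e^{-ikx} dx = \sqrt{\pi} e^{-k^2/4}\):

```python
import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2)                      # test function, sampled on a grid

def fhat(k):
    # <f, e^{-ikx}> as a plain Riemann sum
    return np.sum(f * np.exp(-1j * k * x)) * dx

for k in (0.0, 1.0, 2.0):
    exact = np.sqrt(np.pi) * np.exp(-k**2 / 4)
    print(k, fhat(k).real, exact)      # the two columns agree
```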


2. Derivatives of \(\delta\) act like division

I always end up needing to look this up.

The rules for derivatives of delta functions are most easily found by comparing their Fourier transforms. Since \(\F(\p_x x^n) = \F(n x^{n-1})\), we get:

\[\begin{aligned} \F(\p_x x^n ) &= \F(n x^{n-1} ) \\ (ik) (i \p_k)^{n} \delta_k &= n (i \p_k)^{n-1} \delta_k \\ - k \delta_k^{(n)} &= n \delta_k^{(n-1)} \\ \end{aligned}\]

So in particular:

\[\begin{aligned} -x \delta'_x &= \delta_x \\ -x \delta^{(2)}_x &= 2 \delta^{(1)}_x \\ x^2 \delta^{(2)}_x &= 2 \delta_x \\ &\vdots \\ (-x)^n \delta^{(n)}_x &= n! \delta_x \\ & \\ \delta'_x &= - \frac{\delta_x}{x} \\ \delta^{(n)}_x &= n! \frac{ \delta_x}{(-x)^n} \end{aligned}\]

Or, if you prefer:

\[\begin{aligned} \delta^{(n)}_x f(x) = n! \, \delta_x \frac{ f(x)}{(-x)^n} \end{aligned}\]

Etc. It’s easy to forget that the \(n!\) coefficient shows up there!

(If you use these, probably make sure everything has a \(\P\) around it.)
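
Here’s a quick numerical sanity check of the first couple of these, with a narrow Gaussian standing in for \(\delta\) (the Gaussian regularization and the test function are my own choices, nothing canonical):

```python
import numpy as np
from scipy.integrate import quad

eps = 1e-2                             # width of the "nascent" delta
delta   = lambda x: np.exp(-x**2 / (2*eps**2)) / (eps * np.sqrt(2*np.pi))
ddelta  = lambda x: -x / eps**2 * delta(x)                     # delta'
d2delta = lambda x: (x**2 / eps**4 - 1 / eps**2) * delta(x)    # delta''

f = lambda x: np.cos(x) + 2            # arbitrary test function, f(0) = 3
a, b = -10*eps, 10*eps                 # the spike lives well inside here

lhs1, _ = quad(lambda x: -x * ddelta(x) * f(x), a, b, limit=200)
lhs2, _ = quad(lambda x: x**2 * d2delta(x) * f(x), a, b, limit=200)
print(lhs1)    # ~ f(0)   = 3, i.e. -x delta' = delta
print(lhs2)    # ~ 2 f(0) = 6, i.e. x^2 delta'' = 2 delta
```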


3. Multiplications of \(\delta\) act like integrals?

For that matter, what about \(x^n \delta(x)\)?

Note: according to distribution theorists, \(x^n \delta(x) = 0\) for any \(n \geq 1\), because its integral against a test function is zero. But I don’t believe them.

First, a detour into complex analysis. The Laurent series of a complex function \(f(z)\) looks like this:

\[f(z) = \ldots + \frac{k!}{z^k} f^{(-k)}(0) + \ldots + \frac{2!}{z^2} f^{(-2)}(0) + \frac{f^{(-1)}(0)}{z} + f(0) + f^{(1)}(0)\, z + \frac{z^2}{2!} f^{(2)}(0) + \ldots + \frac{z^k}{k!} f^{(k)}(0) + \ldots\]

It’s typically understood to converge in a particular annulus \(r < \| z \| < R\), where \(R\) is the radius of convergence of the positive-power part around \(0\), and \(r\) is the radius of non-convergence of the negative-power part, which is to say, that part converges outside \(r\), “around \(\infty\)”. Most people do not include the factorials on the negative terms but I think they should.

The negative terms (well, all of them really, but you can just take the derivative for the positive ones also) are computed from the Cauchy integral formula:

\[f^{(n)}(0) = \frac{n!}{2 \pi i} \oint \frac{f(z)}{z^{n+1}} dz\]

The fact that this works seems miraculous, but in fact it is just a heavily obfuscated version of \(f(0) = \iint \delta(z, \bar{z}) f(z) \, d \bar{z} \^ dz\), which makes it obvious. Basically it turns out that \(\p_{\bar{z}}\frac{1}{z} = \delta(z, \bar{z})\) (possibly up to a factor of \(\pi\), depending on your conventions) and then you apply Stokes’ theorem.

It is best to think of these negative-power terms as “negative-power derivatives” of \(f\). They’re antiderivatives but evaluated on a closed loop that mostly cancels out, so they pick up the value at particular poles rather than simply “the sum over all space”.

It is easier to see what this does for \(n < 0\) if we use the Stokes’ theorem representation: \(f^{(-n)}(0) = \iint \frac{z^{n}}{n!} \delta(z, \bar{z}) f(z) \, d \bar{z} \^ dz\). Evidently they are extracting the constant terms of \(z^n f(z)\), aka, the coefficients of the \(f_{-n} \frac{n! }{z^n}\) terms in the Laurent series.1

It’s worth noting that these integrands \(z^n f(z)\) are the same ones used to compute the “moments” of a function. Up to some scalar coefficients, \(\int f(x) \, dx\) is the total mass, \(\int x f(x) \, dx\) is the center of mass, \(\int x^2 f(x) \, dx\) is the moment of inertia / variance, etc.

Anyway, I mention all this to say that, imo, \(f^{(n)}(0)\) is a perfectly useful concept for both positive and negative \(n\). For positive, it’s the \(n\)th derivative. For negative, it’s the \(n\)th residue / integral / moment / whatever. Since they’re contour integrals they’re not quite the same, but there is a close relationship. If \(\lim_{\| z \| \ra \infty} f(z) = 0\) then we can close the contour at infinity without adding any value to the integral so they should be literally the same.
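
As a sanity check on the Cauchy integral formula (and on the idea that the \(f^{(-n)}(0)\) are just Laurent coefficients, up to the factorial conventions above), here’s a sketch that computes \(a_n = \frac{1}{2\pi i} \oint f(z) z^{-n-1} dz\) on the unit circle and compares against the known series of \(e^z/z^2\). The choice of function and contour are mine; averaging over equally-spaced samples is just the trapezoid rule for the contour integral.

```python
import numpy as np
from math import factorial

f = lambda z: np.exp(z) / z**2      # Laurent series: sum_{m>=0} z^(m-2) / m!

N = 2048
theta = 2 * np.pi * np.arange(N) / N
z = np.exp(1j * theta)              # unit circle, traversed once

def laurent_coeff(n):
    # a_n = (1 / 2 pi i) oint f(z) z^{-(n+1)} dz
    #     = (1 / 2 pi)   int  f(e^{i t}) e^{-i n t} dt   (trapezoid rule below)
    return np.mean(f(z) * z**(-n))

for n in (-2, -1, 0, 1, 2):
    exact = 1.0 / factorial(n + 2)  # coefficient of z^n in e^z / z^2
    print(n, laurent_coeff(n).real, exact)
```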

We can compute a 1D derivative like this if we assume \(f(x) = f(0) + f'(0) x + \ldots\):

\[-\frac{1}{2} \P \int_{0^-}^{0^+} \frac{f(x)}{x^2} dx = \frac{1}{2} \lim_{\e \ra 0} \frac{f(\e) - f(- \e)}{\e} = f'(0)\]

(At least I think we can. Take anything I write that involves \(\P\) with a grain of salt, though, as the analysis goes over my head.)

It is essentially the residue of \(f(x)/x^2 = [f(0) + f'(0) x + \ldots] /x^2\), which extracts the “residue at infinity of the \(x\) term” by dividing through to move it to \(0\). We can even just find the value \(f(0)\) with:

\[\frac{1}{2} \P \int_{0^-}^{0^+} \frac{f(x)}{x} dx = f(0)\]

But doesn’t this look a lot like a delta function? Evidently, in some sense:

\[\int \delta_x f(x) dx = \frac{1}{2} \P \int_{0^-}^{0^+} \frac{f(x)}{x} dx\]

It gives another good reason to believe that \(\delta(x)/x = - \delta'(x)\):

\[-\int \frac{\delta_x}{x} f(x) dx = -\frac{1}{2} \P \int_{0^-}^{0^+} \frac{f(x)}{x^2} \; dx = f'(0)\]

And what about objects like \(x^n \delta_x\)? Well those seem like they must produce those integral values / moments / residues:

\[\int \frac{x^n}{n!} \delta_x f(x) dx \stackrel{?}{=} \frac{1}{2} \P \int_{0^-}^{0^+} \frac{x^{n-1}}{n!} f(x) dx \stackrel{?}{=} f^{(-n)}(0)\]

So assuming I haven’t gotten confused somewhere, I think it’s reasonable to view this as true, in analogy with the identities for \(\delta\) derivatives:

\[\begin{aligned} x \delta_x &= \delta^{(-1)}_x \\ \frac{x^n}{n!} \delta_x &= \delta^{(-n)}_x \end{aligned}\]

And the meaning is that it produces moments, the negative-power terms in the Laurent series:

\[\int \delta^{(-n)}_x f(x) \, dx = f^{(-n)}(0)\]

(Note: depending on how you define these, maybe they should have a factor of \((-1)^n\). Not sure.)


4. \(\delta(g(x))\) becomes a sum over zeroes

I always end up needing to look this up too.

Since \(\delta\) picks out the value of \(f\) at every zero of its argument, we of course have, via \(u = g(x)\) substitution:

\[\begin{aligned} \int \delta(g(x)) f(g(x)) \, dg(x) &= \int \delta(u) f(u) \, du \\ &=f(u)_{u = 0} \\ &=f(g^{-1}(0)) \\ \int \delta(g(x)) f(g(x)) \| g'(x) \| dx &= f(g^{-1}(0)) \\ \delta(g(x)) &= \sum_{x_0 \in g^{-1}(0)} \frac{\delta(x - x_0)}{\| g'(x_0) \|} \\ \end{aligned}\]

For instance:

\[\delta(x^2 - a^2) = \frac{\delta(x - a)}{2\| a \|} + \frac{\delta(x + a)}{2\| a \|}\]
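
Quick numerical check of that decomposition (the narrow Gaussian standing in for \(\delta\) and the test function are my own choices):

```python
import numpy as np

eps = 1e-2
delta = lambda u: np.exp(-u**2 / (2*eps**2)) / (eps * np.sqrt(2*np.pi))

a = 2.0
f = lambda x: np.exp(-x) + x**2               # arbitrary test function

# the spikes of delta(x^2 - a^2) sit at x = +/- a and are ~ eps/(2a) wide,
# so a dense grid resolves them fine
x = np.linspace(-4, 4, 400001)
dx = x[1] - x[0]

lhs = np.sum(delta(x**2 - a**2) * f(x)) * dx
rhs = (f(a) + f(-a)) / (2 * abs(a))           # the claimed decomposition
print(lhs, rhs)                               # these agree
```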

Somewhat harder to remember is the multivariable version:

\[\begin{aligned}\int f(g(\b{x})) \delta(g( \b{x})) \| \det \del g(\b{x}) \| d^n \b{x} &= \int_{g(\bb{R}^n)} \delta(\b{u}) f(\b{u}) d \b{u} \\ \int f(\b{x}) \delta(g (\b{x})) d \b{x} &= \int_{\sigma = g^{-1}(0)} \frac{f(\b{x})}{\| \nabla g(\b{x}) \|} d\sigma(\b{x}) \\ \end{aligned}\]

Where the final integral is a surface integral over the zeroes of \(g(\b{x})\), in whatever coordinates you care to put on them.

In general there is a giant model of “delta functions for surface integrals” which I’ve never quite wrapped my head around, but intend to tackle in a later article. Basically there’s a sense in which every line integral, surface integral, etc., can be modeled as an appropriate delta function. Wikipedia doesn’t talk about it much. There are a couple of lines on the delta function page, but there’s quite a bit more, for some reason, on the page for Laplacian of the Indicator.

I’d also love to understand the version of this for vector- or tensor-valued functions as well. What goes in the denominator? Some kind of non-scalar object? Weird.

By the way, there is a cool trick which I found in a paper called Simplified Production of Dirac Delta Function Identities by Nicholas Wheeler2 to derive \(\delta(ax) = \frac{\delta(x)}{\| a \|}\). We observe that \(\theta(ax) = \theta(x)\) if \(a > 0\) and \(\theta(ax) = 1 - \theta(x)\) if \(a < 0\). So we can compute \(\p_x \theta(ax)\) in two different ways:

\[\begin{aligned} \p_x \theta(ax) &= \p_x \theta(ax) \\ a \theta'(ax) &= \sgn(a) \theta'(x) \\ \delta(ax) &= \frac{1}{\| a \|} \delta(x) \end{aligned}\]

That paper also observes another property I hadn’t thought about, which is that

\[\delta'(ax) = \frac{1}{a \| a \|} \delta'(x)\]

Basically, the funny “absolute value” business only happens in the derivative of \(\theta(ax)\), not the rest of the chain. There are also ways of deriving the more general properties like the form of \(\delta(g(x))\) by starting from derivatives of \(\theta(g(x))\).
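
And here’s the same kind of check for the \(\delta'(ax)\) scaling, including the sign flip for negative \(a\) (again, the Gaussian regularization and the test function are mine):

```python
import numpy as np

eps = 1e-2
ddelta = lambda u: -u / eps**2 * np.exp(-u**2 / (2*eps**2)) / (eps * np.sqrt(2*np.pi))

f  = lambda x: np.sin(x) + x**2 + 1    # test function with f'(0) = 1
x  = np.linspace(-1, 1, 200001)
dx = x[1] - x[0]

for a in (2.0, -3.0):
    lhs = np.sum(ddelta(a * x) * f(x)) * dx
    rhs = -1.0 / (a * abs(a))          # -f'(0) / (a|a|), with f'(0) = 1
    print(a, lhs, rhs)                 # note the sign flip for a < 0
```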


5. \(\delta\) is weird in other coordinate systems

I am often reminding myself how \(\delta\) acts in spherical coordinates.

It is useful to think about \(\delta\) as being defined like this:

\[\delta(x) = \frac{1_{x =0 }}{dx}\]

In the sense that it is designed to perfectly cancel out \(dx\) terms in an integral. It’s \(0\) everywhere, except at the origin, where it perfectly cancels out the \(dx\). Point is, \(\delta\) always transforms like the inverse of how \(dx\) transforms. If you write \(dg(x) = \| g'(x) \| dx\), then of course \(\delta\) transforms as

\[\delta(g(x)) = \frac{\delta(x - g^{-1}(0))}{\| g'(x) \|}\]

This, at least, makes it easy to figure out what happens in other coordinate systems.

By the way. The notation \(\delta^3(\b{x})\) customarily means that the function is separable into all the individual variables: \(\delta^3(\b{x}) = \delta(x) \delta(y) \delta(z)\). In other coordinate systems this doesn’t work: separating it requires introducing coefficients, as we’re about to see.

Here’s spherical coordinates:

\[\begin{aligned} \iiint_{\bb{R}^3} \delta^3(\b{x}) f(\b{x}) d^3 \b{x} &= f(x=0,y=0,z=0) \\ f(r=0, \theta=0, \phi=0) &= \int_0^{2 \pi} \int_{0}^\pi \int_0^\infty \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} f(r, \theta, \phi) r^2 \sin \theta \, dr \, d \theta \, d \phi \\ &= \int_{0}^\pi \int_0^\infty \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} f(r, \theta, 0) ( 2 \pi r^2 \sin \theta) \, dr \, d \theta \\ &= \int_0^\infty \frac{\delta(r)}{4 \pi r^2 } f(r, 0, 0) ( 4 \pi r^2 ) \, dr \\ &= f(0,0,0) \end{aligned}\]

So:

\[\begin{aligned} \delta(x, y, z) &= \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} \\ &= \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} \\ &= \frac{\delta(r)}{4 \pi r^2 } \end{aligned}\]
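
Here’s a numerical check that \(\delta(r) / 4 \pi r^2\) really does reproduce \(f(0)\) against the \(r^2 \sin \theta \, dr \, d\theta \, d\phi\) measure. The radial half-Gaussian \(g(r)\) standing in for \(\delta(r)\) (normalized so \(\int_0^\infty g(r) \, dr = 1\)) and the test function are my own choices:

```python
import numpy as np
from scipy.integrate import nquad

eps = 0.05
# radial "nascent" delta, normalized so that int_0^inf g(r) dr = 1
g = lambda r: 2 * np.exp(-r**2 / (2*eps**2)) / (eps * np.sqrt(2*np.pi))

def f(x, y, z):
    # test function with some angular dependence; f(0,0,0) = 1
    return np.exp(-(x**2 + y**2 + z**2)) * (1 + x + y**2)

def integrand(r, theta, phi):
    x = r * np.sin(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.sin(phi)
    z = r * np.cos(theta)
    # delta^3 ~ g(r) / (4 pi r^2), times the volume element r^2 sin(theta)
    return g(r) / (4 * np.pi * r**2) * f(x, y, z) * r**2 * np.sin(theta)

val, _ = nquad(integrand, [[0, 1], [0, np.pi], [0, 2*np.pi]])
print(val)    # ~ f(0,0,0) = 1
```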

There is some trickiness to all this, though. Be careful: the \(r\) integral is over \((0, \infty)\) instead of the conventional \((-\infty, \infty)\). Sometimes identities that you’re used to won’t work the same way when you’re dealing with \(\delta(r)\), as a result. Also, it’s very unusual, but not impossible I suppose, to have functions that have a non-trivial \(\theta\) dependence even as \(r \ra 0\). I have no idea what that would mean and I don’t know how to handle it with a delta function.

I’ve occasionally also seen it written in this weird way, where the \(\cos \theta\) factor causes the \(\sin \theta\) in the denominator to disappear.

\[\delta(x,y,z) = \frac{\delta(r, \cos \theta, \phi)}{r^2}\]

Here’s the polar / cylindrical coordinate version:

\[\delta^2(x, y) = \frac{\delta(r, \theta)}{r} = \frac{\delta(r)}{2 \pi r}\]

Evidently in \(\bb{R}^n\), the denominators are the surface areas of \((n-1)\)-spheres of radius \(r\).


6. The Indicator Function \(I_x = x \delta(x)\)

Related to \(x^n \delta(x)\) up above…

Since \(\int \delta(x) f(x) \, dx = f(0)\), we could flip this around and say that this is the definition of evaluating \(f\) at \(0\). Or, more generally, we could say that integrating against \(\delta(x-y) \, dx\) is “what it means” to evaluate \(f(y)\).

This is a bit strange though. Why does evaluation require an integral? Maybe we need to define a new thing, the indicator function, which requires no integral:

\[I_x f = f(x)\]

The definition is

\[I_x = \begin{cases} 1 & x = 0 \\ 0 & \text{otherwise} \end{cases}\]

But that probably masks its distributional character. A better definition is that it’s just

\[I_x = x \delta(x)\]

Whereas \(\delta_x\) is infinite at the origin and is defined to integrate to \(1\), the \(I_x\) function is just required to equal one at the origin. Of course, its integral is \(0\). It could also be constructed like this:

\[I_x = \lim_{\e \ra 0^{+}} \theta(x + \e) - \theta(x - \e)\]

(Compare to the \(\delta\) version: \(\delta_x = \lim_{\e \ra 0^{+}} \frac{1}{x} [ \theta(x + \e) - \theta(x - \e)] \stackrel{?}{=} \P(\frac{I_x}{x})\).)

By either definition, \(I_x\) has zero derivative everywhere:

\[\p_x I_x = \delta(x) - \delta(-x) = 0 \\ \p_x (x \delta(x)) = \delta(x) + x \delta'(x) = \delta(x) - \delta(x) = 0\]

Compare to \(\p_x \delta(x) = \delta'(x) = -\frac{\delta(x)}{x}\). It sorta seems like there might be some even further generalization of functions which could distinguish this derivative from \(0\), since obviously \(x \delta(x)\) is not, in fact, constant at \(x = 0\). The derivative would be some distribution-like object which has the property that \(I'(x, dx) = \begin{cases} 1 & x + dx = 0 \\ 0 & \text{otherwise} \end{cases}\) … which is weird.

\(I_x\) also has this delta-function-like property:

\[\int I_x \frac{f(x)}{x} dx = f(0)\]

(In a “principal value” sense, of course.) It seems natural to consider a whole family of these with any power, \(x^k \delta(x)\). As we know, dividing \(\delta\) by powers of \(x\), as in \(\delta^{(n)}(x)\), produces derivatives: \((\delta(x)/x )f(x) = -f'(0)\). So I would guess that these positive-power \(x^n \delta(x)\) functions produce… integrals? But normally (cf. contour integration) integrals add up contributions from (a) boundaries and (b) poles (and really poles are a kind of boundary, topologically). These \(x^n \delta(x)\) terms only add up the contributions from poles but do nothing at infinity. Maybe that’s because in some sense the \(x^{-n} \delta(x) \propto \delta^{(n)}(x)\) terms are the ones that deal with poles at infinity?

I like this \(I_x\) object. It seems fundamental. Maybe we should just write \(f(x) = I_x f\) all the time.


7. Miscellaneous Breadcrumbs

Things I want to remember but don’t have much to say about:

There is a thing called a Wavefront Set that comes from the subfield of “microlocal analysis”. It allows ‘characterizing’ singularities in a way that, for instance, would extract which dimensions a delta function product like \(\delta(x) \delta(y)\) is acting in.

Among other things, the Wavefront Set allows you to say when multiplying distributions together is well-behaved: as I understand it, they have to not have singularities “in the same directions”. \(\delta(x) \delta(x)\), for instance, is not allowed. (I bet \((x \delta)^2\) is, though.) Here’s a nice paper on the subject.

There are further generalizations of functions called hyperfunctions, which are instead defined in terms of “the difference of two holomorphic functions on a line” (which can be e.g. a pole at the origin). Gut reaction: relies on complex analysis, which sounds annoying.

A current is a differential-form distribution on a manifold. Some day I’m going to have to learn about those, but for now, nah, I’m good.

  1. I haven’t closely checked these equations, although I mean to do so eventually. There might be missing coefficients or something. When I worked this out in a previous article I only did so for \(n=1\). 

  2. This paper also has some interesting, if not entirely comprehensible, things to say about the existence of forward- and backwards-in-time propagators in QFT wave equation solutions.