Delta Function Miscellanea
Here’s some stuff about delta functions I keep needing to remember, including:
- the best way to define them
- how \(\delta'(x) = -\delta(x)/x\) sometimes works, and mostly why it doesn’t
- possible interpretations of \(x \delta(x)\)
- some discussion of the \(\delta(g(x))\) rule
- how \(\delta(x)\) works in curvilinear coordinates.
- some discussion of why the usual chain rule for \(\p(\theta^k)\) is invalid while \(\p(\theta) = \delta\) is fine
1. Definitional Stuff
Quibbles about Definitions
I don’t like the way most books introduce delta functions. IMO, if all the ways of defining something give rise to the same properties, then that object “exists” and you don’t really need to define it in terms of another object. Sure, you can construct a delta (distribution) as a limit of Gaussians with a fixed integral or whatever, but why would you?
\[\int \delta(x) f(x) \, dx = f(0)\]is just fine. (Well, you do have to include some other properties to ensure that \(\< \delta', f \> = - \< \delta, f' \>\), but that’s not important.)
The most common definition in physics is the definition in terms of the Fourier transform:
\[\delta(k) = \frac{1}{2 \pi}\int e^{-ikx} dx\]And I would emphasize that that is just an identity, not a definition, similar to \(\sin^2 + \cos^2 = 1\).
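(That identity is at least easy to poke at numerically: truncating the integral at \(\pm L\) gives the kernel \(\sin(Lk)/\pi k\), and integrating that against a smooth test function creeps toward \(f(0)\) as \(L\) grows. A rough sketch; everything here is just an ad-hoc name, not any standard API:)

```python
import numpy as np

k = np.linspace(-10, 10, 2_000_001)      # dense grid; the test function is ~0 outside this
dk = k[1] - k[0]
f = np.exp(-k**2) * np.cos(k)            # an arbitrary smooth, decaying test function; f(0) = 1

for L in (10.0, 100.0, 1000.0):
    # the truncated identity: (1/2pi) * int_{-L}^{L} e^{-ikx} dx = sin(Lk)/(pi*k)
    kernel = (L / np.pi) * np.sinc(L * k / np.pi)
    print(L, np.sum(kernel * f) * dk)    # creeps toward f(0) = 1 as L grows
```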
I have come to think that the best way to think about them, though, is that they are this:
\[\delta(x) = d \theta(x)\]Which pretty much tells you everything you need to know. You can also think of this as \(1_{dx}/dx\), where \(1_{dx}\) is the “oriented indicator” function for the interval \((x, x + dx)\), so it’s \(1\) if that interval crosses \(0\) in the positive direction, \(-1\) if negative, and \(0\) otherwise. I wrote another post about this idea, and it’s pretty much how I think about them now.
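Since this picture is literally checkable, here’s a minimal numerical sketch of it: integrate the finite difference quotient of \(\theta\) against a test function and watch it converge to \(f(0)\). (All names here are made up for illustration.)

```python
import numpy as np
from scipy.integrate import quad

theta = lambda x: 1.0 if x > 0 else 0.0      # step function, with theta(0) = 0 chosen arbitrarily
f = lambda x: np.cos(x) * np.exp(-x**2)      # an arbitrary smooth test function; f(0) = 1

for h in (1e-1, 1e-3, 1e-5):
    # d(theta)/dx as a finite difference quotient: it equals 1/h on (-h, 0] and 0 elsewhere
    dtheta = lambda x, h=h: (theta(x + h) - theta(x)) / h
    val, _ = quad(lambda x: dtheta(x) * f(x), -1, 1, points=[-h, 0])
    print(h, val)                            # approaches f(0) = 1.0
```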
I also don’t care at all about distinguishing the terms “function” vs. “generalized function” vs. “distribution”. For my purposes, everything is a distribution and demanding a value at a point is (possibly) a mistake. I imagine that in the far future we will use the word “function” for all of these things starting in high school and nobody will care.
Fourier Transform Interpretation
The Fourier transform of \(f(x)\) is given by:
\[\hat{f}(k) = \int f(x) e^{-ikx} dx\]One interpretation of the Fourier transform is something like:
\(e^{-ikx}\) is an orthogonal basis for frequency-space. We write \(f\) in this basis as \(\hat{f}(k)\) by projecting \(f(x)\) onto each component of the basis by taking an inner product \(\< f(x), e^{-ikx}\> = \int f(x) e^{-ikx} dx\).
Or the slightly-better version:
\(e^{ikx}\) is an orthogonal basis for frequency-space, and \(1/e^{ikx} = (e^{ikx})^* = e^{-ikx}\) is its dual basis. We write \(f\) in the frequency basis by using the dual basis to find each of its components: \(\< e^{-ikx} \| f \> = \int f(x) e^{-ikx} \d x\).
That’s a pretty good definition, and one that I hold dearly because it took me a long time to figure out in college, but I think there’s an even better interpretation waiting in the wings. Something like:
A function \(f\) is a generic object that doesn’t know anything about our choices of bases. The position-space implementation \(f\) is simply \(f\) written out in the position basis. The Fourier Transform of \(f(x)\) is interpreted as evaluating \(f\) at \(\hat{k}\), where \(\hat{k}\) is a frequency-value point rather than a position value, but the two bases are thought of as existing on equal footing and we can treat either as fundamental.
Then evaluating \(f\) in either basis happens by the same procedure:
\[\begin{aligned} f(x) &= \< f, \delta(x) \> \\ f(\hat{k}) &= \< f, \delta(\hat{k}) \> \\ \end{aligned}\]Which implies that \(\delta(x)\) is the position space delta and \(\frac{1}{2\pi} e^{-ikx}\) is the frequency space delta (as written out in position space). We could imagine expressing both \(f\) and \(\delta(\hat{k})\) in a third basis, neither position nor frequency, and that operation should still make sense.
(I suspect this is probably not quite the best way to think about things. It should probably be \(\< f, \hat{x} \> = \< \sum f(x) \hat{x}, \hat{x} \> = f(x)\), and then \(\< df, \delta(x) \> = \< \sum f(x) \d x, \delta(x) \>\) is a clumsy variant for use on one-forms? Something like that.)
Of course this fails completely with the actual rigorous interpretations of functions and Fourier transforms (certainly you have to figure out how to do something with poles). But it feels to me like it is at some level morally true.
2. Derivatives of \(\delta\) act like division (sometimes)
I always end up looking this up.
There is a sense in which a derivative \(\delta'\) of \(\delta\) acts like \(-\delta/x\). The justification is due to how it interacts with \(xf(x)\):
\[\begin{aligned} \int \delta'(x) [ x f(x) ] \d x &= -\int \frac{\delta}{x} [x f(x)] \d x \\ \int \delta(x) (-\p_x) [x f(x)] \d x &= - \int \delta(x) f(x) \d x \\ \int \delta(x) [f(x) + x f'(x)] \d x &= f(0) \\ f(0) + 0 f'(0) &= f(0) \end{aligned}\]It sorta seems like it ought to work for functions that aren’t of the form \(x f(x)\)? But maybe not? It’s a bit hard to see why. There are complicated arguments out there but I had a lot of trouble understanding them. But after some work I think I have figured out what is going on, which explains exactly when it is valid and when it isn’t.
The basic observation is that, when \(\delta'\) is integrated against a function \(f\), it evaluates \(f\) at exactly the right points to create a negative derivative of \(f\), due to the fact that \(\int \delta(x + a) f(x) \d x = \int \delta(x) f(x - a) \d x= f(-a)\). Like this (where I’m omitting the limits and treating \(\e\) as an infinitesimal that goes to \(0\) at the end):
\[\begin{aligned} \delta' f &= \frac{\delta(x + \e) - \delta(x)}{\e} f(x) \\ &= \delta(x) \frac{f(x - \e) - f(x)}{\e} \\ &= - \delta f' \\ \end{aligned}\]That’s the sense in which \(\delta' f = - \delta f'\). (Note how this derivation shows that it turns a right-derivative (\(f(x + \e) - f(x)\)) into a (negative) left-derivative (\(f(x) - f(x - \e)\)). That might matter if you had a function whose left- and right-derivatives were different, such as \(\| x \|\) at \(x = 0\)? But normally we just say the function doesn’t have a derivative at that point, or we represent it as a step function that captures both sides at once.)
Now, when \(f\) is of the form \(f = x g\), you get:
\[\begin{aligned} \int \delta' (x g) \d x &\approx \int \big[ \delta(x + \e) \frac{x g(x)}{\e} - \delta(x)\frac{x g(x)}{\e} \big] \d x \\ &= \int \delta(x + \e) \frac{x g(x)}{\e} \d x - \cancel{\int \delta(x) \frac{x g(x)}{\e} \d x} \\ &= \frac{- \e g(-\e)}{\e}\\ &\Ra - g(0) \\ \end{aligned}\]So there are two terms in the derivative, but because one of them is proportional to \(x g(x)\), it just disappears when integrated against \(\delta(x)\). The remaining term is therefore of the form \(-\frac{1}{x} (x g(x))\), which integrates to \(- g(0)\).
Strangely, in the surviving term, there is something funny going on: the limiting \(\e\) from the derivative is what actually becomes the \(1/x\) in \(-\delta/x\). Both variables get set to zero, so they end up being fungible, sorta? But also not really, because if you’re allowed to interchange \(x \lra \e\), then you could have done it in the second term also, which would cause it to become \(- \int \delta g \d x = - g(0)\) as well… yeah. Something is fishy. It feels like it would be more “honest” to write the identity as \(\delta'(x) \approx -\delta(x + \e) / \e\) rather than \(-\delta/x\). The derivation sets off some red flags for me; the fact that it only works against certain functions and conflates \(x \lra \e\) leads me to assume that it does not generalize well.
Point is: there is almost no sense in which \(\delta' = -\delta/x\) in general. It only works because of the way that the \(x\)-coefficient helpfully cancels out the other term inside \(\delta(x + \e) - \delta(x)\), and then the \(\e\) of the derivative is being conflated with the \(x\) of the delta function because they both end up going to zero. But sometimes it might be useful anyway, just to simplify expressions involving \(\delta'\).
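To make that concrete, here’s a crude numerical version using a narrow Gaussian as a stand-in for \(\delta\) (so: a sketch, not a proof, and all the names are ad hoc). Against \(x g(x)\) the derivative really does act like \(-\delta/x\); against a generic function it just gives \(-f'(0)\) like it’s supposed to.

```python
import numpy as np

eps = 1e-3
x = np.linspace(-0.1, 0.1, 400_001)                                   # dense grid around the spike
dx = x[1] - x[0]
delta  = np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))    # nascent delta
ddelta = -x / eps**2 * delta                                          # its derivative

g = np.cos(x) + 2.0                  # arbitrary smooth g, with g(0) = 3
f = x * g                            # f of the special form x*g(x)

print(np.sum(ddelta * f) * dx)       # int delta'(x) * (x g) dx               ~ -g(0) = -3
print(np.sum(-delta * g) * dx)       # int (-delta/x) * (x g) dx = -int delta g dx  ~ -3

h = np.cos(x) + 2.0                  # a generic test function, NOT of the form x*g(x)
print(np.sum(ddelta * h) * dx)       # ~ -h'(0) = 0; the "division" reading is nonsense here
```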
Nevertheless we can iterate this argument: in an integral of \(\delta^{(n)}(x)\) against \(x^n g(x)\), it is possible to replace the derivatives with division, according to:
\[\begin{aligned} \delta^{(n)}(x) &= \frac{n!}{(-x)^n} \delta(x) \end{aligned}\]Or:
\[\begin{aligned} (-\p)^n \delta(x) = \frac{n!}{x^n} \delta(x) \end{aligned}\]Or even—just cause it’s suggestive of something?—
\[(i \p)^n \delta(x) = \frac{n!}{(ix)^n} \delta(x)\]Or, using \(\overrightarrow{\p} \delta(x) =- \delta(x) \overrightarrow{\p}\):
\[\delta(x) \overrightarrow{\p}^n = \frac{\delta(x)}{(x^n / n!)}\]Which is nice because you can tell kinda what it’s doing: taking the derivative \(n\) times is equivalent to dividing out the \(n\)th Taylor series term \(x^n/n!\)… if you are also deleting the rest of the terms by integrating against \(\delta(x)\) afterwards (and, as before, the function it acts on has to start at order \(n\), i.e. \(f = x^n g\)):
\[\begin{aligned} \int \delta(x) \p^n f(x) \d x &= \int \delta(x) \p^n [f_n \frac{x^n}{n!} + f_{n+1} \frac{x^{n+1}}{(n+1)!} + \ldots + f_m \frac{x^m}{m!} + \ldots] \d x \\ &= \int \delta(x) \frac{f_n \frac{x^n}{n!} + f_{n+1} \frac{x^{n+1}}{(n+1)!} + \ldots + f_m \frac{x^m}{m!} + \ldots}{x^n / n!} \d x \\ &= \int \delta(x) [f_n + f_{n+1} \frac{x}{n+1} + \ldots + f_m \frac{x^{m-n}}{m!/n!} + \ldots] \d x \\ &= f_n \end{aligned}\]This is another reason that the whole construction feels wrong. The division in the rest of the terms makes these awkward \(f_m \frac{x^{m-n}}{m!/n!}\) terms that don’t really mean anything, but get canceled out afterwards. It just feels sketchy. I don’t trust it.
All that said, the strange rules of distributions mean that this is acceptable:
\[- x (\p_x \delta) = \delta\]Yet \(\p \delta \neq -\delta / x\). The reason is clear enough: multiplying by the \(x\) explicitly guarantees that the \(\p_x\) is integrated against a function of the form \(x f(x)\) (assuming that it meets the other requirements for being a Schwartz function, meaning that it does not, say, have an \(x^{-n}\) dependence that wouldn’t be canceled out.) Similarly, the acceptable higher-derivative forms are
\[\frac{x^n}{n!} (-\p)^n \delta(x) = \delta(x)\]The reason that \(-x (\p_x \delta) = \delta\) can’t be made equal to \(\p_x \delta = -\delta/x\) is that dividing through by \(x\) means multiplying by \(1/x\) on both sides, which is not even a distribution (because it is not integrable around \(x=0\)). The principal value version, which I guess you might write as \(\P \frac{1}{x}\), is a distribution—but that is essentially because it is basically the function \(1/x\) with the interval that crosses \(x=0\) deleted (and there can be open degrees of freedom in how you do this).
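Here’s a quick numeric check of the \(n = 2\) case of that, again with a narrow Gaussian standing in for \(\delta\) (same caveats as before: just a sketch with made-up names).

```python
import numpy as np

eps = 1e-3
x = np.linspace(-0.1, 0.1, 400_001)
dx = x[1] - x[0]
delta   = np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))  # nascent delta
d2delta = (x**2 / eps**4 - 1 / eps**2) * delta                       # its second derivative

f = np.exp(np.sin(x)) + 1.0                    # arbitrary smooth test function; f(0) = 2

# (x^2/2!) * (-d/dx)^2 delta should act just like delta itself:
print(np.sum((x**2 / 2) * d2delta * f) * dx)   # ~ f(0) = 2
print(np.sum(delta * f) * dx)                  # ~ f(0) = 2
```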
Even with that, though, multiplying distributions is not defined in general, the simplest example being that \(\theta^k(x) = \theta(x)\) but \(\p (\theta^k(x)) \? k \theta^{k-1}(x) \delta(x) \neq \delta(x)\). There is a complicated theory of when it is allowed but I don’t understand it yet, and in any case it clearly fails here for the reasons we’ve discussed.
Incidentally, the sometimes-valid formula \(f^{(n)}(0) = \int \frac{\delta(x)}{x^n/n!} f(x) \d x\) reminds me a lot of the Cauchy Integral formula for computing derivatives of analytic functions in \(\bb{C}\), which in its most suggestive form is written
\[f^{(n)}(0) = \frac{1}{2\pi i} \oint \frac{f(z)}{z^n/n!} \frac{dz}{z}\]This works basically because \(f(z)/(z^n/n!)\) changes the \(f_n z^n/n!\) term in the Laurent series to be constant, and then \(\frac{1}{2 \pi i} \oint dz/z^m = 1_{m=1}\) for reasons (basically because it’s a discrete Fourier transform in disguise). But the Cauchy version successfully extracts all integer power terms and fails on the non-integers. The delta version is only able to extract the lowest-order power.
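(That formula is easy to check numerically, by the way: sample the unit circle and apply the trapezoid rule, which is extremely accurate for periodic integrands. This is just the standard computation, nothing delta-specific, and the helper name is made up.)

```python
import numpy as np
from math import factorial

def cauchy_deriv(f, n, num=4096):
    # f^(n)(0) = (1/2pi i) * contour integral of f(z)/(z^n/n!) * dz/z over |z| = 1,
    # with z = e^{i theta}, so dz/z = i d(theta) and the i cancels the 1/(2 pi i)
    theta = np.linspace(0, 2 * np.pi, num, endpoint=False)
    z = np.exp(1j * theta)
    return factorial(n) / (2 * np.pi) * np.sum(f(z) * np.exp(-1j * n * theta)) * (2 * np.pi / num)

f = np.exp                        # f(z) = e^z, so f^(n)(0) = 1 for every n
for n in range(5):
    print(n, cauchy_deriv(f, n))  # all ~ 1, plus a negligible imaginary part
```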
Incidentally, although the distribution \(x \delta(x)\) is formally regarded as equal to zero as a distribution because \(\< x \delta(x), f(x) \> = \int x \delta(x) f(x) \d x = 0 \cdot f(0) = 0\) for all test functions \(f(x)\), it seems reasonable to me that it’s not really zero in some larger sense. Instead it’s a thing which extracts the lowest-order pole in a Laurent series: \(\< x^2 \delta(x) , f_{-2} 1/x^2 \> = f_{-2}\), for instance. In this sense it also gestures at the way contour integrals behave, but with significantly less power.
It feels like these computations are glimpses of some larger machinery that lets delta functions extract coefficients of a function in general, but some of the components of the machinery are just missing and so we’re seeing what’s left. But I’m not sure how to fix it.
Really it would be completely solved if only there were a way to “drop” infinite terms like \(\int \delta(x)/x^{m > 0} \d x\) in an integral. The Cauchy Principal Value can do it for odd integer powers, and the Hadamard Finite Part extracts only the finite part of functions with \(\sim 1/x^2\) dependence, and in general a term of \(1/x^{m}\) can be removed by modifying the integral to avoid the pole in a certain way… but those methods require knowing the actual order of \(m\) ahead of time; I don’t know / can’t think of anything that would just work on a generic function that you don’t know anything about.
All I can think of is that, if this was going to somehow work, you would have to be able to divide by zero and then just agree to drop any infinite term. Which I guess is somewhat similar to how you can sometimes extract the finite terms from divergent series, e.g. \(1 + 2 + 3 + \ldots = -\frac{1}{12} + O(\infty)\)? But I’m not really sure how you could justify doing it. Nevertheless I have a feeling that this is an important and good idea. A lot of things become possible if you are allowed to drop the infinities from integrals, such as treating \(\delta(x) x^n\) as a sort of Fourier-like transform on polynomial functions. But I’m not sure how to pull it off, so that will just have to wait.
3. \(\delta(g(x))\) becomes a sum over zeroes
I always end up looking this up too.
Since \(\delta(u)\) integrates \(f(u)\) to \(f(0)\), \(\delta(g(x))\) picks up a contribution at every zero of \(g\). Via \(u = g(x)\) substitution, we of course have:
\[\begin{aligned} \int \delta(g(x)) f(g(x)) \, dg(x) &= \int \delta(u) f(u) \, du \\ &= f(u)_{u = 0} \\ &= f(0) \\ \int \delta(g(x)) f(g(x)) \| g'(x) \| dx &= f(0) \\ \delta(g(x)) &= \sum_{x_0 \in g^{-1}(0)} \frac{\delta(x - x_0)}{\| g'(x_0) \|} \\ \end{aligned}\]Where the last line comes from running this substitution locally around each zero \(x_0\) of \(g\) separately. For instance:
\[\delta(x^2 - a^2) = \frac{\delta(x - a)}{2\| a \|} + \frac{\delta(x + a)}{2\| a \|}\]Somewhat harder to remember is the multivariable version:
\[\begin{aligned}\int f(g(\b{x})) \delta(g( \b{x})) \| \det \del g(\b{x}) \| d^n \b{x} &= \int_{g(\bb{R}^n)} \delta(\b{u}) f(\b{u}) d \b{u} \\ \int f(\b{x}) \delta(g (\b{x})) d \b{x} &= \int_{\sigma = g^{-1}(0)} \frac{f(\b{x})}{\| \nabla g(\b{x}) \|} d\sigma(\b{x}) \\ \end{aligned}\]Where the first line is the full change-of-variables version for \(g: \bb{R}^n \ra \bb{R}^n\), the second line is the scalar case \(g: \bb{R}^n \ra \bb{R}\), and the final integral is a surface integral, in some unspecified coordinates, over the zero set of \(g(\b{x})\).
In general there is a whole theory of “delta functions for surface integrals” which I’ve never quite wrapped my head around, but intend to tackle in a later article. Basically there’s a sense in which every line and surface integral, etc, can be modeled as an appropriate delta function. Wikipedia doesn’t talk about it much. There’s a couple lines on the delta function page, but there’s quite a bit more for some reason on the page for Laplacian of the Indicator.
I’d also love to understand the version of this for vector- or tensor-valued functions as well. What goes in the denominator? Some kind of non-scalar object? Weird.
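Backing up to the scalar rule for a second: the \(\delta(x^2 - a^2)\) example above is easy to sanity-check numerically with a narrow Gaussian standing in for \(\delta\) (a sketch, all names ad hoc):

```python
import numpy as np

eps, a = 1e-3, 1.5
x = np.linspace(-3, 3, 1_000_001)
dx = x[1] - x[0]
nascent = lambda u: np.exp(-u**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))   # ~ delta
f = np.cos(x) + 2.0                                         # arbitrary test function

lhs = np.sum(nascent(x**2 - a**2) * f) * dx                 # int delta(x^2 - a^2) f(x) dx
rhs = ((np.cos(a) + 2) + (np.cos(-a) + 2)) / (2 * abs(a))   # [f(a) + f(-a)] / 2|a|
print(lhs, rhs)                                             # agree to several digits
```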
By the way, there is a cool trick which I found in a paper called Simplified Production of Dirac Delta Function Identities by Nicholas Wheeler1 to derive \(\delta(ax) = \frac{\delta(x)}{\| a \|}\). We observe that \(\theta(ax) = \theta(x)\) if \(a > 0\) and \(\theta(ax) = 1 - \theta(x)\) if \(a < 0\). So we can compute \(\p_x \theta(ax)\) in two different ways:
\[\begin{aligned} \p_x \theta(ax) &= \p_x \theta(ax) \\ a \theta'(ax) &= \sgn(a) \theta'(x) \\ \delta(ax) &= \frac{1}{\| a \|} \delta(x) \end{aligned}\]That paper also observes another property I hadn’t thought about, which is that
\[\delta'(ax) = \frac{1}{a \| a \|} \delta'(x)\]Basically, the funny “absolute value” business only happens in the derivative of \(\theta(ax)\), not the rest of the chain. There are also ways of deriving the more general properties like the form of \(\delta(g(x))\) by starting from derivatives of \(\theta(g(x))\).
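Both scalings are easy to check numerically with a nascent (Gaussian) delta, if you don’t believe the funny \(1/(a \| a \|)\). (Again just a sketch; the names are for illustration.)

```python
import numpy as np

eps = 1e-3
x = np.linspace(-0.2, 0.2, 800_001)
dx = x[1] - x[0]
delta  = lambda u: np.exp(-u**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))   # nascent delta
ddelta = lambda u: -u / eps**2 * delta(u)                                      # its derivative

f = np.exp(x) + 1.0                            # test function: f(0) = 2, f'(0) = 1
a = -2.5                                       # negative, to exercise the sign business

print(np.sum(delta(a * x) * f) * dx,  2 / abs(a))          # delta(ax):  f(0)/|a|
print(np.sum(ddelta(a * x) * f) * dx, -1 / (a * abs(a)))   # delta'(ax): -f'(0)/(a|a|)
```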
Also: later on I write another post about delta functions which thinks about this stuff in some different ways.
4. \(\delta\) is weird in other coordinate systems
I am often reminding myself how \(\delta\) acts in spherical coordinates.
It is useful to think about \(\delta\) as being defined like this:
\[\delta(x) = \frac{1_{x =0 }}{dx}\]In the sense that it is designed to perfectly cancel out \(dx\) terms in an integral. It’s \(0\) everywhere, except at the origin where it perfectly cancels out the \(dx\). Point is, \(\delta\) always transforms like the inverse of how \(dx\) transforms. If you write \(dg(x) = \| g'(x) \| dx\), then of course \(\delta\) transforms as
\[\delta(g(x)) = \frac{\delta(x - g^{-1}(0))}{\| g'(x) \|}\]This, at least, makes it easy to figure out what happens in other coordinate systems.
By the way. The notation \(\delta^3(\b{x})\) customarily means that the function is separable into all the individual variables: \(\delta^3(\b{x}) = \delta(x) \delta(y) \delta(z)\). In other coordinate systems this doesn’t work: separating it requires introducing coefficients, as we’re about to see.
Here’s spherical coordinates:
\[\begin{aligned} \iiint_{\bb{R}^3} \delta^3(\b{x}) f(\b{x}) d^3 \b{x} &= f(x=0,y=0,z=0) \\ f(r=0, \theta=0, \phi=0) &= \int_0^{2 \pi} \int_0^{\pi} \int_0^\infty \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} f(r, \theta, \phi) r^2 \sin \theta \, dr \, d \theta \, d \phi \\ &= \int_0^{\pi} \int_0^\infty \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} f(r, \theta, 0) ( 2 \pi r^2 \sin \theta) \, dr \, d \theta \\ &= \int_0^\infty \frac{\delta(r)}{4 \pi r^2 } f(r, 0, 0) ( 4 \pi r^2 ) \, dr \\ &= f(0,0,0) \end{aligned}\]So:
\[\begin{aligned} \delta(x, y, z) &= \frac{\delta(r, \theta, \phi)}{r^2 \sin \theta} \\ &= \frac{\delta(r, \theta)}{2 \pi r^2 \sin \theta} \\ &= \frac{\delta(r)}{4 \pi r^2 } \end{aligned}\]There is some trickiness to all this, though. Be careful: the \(r\) integral is from \((0, \infty)\) instead of the conventional \((-\infty, \infty)\). Sometimes identities that you’re used to won’t work the same way if you are dealing with \(\delta(r)\) as a result. Also, it’s very unusual, but not impossible I suppose, to have functions that have a non-trivial \(\theta\) dependence even as \(r \ra 0\). I have no idea what that would mean and I don’t know how to handle it with a delta function.
I’ve occasionally also seen it written in this weird way, where using \(\cos \theta\) as the coordinate absorbs the \(\sin \theta\) in the denominator:
\[\delta(x,y,z) = \frac{\delta(r, \cos \theta, \phi)}{r^2}\]Here’s the polar / cylindrical coordinate version:
\[\delta^2(x, y) = \frac{\delta(r, \theta)}{r} = \frac{\delta(r)}{2 \pi r}\]Evidently in \(\bb{R}^n\), the denominators are the surface areas of \((n-1)\)-spheres of radius \(r\).
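One way to sanity-check those formulas numerically, sidestepping the half-line \(\delta(r)\) subtlety above, is to start from an honest \(n\)-dimensional Gaussian nascent delta and integrate its radial profile against the \(4 \pi r^2\) (or \(2 \pi r\)) measure. (A sketch with made-up names, nothing more.)

```python
import numpy as np

eps = 1e-2
r = np.linspace(0, 1, 1_000_001)
dr = r[1] - r[0]
f = np.cos(r) + 1.0                                  # radially symmetric test function; f(0) = 2

# genuine 3D nascent delta (an isotropic Gaussian), viewed as a function of r alone:
delta3 = np.exp(-r**2 / (2 * eps**2)) / (2 * np.pi * eps**2) ** 1.5
print(np.sum(delta3 * f * 4 * np.pi * r**2) * dr)    # ~ f(0) = 2, i.e. delta^3 = delta(r)/(4 pi r^2)

# same thing in 2D, against the 2 pi r measure:
delta2 = np.exp(-r**2 / (2 * eps**2)) / (2 * np.pi * eps**2)
print(np.sum(delta2 * f * 2 * np.pi * r) * dr)       # ~ f(0) = 2
```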
5. \(\sgn\) and such
Integrals of \(\sgn\) (integration constants omitted):
\[\begin{aligned} \sgn(x) &= \sgn(x) x^0\\ \int \sgn(x) &= \sgn(x) x = \| x \| \\ \int \int \sgn(x) &= \sgn(x) \frac{x^2}{2} = \frac{x \| x \| }{2} \\ \p^{-n} \sgn(x) &= \sgn(x) \p^{-n} (1) \\ &= \sgn(x) \frac{x^n}{n!} \\ &= \frac{x^{n-1} \| x \|}{n!} \\ \end{aligned}\]And the other way:
\[\begin{aligned} \sgn(x) &= 2 \theta(x) -1 \\ &= 2 (\theta(x) - \frac{1}{2}) \\ \p \sgn(x) &= 2 \delta(x) \\ \p^2 \sgn(x) &= 2 \delta'(x) \\ \end{aligned}\]
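Here’s a quick grid check of the repeated-antiderivative formula above (the \(\sgn(x) \frac{x^n}{n!} = \frac{x^{n-1} \| x \|}{n!}\) line), choosing the integration constants so that everything vanishes at \(0\). (Just a numerical sketch; the names are ad hoc.)

```python
import numpy as np
from math import factorial
from scipy.integrate import cumulative_trapezoid

x = np.linspace(-1, 1, 200_001)
i0 = len(x) // 2                          # index of x = 0 on this symmetric grid
y = np.sign(x)

for n in range(1, 4):
    y = cumulative_trapezoid(y, x, initial=0.0)
    y = y - y[i0]                         # pick the integration constant so the result is 0 at 0
    closed_form = np.sign(x) * x**n / factorial(n)     # = x^{n-1} |x| / n!
    print(n, np.max(np.abs(y - closed_form)))          # ~ 0, up to small grid error
```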
6. Derivatives of \(\theta^k\)
The regular rules of calculus don’t really apply to distributions, but it’s hard to tell why and how sometimes.
The simplest example is that you can’t do calculus on \(\theta\) in the normal way
\[\theta^k(x) = \theta(x)\]Because
\[\begin{aligned} \p \theta^k(x) &= \p \theta(x) \\ k \theta^{k-1} (x) \delta(x) &\? \delta(x) \\ \end{aligned}\]The LHS is the wrong one. But actually it’s a little confusing. Sure, you can just say “multiplication of distributions isn’t allowed” or something, but that’s not very satisfying. What actually breaks?
The basic reason it fails is that \(\p f^k = k f^{k-1} f'\) only applies if \(f\) is approximately equal to its linear approximation:
\[f^k(x+dx) \approx (f + f' dx)^k \approx f^k + k f^{k-1} f' \d x + \ldots\]Which \(\theta\) is not. Sure, you can write down a derivative-like thing, \(\theta'(x) = \delta(x)\), but you can’t use it in a linear approximation:
\[\theta(x + dx) \stackrel{?}{\approx} \theta(x) + \delta(x) \d x\]The problem is that you can’t evaluate this linear approximation and expect to get a meaningful answer, because \(\delta(x) \d x\) only gives you the \(\pm 1\) from crossing zero if you integrate over it. Which is to say, the \(+ \delta(x) \d x\) has to be interpreted not as multiplication but as integration:
\[\theta(x + dx) = \theta(x) + \int_x^{x + dx} \delta(y) \d y\]For the general case of a function with a linear approximation, this happens to work out to be the same as the usual expression
\[\begin{aligned} f(x + dx) &= f(x) + \int_x^{x + dx} f'(y) \d y \\ &\approx f(x) + \int_x^{x + dx} \underbrace{f'(x)}_{\text{replace } y \text{ with } x} \d y\\ &= f(x) + f'(x) [(x + dx) - (x)] \\ &= f(x) + f'(x) \d x \\ \end{aligned}\]But not for \(\theta\). Let’s write \(1(x, dx)\) for this integral:
\[1(x, dx) = \int_x^{x + dx} \delta(y) \d y = \begin{cases} 0 & x < 0, x + dx < 0 \\ +1 & x < 0, x + dx > 0 \\ -1 & x > 0, x + dx < 0 \\ 0 & x > 0, x + dx > 0 \end{cases}\]Which we’ll then abbreviate as \(1_{dx}\), a “signed indicator function” for the interval \((x, x + dx)\). (And also let’s abbreviate \(\theta_x\) for \(\theta(x)\) from now on). Then this is true:
\[\theta(x + dx) = \theta(x) + 1_{dx}\]The difference from a regular differentiable \(f(x + dx)\) is that you can’t drop all the later terms in the expansion:
\[\begin{aligned} [\theta_{x + dx}]^k &= [\theta_x + 1_{dx}]^k \\ &= \theta_x^k + k \theta_x^{k-1} 1_{dx} + {k \choose 2} \theta_x^{k-2} 1_{dx}^2 + \ldots + k \theta_x 1_{dx}^{k-1} + 1_{dx}^k \\ \end{aligned}\]Look what happens, for instance, for \(k=2\):
\[\begin{aligned} [\theta_x + 1_{dx}]^2 &= \theta^2_{x + dx} = \theta_x^2+ 2 \theta_x 1_{dx} + 1_{dx}^2 &\\ (0 + 0)^2 &= 0 = 0^2 + 2 (0) (0) + 0^2 & x < 0, x + dx < 0 \\ (0 + 1)^2 &= 1 = 0^2 + 2 (0) (1) + 1^2 & x < 0, x + dx > 0 \\ (1 - 1)^2 &= 0 = 1^2 + 2 (1) (-1) + (-1)^2 & x > 0, x + dx < 0 \\ (1 + 0)^2 &= 1 = 1^2 + 2 (1) (0) + (0)^2 & x > 0, x + dx > 0 \\ \end{aligned}\]You can see how the \(k \theta^{k-1} \delta\) term does end up contributing: the \(k\) doesn’t vanish; it does mean something. It’s just that either
- the \(1_{dx}\) is zero, so it does nothing,
- the \(\theta\) is zero, so all the terms except \(1_{dx}^k\) do nothing, or
- both are nonzero, so it does add \(\pm k\), but the rest of the terms in the series serve to cancel that out and keep the magnitude at \(0\)
This cleared things up quite a bit for me. I vaguely knew that \(\p (\theta^k) = \p (\theta)\) didn’t work in the normal way, but I’m happier understanding exactly why: it is because \(\theta(x + dx) = \theta(x) + \delta(x) \d x\) is not valid, and when we write it as \(\theta_x + 1(x, dx)\), the latter term is not proportional to \(\| dx \|\), and therefore the higher-order terms in the linear “approximation” to \(\theta_{x + dx}^k = (\theta_x + 1_{dx})^k\) are not of vanishing magnitude and so must be preserved in the series expansion.
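The case analysis is also trivial to brute-force for general \(k\), with \(\theta(0) = 0\) chosen arbitrarily (it doesn’t matter here, since the cases only involve nonzero arguments):

```python
from itertools import product

def theta(x):
    return 1 if x > 0 else 0         # theta(0) = 0 convention, chosen arbitrarily

for k in (2, 3, 4):
    for x, x_plus_dx in product((-1.0, 1.0), repeat=2):    # the four sign cases
        one_dx = theta(x_plus_dx) - theta(x)               # the signed indicator: +1, -1, or 0
        lhs = (theta(x) + one_dx) ** k                     # [theta_x + 1_dx]^k, with all terms kept
        rhs = theta(x_plus_dx)                             # theta_{x+dx}^k = theta_{x+dx}
        assert lhs == rhs
print("all cases check out")
```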
Ah well. Just another example of how the best way to think about the derivative is that it’s really
\[f'(x, dx) = f(x + dx) - f(x)\]And only in some cases can this be conveniently approximated as \(f'(x) dx\), and only in those cases is \(\p (f^k) = k f' f^{k-1}\) valid.
7. \(\theta(0)\)
What ought to be the value of the step function \(\theta(x)\) at \(x=0\)?
Normally one chooses either \(\theta(0) = 0\), \(\theta(0) = \frac{1}{2}\), or \(\theta(0) = 1\).
I think \(1/2\) is a bad idea and is right out. Afaik it is mostly used by people doing signal processing; the same people who define \(\sgn(0)\) to be \(0\). I don’t care for it much. Really \(\theta(x)\) defines a surface, and \(\theta(0)\) decides whether to include the boundary of the surface as part of the surface or not. There’s not much room for “half a boundary”. (Well okay, I have heard of half-edges in discrete differential geometry, as well as in the Length of a Potato paper… but for the most part it’s not what you want.)
The other two are somewhat arbitrary, and ideally your theory doesn’t care about which one you use (or at least—everything is symmetric under interchanging the two). In particular if
\[1_{x > 0} = \theta(x) = \begin{cases} 0 & x \leq 0 \\ 1 & x > 0 \end{cases}\]then you can still get the version that equals \(1\) with
\[1_{x \geq 0} = 1 - \theta(-x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases}\]Normally I prefer not to choose, and to try to avoid writing down expressions in which it matters. But this time I’d rather look more closely at it. So let’s define \(\theta(0) = 0\) for now and then proceed.
What I really want to look at is this funny guy, the “indicator for a point”, which can be written in terms of the step function indicators:
\[\begin{aligned} 1_{x = 0} &= 1_{x \geq 0} - 1_{x > 0} \\ &= (1 - \theta(-x)) - \theta(x) \\ &= 1 - \theta(x) - \theta(-x) \end{aligned}\]Which I will abbreviate as \(1_0\). And I suppose also interesting is its complement, the “indicator for everything but a point”
\[1_{x \neq 0} = \theta(x) + \theta(-x)\]Which I will abbreviate as \(1_{\neq 0}\).
It’s always fun to find an odd representation like these and then try to make the rules of algebra and calculus work out on it. At some level, they must work, so it usually leads you to natural extensions of the rules to objects they are not normally defined on (e.g. in the previous section).
The interesting thing about these is:
- Their non-trivial behavior happens on a set of measure zero
- But they more-or-less have an expression as a distribution, which means that their derivatives and Fourier transforms and such should also be definable as distributions?
- But I’ve never found anyone talking about them which means we get to figure out how they work ourselves.
Or at least, it should be possible to talk about their derivatives and Fourier transforms in some sense. I guess. I dunno. It’s fun to try at least.
The derivative of \(1_0\) ought to be:
\[\begin{aligned} 1_0' &= - \theta'(x) -\theta'(-x) \\ &= - \delta(x) + \delta(-x) \\ &= - \delta(x) + \delta(x) \\ &\? 0 \end{aligned}\]But that doesn’t seem quite right. The function does do something at \(x=0\); it’s just a bit hard to describe. Shouldn’t it be something like—using the \(f'(x, dx)\) notion of the previous section—this?
\[\begin{aligned} 1_0'(x, dx) = \begin{cases} 1 & x \neq 0, x + dx = 0 \\ -1 & x = 0, x + dx \neq 0 \\ 0 & \text{otherwise} \end{cases} \end{aligned}\]Where did we go wrong? Is \(\theta'(-x)\) actually \(\theta'(-x) = - \delta(-x) = - \delta(x)\)? Hm. Not quite.
The problem is that the distinction between these two objects \(1_{x > 0}\) and \(1_{x \geq 0}\) is only discernable if you distinguish left- and right- limits \(\ra 0\), which means we have to additionally distinguish their left and right derivatives at \(0\). The difference is that the two detect the delta function in slightly different places:
\[\begin{aligned} \p_+ \theta(x) &\approx \frac{\theta(x + dx) - \theta(x)}{dx} \\ &= \delta(x) \\ \p_- \theta(x) &\approx \frac{\theta(x) - \theta(x - dx)}{dx} \\ &= \delta(x - dx) \end{aligned}\]Nevermind that the \(dx\) should be canceled out in a limit. The only way to do anything like this and keep your sanity (so, without nonstandard analysis) is to keep everything in a finitary form and then take the limit at the end (or don’t, if you’re like me and have gradually converted to finitism).
Similarly:
\[\begin{aligned} \p_+ \theta(-x) &= -\delta(x + dx) \\ \p_- \theta(-x) &= -\delta(x) \end{aligned}\]Therefore we sorta need both terms to make this work:
\[\begin{aligned} \p_+ 1_0 = \p_+[1 -\theta(x) - \theta(-x)] &= \delta(x + dx) - \delta(x) \\ \p_- 1_0 = \p_-[1 -\theta(x) - \theta(-x)] &= \delta(x) - \delta(x - dx) \\ \end{aligned}\]The two are offset slightly, but (fortunately) say basically the same thing:
\[1_{x=0}'= \delta(x + dx) - \delta(x)\]And actually—check this out—we may as well write the \(\delta(x + dx)\) term as \(\delta(-x)\).
\[1_{x = 0}' = \delta(-x) - \delta(x)\]Normally \(\delta(x) = \delta(-x)\), but really that’s because we’re not distinguishing the left- and right- derivatives. If we are, \(\delta(x) = \theta'(x)\) is clearly infinite when “leaving” \(x=0\), while \(\delta(-x) = - \theta'(-x)\) is infinite when “entering” \(x=0\) (or vice-versa depending on how you define \(\theta(0)\), and also vice-versa if your \(dx\) is negative, since \(\delta(x) = \delta(x, dx) = d \theta(x, dx) = 1_{dx} / dx\)…).
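Here’s the same thing done on a coarse grid, just to watch the spike pattern come out (a finitary sketch; everything here is ad hoc):

```python
import numpy as np

dx = 0.25
x = np.arange(-2, 2 + dx / 2, dx)             # a coarse symmetric grid that contains x = 0 exactly
theta = (x > 0).astype(float)                 # theta(0) = 0 convention
one_0 = 1 - theta - theta[::-1]               # 1_{x=0}: on a symmetric grid, theta(-x) is theta reversed

forward_diff = (np.roll(one_0, -1) - one_0) / dx   # (f(x+dx) - f(x)) / dx; the wrap-around term is 0 anyway
for xi, di in zip(x, forward_diff):
    if di != 0:
        print(xi, di)   # prints +1/dx at x = -dx and -1/dx at x = 0: "delta(x+dx) - delta(x)"
```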
I don’t know of any situation where this would matter, but I thought it was interesting.
As for the Fourier transform, I got overwhelmed and gave up.
8. Miscellaneous Breadcrumbs
Things I want to remember but don’t have much to say about:
There is a thing called a Wavefront Set that comes from the subfield of “microlocal analysis”. It allows characterizing singularities in a way that, for instance, would extract which dimensions a delta function product like \(\delta(x) \delta(y)\) is acting in.
Among other things, the Wavefront Set allows you to say when multiplying distributions together is well-behaved: as I understand it, they have to not have singularities “in the same directions”. \(\delta(x) \delta(x)\), for instance, is not allowed. (I bet \((x \delta)^2\) is, though.) Here’s a nice paper on the subject.
There are further generalizations of functions called hyperfunctions, which are instead defined in terms of “the difference, across a line, of two holomorphic functions defined on either side of it” (which can encode e.g. a pole at the origin). Gut reaction: relies on complex analysis, which sounds annoying.
A current is a differential-form distribution on a manifold. Some day I’m going to have to learn about those, but for now, nah, I’m good.
-
This paper also has some interesting, if not entirely comprehensible, things to say about the existence of forward- and backwards- time propagators in QFT wave equation solutions. ↩