Delta Functions via Inverse Differentials
Here’s a dubious idea I had while playing with using delta functions to perform surface integrals. Also includes a bunch of cool tricks with delta functions, plus some exterior algebra tricks that I’m about 70% sure about. Please do not expect anything approaching rigor.
But first, some notations.
Notations
Our equations will involve lots of distributions, particularly \(\delta\) (the Dirac delta function), \(\theta\) (the Heaviside step function), and \(I\) (the indicator function). These get quite verbose if you write them all out with their arguments like \(\delta(x-a)\), so I will be using some shorthands to make things easier to read:
We will omit the arguments for functions and distributions when the meaning is clear from context: \(f\) will be written instead of \(f(x)\) when it is obvious that the argument is the variable \(x\).
\[f(x) \equiv f\](I will be using \(\equiv\) to mean that something is defined as being equal to something else, as opposed to \(=\) which means they are algebraically equal.)
We’d often like to also be able to omit an argument when it’s of the form \((x-a)\) as well, since this case is very common for distributions. We do this by moving the \(a\) into a subscript, writing \(\delta_a\) for \(\delta(x-a)\). In practice we’ll sometimes write this as \(\delta_{(a)}\) instead, to make it clear that \(a\) refers to a point.
\[\begin{aligned} \delta(x-a) &\equiv \delta_a \equiv \delta_{(a)} \\ \theta(x-a) &\equiv \theta_a \equiv \theta_{(a)} \\ \end{aligned}\]An indicator function \(1_{P}(x)\) is equal to \(1\) anywhere that the predicate \(P(x)\) is true and \(0\) elsewhere; we will usually omit the \((x)\). For instance \(1_{x = a}\) is \(1\) if \(x=a\) and \(0\) otherwise. We’ll also omit the \(x\) in the predicate and just write this as \(1_a\) to mean the indicator for the point \(x=a\). We also generalize this and allow the subscript to be other types of surfaces: an interval \((a,b)\), or a generic surface \(A\).
\[\begin{aligned} 1_{(a,b)} &\equiv 1_{x \in (a,b)} \\ 1_a &\equiv 1_{(a)} \equiv 1_{x = a} \\ 1_A &\equiv 1_{x \in A} \end{aligned}\]The generic surface is written with a capital letter to distinguish it from a point. We’ll also write integrals over an interval or a surface with a capital letter when the details of the surface don’t matter, like \(\int_I \d f\). Then the result of the integral is to evaluate \(f\) on the boundary of \(I\), which is written \(\p I\). If \(I\) is the interval \((a,b)\) then \(\p I\) is the pair of points \((b) - (a)\).
\[\int_I \d f = f \vert_{\p I}\]When we use the interval notation \(1_{(a,b)} \equiv 1_{x \in (a,b)}\), it is convenient to define \(1_{(b,a)} = -1_{(a,b)}\). Therefore we would really like a sort of “oriented indicator function” instead which can take the value \(\pm 1\) depending on whether \(a<b\) or \(a > b\). This is important: in this article an indicator function over a range is understood to be an oriented indicator function, rather than the usual indicator of mathematics which always has the value \(1\).
\[1_{(a,b)} \equiv \begin{cases} +1 & a < b \\ -1 & a > b \end{cases}\]We can take the subscript notation further by also allowing a subscript to contain a linear combination of surfaces, like \(1_{A + B} = 1_A + 1_B\). In every case the linear combination simply distributes over the function, so \(\delta_{A+B-C} = \delta_A + \delta_B - \delta_C\), etc. When we write linear combinations of points, we write the points with parentheses, as \((a) + (b)\), so that it doesn’t look like we’re adding the points as vectors \(a + b\).
\[\begin{aligned} 1_{(a) + (b)} &\equiv 1_a + 1_b \\ 1_{(b) - (a)} &\equiv 1_b - 1_a \\ \delta_{(b) - (a)} &\equiv \delta_b - \delta_a \\ \theta_{(b) - (a)} &\equiv \theta_b - \theta_a \\ \end{aligned}\]It’s occasionally nice to also permit this notation for integrals. If \(A\) and \(B\) are two surfaces, then
\[\int_{A - B} \d f \equiv \int_A \d f - \int_B \d f\]Finally, know that I am going to be using the words “function” and “distribution” interchangeably without concern for their usual technical differences.
1. Integrating with Distributions
Presumably you are aware that a delta function can be “used” to evaluate a function at a point:
\[\int_{\bb{R}} \delta_a f \d x \equiv \int_{\bb{R}} \delta(x-a) f(x) \d x = f(a)\]It turns out that a lot of other operations can be converted into integrals against distributions. For instance, an integral over a finite range \((a,b)\) can be written as integration against the indicator function for that range:
\[\int_a^b f' \d x = \int_{\bb{R}} 1_{(a,b)} f' \d x = f(b) - f(a)\]And we can write an (oriented) indicator function over a range as the difference of two step functions, \(1_{(a,b)} = \theta_a - \theta_b\). Integrating against either form gives the same result:
\[\begin{aligned} \int_{\bb{R}} 1_{(a,b)} f' \d x &= \int (\theta_a - \theta_b) f' \d x \\ &= \int_a^\infty f' \d x - \int_b^\infty f' \d x \\ &= [\int_a^b f' \d x + \cancel{\int_b^\infty f' \d x}] - \cancel{\int_b^\infty f' \d x}\\ &= \int_a^b f' \d x \\ &= f(b) - f(a) \end{aligned}\](This was the reason for using oriented indicator functions: defining \(1_{(a,b)} = -1_{(b,a)}\) makes it consistent with \(\theta_a - \theta_b = -(\theta_b - \theta_a)\).)
Another way of getting the result is by moving the derivative over onto the indicator function with integration-by-parts. Recall that \(\int_I u v' \d x = (u v) \vert_{\p I} - \int_I u' v \d x\). Here \(I = \bb{R}\), so \(\p I = (-\infty, +\infty)\), and the boundary terms vanish because \(1_{(a,b)}\) is \(0\) at \(\pm \infty\).
\[\begin{aligned} \int_{\bb{R}} 1_{(a,b)} [\p_x f] \d x &= \cancel{1_{(a,b)} f \vert_{-\infty}^{+\infty}} - \int_{\bb{R}} [\p_x 1_{(a,b)}] f \d x \\ &=\int_{\bb{R}} [(-\p_x) (\theta_a - \theta_b)] f \d x \\ &= \int_{\bb{R}} (\delta_b - \delta_a) f \d x \\ &= f(b) - f(a) \\ \end{aligned}\]The \(f' = \p_x f\) passes its derivative over to \((-\p_x) 1_{(a,b)}\), and the boundary terms vanish because \(1_{(a,b)}\) is \(0\) at \(\pm \infty\). The result is a pair of delta functions defined on the boundary of the underlying oriented surface \(\p(a,b) = (b) - (a)\). Note that the order of \(b\) and \(a\) switches because of the negative sign from the integration-by-parts: \((-\p)(\theta_b - \theta_a) = \delta_a - \delta_b\).^{1}
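Here is a quick numerical sanity check of the two routes to \(f(b) - f(a)\). It is only a sketch: the Gaussian stand-in for \(\delta\), the window \([-5, 5]\) standing in for \(\bb{R}\), and the test function \(f = \sin\) are all my own arbitrary choices, not anything canonical.

```python
import math

def nascent_delta(x, eps=1e-2):
    """Gaussian stand-in for delta(x); it integrates to 1 for any eps > 0."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def integrate(func, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

a, b = 0.5, 2.0
f, f_prime = math.sin, math.cos  # assumed test function and its derivative

# Integrating f' against the indicator 1_(a,b) is just the integral from a to b.
indicator_side = integrate(f_prime, a, b)

# After integration by parts: f integrated against (delta_b - delta_a).
# The window [-5, 5] stands in for the whole real line.
delta_side = integrate(
    lambda x: (nascent_delta(x - b) - nascent_delta(x - a)) * f(x), -5.0, 5.0
)

exact = f(b) - f(a)
print(indicator_side, delta_side, exact)
```

Both numbers should agree with `exact` up to the smoothing error of the Gaussian, which shrinks as `eps` does.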
This is all pretty neat to me. It’s intuitive if you think about it, but I didn’t notice it for a long time after I learned about \(\delta\). It turns out that a lot of properties of integration can be moved into the integrand itself by writing them as distributions. We’ll be doing a lot more of that in a minute.
First, though, consider how the resulting object \(\delta_{(b) - (a)}\) looks like a description of the boundary \((b) - (a)\), which makes \((-\p_x)\) look like an expression of the boundary operator, implemented on the distribution representation of \((a,b)\) that is given by \(1_{(a,b)}\). But it’s not quite right. The appropriate description of the boundary \(\p(a,b) = (b) - (a)\) should probably also be an indicator like \(1_{(b) - (a)}\), which is not the same thing as \(\delta_{(b) - (a)}\). Indicators have value \(1\) at their nonzero points, while \(\delta_{(b) - (a)}\) has value \(\infty\)-ish (delta functions do not really “have” a nonzero value but you know what I mean). Meanwhile, \(\delta_{(b) - (a)}\) is something you can actually integrate against, while \(1_{(b) - (a)}\) is nonzero only on a set of measure zero, so integrating against it would give zero:
\[\int 1_{(b) - (a)} f \d x \stackrel{!}{=} 0\]So what’s going on? What is the relationship between these two objects?
\[\begin{aligned} 1_{\p(a,b)} &= 1_{(b) - (a)} \\ &\stackrel{?}{\equiv} \\ (-\p) 1_{(a,b)} &= \delta_b - \delta_a \end{aligned}\]Or even these two?
\[1_{(a)} \stackrel{?}{\equiv} \delta_a\]What is the difference between an indicator for a point and a delta function for a point?
2. Integration with Inverse Differentials
I think the most intuitive answer is that the delta function may be regarded as an indicator function divided by the absolute value of a differential:
\[\boxed{\delta_a \? \frac{1_a}{\vert d x \vert}}\]The \(1_{(a)}\) is a normal indicator function. The object \(\vert dx \vert\) is basically “the magnitude of \(dx\)”. Unlike \(dx\), it is always positive when evaluated on a tangent vector. And for some strange reason it is in a denominator, which we will have to get used to.
(I first heard this idea in a book called “Burn Math Class” by Jason Wilkes, although his version omits the absolute value which I am pretty sure has to be there. Then I forgot about it for a while, before reinventing it and then thinking I had come up with it myself. Oops. Anyway, maybe it’s a bad idea for a reason I haven’t thought of? Or maybe it is already a thing somewhere and I just haven’t come across it? Dunno. But the reason I’ve written an article about it is that, the more I play with it, the more sense it keeps making to me.)
The basic idea is that splitting up a delta function into two pieces like this allows those pieces to be used in algebra with some very natural-looking rules. But it takes some getting used to. This is how it works in an integral:
\[\begin{aligned} \int_I \delta_a f(x) \d x &= \int_I \frac{1_a}{\vert d x \vert} f(x) \d x \\ &= \int_I 1_a f(x) \frac{d x}{\vert d x \vert} \\ &= f(a) \sgn(I) \\ \end{aligned}\]It is convenient to define
\[\widehat{dx} \equiv \frac{dx}{\vert dx \vert}\]Which is meant to be the “unit vector” version of \(dx\), akin to \(\hat{x} = \frac{\vec{x}}{\vert \vec{x} \vert}\) for regular vectors. If you are used to thinking of differential forms like \(\d x\) as functions from vectors to \(\bb{R}\), then its behavior is \(\widehat{dx}(\b{v}) = \frac{dx(\b{v})}{\vert dx(\b{v}) \vert}\). If you think of \(\d x\) in the Riemann-integral sense, as an infinitesimal interval \(x_{i+1} - x_i\), then \(\widehat{dx}\) is the sign of that interval \(\sgn(x_{i+1} - x_i)\) without its magnitude.
So we have
\[\begin{aligned} \int_I \delta_a f(x) \d x &= \int_I 1_a f(x) \widehat{dx} \\ &= f(a) \sgn(I) \\ \end{aligned}\]The idea is that \(\frac{dx}{\vert dx \vert}\) cancels out the magnitude of \(dx\), leaving only a “unit differential” \(\widehat{dx}\). We claim, because it seems to make sense, that the integral of a unit differential is trivial: it is simply \(\pm 1\) depending on the orientation of the range of integration. The resulting \(\sgn(I)\) is determined by whether the range of integration \(I\) was over a positively-oriented range, such as \((-\infty, \infty)\), versus a negatively-oriented range like \((\infty, -\infty)\) (we assume that \(a \in I\) though).^{2} Typically we just assume that all 1d integrals are over positively-oriented ranges unless otherwise specified, in which case we could simply omit the sign and write \(f(a)\), but I’m trying to be careful now because it will matter more in higher dimensions.
The unit differential is necessary for this object to act like a delta function. Because an integral against a delta function gives \(\int_I \delta(x) \d x = \pm 1\) depending on the orientation of \(I\), we cannot fully cancel out the value of \(dx\); we have to keep its sign. So we needed to invent something which cancels out its magnitude but leaves the direction.
The actual integration step is supposed to be easy once the integrand is proportional to \(1_a \widehat{dx}\). The \(1_a\) reduces the integral to a single point, while the \(\widehat{dx}\) integrates out to give the sign of the integration range at that point. I guess we just trust that that is how it works:
\[\int_{\bb{R}} 1_a f(x) \widehat{dx} = f(a)\]…but here’s some pseudo-theoretical justification anyway.
Often we implement integration as the limit of a Riemann sum, which decomposes the integration range into a bunch of oriented cells, each of which is described by a tangent vector \(\b{v}_i\) (which in 1d is often simplified to \(x_{i+1} - x_i\)). Then we evaluate \(f \d x\) on each of those tangent vectors and add up the result. In the limit this converges (for some well-behaved class of functions) to the definite value for the integral. We write this as \(\int_I f \d x = \lim \sum_{i \in I} f(x_i) \d x (\b{v}_i)\), where the limit takes the number of partitions to infinity.
In our scheme \(\vert dx \vert\) is an object that has \(\vert d x \vert (\b{v}) = \vert d x(\b{v}) \vert\) (similar to the integration measure in an arc-length integral), and \(\widehat{dx}\) is the object that has \(\widehat{dx} (\b{v}) =\frac{d \b{x} (\b{v})}{\vert d \b{x} (\b{v}) \vert}\), which in \(\bb{R}^1\) is simply \(\sgn (dx(\b{v}))\). In higher dimensions it will include a direction, but in \(\bb{R}^1\) there are only two possible directions, corresponding to \(\pm 1\).
Normally what allows the summation’s limit to converge to the integral value is that \(dx(\b{v}_i) \propto \vert \b{v}_i \vert\), so as the integration partitions’ size goes to zero with their total magnitude bounded by the length of the range, the sum of \(dx(\b{v}_i)\) is held proportional to that length. When using \(\widehat{dx}\) the value is \(\pm 1\), so obviously we can’t add up a bunch of these. Instead the only reason the integral “converges” is that the indicator \(1_a\) has limited the range to a single point, or to a finite number of points.
…probably, if I haven’t missed anything. But I find it intuitive: each point in the indicator \(1_{(a)}\) selects a point at which the integrand is evaluated, and then at that point the resulting contribution to the integral is \(\widehat{dx}\) times the orientation of the range at that point, giving \(f(a)\).
This construction is nice because it makes some of the common disclaimers that normally have to be made about \(\delta(x)\) really trivial:
- You can’t evaluate \(\delta(x)\) outside of an integral for the exact same reason that you can’t evaluate \(f(x) \d x\) outside of an integral: because it uses the symbol \(d x\) whose value comes from the integral. Yet you can do algebra with it, as long as you keep track of the \(d x\)s and \(\vert dx \vert\)s appropriately.
- \(\delta(x)\) doesn’t have a value at \(x=0\) because it depends on an invisible variable, \(1/\vert dx \vert\). The value is not exactly infinite: it’s “whatever is required to cancel out a \(dx\) and leave only its sign”.
- You can’t multiply two delta functions in the same variable by each other, like \(\delta(x) \delta(x) = \frac{1_{x=0} 1_{x=0}}{\vert dx \vert^2 }\), because the two copies of \(\vert dx \vert\) aren’t going to cancel out a single \(dx\) in the numerator and will leave an overall factor of \(1/\vert dx \vert\) that you have no way to integrate.
Also, compare this construction to a typical “nascent delta function” construction. Delta functions are often defined as the limit of a sequence of smooth functions whose integrals go, in the limit, to the behavior of a delta function. Usually the smooth functions are Gaussians, square cutoffs, or some other \(\frac{1}{\e} \eta(x/\e)\) for an integrable \(\eta\) that has \(\int \eta \d x = 1\). But these, I think, are trying to express exactly the idea of \(\frac{1_a}{\vert dx \vert}\). They want to make something which (1) in the limit is nonzero at exactly a single point, and which (2) perfectly cancels out the value of \(dx\) at that point, except for its sign, integrating to \(\pm 1\). Well why not just write that directly? (Well, it does not solve for the main reason you might be using nascent delta constructions, which is that you are demanding things be rigorously constructed out of classical functions for some reason. But I’m not concerned about that.)
Also, it makes \(\delta\)’s change-of-variable rules obvious. For instance \(\delta(-(x-a)) = \delta(x-a)\) is given by
\[\delta(-(x-a)) = \frac{1_{-x=-a}}{\vert {-dx} \vert} = \frac{1_{x=a}}{\vert dx \vert} = \frac{1_{x=a}}{\vert d(x-a) \vert} = \delta(x-a)\]And \(\delta(ax) = \delta(x)/\vert a \vert\) is given by
\[\delta(ax) = \frac{1_{ax = 0}}{\vert a \d x \vert} = \frac{1_{x=0}}{\vert a \vert \, \vert d x \vert} = \frac{\delta(x)}{\vert a \vert}\]And in general:
\[\begin{aligned} \delta(g(x)) &= \frac{1_{g(x) = 0}}{\vert \d g(x) \vert} \\ &= \sum_{x_0 \in g^{-1}(0)} \frac{1_{x_0}}{\vert g'(x_0) \d x \vert } \\ &= \sum_{x_0 \in g^{-1}(0)} \frac{1_{x_0}}{\vert g'(x_0) \vert \, \vert \d x \vert} \\ &= \sum_{x_0 \in g^{-1}(0)} \frac{\delta(x-x_0)}{\vert g'(x_0) \vert } \end{aligned}\]So that’s neat.
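The general rule is easy to test numerically. The sketch below assumes a Gaussian nascent delta and picks \(g(x) = x^2 - 1\) and \(f = \exp\) purely for illustration; the prediction is \(\sum_{x_0} f(x_0)/\vert g'(x_0) \vert\) over the roots \(x_0 = \pm 1\).

```python
import math

def nascent_delta(u, eps=1e-2):
    """Gaussian stand-in for delta(u); it integrates to 1 for any eps > 0."""
    return math.exp(-0.5 * (u / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def integrate(func, lo, hi, n=120_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def g(x):
    return x * x - 1.0  # assumed example: zeros at x = -1 and x = +1

def g_prime(x):
    return 2.0 * x

f = math.exp  # assumed test function

# Left side: integrate the smoothed delta(g(x)) against f over a window
# wide enough to contain both roots.
numeric = integrate(lambda x: nascent_delta(g(x)) * f(x), -3.0, 3.0)

# Right side: each root x0 contributes f(x0) / |g'(x0)|.
roots = (-1.0, 1.0)
predicted = sum(f(x0) / abs(g_prime(x0)) for x0 in roots)

print(numeric, predicted)
```

For this choice the prediction is \((e + e^{-1})/2\), and the numerical value should match it up to the smoothing error.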
Anyway, I don’t find the use of an extra \(dx\) in an integrand that strange. Here’s why:
We are very used to integrating integrands of the form \(dF = f(x) \d x\). But in full philosophical generality, an integrand could be written as \(dF = f(x, dx) = F(x + dx) - F(x)\). That’s an object that perfectly expresses the derivative of \(F\), rather than approximates it. It just so happens that in most cases we care about this can be written as a linear function in \(dx\), \(F(x + dx) - F(x) = f(x) \d x\), and then we can do calculus the normal way. But in some cases, such as when dealing with the derivative of a step function \(\theta(x)\), the value of \(F(x + dx) - F(x)\) depends not linearly on \(dx\), but on some other condition, such as whether \(0 \in (x, x + dx)\). In that case you end up with an integrand that is not proportional to \(dx\) but depends on it in some other way, which is how you get identities like \(\theta' = \delta\).
Well, extending that argument: for the general case of \(dF = f(x,dx) = F(x + dx) - F(x)\), there is nothing preventing it from having any kind of weird functional dependence on \(dx\). So why not \(\frac{1}{\vert dx \vert}\) or something else? Sure, it might be hard to figure out how to integrate something like \(dF = a \d x ^2 + b \d x + c\)… but it is still a reasonable object to think about. And in this case, we do have a way of integrating it; just, it’s an unfamiliar way. Fine with me!
3. The Multivariable Case
In more dimensions this notation gives a lot of results for free, but there is a very important and weird caveat.
At first it seems like a product of two delta functions, which are each an inverse differential, should turn into a product of two inverse differentials:
\[\delta(x) \delta(y) \? \frac{1_{x=0}}{\vert dx \vert} \frac{1_{y = 0}}{\vert dy \vert}\]But this doesn’t work! The problem is, what if we have a product of two delta functions that overlap in direction, like this?
\[\int \delta(x) \delta(x+y) f(x,y) \d x \d y\]In an integral this should evaluate at the point that satisfies \(x=0\) and \(x+y=0\), meaning that \(x=y=0\) and the result is \(f(0, 0)\). But because \(\vert d(x+y) \vert = \sqrt{2}\), in the indicator notation we would get \(f(0, 0)/\sqrt{2}\) if we naively divide through by \(\vert dx \vert \, \vert dx + dy \vert\). That doesn’t work. The problem is that the denominator of \(\delta(x) \delta(x + y)\) should cancel out the magnitude of a \(dx \^ d(x+y) = dx \^ dy\) in the numerator. So it is very important that the denominator in this new notation becomes a wedge product of all the terms in the delta functions:
\[\delta(x) \delta(x + y) \stackrel{!}{=} \frac{1_{x=0} 1_{y=0}}{\vert dx \^ dy \vert}\]Which means that its behavior in an integral is this:
\[\begin{aligned} \int_{\bb{R}^2} \delta(x) \delta(x + y) f(x,y) dx \^ dy &= \int_{\bb{R}^2} 1_{x=x+y=0} f(x,y) \, \frac{dx \^ dy}{\vert dx \^ dy \vert} \\ &=\int_{\bb{R}^2} 1_{x=y=0} f(x,y) \widehat{dx \^ dy} \\ &= f(0, 0) \\ &\neq \int_{\bb{R}^2} 1_{x=y=0} f(x,y) \, \widehat{dx} \^ \widehat{dy} \; \; \text{ (wrong!)} \end{aligned}\]Weird, but as far as I can tell necessary? Basically, \(\delta(x) \delta(x+y)\) needs to cancel out the magnitudes of \(dx \^ d(x+y) = dx \^ dy\). Since the numerator combines with a wedge product, the denominator has to also. In general, since \(\int \delta(f) \delta(g) \d f \^ d g\) ought to equal \(\pm 1\), the delta functions need to be proportional to \(\frac{1}{df \^ dg}\), even if \(df\) and \(dg\) are not orthogonal (although they cannot be parallel or we’d end up dividing by zero).
This will take some getting used to. Evidently the denominators are not just scalars: they are actually something like “differential forms” as well. Maybe they are “negative-grade absolute differential forms”? Or maybe the object \(\delta(x) \delta(y)\) should be regarded as \(\delta^2(x,y)\) and therefore its denominator is a compound object \(\vert d^2(x,y) \vert\) from the start, and factoring it into \(\delta(x) \delta(y)\) only “works” when those terms are orthogonal directions? Or maybe delta functions really act like measures and it’s even more not okay to regard them as functions? Not sure. I really don’t know the best way to explain it.
In case you need more convincing, note that it is well-known (although somewhat hard to find) that the change-of-variables formula for a multivariable delta function with argument \(\b{u}(\b{x}): \bb{R}^n \ra \bb{R}^n\) is
\[\delta(\b{u}(\b{x})) = \frac{\delta(\b{x} - \b{u}^{-1}(0))}{\vert \det (\p\b{u} / \p\b{x}) \vert}\]That is, the denominator is the determinant of the Jacobian (hate that name) of \(\b{u}\), \(\p\b{u} / \p\b{x}\), and a determinant is not the product of all the individual magnitudes. That is basically what we’re dealing with here as well, only we’ve factored \(\delta(x, x+y)\) as \(\delta(x) \delta(x+y)\), which makes this combining-with-\(\^\) behavior look more strange.
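If you would rather see numbers than trust the algebra, here is a rough 2d check of the \(\delta(x) \delta(x+y)\) example. It is a sketch under my own assumptions: Gaussian nascent deltas of width `EPS`, a finite window standing in for \(\bb{R}^2\), and an arbitrary test function with \(f(0,0) = 2\). The wedge rule predicts \(f(0,0)\); the naive product-of-magnitudes rule predicts \(f(0,0)/\sqrt{2}\).

```python
import math

EPS = 0.05  # width of the Gaussian stand-in for delta (an arbitrary choice)

def nascent_delta(u, eps=EPS):
    """Gaussian stand-in for delta(u); it integrates to 1 for any eps > 0."""
    return math.exp(-0.5 * (u / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def f(x, y):
    return 2.0 + x + y * y  # assumed test function, f(0, 0) = 2

def integrate_2d(func, half_width=0.5, n=400):
    """Midpoint rule on the square [-half_width, half_width]^2."""
    h = 2.0 * half_width / n
    pts = [-half_width + (i + 0.5) * h for i in range(n)]
    return h * h * sum(func(x, y) for x in pts for y in pts)

numeric = integrate_2d(
    lambda x, y: nascent_delta(x) * nascent_delta(x + y) * f(x, y)
)

# Jacobian of (u, v) = (x, x + y) with respect to (x, y) has rows (1, 0), (1, 1).
det_jacobian = 1.0 * 1.0 - 0.0 * 1.0
wedge_prediction = f(0.0, 0.0) / abs(det_jacobian)       # divide by |dx ^ d(x+y)|
naive_prediction = f(0.0, 0.0) / (1.0 * math.sqrt(2.0))  # divide by |dx| |d(x+y)|

print(numeric, wedge_prediction, naive_prediction)
```

The numerical value lands next to the wedge prediction and nowhere near the naive one.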
Anyway, we will have to live with this.
(Hopefully it goes without saying that I’m rather unsure of all this. But whatever, let’s see what happens.)
Here’s what happens in an integral:
\[\begin{aligned} \int_V \delta^3(\b{x} - \b{a}) f(\b{x}) \d^3 \b{x} &= \int_V 1_{\b{a}} f(\b{x}) \frac{d^3 \b{x}}{\vert d^3 \b{x} \vert} \\ &= \int_V 1_{\b{a}} f(\b{x}) \, \widehat{d^3 \b{x}} \\ &= \sgn(V) f(\b{a}) \end{aligned}\]The \(\sgn(V)\) comes from whether the integration is performed over a positively- or negatively-oriented volume. (Note that \(d^3 \b{x}\) is just a shorthand for \(dx \^ dy \^ dz\). I prefer to not write this as \(dV\) because it can be useful to reserve \(V\) as the label of a specific volume, like we’ve done here, rather than all of space, since \(V\) may in general be oriented differently than \(d^3 \b{x}\) is.)
We can also integrate a 2d delta function in \(\bb{R}^3\). These turn some, but not all, of the terms in the differential into a unit differential.
\[\begin{aligned} \int_V \delta(x) \delta(y) f(\b{x}) \d^3 \b{x} &= \int_V 1_{x=y=0} f(\b{x}) \, \frac{dx \^ dy \^ dz}{\vert dx \^ dy \vert} \\ &= \int_V 1_{x=y=0} f(\b{x}) \, \widehat{dx \^ dy} \^ dz \\ &= \sgn(V_{xy}) \int_{V_z} f(0, 0, z) \, dz\\ \end{aligned}\]The sign is strange. There’s not really a canonical way to choose it. We need the overall integral when the \(z\) coordinate is completed to have the right sign, but really we could either take out a factor of \(\sgn(V)\) or change the orientation of the \(z\) integral. Consider the simplest case, where \(V\) is the product of three ranges, like \(V = [-\infty, \infty]^{3}\). Then we imagine “factoring” it into two parts, as \(V = V_{xy} \times V_z\), and we imagine that this factorization preserves its orientation. Then it is clear that we can either extract the overall sign of \(V\) in the first integral, or we can extract whatever sign we want for the \(V_{xy}\) integral so long as \(\sgn(V_{xy}) \times \sgn(V_z) = \sgn(V)\). Above I’ve allowed myself to assume that \(V_z\) is positively oriented afterwards, so all of the sign of \(V\) is captured in the \(V_{xy}\), but I admit that this is all pretty sketchy. And of course this will be weird when \(V\) is not a cuboid (that is, a rectangular prism). But it’s a decent mental model anyway.
And here’s a single delta:
\[\begin{aligned} \int_V \delta(x) f(\b{x}) \d^3 \b{x} &= \int_V 1_{x=0} f(\b{x}) \, \frac{dx \^ dy \^ dz}{\vert dx \vert} \\ &= \int_V 1_{x=0} f(\b{x}) \, \widehat{dx} \^ dy \^ dz \\ &= \sgn(V_x) \int_{ V_{yz} } f(0, y, z) \d y \^ dz\\ \end{aligned}\]Same deal with the signs again. There’s not a canonical way to do it; we have to pick the integration bounds of the result such that the overall orientation of \(V_x \times V_{yz}\) matches \(V\).
4. Implicit Surfaces
This gets more interesting when we deal with delta functions of generic surfaces.
A single delta composed with a function \(\delta(g(\b{x}))\) becomes an integral over a 2d implicit surface, the level set \(g(\b{x}) = 0\). We assume that \(g\) defines a regular surface, so \(\vert dg \vert \neq 0\) anywhere.
\[\begin{aligned} \int_V \delta(g(\b{x})) f(\b{x}) \d^3 \b{x} &= \int_V \frac{1_{g(\b{x}) = 0}}{\vert d g(\b{x}) \vert} f(\b{x}) \d^3 \b{x} \\ \end{aligned}\]The easiest way to solve this is going to be if we can write the numerator as \(d^3 \b{x} = dg \^ d^2 \b{w}\), where \(\b{w} = (w_1, w_2)\) becomes a pair of coordinates on the level set \(g^{-1}(0)\). But in general we don’t have these coordinates. What can we do?
Well, we can cheat a bit. We know from exterior algebra that
\[\star dg = dg \cdot d^3 \b{x}\]And, defining \(\Vert dg \Vert = \vert \del g \vert\) as the actual magnitude of a differential (that is, the scalar value, not a weird type of differential form):
\[dg \^ \star dg = \Vert dg \Vert^2 d^3 \b{x} = \vert \del g \vert^2 d^3 \b{x}\]Examples of these: \(\star dx = dy \^ dz\), so \(dx \^ \star dx = d^3 \b{x}\) and \((a \d x) \^ \star (a \d x) = a^2 d^3 \b{x}\).
So we can write
\[\begin{aligned} \int_V \delta(g(\b{x})) f(\b{x}) \d^3 \b{x} &= \int_V 1_{g = 0} f \, \frac{dg \^ \star dg}{\vert \del g \vert^2 \, \vert dg \vert} \\ &= \int_V 1_{g = 0} f \, \frac{\widehat{dg} \^ \star \widehat{dg}}{\vert \del g \vert} \\ &= \sgn(V_g) \int_{g^{-1}(0)} f \frac{\star \widehat{dg}}{\vert \del g\vert} \end{aligned}\]Where \(\star \widehat{dg}\) is the two-form which is the Hodge dual of \(\widehat{dg}\).
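As a minimal sanity check of this formula (an example of my own choosing, with everything assumed positively oriented), take the plane \(g(\b{x}) = 2(z - c)\). Then \(dg = 2 \d z\), \(\vert \del g \vert = 2\), \(\star dg = 2 \d x \^ dy\), and so \(\star \widehat{dg} = dx \^ dy\):

\[\begin{aligned} \int_{\bb{R}^3} \delta(2(z - c)) f(\b{x}) \d^3 \b{x} &= \int_{z = c} f \, \frac{\star \widehat{dg}}{\vert \del g \vert} \\ &= \frac{1}{2} \int_{\bb{R}^2} f(x, y, c) \d x \^ dy \end{aligned}\]

which agrees with the 1d scaling rule \(\delta(2(z - c)) = \delta(z - c)/2\).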
I have no idea how to do that integral in general, but we can try it out on an easy surface that we know the parameterization for. \(\delta(r-R)\) describes the surface of a sphere in \(\bb{R}^3\). Then \(dr\) is the differential for that surface, and \(\star dr = d \Omega = r^2 \sin \theta \d\theta \^ d \phi\), because \(dr \^ d \Omega = d^3 \b{x}\). Helpfully, \(\Vert dr \Vert = \vert \del r \vert = 1\) (I had to double-check). Therefore:
\[\begin{aligned} \int_V \delta(r-R) f \d^3 \b{x} &= \int_V \frac{1_{r=R}}{\vert dr \vert} f \d r \^ d\Omega \\ &= \int_V 1_{r=R} f \, \widehat{dr} \^ d\Omega \\ &= \sgn(V) \int_{r=R} f(R, \theta, \phi) \d\Omega \end{aligned}\]Since the \(\Omega\) coordinates are always oriented in a standard way, I’ve let the overall sign of \(V\) get handled by this one integral. This calculation also works out if we use a different implicit function for the sphere, e.g. \(\delta(r^2 - R^2)\) or \(\delta(\sqrt{r^2 - R^2})\), although keep in mind that \(\delta(r^2 - R^2) = \delta(r - (\pm R))/(2 R)\) if you work it out.
We could also have written \(\delta(r-R)\) out in rectilinear coordinates, \(\delta(\sqrt{x^2 + y^2 + z^2} - R)\), with \(dr = (x \d x + y \d y + z \d z)/r\). Then we get the same answer, after a tedious but perhaps useful calculation:
\[\begin{aligned} \iiint_V \delta(r-R) f \d^3 \b{x} &= \iiint_V 1_{r=R} f \frac{d^3 \b{x}}{\vert x \d x + y \d y + z \d z \vert/r} \\ &= \iiint_V 1_{r=R} f \frac{dx \^ dy \^ dz}{\vert x \d x + y \d y + z \d z \vert/r} \\ &= \iiint_V 1_{r=R} f \frac{[x \d x + y \d y + z \d z] \^ [x \d y \^ dz + y \d z \^ dx + z \d x \^ dy]}{r \vert x \d x + y \d y + z \d z \vert} \\ &= \iiint_V 1_{r=R} f [\widehat{x \d x + y \d y + z \d z}] \^ \frac{x \d y \^ dz + y \d z \^ dx + z \d x \^ dy}{r} \\ &= \sgn(V) \oiint_{r=R} f \; \frac{x \d y \^ dz + y \d z \^ dx + z \d x \^ dy}{R} \\ &= \sgn(V) \oiint_{r=R} f \d \Omega \end{aligned}\](It turns out that \((x \d y \^ dz + y \d z \^ dx + z \d x \^ dy) / R\) does equal \(d \Omega\). I had no idea.)
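The sphere result can also be checked by brute force on a Cartesian grid. This is only a sketch: it assumes a fairly fat Gaussian nascent delta (`EPS = 0.1`, so that a coarse grid can resolve it), a unit sphere, and an arbitrary test function \(f = 1 + z^2\) whose surface integral is easy to do by hand. Expect agreement to a few percent, not to many digits.

```python
import math

R = 1.0    # sphere radius (assumed for the example)
EPS = 0.1  # width of the Gaussian stand-in for delta; coarse on purpose

def nascent_delta(u, eps=EPS):
    """Gaussian stand-in for delta(u); it integrates to 1 for any eps > 0."""
    return math.exp(-0.5 * (u / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def f(x, y, z):
    return 1.0 + z * z  # assumed test function

def volume_integral(n=60, half_width=1.5):
    """Midpoint rule for the integral of delta(r - R) f over a cube around the sphere."""
    h = 2.0 * half_width / n
    pts = [-half_width + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for x in pts:
        for y in pts:
            for z in pts:
                r = math.sqrt(x * x + y * y + z * z)
                total += nascent_delta(r - R) * f(x, y, z)
    return total * h ** 3

numeric = volume_integral()

# Surface integral of f over r = R, done by hand:
# the integral of 1 is 4 pi R^2 and the integral of z^2 is (4 pi / 3) R^4.
exact_surface = 4.0 * math.pi * R ** 2 + (4.0 * math.pi / 3.0) * R ** 4

print(numeric, exact_surface)
```

The leftover discrepancy comes from the finite width of the smoothed delta and shrinks if `EPS` is reduced together with the grid spacing.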
There’s a simple objection to all this, which is: why bother? All of this works without any special formulas for delta functions. When you have an integral \(\int \delta(g(\b{x})) f \d^3 \b{x}\), it was always possible to factor it as \(\int \delta(g(\b{x})) f \frac{dg \^ \star \d g}{\vert dg \vert^2} = \int_{g =0} f [\star dg]/\vert \del g \vert^2\), or to apply a delta identity to \(\delta(g(\b{x}))\) to factor it first.
And, yeah, I suppose that works. I guess I prefer the new version because it boils the somewhat ad-hoc calculus of delta functions down into simpler objects, which better capture “what’s really going on”. But eh, if you don’t like it, that’s fine too. I am just enjoying seeing how it works (although I would be concerned if it led to any false conclusions, and I haven’t found any yet).
Okay, what about products of more than one implicit function?
\[\begin{aligned} \int_V \delta(f(\b{x})) \delta(g(\b{x})) f \d^3 \b{x} &= \int \frac{1_{f = g = 0}}{\vert df \^ dg \vert} f \d^3 \b{x} \\ &= \int 1_{f=g=0} \, f \, \frac{\widehat{df \^ dg} \^ \star(df \^ dg)}{\vert \del f \^ \del g \vert^2} \\ &= \sgn(V) \int_{f=g=0} f \frac{\star(df \^ dg)}{\vert \del f \^ \del g \vert^2} \end{aligned}\]The result is over the intersection of the zero level sets of \(f\) and \(g\), assuming that \(df \^ dg \neq 0\) everywhere. (Once again I have used the fact that \((df \^ dg) \^ \star(df \^ dg) = \Vert df \^ dg \Vert^2 d^3 \b{x} = \vert \del f \^ \del g \vert^2 \d^3 \b{x}\).) The sign term \(\sgn(V)\) assumes that the resulting 1-integral is chosen to be over a positively oriented range.
Well, it is easy enough to produce a differential for the surface (via \(\star (df \^ dg)\) times a normalization factor). But as usual I have no idea how you would actually use it, because in general you will not have any sort of coordinates available for the surface.
The one case where it is easy(ish) to use is when you have enough implicit equations that their intersection is a \(0\)-surface, i.e. a point-set.^{3} In that case you can find the \(0\)-set of the functions \(\{f(x), g(x), \ldots \}\) by whatever algebraic method you like, and then compute the integral that way. Here is an example problem (albeit in 2d) that I found on StackExchange:
\[\begin{aligned} & \int_{\bb{R}^2} \delta(x^2 + y^2 - 4) \delta((x-1)^2 + y^2 - 4) f(x,y) \d x \d y \\ &=\int \frac{1_{x^2 + y^2 - 4 = 0} 1_{(x-1)^2 + y^2 - 4 = 0}}{\vert (2 x \d x + 2 y \d y) \^ (2 x \d x - 2 \d x + 2 y \d y) \vert} f(x,y) \d x \^ d y \\ &= \int \frac{1_{(x,y) = (\frac{1}{2}, \pm \frac{\sqrt{15}}{2})}}{\vert 4 y \d x \^ \d y \vert} f(x,y) \d x \^ d y \\ &= \int \frac{1_{(x,y) = (\frac{1}{2}, \pm \frac{\sqrt{15}}{2})}}{\vert 4 y\vert} f(x,y) \widehat{\d x \^ d y} \\ &= \frac{1}{2\sqrt{15}} \big( f(\frac{1}{2},\frac{\sqrt{15}}{2}) + f(\frac{1}{2}, -\frac{\sqrt{15}}{2}) \big) \end{aligned}\]Which is the right answer. Of course this is not much different from using the well-known delta function identity \(\delta(g(\b{x})) = \sum_{x_0 \in g^{-1}(0)} \frac{\delta(x-x_0)}{\vert \del g(x_0) \vert}\). But IMO it is at least easier to think about?
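The two-circle example is small enough to verify numerically. The sketch below makes several assumptions of its own: Gaussian nascent deltas of width `EPS`, a small integration window around each intersection point (the product of the two smoothed deltas is negligible elsewhere), and an arbitrary test function `f`. The prediction divides by \(\vert 4y \vert\), the coefficient of \(dx \^ dy\) in the wedge of the two differentials.

```python
import math

EPS = 0.05  # width of the Gaussian stand-in for delta (an arbitrary choice)

def nascent_delta(u, eps=EPS):
    """Gaussian stand-in for delta(u); it integrates to 1 for any eps > 0."""
    return math.exp(-0.5 * (u / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def g1(x, y):
    return x * x + y * y - 4.0            # circle of radius 2 about (0, 0)

def g2(x, y):
    return (x - 1.0) ** 2 + y * y - 4.0   # circle of radius 2 about (1, 0)

def f(x, y):
    return x + y * y + y  # assumed test function, not from the original problem

def window_integral(cx, cy, half_width=0.3, n=150):
    """Midpoint rule on a small square centred on (cx, cy)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = cx - half_width + (i + 0.5) * h
        for j in range(n):
            y = cy - half_width + (j + 0.5) * h
            total += nascent_delta(g1(x, y)) * nascent_delta(g2(x, y)) * f(x, y)
    return total * h * h

# The circles intersect at x = 1/2, y = +/- sqrt(15)/2.
y0 = math.sqrt(15.0) / 2.0
points = [(0.5, y0), (0.5, -y0)]

numeric = sum(window_integral(cx, cy) for cx, cy in points)

# dg1 ^ dg2 = 4 y dx ^ dy, so each point contributes f / |4 y|.
predicted = sum(f(cx, cy) / abs(4.0 * cy) for cx, cy in points)

print(numeric, predicted)
```

For this `f` the prediction works out to \(8.5 / (2\sqrt{15})\), and the numerical value should agree to well under a percent.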
I suppose that the general problem of “finding the solution to systems of arbitrary equations” is a prerequisite to parameterizing them and integrating over them, and that is basically the field of algebraic geometry. So I’ll have to stop there and stick with just wondering about it for now.
5. Stokes’ Theorem
We can also do Stokes’ Theorem. We’ll do the Divergence Theorem version of Stokes first because it is easiest to think about.
Suppose \(g(\b{x})\) is a well-behaved implicit function which is positive on the interior of a closed region \(V\). Write \(\b{n} = - \frac{\del g}{\vert \del g \vert}\) for the outward-pointing normal vector of \(V\). We can describe the \(3\)-surface \(V\) by a step function \(\theta(g(\b{x}))\)
\[\theta(g(\b{x})) = \begin{cases} 1 & \b{x} \in V \\ 0 & \text{ otherwise}\end{cases}\]And we can describe the \(2\)-surface \(\p V\) by its negative derivative
\[(-\del) \theta(g(\b{x})) = - (\del g) \delta(g(\b{x})) = \vert \del g \vert \b{n} \, \delta(g(\b{x}))\]The divergence theorem says
\[\int_{V} \del \cdot \b{F} \d V = \oint_{\p V} (\b{F} \cdot \b{n}) \d A\]Where \(\b{F}\) here is a vector field. Its divergence is \(d \b{F} = (\p_x F_{x} + \p_y F_{y} + \p_z F_{z}) d^3 \b{x} = (\del \cdot \b{F}) d^3 \b{x}\).
Then
\[\begin{aligned} \int_{g > 0} (\del \cdot \b{F}) \d V &= \int_{\bb{R}^3} \theta(g(\b{x})) (\del \cdot \b{F}) \d^3 \b{x} \\ &= \cancel{\int_{\p \bb{R}^3} \del(\theta \b{F}) \d^3 \b{x}} - \int [\del \theta(g(\b{x}))] \cdot \b{F} \d^3 \b{x} \\ &= \int \delta(g(\b{x})) [-\del g \cdot \b{F}] \frac{dg \^ \star dg}{\vert \del g \vert^2} \\ &= \int 1_{g=0} (\b{n} \cdot \b{F}) \frac{dg \^ \star dg}{\vert dg \vert \, \vert \del g \vert} \\ &= \int 1_{g=0} (\b{n} \cdot \b{F}) \; \widehat{dg} \^ \star \widehat{dg}\\ &= \oint_{g=0} (\b{n} \cdot \b{F} ) \; {\star \widehat{dg}} \\ \end{aligned}\]Which (as far as I can tell? this stuff is tricky) should be the correct area element on the surface. As always, not very helpful but I thought it was cool that it works. (I used the fact that integration by parts works with a scalar function times a vector field: \(\int G (\del \cdot \b{F}) = -\int \del G \cdot \b{F}\) so long as the total-derivative term \(\del \cdot (G \b{F})\) integrates to zero, which it does because \(\theta(g) = 0\) outside of \(V\).)
That’s the classical version. The exterior calculus version is somewhat more elegant. In this, we treat \(F\) as a bivector field rather than a vector field, and we’re trying to get
\[\int_{g > 0} dF = \int_{g =0} F\]We can imagine expanding \(F\) in a fictitious \((g, u, v)\) coordinate system that parameterizes the \(g> 0\) region, and regard \(F\) as a bivector field \(F = F_{uv} d u \^ dv + F_{vg} d v \^ dg + F_{gu} d g \^ du\). (If starting from a vector field, this is \(\star F\).^{4}) So the divergence is:
\[dF = (\p_g F_{uv} + \p_u F_{vg} + \p_v F_{gu}) (dg \^ du \^ dv)\]The volume element \(d^3(g, u, v)\) is not necessarily of magnitude \(1\) in the ambient coordinates. Keeping track of all the “types” like this tells us exactly how to change coordinates if we need to.
When we integrate by parts in these coordinates, both the \(\p_u\) and \(\p_v\) derivatives will vanish because \(\theta = \theta(g)\) only. Also, there’s no extra \(\del g\) term because \(\p_g \theta(g) = \delta(g)\). It looks like this:
\[\begin{aligned} \int_{g > 0} dF &= \int \theta(g) [\p_g F_{uv} + \cancel{\p_u F_{vg} + \p_v F_{gu}}] \d g \^ du \^ dv \\ &= \int (-\p_g) \theta(g) F_{uv} \d g \^ du \^ dv \\ &= - \int \delta(g) F_{uv} \d g \^ du \^ dv \\ &= - \int 1_{g = 0} \, F_{uv} \widehat{\d g} \^ du \^ dv \\ &= \oint_{g = 0} F_{uv} \d u \^ dv \\ &= \oint_{g =0} F \end{aligned}\](Where’d the negative sign go? Well, \(dg\) points into the surface, not out of it, so I removed it when integrating over \(\widehat{dg}\) for consistency with the assumed orientation of \(du \^ dv\).)
Although to be honest I get really lost in some of these exterior calculus computations so I wouldn’t vouch too heavily for this. But I do think this trick of “inventing coordinates for a surface, then writing down delta and step functions for it” is suspiciously powerful.
Incidentally, this type of integration is discussed on the Wikipedia page Laplacian of the Indicator. It turns out that in some contexts it’s useful to take further derivatives of \(\delta(g)\) to produce \(\delta'(g)\) functions on surfaces.
The same basic derivation should work for the other types of Stokes’ theorem, such as \(\int \del \times F \d A = \oint F \d \ell\) and \(\int_C \del F d \ell = \int_{\p C} F\). But I’m running out of steam so I’ll leave that for a later article.
6. Summary
My goal was to justify the funny-looking formula \(\delta(x - a) = \frac{1_a}{\| dx \|}\), but I ended up getting somewhat sidetracked playing around with using it to manipulate integrals in 3d. I guess the point is just to show that everywhere I’ve tried to use that notation, it has proven rather natural and intuitive, so long as you remember that funny rule: that the differentials in the denominator combine with the wedge product, and are used to turn differentials in the numerator into “unit” differentials like \(\widehat{dg}\).
No idea if there’s any rigorous basis for any of it, of course. But I’m just glad to know how to produce some of the delta function identities more quickly now.

It seems like the distribution for the range \((a,b)\) is \(\theta_a - \theta_b\), and the distribution for the boundary is \(\delta_b - \delta_a\), created by the \(-\p\) operator, rather than \(+\p\) as you might guess. Why? I think it’s because, for a function like \(\delta_a\) or \(\theta_a\), the point \(a\) actually enters with a negative sign, in \(\theta(x-a)\). So if you wanted to take a derivative “with regard to the point \(a\)”, you would really want the object \(\p_a \theta_a\). It just happens that \(\p_a \theta_a = (-\p_x) \theta_a\), so the negative derivative \(-\p_x\) does the same thing as the positive derivative \(+\p_a\).

A slightly more sophisticated object would be something like \(\sgn_a(I)\) which measures “the sign of \(a\) in \(I\)” (which I have also seen written as \(a \diamond I\)). The difference is that this would be \(0\) if \(a \notin I\). But I figure it’s probably not necessary to include that additional complexity here.

Aside: it is good to think of the \(\pm\) symbol (or any discrete index, such as those created by \(\sqrt[n]{x}\)) as referring to coordinates on a \(0\)-surface. \(x^2 = 4\) is a “one-dimensional constraint in one dimension”; the resulting surface is zero-dimensional and consists of two points.

It is somewhat nontrivial to see how this corresponds to the usual definition of divergence in curved coordinates. The important bit is to note that by writing the vector field as a bivector field we’ve already picked up some extra factors. For instance, in spherical coordinates, we have \(d^3 \b{x} = r^2 \sin \theta \d r \^ d \theta \^ d\phi\), and so \(F_{\theta \phi}\) is given by \(\star F_r (\b{r}) = (r^2 \sin \theta) F_r\). The total radial term ends up being \(\frac{1}{r^2 \sin \theta} \p_r [r^2 \sin \theta F_r] = \frac{1}{r^2} \p_r (r^2 F_r)\).