Further Meditation on Taylor Series
I have spent a lot of time trying to think intuitively about calculus, Taylor series, divergent series, and things like that. Here are a couple things I realized at some point which I would like to have written down. (This is sort of a sequel to a much more elementary post some years ago. I know a lot more now, and have also apparently gotten a lot more verbose.) Maybe they are well-known to some people, or maybe not, but they were at least interesting to me.
As usual: don’t expect anything even resembling rigor in here; I’m just poking around.
1. Taylor Series Five Ways
The general form of a Taylor series is
\[\begin{aligned} f(x+a) &= f(x) + f'(x) a + f''(x) \frac{a^2}{2!} + f'''(x) \frac{a^3}{3!} + \ldots \\ &= \sum_{k=0}^{\infty} f^{(k)}(x) \frac{a^k}{k!} \\ \end{aligned}\]For example consider the series expansion of \(\frac{1}{1-x}\), my favorite function (because it’s so mysterious), around \(x=0\):
\[S_0(x) = 1 + x + x^2 + x^3 + \ldots\]This series converges for \(\| x \|<1\); for \(\| x \| \geq 1\) the \(x^k\) terms never go to \(0\) and the sum diverges. If you want a version that works for \(\| x \|>1\) you can use the expansion around \(x=\infty\) instead. Simply write \(y=1/x\) and expand everything in terms of \(y\) instead:
\[\begin{aligned} \frac{1}{1-x} &= -\frac{1}{x} \frac{1}{1 - \frac{1}{x}} \\ &= -\frac{1}{x} [1 + \frac{1}{x} + \frac{1}{x^2} + \ldots] \\ S_{\infty}(x)&= -\frac{1}{x} - \frac{1}{x^2} - \frac{1}{x^3} - \ldots \end{aligned}\]And some algebra gives a version around any \(x=k\):
\[S_k(x) = \frac{1}{1-k} \frac{1}{1-\frac{x-k}{1-k}} = \frac{1}{1-k}[1 +(\frac{x-k}{1-k}) + (\frac{x-k}{1-k})^2 + \ldots]\]Those are always good to keep around as examples.
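As a quick numerical sanity check, here’s a sketch (the code and the helper names `S0_partial`/`Sinf_partial` are mine, not part of the math) showing the two partial sums converging to \(1/(1-x)\) in their respective regions:

```python
# Partial sums of the two expansions of 1/(1-x). S0 is the series around
# x = 0; Sinf is the series around x = infinity. Each converges in its own region.

def S0_partial(x, N):
    """1 + x + ... + x^N, the expansion around x = 0."""
    return sum(x**k for k in range(N + 1))

def Sinf_partial(x, N):
    """-1/x - 1/x^2 - ... - 1/x^N, the expansion around x = infinity."""
    return -sum(x**(-k) for k in range(1, N + 1))

f = lambda x: 1 / (1 - x)

print(abs(S0_partial(0.5, 50) - f(0.5)))    # ~1e-16: converges for |x| < 1
print(abs(Sinf_partial(2.0, 50) - f(2.0)))  # ~1e-16: converges for |x| > 1
print(abs(S0_partial(2.0, 50) - f(2.0)))    # huge: diverges for |x| > 1
```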
Now here are five ways of thinking about where the Taylor series for \(1/(1-x)\) “comes from”. Probably there is some insight to be gleaned from the fact that they all lead to the same thing.
Way 1: Repeated integration
The most elementary way is to repeatedly apply the fundamental theorem of calculus. Assuming \(f(x)\) has all of its derivatives and things like that, we repeatedly expand and then factor out all the constant terms:
\[\begin{aligned} f(x+a) &= f(x) + \int_x^{x+a} f'(x_1) \d x_1 \\ &= f(x) + \int_{x}^{x+a} [f'(x) + \int_x^{x_1} f''(x_2) \d x_2] \d x_1 \\ &= f(x) + \int_x^{x+a} [f'(x) + \int_x^{x_1} [f''(x) + \int_{x}^{x_2} f'''(x_3) \d x_3] \d x_2] \d x_1 \\ &= \text{(etc...)} \\ &= f(x) + \int f'(x) \d x_1 + \int \int f''(x) \d x_2 \d x_1 + \int \int \int f'''(x) \d x_3 \d x_2 \d x_1 + \ldots \\ &= f(x) + f'(x) a + f''(x) \frac{a^2}{2!} + f'''(x) \frac{a^3}{3!} + \ldots \end{aligned}\]Easy enough. In the case of \(1/(1-x)\) you get \(\p^k_x (1-x)^{-1}= k! \, (1-x)^{-(k+1)}\), so \(f^{(k)}(0) = k!\) and every Taylor coefficient \(f^{(k)}(0)/k!\) equals \(1\).
Way 2: Exponentiation of a derivative
Since \(f(x + dx) \approx f(x) + f'(x) dx = (I + dx \p_x) f(x)\), we can interpret \((I + dx \p_x)\) as an operator \(T^{dx}\) which translates \(f(x) \mapsto f(x+dx)\):
\[T^{dx} = I + dx \p_x\]To translate over some finitesimal1 distance \(a\) we then apply this repeatedly, up to \(a/dx\) times, so that we will have translated by \(a\) in total. When we take the limit as \(dx \ra 0\) this becomes exactly the construction of the exponential function:
\[\begin{aligned} \lim_{dx \ra 0} [T^{dx}]^{a/dx} f(x) &= \lim_{dx \ra 0} [I + dx \p_x]^{a/dx} f(x) \\ &= [\lim_{dx \ra 0} [I + dx \p_x]^{1/dx}]^a f(x) \\ f(x+a) &= e^{a \p_x} f(x) \\ &= [1 + a \p_x + \frac{(a \p_x)^2}{2!} + \frac{(a \p_x)^3}{3!} + \ldots] f(x) \end{aligned}\]Which is why we write \(T^{dx}\) instead of, say, \(T_{dx}\).2 Now you might call that circular because we end up using the Taylor series for \(\exp\) to produce the general form. But, we can also come up with the series from a simple combinatoric argument (well, simple in the sense that it’s simple to follow. It still involves a sketchy limit.):
Suppose that the product \((1 + \e x)^{1/\e}\) has an integer number of terms (so we’re assuming \(n= 1/\e\) an integer). Then the binomial theorem gives
\[\begin{aligned} (1 + \e x )^n &= \sum_{k=0}^n \binom{n}{k} (\e x )^k 1^{n-k} \\ &= \binom{n}{0} 1 + \binom{n}{1} (\e x ) + \binom{n}{2} (\e x )^2 + \binom{n}{3} (\e x )^3 + \ldots \\ &= 1 + \frac{n!}{(n-1)!} (\e x ) + \frac{n!}{(n-2)! 2!} (\e x )^2 + \frac{n!}{(n-3)! 3!} (\e x )^3 + \ldots \\ &= 1 + n \e x + \frac{n(n-1)}{2!} (\e x )^2 + \frac{n(n-1)(n-2)}{3!} (\e x)^3 + \ldots \\ &\approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots \end{aligned}\]Another way of thinking about it is that in the giant product
\[(1 + \e x)^n = (1 + \e x)(1 + \e x)(1 + \e x)(1 + \e x)\cdots\]the linear term \(n \e x = x\) consists of \(n\) ways of having one factor of \(\e x\), the quadratic term consists of all \(\binom{n}{2} = n(n-1)/2!\) ways of having two factors of \(\e x\), etc. As \(n\ra \infty\) these are dominated by their highest-order term \(n^k/k!\), which is where the terms in the Taylor series come from. (I guess this is obvious when you think about it but I remember not realizing it for a long time.)
Note that there is an approximation going on: we end up collapsing terms like \(n(n-1) \e^2 = (n \e)^2 [1 - \frac{1}{n}] = n^2 \e^2 - n \e^2 = 1 - \e\) to just \((n\e)^2 = 1\), dropping a factor of \(\e\). A completely-correct series (for integer \(n\)) would be \((1 + \e x)^n = 1 + x + (1-\e) \frac{x^2}{2!} + (1 - 3 \e + 2 \e^2)\frac{x^3}{3!} + \ldots\), which is kinda interesting. But anyway.
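Here’s a small check of those exact finite-\(n\) coefficients (my own code; `eps` plays the role of \(\e = 1/n\)): the binomial coefficient \(\binom{n}{k} \e^k\) should match the polynomial-in-\(\e\) factors above.

```python
import math

n = 10
eps = 1 / n

# The exact coefficient of x^k in (1 + eps*x)^n is C(n, k) * eps^k; compare
# against the "completely-correct series" factors (1 - eps)/2! and
# (1 - 3 eps + 2 eps^2)/3! quoted above.
k2 = math.comb(n, 2) * eps**2
k3 = math.comb(n, 3) * eps**3

print(k2, (1 - eps) / math.factorial(2))               # equal up to float rounding
print(k3, (1 - 3*eps + 2*eps**2) / math.factorial(3))  # equal up to float rounding
```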
The \(e^{a \p_x} f(x)\) perspective treats \(\p_x\) as a generator in the sense of Lie theory (specifically, an element of the Lie algebra of a Lie group). It ‘generates’ translation in the \(x\) coordinate, according to
\[e^{a \p_x} f(x) = f(x+a)\]A Taylor series in any other variable can be produced by writing \(f(x) = f(g^{-1}(g(x)))\); then
\[e^{a \p_{g(x)}} f(x) = f(g^{-1}(g(x) + a))\]For example,
\[e^{a x \p_x} f(x) = e^{a \p_{\, \ln x}} f(e^{\ln x}) = f(e^{\ln x + a}) = f(e^a x)\]See the Wiki on shift operators for more examples.3 We can also take logarithms in the same framework. Using the approximation \(\log (1+x) \approx x\):
\[\log T^{dx} = \log (1 + dx \p_x) \approx dx \p_x\]and more generally
\[\log T^{a} = \log e^{a \p_x} = a \p_x\]which is neat.
Something I like about this approach is that it is valid even outside the radius of convergence of the series, in a certain sense. In the case of \(e^{a \p_x} \frac{1}{1-x}\) expanded from \(x=0\), for example, everything blows up at \(a=1\). However it is still meaningful to apply \(T^{dx}\) at a point like \(x = 1 - dx/2\)—the result is simply \(f(1+dx/2)\). Never mind that this value is infinite as \(dx \ra 0\). That just means that the ‘true’ Taylor series for \(1/(1-x)\) ought to be something like
\[\frac{1}{1-x} \? 1 + x + x^2 + \ldots - 1_{\| x \| > 1} [2/dx]\]where that term on the end is sort of like an ‘infinite-order’ derivative. After all in the finite approximations, \(f'(x)\) involves \(f(x+dx)\), \(f''(x)\) involves \(f(x + 2 \d x)\), etc. So it must be that the terms involving \(f(x+1)\) or higher only show up after the \(1/dx\)‘th entry in the series—way off at infinity. Now it may seem hard to write such a thing down correctly. But in fact we do know what the series is at that point. The result must be
\[\begin{aligned} \frac{1}{1-x} &= S_0(x) + 1_{\| x \| > 1} [S_{\infty}(x) - S_0(x)] \\ &= S_0(x) - 1_{\| x \| > 1}[\ldots + x^{-2} + x^{-1} + 1 + x + x^2 + \ldots] \end{aligned}\]Which is maybe an interesting way of thinking about it? (Note that that infinite sum refers to the literal value of the displacement between \(1/(dx)\) and \(1/(-dx)\).) And \(e^{a \p_x}\) should give this because the construction should be valid everywhere, even if it’s not clear what algebra would produce such an answer.
Way 3: direct combinatorics
This is related to the previous argument, but somewhat suggestive in a different direction.
For mysterious but significant reasons it is mathematically valid to apply the binomial theorem for fractional or negative exponents. Every term in \((a+b)^n\) still looks like \(\binom{n}{k} a^k b^{n-k}\), but the rule is that you extend the sum to infinity instead of stopping at \(k=n\). In fact there is no need to think of it as a rule: since \(\binom{n}{k} = \frac{n!}{k! (n-k)!}\), we simply observe that if \(n\) is negative or fractional, then it is no longer the case that \(\binom{n}{k} = 0\) for \(k > n\). This requires doing some dubious things with “negative factorials”, but they… seem to work?
\[\begin{aligned} \binom{-1}{k} &= \frac{(-1)!}{k!(-1 - k)!} \\ &= \frac{(-1) (-2) (-3) \times \ldots \times (-k) \cancel{(-1-k)!}}{k! \times \cancel{(-1-k)!}} \\ &= \frac{k! (-1)^k}{k!} \\ &= (-1)^k \\ \end{aligned}\]So this gives, for \(1/(1-x)\):
\[\begin{aligned} (1-x)^{-1} &= \sum_{k=0}^{\infty} \binom{-1}{k} 1^k (-x)^{-1 - k} \\ &= - \sum_k (-1)^k (-1)^k x^{-1-k} \\ &= -\frac{1}{x} [1 + \frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \ldots] \end{aligned}\]Well, that’s the Taylor series around \(x=\infty\). Strangely, if you swap the roles of the two variables in the summation, you get the series around \(x=0\) instead:
\[\begin{aligned} (1-x)^{-1} &= \sum_{k=0}^{\infty} \binom{-1}{k} (-x)^k 1^{-1-k} \\ &= \sum (-1)^k (-1)^k x^k \\ &= 1 + x + x^2 + x^3 + \ldots \end{aligned}\]This I find very mysterious. More generally, there are two series for \((x+y)^{-1}\) depending on whether you divide through by \(x\) or \(y\) first:
\[\begin{aligned} \frac{1}{x+y} &=\frac{1}{x} (1 + \frac{y}{x})^{-1} = x^{-1} - y x^{-2} + y^2 x^{-3} - y^3 x^{-4} + \ldots \\ &= \frac{1}{y} (1 + \frac{x}{y})^{-1} = y^{-1} - x y^{-2} + x^2 y^{-3} - x^3 y^{-4} + \ldots \end{aligned}\]Observe that when you multiply one of these series by \((x+y)\) it does indeed seem to give \(1\):
\[\begin{aligned} (x+y)(x+y)^{-1} &= (x+y) (x^{-1} - y x^{-2} + y^2 x^{-3} - y^3 x^{-4} + \ldots) \\ &= (x+y) x^{-1} - (x+y) yx^{-2} + (x+y) y^2 x^{-3} - (x+y) y^3 x^{-4} + \ldots \\ &= (1 + \cancel{y x^{-1}}) - (\cancel{y x^{-1}} + \cancel{y^2 x^{-2}}) + (\cancel{y^2 x^{-2}} + \cancel{y^3 x^{-3}}) - \ldots \\ &= 1 \end{aligned}\]And likewise for the other (as it must, by symmetry). Interestingly you can also use linear combinations of both solutions. If we write \(S_1 = x^{-1} - y x^{-2} + y^2 x^{-3} - \ldots\) and \(S_2 = y^{-1} - x y^{-2} + x^2 y^{-3} - \ldots\) then
\[(x+y) (\lambda S_1 + (1-\lambda) S_2) = 1\]for any \(\lambda\) (technically this is not allowed because both are not going to converge at the same time, but I suspect it’s still meaningful somehow).
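Numerically this all checks out, for what it’s worth. A sketch (the code and the `gbinom` helper are mine): computing \(\binom{-1}{k}\) as a falling factorial gives \((-1)^k\), and the two resulting expansions of \(1/(1-x)\) each converge in their own region.

```python
def gbinom(n, k):
    """Generalized binomial coefficient n(n-1)...(n-k+1)/k!, for any real n."""
    c = 1.0
    for i in range(k):
        c *= (n - i) / (i + 1)
    return c

print([gbinom(-1, k) for k in range(6)])  # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

f = lambda x: 1 / (1 - x)

# sum of binom(-1,k) (-x)^k: the expansion around 0, good for |x| < 1
x = 0.3
s0 = sum(gbinom(-1, k) * (-x)**k for k in range(80))
print(s0 - f(x))  # ~0

# sum of binom(-1,k) (-x)^(-1-k): the expansion around infinity, good for |x| > 1
x = 3.0
sinf = sum(gbinom(-1, k) * (-x)**(-1 - k) for k in range(80))
print(sinf - f(x))  # ~0
```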
It’s intriguing that the series you can get from these expansions are the same choices as the Taylor series that you get as you vary the point you’re expanding around. Whatever mysterious meaning underlies these “unnatural combinatorics” (unnatural in the sense that… well… the exponents are fractions or negative, so not \(\in \bb{N}\)…), it is the same mystery as to why Taylor series break at poles (and then sorta keep working anyway if you squint).
Although I’m not gonna fill up space showing it, the same sort of thing happens for series like \((x+y)^{1/2}\) and any other non-natural exponent. And if you feel like splitting a product up into pieces
\[(x+y) = (x+y)^{3/2} (x+y)^{-1/2}\]you will find that the two series perfectly cancel out however you do it and whichever variable you expand in, even if the series don’t converge. So I am inclined to believe, just for elegance’s sake, that all of these series are really valid at all inputs, and that it is really the notion of “convergence”, or maybe just the notion of “addition”, which is not so useful for understanding them.4
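For what it’s worth, this cancellation is easy to watch happen numerically. A sketch (mine): expand \((1+t)^{3/2}\) and \((1+t)^{-1/2}\) with the generalized binomial theorem and convolve the coefficient lists; the product should collapse to the coefficients of \((1+t)^1\), namely \(1, 1, 0, 0, \ldots\)

```python
def gbinom(n, k):
    """Generalized binomial coefficient for any real n."""
    c = 1.0
    for i in range(k):
        c *= (n - i) / (i + 1)
    return c

N = 12
a = [gbinom(1.5, k) for k in range(N)]   # coefficients of (1+t)^(3/2)
b = [gbinom(-0.5, k) for k in range(N)]  # coefficients of (1+t)^(-1/2)

# Cauchy product of the two series, coefficient by coefficient:
prod = [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(N)]
print(prod[:4])  # first two entries ~1.0, the rest ~0
```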
Way 4: Long Division
Another way of producing the combinatoric series is polynomial long division. The normal approach: to divide \(g(x)\) by \(f(x)\) we repeatedly compute \(g_i(x) = f(x) q_i(x) + r_i(x)\), setting \(g_{i+1} = r_i\) on each step. Consider calculating \(1/(1-x)\), that is, \(g(x) = 1\) and \(f(x) = 1-x\).
- First we have \(1 = 1(1-x) + x\), so \(q_0 = 1\) and \(r_0(x) = x\).
- Next \(x = x(1-x) + x^2\), so \(q_1 = x\) and \(r_1 = x^2\).
- Next \(x^2 = x^2(1-x) + x^3\), so \(q_2 = x^2\) and \(r_2 = x^3\).
- Etc, getting \(q_k = x^k\).
We get the series \(1 + x + x^2 + \ldots\). The algorithm doesn’t terminate, for the same reason it doesn’t terminate when you divide non-divisible numbers like \(22/7\). In a sense \(1 + x + x^2 + \ldots\) is the “base-\(x\) representation” of \((1-x)^{-1}\).
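Here’s that loop as code, as a sketch (the `series_div` helper and its calling convention are my own): dividing in ascending powers, with polynomials as coefficient lists, reproduces \(q_k = x^k\).

```python
def series_div(g, f, terms):
    """Long division of g by f in ascending powers; polynomials are coefficient
    lists [c0, c1, ...]. Returns the first `terms` quotient coefficients; the
    algorithm never terminates on its own, so we cut it off."""
    g = g[:] + [0.0] * (terms + len(f))  # working remainder, padded with zeros
    q = []
    for i in range(terms):
        c = g[i] / f[0]          # next quotient coefficient
        q.append(c)
        for j, fj in enumerate(f):
            g[i + j] -= c * fj   # subtract c * x^i * f(x) from the remainder
    return q

print(series_div([1.0], [1.0, -1.0], 6))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Running the same loop on \(g = 1+x\), \(f = 1+x+x^2\) spits out the coefficients \(1, 0, -1, 1, 0, -1, \ldots\), a series that shows up again in section 2.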
The long division algorithm basically corresponds to the series of nested substitutions into the definition of \(S(x)\):
\[\begin{aligned} S &= \frac{1}{1-x} \\ 1 &= (1-x) S \\ S &= 1 + x S \\ &= 1 + x (1+ xS) \\ &= 1 + x (1 + x(1+xS)) \ldots \\ &= 1 + x + x^2 + x^3 + \ldots \end{aligned}\]There is also the option of doing the division / substitution on the other term. We can write \(1 = (-\frac{1}{x})(1-x) + \frac{1}{x}\), \(-\frac{1}{x} = \frac{1}{x^2} (1-x) - \frac{1}{x^2}\), etc, so instead of finding that \(1\) copy of \((1-x)\) goes into \(1\) with remainder \(x\), we find that \(-\frac{1}{x}\) copies goes in with remainder \(1/x\), and this gives the Taylor expansion \(S_{\infty}(x)\) instead.
Way 5: Orthonormal Bases
A standard operation in linear algebra is to project a vector onto an orthonormal basis, as in \(\b{v} = v_x \b{x} + v_y \b{y} + v_z \b{z}\). The same basically works for functions, although a lot more analytic details are involved in talking about when it’s possible (something like: your function has to be an element of a Hilbert space, meaning an infinite-dimensional inner-product space which is closed under limits of sequences). When the construction is valid you end up with an expansion like
\[f(x) = \sum_k \<f, \phi_k\> \phi_k(x)\]which expresses \(f(x)\) in terms of its projections onto some family of functions \(\{ \phi_k \}\) which has \(\< \phi_i, \phi_j \> = 1_{i=j}\). The inner product \(\< f, g \>\) usually looks like an infinite-dimensional dot product—that is, an integral like \(\int_X f(x) \overline{g(x)} \d x\). The overline indicates complex conjugation, which is invariably involved whenever the functions are allowed to be complex.
Standard examples (which are more-or-less equivalent to each other if you gloss over all the analysis) include:
- The Fourier transform, with \(\phi_k = e^{ikx}\), writes \(f(x) = \int_{\bb{R}} f_k e^{ikx} \d k\)
- The Laplace Transform, with \(\phi_s = e^{sx}\), writes \(f(x) = \int_{c + i\bb{R}} f_s e^{sx} \d s\) (ish)
- The Cauchy Integral Formula, with \(\phi_k = z^k\), writes \(f(z) = \sum_{k \in \bb{Z}} f_k z^k\), giving the Laurent series for \(f(z)\) (which generalizes the Taylor series to allow negative powers.)
I also like to include function evaluation itself on this list: after all what is \(f(x)\) but the ‘coefficient’ of \(f\) along the point \(x\), that is, the value of \(\int f(x') \delta(x' - x) \d x'\)?
We could certainly say (for analytic \(f\)) that the Taylor series terms are all given by the Cauchy integral formula
\[f^{(n)}(0) = \frac{n!}{2\pi i} \oint \frac{f(z)}{z^{n+1}} dz\]Which after a change of variables to \(z = r e^{i \theta}\) becomes
\[\begin{aligned} f^{(n)}(0) &= \frac{n!}{2\pi i} \oint \frac{f(z)}{(re^{i \theta})^{n+1}} (\cancel{dr e^{i\theta}} + i r e^{i \theta} d \theta) \\ &=\frac{n!}{2\pi} \oint \frac{f(z)}{(re^{i\theta})^n} d\theta \end{aligned}\]When \(f(z) = z^m\) this cancels out unless \(m-n = 0\), in which case it gives \(2\pi\), which is why it extracts the coefficient of \(z^n\) from \(f(z)\) when it has a Laurent series. This construction essentially takes the Fourier transform of \(f(r e^{i \theta})\) in the \(\theta\) coordinate. In this sense the Taylor/Laurent series is a sort of projection onto the orthonormal basis given by \(z^n\).
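This projection is easy to check numerically. A sketch (mine; `taylor_coeff` is my own helper): for \(f(z) = 1/(1-z)\) and a circle of radius \(r < 1\), averaging \(f(r e^{i\theta}) e^{-i n \theta} / r^n\) over \(\theta\) should recover the \(n\)th Taylor coefficient, which is \(1\) for every \(n\).

```python
import cmath, math

def taylor_coeff(f, n, r=0.5, M=2000):
    """Discretized (1/2pi) * integral of f(r e^(i theta)) e^(-i n theta) / r^n d(theta)."""
    total = 0j
    for j in range(M):
        theta = 2 * math.pi * j / M
        total += f(r * cmath.exp(1j * theta)) * cmath.exp(-1j * n * theta)
    return total / (M * r**n)

f = lambda z: 1 / (1 - z)
print([round(taylor_coeff(f, n).real, 6) for n in range(5)])  # [1.0, 1.0, 1.0, 1.0, 1.0]
```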
I happen to disprefer anything that resorts to complex analysis, so here’s an attempt at a strictly real version of the same projection. We would want to treat \(\phi_k = \frac{x^k}{k!}\) as a basis and then come up with some sort of dual basis \(\phi^j\) such that \(\< \phi_k, \phi_j \> = \int \phi^j \phi_k dx = 1_{k=j}\). The obvious thing to try is just the inverse, \(\phi^j = \frac{j!}{x^j}\), but it doesn’t work:
\[\begin{aligned} \< \phi_j, \phi_k \> &= \int_{\bb{R}} \phi^j \phi_k dx \\ &= \int \frac{j!}{x^j} \frac{x^k}{k!} \d x \\ &= \int \frac{j!}{k!} x^{k-j} \d x\\ &\? \begin{cases} 0 & k > j \\ 1 & k = j \\ \text{???} & k<j \end{cases} \end{aligned}\]So, not quite. What does work, although it feels like cheating, is using derivatives of delta functions,
\[\phi^j(x) = (-\p_x)^j \delta(x)\]Which pass over to anything they’re integrated against via integration by parts: \(\int \delta' (x) f(x) dx = - \int \delta(x) f'(x) \d x = -f'(0)\).
So
\[\< \phi_j , \phi_k \> =\int (-\p_x)^j \delta(x) \frac{x^k}{k!} dx = \int \delta(x) (\p_x)^j \frac{x^k}{k!} \d x = 1_{j=k}\]This is basically cheating. It lets us write a \(\phi^j\) which is actually just the derivative operator \(\p_x^j\), such that the inner product \(\< \phi^j, \phi_k \>\) with this definition just ends up computing \(\p_x^j \phi_k \|_{x=0}\), which is exactly the definition of a Taylor series we already knew. It also does not work if \(f(x)\) has any negative-order Laurent series terms, which makes it sorta strictly less powerful than the Cauchy integral form.
An interesting but questionable modification comes from using the translation of a delta derivative into a polynomial, writing \(\delta(x)/x = -\delta'(x)\). This is based on how it behaves in an integral: \(\int (\delta(x)/x) f(x) \d x = \int \delta(x) f'(x) \d x\) if \(f(x) = x g(x)\), that is, if it does not have a constant term.
\[\phi^j(x) = (-\p_x)^j \delta(x) \? \frac{j!}{x^j} \delta(x)\]You see this operation around sometimes, but it feels really sketchy; it really relies on the “not having a constant term”, though (I wrote about its sketchiness some here). Maybe there’s a principal value / finite part-type-justification for it even in cases where it doesn’t literally apply? But it’s hard to imagine how you could get \(\int (\delta(x)/x^2) dx = 0\) to ever work on the reals alone without a “finite part” operation (which ends up being quite similar to dropping the divergent parts of divergent series, by the way). I mention it only because the presence of \((x^j/j!)^{-1}\) in this form seems suspiciously relevant.
In any case, treating \(\phi^k = (-\p_x)^k \delta(x)\) as the “dual basis” for \(\phi_k = x^k/k!\) lets us regard the Taylor series (or the Laurent series) as a projection onto a basis like any other vector:
\[f(x) = \sum \< \phi^k, f \> \phi_k = f(0) + f'(0) x + f''(0) \frac{x^2}{2!} + \ldots\]Which is an appealing interpretation, at least, even if the justification is questionable. It seems like ‘morally’ a Taylor series is projecting a function onto a basis like this, even if the literal algebraic operations don’t quite seem to do that. Residue integrals seem to be the standard way of making things work, but I wonder if there is another way that might be more conceptually simple.
Another Way 5? More Basis Projections
Here is another way of producing an orthonormal-basis-projection-type formula for the Taylor series. This one is kinda weird. It’s also cheating because it’s circular.
In the exponential series \(e^{a \p_x} = 1 + a \p_x + \frac{a^2}{2!} \p_x^2 + \ldots\), each of the \(a\) terms is \(a^k/k!\), which we can write as a sort of inverse of a derivative operator
\[\frac{a^k}{k!} = \int_0^a \int_0^{a_1} \int_0^{a_2} \cdots (1) \cdots \d a_3 \d a_2 \d a_1 \? \p_a^{-k} (1)\]Using these, each term \(\frac{a^k}{k!} f^{(k)}(x)\) results from applying one of each type of operator. I will write this as a fraction shorthand, like this:
\[\begin{aligned} \frac{a^k}{k!} f^{(k)}(x) &= (\p_a^{-k} \p_x^k) f(x) \\ &= (\p_a^{-1} \p_x)^{k} f(x) \\ &= (\frac{ \p_x}{ \p_a})^k f(x) \end{aligned}\]The whole Taylor series is
\[[1 + (\frac{ \p_x}{ \p_a}) + (\frac{ \p_x}{ \p_a})^2 + \ldots] f(x)\]which… we can then sum as a geometric series in this \(\p_x/\p_a\) thing, I guess?
\[\frac{1}{1-\frac{\p_x}{\p_a}} f(x) \? f(x+a) = e^{a \p_x} f(x)\]This operator \((\frac{\p_x}{\p_a})^k\) acts like a projection onto polynomials of degree \(k\), so it’s a lot like a projection onto an orthonormal basis as well. In this case each power of the same operator produces the projection onto the next term. This approach doesn’t seem to clearly do anything about the negative powers that Laurent series are able to handle, but that’s ok. Weird, I know. But I think it’s another interesting way of looking at things.
Also, fun observation: \((1-\p_x/\p_a)^{-1} = e^{a \p_x}\) is basically a valid differential equation.
\[\begin{aligned} (1 - \frac{\p_x}{\p_a}) e^{a \p_x} &= e^{a \p_x} - \p_x [\int_0^a e^{a \p_x} \d a] \\ &\? e^{a \p_x} - \p_x \frac{1}{\p_x}[e^{a \p_x} - 1] \\ &\? e^{a \p_x} - e^{a \p_x} + 1 \\ &= 1 \end{aligned}\]Maybe this could be thought of as a definition of \(e^{a \p_x}\)? But um, don’t look too hard at that step in the middle.
So those are all the ways I know of to invent a Taylor series. I like to see them all laid out side-by-side, since I am looking for, among other things, some elegant logic behind the fractional calculus and fractional combinatorics in general. Basically everything seems connected somehow in a way that I am not aware of math describing, and there are clues everywhere.
I have one other thing I want to talk about which feels closely related to all of the above, which is the mysterious blurriness of the distinction between integrals and infinite series in general.
2. Connection to Divergent Series
Consider again the case of \(\frac{1}{1-x}\). The Taylor series around \(x=0\) is
\[S_0(x) = 1 + x + x^2 + x^3 + \ldots\]For example,
\[\frac{1}{1-\frac{1}{2}} = 1 + \frac{1}{2} + \frac{1}{4} \ldots = 2\]And the expansion around \(x=\infty\):
\[\begin{aligned} \frac{1}{1-x} &= -\frac{1}{x} - \frac{1}{x^2} - \frac{1}{x^3} - \ldots \end{aligned}\]For example,
\[\frac{1}{1-2} = -\frac{1}{2} \frac{1}{1 - \frac{1}{2}} = -\frac{1}{2}[1 + \frac{1}{2} + \frac{1}{4} + \ldots] = -1\]It is very interesting to contemplate the fact that plugging \(2\) into the first series sorta works: \(S(2) = 1 + 2 + 4 + 8 + \ldots\) gives a series whose “sum”, by any of the various divergent series summation techniques, equals \(-1\). For example it appears to obey
\[(1-x) S(x) \mapsto S(2) - 2(S(2)) = 1\]One might interpret this to mean that although the result is not a number, it still ‘contains’ the data \(-1\) somehow, maybe in a form like \(-1 + O(\infty)\)? This is an explanation I saw online a while ago. But I’ve come to think that all of the divergent series ‘explanations’ are making things too complicated. There is a very simple way of thinking about this that is completely satisfactory for intuition:
It is not actually the case that \(S_0(x)\) is the multiplicative inverse of \((1-x)\). This is clear if you write it as the limit of a finite sum, \(S_0(x) = 1 + x + x^2 + \ldots + x^N\), as \(N \ra \infty\). Then
\[(1-x) S_0(x) = 1 - x^{N+1}\]That is,
\[S_0(x) = \frac{1}{1-x} - \frac{x^{N+1}}{1-x}\]In the case where \(\lim_{N \ra \infty} x^{N} = 0\) this works out to give the right answer. Otherwise, as in the case of \(S_0(2) = (1 - 2^{N+1})/(1-2) = - 1 + 2^{N+1}\), it does not. But that is why it has the “value” \(-1\) under the divergent-sum techniques: because they are simply ignoring the \(x^N\) term entirely, even though it contributes to the result.
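Concretely (a sketch, mine): the finite partial sum at \(x=2\) is exactly \(-1 + 2^{N+1}\), and the \(-1\) is what survives if you throw away the \(N\)-dependent part.

```python
def S0(x, N):
    """Partial sum 1 + x + ... + x^N."""
    return sum(x**k for k in range(N + 1))

for N in (3, 5, 8):
    print(S0(2, N), -1 + 2**(N + 1))  # identical: the series "is" -1 plus an N-dependent term
```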
Meanwhile \(S_{\infty}(x) = -\frac{1}{x} - \frac{1}{x^2} - \ldots - \frac{1}{x^N}\) is not the inverse either:
\[(1-x) S_{\infty}(x) = 1 - \frac{1}{x^N}\]But for \(\| x \| > 1\) the second term eventually disappears.
A way to reconcile all this is that the “true” values of \((1-x)^{-1}\) are
\[\begin{aligned} \frac{1}{1-x} &= S_0(x) + \frac{x^{N+1}}{1-x} \\ &= S_{\infty}(x) + \frac{x^{-N}}{1-x} \\ &= S_k(x) + \frac{1}{1-x} (\frac{x-k}{1-k})^{N+1} \end{aligned}\]and these are both exact, and equal to each other. (This sort of thing is clear if you go through the derivation of Taylor series rigorously, because you always track the remainder term. But I feel like people forget that it completely explains what’s going on with divergent series as well.)
A more complicated example is the function \(g(x) = \frac{1+x}{1+x+x^2}\), which should have \(g(1) = 2/3\). The summation \(g_0(1)\) appears to give a divergent series (Grandi’s series, whose standard regularized value is \(1/2\), not \(2/3\))
\[\begin{aligned} \frac{1+x}{1+x+x^2} &= \frac{1-x^2}{1-x^3} \\ &= (1-x^2)(1 + x^3 + x^6 + \ldots) \\ &= 1 - x^2 + x^3 - x^5 + x^6 - x^8 + \ldots \\ &\mapsto 1 - 1 + 1 - 1 + 1 - 1 + \ldots \end{aligned}\]The manipulation \(\frac{1+x}{1+x+x^2} = \frac{1-x^2}{1-x^3}\) is valid (at \(x = 1\) it is the l’Hôpital limit), but the series expansion doesn’t work. In fact
\[\frac{1-x^2}{1-x^3} = (1-x^2)(1 + x^3 + x^6 + \ldots + (x^3)^N + \frac{(x^3)^{N+1}}{1-x^3})\]and the remainder term keeps everything equivalent. Whereas erasing the \(N\)-dependence leaves something unusable. It seems to me that there is no point trying to work around this fact to come up with ‘rules’ for extracting values for these divergent series when they are invalid approximations in the first place.
Evidently the way that divergent series techniques work is that we implicitly perform a projection that removes the \(N\)-dependence, and which is valid only in cases where that removed part goes to zero. Let us write \(\pi_{\perp N}\) for this ‘projection’ operator, where \(N\) is the “ambient variable” that is implicitly being used in some of the internal limits. Then \(\pi_{\perp N} [S_0(x)] = \frac{1}{1-x}\), and yet \(\pi_{\perp N} [S_0(2)] = -1\). But \(S_0(2) = -1 + 2^{N+1}\) is simply a different value. I think this is a good way of thinking about what’s going on with a lot of the standard divergent series examples.
This idea of factoring equations in terms of their ambient variables shows up in other places and is, I think, generally under-appreciated. It’s the idea behind the entropy of a continuum, in which you separate out the part of the entropy which depends on your fine-graining of space from the constant part \(H[p(x)] = - \int p(x) \log p(x) dx + H[\mathcal{U}(0, 1)]\). It’s also implied (I believe) in the concepts of mixed volumes and numerosity, although I haven’t had the chance to figure that out in detail. The general theme is that when you take limits there is often an ambient variable involved in how the limit is taken, and even if you suppress it in the notation, it can end up mattering later if two expressions couple the limit together. A standard example would be the sum of two divergent integrals in a Cauchy principal value:
\[\lim_{a \ra 0} \int_{-1}^{-a} \frac{dx}{x} + \lim_{b \ra 0} \int_{b}^1 \frac{dx}{x}\]whose value is really a function of \((a,b)\), and only in the case where you assume something about their relationship does it come out to a definite finite value. In particular the Cauchy principal value is the case where you assume \(a=b\); since this assumption is hidden you end up with an expression that is implicitly dependent on coordinates (see here, say). The fact that this is not written explicitly always seemed like a hack to me. I don’t like hacks, so I think we need a standard way of dealing with things like this, and remembering the dependence on values that are normally hidden feels like a good way of doing it.
Something else to notice here is that, if we allow limits to go off to infinity, then it is the case that
\[(1-x)(\sum_{k = -\infty}^{\infty} x^k) = (1-x) (\ldots + x^{-2} + x^{-1} + 1 + x + x^2 + \ldots) = 0\]This means that at least in infinite-divergent-series-land, \((1-x)\) is a zero divisor even if \(x \neq 0\): there exists a nonzero divergent series \(h(x)\) with \((1-x)h(x) = 0\). Therefore an “inverse” of multiplication by \((1-x)\) has a free term: if \(S(x) = \frac{1}{1-x}\) then \(S(x) + \lambda h(x)\) is also a solution.
This is sort of the sense in which \(S_0(x)\) and \(S_{\infty}(x)\) are “both” solutions to \((1-x)^{-1}\), if you ignore their \(N\)-dependence: they differ by a copy of \(h(x)\), which has no constant (non-\(N\)-dependent) part. After all \(S_{\infty}(x) + h(x) = S_0(x)\). The term disappears in the divergent summation techniques because \(\pi_{\perp N}(h(x)) = 0\); the whole value depends on \(N\). So that arithmetic is actually illegitimate when you track the limits at infinity, but when you don’t, it’s a good way of interpreting what’s going on when multiple divergent series solve the same equation.
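In finite form the zero-divisor statement looks like this (sketch mine): multiplying a symmetric two-sided partial sum by \((1-x)\) telescopes everything away except the two boundary terms, which are exactly the ‘terms at infinity’ that \(\pi_{\perp N}\) discards.

```python
def h_partial(x, N):
    """Two-sided partial sum: x^(-N) + ... + 1 + ... + x^N."""
    return sum(x**k for k in range(-N, N + 1))

x, N = 2.0, 10
lhs = (1 - x) * h_partial(x, N)
rhs = x**(-N) - x**(N + 1)   # only the boundary terms survive the telescoping
print(lhs, rhs)  # equal
```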
3. Integration as an Infinite Series
(This section rehashes something I talked about in More on \(\sqrt{\pi}\) because it’s relevant here also. I first learned of this from this pdf by Nicholas Wheeler.)
Earlier I talked about the translation-by-\(dx\) operator \(T^{dx} = 1 + dx \, \p_x\), which has \(T^{dx} f(x) = f(x+dx)\). Flipping this around we can write a “finitesimal” version of the derivative operator
\[\p_x = \frac{T^{dx} - 1}{dx}\]which in the limit gives the usual derivative
\[\lim_{dx \ra 0} \frac{T^{dx} - 1}{dx} f(x) = \lim_{dx \ra 0} \frac{f(x+dx) - f(x)}{dx} = f'(x)\]A fun thing to do is to expand this to the \(-1\) power and see if it works like an integral. After all an integral is more-or-less the inverse of a derivative operator, right?
\[\begin{aligned} \p_x^{-1} &= [\frac{T^{dx} - 1}{dx}]^{-1} \\ &= -dx \frac{1}{1-T^{dx}}\\ &\? -dx (1 + T^{dx} + T^{2 dx} + T^{3 dx} + \ldots) \\ \end{aligned}\]In fact that looks very much like a Riemann integral. Acting on a function it seems to give
\[\begin{aligned} \p_x^{-1} f(x) &= -dx [f(x) + f(x+dx) + f(x+2 \d x) + \ldots] \\ &\? - \int_{x}^{\infty} f(x') \d x'\\ &\? \int_{\infty}^x f(x') \d x' \end{aligned}\]It makes sense that the integration bound has to end at \(x\): after all, we would want \(\p_x \p_x^{-1} f(x) = f(x)\) again. But the fact that we didn’t get to choose an integration bound seems like it might be a problem.
We can control the integration bound like this. In the expression \(\p_x^{-1} = -dx (1 + T^{dx} + T^{2 dx} + \ldots)\) there are really two limits being implied at once: the limit as \(dx \ra 0\) and the limit of the number of terms \(N\) as the series is extended to infinity. By writing them out explicitly we can couple how the limits are taken together. In particular if we want to limit the integral to a distance \(a\), then we need for the final term in the sum to correspond to \(T^{(N-1) dx} = T^{a - dx}\). Is this reasonable to do? I dunno: normally one does not get to pick a “stopping” point for their infinite series. But we’ll do it anyway.5
\[\begin{aligned} \p_x^{-1} f(x) &= \lim_{dx \ra 0} (- dx )[\sum_{k=0}^{a/dx - 1} T^{k \d x}] f(x) \\ &= - \int_{x}^{x+a} f(x') \d x' \\ &= \int_{x+a}^x f(x') \d x' \end{aligned}\]We get a different result if we Taylor expand \(\p_x^{-1}\) around \(T^{-dx}\), which is the equivalent of expanding around \(x=\infty\) earlier except now the object is an operator so it’s really not that weird. (Well… okay, it’s still kinda weird, because I’m pretending like I can divide by it like a variable. sorry.)
\[\begin{aligned} (\p_x^{-1}) f(x) &= \frac{dx}{T^{dx} - 1} f(x) \\ &= \frac{1}{T^{dx}} \frac{dx}{1 - T^{-dx}} f(x) \\ &\? T^{-dx} [1 + T^{-dx} + T^{-2dx} + \ldots + T^{-(a - dx)}] f(x) dx \\ &= [T^{-dx} + T^{-2 dx} + T^{-3 dx} + \ldots + T^{-a}] f(x) dx \\ &= \int_{x-a}^x f(x') \d x' \end{aligned}\]So it seems like:
- different choices of how you Taylor-expand the operator and how you take the limits correspond to different integration bounds
- but, the bounds always end at \(x\).
- and the choice of whether to extend the integral to \(x+a\) vs \(x-a\) is probably equivalent to the choice of the specific bound \(a\): it corresponds to taking the limit in a positive vs. negative direction, relative to the limit of \(dx\).
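To make the first bullet concrete, here’s a sketch (code and names mine) of the truncated operator series \(-dx\,(1 + T^{dx} + \ldots + T^{(N-1)dx})\) with \(N = a/dx\), compared against the integral it is supposed to be:

```python
import math

def inv_deriv(f, x, a, dx=1e-4):
    """-dx * [f(x) + f(x+dx) + ... + f(x+(N-1)dx)], N = a/dx: the series form of 1/d_x."""
    N = int(round(a / dx))
    return -dx * sum(f(x + k * dx) for k in range(N))

# For f = cos, the claimed value is the integral from x+a down to x of cos,
# i.e. -(sin(x+a) - sin(x)):
x, a = 0.3, 1.0
print(inv_deriv(math.cos, x, a))
print(-(math.sin(x + a) - math.sin(x)))  # agrees to ~1e-4 (left Riemann sum error)
```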
So inverting \(\p_x\) using the Taylor series leads to this object which does initially seem to describe \(\p_x^{-1}\). Unfortunately it is not so simple. The resulting object \(\p_x^{-1}\) is not actually the inverse of \(\p_x\), in the sense of \(\p_x \circ \p_x^{-1} = 1\).
\[\begin{aligned} \p_x \circ \p_x^{-1} &= \frac{ T^{dx} - 1 }{dx} [-(dx)(1 + T^{dx} + T^{2 dx} + \ldots + T^{(N-1)dx})] \\ &= -(T^{dx}-1)(1 + T^{dx} + T^{2 dx} + T^{3 dx} + \ldots + T^{(N-1)dx}) \\ &= 1 - T^{N \d x} = 1 - T^a\\ [\p_x \circ \p_x^{-1}] f(x) &= [1 - T^{N \d x}]f(x) = f(x) - f(x+a) \end{aligned}\]This is the correct derivative of the integral \(\int_{x+a}^x f(x') \d x'\), because both integration bounds have an \(x\) dependence, and according to the Leibniz integral rule:
\[\p_x \int_{a(x)}^{b(x)} f(x') \d x' = f(b(x)) b'(x) - f(a(x)) a'(x)\]which in this case is just \(f(x) - f(x+a)\). But it’s not \(f(x)\) again. What went wrong?
Well, it is the same issue again: \(1+x+x^2+x^3 + \ldots\) is not equal to \(1/(1-x)\) unless \(\| x \| < 1\), or, abstractly, unless the terms at infinity can be ignored. So our use of the series expansion to turn \(-dx/(1 - T^{dx})\) into an integral is not valid unless \(T^{a} f(x) = 0\). A correct way to write the equivalence for all \(f(x)\) is,
\[\begin{aligned} \p_x^{-1} f(x) &= \p_x^{-1} f(x+a) - \int_x^{x+a} f(x') \d x' \\ &= F(x + a) - \int_x^{x+a} f(x') \d x' \end{aligned}\]where \(F(x)\) is an (imagined) explicit antiderivative of \(f(x)\), if we have one (or a local approximation \(f(x^*+a) x\) around \(f(x+a)\), where \(x^*\) means the value is treated as a constant). The same fix that made divergent sums correct in the previous section is required to have \(\p_x \circ \p_x^{-1} = 1\).
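This failure is concrete enough to check directly. In this Python sketch (names and the test function are mine), applying the finitesimal derivative to the truncated series for \(\p_x^{-1}\) telescopes and returns \(f(x) - f(x+a)\), not \(f(x)\):

```python
import math

def p_inv(f, x, a, dx):
    """-dx (1 + T^dx + ... + T^{(N-1)dx}) f(x), with N = a/dx held fixed."""
    N = int(round(a / dx))
    return -dx * sum(f(x + k * dx) for k in range(N))

def p(g, x, dx):
    """The finitesimal derivative (T^dx - 1)/dx."""
    return (g(x + dx) - g(x)) / dx

f, dx, a, x = math.sin, 1e-4, 1.0, 0.7
lhs = p(lambda y: p_inv(f, y, a, dx), x, dx)  # the sums telescope exactly
rhs = f(x) - f(x + a)
```

Note that the agreement here is essentially exact (not just up to Riemann-sum error), because the two finite sums cancel term by term, which is the discrete version of the Leibniz-rule calculation above.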
(I suppose another option is to allow the limits to explicitly depend on \(x\) as well: to pick \(N = N(x, dx) = (a-x)/dx\) such that \(a = x + N \d x\) is no longer a function of \(x\) at all, giving \(\p_x^{-1} f(x) = \int_x^a f(x) \d x\). But then you have to contend with \(\p_x [\p_x^{-1} f(x)]\) taking the derivative of the new definition of \(\p_x^{-1}\) as well… seems like a hack to me.)
To summarize, we can see (Riemann) integration as equivalent in a sense to the Taylor series of \(1/\p_x\), but only if we do not discard the term at infinity, like
\[\begin{aligned} \p_x^{-1} f(x) &= \frac{-dx}{1 - T^{dx}} f(x) \\ &= -dx (1 + T^{dx} + \ldots + T^{N \d x}) f(x) - \frac{dx \, T^{(N+1) \d x}}{1 - T^{dx}} f(x) \\ &= \int_{x+a}^x f(x') \d x' + \underbrace{\p_x^{-1} f(x+a + dx)}_{\text{equivalent to the } \frac{x^{N+1}}{1-x} \text{ from earlier}}\\ \end{aligned}\]and a “free parameter” shows up in the relative rates of the limits of \(N\) and \(dx\) which corresponds to the choice of integration bound. There also ought to be a term corresponding to the kernel of \(\p_x\), that is, an integration constant.6 In particular if we treat \(N\) and \(dx\) as uncoupled, then there are two valid interpretations, which correspond to the Taylor expansions \(S_0(T^{dx})\) and \(S_{\infty}(T^{dx}) = S_0(T^{-dx})\), that is, the limits as \(dx \ra 0^{\pm}\).
\[\p_x^{-1} f(x) \? \begin{aligned} \int_\infty^x f(x') \d x' &= F(x) - F(\infty) \\ \int_{-\infty}^x f(x') \d x' &= F(x) - F(-\infty) \end{aligned}\]where \(F(\infty)\) is the imagined value of actually summing that integral forever (even though it may not converge). These are only valid inverses of \(\p_x\) in cases where the integral converges, such that the result does not pick up an \(x\)-dependence from how the \(N \ra \infty\) limit is taken, which is the integral equivalent of the cases where the Taylor series converges.
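For a function that actually vanishes at \(-\infty\), like \(e^x\), the second interpretation really is a two-sided inverse, since \(F(-\infty) = 0\) carries no \(x\)-dependence. A Python sketch (the cutoff and step size are arbitrary choices of mine):

```python
import math

# When f vanishes at -infinity, int_{-inf}^x f inverts the derivative,
# because F(-inf) = 0 contributes no x-dependence.
def int_from_minus_inf(f, x, dx=1e-3, cutoff=-30.0):
    """Left Riemann sum for int_{-inf}^x f, truncated where f is negligible."""
    total, t = 0.0, x
    while t > cutoff:
        t -= dx
        total += f(t) * dx
    return total

F1 = int_from_minus_inf(math.exp, 1.0)   # F(1) - F(-inf) = e - 0
# applying the finitesimal derivative (T^dx - 1)/dx recovers f:
deriv = (int_from_minus_inf(math.exp, 1.0 + 1e-3) - F1) / 1e-3
```

The same construction with \(f = \sin\), say, would depend on where the truncation stops, which is the non-convergent case described above.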
Despite this there is still a way to produce an antiderivative that has \(\p_x \p_x^{-1} f(x) = f(x)\) again. The ‘true’ inverse \(\p_x^{-1}\) comes with an integration constant, corresponding to terms \(c\) which have \(\p_x c = 0\). One such term is \(\p_x^{-1} f(y) = F(y) - F(\infty)\), which (assuming we use the same \(dx\) variable in the definition) involves the same \(F(\infty)\) as the one in \(F(x) - F(\infty)\). So we can freely say that
\[\p_x^{-1} f(x) = [F(x) - F(\infty)] - [F(y) - F(\infty)] = F(x) - F(y) = \int_y^x f(x') dx'\]recovering what we expect to be the true value of \(\p_x^{-1} f(x)\). This only works because we allow ourselves to cancel out the two \(F(\infty)\) terms since they ought to be exactly the same.
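Numerically the cancellation looks like this: give both truncated antiderivatives the same far endpoint \(L\), a stand-in for “the same \(F(\infty)\)”, and their difference is \(\int_y^x f\), independent of \(L\), even for \(f = \cos\), whose antiderivative has no limit at infinity. (A Python sketch; the particular values of \(L\) are arbitrary choices of mine.)

```python
import math

# Two truncated antiderivatives sharing a far endpoint L, which plays
# the role of the common F(infinity). Their difference is int_y^x f,
# no matter which L we pick.
def F_to_L(f, x, L, dx=1e-3):
    """-int_x^L f(x') dx' as a left Riemann sum: a truncated p_x^{-1} f(x)."""
    n = int(round((L - x) / dx))
    return -dx * sum(f(x + k * dx) for k in range(n))

x, y = 1.0, 0.3
d1 = F_to_L(math.cos, x, 20.0) - F_to_L(math.cos, y, 20.0)
d2 = F_to_L(math.cos, x, 35.0) - F_to_L(math.cos, y, 35.0)
exact = math.sin(x) - math.sin(y)       # int_y^x cos
```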
Flipping this around, we might imagine interpreting the Taylor series \(1/(1-x) = 1 + x + x^2 + \ldots\) as performing a sort of (finite) integral. Suppose we think of the monomials \(x^k\) as representing the points of \(\bb{Z}\). Then multiplication by \(x\) looks like translation in the \(k\) coordinate
\[x \times x^k = x^{k+1} = e^{(1) \p_k} x^k = T_k^1 x^k\]Then the operation \((1-x)\) is a finite difference \(\Delta_k x^k = (1 - T^{1}_k) x^k\) and its inverse is (like) an integral
\[\frac{1}{1-x} x^k = \frac{1}{\Delta_k} x^k \? \begin{cases} \sum_{-\infty}^k x^k \\ \sum_{\infty}^k x^k \end{cases}\]No idea if it’s useful, but I often get a vague impression that a lot of the analysis on \(\bb{R}\) which works some of the time, but in other cases fails to converge or is ill-defined, would be better-behaved if interpreted as geometric operations on points like this. And regardless it’s always fun to think of one thing as a different kind of thing. So there you go.
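The geometric reading is easy to play with on actual sequences. In this Python sketch (truncating the tail sums, which is harmless for a fast-decaying sequence), the finite difference \(\Delta\) of the tail-sum operator gives back the original sequence, just like differentiating an integral:

```python
# Multiplication by x shifts the exponent k, so (1 - x) acts on the
# coefficient sequence as a finite difference a_k - a_{k+1}, and its
# inverse is a tail sum over j >= k.
def delta(seq):
    """The finite difference (1 - T) a_k = a_k - a_{k+1}."""
    return [seq[k] - seq[k + 1] for k in range(len(seq) - 1)]

def delta_inv(seq):
    """b_k = sum_{j >= k} a_j, accumulated from the right (truncated)."""
    out, running = [], 0.0
    for a in reversed(seq):
        running += a
        out.append(running)
    return out[::-1]

a = [2.0 ** -k for k in range(20)]   # geometric, so tail sums converge fast
b = delta_inv(a)
recovered = delta(b)                 # gives back a (minus the last entry)
```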
Before concluding this section, I was also curious what this looks like if we use a different Taylor expansion of \(\p_x^{-1}\), other than the ones corresponding to \(S_0\) or \(S_\infty\). As before we have
\[S_k(x) = \frac{1}{1-k} \frac{1}{1-\frac{x-k}{1-k}} = \frac{1}{1-k}[1 +(\frac{x-k}{1-k}) + (\frac{x-k}{1-k})^2 + \ldots]\]For instance around \(x=2\) it’s \(1/(1-x) = -[1 - (x-2) + (x-2)^2 - (x-2)^3 + \ldots]\).
If we plug \(T^{dx}\) into that we get something that does not seem especially usable as an integral: \(\p_x^{-1} \? -dx [1 - (T^{dx}-2) + (T^{dx}-2)^2 - \ldots]\). Maybe it corresponds to integrating \(f(x)\) in a different variable, or something like that? But I doubt it. More likely this manipulation is invalid. After all, the \(1\) in the equation is not really a ‘number’ so much as the identity operator \(T^{0 dx} = I\). So maybe no actual ‘numbers’ should appear anywhere; perhaps we have to consistently treat these as manipulations of operators, not numbers.
In that case we should really be expanding around a different translation operator, like
\[\begin{aligned} \frac{-dx}{1 - T^{dx}} &\? \frac{-dx}{1 - T^{2 d x}}[1 + (\frac{T^{dx} - T^{2 dx}}{1 - T^{2 dx}}) + (\frac{T^{dx} - T^{2 dx}}{1 - T^{2 dx}})^2 + \ldots] \\ \end{aligned}\]That seems more correct, but I’m not sure how to interpret it. It seems to be an infinite series of nested derivatives and integrals? First, we have
\[\frac{T^{2 dx} - 1}{dx} f(x) = 2 \frac{f(x + 2 dx) - f(x)}{2 dx} = 2 \p_{2x} f(x)\]which ought to equal \(\p_x f(x)\) in the limit, but maybe it will be relevant that it translates twice as far: should it be that \(\p_{2x}^{-1} f(x) = \int_{x+2a}^x f(x') \d x'\)? I’m not sure. But anyway, proceeding naively and treating everything like a variable, the terms in the sum are
\[\frac{T^{dx} - T^{2 dx}}{1 - T^{2 dx}} = T^{dx} \frac{(T^{dx} - 1)}{dx} \frac{1}{2} \frac{2 dx}{T^{2 dx} - 1} \? \frac{1}{2} T^{dx} \p_x \p_{2x}^{-1}\]Making the overall series something like
\[\p_x^{-1} \? \frac{1}{2} \p_{2x}^{-1} [1 + (\frac{1}{2} T^{dx} \p_x \p_{2x}^{-1}) + (\frac{1}{2} T^{dx} \p_x \p_{2x}^{-1})^2 + \ldots]\]if I did not make any algebra mistakes, and if my calculation is legitimate. I have no idea what to make of that, but I guess it is exactly what it seems like, which is a Taylor expansion of an integral operator.
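As a sanity check on the algebra (though not on the operator interpretation), we can treat \(T^{dx}\) as a plain scalar \(t\) with \(0 < t < 1\), which is exactly the “pretend the operator is a variable” move used above, and confirm numerically that the re-expansion around \(T^{2dx}\) sums back to \(-dx/(1-t)\):

```python
# Scalar stand-in for the re-expansion of -dx/(1 - T^dx) around T^{2dx}.
dx, t = 0.01, 0.7
target = -dx / (1 - t)               # the original -dx / (1 - T^dx)
ratio = (t - t * t) / (1 - t * t)    # = t/(1+t) < 1, so the series converges
total, term = 0.0, -dx / (1 - t * t) # leading factor -dx / (1 - T^{2dx})
for _ in range(200):
    total += term
    term *= ratio
```

Since the ratio is \(t/(1+t)\), this particular expansion converges for every \(t > 0\), even where the plain \(S_0\) series for \(-dx/(1-t)\) would not.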
Presumably there is a way to interpret it under a Fourier transform, whereupon it becomes the literal Taylor series of \(\p_x^{-1} = -dx/(1- T^{dx}) \mapsto -dx/(1 - e^{ik \, dx}) \approx 1/(ik)\). But Fourier transforms involving \(1/x\) and (equivalently) integration are fraught with principal-value-type dilemmas that seem too messy to make clear sense of, so I’ll skip that. Still, it (sorta) makes sense that infinite series of regular variables in Fourier space correspond to infinite series of integral and derivative operators in position space, and that the constructions more-or-less pass through the transform. Everything is the same.
I’ll mention in passing that all of the above also gives one of the versions (among many) of the fractional calculi when you allow yourself to do fractional combinatorics on the \(\p_x^k\) operator. For instance Binomial-expanding/Taylor-expanding \(\p_x^{1/2}\) gives a working version of a half-derivative that has \([(\p_x)^{1/2}]^{\circ 2} = \p_x\) (the Grünwald derivative, specifically). Good luck figuring out what it means, though.
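Here is a minimal sketch of one such construction, assuming the common backward-difference (Grünwald–Letnikov) form with lower terminal \(0\); the test functions and step size are my own choices. The known values \(\p_x^{1/2}\, x = 2\sqrt{x/\pi}\) and \(\p_x^{1/2}\, 2\sqrt{x/\pi} = 1\) should come out approximately, which is the \([(\p_x)^{1/2}]^{\circ 2} = \p_x\) property in action:

```python
import math

# Binomial-expanding ((1 - T^{-h})/h)^alpha term by term gives the
# Grunwald-style fractional derivative (backward form, terminal 0).
def gl_deriv(f, x, alpha, h=1e-3):
    """Truncated Grunwald-Letnikov derivative of order alpha at x."""
    N = int(round(x / h))         # reach back to the terminal at 0
    total, w = 0.0, 1.0           # w = (-1)^k C(alpha, k), by recurrence
    for k in range(N):
        total += w * f(x - k * h)
        w *= (k - alpha) / (k + 1)
    return total / h ** alpha

half_x = gl_deriv(lambda t: t, 1.0, 0.5)
# known value: the half-derivative of x is 2*sqrt(x/pi)
again = gl_deriv(lambda t: 2 * math.sqrt(t / math.pi), 1.0, 0.5)
# the half-derivative applied twice to x should approximate d/dx x = 1
```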
4. Conclusion
What have we learned?
- There are several ways to produce the series \(1/(1-x) = 1 + x + x^2 + \ldots\) or \(1/(1-x) = -x^{-1} - x^{-2} - \ldots\).
- However, those series are not ‘exact’ except within their radii of convergence. On the other hand, the identity \(1/(1-x) = 1 + x + \ldots + x^N + x^{N+1}/(1-x)\) is exact, and equivalent to any other exact series. This approach is a good way of sidestepping any trickery involving divergent series.
- The inverse of a (finitesimal) derivative \(\p_x = (T^{dx}-1)/dx\) can be sort of interpreted using the same series. It gives an integral \(\int_{x+a}^x f(x') \d x'\), but the integration bound is an artifact of using the Taylor approximation (and taking limits in a certain way). With the exact series, the term corresponding to \(x^{N+1}/(1-x)\) serves to ensure that the integral goes off to infinity (and diverges, outside of certain cases).
- However, the difference \(\p_x^{-1} f(x) - \p_x^{-1} f(y)\) of two antiderivatives will cancel out those infinite parts, which is why it’s possible to end up with \(\p_x (\p_x^{-1} f(x)) = f(x)\) again. \(\p_x^{-1} f(x) = F(x) - F(\infty)\), perhaps, but we can freely add a constant, such as \(F(\infty) - F(y)\), to get \(\p_x^{-1} f(x) = F(x) - F(y)\) as well.
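As a last sanity check, the exactness claim in the second bullet holds on the nose even far outside the radius of convergence:

```python
# The exact identity 1/(1-x) = 1 + x + ... + x^N + x^{N+1}/(1-x),
# checked at x = 3, far outside the radius of convergence.
x, N = 3.0, 10
partial = sum(x ** k for k in range(N + 1))   # grows without bound in N...
remainder = x ** (N + 1) / (1 - x)            # ...but this term cancels it
# partial + remainder == -0.5 == 1/(1 - x), exactly
```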
I can’t help but think that there is a sort of underlying logic to all of this which connects summations, integrals, derivatives, combinatorics, and everything else into a simple framework with no surprises. It is not the logic you acquire in, say, complex analysis, which, as far as I know, does not really make sense of all these similarities. The closest thing I’m aware of is in Fourier/Laplace/etc analysis, since the transforms turn derivatives into variables and vice-versa, showing some of the similarities here. But I don’t think they go as far as I want. If what I’m describing does turn out to exist in some obscure parts of higher math, apologies, I was not aware of it, and I would like to know more!
Anyway, towards the goal of figuring out how it all works, I feel like my approach of making everything ‘finitesimal’ and trying to write down exact expressions wherever possible is going to be important. For everything to play well together, we can’t take the usual shortcuts of analysis, because hiding in all those constructions are simplifications and approximations that don’t always hold in other cases. Keeping everything literal and exact for as long as possible avoids this.
In particular a lot of things that are mysterious in calculus (divergences of series for instance) seem more natural when implemented on translation operators or derivative operators, which do not have the property that their norm blows up on higher powers. E.g. \(x^k\) blows up for \(\| x \| \geq 1\), whereas \(T^{k \d x}\) just corresponds to translation in another space entirely. So I keep getting the sense that the most correct way to think about everything is to keep it all in the operator notation for as long as possible. Once you project onto numbers a lot of the structure disappears and is hard to recover. For example, it is really a lot more interesting to me to view an infinite sum as a sort of integral (which is always defined, in a geometric sense) compared to a literal sum of numbers (which may sporadically diverge depending on the situation).
Okay, that’s all I’ve got for now. Hope it was interesting. And please join me in telling people to stop telling other people that \(\sum 2^k = -1\). Thanks.
-
I like to use this word ‘finitesimal’, as opposed to ‘finite’, to refer to objects which are normally infinitesimal (so going to zero in a limit), but which are instead being treated as literally finite. Like I’ve done here by using \(dx\) as a variable without a limit sign in front of it. I’m not sure if anyone else uses the word this way, or where I got it, but I kinda like it—the suffix ‘-esimal’ refers to division into parts, e.g. ‘centesimal’ means (in Latin) division into a hundred parts and ‘infinitesimal’ means division into infinite parts. So ‘finitesimal’ means… division into finite parts. Pretty accurate for the concept, right? ↩
-
Also the reason why one should use \(R^{\theta} = (e^{R})^{\theta}\) for rotation operators, instead of \(R_{\theta}\). ↩
-
This sense of generators comes up a lot in quantum mechanics, which calls operators such as the energy operator \(\hat{E} = i \hbar \p_t\) generators of translation. These are almost the same idea except for the extraneous \(i\) and \(\hbar\) factors; when plugged into \(e^{-i/\hbar \hat{E} t} = e^{t \p_t}\) they become the same thing. The idea is that a quantum system’s evolution in time can be implemented by \(e^{-i/\hbar \hat{E} \Delta t} \ket{x, t} = \ket{x, t+ \Delta t}\). The wave equation then couples time and space evolution through a dispersion relation, e.g. \(E = \frac{p^2}{2m} + V\) for the Schrödinger equation, or \(E^2 = m^2c^4 + p^2 c^2\) relativistically. ↩
-
Miscellaneous breadcrumb: in Donald Knuth’s Two Notes on Notation he discusses the fact that the two kinds of Stirling numbers (…‘of the first kind’ and ‘of the second kind’) are really the same kind, if you allow their domains to extend into negatives, despite that not making a lot of sense for their combinatoric interpretation. In the same way it is clear that factorials and binomial coefficients also have some validity in the negatives. I expect that the Stirling numbers can also be happily interpolated into fractions or imaginary numbers as well, just like the factorials can. It seems likely to be the case that all of these continuations are instances of the same general construction, a single way of thinking about combinatorics on negative and fractional sets that shows up all over math. ↩
-
The \(a-dx\) is in order to have the last partition be \((a-dx, a)\), to match the partitions in a Riemann integral over the same range. Maybe that’s not that important though. Also, I’m not sure if we should track what happens to the partitions when we negate the orientation of the integral. I’m gonna pretend like it doesn’t matter since it should be immaterial after the limit. ↩
-
Thought experiment: consider how an integration constant could be equivalent to an infinite series \(c = \ldots + T^{-2 \d x} + T^{-dx} + 1 + T^{dx} + T^{2 \d x} + \ldots = \sum_{k \in \bb{Z}} T^{k \, dx}\), since both are (plausibly) in \(\ker \p_x\). ↩