All About Taylor Series

December 28, 2018

Here is a survey of my understanding of each of the main types of Taylor series:

  1. single-variable
  2. multivariable \(\bb{R}^n \ra \bb{R}\)
  3. multivariable \(\bb{R}^n \ra \bb{R}^m\)
  4. complex \(\bb{C} \ra \bb{C}\)

I thought it would be useful to have everything I know about these written down in one place.

Particularly, I don’t want to have to remember the difference between all the different flavors of Taylor series, so I find it helpful to just cast them all into the same form, which is possible because they’re all the same thing (seriously why aren’t they taught this way?).

These notes are for crystallizing everything when you already have a partial understanding of what’s going on. I’m going to ignore discussions of convergence so that more ground can be covered and because I don’t really care about it for the purposes of intuition.


1. Single Variable

A Taylor series for a function in \(\bb{R}\) looks like this:

\[\begin{aligned} f(x + \e) &= f(x) + f'(x) \e + f''(x) \frac{\e^2}{2} + \ldots \\ &= \sum_n f^{(n)}(x) \frac{\e^n}{n!} \end{aligned}\]

It’s useful to write this as one big operator acting on \(f(x)\):

\[\boxed{f(x + \e) = \big[ \sum_{n=0}^\infty \frac{\p^n_x \e^n}{n!} \big] f(x)} \tag{Single-Variable}\]

Or even as a single exponentiation of the derivative operator, which is commonly done in physics, but you probably shouldn’t think too hard about what it means:

\[f(x + \e) = e^{\e \p_x} f(x)\]
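If you want to see this concretely, here is a minimal sympy sketch (the choice of \(f = \sin\) and the truncation order \(N\) are arbitrary): build the partial sum \(\sum_{n \leq N} f^{(n)}(x) \frac{\e^n}{n!}\) and check that it matches expanding \(f(x + \e)\) directly.

```python
import sympy as sp

x, eps = sp.symbols('x epsilon')
f = sp.sin(x)          # any smooth function will do
N = 6                  # truncation order

# Partial sum of  sum_n  f^(n)(x) * eps^n / n!
taylor = sum(sp.diff(f, x, n) * eps**n / sp.factorial(n) for n in range(N + 1))

# Compare with expanding f(x + eps) directly in powers of eps
direct = sp.series(f.subs(x, x + eps), eps, 0, N + 1).removeO()

print(sp.simplify(taylor - direct))   # -> 0
```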

I also think it’s useful to interpret the Taylor series equation as resulting from repeated integration:

\[\begin{aligned} f(x) &= f(0) + \int_0^x dx_1 f'(x_1) \\ &= f(0) + \int_0^x dx_1 [ f'(0) + \int_0^{x_1} dx_2 f''(x_2) ] + \ldots\\ &= f(0) + \int dx_1 f'(0) + \iint dx_1 dx_2 f''(0) + \iiint dx_1 dx_2 dx_3 f'''(0) + \ldots \\ &= f(0) + x f'(0) + \frac{x^2}{2} f''(0) + \frac{x^3}{3!} f'''(0) + \ldots \end{aligned}\]

This basically makes sense as soon as you understand integration, plus it makes it obvious that the series only works when all of the integrals are actually equal to the values of the previous function (so you can’t take a series of \(\frac{1}{1-x}\) that crosses \(x=1\), because you can’t exactly integrate past the singularity (though there are tricks))

… plus it makes sense in pretty much any space you can integrate over.

… plus it makes it obvious how to truncate the series, how to create the remainder term, and it even shows you how you could – if you were so inclined – have each derivative be evaluated at a different point, such as \(f(x) = f(0) + \int_1^x f'(x_1) dx_1 = f(0) + (x-1) f'(1) + \frac{(x-1)(x-2)}{2} f''(2) + \ldots\), which I’ve never even seen done before (except for here?), though good luck figuring out convergence if you do that.
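Here is the iterated-integration picture as a small sympy sketch (the test function \(e^{\sin x}\) is an arbitrary choice): each term of the Taylor polynomial is just the constant \(f^{(n)}(0)\) pushed through \(n\) nested integrals from \(0\) to \(x\).

```python
import sympy as sp

x = sp.symbols('x')
f = sp.exp(sp.sin(x))     # an arbitrary test function
N = 4

# Rebuild the degree-N Taylor polynomial by iterated integration:
# the n-th term is the constant f^(n)(0), integrated n times from 0 up to x.
poly = f.subs(x, 0)
for n in range(1, N + 1):
    term = sp.diff(f, x, n).subs(x, 0)       # f^(n)(0)
    for _ in range(n):
        t = sp.Dummy('t')
        term = sp.integrate(term.subs(x, t), (t, 0, x))
    poly += term

print(sp.expand(poly))
print(sp.expand(sp.series(f, x, 0, N + 1).removeO()))   # same polynomial both times
```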


L’Hôpital’s rule for evaluating limits that give indeterminate forms follows naturally if both functions are expressible as Taylor series. If \(f(x) = g(x) = 0\), then:

\[\begin{aligned} \lim_{\e \ra 0} \frac{f(x + \e)}{g(x + \e)} &= \lim_{\e \ra 0} \frac{ f(x) + \e f'(x + \e) + O(\e^2)} {g(x) + \e g'(x + \e) + O(\e^2)} \\ &= \lim_{\e \ra 0}\frac{f'(x+\e) + O(\e) }{g'(x+\e) + O(\e)} \\ &= \lim_{\e \ra 0} \frac{f'(x+\e)}{g'(x + \e)} \end{aligned}\]

Which equals \(\frac{f'(x)}{g'(x)}\) if that limit exists, and otherwise might be solvable by applying the rule recursively. None of this works, of course, if the limit doesn’t exist at all. If \(f(x) = g(x) = \infty\), evaluate \(\lim \frac{1/g(x)}{1/f(x)}\) instead. If the indeterminate form is \(\infty - \infty\), rewrite \(f - g\) as \(\frac{1/g - 1/f}{1/(fg)}\), which is a \(\frac{0}{0}\) form, and proceed as before.
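As a quick sympy sanity check (the \(\frac{0}{0}\) example below is arbitrary; its zero at \(x = 0\) has order 3, so the rule has to be applied three times before the ratio stops being indeterminate and can be evaluated by substitution):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sin(3*x) - 3*x      # f(0) = 0
g = x**3                   # g(0) = 0

# The limit itself, and the ratio after three applications of the rule,
# at which point it is no longer indeterminate and can be evaluated directly
print(sp.limit(f / g, x, 0))                  # -9/2
ratio = sp.diff(f, x, 3) / sp.diff(g, x, 3)   # -9*cos(3*x)/2
print(ratio.subs(x, 0))                       # -9/2
```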


2. Multivariable -> Scalar

The multivariable Taylor series looks messier at first, so let’s start with only two variables, writing \(f_x \equiv \p_x f(\b{x})\) and \(\b{v} = (v_x, v_y)\), and we’ll work it into a more usable form.

\[\begin{aligned} f(\b x + \b v) &= f(\b x) + [f_x v_x + f_y v_y] + \frac{1}{2!} [f_{xx} v_x^2 + 2 f_{xy} v_x v_y + f_{yy} v_y^2] \\ &+ \frac{1}{3!} [f_{xxx} v_x^3 + 3 f_{xxy} v_x^2 v_y + 3 f_{xyy} v_x v_y^2 + f_{yyy} v_y^3] + \ldots \end{aligned}\]

(The asymmetry of the terms like \(2 f_{xy} v_x v_y\) and \(3 f_{xxy} v_x^2 v_y\) is because these are really sums of multiple terms; because of the commutativity of partial derivatives on analytic functions, \(f_{xy} = f_{yx}\), we can write \(f_{xy} v_x v_y + f_{yx} v_y v_x = 2 f_{xy} v_x v_y\).)

The first few terms are often arranged like this:

\[f(\b x + \b v) = f(\b x) + \b{v} \cdot \nabla f(\b{x}) + \frac{1}{2} \b{v}^T \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} \b{v} + O(v^3)\]

\(\nabla f(\b{x})\) is the gradient of \(f\) (the vector of partial derivatives like \((f_x, f_y)\)). The matrix \(H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}\) is the “Hessian matrix” for \(f\), and represents its second derivative.

… But we can do better. In fact, every order of derivative of \(f\) in the total series has the same form, as powers of \(\b{v} \cdot \vec{\nabla}\), which I prefer to write as \(\b{v} \cdot \vec{\p}\), because it represents a ‘vector of partial derivatives’ \(\vec{\p} = (\p_x, \p_y)\):

\[\begin{aligned} f(\b x + \b v) &= f(\b x) + (v_x \p_x + v_y \p_y) f(\b x) + \frac{(v_x \p_x + v_y \p_y)^2}{2!} f(\b x) + \ldots \\ &= \big[ \sum_n \frac{(v_x \p_x + v_y \p_y)^n}{n!} \big] f(\b x) \\ &= \boxed{ \big[ \sum_{n=0}^\infty \frac{(\b{v} \cdot \vec{\p})^n}{n!} \big] f(\b x) } \end{aligned} \tag{Scalar Field}\]

So that looks pretty good. And it can still be written as \(e^{ \b{v} \cdot \vec{\p}} f(\b{x})\). The same formula – now that we’ve hidden all the actual indexes – happily continues to work for dimension \(> 2\), as well.
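Before moving on, here is a sympy sketch of that boxed formula (the scalar field \(f = e^x \cos y\) is an arbitrary choice): apply \(\b{v} \cdot \vec{\p}\) repeatedly, divide by \(n!\), and compare with expanding \(f(x + v_x, y + v_y)\) directly.

```python
import sympy as sp

x, y, vx, vy, t = sp.symbols('x y v_x v_y t')
f = sp.exp(x) * sp.cos(y)        # arbitrary scalar field
N = 4

# Apply (v . del)^n / n! term by term
term, total = f, f
for n in range(1, N + 1):
    term = vx * sp.diff(term, x) + vy * sp.diff(term, y)   # one more factor of (v . del)
    total += term / sp.factorial(n)

# Compare with expanding f(x + t*vx, y + t*vy) in t and then setting t = 1
direct = sp.series(f.subs({x: x + t*vx, y: y + t*vy}), t, 0, N + 1).removeO().subs(t, 1)

print(sp.simplify(total - direct))   # -> 0
```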

… Although the multivariate Taylor series of \(f(\b{x})\) is really just a bunch of single-variable series multiplied together:

\[\begin{aligned} f(x+ v_x, y + v_y) &= e^{v_x \p_x} f(x, y + v_y) \\ &= e^{v_x \p_x}e^{v_y \p_y} f(x,y) \\ &= e^{v_x \p_x + v_y \p_y} f(x,y) \\ &= e^{\b{v} \cdot \vec{\p}} f(\b{x}) \end{aligned}\]

I mention all this because it’s useful to have a solid idea of what a scalar function is before we move to vector functions.

Note that, when exponentiating operators, \(e^{v_x \p_x}e^{v_y \p_y} f(x,y) = e^{v_x \p_x + v_y \p_y} f(x,y)\) is not always allowed. There are complicated rules for how to combine exponentiated operators, but fortunately, when the exponents commute (i.e. \(\p_x \p_y = \p_y \p_x\), which we’re just assuming is true here), you can add them in the normal way.


L’Hôpital’s rule is more subtle for multivariable functions. In general the limit of a function may be different depending on what direction you approach from, so an expression like \(\lim_{\b{x} \ra 0} \frac{f(\b{x})}{g(\b{x})}\) is not necessarily defined, even if both \(f\) and \(g\) have Taylor expansions. On the other hand, if we choose a path for \(\b{x} \ra 0\), such as \(\b{x}(t) = (x(t), y(t))\) then this just becomes a one-dimensional limit, and the regular rule applies again. So, for instance, while \(\lim_{\b x \ra 0} \frac{f(\b{x})}{g(\b x)}\) may not be defined, \(\lim_{t \ra 0} \frac{f(t \b{v})}{g(t \b{v})}\) is for any fixed vector \(\b{v}\).

The path we take to approach \(0\) doesn’t even matter, actually; what matters is the gradients when we’re infinitesimally close to \(0\). For example, suppose \(f(0,0) = g(0,0) = 0\) and we’re taking the limit on the path given by \(y = x^2\):

\[\lim_{\e \ra 0} \frac{f(\e,\e^2)}{g(\e,\e^2)} = \lim_{ \e \ra 0 } \frac{ f_x(0,0) \e + O(\e^2) }{ g_x(0,0) \e + O(\e^2)} = \lim_{\e \ra 0} \frac{f(\e,0)}{g(\e,0)}\]

The \(f_y\) and \(g_y\) terms are of order \(\e^2\) and so drop out, leaving a limit taken only on the \(x\)-axis, corresponding to the fact that the tangent to \((x,x^2)\) at 0 is \((1,0)\).
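A quick sympy check of this (the functions below are arbitrary, chosen so that \(f(0,0) = g(0,0) = 0\)): the limit along the parabola agrees with the limit along the \(x\)-axis.

```python
import sympy as sp

x, y, eps = sp.symbols('x y epsilon')
f = sp.sin(x + 2*y)          # f(0,0) = 0
g = x + y**2 + x*y           # g(0,0) = 0

along_parabola = sp.limit((f / g).subs({x: eps, y: eps**2}), eps, 0)
along_x_axis   = sp.limit((f / g).subs({x: eps, y: 0}),      eps, 0)

print(along_parabola, along_x_axis)   # both 1, matching f_x(0,0) / g_x(0,0)
```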

In fact, this problem basically exists in 1D also, except that limits can only come from two directions: \(x^+\) and \(x^-\), so lots of functions get away without a problem. L’Hôpital’s rule seems to require that the functions be expandable as a Taylor series on the side the limit comes from. Indeed, we might just define a sort of “any-sided limit” which associates with each direction of approach a (potentially) different value. I’m not quite sure I fully understand the complexity of doing that in \(N > 1\) dimensions, but clearly if you can just reduce to a 1-dimensional limit the difficulties should be removed. See, perhaps, this paper for a lot more information.


3. Vector Fields

There are several types of vector-valued functions: one-dimensional curves like \(\gamma: \bb{R} \ra \bb{R}^n\), or arbitrary-dimensional maps like \(\b{f}: \bb{R}^m \ra \bb{R}^n\) (including from a space to itself), or maps between arbitrary differentiable manifolds \(f: M \ra N\). In each case there is something like a Taylor series that can be defined. It’s not commonly written out, but I think it should be, so let’s try.

Let’s imagine our function maps spaces \(X \ra Y\), where \(X\) has \(m\) coordinates and \(Y\) has \(n\) coordinates, and \(m\) might be 1 in the case of a curve. Then along any particular coordinate in \(Y\) out of the \(n\)—call it \(y_i\)—the Taylor series expression from above holds, because \(f_i = \b{f} \cdot y_i\) is just a scalar function.

\[f(\b{x} + \b{v})_i = e^{\b{v} \cdot \vec{\p}} [f(\b{x})_i]\]

But of course this holds in every \(i\) at once, so it holds for the whole function:

\[\b{f}(\b{x} + \b{v}) = e^{\b{v} \cdot \vec{\p}} \b{f}(\b{x})\]

The subtlety here is that the partial derivatives \(\p\) are now being taken componentwise, once for each component of \(\b{f}\). For example, consider the first few terms when \(X\) and \(Y\) are 2D:

\[\begin{aligned} \b{f}(\b{x} + \b{v}) &= \b{f}(\b{x}) + (v_{x_1} \p_{x_1} + v_{x_2} \p_{x_2}) \b{f} + \frac{(v_{x_1} \p_{x_1} + v_{x_2} \p_{x_2})^2}{2!} \b{f} + \ldots\\ &= \b{f} + \begin{pmatrix} \p_{x_1} \b{f}_{y_1} & \p_{x_2} \b{f}_{y_1} \\ \p_{x_1} \b{f}_{y_2} & \p_{x_2} \b{f}_{y_2} \end{pmatrix} \begin{pmatrix} v_{x_1} \\ v_{x_2} \end{pmatrix} + \ldots \\ &= \b{f} +(\p_{x_1}, \p_{x_2}) \o \begin{pmatrix}\b{f}_{y_1} \\ \b{f}_{y_2} \end{pmatrix} \cdot \begin{pmatrix} v_{x_1} \\ v_{x_2} \end{pmatrix} + \ldots \end{aligned}\]

That matrix term, the \(n=1\) term in the series, is the Jacobian Matrix of \(f\), sometimes written \(J_f\), and is much more succinctly written as \(\vec{\p}_{x_i} \b{f}_{y_j}\), or just \(\vec{\p}_i \b{f}_j\) or even just \(\p_i \b{f}_j\).

\[J_f = \p_i f_j\]

The Jacobian matrix is the ‘first derivative’ of a vector field, and it includes every term which can possibly matter to compute how the function changes to first-order. In the same way that a single-variable function is locally linear (\(f(x + \e) \approx f(x) + \e f'(x)\)), a multi-variable function is locally a linear transformation: \(\b{f}(\b{x + v}) \approx \b{f}(\b{x}) + J_f \b{v}\).
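This local linearity is easy to check numerically. Here is a small numpy sketch (the map, the base point, and the displacement are all arbitrary, and the Jacobian is approximated by finite differences rather than computed exactly):

```python
import numpy as np

def f(p):
    """An arbitrary map R^2 -> R^2."""
    x, y = p
    return np.array([np.sin(x) * y, x**2 + np.exp(y)])

def jacobian(f, p, h=1e-6):
    """Finite-difference Jacobian, J[i, j] = d f_i / d x_j."""
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(len(p)):
        dp = np.zeros_like(p)
        dp[j] = h
        cols.append((f(p + dp) - f(p - dp)) / (2 * h))
    return np.column_stack(cols)

p = np.array([0.3, -0.7])
v = np.array([1e-3, 2e-3])

exact  = f(p + v)
linear = f(p) + jacobian(f, p) @ v        # f(x) + J_f v
print(np.max(np.abs(exact - linear)))     # tiny: the error is O(|v|^2)
```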


Higher-order terms in the vector field Taylor series generalize ‘second’ and ‘third’ derivatives, etc, but they are generally tensors rather than matrices. They look like \((\p \o \p) \b{f}\), \((\p \o \p \o \p) \b{f}\), or \(\p^{\o n} \b{f}\) in general, and they act on \(n\) copies of \(\b{v}\), ie, \(\b{v}^{\o n}\).

The full expansion (for \(X,Y\) of any number of coordinates) is written like this:

\[\begin{aligned} \b{f}(\b{x} + \b{v}) &= \b{f} + \p_i \b{f} \cdot v_i + \frac{1}{2!}(\p_i \p_j \b{f}) \cdot v_i v_j + \frac{1}{3!} (\p_i \p_j \p_k \b{f}) \cdot v_i v_j v_k + \ldots \\ &= \b{f} + \p_i \b{f} \cdot v_i + \frac{1}{2!}(\p_i \p_j) \b{f} \cdot (v_i v_j) + \ldots \\ &= \b{f} +(\b{v} \cdot \vec{\p}) \b{f} + \frac{(\b{v} \cdot \vec{\p})^2}{2!} \b{f} + \ldots \\ \b{f}(\b{x} + \b{v}) &= \boxed{ \big[ \sum_{n=0}^\infty \frac{(\b{v} \cdot \vec{\p})^n}{n!} \big] \b{f}(\b{x}) } \tag{Vector Field} \end{aligned}\]

We write the numerator in the summation as \((\b{v} \cdot \vec{\p})^{n}\), which expands to \((\sum_i v_i \p_i) (\sum_j v_j \p_j) \ldots\), and then we can still group things into exponentials, only now we have to understand that all of these terms have derivative operators on them that need to be applied to \(\b{f}\) to be meaningful:

\[\b{f}(\b{x + v}) = e^{\b{v} \cdot \vec{\p}} \b{f}(\b{x})\]

We could have included indexes on \(\b{f}\) also:

\[\begin{aligned} f_k(\b{x} + \b{v}) &= \b{f}_k + \p_i \b{f}_k \cdot \b{v}_i + \frac{1}{2!}(\p_i \p_j) \b{f}_k \cdot (\b{v}_i \b{v}_j) + \ldots \\ &= \big[ \sum_{n} \frac{(\b{v} \cdot \vec{\p})^n}{n!} \big] f_k(\b{x}) \end{aligned}\]
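Here is a sympy sketch of the quadratic (\(n = 2\)) term for a vector field (the map below is an arbitrary choice): contract the second-derivative tensor \(\p_i \p_j \b{f}_k\) against \(v_i v_j\) and check it against the \(\e^2\) coefficient of \(\b{f}(\b{x} + \e \b{v})\), component by component.

```python
import sympy as sp

x1, x2, v1, v2, eps = sp.symbols('x_1 x_2 v_1 v_2 epsilon')
X, V = [x1, x2], [v1, v2]
F = [sp.sin(x1) * x2, sp.exp(x1 + x2)]   # an arbitrary map R^2 -> R^2

# n = 2 term:  (1/2) * (d_i d_j F_k) v_i v_j  for each output component k
quadratic = [sp.Rational(1, 2) * sum(sp.diff(Fk, X[i], X[j]) * V[i] * V[j]
                                     for i in range(2) for j in range(2))
             for Fk in F]

# Check against the eps^2 coefficient of F_k(x + eps*v)
shifted = [Fk.subs({x1: x1 + eps*v1, x2: x2 + eps*v2}) for Fk in F]
check = [sp.series(s, eps, 0, 3).removeO().coeff(eps, 2) for s in shifted]

print([sp.simplify(q - c) for q, c in zip(quadratic, check)])   # [0, 0]
```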

It seems evident that this should work for any other sort of differentiable object, too. What about matrices?

\[M_{ij}(\b{x} + \b{v})= \big[ \sum_{n} \frac{(\b{v} \cdot \vec{\p})^n}{n!} \big] M_{ij}(\b{x})\]

I don’t want to talk about curl and divergence here, because they bring in a lot more concepts and I don’t know the best way to understand them yet, but it’s worth noting that both are formed from components of \(J_f\), appropriately arranged.


4. Complex Analytic

The complex plane \(\bb{C}\) is a sort of change-of-basis of \(\bb{R}^2\), via \((z,\bar{z}) = (x + iy, x - iy)\):

\[z \lra x\b{x} + y\b{y}\] \[\bar{z} \lra x\b{x} - y\b{y}\]

Therefore we can write a function on \(\bb{C}\) as a Taylor series in these two variables:

\[f(z + \D z, \bar{z} + \D \bar{z}) = \big[ \sum_{n=0}^\infty \frac{(\D z \p_z + \D \bar{z} \p_{\bar{z}})^n}{n!} \big] f(z, \bar{z})\]

One subtlety: it should always be true that \(\p_{x_i} x_j = 1_{i = j}\) when changing variables. Because \(z\) and \(\bar{z}\), when considered as vectors in \(\bb{R}^2\), are not unit vectors, there is a normalization factor required on the partial derivatives. Also, for \(\bb{C}\) the factors of \(i\) cause the signs to swap:

\[\begin{aligned} \p_z &\underset{\bb{C}}{=} \frac{1}{2}(\p_x - i \p_y) \underset{\bb{R}^2}{=} \frac{1}{2}(\p_{\b{x}} + \p_{\b{y}}) \\ \p_{\bar{z}} &\underset{\bb{C}}{=} \frac{1}{2}(\p_x + i \p_y) \underset{\bb{R}^2}{=} \frac{1}{2}(\p_{\b{x}} - \p_{\b{y}}) \end{aligned}\]
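A quick sympy check that these operators really do behave like partial derivatives with respect to \(z\) and \(\bar{z}\), i.e. \(\p_z z = 1\), \(\p_z \bar{z} = 0\), and so on:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
z, zbar = x + sp.I*y, x - sp.I*y

def d_z(F):
    return (sp.diff(F, x) - sp.I * sp.diff(F, y)) / 2

def d_zbar(F):
    return (sp.diff(F, x) + sp.I * sp.diff(F, y)) / 2

print(d_z(z), d_z(zbar))        # 1 0
print(d_zbar(z), d_zbar(zbar))  # 0 1
```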

In complex analysis, for some reason, \(\bar{z}\) is not treated as a true variable, and we only consider a function as ‘complex differentiable’ when it has derivatives with respect to \(z\) alone. Notably, we would say that the derivative \(\p_z \bar{z}\) does not exist—the value of \(\lim_{(x,y) \ra (0,0)} \frac{x - iy}{x + i y}\) is different depending on the path you take towards the origin. These statements turn out to be almost equivalent:

  • \(f(z)\) is a function of only \(z\) in a region
  • \(\p_{\bar{z}} f(z) = 0\) in a region
  • \(f(z)\) is complex-analytic in a region
  • \(f(z)\) has a Taylor series as a function of \(z\) in a region

So when we discuss Taylor series of functions \(\bb{C} \ra \bb{C}\), we usually mean this:

\[\boxed{f(z + \D z) = \big[ \sum_{n=0}^\infty \frac{(\D z \p_z)^n}{n!} \big] f(z)} \tag{Complex-Analytic}\]

If we write \(f(z(x,y)) = u(x,y) + i v(x,y)\), the requirement that \(\p_{\bar{z}} f(z) = \frac{1}{2}(\p_x + i \p_y) f(z) = 0\) becomes the Cauchy-Riemann Equations by matching real and imaginary parts:

\[\begin{aligned} u_x &= v_y \\ u_y &= - v_x \end{aligned}\]

But seriously, \(\p_{\bar{z}} f(z) = 0\) is a much better way of expressing this.
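Here is a small sympy check of the Cauchy-Riemann equations on two arbitrary examples, one analytic (\(z^3\)) and one not (\(\bar{z}\)):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x + sp.I * y

for f in (z**3, sp.conjugate(z)):            # analytic vs. not
    u, v = sp.re(sp.expand(f)), sp.im(sp.expand(f))
    print(sp.simplify(sp.diff(u, x) - sp.diff(v, y)),   # u_x - v_y
          sp.simplify(sp.diff(u, y) + sp.diff(v, x)))   # u_y + v_x
# z**3:          0 0   -> Cauchy-Riemann holds
# conjugate(z):  2 0   -> u_x - v_y = 2, so not complex-differentiable
```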


There is one important case where a function \(f(z, \bar{z})\) is a function of only \(z\), yet it fails to be analytic at a single point and \(\p_{\bar{z}} f(z) \neq 0\) there, and it is solely responsible for almost all of the interesting parts of complex analysis. It’s the fact that:

\[\p_{\bar{z}} \frac{1}{z} = 2 \pi i \delta(z, \bar{z})\]

Where \(\delta(z, \bar{z})\) is the two-dimensional Dirac delta function, normalized here so that \(\iint \delta(z, \bar{z}) \; d\bar{z} \^ dz = 1\). I find this to be quite surprising. One way to see why it’s true: \(\frac{1}{z}\) is holomorphic away from the origin, so \(\p_{\bar{z}} \frac{1}{z}\) vanishes everywhere except \(z = 0\); yet, by Stokes’ theorem, its integral over any region containing the origin equals \(\oint_C \frac{dz}{z} = 2 \pi i\), which is exactly the behavior of a delta function.

Importantly, \(n = -1\) is the only integer power for which this happens: \(z^n\) is analytic for \(n \geq 0\), and for \(n \leq -2\) it is the \(z\)-derivative of \(\frac{z^{n+1}}{n+1}\), so its contour integrals around the origin vanish. This property gives rise to the entire method of residues, because if \(f(z) = \frac{f_{-1}(0) }{z} + f^*(z)\), where \(f^*(z)\) has no terms of order \(\frac{1}{z}\), then integrating a contour \(C\) around a region \(D\) which contains \(0\) gives, via Stokes’ theorem:

\[\begin{aligned} \oint_C f(z) dz &= \iint_D \p_{\bar{z}} \big[ \frac{f_{-1}(0) }{z} + f^*(z) \big] \; d\bar{z} \^ dz \\ &= 2 \pi i \iint_D \delta(z, \bar{z}) f_{-1}(0) \; d\bar{z} \^ dz \\ &= 2 \pi i f_{-1}(0) \end{aligned}\]

(If the \(\bar{z}\) derivative isn’t \(0\), you get the Cauchy-Pompeiu formula for contour integrals immediately.)

By the way: Fourier series are closely related to contour integrals, and thus to complex Taylor series. Taking \(C\) to be the unit circle and changing variables to \(z = e^{i \theta}\), you can write \(\frac{1}{2 \pi i} \oint_C \frac{F(z)}{z^{k+1}} dz\) as \(\frac{1}{2 \pi} \int_0^{2\pi} F(e^{i \theta})e^{-ik\theta} d\theta\), which is exactly the formula for the \(k\)-th Fourier coefficient of \(F(e^{i\theta})\).
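Here is a numerical sketch of that correspondence (plain numpy; the choice \(F = e^z\) is arbitrary, and for it the \(k\)-th Taylor coefficient should be \(\frac{1}{k!}\)):

```python
import numpy as np
from math import factorial

F = np.exp              # arbitrary entire function: its z^k Taylor coefficient is 1/k!
k = 3

# (1/2pi) * integral_0^{2pi} F(e^{i theta}) e^{-i k theta} d theta,
# approximated by a uniform Riemann sum on the unit circle
n = 4096
theta = 2 * np.pi * np.arange(n) / n
coeff = np.mean(F(np.exp(1j * theta)) * np.exp(-1j * k * theta))

print(coeff.real, 1 / factorial(k))   # both ~ 0.16666...
```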