# All About Taylor Series

[December 28, 2018]

Here is a survey of understandings on each of the main types of Taylor series:

1. single-variable
2. multivariable $\bb{R}^n \ra \bb{R}$
3. multivariable $\bb{R}^n \ra \bb{R}^m$
4. complex $\bb{C} \ra \bb{C}$

I thought it would be useful to have everything I know about these written down in one place.

These notes are not pedagogical; they’re for crystallizing everything when you already have a partial understanding of what’s going on. Particularly, I don’t want to have to remember the difference between all the different flavors of Taylor series, so I find it helpful to just cast them all into the same form, which is possible because they’re all the same thing (seriously why aren’t they taught this way?).

In these notes I am going to ignore discussions of convergence so that more ground can be covered. Generally it’s important to address convergence in order to, well, not be wrong. And I’m certain that I’ve made statements which are wrong below. But I am just trying to make sure I understand what happens when everything works, because in my interests (physics) it usually does.

## 1. Single Variable

A Taylor series for a function in $\bb{R}$ looks like this:
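With the expansion point called $x$ and the displacement called $\e$ (symbols assumed here, chosen to match the first-order approximations later in these notes), the standard form is:

$$f(x + \e) = f(x) + \e f'(x) + \frac{\e^2}{2!} f''(x) + \ldots = \sum_{n=0}^{\infty} \frac{\e^n}{n!} f^{(n)}(x)$$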

It’s useful to write this as one big operator acting on $f(x)$:
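A sketch of that operator form, with $\p_x$ for the derivative operator and $\e$ for the displacement (both assumed symbols):

$$f(x + \e) = \left[ 1 + \e \p_x + \frac{(\e \p_x)^2}{2!} + \ldots \right] f(x) = \left[ \sum_{n=0}^{\infty} \frac{(\e \p_x)^n}{n!} \right] f(x)$$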

Or even as a single exponentiation of the derivative operator, which is commonly done in physics, but you probably shouldn’t think too hard about what it means:
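Formally, treating the sum over powers of $\e \p_x$ as the power series of an exponential (same assumed notation, $\e$ being the displacement):

$$f(x + \e) = e^{\e \p_x} f(x)$$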

I also think it’s useful to interpret the Taylor series equation as resulting from repeated integration:
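A sketch of that reading, expanding around $0$ and applying the fundamental theorem of calculus to each derivative in turn (the integration variables $x_1, x_2, x_3$ are just labels):

$$\begin{aligned} f(x) &= f(0) + \int_0^x f'(x_1) \, dx_1 \\ &= f(0) + \int_0^x \left[ f'(0) + \int_0^{x_1} f''(x_2) \, dx_2 \right] dx_1 \\ &= f(0) + x f'(0) + \int_0^x \int_0^{x_1} f''(x_2) \, dx_2 \, dx_1 \\ &= f(0) + x f'(0) + \frac{x^2}{2} f''(0) + \int_0^x \int_0^{x_1} \int_0^{x_2} f'''(x_3) \, dx_3 \, dx_2 \, dx_1 \end{aligned}$$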

This basically makes sense as soon as you understand integration, plus it makes it obvious that the series only works when each integral actually recovers the previous function. So you can’t take a series of $\frac{1}{1-x}$ that passes $x=1$, because you can’t exactly integrate past the pole there (though there are tricks).
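To see the $\frac{1}{1-x}$ obstruction numerically, here is a minimal sketch. It assumes an expansion around $0$, where the derivatives are $f^{(n)}(0) = n!$; the helper names are made up for illustration.

```python
from math import factorial

def taylor_partial_sum(derivatives, eps):
    """Sum eps**n / n! * f^(n)(x0) over the supplied derivative values."""
    return sum(d * eps**n / factorial(n) for n, d in enumerate(derivatives))

def geometric_derivatives(order):
    """Derivatives of f(x) = 1/(1 - x) at x = 0, which are f^(n)(0) = n!."""
    return [factorial(n) for n in range(order)]

# Inside the pole at x = 1 the partial sums settle toward f(0.5) = 2.
inside = taylor_partial_sum(geometric_derivatives(40), 0.5)

# Past the pole they grow without bound instead of approaching f(1.5) = -2.
outside = taylor_partial_sum(geometric_derivatives(40), 1.5)
```

Adding more terms makes `inside` more accurate and `outside` worse, which is the integration obstruction showing up as divergence.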

… plus it makes sense in pretty much any space you can integrate over.

Plus it makes it obvious how to truncate the series and how to create the remainder term, and it even shows you how you could, if you were so inclined, have each derivative be evaluated at a different point, such as $f(x) = f(0) + \int_0^x f'(x_1) dx_1 = f(0) + x f'(1) + \frac{x(x-2)}{2} f''(2) + \ldots$, which I’ve never even seen done before (except for here?), though good luck with figuring out convergence if you do that.
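As a sanity check on the different-points idea, here is a small sketch. It takes the value at $0$ and the first three derivatives at $1$, $2$ and $3$, so the nested integrals $\int_0^x \int_1^{x_1} \int_2^{x_2}$ give the coefficients $x$, $\frac{x(x-2)}{2}$ and $\frac{x(x-3)^2}{6}$. The choice of points and the test cubic are my own assumptions.

```python
def shifted_expansion(f0, df1, d2f2, d3f3, x):
    """f(0) + x f'(1) + x(x-2)/2 f''(2) + x(x-3)^2/6 f'''(3)."""
    return (f0
            + x * df1
            + x * (x - 2) / 2 * d2f2
            + x * (x - 3) ** 2 / 6 * d3f3)

def cubic(x):
    """Illustrative test function: f(x) = 2 - x + 4x^2 + 3x^3."""
    return 2 - x + 4 * x**2 + 3 * x**3

# Derivatives of the cubic at the shifted points:
# f'(x) = -1 + 8x + 9x^2, f''(x) = 8 + 18x, f'''(x) = 18.
f0, df1, d2f2, d3f3 = cubic(0), 16, 44, 18

reconstructed = [shifted_expansion(f0, df1, d2f2, d3f3, x) for x in (-1.0, 0.5, 4.0)]
expected = [cubic(x) for x in (-1.0, 0.5, 4.0)]
```

For a cubic the series terminates, so the two lists agree up to rounding; for a general function the remainder would be another nested integral.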

L’Hôpital’s rule about evaluating limits which give indeterminate forms follows naturally if the functions are both expressible as Taylor series. If $f(x) = g(x) = 0$, then:
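Spelled out in a displacement $\e$ from the point $x$ (assuming both expansions exist near $x$):

$$\lim_{\e \ra 0} \frac{f(x + \e)}{g(x + \e)} = \lim_{\e \ra 0} \frac{f(x) + \e f'(x) + O(\e^2)}{g(x) + \e g'(x) + O(\e^2)} = \lim_{\e \ra 0} \frac{f'(x) + O(\e)}{g'(x) + O(\e)}$$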

This equals $\frac{f'(x)}{g'(x)}$ when $g'(x) \neq 0$. If $f'(x) = g'(x) = 0$ as well, it might be solvable by applying the rule recursively. None of this works, of course, if the limit doesn’t exist. If $f(x) = g(x) = \infty$, evaluate $\lim \frac{1/g(x)}{1/f(x)}$ instead. If the indeterminate form is $\infty - \infty$, first rewrite the difference as a quotient, such as $f - g = \frac{1/g - 1/f}{1/(fg)}$, and evaluate that instead.

## 2. Multivariable -> Scalar

The multivariable Taylor series looks messier at first, so let’s start with only two variables, writing $f_x \equiv \p_x f(\b{x})$ and $\b{v} = (v_x, v_y)$, and we’ll work it into a more usable form.
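Written out through third order in the notation just defined, with every derivative evaluated at $\b{x}$:

$$\begin{aligned} f(\b{x} + \b{v}) = f(\b{x}) &+ \left[ f_x v_x + f_y v_y \right] \\ &+ \frac{1}{2!} \left[ f_{xx} v_x^2 + 2 f_{xy} v_x v_y + f_{yy} v_y^2 \right] \\ &+ \frac{1}{3!} \left[ f_{xxx} v_x^3 + 3 f_{xxy} v_x^2 v_y + 3 f_{xyy} v_x v_y^2 + f_{yyy} v_y^3 \right] + \ldots \end{aligned}$$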

(The strangeness of the terms like $2 f_{xy} v_x v_y$ and $3 f_{xxy} v_x^2 v_y$ is because these are really sums of multiple terms; because of the commutativity of partial derivatives on analytic functions, $f_{xy} = f_{yx}$, we can write $f_{xy} v_x v_y + f_{yx} v_y v_x = 2 f_{xy} v_x v_y$.)

The first few terms are often arranged like this:
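In the usual gradient-and-matrix arrangement (treating $\b{v}$ as a column vector, which is an assumption about layout only):

$$f(\b{x} + \b{v}) = f(\b{x}) + \b{v} \cdot \nabla f(\b{x}) + \frac{1}{2} \b{v}^T \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} \b{v} + O(v^3)$$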

$\nabla f(\b{x})$ is the gradient of $f$ (the vector of partial derivatives like $(f_x, f_y)$). The matrix $\begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$ is the “Hessian matrix” for $f$, and represents its second derivative.

… But we can do better. In fact, every order of derivative of $f$ in the total series has the same form, as powers of $\b{v} \cdot \vec{\nabla}$, which I prefer to write as $\b{v} \cdot \vec{\p}$, because it represents a ‘vector of partial derivatives’ $\vec{\p} = (\p_x, \p_y)$:
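Concretely, with $\b{v} \cdot \vec{\p} = v_x \p_x + v_y \p_y$ acting on $f$ and then evaluated at $\b{x}$:

$$f(\b{x} + \b{v}) = f(\b{x}) + (\b{v} \cdot \vec{\p}) f(\b{x}) + \frac{(\b{v} \cdot \vec{\p})^2}{2!} f(\b{x}) + \ldots = \sum_{n=0}^{\infty} \frac{(\b{v} \cdot \vec{\p})^n}{n!} f(\b{x})$$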

(This can also be written as a sum over every individual term using multi-index notation.)

So that looks pretty good. And it can still be written as $e^{ \b{v} \cdot \vec{\p}} f(\b{x})$. The same formula – now that we’ve hidden all the actual indexes – happily continues to work for dimension $> 2$, as well.

… Actually, this is not as surprising a formula as it might look. The multivariate Taylor series of $f(\b{x})$ is really just a bunch of single-variable series multiplied together:
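A sketch in two variables, expanding in $x$ first and then in $y$ (the last steps assume $\p_x$ and $\p_y$ commute, so the exponentials combine):

$$f(x + v_x, y + v_y) = e^{v_x \p_x} f(x, y + v_y) = e^{v_x \p_x} e^{v_y \p_y} f(x, y) = e^{v_x \p_x + v_y \p_y} f(x, y) = e^{\b{v} \cdot \vec{\p}} f(\b{x})$$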

I mention all this because it’s useful to have a solid idea of what a scalar function is before we move to vector functions.

L’Hôpital’s rule is more subtle for multivariable functions. In general the limit of a function may be different depending on what direction you approach from, so an expression like $\lim_{\b{x} \ra 0} \frac{f(\b{x})}{g(\b{x})}$ is not necessarily defined, even if both $f$ and $g$ have Taylor expansions. On the other hand, if we choose a path for $\b{x} \ra 0$, such as $\b{x}(t) = (x(t), y(t))$ then this just becomes a one-dimensional limit, and the regular rule applies again. So, for instance, while $\lim_{\b x \ra 0} \frac{f(\b{x})}{g(\b x)}$ may not be defined, $\lim_{t \ra 0} \frac{f(t \b{v})}{g(t \b{v})}$ is.

And the exact path we take to approach $0$ doesn’t even matter, only its direction and the gradients when we’re infinitesimally close to $0$. For example, suppose we have $f(0,0) = g(0,0) = 0$ and we’re taking the limit on the path given by $y = x^2$:
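Parametrizing the path as $(\e, \e^2)$ and expanding both functions around the origin (all derivatives evaluated at $(0,0)$, and assuming $g_x \neq 0$ there):

$$\lim_{\e \ra 0} \frac{f(\e, \e^2)}{g(\e, \e^2)} = \lim_{\e \ra 0} \frac{f(0,0) + \e f_x + \e^2 f_y + O(\e^2)}{g(0,0) + \e g_x + \e^2 g_y + O(\e^2)} = \frac{f_x}{g_x}$$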

The $f_y$ and $g_y$ terms are of order $\e^2$ and so drop out, leaving a limit taken only on the $x$-axis – corresponding to the fact that the tangent to $(x,x^2)$ at 0 is $(1,0)$.
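A quick numerical illustration of the same point. The two functions are made up for the purpose, chosen so that $f_x = 2$, $f_y = 3$, $g_x = 1$, $g_y = 5$ at the origin.

```python
import math

def f(x, y):
    return math.sin(2 * x) + 3 * y   # f(0,0) = 0, f_x = 2, f_y = 3 at the origin

def g(x, y):
    return x + 5 * y                 # g(0,0) = 0, g_x = 1, g_y = 5 at the origin

def ratio_on_path(path, t):
    """Evaluate f/g at the point path(t) for a small parameter t."""
    x, y = path(t)
    return f(x, y) / g(x, y)

eps = 1e-5
parabola = ratio_on_path(lambda t: (t, t * t), eps)  # tangent (1, 0) at the origin
x_axis = ratio_on_path(lambda t: (t, 0.0), eps)      # same tangent, straight line
diagonal = ratio_on_path(lambda t: (t, t), eps)      # tangent (1, 1)
```

The parabola and the $x$-axis both land near $f_x / g_x = 2$, while the diagonal lands near $(f_x + f_y)/(g_x + g_y) = 5/6$, so the tangent direction matters but the curvature of the path does not.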

In fact, this problem basically exists in 1D also, except that limits can only come from two directions: $x^+$ and $x^-$, so lots of functions get away without a problem (but you can also abuse this). L’Hôpital’s rule only needs that the functions be expandable as a Taylor series on the side the limit comes from.

I think that the concept of a limit that doesn’t specify a direction of approach is more common than it should be, because it’s really quite problematic in practice. I’m not quite sure I fully understand the complexity of solving it in $N > 1$ dimensions – but clearly if you just reduce to a 1-dimensional limit, you sweep the difficulties under the rug anyway. But see, perhaps, this pre-print for a lot more information.

## 3. Vector Fields

There are several types of vector-valued functions – curves like $\gamma: \bb{R} \ra \bb{R}^n$, or maps between manifolds like $\b{f}: \bb{R}^m \ra \bb{R}^n$ (including from a space to itself). In each case there is something like a Taylor series that can be defined. It’s not commonly written out, but I think it should be, so let’s try.

Let’s imagine our function maps spaces $X \ra Y$, where $X$ has $m$ coordinates and $Y$ has $n$ coordinates, and $m$ might be 1 in the case of a curve. Then along any particular coordinate in $Y$ out of the $n$ – call it $y_i$ – the Taylor series expression from above holds, because $f_i = \b{f} \cdot y_i$ is just a scalar function.
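For a single component that is just the scalar formula again, with $\b{v}$ a displacement in $X$ and $n$ here the summation index rather than the dimension of $Y$:

$$f_i(\b{x} + \b{v}) = \sum_{n=0}^{\infty} \frac{(\b{v} \cdot \vec{\p})^n}{n!} f_i(\b{x})$$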

But of course this holds in every $i$ at once, so it holds for the whole function:
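Stacking the components back into a vector (same conventions as the scalar case, with $\b{v}$ a displacement in $X$):

$$\b{f}(\b{x} + \b{v}) = \sum_{n=0}^{\infty} \frac{(\b{v} \cdot \vec{\p})^n}{n!} \b{f}(\b{x})$$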

The subtlety here is that the partial derivatives $\p$ are now being taken termwise – once for each component of $\b{f}$. For example, consider the first few terms when $X$ and $Y$ are 2D:
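Writing the components of $\b{f}$ as $f_1, f_2$ and the derivatives along the two coordinates of $X$ as $\p_1, \p_2$ (index labels assumed for this sketch), the series starts:

$$\b{f}(\b{x} + \b{v}) = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} + \begin{pmatrix} \p_1 f_1 & \p_2 f_1 \\ \p_1 f_2 & \p_2 f_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \ldots$$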

That matrix term, the $n=1$ term in the series, is the Jacobian Matrix of $f$, sometimes written $J_f$, and is much more succinctly written as $\vec{\p}_{x_i} \b{f}_{y_j}$, or just $\vec{\p}_i \b{f}_j$ or even just $\p_i \b{f}_j$.

The Jacobian matrix is the ‘first derivative’ of a vector field, and it includes every term which can possibly matter to compute how the function changes to first-order. In the same way that a single-variable function is locally linear ($f(x + \e) \approx f(x) + \e f'(x)$), a multi-variable function is locally a linear transformation: $\b{f}(\b{x} + \b{v}) \approx \b{f}(\b{x}) + J_f \b{v}$.
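Here is a minimal sketch of that local linearity, using a made-up map $\b{f}(x, y) = (x^2 y, \sin x + y)$ and its hand-computed Jacobian. If the linearization is right, the error should shrink quadratically with the size of $\b{v}$.

```python
import math

def f(x, y):
    """Illustrative map R^2 -> R^2."""
    return (x * x * y, math.sin(x) + y)

def jacobian(x, y):
    """Rows are components of f, columns are derivatives in x and y."""
    return ((2 * x * y, x * x),
            (math.cos(x), 1.0))

def linearized(x, y, vx, vy):
    """f(x) + J_f v, the first-order approximation to f(x + v)."""
    f1, f2 = f(x, y)
    (a, b), (c, d) = jacobian(x, y)
    return (f1 + a * vx + b * vy, f2 + c * vx + d * vy)

def linearization_error(x, y, vx, vy):
    exact = f(x + vx, y + vy)
    approx = linearized(x, y, vx, vy)
    return max(abs(e - a) for e, a in zip(exact, approx))

err_small = linearization_error(1.0, 2.0, 1e-3, -2e-3)
err_large = linearization_error(1.0, 2.0, 1e-2, -2e-2)  # v scaled up by 10
```

Scaling $\b{v}$ by 10 multiplies the error by roughly 100, which is the second-order term that $J_f$ leaves out.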

Higher-order terms in the vector field Taylor series generalize ‘second’ and ‘third’ derivatives, etc, but they are generally tensors rather than matrices. They look like $(\p \o \p) \b{f}$, $(\p \o \p \o \p) \b{f}$, or $\p^{\o n} \b{f}$ in general, and they act on $n$ copies of $\b{v}$, ie, $\b{v}^{\o n}$.

The full expansion (for $X,Y$ of any number of coordinates) is written like this:
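One way to write it, using $\cdot$ for contracting each derivative index with one copy of $\b{v}$ (that notation is my assumption):

$$\b{f}(\b{x} + \b{v}) = \b{f} + (\vec{\p} \b{f}) \cdot \b{v} + \frac{(\vec{\p} \o \vec{\p}) \b{f} \cdot \b{v}^{\o 2}}{2!} + \ldots = \sum_{n=0}^{\infty} \frac{\vec{\p}^{\o n} \b{f} \cdot \b{v}^{\o n}}{n!}$$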

We write the numerator in the summation as $(\b{v} \cdot \vec{\p})^{n}$, which expands to $(\sum_i v_i \p_i) (\sum_j v_j \p_j) \ldots$, and then we can still group things into exponentials, only now we have to understand that all of these terms have derivative operators on them that need to be applied to $\b{f}$ to be meaningful:
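In that form (same conventions as before, with the operators acting on each component of $\b{f}$ and then evaluated at $\b{x}$):

$$\b{f}(\b{x} + \b{v}) = \sum_{n=0}^{\infty} \frac{(\b{v} \cdot \vec{\p})^n}{n!} \b{f}(\b{x}) = e^{\b{v} \cdot \vec{\p}} \b{f}(\b{x})$$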

We could have included indexes on $\b{f}$ also:
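With $j$ labeling the components of $\b{f}$ and $i$ the coordinates of $X$ (index names assumed):

$$f_j(\b{x} + \b{v}) = e^{\sum_i v_i \p_i} f_j(\b{x})$$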

It seems evident that this should work for any other sort of differentiable object also. What about matrices?

I don’t want to talk about curl and divergence here, because it brings in a lot more concepts and I don’t know the best understanding of it, but it’s worth noting that both are formed from components of $J_f$, appropriately arranged.

## 4. Complex Analytic

The complex plane $\bb{C}$ is a sort of change-of-basis of $\bb{R}^2$, via $(z,\bar{z}) = (x + iy, x - iy)$:
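Inverting that change of variables lets any function of $(x, y)$ be rewritten as a function of $(z, \bar{z})$:

$$x = \frac{z + \bar{z}}{2}, \qquad y = \frac{z - \bar{z}}{2i}, \qquad f(x, y) = f\left( \frac{z + \bar{z}}{2}, \frac{z - \bar{z}}{2i} \right) \equiv f(z, \bar{z})$$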

Therefore we can write it as a Taylor series in these two variables:
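Treating $z$ and $\bar{z}$ as the two coordinates, with displacements $\Delta z$ and $\Delta \bar{z}$ (symbols assumed for this sketch), the multivariable formula becomes:

$$f(z + \Delta z, \bar{z} + \Delta \bar{z}) = \sum_{n=0}^{\infty} \frac{(\Delta z \, \p_z + \Delta \bar{z} \, \p_{\bar{z}})^n}{n!} f(z, \bar{z})$$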

One subtlety: it should always be true that $\p_{x_i} \b{x}^j = 1_{i = j}$ when changing variables. Because $z$ and $\bar{z}$, when considered as vectors in $\bb{R}^2$, are not unit vectors, there is a normalization factor required on the partial derivatives. Also, for $\bb{C}$ the factors of $i$ cause the signs to swap:
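These are the standard (Wirtinger) derivatives, fixed by requiring $\p_z z = \p_{\bar{z}} \bar{z} = 1$ and $\p_z \bar{z} = \p_{\bar{z}} z = 0$:

$$\p_z = \frac{1}{2} (\p_x - i \p_y), \qquad \p_{\bar{z}} = \frac{1}{2} (\p_x + i \p_y)$$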

In complex analysis, for some reason, $\bar{z}$ is not treated as a true variable, and we only consider a function as ‘complex differentiable’ when it has derivatives with respect to $z$ alone. Notably, we would say that the derivative $\p_z \bar{z}$ does not exist – the value of $\lim_{(x,y) \ra (0,0)} \frac{x - iy}{x + i y}$ is different depending on the path you take towards the origin. These statements turn out to be almost equivalent:

• $f(z)$ is a function of only $z$ in a region
• $\p_{\bar{z}} f(z) = 0$ in a region
• $f(z)$ is complex-analytic in a region
• $f(z)$ has a Taylor series as a function of $z$ in a region

So when we discuss Taylor series of functions $\bb{C} \ra \bb{C}$, we usually mean this:
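That is, the two-variable series with every $\p_{\bar{z}}$ term dropped ($\Delta z$ is an assumed name for the displacement):

$$f(z + \Delta z) = \sum_{n=0}^{\infty} \frac{(\Delta z)^n}{n!} \p_z^n f(z)$$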

If we write $f(z(x,y)) = u(x,y) + i v(x,y)$, the requirement that $\p_{\bar{z}} f(z) = \frac{1}{2}(\p_x + i \p_y) f(z) = 0$ becomes the Cauchy-Riemann Equations by matching real and imaginary parts:
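Expanding $(\p_x + i \p_y)(u + i v) = (u_x - v_y) + i (v_x + u_y)$, with subscripts denoting partial derivatives, and setting both parts to zero:

$$u_x = v_y, \qquad u_y = -v_x$$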

There is one important case where a function $f(z, \bar{z})$ is a function of only $z$, yet it is not analytic and $\p_{\bar{z}} f(z) \neq 0$, and it is solely responsible for almost all of the interesting parts of complex analysis. It’s the fact that:
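With $\p_{\bar{z}} = \frac{1}{2}(\p_x + i \p_y)$ and the delta function normalized so that $\iint \delta(z, \bar{z}) \, dx \, dy = 1$ (other normalizations move the factor of $\pi$ around), the statement is:

$$\p_{\bar{z}} \frac{1}{z} = \pi \, \delta(z, \bar{z})$$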

Where $\delta(z, \bar{z})$ is the two-dimensional Dirac Delta function. I find this to be quite surprising. Here’s an aside on why it’s true:
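A rough sketch of the argument, not a careful proof: away from the origin $\frac{1}{z}$ is analytic, so $\p_{\bar{z}} \frac{1}{z} = 0$ there and anything nonzero must sit at $z = 0$. To find its weight, integrate over a disk $D$ of radius $r$ around the origin and use the complex form of Green's theorem, $\oint_{\p D} f \, dz = 2i \iint_D \p_{\bar{z}} f \, dx \, dy$:

$$\iint_D \p_{\bar{z}} \frac{1}{z} \, dx \, dy = \frac{1}{2i} \oint_{\p D} \frac{dz}{z} = \frac{1}{2i} \int_0^{2\pi} \frac{i r e^{i \theta}}{r e^{i \theta}} \, d\theta = \frac{2 \pi i}{2i} = \pi$$

The result is independent of $r$, which is what a point mass of weight $\pi$ at the origin looks like.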

Importantly, among the powers $z^n$, only $n = -1$ has a $\p_{\bar{z}}$ derivative that survives integration over a region (the more negative powers give derivatives of $\delta$, which integrate to zero). This property gives rise to the entire method of residues, because if $f(z) = \frac{f_{-1}(0) }{z} + f^*(z)$, where $f^*(z)$ has no terms of order $\frac{1}{z}$, then integrating along a contour $C$ enclosing a region $D$ which contains $0$ gives, via Stokes’ theorem:
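A sketch under the assumptions that $f^*(z)$ is analytic in $D$ (so $\p_{\bar{z}} f^* = 0$), that $\p_{\bar{z}} = \frac{1}{2}(\p_x + i \p_y)$, and that $\delta$ integrates to $1$ over $dx \, dy$:

$$\oint_C f(z) \, dz = 2i \iint_D \p_{\bar{z}} f \, dx \, dy = 2i \iint_D f_{-1}(0) \, \pi \, \delta(z, \bar{z}) \, dx \, dy = 2 \pi i \, f_{-1}(0)$$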

(If the $\bar{z}$ derivative isn’t $0$, you get the Cauchy-Pompeiu formula for contour integrals immediately.)

By the way: Fourier series are closely related to contour integrals, and thus to complex Taylor series. On a circle of radius $r$ you can change variables to $z = re^{i\theta}$ and write $\frac{1}{2 \pi i} \oint_C \frac{F(z)}{z^{k+1}} dz$ as $\frac{1}{2 \pi r^k} \int_0^{2\pi} F(re^{i \theta})e^{-ik\theta} d\theta$, which is clearly a Fourier series coefficient for suitable $F$.
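To make the connection concrete, here is a minimal sketch that recovers Taylor coefficients around $0$ by sampling a function on a circle and taking a discrete Fourier sum. It uses the circle form with its $r^{-k}$ factor, $\frac{1}{2 \pi r^k} \int_0^{2\pi} F(re^{i\theta}) e^{-ik\theta} d\theta$; the function name, the sample count, and the choice of $F = e^z$ are all just for illustration.

```python
import cmath
import math

def taylor_coefficient(F, k, radius=1.0, samples=64):
    """Approximate the k-th Taylor coefficient of F at 0.

    Discretizes (1 / (2 pi r^k)) * integral of F(r e^{i theta}) e^{-i k theta}
    over theta in [0, 2 pi) with equally spaced sample points.
    """
    total = 0j
    for j in range(samples):
        theta = 2 * math.pi * j / samples
        total += F(radius * cmath.exp(1j * theta)) * cmath.exp(-1j * k * theta)
    return total / (samples * radius**k)

# For F(z) = e^z the coefficients should come out close to 1 / k!.
coeffs = [taylor_coefficient(cmath.exp, k) for k in range(6)]
```

The equally spaced sum is exactly what a discrete Fourier transform computes, so for an analytic $F$ the Taylor coefficients are the Fourier coefficients of its values on the circle, rescaled by $r^{-k}$.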