Everything Is Logarithms

May 25, 2026

Some connections between things, which I have not seen elsewhere. Maybe they mean something?


1. The Baseless Logarithm

Normally one writes a logarithm with a base, \(\log_b (x)\), to mean

\[y = \log_b (x) \Lra b^y = x\]

And then you can change the base of the logarithm with

\[\log_b (x) = \frac{\log_a (x)}{\log_a(b)}\]

Which follows from rearranging \(\log_a (x) = \log_a (b^{\log_b x}) = \log_b (x) \times \log_a (b)\).

One way of thinking about what this formula does is that it is a change of units. Similar to writing \(2 \text{ km} = 2000 \text{ m} / \frac{1000 \text{ m}}{1 \text{ km}}\) or \(5 \text{ bytes} = 40 \text{ bits}/\frac{8 \text{ bits}}{1\text{ byte}}\). It says: how many copies of \(b\) are in \(x\)? It’s the number of copies of \(a\) in \(x\), divided by the number of copies of \(a\) that are in \(b\).
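As a quick numerical sanity check of the change-of-base formula read as a unit conversion, here is a minimal Python sketch. The function name `change_base` and the sample values are my own illustration, not part of the argument.

```python
import math

def change_base(x: float, b: float, a: float = math.e) -> float:
    """Compute log_b(x) as log_a(x) / log_a(b).

    Reads as: (copies of a in x) divided by (copies of a in b).
    """
    return math.log(x, a) / math.log(b, a)

# The number of copies of 2 in 1024 should not depend on the
# intermediate base a that we route the computation through.
via_e = change_base(1024, 2)        # intermediate base a = e
via_10 = change_base(1024, 2, 10)   # intermediate base a = 10
print(via_e, via_10)
```

The intermediate base plays the same role as an arbitrary intermediate unit: it cancels in the ratio.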

This is perfectly simple, but for some reason it’s hard to think about logarithms that way. The notation kind of… obfuscates things? Specifically it is hard to read \(\log_b x\) as “how many copies of \(b\) are in \(x\)”, because that English expression should correspond to the notation \(x/b\), not \(\log_b x\).

I found a way of thinking about logarithms which I think makes this clearer, but you have to allow a sort of odd object that I call the baseless logarithm. It is simply a logarithm without a base:

\[\log N\]

which we regard as an abstract object, not a number. Then we write our normal “based” logarithm as a ratio of two of these baseless logarithms:

\[\log_2 N = \frac{\log N}{\log 2}\]

Note, this is already sort of a thing people colloquially do, e.g. leaving out the base of logarithms in asymptotic formulas. But I do not mean it as a shorthand. It is useful to regard it as an actual algebraic object.

We interpret \(\log 2\) as being the unit “bits”. To write \(\log N\) in bits is to factor it as a multiple of \(\log 2\):

\[\log N = \frac{\log N}{\log 2} \log 2 = \log_2 (N) \log 2 = \log_2 (N) \text{ bits}\]

Then the change-of-base for logarithms follows from just writing the same geometric quantity in different units. For example \(\log e\) as a unit is sometimes called “nats”:

\[\begin{aligned} \log N = \frac{\log N}{\log 2} \log 2 = \log_2 (N) \text{ bits} = \frac{\log N}{\log e} \log e = \ln (N) \text{ nats} \end{aligned}\]
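One way to make the "abstract object, not a number" idea concrete is a small type whose only route to a plain number is division by another value of the same type. This is a sketch under my own assumptions: the class name `BaselessLog`, its field `ln_value`, and the constants `BIT` and `NAT` are hypothetical, and storing the natural log internally is just one arbitrary representation that cancels in every ratio.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class BaselessLog:
    """Stands for the abstract object log N (hypothetical illustration)."""
    ln_value: float  # internal representation only; cancels in ratios

    @staticmethod
    def of(n: float) -> "BaselessLog":
        return BaselessLog(math.log(n))

    def __add__(self, other: "BaselessLog") -> "BaselessLog":
        # log M + log N = log(M N)
        return BaselessLog(self.ln_value + other.ln_value)

    def __rmul__(self, k: float) -> "BaselessLog":
        # k log N = log(N^k): a number times a unit is again baseless
        return BaselessLog(k * self.ln_value)

    def __truediv__(self, unit: "BaselessLog") -> float:
        # Only a ratio of two baseless logarithms is a plain number.
        return self.ln_value / unit.ln_value

BIT = BaselessLog.of(2)        # the unit log 2
NAT = BaselessLog.of(math.e)   # the unit log e

log_n = BaselessLog.of(1024)
in_bits = log_n / BIT          # log_2(1024)
in_nats = log_n / NAT          # ln(1024)
```

Writing `in_bits * BIT` rebuilds the same baseless object, which can then be divided by `NAT` to read it off in nats instead.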

The baseless \(\log N\) is sort of the multiplicative version of an object that might be familiar from discussions of vectors. It is common with vectors to distinguish between points and displacements: a displacement vector \(\b{v}\) is given by the difference of two points \(\v = (b) - (a)\). When we think of points as having coordinates, this involves an explicit choice of origin \(\O\), such that \(\b{a} \equiv (a) - \O\) and \(\b{b} \equiv (b) - \O\). Then a displacement vector is constructed by subtracting off the factors of \(\O\), \(\b{v} = \b{b} - \b{a} = ((b) - \O) - ((a) - \O) = (b) - (a)\). The baseless logarithm implements the same thing but with multiplication: the value \(\log N\) may be thought of as \(\log N / \log \O\) for an unspecified choice of origin; turning it into an actual numeric value involves dividing two such logarithms to cancel out the origin, \(\log_M N = \log N / \log M = (\log N / \log \O) / (\log M / \log \O)\). I think of \(\log N\) as the point corresponding to \(N\) and \(\log N / \log \O\) as its corresponding displacement vector once you pick a coordinate system. I prefer to think of the point as more fundamental.

You might ask: if we have a baseless logarithm \(\log N\), do we also have a “baseless exponential”? Normally \(b^{\log_b N}\) can be written as something like \(b^{\log_b N} = b^{\ln N / \ln b} = e^{\ln N} = N\); is there any way to do this without actually choosing a base? I think the answer has to be “no”. All we can say is that we have split the one object, a logarithm \(\log_b N\) which is the solution of \(b^y = N\), into two objects, \(\log N\) and \(\log b\), each of which on its own is without “units” and so has no numerical meaning. It is just like points in space: a point on its own has no operation of addition and does not have a length. We can subtract points to produce vectors (relative to a symmetry group) but not add them, and the usual operations in coordinates all require a choice of origin.

In fact there are many surprising similarities between logarithms and vectors.


2. Logarithms are Vectors

When doing vector algebra and differential geometry in a properly covariant way, we distinguish between abstract vectors and vectors in a particular coordinate system. My personal convention for this is to refer to the abstract vectors as “geometric” vectors and always write them in bold, \(\v\), whereas “coordinate” vectors, tuples of their values in coordinates, are written with an arrow over them like \(\vec{v} = (v_x, v_y, v_z)\). Boldface geometric vectors are always coordinate-free, whereas coordinate vectors are just collections of numbers or other objects. The geometric vector \(\b{v}\) can be written as a dot product of its coordinates with a ‘frame’ \(X = (\x, \y, \z)\) of basis vectors

\[\b{v} = \vec{v} \cdot X = (v_x, v_y, v_z) \cdot (\x, \y, \z) = v_x \x + v_y \y + v_z \z\]

The projection of \(\v\) onto a basis vector \(\x\) is then given by ‘measuring’ the vector against the basis vector (which does not have to be of unit length). I like to write this as division because it acts a lot like division (although it’s technically pseudodivision instead):

\[\frac{\v}{\x} = v_x\]

That’s in my own very nonstandard notation1 for vector division here. The more common way to write this is to project a component of a differential \(df = f_x dx + f_y dy + f_z dz\) with a partial derivative, which is also the pseudodivision operation (which is incidentally the sense in which partial derivatives kinda work like division but not really):

\[\frac{\p f}{\p x} = f_x\]

I will write things in both forms to make it easy to translate between them; I do prefer my vector-division version because it avoids bringing in the irrelevant notations of differential calculus, but since the latter is actually standard I ought to include it for comparison.

Suppose \(\b{v}\) is one-dimensional, \(\b{v} = v_x \x\). Then the projection onto a ‘measuring stick’ \(\b{m} = m \x\) measures its length in terms of multiples of \(m\):

\[\frac{\v}{\b{m}} = \frac{v_x \x}{m \x} = \frac{v_x}{m}\]

Multiplying by \(\b{m}\) again is what we mean by “writing \(\b{v}\) in units of \(\b{m}\)”:

\[\frac{\b{v}}{\b{m}} \b{m} = (\frac{v_x}{m}) \text{m} \x\]

(In differentials, this is the differential of \(f\) restricted to its \(dx\) component: \(\frac{\p f}{\p x} dx = f_x dx = df \mid_{x}\), which is a perfectly interesting object (a covariant derivative) that one does not see written in this way very often. By the way, it’s not really important here, but it is possible to view all measurements of the length of vectors in this way by thinking first of rewriting an arbitrary vector \(\v = v_x \x + v_y \y + v_z \z\) in a polar form \(\v = v_r \r + v_{\theta} \theta\) and then projecting onto \(\r\), \(\| \v \| = \v/\r\). This tends to be a good way of looking at things.)

The baseless logarithm is performing the same operation on logarithms, where \(\log N\) is filling the role of the geometric vector \(\v\) and \(\log 2 = \text{bits}\) is the unit vector or measuring stick, which takes the role of \(\x\).

\[\begin{aligned} \frac{\log N}{\log 2} &= \log_2 N \\ \frac{\log N}{\log 2} \log 2 &= \log_2 N \text{ bits} \end{aligned}\]

In this sense baseless logarithms write numbers in coordinates in exactly the same way that measuring sticks write vectors in coordinates.

The equivalence of logarithms in different units

\[\begin{aligned} \log N &= \frac{\log N}{\log 2} \log 2 = \log_2 (N) \text{ bits} \\ &= \frac{\log N}{\log e} \log e = \ln (N) \text{ nats} \end{aligned}\]

is the same as the equivalence of geometric vectors in different units

\[\begin{aligned} \v &= \frac{\v}{\x} \x = v_x \x \\[1em] &= \frac{\v}{\x'} \x' = v_{\x'} \x' \\ \end{aligned}\]

or

\[\begin{aligned} df &= \frac{\p f}{\p x} dx = f_x dx \\ &= \frac{\p f}{\p x'} dx' = f_{x'} dx' \end{aligned}\]

And the change of base formula that computes a ratio of logarithms in different bases

\[\begin{aligned} \log_2 N \text{ bits}&= \ln N \text{ nats} \\ \log_2 N &= \frac{\text{nats}}{\text{bits}} \ln N\\ &= \frac{\log e}{\log 2} \ln N \\ &= \log_2 (e) \ln N \end{aligned}\]

is exactly like the change of coordinates for a vector, where \(\x\) and \(\x'\) are two units for the same quantity.

\[\begin{aligned} v_x \x &= v_{\x'} \x' \\ v_x &= \frac{\x'}{\x} v_{\x'} \end{aligned}\]

or2

\[\begin{aligned} f_x dx &= f_{x'} dx' \\ f_x &= \frac{dx'}{dx} f_{x'} \end{aligned}\]

What logarithms don’t allow you to do, that partial derivatives and vector division do allow, is to actually talk about a partial derivative operation in isolation. For example, if \(N = 2^a 3^b\), you can only talk about the ratio with respect to a single unit \(\log 2\)

\[\frac{\log N}{\log 2} = a \frac{\log 2}{\log 2} + b \frac{\log 3}{\log 2} = a + b \log_2 3\]

which is equivalent to writing a vector as a multiple of a single basis vector (like in Clifford/geometric algebra)

\[\frac{\v}{\x} = v_x + v_y \frac{\y}{\x}\]

or to a total derivative

\[\frac{df}{dx} = f_x + f_y \frac{dy}{dx}\]

But there is no direct equivalent of the operation of partial differentiation—there’s nothing that acts like \(N \? (\log_2 N) \log 2 + (\log_3 N) \log 3\).

However, I keep finding that people have gone and invented the projection / partial derivative operation on logarithms anyway. For example, the p-adic valuation in number theory

\[\nu_p (n) = \max \{ k \in \bb{N} \mid p^k \mid n \}\]

corresponds to extracting the coefficient of \(\log p\) of a natural number in a logarithmic basis

\[\begin{aligned} \log n &= \log 2^{n_2} 3^{n_3} 5^{n_5} \cdots \\ &= n_2 \log 2 + n_3 \log 3 + n_5 \log 5 + \ldots \\ \nu_p (n) &= n_p \end{aligned}\]

Each coefficient is a non-negative integer, and \(\nu_p\) just takes the component corresponding to \(\log p\). Clearly \(\log n\) acts like a vector (although since the coefficients are in \(\bb{N}\) it is technically a commutative monoid instead of a vector space… nevertheless, it has the familiar structure of a vector). Since \(\nu_p\) is a ‘projection’ out of this logarithm, it still obeys logarithmic identities like \(\nu_p(m/n) = \nu_p(m) - \nu_p(n)\). But there is not really a good notation for actually expressing it as a projection, so sadly it gets a whole separate nomenclature that you have to learn.3

The same thing also works for rational \(n\) or radical \(n\) (meaning it is the product of radicals of prime factors), in which case the coefficients become integers or rationals. (As a bonus the resulting objects live in an actual vector space.)
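To see the "coefficient of \(\log p\)" reading in action, here is a small Python sketch of the valuation as a coordinate extraction. The helper names `nu`, `nu_rational` and `log_coordinates`, and the short list of primes, are my own choices for illustration.

```python
from fractions import Fraction

def nu(p: int, n: int) -> int:
    """p-adic valuation of a positive integer n: the exponent of p in n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def nu_rational(p: int, q: Fraction) -> int:
    """Extend to positive rationals via nu_p(m/n) = nu_p(m) - nu_p(n)."""
    return nu(p, q.numerator) - nu(p, q.denominator)

def log_coordinates(n: int, primes=(2, 3, 5, 7)) -> dict:
    """Coefficients of log n in the 'basis' {log p}, for a few small primes."""
    return {p: nu(p, n) for p in primes}

coords_360 = log_coordinates(360)   # 360 = 2^3 * 3^2 * 5
print(coords_360)
```

Multiplying two integers adds their coordinate tuples, which is the logarithmic identity \(\nu_p(mn) = \nu_p(m) + \nu_p(n)\) applied component by component.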

Another example of these logarithmic projections: in complex analysis the “order of vanishing” \(\text{ord}_a f(z)\) of a meromorphic function \(f(z)\) at a point \(z=a\) is the order of the pole or zero at a point (where zeroes are like negative poles). That is, it is the degree \(-n\) of the lowest-degree term in the Laurent series of the function around the point \(z=a\),

\[f(z) = f_{-n} (z-a)^{-n} + f_{-n+1} (z-a)^{-n+1} + \cdots + f_{-1} (z-a)^{-1} + f_0 + f_1 (z-a) + \cdots\]

(that is, \(n\) is the smallest value such that \((z-a)^n f(z)\) is holomorphic around \(a\)). This is extracted with a logarithm:

\[\text{ord}_a f(z) = \lim_{z \ra a} \frac{\log f(z)}{\log (z-a)} = -n\]

since for \(z \approx a\), \(f(z) \sim f_{-n} (z-a)^{-n}\) which dominates the other terms that blow up less quickly. If we write \(g(z)\) for the rest of \(f(z)\) which has \(\text{ord}_a (g(z)) > -n\):

\[\begin{aligned} \lim_{z \ra a} \frac{\log f(z)}{\log (z-a)} &= \lim_{z \ra a} \frac{\log (f_{-n} (z-a)^{-n} + g(z))}{\log (z-a)}\\ &= \lim_{z \ra a} \frac{\log f_{-n} (z-a)^{-n} (1 + \frac{g(z)}{f_{-n}} (z-a)^n)}{\log (z-a)} \\ &= \lim_{z \ra a} \frac{\log f_{-n}}{\log (z-a)} -n \frac{\log (z-a)}{\log (z-a)} + \frac{\log (1 + c (z-a))}{\log (z-a)} \\ &= -n \end{aligned}\]

So this is a very similar operation: the limit \(\lim_{z \ra a} \log (z-b)/\log(z-a) = 1_{a=b}\) serves to cancel out the rest of the terms, like how \(\p_j dx^i \sim (\p x^i)/(\p x^j) = 1_{i=j}\) serves to cancel out the terms in a partial derivative, extracting the \(dx\) component of \(df = f_x dx + f_y dy + \ldots\).

(I’m not very good at complex analysis so that’s all I’m going to say about that. Still, it seems clear that this is basically the same operation.)
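A rough numerical illustration of the limit, as a sketch only: I evaluate at a single real point close to \(a\) and use absolute values inside the logarithms to sidestep branch choices, which is my simplification and not something the argument above depends on. The function name `ord_estimate` and the two test functions are hypothetical.

```python
import math

def ord_estimate(f, a: float, eps: float = 1e-9) -> float:
    """Estimate ord_a f as log|f(z)| / log|z - a| at the point z = a + eps."""
    z = a + eps
    return math.log(abs(f(z))) / math.log(abs(z - a))

def double_pole(z):
    return 1 / z**2 + 1 / z      # pole of order 2 at z = 0

def simple_zero(z):
    return (z - 1) * (z + 3)     # simple zero at z = 1

pole_est = ord_estimate(double_pole, 0.0)
zero_est = ord_estimate(simple_zero, 1.0)
print(pole_est, zero_est)
```

The estimate for the zero converges slowly, because the leftover term \(\log f_{-n} / \log (z-a)\) only decays like \(1/\log\); rounding to the nearest integer recovers the order in both cases.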

We see that the baseless logarithm \(\log n\) works a lot like a vector \(\v\) or differential \(df\), and then expressing a logarithm in a base like \(\log_2 n = \log n / \log 2\) is a lot like a total derivative \(df/dx\) or Clifford division \(\v \ast \b{x}^{-1}\). What is missing is some equivalent of the partial derivative / projection operator that projects only onto that component… but various fields have gone and found a way to invent that anyway, either in the form of a partial derivative \(\p f/\p x\), or just by making up the \(p\)-adic valuation \(\nu_p\), or by the limits \(\lim_{z\ra a} \log f(z) / \log (z-a)\) in complex analysis. The similarities are all suspicious, though, and I can’t help but think there is some unifying theory here that ties all this together… but I can’t see what it is yet.

One thing that we might try in order to invent a \(\log_2 N\) that acts like \(\p_x f\) or \(\b{v}/\x\) is to somehow restrict the values of the logarithms to certain spaces, e.g. integers or rationals. Since the \(\{\log p_i\}\) are linearly independent (which is essentially equivalent to prime factorizations being unique), you would end up with objects like \(\log_2 3 = \log 3/\log 2\) which have no value in \(\bb{Q}\); “zeroing” those out then gives something that acts like a partial derivative. But I don’t know if that’s useful. Certainly it doesn’t help in any numeric context.

Anyway, onto more things that are logarithms.


3. Vectors are also Logarithms?

In differential geometry one interprets vectors like \(\v = v_x \x + v_y \y\) as being written in a basis of partial derivative operators, \(\v = v_x \p_x + v_y \p_y\). These can then be used to create discrete translations which move around in the various coordinates,

\[T^{\v} = e^{\v} = e^{v_x \p_x + v_y \p_y }\]

The partial derivatives are here in order to make it operate on functions

\[e^{v_x \p_x + v_y \p_y} f(x,y) = f(x + v_x, y + v_y)\]

which is true at the level of Taylor expansions as well. I often find it easier to dispense with the partial derivatives and just think of these as translation operators on the space \((x,y)\) directly

\[e^{v_x \p_x + v_y \p_y} (x, y) = (x + v_x, y + v_y)\]

(You can also think of this as acting on the function \(f(x) = x\), but that feels like overkill.)
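The claim that exponentiating a derivative gives a translation can be checked directly on polynomials, where the Taylor series terminates. This is a one-variable sketch with hypothetical helper names (`derivative`, `evaluate`, `translate`); polynomials are stored as coefficient lists, which is my own representation choice.

```python
from math import factorial

def derivative(coeffs):
    """Derivative of a polynomial [c0, c1, c2, ...], where c_k multiplies x^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def translate(coeffs, v):
    """Apply e^{v d/dx} = sum_k (v^k / k!) (d/dx)^k to a polynomial.

    The sum is finite because high enough derivatives of a polynomial vanish.
    """
    result = [0.0] * len(coeffs)
    term, k = list(coeffs), 0
    while term:
        scale = v**k / factorial(k)
        for i, c in enumerate(term):
            result[i] += scale * c
        term, k = derivative(term), k + 1
    return result

f = [1.0, -2.0, 0.0, 3.0]     # f(x) = 1 - 2x + 3x^3
shifted = translate(f, 0.5)   # coefficients of f(x + 0.5)
```

Evaluating `shifted` at any `x` should agree with evaluating `f` at `x + 0.5`, up to floating-point error.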

In any case, all this is really doing (in flat space, at least) is changing the additive vector \(\b{v}\) into a multiplicative form \(T^{\b{v}}\) which corresponds to the same operation, but whose terms are multiplied instead of added, and whose scalar coefficients are applied via exponentiation instead of multiplication. The basis is now translation operators in each coordinate:4

\[T^{\v} = e^{v_x \p_x} e^{v_y \p_y} = T_x^{v_x} T_y^{v_y}\]

(In non-flat space this is not so simple because the translations in different coordinates may not commute; you can still write it in this form but it’s a lot more complicated.)

What this means for us is: look, vectors are logarithms too

\[\begin{aligned} \ln T^{\v} &= \ln T_x^{v_x} T_y^{v_y} \\ &= v_x \ln T_x + v_y \ln T_y \\ &= v_x \p_x + v_y \p_y \end{aligned}\]

I can’t exactly say why, but it seems preferable to have this written in terms of baseless logarithms also. We do this by realizing that \(T_x = e^{\p_x} = T^{\p_x}\) and thinking of this symbol \(T\) as a sort of ‘generic’ base for translations, absent the numeric meaning of the symbol \(e\), which has \(\log T_x = \log T^{\p_x} = \p_x \log T\). Then

\[\log T^{\v} = \v \log T = v_x \p_x \log T + v_y \p_y \log T\]

And then we can write \(\v = \log_T T^{\v} = \log T^{\v} / \log T\). This is equivalent to the natural log version but it avoids explicitly depending on the numeric value of \(e\): any choice of base for the logarithm \(T\) gives the same concept of a vector, written in terms of the exponentiation of \(T\), but now we make explicit that the ‘units’ on \(\v\) come in part from the units on \(\log T\) itself.

So vectors in differential geometry may also be thought of as logarithms, specifically, the logarithms of translation operators.

Regular multiplication can even be viewed as an example of this. A product like \(xa\) can be rewritten as “translation” in the \(\ln a\) coordinate:

\[xa = e^{\ln x} e^{\ln a} = e^{(\ln x) \p_{\, \ln a}} a = x^{\p_{\, \ln a}} a\]

I’m not sure how that would ever be useful but maybe it’s a bit interesting?


4. Logarithms are Derivatives?

This part doesn’t really matter, I just thought I would mention it so that this article contains every fun fact about logarithms that I know.

One way of defining the natural logarithm is

\[\ln x = \lim_{a \ra 0} \frac{x^a - 1}{a}\]

I find this formula neat for a few reasons. Mostly it explains where a lot of the behaviors of \(\ln\) in calculus come from.

It follows from substituting \(x^a = e^{a \ln x}\) and then Taylor expanding:

\[\frac{x^a - 1}{a} = \frac{e^{a \ln x} - 1}{a} = \frac{(1 + a \ln x + \ldots) - 1}{a} \stackrel{a \ra 0}{=} \ln x\]
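A quick numerical check of this limit, as a minimal sketch: the function name `ln_via_power` and the sample values of \(a\) are arbitrary choices of mine.

```python
import math

def ln_via_power(x: float, a: float) -> float:
    """(x^a - 1) / a, which tends to ln x as a -> 0."""
    return (x**a - 1) / a

x = 5.0
approximations = {a: ln_via_power(x, a) for a in (1e-1, 1e-3, 1e-6)}
errors = {a: abs(v - math.log(x)) for a, v in approximations.items()}
print(errors)
```

The error shrinks roughly in proportion to \(a\), as expected from the next term \(\frac{a}{2} (\ln x)^2\) of the Taylor expansion. Pushing \(a\) much smaller eventually runs into floating-point cancellation in \(x^a - 1\).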

Plugging in \((1+x)\) also gives the Taylor series for \(\ln\):

\[\begin{aligned} \ln (1+x) &= \lim_{a \ra 0} \frac{(1+x)^a -1}{a} \\ &= \lim_{a \ra 0} \frac{\sum_k \binom{a}{k} 1^{a-k} x^k - 1}{a} \\ &= \lim_{a \ra 0} \frac{(1 + ax + \frac{a(a-1)}{2} x^2 + \ldots) - 1}{a} \\ &= x - \frac{1}{2} x^2 + \frac{1}{3} x^3 - \ldots \end{aligned}\]

The \(\lim_{a \ra 0} (x^a - 1)/a\) formula for \(\ln x\) resembles a derivative. To make it explicit, we can write it as

\[\ln x = \lim_{dy \ra 0} \frac{x^{y + dy} - x^y}{dy} \mid_{y=0} = \p_{y} x^y \mid_{y =0}\]

This form also shows how there is sort of a connection between \(\ln x\) and the polynomials \(x^k\), which maybe explains the otherwise-somewhat-mysterious fact that \(\int x^{k} \, dx = \ln x\) for \(k=-1\), whereas it is a polynomial for all other values of \(k\). Why is a logarithm like a polynomial? Well, it’s because in a lot of ways \(\ln x\) acts like \(x^0\). More specifically, it acts like the ‘interesting’ part of \(x^0\), that is, the first-order term of \(x^y\) in the exponent around \(y=0\)

\[\ln x \sim \frac{x^0 - 1}{0}\]

Just for fun, try using \(\p_x x^k = k x^{k-1}\) on it:

\[\p_x \ln x = \p_x \frac{x^0 - 1}{0} = \frac{0 x^{-1}}{0} = \frac{1}{x}\]

That’s all I really have to say about this. But I wonder if some of the other ideas on this page would benefit from being interpreted via the \(\ln x = \p_y x^y \mid_{y=0}\) form.


5. Dimensions are Logarithms

Another thing which clearly acts like a logarithm is the dimension operator \(\dim\) in linear algebra.

Compare:

\[\begin{aligned} \dim_{K} K^n &= n \dim_K K = n \\ \dim_K U \oplus V &= \dim_K U + \dim_K V \\ \dim_K U/V &= \dim_K U - \dim_K V \\ \dim_K U \o V &= (\dim_K U) \times (\dim_K V) \\ \end{aligned}\]

(where \(\dim_K V\) means its dimension as a vector space over the base field \(K\), and assume we’re only talking about finite-dimensional spaces here) with

\[\begin{aligned} \log_k k^n &= n \log_k k = n \\ \log_k u \times v &= \log_k u + \log_k v \\ \log_k u/v &= \log_k u - \log_k v \\ \log_k k^{\log_k u \times \log_k v} &= (\log_k u) \times (\log_k v) \end{aligned}\]

The direct sum \(\oplus\) corresponds to multiplication \(\times\), which is really just a notational accident, since it is the same as the direct product on finite-dimensional vector spaces; the \(\oplus\) symbol reflects the fact that it adds bases as sets.5 Meanwhile the tensor product \(\o\) multiplies bases as sets, but corresponds in arithmetic to a sort of “commutative exponentiation” \(k^{\log_k u \log_k v} = u^{\log_k v}\) that you don’t see very much, sometimes called a commutative hyperoperation. (The next ‘displacement’ operation after \(b-a\) and \(b/a\) is therefore \(e^{\ln b / \ln a} = b^{1/\ln a}\).)
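The "commutative exponentiation" is easy to check numerically when the exponent is taken in base \(k\). A minimal sketch follows; the function name `comm_exp` and the sample values are mine.

```python
import math

def comm_exp(u: float, v: float, k: float = math.e) -> float:
    """k^(log_k u * log_k v): the arithmetic counterpart of the tensor product."""
    return k ** (math.log(u, k) * math.log(v, k))

u, v, k = 8.0, 32.0, 2.0
result = comm_exp(u, v, k)   # 2^(3 * 5)
print(result)
```

Taking \(\log_k\) of the result gives the product \((\log_k u)(\log_k v)\), the same way \(\dim_K\) of a tensor product gives the product of the dimensions, and the operation is symmetric in `u` and `v`.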

I am a bit upset that I have never seen anyone point out that \(\dim\) is a logarithm, since it’s so obviously the case. Maybe I’m missing something? After all I am ignoring the infinite-dimensional cases entirely. But I suspect it’s mostly just because math likes to stay on more rigorous “solid ground” than this. I, however, love to speculate about underappreciated connections between things, so I have no problem saying: dimension is a logarithm.

The simple reason why \(\dim_K\) acts like \(\log_k\) in the case of finite \(K\) is as follows. We need three observations:

One, the dimension of a vector space is defined as the cardinality of its basis. An individual vector \(\b{v} = v_1 \x_1 + v_2 \x_2 + \ldots + v_n \x_n \in K^n \simeq V\) can be thought of as a choice of function \(\dim_K V \ra K\), since it assigns a coefficient \(v_i \in K\) to each basis vector \(\x_i\).

Two, the cardinality of the functions between sets \(B \ra A\) is given by \(\| A \|^{\| B \|}\), which is why we use the symbol \(A^B\) for the sets \(B \ra A\). For example the powerset of \(A\), that is, the set of all possible subsets of \(A\), is notated \(2^A\) because it is equivalent to the functions \(A \ra \{ 0, 1 \} \equiv \b{2}\), where a given subset is identified with the elements that map to \(1\).

Three: applying that to a vector space \(V \simeq K^n\), we can interpret \(K^n\) as describing the set of functions from a choice of basis \(\b{n} = \{ \x_1, \x_2, \ldots, \x_n \}\) into the underlying field \(K\), which naturally has cardinality \(\| V \| = \|K\|^{\dim_K V}\). Therefore the logarithm of this is the dimension of \(V\) over \(K\):

\[\dim_K V = \log_{\| K \|} \| V \| = \log_{\| K \|} \|K \|^{\dim_K V}\]

This is literally true in the case where \(V\) is finite dimensional and the field \(K\) is also finite. It’s less solid if either is infinite; however, I tend to think that expressions of this form are also literally true in the case of infinite dimensions, if you define things in a slightly better way. In particular you have to use a concept other than “cardinality” to measure if you want infinite expressions like \(\log_{\| \bb{R} \|} \| \bb{R}^2 \| = 2\) to make any sense. I am pretty sure the right choice is what’s sometimes called numerosity, although I don’t know how compatible that is with the rest of linear algebra. More on that some other day.
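In the finite case the identity can be checked by brute force: enumerate the vectors and take a logarithm of the count. The sketch below uses tuples over \(\{0, \ldots, q-1\}\) with \(q\) prime as a stand-in for \(K^n\); the helper names `vector_space` and `dim_from_cardinality` are hypothetical, and only the sizes of the sets are used, not the field arithmetic.

```python
import math
from itertools import product

def vector_space(q: int, n: int):
    """All n-tuples over {0, ..., q-1}; for prime q these are the points of F_q^n."""
    return list(product(range(q), repeat=n))

def dim_from_cardinality(V, q: int) -> float:
    """dim_K V = log_{|K|} |V|, computed purely from the sizes of the sets."""
    return math.log(len(V), q)

q = 3
U = vector_space(q, 2)
W = vector_space(q, 3)

# Direct sum as concatenation of coordinate tuples: U (+) W is K^5.
direct_sum = [u + w for u in U for w in W]
print(dim_from_cardinality(U, q), dim_from_cardinality(direct_sum, q))
```

The cardinalities multiply under the direct sum while the dimensions add, which is exactly the rule \(\log (uv) = \log u + \log v\).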

Anyway, even if you only take this as meaningful on cardinalities of finite-dimensional vector spaces over finite fields, I think it’s strange that it never really comes up, since it is such a natural construction! Or maybe it does and I’ve missed it. But anyway, I like it, and I happen to think the correspondence here is much stronger and more significant than what I’ve just described.

If we repeat the above with ‘baseless’ logarithms, we get expressions like

\[\dim K^n = n \dim K\]

such that

\[\dim_K V = \frac{\dim V}{\dim K}\]

The one place we have to be careful is in the definition of a tensor product. We want it to be the case that

\[\dim_K K^a \o K^b = \dim_K K^a \times \dim_K K^b = a \times b\]

But the naive approach has an extra factor of \(\dim K\):

\[\dim_K (K^a \o K^b) = \frac{\dim K^a \dim K^b}{\dim K} = \frac{(a \dim K) (b \dim K)}{\dim K} = ab (\dim K)\]

The problem is that the definition of the tensor product is a bit more complicated than just multiplying bases. A vector \(\b{u} \o \b{v} \in K^a \o K^b\) is not the Cartesian product of vectors \(\b{u}\) and \(\b{v}\), but rather the Cartesian product modulo a quotient on its scalar coefficient which combines two scalars \((k_1, k_2)\) into one \((k_1 k_2)\). Since this divides out a factor of \(K\), we have to do the same with our \(\o\) operation in order to make the cardinalities work out. This is done by specifying an \(\o_K\) operation, the “tensor product with respect to the field \(K\)”, as

\[U \o_K V = K^{\dim_K U \dim_K V} = K^{\dim U \dim V / (\dim K)^2}\]

which allows \(\dim_K K^a \o_K K^b = \dim_K K^{ab} = ab\) to work. (I suspect sometimes that the quotient in the definition of \(\o_K\) is not actually needed for most purposes, which would have the nice side effect of making this all work out more simply, but let’s not get into that.)

The definition

\[\dim_K K^a = \frac{\dim K^a}{\dim K} = \frac{a \dim K}{\dim K}\]

seems to imply that one could take the dimension/logarithm of a vector space with respect to a different underlying object, not the field \(K\), and get a meaningful result. For example it is my dream to be able to say that this is how you construct a vector space with a “fractional dimension” of \(\frac{1}{2}\):

\[\dim_{K^2} \? K = \frac{\dim K}{2 \dim K} = \frac{1}{2}\]

This works fine at the level of cardinalities, more or less (if you allow that the rationals are invented precisely to let you make objects like \(1/2\) which interpolate between ratios of non-divisible integers). But it is hard to imagine how it should work if you want anything like a “field” or a “vector space” with its usual axioms to be meaningful. Maybe a vector \(\b{v} \in \bb{R}^4\) is viewed as a vector over \(\bb{R}^2\) via \(\b{v} = (v_w, v_x) \cdot (\w, \x) + (v_y, v_z) \cdot (\y, \z)\). But then how does scalar multiplication work? If the scalars are \(\in K^2\), they have zero divisors, so you are not working in a field anymore. And what would it mean for a vector space with dimension \(\frac{1}{2}\) to be spanned by ‘half’ a basis vector over that pseudo-field? Maybe its elements look like \(\u = (u_x, \bullet) \cdot (\x, \bullet)\)? One must attempt to define versions of the theorems of linear algebra which are compatible with this sort of decomposition. I have no idea how to do that at the moment, but I suspect it can be done with sufficient imagination. I hope to attempt it in a future article.


6. Bases are Logarithms

The dimension of a vector space is the cardinality of its basis. But just like we use expressions like \(B^A\) for functions between sets because they are respected at the level of cardinalities \(\| B \|^{\| A \|}\), we may as well interpret the \(\dim\) operator in the same way: if \(\dim\) returns the cardinality of the basis, then let’s say that \(\log\) returns the basis itself, which happens to have that cardinality. For instance if a vector space \(V \simeq K^3\) has basis \((\x, \y, \z)\), we might write

\[\begin{aligned} \log_K V &= (\x, \y, \z) \\ \end{aligned}\]

And then define \(\dim_K\) as the cardinality of this:

\[\begin{aligned} \dim_K V &= \| \log_K V \| \\ &= \| (\x, \y, \z) \| \\ &= 3 \end{aligned}\]

Why not? \((\x, \y, \z)\) is an object for which \(K^{(\x, \y, \z)} \simeq V\), sorta, therefore \(\log_K K^{(\x, \y, \z)} = (\x, \y, \z)\). (One could also just let \(\dim_K\) refer to both operations, perhaps, or maybe write capital \(\text{Dim}_K V\) for the same thing.) Perhaps it’s a bit weird to treat \(K^{(\x, \y, \z)}\) as a set exponentiation when the exponent is a tuple / Cartesian product, but it should be easy to adjust things to make it work.

There is an obvious issue, though. Why would this particular choice of basis be the value of \(\log_K V\), since \(V\) has very many possible valid bases and no reason to choose a particular one?

Maybe it is more correct to regard \(\log_K V\) as really being an object which refers to all possible bases of \(V\) at once. (I’m not sure what it’s called. Sort of a frame bundle but with only one base point?) We can give it coordinates: the space \(X = \log_K V\) is parameterizable by coordinates \((X_0, \Lambda)\), where \(X_0 = (\x, \y, \z)\) is an arbitrary ‘origin’ frame and \(\Lambda\) is an arbitrary linear transformation \(\in GL(V)\), the automorphisms of \(V\).6 I guess we should just write

\[X = \{ \Lambda X_0 \mid \Lambda \in GL(V) \}\]

and then the dimension itself is the cardinality of the quotient of this by \(\Lambda\), which will be a sort of generic object that represents the size of any choice of basis.

\[\dim_K V = \| \frac{\log_K V}{\Lambda} \| = \| \frac{X}{\Lambda} \|\]

If \(\log_K V = X\), then there ought to be an operation which goes the other way, that reconstructs a vector space from its basis. We may as well equate this with the linear span operation:

\[\span(X) = K^X = V\]

This is not quite how span is normally defined. Usually it’s something like: “\(\span(\x, \y, \z)\) is the subspace of the (ambient) vector space \(V\) over the (ambient) field \(K\) which contains the vectors \((\x, \y, \z)\) and is of minimal dimension”. To interpret it algebraically, though, we don’t really want to make reference to an “ambient” vector space or field, because it should just be an operation on the vectors themselves. For this we need to at least explicitly indicate the underlying field, by writing \(\span_K\) with a subscript:

\[\span_K(X) = K^X = V\]

All of this is definitely rife with abuses of notation, and I’m not sure that it’s quite the best way to think about things. But I still wanted to mention it because it’s nice to think of the operators \(\dim\) and \(\span\) as being linear algebra analogues of \(\log\) and \(\exp\).

It is also interesting to consider what might be meant by the baseless logarithm in the sense of bases. In the expression

\[\log_K K^X = \frac{\log K^X}{\log K} = \frac{X \log K}{\log K}\]

what would be meant by \(X \log K\) as a ‘basis’? Presumably the division by \(\log K\) corresponds to some sort of quotient… but we will need a way of interpreting \(\log K\) itself. Perhaps as a “basis for \(K\)”? I’m not sure. I do think there’s something here, but it gets much more speculative so I will leave it for another time.

However, I do want to investigate this idea more broadly, because the construction seems pretty general.


7. Functions are Logarithms?

Treating \(\log_K K^n = n\) as returning a basis for \(K^n\) as a set is an example of a general procedure which doesn’t quite have a name as far as I know. It is sort of like categorification, but not quite. Rather than locating categories for set operations, we’re locating sets for algebraic operations, and not making any reference to categories really. So I’m not sure. Maybe ‘setification’? Or ‘structurization’? I dunno.

The standard example of this ‘setification’ is to treat arithmetic operations on natural numbers like \(A+B\), \(AB\) and \(B^A\) as being projections out of set operations \(A \sqcup B\), \(A \times B\), and \(B^A\) (the functions \(A \ra B\)). This works nicely for finite sets because the operations respect cardinalities. (As mentioned earlier, I think you have to replace ‘cardinality’ with something like ‘numerosity’ to make this work elegantly on infinite sets, and I don’t want to get into that.)

A compelling reason for thinking this way is that the setified arithmetic operations in fact explicitly enumerate the sets they describe. For example, given you have sets \(A = \{ a, b \}\) and \(X = \{ x, y \}\), then you can expand \(A^X\) algebraically (presupposing all the variables will equal \(1\) later):

\[(a+b)^{x+y} = (a+b)^x (a+b)^y = (a^x + b^x)(a^y + b^y) = a^x a^y + a^x b^y + a^y b^x + b^x b^y\]

Then upon setting the variables to \(1\) this correctly describes the relationship in cardinalities, \(2^2 = 1 + 1 + 1 + 1\), since the number of functions \(X \ra A\) is in fact \(4\). But it also describes the sets themselves: each term in the expanded sum is one of the four possible functions \(X \ra A\) exactly when we interpret \(a^x b^y\) as the function which maps \(x \ra a\) and \(y \ra b\). Also, evaluation of these variables corresponds to evaluating the functions, e.g. setting \(x=1\) and \(y=0\) to get \(a^x b^y \mapsto a^1 b^0 = a\). Setting one variable but leaving the other gives restriction, e.g. \(y=0\) sets \(a^x b^y \mapsto a^x\). All of this basically also works if the variables have values other than \(1\), in which case they represent unlabeled sets of whatever cardinality; however, algebraic manipulations like \((a+b)^x = a^x + b^x\) are then no longer valid and you have to use a binomial expansion instead.
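The enumeration of functions \(X \ra A\) as monomials can be reproduced mechanically. Here is a short Python sketch; the helper names `functions` and `monomial` and the string format of the monomials are my own conventions.

```python
from itertools import product

A = ["a", "b"]
X = ["x", "y"]

def functions(domain, codomain):
    """Every function domain -> codomain, as a dict {input: output}."""
    return [dict(zip(domain, outputs))
            for outputs in product(codomain, repeat=len(domain))]

def monomial(f):
    """Write f as a product of factors output^input: x->a, y->b becomes 'a^x b^y'."""
    return " ".join(f"{out}^{inp}" for inp, out in f.items())

all_functions = functions(X, A)
terms = [monomial(f) for f in all_functions]
print(" + ".join(terms))
```

Each monomial records where \(x\) and \(y\) are sent, and the number of terms is \(\| A \|^{\| X \|}\).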

You can do similar constructions with a lot of combinatoric objects, although they don’t always so cleanly correspond to algebraic manipulations. Factorials are

\[\begin{aligned} (a+b+c)! &= a^a b^b c^c + a^a b^c c^b + a^b b^a c^c + a^b b^c c^a + a^c b^b c^a + a^c b^a c^b \end{aligned}\]

which enumerates the \(3! = 6\) permutations of \(3\) elements. Combinations

\[\begin{aligned} \binom{a+b+c}{x+y} &= \frac{1}{x^x y^y+x^y y^x}[ a^x b^y + a^y b^x + a^x c^y + a^y c^x + b^x c^y + b^y c^x] \\ &= a^{q} b^q + b^q c^q + c^q a^q \end{aligned}\]

enumerate the \(\binom{3}{2} = 3\) \(2\)-element combinations of \(3\) elements. Here the denominator \(x^x y^y+x^y y^x\) is \((x+y)!\), so the prefactor \(\frac{1}{x^x y^y+x^y y^x}\) corresponds to \(\frac{1}{(x+y)!}\). Dividing through by the number of permutations implements the quotient \(x \sim y\) that avoids double counting, and \(q\) is a new variable that represents carrying out this quotient (I’m not sure if this is the best way to write this). Note that although all these variables will end up equaling \(1\), by leaving them as independent variables they track meaningful information from step to step.
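Both enumerations can be reproduced with the standard library. This is a rough Python sketch, assuming the factor `a^b` simply pairs `a` with `b` under the permutation (which side is read as the input is a convention), and modeling the quotient \(x \sim y\) by forgetting the order of each pair:

```python
from itertools import permutations
from math import comb, factorial

elements = ["a", "b", "c"]

# Each permutation becomes a monomial with one factor per element.
perms = [
    " ".join(f"{src}^{dst}" for src, dst in zip(elements, image))
    for image in permutations(elements)
]

# Ordered choices of 2 distinct elements: the six terms inside the brackets.
ordered = list(permutations(elements, 2))

# Quotient by x ~ y: (a, b) and (b, a) collapse into one class.
classes = {frozenset(pair) for pair in ordered}

print(perms)
print(len(perms) == factorial(len(elements)))
print(len(ordered) == len(classes) * factorial(2))
print(len(classes) == comb(3, 2))
```

The division by \((x+y)!\) shows up here as the fact that each unordered class is hit by exactly \(2!\) ordered pairs.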

I suspect that every arithmetic identity has some equivalent setified expression like this (this is the spirit of my ongoing quest to make sense of fractional permutations). I also notice that a lot of information is lost when you map these set expressions back onto arithmetic: for example you elide the distinctions between all possible quotients that lead to the same cardinality. Probably there is a lot of interesting structure there.

Anyway, for our purposes, I want to observe one thing about these. When thinking of functions as sets we usually picture them as ‘relations’: a function \(f: X \ra A\) is modeled as the set

\[\begin{aligned} f = \{ (x, f(x)) \mid x \in X \} \subset X \times A \\ \end{aligned}\]

Or \(\{ (x,a), (y, b) \} = xa + yb\) in our example. This set happens to have the cardinality \(\| f \| = \| X \|\), although it’s not clear what use that is.

Now consider \((a+b)^{x+y} = a^x a^y + a^x b^y + b^x a^y + b^x b^y\) from earlier. If \(a^x b^y\) is supposed to describe a single function from \(X = \{ x, y\}\) to \(A = \{ a,b \}\), then why doesn’t it setify to something like \(\{ (x, a), (y, b) \}\), with cardinality \(2\)?

Maybe you see where I’m going with this. \(f = a^x b^y\) has cardinality \(1\), because it’s one function. Its logarithm, however, looks more like the relation model:

\[\log f \? x \log (a) + y \log (b)\]

This looks a lot like \(xa + yb\), but it’s also suspiciously different. Also, it doesn’t have a cardinality until we divide by a base, but when we do, it seems like any choice of base \(K\) has to give the cardinality \(\log_K f = \log f / \log K = x \log_K a + y \log_K b = x (0) + y(0) = 0\), because the variables \(a\) and \(b\) are both going to equal \(1\). What do we make of this?
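To make the arithmetic concrete, here is a small numeric sketch in Python. It assumes we plug ordinary positive numbers in for \(a\) and \(b\), and it calls the base `K` to keep it distinct from the element \(b\); both function names are hypothetical:

```python
import math

def log_monomial(a, b, x, y):
    """log(a^x * b^y), computed directly from the product."""
    return math.log(a**x * b**y)

def relation_form(a, b, x, y):
    """x*log(a) + y*log(b), the additive form that resembles xa + yb."""
    return x * math.log(a) + y * math.log(b)

# Generic positive values: the multiplicative and additive forms agree.
print(math.isclose(log_monomial(3.0, 5.0, 2, 1), relation_form(3.0, 5.0, 2, 1)))

# The cardinality case a = b = 1: every term is a multiple of log(1) = 0,
# so dividing by log(K) gives 0 for any base K.
K = 2.0  # arbitrary choice of base
print(relation_form(1.0, 1.0, 1, 1) / math.log(K))
```

So the zero is not an accident of the base: it comes entirely from \(\log 1 = 0\), before any units are chosen.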

After thinking about this for a while I still don’t really feel like I have a good explanation for it, but I think we are supposed to think of it as equivalent to \(x a + yb\), just with the \(a\) and \(b\) written in a different basis, so it is more like a comparison between \(a \o x + b \o y\) and \(a \o \log x + b \o \log y\) than between numeric expressions. The cardinality being \(0\) doesn’t matter, because it’s not meaningful to talk about the cardinality of a function. And the role of \(\log x\) is just to change algebras for \(x\) from multiplicative to additive, but the two objects are supposed to be isomorphic and regarded as the same, at least in this case where the cardinality doesn’t mean anything.

I’m not sure about this part, and might come back and rewrite it later if I find a better interpretation. In any case I think it is interesting (or amusing, maybe) that \(\log f = \log a^x b^y\) gives something that at least resembles the function’s representation as a relation. Everything is logarithms?


8. Everything is Logarithms

What we have been discussing is the simplest and most well-behaved version of a logarithm in mathematics, the isomorphism between the additive real algebra \((\bb{R}, +)\) and the multiplicative one \((\bb{R}^{> 0}, \times)\). Of course there are logarithms in mathematics which are more complicated than that, such as the complex logarithm \(\log z = \text{Log } z + 2\pi i k \mid k \in \bb{Z}\), or its messier cousins like the logarithm of a matrix. But I suspect these are a confusion of concepts. What’s really going on in the logarithm on \(\bb{C}\), for instance, is that angles really take their values \(\in S^1\), not \(\bb{R}\), which has a different topology, and the weird behavior follows from not respecting this. A different set of conventions would move the problem out of the logarithm and into the definitions of the values themselves. Unfortunately that’s not how things are defined today so you have to deal with it. Still, it doesn’t seem like the logarithm’s fault to me.
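The multivaluedness is easy to see numerically. A minimal Python sketch using the standard `cmath` module, assuming we only look at a handful of branches \(k \in \{-2, \ldots, 2\}\); the function name `log_branches` is my own:

```python
import cmath
import math

def log_branches(z, ks=range(-2, 3)):
    """Principal value Log z shifted by 2*pi*i*k for a few integers k."""
    principal = cmath.log(z)
    return [principal + 2j * math.pi * k for k in ks]

z = -1 + 1j
branches = log_branches(z)

# Every branch is a valid logarithm: exponentiating returns the same z.
print(all(cmath.isclose(cmath.exp(w), z) for w in branches))

# The real parts all agree (the log of the modulus); only the angle differs.
print({round(w.real, 12) for w in branches})
```

The ambiguity lives entirely in the imaginary part, which is the angle, and that is the part whose natural home is a circle rather than a line.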

Anyway the discussion in this article ignores those cases and assumes that \(\log\) really is an isomorphism: it’s just a way of taking something expressed in a multiplicative form and re-expressing it in an additive form. This, it turns out, corresponds to many operations that one learns in math, such as the \(\dim\) operator in linear algebra and the \(\nu_p\) operation in number theory (sorta) and the total derivative in calculus (also sorta).
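The \(\nu_p\) case is the easiest one to check by hand. Here is a short Python sketch of the \(p\)-adic valuation, showing that it turns multiplication of integers into addition of exponents the way a logarithm does; the function name `nu` and the sample numbers are my own choices:

```python
def nu(p, n):
    """p-adic valuation: the exponent of the prime p in the positive integer n."""
    if p < 2 or n <= 0:
        raise ValueError("need p >= 2 and n >= 1")
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

m, n = 72, 300  # 72 = 2^3 * 3^2, 300 = 2^2 * 3 * 5^2
for p in (2, 3, 5):
    # nu_p(m * n) == nu_p(m) + nu_p(n), just like log(mn) = log m + log n.
    print(p, nu(p, m), nu(p, n), nu(p, m * n))
```

The "sorta" is that \(\nu_p\) only sees the \(p\) part of a number, so it behaves like one component of a logarithm rather than the whole thing.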

All of these things which appear to be very different seem to in some way be instances of the same basic primitives. And although these associations arise from my sort of… numerology… I can’t shake the feeling that it’s all too clean to not matter. Perhaps math needs to clean this all up: we are somehow missing the forest for the trees by keeping all this redundancy buried in the notations; actually there are only a few basic operations which are being written differently everywhere, and with all the patterns disguised everything is a lot harder than it needs to be. I suspect the patterns I’ve written about in this article should not feel like things I had to rediscover for myself. They follow naturally from the material that everybody learns.

I also keep finding that the math of physics seems to end up at a lot of the same structure. I first noticed these patterns in the operator formulation of quantum mechanics because it seems to insist on a certain ontology for its mathematics. I wonder if this is because physics is telling us how things “should be done”. Since in physics the mathematics is a human lens through which we view reality, the math must not impose its own views on how things are done, and any views you accidentally impose eventually clash with the requirements of the physics.

This is the idea behind the concept of general covariance, that the properties of objects are independent of the coordinates we use to express them, and so the meaningful theorems about reality end up being expressed in coordinate-free ways. The same philosophy applied to linear algebra or differential geometry leads to their covariant formulations that are indisputably ‘better’ than the forms in coordinates.

The baseless logarithm, which seems somewhat nonsensical mathematically, is an example of this applied on purely mathematical terms. It basically says that the isomorphism from multiplicative to additive algebraic representations of the same thing is separate from the choice of units on those algebras, and that most of its properties are unrelated to the units. Just like how the concept of a geometric vector is distinct from its projection onto a particular coordinate system. Meanwhile a bunch of other things with other notations are basically the same operation as the logarithm, or closely related to it.
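One way to see the independence from units numerically: the ratio \(\log N / \log 2\) comes out the same no matter which base is used to compute the two logs. A minimal Python sketch, assuming a hypothetical helper `log_in_units` whose `via` argument is the intermediate base:

```python
import math

def log_in_units(n, unit, via=math.e):
    """Express log(n) as a multiple of log(unit), with both logs taken in base `via`."""
    return math.log(n, via) / math.log(unit, via)

N = 1024

# The intermediate base cancels, like converting km to m through any third unit.
bits_via_e = log_in_units(N, 2, via=math.e)
bits_via_10 = log_in_units(N, 2, via=10)
print(math.isclose(bits_via_e, 10.0), math.isclose(bits_via_10, 10.0))

# The same object measured in a different unit: decimal digits instead of bits.
print(math.isclose(log_in_units(1000, 10), 3.0))
```

The base passed as `via` plays the role of a coordinate system: it is needed to get a number out of the computer, but nothing in the answer depends on it.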

When you take general covariance to its extreme you end up asking that all of your mathematics be formulated in a covariant way, as explicit relations between one thing you measure and another thing you measure. For example we think of, say, a set as having a certain cardinality, but in fact cardinality is a property of the set that we measure, and we have to be clear about how we do that because it’s all relative to the “coordinate system” for those measurements. Such a formulation is necessary to find the answers to “why” questions about how mathematics works, about what is ‘actually’ going on independent of the human definitions and frameworks of set or category theory or whatever. The observations in this article are not very deep, but they seem to me to be among the many clues which point towards that formulation. I still can’t see it, though.

  1. I hope to write a better standalone article about this notation soon. I’ve been trying to do so for a few years now but I seem to start losing my sanity whenever I try to work on it so it hasn’t happened yet. When I do finally manage to do it I’ll update this. 

  2. The \(f_x = \p_x f = \p f / \p x\) notation for partial derivatives is unfortunate; it should be \(df_{dx}\), to indicate that it is the “\(dx\) component” of the vector \(df\), or \(d_x f\), meaning the \(x\) component of \(d\) acting on \(f\). Better yet it would be \((\p f)_{\p x} = \p_x f\) and the \(d\) symbol would be retired, but that seems like a tall order. 

  3. There is also a thing called an arithmetic derivative and a corresponding partial derivative \(D_p(n) = n \, \nu_p(n)/p\), but as far as I can tell it’s not quite the same thing and not what I’m looking for. 

  4. If you happen to have a vector in a polar form like \(\b{v} = v_r e^{R v_{\theta}}\), that refers to a second layer of exponential representation, via \(T^{\v} = T^{v_r e^{\p_{\theta} v_{\theta}}(\p_x)}\), where \(\p_x\) is a choice of origin for the rotational \(\theta\) coordinate (which may be multidimensional as well). 

  5. Apparently the \(\oplus\) symbol is due to Bourbaki because everything was a mess prior to that. Also it happens to be a coproduct (which came later) and those do correspond to addition on sets, so there is at least a connection to addition… but at present I think it is largely a mistake. 

  6. The technical term is that it is a \(GL(V)\)-torsor since the choice of origin \(X\) is arbitrary. The concept is easier to understand from Baez. This is one of those mathematical terms which I don’t like because it is so simple that it should not really have a special name (nor such a technical Wikipedia article).