The essence of quantum mechanics

[July 24, 2020]

Here’s what I know about QM. I’m trying to learn QFT and it helps to have the prerequisites compressed into the simplest possible representation.

This will make little sense if you haven’t studied quantum mechanics.

Conventions: \(c = 1\), \(g_{\mu \nu} = diag(+, -, -, -)\). I like to write \(S_{\vec{x}}\) for \(\nabla S\).

1. QM but starting from the solutions

QM makes a lot more sense to me if you (a) handle everything relativistically from the start and (b) just assume the form of the wave function solutions instead of deriving them. If I had my way I’d start a quantum mechanics course with special relativity, followed by introducing the scalar wave function, like this:

Consider a function on spacetime with the form \(\psi(t, \vec{x}) = e^{ i S(t, \vec{x})/\hbar}\) which assigns a complex phase to every point. It is fully determined by the action \(S(\vec{x}, t)\), and in particular given an initial state \(\psi_0\), is determined by the action gradient \(dS = S_{\mu} dx^\mu\). This lets us compare quantum states by integrating over some path \(\Gamma\):

\[\psi(t, \vec{x}) = e^{i/\hbar \int_{\Gamma} dS} \psi_0\]

Later on when potentials are involved we will need to be specific about the path of integration, but for now we can think of \(S\) as a scalar function that determines \(\psi\) everywhere.

Relativistic invariance insists that \(\psi\) have the same value in any reference frame, and \(- i \hbar \p \psi = - i \hbar (\p S) \psi = (S_t, S_{\vec{x}}) \psi\) must be a covariant 4-vector. Contraction with \(\bar{\psi}\) extracts the vector components: \(\< \psi \| {- i \hbar \p} \| \psi \> = \bar{\psi} (S_t, S_{\vec{x}}) \psi = (S_t, S_{\vec{x}})\). Finally, \(\| (S_t, S_{\vec{x}}) \| = \sqrt{S_{t}^2 - S_{\vec{x}}^2}\) must be a Lorentz-invariant scalar.

\(i \hbar \p_t = \hat{E}\) and \(- i \hbar \p_x = \hat{P}\) are the energy and momentum operators. The quantum mechanical operators apparently extract properties of \(S\), but because \(S\) is packed inside an exponential, they extract them as eigenvalues: \(i \hbar \p_t \psi = - S_t \psi\). Our quantum-mechanical inner product and our operators are just tools for extracting properties of \(S\), since \(\psi\) is the only thing we can directly operate on. When an equation like the Schrödinger equation contains a \(\hat{P} = - i \hbar \p_x\) operator, it’s just a skew way of writing \(S_{\vec{x}} = \vec{p}\).

The exact values of \(\psi\) up to a phase, and therefore \(S\) up to a constant, are not physically observable.

For a free massive spinless particle the action gradient is \(S_{\mu} = - p_{\mu} x^{\mu} = \int -p_\mu dx^{\mu}\), where \(p\) is the four-momentum and \(\| p \| = m\), the rest mass. In the rest frame this is simple \(S = -m \tau = - \int m d\tau\). In the absence of a potential this gives the wave function:

\[\psi = e^{- i/\hbar \int p_\mu dx^\mu} = e^{- i/\hbar p_\mu \D x^\mu} = e^{i/\hbar ( \vec{p} \cdot \vec{x} - Et)}\]

which is a Fourier mode with momentum \(p\). Time evolution via exponentiation of the Hamiltonian amounts to translating in \(t\):

\[\psi(t + \Delta t) = e^{i/\hbar \hat{H} \Delta t} \psi(t) = e^{\Delta t \p_t} \psi(t) = e^{i/\hbar (\vec{p} \cdot \vec{x} - E(t + \Delta t))} \psi_0\]

(This uses the idea that exponentiating a differential operator translates in that coordinate: \(e^{a \p_x} f(x) = f(x+a)\).)

When an initial state is not a pure Fourier mode with a definite momentum, we expand it as a sum of modes. For instance, if at \(t=0\) we measure an electron at \(\vec{x} = 0\), then the initial state is \(\psi(0, \vec{x}) = \delta(\vec{x}) = \int e^{i \vec{p} \cdot \vec{x}} d \vec{p}\).

When potentials are involved, \(dS\) is modified. The electromagnetic field, for instance, enters as \(p \mapsto p - i qA\), so \(dS = (p_{\mu} - i q A_{\mu}) dx^{\mu}\). Depending on the field configuration we may no longer be able to easily integrate \(\int dS\): if \(A\) includes a current, then it contains a ‘line’ of divergence, and the path integral’s result will depend on how many times \(\Gamma\) circles this divergence. This causes the path integral to give different values based on the choice of path. Summing over these paths corresponds in QFT to summing over the number of photons that are exchanged (I think. Will work it out in detail when I get to QFT).

2. Correspondences

Many concepts are not far away when you start with the solutions:

Energy-momentum: For a free particle \(S_t = E\) and \(S_{\vec{x}} = \vec{p}\), and \(m = \sqrt{E^2 - p^2}\) is the relativistic rest energy/momentum relation. The wave function looks like \(\psi = e^{i/\hbar \int \vec{p} \cdot d\vec{x} - E dt} \psi_0\). A high energy/momentum corresponds to a rapidly changing action, and thus to a wave function that is quickly rotating as you translate in time or space. Ultimately, the mass \(m\) corresponds to the speed of angular rotation in a particle’s rest frame, and its energy and momentum are the results of Lorentz-transforming \(dS = - m d\tau\) into other frames.

Path integration: Relative changes in \(S\) can be found by integrating: \(S(f) - S(i) = \int_{\Gamma} dS\) along any curve \(\Gamma\) from \(i\) to \(f\), and \(\psi(f) = e^{i/\hbar (S(f) - S(i))} \psi(i)\). Thus \(e^{i/\hbar (S(f) - S(i))}\) is the ‘transition matrix’ between any two states, along a given path. The total transition amplitude is a sum over all possible paths between two states. This extends handily to QFT’s path integrals when creation/annihilation of particles is included.

The roles of \(\hbar\) and \(i\): \(S \mapsto e^{iS / \hbar }\) is the conversion from ‘action’ space to ‘phase’ space. \(\hbar\) changes units from action (energy \(\times\) time) to radians; if we set \(\hbar = 1\) we are declaring that we measure action in radians. The resulting space after mapping by \(e^{iS}\) is physically meaningful, because in some cases we’ll end up summing these phase factors from multiple starting states and seeing interference patterns. I suspect that the output space is the \(U(1)\) that is identified with the electromagnetic gauge field but am not sure. If so, I think it would be good to write \(R_{EM}\) instead of \(i\), in order to avoid accidentally conflating the \(i\) factors from rotations in different spaces. (Actually that’s my stance on \(i\) in general.)

Angular momentum: The orbital angular momentum operator, \(\hat{L}_z = -i \hbar \p_{\phi}\), does the same thing as \(\hat{P} = - i \hbar \p_{\vec{x}}\) but for a wave function in spherical coordinates. The azimuthal angle term looks like \(\psi \sim e^{i/\hbar (l_z \phi - E t)}\), and \(\hat{L}_z \psi = l_z \psi\). The azimuthal quantum number \(l_z\) (often written \(m\)) measures how many times \(\psi\) oscillates in a rotation of the polar angle \(\phi\); it is quantized precisely because the \(\phi\) coordinate has a built-in periodicity. A \(z\)-angular momentum value of \(l_z\) labels the number of periods the wave makes as you rotate \(\phi\) in the \(z\)-plane.

Spin-\(\frac{1}{2}\): If \(l_z = 1/2\), then \(\psi_{\pm} \sim e^{i/\hbar (\pm \frac{1}{2} \phi - E t)}\) acts like a spinor (by modeling the spin as orbital angular momentum, and omitting the \(r\) and \(\theta\) components). This function appears trivially unphysical, since it has different values at \(\phi = 0\) vs \(\phi = 2 \pi\). The resolution is the fact that it’s only meaningful to use the wave function to compare states that are connected by a path – and for a spinor it’s correct that \(\< \psi(\phi = 2 \pi) \| \psi(\phi = 0) \> = -1\). (This is a useful mental model but isn’t the full story. My next post will be a truly exhaustive exploration of spinors.)

Spin-\(1\): A vector-valued wave function \(\vec{\psi} = (\psi_x, \psi_y, \psi_z)\), where the terms transform according to physical rotations, is a spin-1 object. These are a lot easier to understand. To consider its angular momentum we diagonalize \(\hat{L}_z\) by changing bases to

\[(\psi_{+1}, \psi_0, \psi_{-1}) = (\frac{\psi_x - i \psi_y}{\sqrt{2}}, \psi_z, \frac{\psi_x + i \psi_y}{\sqrt{2}})\]

These are known as spherical tensors. This transform performs a change of basis for the tangent vectors:

\[(\hat{x}, \hat{y}, \hat{z}) \ra (\frac{\hat{x} + i \hat{y}}{\sqrt{2}}, \hat{z}, \frac{\hat{x} - i \hat{y}}{\sqrt{2}})\]

Notably, this new basis rotates with \(\phi\): \(\hat{x} \pm i \hat{y} = e^{i \phi} (\hat{\rho}+ i \rho \hat{\phi})\), capturing the idea that any vector valued function has an intrinisic \(\phi\)-derivative, independent of reference frame, just by virtue of being a vector.

So a vector wave function \(\vec{\psi}\) locally looks like \(e^{i \hbar (s_z + l_z) \phi + f(\rho, z) - E t}\), where \(l_z\) is its orbital angular momentum and \(s_z \in (+1, 0, -1)\) is a frame-independent contribution from its vectorial nature. The \(0\) component of \(L_z\) spin corresponds to a vector-valued wave function pointing only in the \(z\) direction. \(\pm 1\) components correspond to having \(x\) or \(y\) components, with the sign determined by their relative phase.

Note what it means to have spin \(1\): it’s not that it fixes the value of the angular momentum; rather, it specifies the different ways that the angular momentum can transform under rotation. The three choices determine whether \(\vec{\psi}\) is in the \(z\) direction \((s_z = 0)\) or whether it has a positive or negative ‘rotational’ components in the \(xy\) plane (\(s_z = \pm 1\)). Particularly, having angular momentum \(s_z = +1\) means that the \(y\) component is advanced in phase compared to the \(x\) component – this is why the ‘ladder’ operator \(\hat{S}_+ = \hat{S}_x + i \hat{S}_y\) serves to increase the angular momentum, because it includes a factor of \(e^{i \phi}\):

\[\hat{L}_+ = (\hat{L}_x + i \hat{L}_y) \sim e^{i \phi}\]

By the way, photons are spin-1 particles, but to be in the \(s_z = 0\) state would mean require that there be a rest frame where their vector points the same way as their velocity – and they have no rest frame, so the \(l_z\) value is forbidden. The remaining \(\pm 1\) states correspond to photon polarizations.

The Schrödinger Equation: We can write \(S_t^2 - S_{\vec{x}}^2 = m^2\) as \(S_t = \sqrt{m^2 + S_x^2} = m \sqrt{1 + \frac{S_x^2}{m^2}}\) and expand as a Taylor series (note that \(\| S_x/m \| = \| p / m \| \ll 1\)) to get:

\[S_t = m (1 + \frac{1}{2} \frac{S_x^2}{m^2} + O((\frac{S_x^2}{m^2})^2) \approx m + \frac{S_x^2}{2m}\]

Using our operator forms we get the free-particle Schrödinger equation:

\[\hat{E} \psi \approx (m + \hat{P}^2/2m) \psi\]

Interpreting, this says that the time-derivative of the action is a constant (the mass) plus a term proportional to the kinetic energy, plus higher-order terms that vanish at low momentums.

The initial \(m\) term is normally ignored in non-relativistic QM. It corresponds to a constant change in phase along any path (and adds a constant term to the Lagrangian), but it drops out of any calculation if you (a) only integrate over time and (b) never create/annihilate particles.

Schrödinger with potential: The \(V\) term in the non-relativistic Schrödinger ends up next to the kinetic energy term: \(\hat{E} \psi \approx (m + \hat{P}^2/2m + V) \psi\). Working backwards through the derivation, we figure that the constraint on \(S\) must be: \(S_t - V = \sqrt{m^2 + S_x^2}\). But there is no particular reason this would have a clean relativistic form, since we treat our potential non-relativistically anyway.

Nevetheless we can add to our interpretation: the role of a classical scalar potential \(V\) is to modify the phase change as a wave function translates in time, such that the particle acts like it has energy \((E - V)\) instead of \(E\). The role of the vector potential is to modify the momentum, \(\vec{p} \mapsto \vec{p} - q \vec{A}\).

The relativistic version uses the 4-potential \(A = (\phi, \vec{A})\). The electromagnetic wave function is something like \(\psi = e^{i/\hbar \int (\vec{p} - i q \vec{A}) \cdot d\vec{x} - (E - i q \phi)]dt } = e^{i/\hbar \int (\p_{\mu} S - i q A_{\mu}) dx^{\mu}}\). Interpreting: the potential serves to modify the contributions of energy and momentum by a path-dependent phase, by changing the action gradient as \(\p S \mapsto \p S - i q A\). A particle traversing a potential acts like it has a modified 4-momentum, of \(p \ra p - i q A\).

The Lagrangian: The integral \(\Delta S = \int dS\) can be parameterized by time as

\[\Delta S = \int (S_{\vec{x}} \cdot d\vec{x}/dt - S_t) dt = \int L \, dt\]

\(L = dS / dt\) is the source of the (single-particle) Lagrangian, and is where the elementary form \(L = T - V\) comes from. For a free particle, \(L dt = -m d\tau\), and \(\Delta S = - \int m d \tau\). In a classical scalar potential with \(S_t = E = T + V\):

\[L = S_{\vec{x}} \cdot d\vec{x}/dt - S_t = \vec{p} \cdot \vec{v} - E\]

In classical mechanics often \(E = T + V\) and \(\vec{p} \cdot \vec{v} = 2 T\), giving

\[L = 2 T - (T + V) = T - V\]

Regardless of how we parameterize \(S = \int dS\), applying stationary-action will give the classical trajectory. Feynman’s classic explanation of this is that all but the ‘stationary’ path – the choice of \(\Gamma\) such that \(\delta S / \delta \Gamma \vert_{\Gamma} = 0\) – will exhibit destructive interference in the macroscopic limit, resulting in the laws of classical physics. Quantitatively, this means that in the classical limit as \(\hbar \ra 0\), the path integral is dominated by the minimal path:

\[\begin{aligned} \lim_{\hbar \ra 0} \int d\Gamma e^{i S[\Gamma] /\hbar} &= \lim_{\hbar \ra 0} \int d (\Delta \Gamma) e^{i S[\Gamma_{\text{min}} + \Delta \Gamma] /\hbar} \\ &\sim \lim_{\hbar \ra 0} e^{i S[\Gamma_{\text{min}}] /\hbar } \end{aligned}\]

I don’t exactly know how to make that rigorous but it makes heuristic sense: as \(\hbar \ra 0\) the function oscillates infinitely rapidly, cancelling itself out in the integral over \(\Delta \Gamma\), but at least the minimal path, where \(\delta S / \delta \Gamma = 0\), oscillates less than the rest do. We could imagine expanding \(S\) as a Taylor series \(S = S[\Gamma_{\text{min}}] + (\delta S / \delta \Gamma) \delta S + \ldots\), but I really don’t know if that’s allowed.

Noether’s Theorem: Suppose there is some dynamical variable \(q\) that \(S\) depends on. Then we can locally approximate \(\Delta S = S(q, \ldots) + S_q \Delta q\), adding a phase to the wave function \(\psi \ra e^{i S_q \Delta q /\hbar} \psi\). This leaves physics unchanged if and only if \(S_q\) is a constant, such that this is a uniform global phase transformation.

But if \(q\) is a physical symmetry of the system, then it must lead to the same physics; therefore \(S_q\) is a constant throughout the system’s evolution (gauge fields notwithstanding). \(S_q\) is called the ‘Noether charge’ corresponding to the \(q\) symmetry. \(E\) is the charge associated with \(t\); \(\vec{p}\) for \(\vec{x}\), \(\vec{L}\) for \(\vec{\theta}\), etc. (Presumably \(q = S_{\vec{A}}\) but I’m not sure what that means yet.)

Gauge transformations:

Since physics is determined by an action integral like \(\int dP - i q A_\mu dx^\mu\) (writing \(dP = p_{\mu} dx^{\mu}\) for the momentum part of the action), any integrable (exact) form added to \(A\) and will only affect the action in a path-independent way:

\[\int_i^f dP - i (q A_\mu dx^\mu + d \Lambda) = P_i^f - i \Lambda_i^f - i q \int_i^f A_\mu dx^\mu\]

This performs a gauge transform that doesn’t affect the relative amplitudes of different paths between \(i \ra f\) – only the resulting phase. As such there is no way to observe this effect in a closed system, so the addition of \(d \Lambda\) is a free variable in the theory. However, it turns out to be important when considering interacting systems, in ways that I haven’t learned yet but will be very important in QFT.

Covariant Derivatives:

Given a wave function of the form \(\psi = e^{i/\hbar (p_{\mu} - i q A_{\mu}) x^{\mu}}\), we can extract the \(p_{\mu}\) term with a more involved derivative operator, the ‘covariant derivative’ \(D_{\mu} = \p_{\mu} + i q A_{\mu}\):

\[- i \hbar D_{\mu} \psi = - i \hbar (\p_{\mu} + i q A_{\mu}) \psi = p_{\mu} \psi\]

Roughly speaking, gauge fields will be added in QFT by assuming a freedom to add a spatial-coordinate-dependent phase factor \(e^{i \Lambda(x,t)}\) of this form. The covariant derivative is called such because it will change proportionally, as \(D_{\mu} \mapsto D_{\mu} + i \Lambda_{\mu}\), to remove the \(\Lambda\) factor and preserve the resulting physics.

The Born Rule: The probability of observing a state \(\alpha \| \psi \>\) is proportional to \(\| \alpha \|^2\). My pet theory is that this is due to destructive interference between histories of the observer. See my previous post.


  1. QM makes a lot more sense if you just take as granted the fact that the wave function has the form \(\psi = e^{i S/\hbar}\).
  2. Operators and inner products are ways of extract properties of \(S\).
  3. The Schrödinger equation for a free particle is a low-energy approximation of the statement that \(\| \p S \| = m\).
  4. The only free physical quantity in a wave function is the 4-vector \(\p S\), which measures which part of the variation in \(S\) is in the spatial vs timelike direction.
  5. Potentials enter by modifying \(\p S\), eg \(\p S \mapsto \p S - i q A\). \(\int_i^f dS = S(f) - S(i)\) may no longer hold depending on the properties of \(A\).
  6. Intrinsic angular momentum is a property of what kind of object the wave function’s value is (scalar, vector, spinor, etc).

Normally you have to unlearn QM to learn relativistic QM, but the relativistic version makes much more sense in the first place so why not start there?

Next up, spinors.