The canonical example of quantum mechanics in action is the harmonic oscillator, which is something like a mass on a spring. In classical mechanics, it wobbles back and forth periodically when it is given energy. If it's at a position $x$, wobbling about $x_{0}$ and moving with velocity $v$, we can say its energy contains a potential term $V$ proportional to $(x - x_{0})^{2}$, and a kinetic term $T$ proportional to $v^{2}$, with an overall form:

$E = \frac{1}{2} k (x - x_{0})^{2} + \frac{1}{2} m v^{2}$

We could try and find a distribution over $x$ and $v$ , but continuous distributions tend not to "play well" with entropy. They're dependent on a choice of characteristic unit. Instead we'll go to the quantum world.
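To see the unit problem concretely, here is a minimal sketch, assuming a uniform density over an interval (the function name and the 2 metre interval are made up for illustration). The differential entropy of a uniform density of width $w$ is $ln w$, so it shifts when we change units.

```python
import math

def uniform_differential_entropy(width):
    """Differential entropy (in nats) of a uniform density on an interval of this width."""
    return math.log(width)

# The same physical interval, 2 metres wide, written in two different units.
h_metres = uniform_differential_entropy(2.0)
h_centimetres = uniform_differential_entropy(200.0)

# The two answers differ by ln(100): the value depends on the unit we picked.
unit_shift = h_centimetres - h_metres
print(h_metres, h_centimetres, unit_shift)
```

Nothing about the system changed between the two calls, only the ruler.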

One of the major results of quantum mechanics is that systems like this can only exist in certain energy levels. In the harmonic oscillator these levels are equally-spaced, with a spacing proportional to the frequency associated with the classical oscillator. Since the levels are equally-spaced, we can think about the energy coming in discrete units called "phonons".

Our beliefs about the number of phonons $N$ in our system can be expressed as a probability distribution $P (N = n)$ over $n \in \mathbb{N}$.

This is progress: we've reduced an uncountably infinite set of states to a countable one, which is a factor of infinity! But if we do our normal trick and try to find the maximum entropy distribution, we'll still hit a problem: we get $P (N = n) = 0$ for all $n \in \mathbb{N}$.
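A quick way to see the problem, as a sketch rather than a proof: with no constraint other than normalization, the maximum-entropy distribution over $k$ states is uniform, so we can watch what happens as $k$ grows. The helper name below is hypothetical.

```python
import math

def uniform_entropy_and_probability(k):
    """Unconstrained maximum-entropy distribution over k states: uniform, p = 1/k."""
    p = 1.0 / k
    entropy = -sum(p * math.log(p) for _ in range(k))
    return p, entropy

# As k grows, each probability shrinks towards zero and the entropy grows like ln(k).
for k in (10, 1000, 100000):
    p, h = uniform_entropy_and_probability(k)
    print(k, p, h)
```

In the limit of infinitely many states every individual probability goes to zero, which is the degenerate answer described above.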

The Trick: Distribution Families

Thinking back to our previous post, an answer presents itself: phonons are a form of energy, which is conserved. Since we're uncertain over $N$ , we'll place a restriction on $E (N)$ of our distribution. We can solve the specific case here, but it's actually more useful to solve the general case.

Maths, Lots of Maths, Skippable:

Consider a set of states of a system $s \in S$. To each of these we assign a real numeric value written as $s \to x_{s} \in \mathbb{R}$. We also assign a probability $s \to p_{s} \in \mathbb{R}^{+}$ constrained by the usual condition $\sum_{s \in S} p_{s} = 1$.

Next, define $E (X) = \sum_{s \in S} p_{s} x_{s}$ and $H (S) = - \sum_{s \in S} p_{s} ln p_{s}$.

Imagine we perform a transformation to our distribution, such that the distribution is still valid and $E (X)$ remains the same. We will consider an arbitrary transformation over elements $\{1, 2, 3\}$:

$p_{1} \to p_{1} + d p_{1}, p_{2} \to p_{2} + d p_{2}, p_{3} \to p_{3} + d p_{3}$

$d p_{1} + d p_{2} + d p_{3} = 0$

$x_{1} d p_{1} + x_{2} d p_{2} + x_{3} d p_{3} = 0$

Now let us assume that our original distribution was a maximum of $H (S)$, which means it is a stationary point: $d H (S) = 0$.

$d H (S) = - d (p_{1} ln p_{1} + p_{2} ln p_{2} + p_{3} ln p_{3}) = 0$

$d (p_{1} ln p_{1} + p_{2} ln p_{2} + p_{3} ln p_{3}) = 0$

$ln p_{1} d p_{1} + d p_{1} + ln p_{2} d p_{2} + d p_{2} + ln p_{3} d p_{3} + d p_{3} = 0$

$ln p_{1} d p_{1} + ln p_{2} d p_{2} + ln p_{3} d p_{3} = 0$

The solution for this to be equal to zero in all cases is the following relation:

$ln p_{s} = - B x_{s} + \mathrm{const} \implies p_{s} = A exp (- B x_{s})$

We can plug this back into our equation to verify that we do in fact get zero:

$ln A (d p_{1} + d p_{2} + d p_{3}) - B (x_{1} d p_{1} + x_{2} d p_{2} + x_{3} d p_{3}) = 0 - 0 = 0$
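We can also check this numerically. The sketch below assumes three states with made-up values $x = (0, 1, 3)$ and $B = 0.7$. For three states there is only one direction (up to scale) that keeps both $\sum p_{s}$ and $E (X)$ fixed, so we nudge the distribution along it and confirm the entropy drops either way.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p)

def exponential_family(xs, b):
    """Probabilities proportional to exp(-b * x), normalized to sum to 1."""
    weights = [math.exp(-b * x) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]

xs = [0.0, 1.0, 3.0]  # illustrative values, not from the post
p = exponential_family(xs, 0.7)

# This direction satisfies sum(dp) = 0 and sum(x * dp) = 0 for any three x values.
direction = [xs[1] - xs[2], xs[2] - xs[0], xs[0] - xs[1]]

def perturbed_entropy(eps):
    return entropy([q + eps * d for q, d in zip(p, direction)])

h0 = entropy(p)
h_plus = perturbed_entropy(1e-3)
h_minus = perturbed_entropy(-1e-3)
print(h0, h_plus, h_minus)
```

Both perturbed entropies come out below `h0`, as expected for a constrained maximum.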

The choice of a negative sign in front of $B$ is so that our distribution converges for positive $B$ when values of $x_{s}$ extend up to $\infty$, which is common for things like energy. We will then get a distribution with the following form:

$P (S = s) = A exp (- B \times x_{s})$

Where $B$ parameterizes the shape of the distribution and $A$ normalizes it such that our probabilities sum to $1$ . We might want to write down $A$ in terms of $B$ :

$A = 1 / (\sum_{s \in S} exp (- B \times x_{s}))$

But we will actually get more use out of the following function $Z = 1 / A$ :

$Z = \sum_{s \in S} exp (- B \times x_{s})$

First consider the derivative $\frac{d Z}{d B}$ :

$\frac{d Z}{d B} = - \sum_{s \in S} x_{s} exp (- B \times x_{s})$

$\frac{d Z}{d B} = - Z \sum_{s \in S} x_{s} p_{s}$

Which gives us the remarkable result:

$E (X) = - \frac{1}{Z} \frac{d Z}{d B} = - \frac{d ln Z}{d B}$
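If you'd rather not trust the algebra, here is a small finite-difference check. It's a sketch with arbitrary illustrative values for $x_{s}$ and $B$, and the function names are my own.

```python
import math

def log_partition(xs, b):
    """ln Z for weights exp(-b * x)."""
    return math.log(sum(math.exp(-b * x) for x in xs))

def expectation(xs, b):
    """E(X) computed directly from the probabilities."""
    weights = [math.exp(-b * x) for x in xs]
    z = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / z

xs = [0.0, 1.0, 3.0, 4.5]  # illustrative values
b = 0.8
step = 1e-6

# Central difference approximation to -d(ln Z)/dB.
finite_difference = -(log_partition(xs, b + step) - log_partition(xs, b - step)) / (2 * step)
direct = expectation(xs, b)
print(direct, finite_difference)
```

The two numbers should agree to many decimal places.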

We can also expand out the value of $H (S)$ :

$H (S) = - \sum_{s \in S} p_{s} ln p_{s}$

$H (S) = - \sum_{s \in S} p_{s} ln (exp (- B \times x_{s}) / Z)$

$H (S) = - \sum_{s \in S} p_{s} (- B \times x_{s} - ln Z)$

$H (S) = \sum_{s \in S} (B \times p_{s} x_{s} + p_{s} ln Z)$

$H (S) = B \sum_{s \in S} p_{s} x_{s} + ln Z \sum_{s \in S} p_{s}$

$H (S) = B E (X) + ln Z$

$H (S) = - B \frac{d ln Z}{d B} + ln Z$

And get this in terms of $Z$ too! We also get one of the most important results from all of statistical mechanics:

$\frac{d H (S)}{d E (X)} = B + E (X) \frac{d B}{d E (X)} + \frac{d ln Z}{d E (X)}$

Now use the substitution:

$E (X) \frac{d B}{d E (X)} = - \frac{1}{Z} \frac{d Z}{d B} \frac{d B}{d E (X)} = - \frac{1}{Z} \frac{d Z}{d E (X)} = - \frac{d ln Z}{d E (X)}$

To get our final result:

$\frac{d H (S)}{d E (X)} = B$

So $B$ is not "just" a parameter for our distributions, it's actually telling us something about the system. As we saw last time, finding the derivative of entropy with respect to some constraint is absolutely critical to finding the behaviour of that system when it can interface with the environment.
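As a sanity check on $\frac{d H (S)}{d E (X)} = B$, we can nudge $B$ slightly, see how much $E (X)$ and $H (S)$ each move, and take the ratio. This is a sketch with made-up state values and a hypothetical helper name.

```python
import math

def mean_and_entropy(xs, b):
    """Return (E(X), H(S)) for probabilities proportional to exp(-b * x)."""
    weights = [math.exp(-b * x) for x in xs]
    z = sum(weights)
    probs = [w / z for w in weights]
    mean = sum(p * x for p, x in zip(probs, xs))
    entropy = -sum(p * math.log(p) for p in probs)
    return mean, entropy

xs = [0.0, 1.0, 3.0, 4.5]  # illustrative values
b = 0.8
step = 1e-5

e_lo, h_lo = mean_and_entropy(xs, b - step)
e_hi, h_hi = mean_and_entropy(xs, b + step)

# Ratio of small changes approximates dH/dE along the family.
slope = (h_hi - h_lo) / (e_hi - e_lo)
print(slope, b)
```

The slope lands on the value of $B$ we started with.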

<\Maths>

To recap the key findings:

The probability of a system state $s$ with value $x_{s}$ is proportional to $exp (- B \times x_{s})$
This parameter $B$ is also the (very important to specify) value of $\frac{d H (S)}{d E (X)}$
We can define a function $Z (B) = \sum_{s \in S} exp (- B \times x_{s})$
$E (X) = - \frac{d ln Z}{d B}$
$H (S) = - B \frac{d ln Z}{d B} + ln Z$

Which we can now apply back to the harmonic oscillator.
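Before we do, here is the recap as a minimal sketch for any finite list of values $x_{s}$. The function name and the example numbers are assumptions for illustration only.

```python
import math

def maxent_family(xs, b):
    """Return (Z, probabilities, E(X), H(S)) for p_s proportional to exp(-b * x_s)."""
    weights = [math.exp(-b * x) for x in xs]
    z = sum(weights)
    probs = [w / z for w in weights]
    mean = sum(p * x for p, x in zip(probs, xs))
    entropy = -sum(p * math.log(p) for p in probs)
    return z, probs, mean, entropy

b = 0.7
z, probs, mean, entropy = maxent_family([0.0, 1.0, 3.0], b)

# H(S) = B * E(X) + ln Z should hold up to floating-point error.
identity_gap = entropy - (b * mean + math.log(z))
print(z, mean, entropy, identity_gap)
```

The same function works for any finite state space; the oscillator below just has infinitely many states, so we need the closed form instead of a plain sum.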

Back to the Harmonic Oscillator

So we want to find a family of distributions over $n \in N \equiv \mathbb{N}$. We can in fact assign a real number to each value of $n$, trivially (the inclusion $\mathbb{N} \ni n \hookrightarrow n \in \mathbb{R}$ if you want to be fancy). Now we know that our distribution over $N$ must take the form:

$P (N = n) = A exp (- B \times n)$

But we also know that the most important thing about our system is the value of our partition function $Z (B)$ :

$Z = \sum_{n = 0}^{\infty} exp (- B \times n)$

Which is just the sum of a geometric series with $a = 1$ , $r = e^{- B}$ :

$Z = \frac{1}{1 - e^{- B}}$

$ln Z = - ln (1 - e^{- B})$

Which gives us $E (N)$ and $H (N)$ in terms of $B$ :

$E (N) = \frac{e^{- B}}{1 - e^{- B}}$

$H (N) = B \frac{e^{- B}}{1 - e^{- B}} - ln (1 - e^{- B})$
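These closed forms are easy to check against a brute-force sum. The sketch below truncates the infinite sum at 2000 terms, which is an arbitrary cutoff that is more than enough for the illustrative value $B = 0.5$.

```python
import math

def closed_form(b):
    """Z, E(N) and H(N) from the geometric-series formulas."""
    r = math.exp(-b)
    z = 1.0 / (1.0 - r)
    mean_n = r / (1.0 - r)
    entropy = b * mean_n - math.log(1.0 - r)
    return z, mean_n, entropy

def truncated_sum(b, n_max=2000):
    """The same quantities by summing over n = 0 .. n_max - 1 directly."""
    weights = [math.exp(-b * n) for n in range(n_max)]
    z = sum(weights)
    probs = [w / z for w in weights]
    mean_n = sum(n * p for n, p in enumerate(probs))
    # Skip terms that underflowed to exactly zero; they contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return z, mean_n, entropy

z_c, n_c, h_c = closed_form(0.5)
z_t, n_t, h_t = truncated_sum(0.5)
print(z_c, z_t)
print(n_c, n_t)
print(h_c, h_t)
```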

T instead of B

Instead of $B$ , we usually use a variable $T = 1 / B$ for a few reasons. If we want to increase the amount of $X$ in our system (i.e. increase $E (X)$ ) we have to decrease the value of $B$ , whereas when $B$ gets big, $E (X)$ just approaches the minimum value of $x_{s}$ and our probability distribution just approaches uniform over the corresponding $s$ . Empirically, $T$ is often easier to measure for physical systems, and variations in $T$ tend to feel more "linear" than variations in $B$ .

Let's plot both $E (N)$ and $H (N)$ of our systems as a function of $T$ :

$E (N)$ converges on the line $T - \frac{1}{2}$. Rather pleasingly the energy of a quantum harmonic oscillator is actually proportional to $N + \frac{1}{2}$, not $N$. This little correction is called the "zero point energy" and is another fundamental result of quantum mechanics. If we plot the energy $E$ instead of $E (N)$, it will converge on $T$. $H (N)$ converges on $ln (T) + 1$.
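The large-$T$ behaviour can be read off numerically. This is a sketch using $B = 1 / T$ and the closed forms above; it prints how far $E (N)$ sits from $T - \frac{1}{2}$ and the offset $H (N) - ln (T)$, which settles near $1$. The function names are my own.

```python
import math

def mean_phonons(t):
    """E(N) = e^{-B} / (1 - e^{-B}) = 1 / (e^{B} - 1), with B = 1 / T."""
    return 1.0 / math.expm1(1.0 / t)

def phonon_entropy(t):
    """H(N) = B * E(N) - ln(1 - e^{-B}), with B = 1 / T."""
    b = 1.0 / t
    return b * mean_phonons(t) - math.log(-math.expm1(-b))

for t in (1.0, 10.0, 100.0, 1000.0):
    energy_gap = mean_phonons(t) - (t - 0.5)
    entropy_offset = phonon_entropy(t) - math.log(t)
    print(t, energy_gap, entropy_offset)
```

Using `math.expm1` avoids losing precision when $B$ is small, which is exactly the regime we care about here.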

These are general rules. $E$ is in general proportional to $T$ , and $H$ is almost always

So far we've ignored the fact that our values of $N$ actually correspond to energy, and therefore there must be a spacing involved. What we've been calling $T$ so far should actually be called $\frac{T}{E_{p}}$ where $E_{p}$ is the energy of a single phonon. This is the spacing of the ladder of energy levels.

If we swap $\frac{T}{E_{p}}$ into our equations and also substitute in the energy $E = E_{p} (E (N) + \frac{1}{2})$ (we will omit the $E$ when talking about energy) we get the following equations:

$E = E_{p} (\frac{e^{- E_{p} / T}}{1 - e^{- E_{p} / T}} + \frac{1}{2})$

$H (N) = \frac{E_{p}}{T} \frac{e^{- E_{p} / T}}{1 - e^{- E_{p} / T}} - ln (1 - e^{- E_{p} / T})$

Both functions now have a "burn-in" region around $T = 0$, where the function is flat: the entropy sits at zero and the energy at its minimum value. This is important. This region is common to almost all quantum thermodynamic systems, and it corresponds to a phenomenon when $T \ll E_{p}$. When this occurs the exponential term $e^{- E_{p} / T}$ can be neglected for all states except the lowest energy one:

$ln Z \approx ln (e^{- B E_{\min}}) = - B E_{\min} \therefore E = - \frac{d ln Z}{d B} \approx E_{\min}$

Showing $E$ doesn't respond to changes in $T$. This is the same as saying that the system has a probability $\approx 1$ of being in the lowest energy state, and therefore of having $E = E_{\min}$.
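Here is a sketch of that freeze-out for the oscillator, working in units where $E_{p} = 1$ (an assumption for illustration, as are the function names). It uses $P (N = 0) = 1 / Z = 1 - e^{- E_{p} / T}$.

```python
import math

def oscillator_energy(t, e_p=1.0):
    """Mean energy E = E_p * (E(N) + 1/2) at temperature t (same units as e_p)."""
    r = math.exp(-e_p / t)
    return e_p * (r / (1.0 - r) + 0.5)

def ground_state_probability(t, e_p=1.0):
    """P(N = 0) = 1 / Z = 1 - exp(-E_p / T)."""
    return -math.expm1(-e_p / t)

# Well below E_p the energy is stuck at E_p / 2 and the ground state dominates.
for t in (0.05, 0.1, 1.0, 10.0):
    print(t, oscillator_energy(t), ground_state_probability(t))
```

At $T = 0.05$ the energy is indistinguishable from $\frac{1}{2} E_{p}$, while by $T = 10$ it is tracking $T$.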

True Names

$T$ stands for temperature. Yep. The actual regular temperature appears as the inverse of a constant we've used to parameterize our distributions. $B$ is usually called $\beta$ in thermodynamics, and is sometimes called the "inverse temperature".

In thermodynamics, the energy of a system has a few definitions. What we've been calling $E$ should properly be called $U$ , which is the internal energy of a system at constant volume.

Entropy in thermodynamics has the symbol $S$ . I've made sure to use a roman $H$ for our entropy because $H$ (italic) in thermodynamics is a sort of adjusted version of energy called "enthalpy".

In normal usage, temperature has different units to energy. This is partly because, if written as energy, the temperature would be a very small number, and partly because the two were discovered separately. Temperature is measured in kelvin ($K$), which are converted to energy's joules ($J$) with something known as the Boltzmann constant $k_{B}$. For historical reasons which are absolutely baffling, thermodynamics makes the choice to incorporate this conversion into its units of $S$, so $S = k_{B} H (\mathrm{system})$. This makes entropy far, far more confusing than it needs to be.
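To get a feel for the sizes involved, here is a minimal sketch of the conversion. The value of $k_{B}$ is the exact one fixed by the 2019 SI definition; the function names and the 300 K example are my own.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K, exact by the 2019 SI definition

def temperature_in_joules(kelvin):
    """Temperature expressed as an energy."""
    return K_B * kelvin

def thermodynamic_entropy(h_nats):
    """S = k_B * H, converting an entropy in nats to J/K."""
    return K_B * h_nats

# Roughly room temperature: a very small number of joules.
room_temperature = temperature_in_joules(300.0)
one_nat = thermodynamic_entropy(1.0)
print(room_temperature, one_nat)
```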

Anyway, there are two reasons why I have done this:

I want to avoid cached thoughts. If you already know what energy and entropy are in a normal thermodynamic context, you risk not understanding the system properly in terms of stat mech.
I want to extend stat mech beyond thermodynamics. I will be introducing a framework for understanding agents in the language of stat mech around the same time this post goes up.

Conclusions

Maximum-entropy distributions with constrained $E (X)$ always take the form $e^{- B x}$
This $B$ represents the derivative $\frac{d H (\mathrm{system})}{d E (X)}$; if $X$ represents energy, we can write $B$ as $\beta$
$B$ is inverse to $T$ which, if $X$ is energy, is the familiar old temperature of the system

We have learnt how to apply these to one of the simplest systems available. Next time we will try them on a more complex system.
