kuniga.me > NP-Incompleteness > Cepstrum

23 Oct 2021

Cepstrum is a signal transform that can be used in speech analysis for determining voice pitch, separate formants (transfer function of the vocal tract) from voiced and unvoiced sources, and in mechanics, for example to detect local faults in gear [1].

Cepstrum is a wordplay with Spectrum, obtained by reversing the first four letters, and was coined by John Tukey, pictured on the left. Another interesting fact is that Cepstrum was invented before the FFT [1].

In this post we’ll study the Cepstrum and why it’s a useful tool for dealing with human speech.

Recall from Discrete Filters, that the result of a filter with an impulse response $\vec{h}$ over an input signal $\vec{x}$ can be described by a convolution:

\[\vec{y} = \vec{x} * \vec{h}\]Also recall from the Source-Filter Model that the human voice can be modeled via a excitation source $\vec{e}$ (vocal chords, impulse train or noise) going through a filter (vocal tract, which amplifies specific frequencies) with impulse response $\vec{h}$:

\[\vec{x} = \vec{e} * \vec{h}\]We only have access to the output signal $\vec{x}$ and it would be desirable to extract the impulse response from this output by removing the noisy signal.

In *Discrete Filters* we learned about the *Convolution theorem*, which says that if $\vec{z} = \vec{x} * \vec{y}$, then their DFTF satisfy $\mathscr{F}(\vec{z}) = \mathscr{F}(\vec{x}) \mathscr{F}(\vec{y})$, that is, the convolution in the time domain becomes a product in frequency domain.

In a similar vein, we could define a transform $\mathscr{C}$ that turns the convolution into a sum:

\[\mathscr{C}(\vec{z}) = \mathscr{C}(\vec{x}) + \mathscr{C}(\vec{y})\]Such transforms are called **homomorphic** (the name means same + form: structure-preserving), and we’ll show the Cepstrum transform satisfies this property. We’ll also denote the Cepstrum of a signal $\vec{x}$ as $\vec{\hat x}$.

Suppose we apply the Cepstrum transform to an output signal $\vec{x}$ and now have:

\[\vec{\hat x} = \vec{\hat e} + \vec{\hat h}\]For voiced sounds in human speech, the Cepstrum has another important property which we’ll call **separability**. This means that there exists $N$ such that $\hat h_t \approx 0$ for $t \ge N$ and $\hat e_n \approx 0$ for $t < N$. So if we can find that $N$ and apply a filter to remove the entries with $t > N$ and obtain:

We can then apply the inverse of $\mathscr{C}$ to recover $\vec{h} = \mathscr{C}^{-1}(\vec{\hat h})$ or $\vec{e} = \mathscr{C}^{-1}(\vec{\hat e})$. This property is very useful for example in extracting the pitch from human voice.

In the rest of this post we’ll formalize the notion of Cepstrum and then prove its 2 main properties: homomorphism and separability.

The **complex Cepstrum** is defined as:

Recalling that $\mathscr{F}(\vec{x}, \omega)$ is the DTFT and is a function of $\omega$:

\[\mathscr{F}(\vec{x}, \omega) = \sum_{t = 0}^{N-1} x_t e^{-i \omega t} \quad -\pi \le \omega \le \pi\]We can represent the complex result of the DTFT in polar form, $\mathscr{F}(\vec{x}, \omega) = \abs{\mathscr{F}(\vec{x}, \omega)} e^{i\varphi(\omega)}$, where $\varphi(\omega)$ is the argument/phase of the complex number and depends on $\omega$. The complex natural logarithm is then:

\[\ln \mathscr{F}(\vec{x}, \omega) = \ln(\abs{\mathscr{F}(\vec{x}, \omega)}) + i \varphi(\omega)\]If we instead take the (real) logarithm of the absolute value, then result is called the **real Cepstrum**:

We can also express (1) using the inverse of the DTFT, $\mathscr{F}^{-1}$, which is defined as:

\[\mathscr{F}^{-1}(\lambda(\omega)) = \frac{1}{2 \pi} \int_{-\pi}^{\pi} \lambda(\omega) e^{i \omega t} d\omega\]So we set $\lambda(\omega) = \ln \abs{\mathscr{F}(\vec{x}, \omega)}$, we can write (1) as:

\[\hat x_t = \mathscr{F}^{-1}(\ln \abs{\mathscr{F}(\vec{x}, \omega)})\]Note that the *real Cepstrum* is **not** the real part of the *complex Cepstrum*. Since the real Cepstrum is more common in practice, when mentioning Cepstrum without qualifiers we’ll be referring to the real Cepstrum.

We can obtain the real cepstrum from the complex cepstrum via a simple identity, as state in *Proposition 1*.

**Proposition 1.** If $\vec{x}$ is real, both the *real* and *complex Cepstrum* are real. Further, let $\vec{\hat x}^{C}$ and $\vec{\hat x}^{(R)}$ be the complex and real cepstrum of $\vec{x}$ respectively. Then

*Proof.* See *Appendix*.

Recall that in Fourier transforms discussions we refer to the original signal $\vec{x}$ being in the *time domain*, and its Fourier transform $\mathscr{F}(\vec{x})$ being in the *frequency domain*.

Now we note that if we didn’t apply the logarithm in (1) before integrating, we’d be applying the inverse of the DTFT:

\[x_t = \frac{1}{2 \pi} \int_{-\pi}^{\pi} \mathscr{F}(\vec{x}, \omega) e^{i \omega t} d\omega\]And thus going back to the time domain. But because we did apply the log, the value $\vec{\hat x}$ doesn’t really correspond to the time domain, but instead this domain is defined as the **quefrency** domain, which is yet another word play, this time with “frequency”.

Let’s compute the Cepstrum of a convolution $\vec{z} = \vec{x} * \vec{y}$ and show that $\vec{\hat z} = \vec{\hat x} + \vec{\hat y}$. We know from the convolution theorem that:

\[\mathscr{F}(\vec{z}, \omega) = \mathscr{F}(\vec{x} * \vec{y}, \omega) = \mathscr{F}(\vec{x}, \omega) \mathscr{F}(\vec{y}, \omega)\]Representing them by polar forms we have:

\[\mathscr{F}(\vec{z}, \omega) = \abs{\mathscr{F}(\vec{x}, \omega)} \abs{\mathscr{F}(\vec{y}, \omega)} e^{i \varphi_x + \varphi_y}\]Where $\varphi_x$ and $\varphi_y$ are the phases of the DTFT of $\vec{x}$ and $\vec{y}$, respectively.

Taking the logarithm of the absolute value we obtain:

\[\ln \abs{\mathscr{F}(\vec{z}, \omega)} = \ln \abs{\mathscr{F}(\vec{x}, \omega)} + \ln \abs{\mathscr{F}(\vec{y}, \omega)}\]Plugging this into (1) we have:

\[\hat z_t = \frac{1}{2 \pi} \int_{-\pi}^{\pi} \ln \abs{\mathscr{F}(\vec{z}, \omega)} e^{i \omega t} d\omega = \frac{1}{2 \pi} \int_{-\pi}^{\pi} (\ln \abs{\mathscr{F}(\vec{x}, \omega)} + \ln \abs{\mathscr{F}(\vec{y}, \omega)}) e^{i \omega t} d\omega\]Which can be split into two integrals:

\[\hat z_t = \frac{1}{2 \pi} \int_{-\pi}^{\pi} \ln \abs{\mathscr{F}(\vec{x}, \omega)} e^{i \omega t} d\omega + \frac{1}{2 \pi} \int_{-\pi}^{\pi} \ln \abs{\mathscr{F}(\vec{y}, \omega)} e^{i \omega t} d\omega = \hat x_t + \hat y_t\]*QED*

The same can be proven for the complex Cepstrum.

We’ll show how to compute the Cepstrum of impulse response, $\vec{\hat h}$, from its rational transfer function $H(z)$. Recall that the DTFT is a special case of the Z-transform, so if we set $z = e^{i \omega}$, then $H(z)$ is the DTFT of the impulse response $\vec{h}$.

Thus we can compute the inverse DTFT of the log of $H(z)$ to obtain $\hat h_t$. By still assuming that $z = e^{i \omega}$, we can use the inverse Z-transform to compute the inverse DTFT of $\ln \abs{H(z)}$ and thus obtain $\vec{\hat h}$:

\[(3) \qquad \vec{\hat h} = \mathscr{Z}^{-1} (\ln \abs{H(z)}) \qquad z = e^{i \omega}\]In Z-transforms we defined the rational transfer function $H(z)$ as a function of its poles and zeros:

\[H(z) = b_0 \frac{\prod_{k = 0}^{M-1} (1 - z_k z^{-1})}{\prod_{k = 0}^{N-1} (1 - p_k z^{-1})}\]Recalling that if $z = z_k$ for any $k \in [0, M-1]$ then $H(z) = 0$ and if $z \rightarrow p_k$, then $H(z) \rightarrow \infty$.

Now we wish to rewrite these products such that all roots $z_k$ and $p_k$ have magnitude less or equal to 1. We split each product into two: for $k \in [0, M_i-1]$ for $\abs{z_k} \le 1$ and $k \in [0, M_o-1]$ for $\abs{z_k} > 1$. Note the index $i$ in $M_i$ stands for *inside* the unit circle and $o$ in $M_o$ as outside. We split the product in the denominator analogously for $N_i$ and $N_o$.

Now for $\abs{z_k} > 1$ we define $z’_k = \frac{1}{z_k}$. We then rewrite $(1 - z_k z^{-1})$ as $(1 - z’_k z)$ since when $z = z_k$, this expression will yield 0 as before. Thus we can rewrite as:

\[(4) \qquad H(z) = b_0 \frac{\prod_{k = 0}^{M_i-1} (1 - z_k z^{-1}) \prod_{k = 0}^{M_o-1} (1 - z_k' z)}{\prod_{k = 0}^{N_i-1} (1 - p_k z^{-1}) \prod_{k = 0}^{N_o-1} (1 - p'_k z)}\]Where $\abs{z_k}, \abs{z’_k}, \abs{p_k}, \abs{p’_k} \le 1$.

Let’s apply log to $H(z)$ and replace by (4). We’ll use the fact that given complex numbers $c$ and $d$, $\ln cd = \ln c + \ln d$.

\[(5) \quad \ln H(z) = \ln b_0 + \sum_{k = 0}^{M_i-1} \ln(1 - z_k z^{-1}) + \sum_{k = 0}^{M_o-1} \ln(1 - z_k' z) \\ - \sum_{k = 0}^{N_i-1} \ln(1 - p_k z^{-1}) - \sum_{k = 0}^{N_o-1} \ln(1 - p'_k z)\]We can use the Taylor series expansion to compute the natural logarithm of $\ln(1 - x)$ as in (6):

\[(6) \qquad \ln(1-x) = - \sum_{n=1}^{\infty} \frac{x^n}{n}\]Then the sum

\[\sum_{k = 0}^{M_i-1} \ln(1 - z_k z^{-1})\]becomes

\[\sum_{k = 0}^{M_i-1} \sum_{n=1}^{\infty} - \frac{(z_k z^{-1})^n}{n}\]We can swap the order of the sums and move $\frac{z^{-1}}{n}$ out:

\[- \sum_{n=1}^{\infty} \frac{z^{-n}}{n} \sum_{k = 0}^{M_i-1} z_k^n\]We do this for all the four sums in (5), so we can group by the outer sum:

\[\ln H(z) = \ln b_0 + \sum_{n=1}^{\infty} \frac{1}{n} \left( - z^{-n} \sum_{k = 0}^{M_i-1} z_k^n - z^n \sum_{k = 0}^{M_o-1} {z'}_k^n + z^{-n} \sum_{k = 0}^{N_i-1} p_k^n + z^n \sum_{k = 0}^{N_o-1} {p'}_k^n \right)\]We can group further the sums that have a factor $z^{-n}$ and $z^{n}$, respectively.

\[\ln H(z) = \ln b_0 + \sum_{n=1}^{\infty} \frac{1}{n} \left(z^{-n} (\sum_{k = 0}^{N_i-1} p_k^n - \sum_{k = 0}^{M_i-1} z_k^n) + z^n (\sum_{k = 0}^{N_o-1} {p'}_k^n - \sum_{k = 0}^{M_o-1} {z'}_k^n) \right)\]In this form, it now becomes a bit more apparent we have a power series:

\[\sum_{t = -\infty}^{\infty} \alpha_t z^{-t}\]We have $\alpha_0 = \ln b_0$. For $t > 0$, $\alpha_t$ are the sums being multiplied by $z^{-t}$, that is the sums multiplied by $z^{-n}$ when $n=t$:

\[\alpha_t = \frac{1}{t} (\sum_{k = 0}^{N_i-1} p_k^t - \sum_{k = 0}^{M_i-1} z_k^t)\]Similarly for $t < 0$ we have:

\[\alpha_t = \frac{1}{t} (\sum_{k = 0}^{N_o-1} {p'}_k^t - \sum_{k = 0}^{M_o-1} {z'}_k^t)\]It will soon become clear why we went through all this trouble to define $\ln H(z)$ as a power series.

In the post about Z-transforms we defined it as:

\[(7) \qquad \mathscr{Z}(\vec{x}, z) = \sum_{t = -\infty}^{\infty} x_t z^{-t}\]The inverse of the Z-transform is an operation such that when applied to the Z-transform allows us to recover $\vec{x}$:

\[x_t = \mathscr{Z}^{-1}(\mathscr{Z}(\vec{x}, z))\]We won’t cover the general definition of the inverse of the Z-transform in this post. We’ll focus on the special case where the result of a Z-transform can be written as a power series of $z$, then we can use (7) to recover $\vec{x}$.

For example, suppose the Z-transform $\mathscr{Z}(\vec{x}, z)$ is:

\[(8) \qquad a z^{-2} + b z^{-1} + c + d z + e z^{2}\]If we set $x_t$ to the coefficient corresponding to the power $z^{-t}$, or more explicitly: $x_{2} = a, x_{1} = b, x_{0} = c, x_{-1} = d, x_{-2} = e$, then we can pad the other entries with 0, and we obtain $\vec{x}$ such that $\mathscr{Z}(\vec{x}, z)$ is (8).

Thus, it’s very easy to find the inverse of the Z-transform for this special case.

By combining the discussions of the prior section, we can finally obtain $\hat h_t$:

\[\begin{equation} \hat h_t=\left\{ \begin{array}{@{}ll@{}} \ln b_0, & \text{if}\ t = 0 \\ \frac{1}{t} (\sum_{k = 0}^{M_i-1} z_k^t + \sum_{k = 0}^{N_i-1} p_k^t), & \text{if}\ t > 0 \\ \frac{1}{t} (\sum_{k = 0}^{M_o-1} {z'}_k^t - \sum_{k = 0}^{N_o-1} {p'}_k^t), & \text{if}\ t < 0 \\ \end{array}\right. \end{equation}\]Since we have $\abs{z_k}, \abs{z’_k}, \abs{p_k}, \abs{p’_k} \le 1$, for sufficiently large values of $\abs{t}$, we’ll be able to approximate $\hat h_{\abs{t}}$ to $0$.

Note that we used the definition of complex cepstrum to show this result, but this can be extended for the real cepstrum by the fact we can obtain the real cepstrum from the complex cepstrum via (2), that is, since $\hat h_{\abs{t}} \approx 0$ for large values of $\abs{t}$ then $\hat h_{t} + \hat h_{-t} \approx 0$.

In the source-filter model, the source can be either voiced or unvoiced. For voiced sounds, the vocal chords can be modeled as a periodic impulse train, that is at regular intervals we have a peak magnitude.

We can express such signals as a sum of impulse trains (see Reproducing Formula in Discrete Time Filters).

\[(9) \quad \vec{e} = \sum_{k = 0}^{\infty} \delta_{t - kN}\]Recall that $\delta_0 = 1$ and 0 elsewhere. We’ll have non-zero values for $x_t$ whenever $t = kN$ for $k \in \mathbb{Z}$, that is, for multiples of $N$. Thus $N$ represents the period of this signal.

**Proposition 2.** The Cepstrum of the impulse train (9) is

*Proof.* See *Appendix*.

This shows the Cepstrum of an impulse train is also an impulse train and has 0s for $t < N$. It’s also the case that it’s a decaying sequence due to the denominator $r$, but it decays more slowly than $\vec{\hat h}$.

To get a better sense on why, consider some arbitrary value of $r_0$ of $r$ and $t_0 = r_0N$. The denominator for $\hat e_{t_0}$ is $r_0$ while for $\hat h_{t_0}$ it’s $t_0$, which is $N$ times greater than $r_0$.

From the discussions above, for the voiced speech the Cepstrum of the source signal will have peaks for $t > N$, while for the filter, it gets attenuated as $t$ grows. Whether this attenuation drops rapidly enough to allow separating source and filter will depend on the coefficients of the filter.

If we consider the higher end of the human fundamental frequency, 300Hz, we get periods greater than 3.33ms, so we need the attenuation of the filter Cepstrum to happen significantly before this value. This seems to work reasonably well in practice though [5].

This post has been primarily based on the *Spoken: Language Processing* book. It’s a bit sparse in the details of the mathematical steps, so I had to complement with other sources like Wikipedia and some homework assignment solutions. Following the even-odd rabbit hole was particularly interesting, which merited a post on its own.

I’m looking forward to implementing this and trying with real instances.

*Proof of Proposition 1:*

We’ll leverage a result of Hermitian Functions, namely that any function $f: \mathbb{R^n} \rightarrow \mathbb{C}$ has a Hermitian decomposition and it’s unique. Hermitian functions are a general case of even-odd functions so we can use the same results.

According to the even-odd decomposition for $f(t) = x_t^{(C)}$, we can defined the even function:

\[f_e(t) = \frac{\hat x_t^{(C)} + \hat x_{-t}^{(C)}}{2}\]and an odd function:

\[f_o(t) = \frac{\hat x_t^{(C)} - \hat x_{-t}^{(C)}}{2}\]Such that $x_t^{(C)} = f_e(t) + f_o(t)$.

Thus if we can show that the real cepstrum ($x^{(R)}_t$) is an even function and that the imaginary cepstrum ($x^{(I)}_t$) is an odd function and that adding them together yields the complex cepstrum, this would imply that $x^{(R)}_t = f_e(t)$ and $x^{(I)}_t = f_o(t)$, since even-odd decompositions are unique.

We now show that the real cepstrum is an even function in regards to $t$. In Hermitian Functions, we showed that the modulus of a Fourier transform is an even function:

\[\abs{\mathscr{F}(\vec{x}, \omega)} = \overline{\abs{\mathscr{F}(\vec{x}, -\omega)}}\]which is clearly preserved under the logarithm operation:

\[\ln \abs{\mathscr{F}(\vec{x}, \omega)} = \ln \abs{\mathscr{F}(\vec{x}, -\omega)}\]Let’s expand $e^{i \omega t} = \cos(\omega t) + i \sin(\omega t)$. We note that

\[\ln \abs{\mathscr{F}(\vec{x}, \omega)} \sin(\omega t) = - \ln \abs{\mathscr{F}(\vec{x}, -\omega)} \sin(-\omega t)\]Which means that if we integrate the above over $\omega \in [-\pi, \pi]$, the negative values of $\omega$ will cancel out the positive ones, so:

\[\int_{-\pi}^{\pi} \ln \abs{\mathscr{F}(\vec{x}, \omega)} \sin(\omega t) d\omega = 0\]This means the real cepstrum can be written as:

\[\frac{1}{2 \pi} \int_{-\pi}^{\pi} \ln \abs{\mathscr{F}(\vec{x}, \omega)} \cos(\omega t) d\omega\]Since $\cos(\omega t)$ is even in regards to $t$ and the sum of even functions is even, we conclude that $x^{(R)}_t$ an even function.

Now let’s show that the imaginary cepstrum, $x^{(I)}_t$, is an odd function. The imaginary cepstrum is the complex Cepstrum by taking the imaginary part of the complex natural logarithm:

\[\frac{1}{2 \pi} \int_{-\pi}^{\pi} i \varphi(\omega) e^{i \omega t} d\omega\]In *Hermitian Functions* we showed that $\varphi(\omega)$ is an odd function, $\varphi(\omega) = - \varphi(-\omega)$. We note that

Which means that if we integrate the above over $\omega \in [-\pi, \pi]$, the negative values of $\omega$ will cancel out the positive ones, so:

\[\int_{-\pi}^{\pi} \varphi(\omega) \cos(\omega t) d\omega = 0\]This means the imaginary cepstrum can be written as:

\[\frac{1}{2 \pi} \int_{-\pi}^{\pi} i^2 \varphi(\omega) \sin(\omega t) d\omega = -\frac{1}{2 \pi} \int_{-\pi}^{\pi} \varphi(\omega) \sin(\omega t) d\omega\]Since $\sin(\omega t)$ is odd in regards to $t$ and the sum of odd functions is odd, we conclude that $x^{(I)}_t$ is an odd function.

It’s also clear that $x_t^{(C)} = x^{(R)}_t + x^{(I)}_t$, which means $x^{(R)}_t$ and $x^{(I)}_t$ form an even-odd decomposition, which then proves

\[f_e(t) = x^{(R)}_t = \frac{\hat x_t^{(C)} + \hat x_{-t}^{(C)}}{2}\]Note that we have also shown that $x^{(R)}_t, x^{(I)}_t, x_t^{(C)}$ are all real.

*QED*

*Proof of Proposition 2:*

To show that the Cepstrum of (9) is

\[\vec{\hat e} = \sum_{r = 1}^{\infty} \frac{\delta_{t - rN}}{r}\]we’ll first prove a different result. We’ll compute the Cepstrum of

\[\vec{e} = \sum_{k = 0}^{M - 1} \alpha^k \delta_{t - kN}\]For $\alpha < 1$ and some integer $M$. Applying the Z-transform we get:

\[\mathscr{Z}(\vec{e}, z) = \sum_{t = -\infty}^{\infty} e_t z^{-t}\]If we consider only the entries where $e_t$ is non-zero, that is, for $t = kN$, $k \in [0, M-1]$, we have:

\[\mathscr{Z}(\vec{e}, z) = 1 + \alpha z^{-N} + (\alpha z^{-N})^2 + \cdots + (\alpha z^{-N})^{M-1}\]which is a geometric series and can thus be written as:

\[\mathscr{Z}(\vec{e}, z) = \frac{1 - (\alpha z^{-N})^M}{1 - \alpha z^{-N}}\]Applying the log:

\[\ln \mathscr{Z}(\vec{e}, z) = \ln (1 - (\alpha z^{-N})^M) - \ln (1 - \alpha z^{-N})\]Using the Taylor expansion,

\[\ln \mathscr{Z}(\vec{e}, z) = \sum_{r = 1}^{\infty} \frac{(\alpha^r}{r} z^{-Nr} - \sum_{r = 1}^{\infty} \frac{\alpha^{Mr}}{r} z^{-NMr}\]We can re-arrange the terms so that it looks like a power series on $z^{-1}$:

\[\ln \mathscr{Z}(\vec{e}, z) = \sum_{t=0}^{\infty} p^t z^{-t}\]As we discussed in *The Inverse of the Z-transform for Power Series*, we know that the coefficients of this power series $\vec{p}$ is also the inverse Z-transform:

Which is exactly $\vec{\hat e}$ from (3).

The non-zero entries for $\hat e_t$ in the first sum are $t = Nr$, while for the second sum is $t = MNr$.

We now assume $M \rightarrow \infty$, so we’ll never really encounter the non-zero entries for the second sum, so we can effectively ignore them. Further we can now assume $\alpha \rightarrow 1$.

Thus $\hat e_t = \frac{1}{r}$ when $t = Nr$ for $r = 1, \cdots$ or in terms of the reproducing formula:

\[\vec{\hat e} = \sum_{r = 1}^{\infty} \frac{\delta_{t - rN}}{r}\]*QED*

- [1] A history of cepstrum analysis and its application to mechanical problems - Robert Randall
- [2] Spoken: Language Processing - X. Huang, A. Acero and H. Hon
- [3] Department of Electrical and Computer Engineering Digital Speech Processing - Homework No. 6 Solutions
- [4] Facts About Speech Intelligibility
- [5] Short-Time Spectrum and “Cepstrum” Techniques for Vocal-Pitch Detection - A. M. Noll
- [6] Mathematics - DTFT of Impulse train is equal to 0 through my equation.