2 The Exponential Family
2.1 The Natural Exponential Family
We will assume that the observations come from a distribution in the natural exponential family of distributions. This means that the probability density function (pdf) can be written in the form:
\[\begin{equation} f(y_{i})=\exp\left\{\frac{y_{i}\theta_{i}-b(\theta_{i})}{a_{i}(\phi)}+c(y_{i},\phi)\right\} \tag{2.1} \end{equation}\]Here \(\theta_{i}\) and \(\phi\) are parameters and \(a_{i}(\phi)\), \(b(\theta_{i})\) and \(c(y_{i},\phi)\) are known functions. In all models considered here the function \(a_{i}(\phi)\) has the form \[a_{i}(\phi)=\frac{\phi}{p_{i}},\] where \(p_{i}\) is a known prior weight, often \(1\).
The parameters \(\theta_{i}\) and \(\phi\) are essentially location and scale parameters: \(\theta_{i}\) is known as the canonical (or natural) parameter and \(\phi\) as the dispersion parameter. It can be shown that if \(Y_{i}\) has a distribution in the exponential family then it has mean and variance:
\[\begin{eqnarray} \mathbf{E}(Y_{i})&=&\mu_{i}=b'(\theta_{i})\\ var(Y_{i})&=&\sigma_{i}^{2}=b''(\theta_{i})a_{i}(\phi) \end{eqnarray}\]where \(b'(\theta_{i})\) and \(b''(\theta_{i})\) are the first and second derivatives of \(b(\theta_{i})\).
The exponential family just defined includes many useful special cases, among them the Normal, Binomial, Poisson, Exponential, Gamma and Inverse Gaussian distributions.
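As a quick illustration (not from the original notes), the Poisson distribution has pmf \(f(y)=\lambda^{y}e^{-\lambda}/y!\), which fits the form (2.1) with \(\theta=\log\lambda\), \(b(\theta)=e^{\theta}\), \(a(\phi)=1\) and \(c(y,\phi)=-\log y!\). The sketch below (assuming numpy and scipy are available) verifies this decomposition and the two moment identities numerically.

```python
# Sketch: check the natural exponential family form (2.1) and the moment
# identities for the Poisson distribution, where theta = log(lambda),
# b(theta) = exp(theta), a(phi) = 1 and c(y, phi) = -log(y!).
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln  # log(y!) = gammaln(y + 1)

lam = 3.5
theta = np.log(lam)  # canonical parameter
y = np.arange(0, 30)

# Density written in form (2.1): exp{(y*theta - b(theta))/a(phi) + c(y, phi)}
pmf_nef = np.exp(y * theta - np.exp(theta) - gammaln(y + 1))
assert np.allclose(pmf_nef, poisson.pmf(y, lam))

# E(Y) = b'(theta) = exp(theta) and var(Y) = b''(theta) a(phi) = exp(theta),
# i.e. both equal lambda for the Poisson.
assert np.isclose(np.exp(theta), poisson.mean(lam))
assert np.isclose(np.exp(theta), poisson.var(lam))
```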
2.2 The Exponential Family (more general definition)
Previously, we considered the natural exponential family of distributions, whose members can be expressed in the form:
\[f(x)=\exp\left(\frac{\theta x-b(\theta)}{a(\psi)}+c(x,\psi)\right)\] Distributions of this form have canonical link functions (which will be introduced in Sections 4.1 and 4.4). More formally, however, a member of the exponential family of distributions can be written as: \[f(x)=\exp\left(\psi^{t}T(x)-A(\psi)+q(x)\right)\] where \(T(x)\) is a sufficient statistic for the distribution. In practice, the easiest way to find the sufficient statistic is to find the minimal sufficient statistic: \(T(x)\) is a minimal sufficient statistic if: \[\frac{L(\mathbf{x}_{n}|\theta)}{L(\mathbf{y}_{n}|\theta)} \textrm{ is not a function of }\theta \Leftrightarrow T(\mathbf{x}_{n})=T(\mathbf{y}_{n})\]
The likelihood function \(L(\mathbf{x}_{n}|\theta)=L(x_{1},x_{2},\ldots,x_{n}|\theta)\) of an iid sample from any distribution \(f(x|\theta)\) is: \[L(\mathbf{x}_{n}|\theta)=L(x_{1},\ldots,x_{n}|\theta)=\prod_{i=1}^{n}f(x_{i}|\theta).\] So, for a single observation from the Normal distribution: \[f(x|\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi \sigma^{2}}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\] For \(\mathbf{x}_{n}=(x_{1},x_{2},\ldots,x_{n})\), an iid random sample from the same Normal distribution: \[L(\mathbf{x}_{n}|\mu,\sigma^{2}) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi \sigma^{2}}}\exp\left(-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}\right)= \left(\frac{1}{\sqrt{2\pi \sigma^{2}}}\right)^{n}\prod_{i=1}^{n}\exp\left(-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}\right)\] and \[L(\mathbf{y}_{n}|\mu,\sigma^{2}) = \left(\frac{1}{\sqrt{2\pi \sigma^{2}}}\right)^{n}\prod_{i=1}^{n}\exp\left(-\frac{(y_{i}-\mu)^{2}}{2\sigma^{2}}\right)\] Hence: \[\begin{eqnarray} \frac{L(\mathbf{x}_{n}|\theta)}{L(\mathbf{y}_{n}|\theta)} &=& \frac{\prod_{i=1}^{n}\exp\left(-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}\right)}{\prod_{i=1}^{n}\exp\left(-\frac{(y_{i}-\mu)^{2}}{2\sigma^{2}}\right)}\nonumber\\ &=& \prod_{i=1}^{n} \exp\left(-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}+\frac{(y_{i}-\mu)^{2}}{2\sigma^{2}}\right)\nonumber\\ &=& \exp\left(\sum_{i=1}^{n}-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}+\frac{(y_{i}-\mu)^{2}}{2\sigma^{2}}\right)\nonumber\\ &=& \exp\left(\sum_{i=1}^{n}-\frac{x_{i}^{2}}{2\sigma^{2}}+\frac{2\mu x_{i}}{2\sigma^{2}}-\frac{\mu^{2}}{2\sigma^{2}}+\frac{y_{i}^{2}}{2\sigma^{2}}-\frac{2\mu y_{i}}{2\sigma^{2}}+\frac{\mu^{2}}{2\sigma^{2}}\right)\nonumber\\ &=&\exp\left(\sum_{i=1}^{n}\frac{y_{i}^{2}-x_{i}^{2}}{2\sigma^{2}}+\frac{\mu(x_{i}-y_{i})}{\sigma^{2}}\right)\nonumber \end{eqnarray}\]So \(\frac{L(\mathbf{x}_{n}|\theta)}{L(\mathbf{y}_{n}|\theta)}\) is constant with respect to \(\mu\) and \(\sigma^2\) if and only if \(\sum_{i}x_{i}=\sum_{i}y_{i}\) and \(\sum_{i}x_{i}^{2}=\sum_{i}y_{i}^{2}\). Therefore the minimal sufficient statistic for a \(N(\mu, \sigma^{2})\), where both \(\mu\) and \(\sigma^2\) are unknown, is \(T(\mathbf{x}_{n}) = (\sum_{i}x_{i},\sum_{i}x_{i}^{2})\).
If \(\sigma^2\) were known, the term \(\sum_{i}(y_{i}^{2}-x_{i}^{2})/2\sigma^{2}\) would already be constant, so only \(\sum_{i}\mu(x_{i}-y_{i})/\sigma^{2}\) needs to be constant with respect to \(\mu\). Hence the minimal sufficient statistic for \(\mu\) when \(\sigma^2\) is known is \(\sum_{i}x_{i}\).
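The following numerical sketch (illustrative only; the two samples are chosen by hand so that their sums and sums of squares agree) shows the minimal sufficiency condition in action: two different samples with the same value of \(T(\mathbf{x}_{n})=(\sum_{i}x_{i},\sum_{i}x_{i}^{2})\) give a likelihood ratio that does not depend on \((\mu,\sigma^{2})\).

```python
# Sketch: two distinct Normal samples sharing the minimal sufficient
# statistic T = (sum(x), sum(x^2)) have a likelihood ratio that is
# constant in (mu, sigma^2).
import numpy as np
from scipy.stats import norm

x = np.array([0.0, 1.0, 5.0])   # sum = 6, sum of squares = 26
y = np.array([-1.0, 3.0, 4.0])  # sum = 6, sum of squares = 26

for mu in (-2.0, 0.0, 1.3, 7.0):
    for sigma in (0.5, 1.0, 2.0):
        log_ratio = (norm.logpdf(x, mu, sigma).sum()
                     - norm.logpdf(y, mu, sigma).sum())
        # Every log ratio is 0 (up to floating-point error): the ratio
        # does not depend on (mu, sigma^2), as minimal sufficiency requires.
        print(f"mu={mu:5.1f}, sigma={sigma:3.1f}: log ratio = {log_ratio:.12f}")
```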
2.3 Proof that the Beta distribution is a member of the Exponential Family
Consider the \(beta(a,b)\) distribution: \[L(\mathbf{x}_{n}|a,b)=\prod_{i=1}^{n}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x_{i}^{a-1}(1-x_{i})^{b-1}\] To find the (minimal) sufficient statistic, consider the likelihood ratio: \[\begin{eqnarray} \frac{L(\mathbf{x}_{n}|a, b)}{L(\mathbf{y}_{n}|a,b)} &=& \frac{\prod_{i=1}^{n}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x_{i}^{a-1}(1-x_{i})^{b-1}}{\prod_{i=1}^{n}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}y_{i}^{a-1}(1-y_{i})^{b-1}} \nonumber\\ &=& \frac{\left(\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\right)^n}{\left(\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\right)^n}\frac{\prod_{i=1}^{n}x_{i}^{a-1}(1-x_{i})^{b-1}}{\prod_{i=1}^{n}y_{i}^{a-1}(1-y_{i})^{b-1}}\nonumber \\ & = & \frac{\prod_{i=1}^{n}x_{i}^{a-1}(1-x_{i})^{b-1}}{\prod_{i=1}^{n}y_{i}^{a-1}(1-y_{i})^{b-1}}\nonumber \\ \log\left(\frac{L(\mathbf{x}_{n}|a, b)}{L(\mathbf{y}_{n}|a,b)}\right) &=& (a-1)\sum_{i=1}^{n}\log(x_{i})+(b-1)\sum_{i=1}^{n}\log(1-x_{i})\nonumber\\ & & -(a-1)\sum_{i=1}^{n}\log(y_{i})-(b-1)\sum_{i=1}^{n}\log(1-y_{i})\nonumber \end{eqnarray}\] This is constant with respect to \(a\) if and only if \(\sum_{i=1}^{n}\log(x_{i})=\sum_{i=1}^{n}\log(y_{i})\), and constant with respect to \(b\) if and only if \(\sum_{i=1}^{n}\log(1-x_{i})=\sum_{i=1}^{n}\log(1-y_{i})\). Thus \[T(\mathbf{x}_{n})=\left(\sum_{i=1}^{n}\log(x_{i}),\sum_{i=1}^{n}\log(1-x_{i})\right)\] If \(n=1\) (a single observation): \[T(x)=\left(\log(x),\log(1-x)\right)=\left(\begin{array}{c}\log(x)\\ \log(1-x) \end{array}\right)\] so: \[\begin{eqnarray} f(x|a,b) &=& \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}\nonumber\\ \log f(x|a,b)&=& \log \Gamma(a+b)- \log \Gamma(a) - \log \Gamma(b) +(a-1)\log x +(b-1)\log(1-x)\nonumber\\ f(x|a,b) &=& \exp\left\{a\log x +b\log(1-x) - \log x -\log(1-x) \right. \nonumber\\ & & \left. +\log \Gamma(a+b)- \log \Gamma(a) - \log \Gamma(b)\right\}\nonumber\\ &=& \exp\left\{ (a\ b)\left(\begin{array}{c}\log x\\ \log(1-x)\end{array} \right)-A(a\ b)+q(x)\right\}\nonumber \end{eqnarray}\]where \(\psi^{t}=(a\ b)\), \[A(\psi)=A(a\ b) =\log \Gamma(a) + \log \Gamma(b) - \log \Gamma(a+b)\] and \[q(x)=-\log x-\log(1-x)\] so that \[f(x|a,b) = \exp\left\{ \psi^{t}T(x)-A(\psi)+q(x)\right\}\textrm{ as required.}\]
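As a numerical sanity check of this decomposition (a sketch assuming scipy is available; it is not part of the proof), we can compare \(\psi^{t}T(x)-A(\psi)+q(x)\) against the beta log-density directly:

```python
# Sketch: verify f(x|a,b) = exp{psi^T T(x) - A(psi) + q(x)} for the beta
# distribution, with psi = (a, b), T(x) = (log x, log(1-x)),
# A(psi) = log Gamma(a) + log Gamma(b) - log Gamma(a+b) and
# q(x) = -log x - log(1-x).
import numpy as np
from scipy.stats import beta
from scipy.special import gammaln

a, b = 2.5, 4.0
x = np.linspace(0.01, 0.99, 99)

psi = np.array([a, b])
T = np.vstack([np.log(x), np.log(1.0 - x)])   # sufficient statistic T(x)
A = gammaln(a) + gammaln(b) - gammaln(a + b)  # A(psi)
q = -np.log(x) - np.log(1.0 - x)              # q(x)

assert np.allclose(psi @ T - A + q, beta.logpdf(x, a, b))
```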
2.4 Example: Normal distribution is a member of the Natural Exponential Family
The Normal distribution has density: \[f(y_{i})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left\{\frac{-1}{2}\frac{(y_{i}-\mu_{i})^{2}}{\sigma^{2}}\right\}.\] Recall that, to show that a distribution is a member of the natural exponential family, it must be possible to write it in the form: \[f(y_{i})=\exp\left\{\frac{y_{i}\theta_{i}-b(\theta_{i})}{a_{i}(\phi)}+c(y_{i},\phi)\right\}\] Rewriting the density: \[f(y_{i})=\exp\left\{\frac{-1}{2}\frac{(y_{i}-\mu_{i})^{2}}{\sigma^{2}}-\frac{1}{2}\log\left(2\pi\sigma^{2}\right) \right\}\] Expanding the square in the exponent gives \[(y_{i}-\mu_{i})^{2}=y_{i}^{2}-2y_{i}\mu_{i}+\mu_{i}^{2},\] so the coefficient of \(y_{i}\) in the exponent is \(\frac{\mu_{i}}{\sigma^{2}}\). This identifies \(\theta_{i}\) as \(\mu_{i}\) and \(\phi\) as \(\sigma^{2}\), with \(a_{i}(\phi)=\phi\), and the density becomes \[f(y_{i})=\exp\left\{\frac{y_{i}\mu_{i}-\frac{1}{2}\mu_{i}^{2}}{\sigma^{2}}-\frac{y_{i}^{2}}{2\sigma^{2}}-\frac{1}{2}\log\left(2\pi\sigma^{2}\right) \right\}.\]
\[f(y_{i})=\exp\left\{\frac{y_{i}\theta_{i}-b(\theta_{i})}{a_{i}(\phi)}+c(y_{i},\phi)\right\}\]
Comparing these equations and noting that \(\theta_{i}=\mu_{i}\), we have \[b(\theta_{i})=\frac{1}{2}\theta_{i}^{2},\qquad b'(\theta_{i})=\theta_{i}\qquad \textrm{and }b''(\theta_{i})=1,\] with \(c(y_{i},\phi)=-\frac{y_{i}^{2}}{2\phi}-\frac{1}{2}\log\left(2\pi\phi\right)\). The mean and the variance are then: \[\begin{eqnarray} \mathbf{E}(Y_{i})&=&b'(\theta_{i})=\theta_{i}=\mu_{i}\nonumber\\ var(Y_{i})&=&b''(\theta_{i})a_{i}(\phi)=\sigma^{2}.\nonumber \end{eqnarray}\]
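This identification can be checked numerically; the short sketch below (assuming scipy is available) confirms that the rearranged exponent reproduces the Normal log-density.

```python
# Sketch: confirm the natural exponential family decomposition of the
# Normal density, with theta = mu, phi = sigma^2, a(phi) = phi,
# b(theta) = theta^2/2 and c(y, phi) = -y^2/(2 phi) - log(2 pi phi)/2.
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.7, 2.3
y = np.linspace(-5.0, 5.0, 101)

theta, phi = mu, sigma2
b = theta**2 / 2.0
c = -y**2 / (2.0 * phi) - 0.5 * np.log(2.0 * np.pi * phi)

log_f = (y * theta - b) / phi + c
assert np.allclose(log_f, norm.logpdf(y, mu, np.sqrt(sigma2)))
```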
2.5 Distributions in the Exponential Family
Distributions that can be shown to be members of the Exponential Family of Distributions:
- Normal / Gaussian distribution
\[f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(\frac{-\left(x-\mu\right)^{2}}{2\sigma^{2}}\right)\]
- Exponential distribution
\[f(x)=\left\{\begin{array}{cr}\lambda\exp(-\lambda x),& x\geq0\\0,& x<0 \end{array}\right. \]
- Bernoulli distribution
\[f(x)=p^{x}(1-p)^{1-x},\ x\in \left\{0,1\right\} \]
- Binomial distribution
\[f(x)=\left(\begin{array}{c}n \\ x \end{array}\right)p^{x}(1-p)^{n-x},\ x=0,1,2,\ldots,n\]
- Poisson distribution
\[f(x)=\left\{\begin{array}{cr}\frac{\lambda^{x} \exp(-\lambda)}{x!},& \lambda>0,\ x=0,1,2,\ldots\\ 0, & \textrm{otherwise}\end{array}\right.\]
- Geometric distribution
\[f(x)=(1-p)^{x-1}p\ \textrm{for} \ x=1,2,3,\ldots \]
- Gamma distribution (see the sketch after this list)
\[f(x)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}\exp(-\beta x)\ \textrm{for } x, \alpha, \beta >0 \]
- \(\chi^{2}\) distribution
\[f(x)=\left\{\begin{array}{cr}\frac{1}{2^{k/2}\Gamma\left(k/2\right)}x^{\frac{k}{2}-1}\exp(\frac{-x}{2}),& x\geq 0\\ 0, & x<0\end{array}\right. \]
- Beta distribution
\[f(x)=\frac{\Gamma\left(\alpha+\beta\right)}{\Gamma\left(\alpha\right)\Gamma\left(\beta\right)}x^{\alpha-1}(1-x)^{\beta-1},\ 0<x<1\]
- Weibull (with known shape parameter (\(k\))) distribution
\[f(x)=\left\{\begin{array}{cr}\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\exp\left(-\left(\frac{x}{\lambda} \right)^{k}\right),& x\geq 0\\ 0, & x<0 \end{array}\right.\]
- Inverse Gaussian distribution
\[f(x)=\left[\frac{\lambda}{2\pi x^{3}}\right]^{\frac{1}{2}}\exp\left(\frac{-\lambda\left(x-\mu\right)^{2}}{2\mu^{2}x}\right),\ x>0,\ \mu>0\ \textrm{(mean)},\ \lambda>0\ \textrm{(shape)}\]
- Negative binomial distribution (known \(r\))
\[f(x)=\left(\begin{array}{c}x+r-1\\x\end{array}\right)p^{r}(1-p)^{x}\ \ \textrm{for}\ x=0,1,2,\ldots \]
- Multinomial distribution
\[f(x_{1},\ldots,x_{k})=\left\{\begin{array}{cr}\frac{n!}{x_{1}!\cdots x_{k}!}p_{1}^{x_{1}}\cdots p_{k}^{x_{k}},& \textrm{when }\sum_{i=1}^{k}x_{i}=n\\ 0, & \textrm{otherwise} \end{array}\right. \]
- Dirichlet distribution (complicated distribution used as a conjugate prior for the multinomial distribution in Bayesian statistics)
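Each membership claim in this list can be checked in the same way as the Beta and Normal examples above: rewrite the density as \(\exp\left(\psi^{t}T(x)-A(\psi)+q(x)\right)\) and compare it numerically with the standard form. As one worked instance (a sketch assuming scipy is available, not part of the original notes), the Gamma density has \(\psi=(\alpha-1,\ -\beta)\), \(T(x)=(\log x,\ x)\), \(A(\psi)=\log\Gamma(\alpha)-\alpha\log\beta\) and \(q(x)=0\):

```python
# Sketch: check the exponential family decomposition of the Gamma density,
# f(x) = exp{(alpha-1) log x - beta x - log Gamma(alpha) + alpha log beta}.
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

alpha, beta_rate = 3.0, 2.0  # shape alpha, rate beta
x = np.linspace(0.1, 10.0, 100)

psi = np.array([alpha - 1.0, -beta_rate])
T = np.vstack([np.log(x), x])                   # T(x) = (log x, x)
A = gammaln(alpha) - alpha * np.log(beta_rate)  # A(psi), with q(x) = 0

# scipy parameterises the Gamma by shape and scale = 1/rate
assert np.allclose(psi @ T - A, gamma.logpdf(x, alpha, scale=1.0 / beta_rate))
```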
Which of these can be shown to be in the Natural Exponential Family of distributions?