4 The GLM Model
Let \(y_{1},\ldots,y_{n}\) denote \(n\) independent observations on a response. We treat \(y_{i}\) as a realisation of a random variable \(Y_{i}\). In the general linear model we assume that \(Y_{i}\) has a normal distribution with mean \(\mu_{i}\) and variance \(\sigma^{2}\) \[Y_{i}\sim N(\mu_{i},\sigma^{2}),\] and we further assume that the expected value \(\mu_{i}\) is a linear function of \(p\) predictors that take values \(\mathbf{x}'_{i}=(x_{i1},\ldots,x_{ip})\) for the \(i^{th}\) case, so that \[\mu_{i}=\mathbf{x}_{i}'\mathbf{\beta},\] where \(\mathbf{\beta}\) is a vector of \(p\) unknown parameters. We will generalise this in two steps, dealing with the stochastic and systematic components of the model.
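As a concrete illustration, here is a minimal Python sketch (simulated data; all numbers are illustrative, not from these notes) that generates observations from this model and recovers \(\mathbf{\beta}\) by least squares, which is the maximum likelihood estimate under normal errors:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated design: n cases, an intercept plus p = 2 predictors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
sigma = 1.5

# General linear model: Y_i ~ N(mu_i, sigma^2) with mu_i = x_i' beta.
mu = X @ beta_true
y = mu + rng.normal(scale=sigma, size=n)

# Least squares estimate of beta (the MLE under normal errors).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to beta_true
```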
4.1 Link function
The second element of the generalisation is that instead of modelling the mean directly, as before, we will introduce a one-to-one, continuously differentiable transformation \(g(\mu_{i})\) and focus on \[\nu_{i}=g(\mu_{i}).\] The function \(g(\cdot)\) is known as the link function. Examples of commonly used link functions include the identity, log, reciprocal, logit and probit functions. We further assume that the transformed mean follows a linear model, so that \[\nu_{i}=\mathbf{x}_{i}'\mathbf{\beta}.\]
The quantity \(\nu_{i}\) is the linear predictor. Since the link (by construction) is one-to-one it is invertible, so we can then obtain: \[\mu_{i}=g^{-1}(\mathbf{x}_{i}'\mathbf{\beta}).\]
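For example, the logit link and its inverse can be written in a few lines of Python (a sketch; the names `g` and `g_inv` are just illustrative):

```python
import numpy as np

# The logit link, an example of a one-to-one, continuously
# differentiable link g, together with its inverse g^{-1}.
def g(mu):
    return np.log(mu / (1 - mu))      # nu = g(mu): the linear predictor scale

def g_inv(nu):
    return 1 / (1 + np.exp(-nu))      # mu = g^{-1}(nu): back on the mean scale

mu = np.array([0.1, 0.5, 0.9])
print(np.allclose(g_inv(g(mu)), mu))  # True: the link is invertible
```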
An important point is that we are not transforming the response \(y_{i}\) but rather its expected value \(\mu_{i}\). A model where \(\log(y_{i})\) is linear in \(x_{i}\) is therefore not the same as a generalised linear model where \(\log(\mu_{i})\) is linear in \(x_{i}\).
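A small simulation makes the distinction concrete: for a positive response, \(\mathbf{E}[\log Y]\) and \(\log \mathbf{E}[Y]\) differ (by Jensen's inequality), so the two models target different quantities. The numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# A positive response: log Y ~ N(1, 1), so E[log Y] = 1,
# but log E[Y] = 1 + 0.5 since the lognormal mean is exp(mu + sigma^2/2).
y = rng.lognormal(mean=1.0, sigma=1.0, size=200_000)

print(np.log(y).mean())   # ~ 1.0 : what a linear model for log(y) targets
print(np.log(y.mean()))   # ~ 1.5 : what a log-link GLM (log mu linear) targets
```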
When the link function makes the linear predictor \(\nu_{i}\) the same as the canonical parameter \(\theta_{i}\) we have what is known as a canonical link. The identity function is the canonical link for the Normal distribution. We will see that the logit is the canonical link for the binomial distribution and the log is the canonical link for the Poisson distribution. The canonical link therefore leads to some natural pairings of types of data with link functions. These pairings do not preclude the use of other link functions, but they have the advantage that a minimal sufficient statistic for \(\beta\) exists, so that all the information about \(\beta\) is contained in a function of the data of the same dimensionality as \(\beta\).
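The sufficiency claim can be checked numerically: under a canonical link the likelihood depends on the data only through \(X'y\). In the sketch below (which assumes the statsmodels library, not part of these notes), two Poisson responses with the same value of \(X'y\) yield identical maximum likelihood estimates under the canonical log link:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Design: intercept plus one binary predictor, so X'y consists of
# the total count and the count in the x = 1 group.
x = np.repeat([0, 1], 50)
X = sm.add_constant(x)
y1 = rng.poisson(np.exp(0.5 + 1.0 * x))

# Permute responses *within* each x group: X'y is unchanged.
y2 = y1.copy()
y2[x == 0] = rng.permutation(y2[x == 0])
y2[x == 1] = rng.permutation(y2[x == 1])

fit1 = sm.GLM(y1, X, family=sm.families.Poisson()).fit()
fit2 = sm.GLM(y2, X, family=sm.families.Poisson()).fit()
print(np.allclose(fit1.params, fit2.params))  # True: only X'y matters
```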
4.2 Poisson Errors and Log Link
We now apply the general theory to the Poisson case.
4.2.1 The Poisson Distribution
A Poisson random variable has probability mass function \[f_{i}(y_{i})=\frac{\exp(-\mu_{i})\mu_{i}^{y_{i}}}{y_{i}!}\] for \(y_{i}=0,1,2,\ldots\). The mean and the variance of \(Y_{i}\) both equal \(\mu_{i}\). Taking logs, \[\log f_{i}(y_{i})=y_{i}\log(\mu_{i})-\mu_{i}-\log(y_{i}!).\] It is immediately apparent that the canonical parameter is \(\theta_{i}=\log(\mu_{i})\), so the canonical link is the natural log. Solving for \(\mu_{i}\), the inverse link is \(\mu_{i}=\exp(\theta_{i})\), and the term \(-\mu_{i}\) in the log-likelihood corresponds to \(-b(\theta_{i})\), so \(b(\theta_{i})=\exp(\theta_{i})\). The last term is a function of the data \(y_{i}\) but not of the parameter, so \(c(y_{i},\phi)=-\log(y_{i}!)\). It remains to note that \(a_{i}(\phi)=\phi\) with \(\phi=1\).
To confirm that the mean and the variance are as expected: \[b'(\theta_{i})=\exp(\theta_{i})=\mu_{i}\] and \[v_{i}=a_{i}(\phi)b''(\theta_{i})=\exp(\theta_{i})=\mu_{i}.\]
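These identities are easy to verify numerically; the following sketch (assuming scipy, not part of these notes) checks the decomposition of the Poisson log-pmf and the mean-variance equality by simulation:

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

# Exponential-family decomposition of the Poisson log-pmf:
# log f(y) = y*theta - b(theta) + c(y, phi), with theta = log(mu),
# b(theta) = exp(theta), c(y, phi) = -log(y!), a(phi) = phi = 1.
mu = 3.7
theta = np.log(mu)
y = np.arange(0, 25)

lhs = poisson.logpmf(y, mu)
rhs = y * theta - np.exp(theta) - gammaln(y + 1)  # gammaln(y + 1) = log(y!)
print(np.allclose(lhs, rhs))                      # True

# b'(theta) = b''(theta) = exp(theta) = mu: mean and variance coincide.
samples = poisson.rvs(mu, size=200_000, random_state=0)
print(samples.mean(), samples.var())              # both close to 3.7
```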
4.3 Binomial Errors and Logit Link
We now apply the theory of generalised linear models to binary data, in particular to logistic regression models.
4.3.1 The Binomial Distribution
Recall that the probability mass function (pmf) of a Binomial distribution is \[f_{i}(y_{i})=\binom{n_{i}}{y_{i}}\tau_{i}^{y_{i}}(1-\tau_{i})^{n_{i}-y_{i}}.\] Taking logs we see that \[\log f_{i}(y_{i})=y_{i}\log \tau_{i} + (n_{i}-y_{i})\log (1-\tau_{i})+\log\binom{n_{i}}{y_{i}}.\] Collecting the \(y_{i}\) terms we see that \[\log f_{i}(y_{i})=y_{i}\log \left(\frac{\tau_{i}}{1-\tau_{i}}\right) + n_{i}\log (1-\tau_{i})+\log\binom{n_{i}}{y_{i}}.\] Comparing to Equation (2.1), \[f(y_{i})=\exp\left\{\frac{y_{i}\theta_{i}-b(\theta_{i})}{a_{i}(\phi)}+c(y_{i},\phi)\right\},\] we can see that \(a_{i}(\phi)=1\) and \(\theta_{i}=\log \left(\frac{\tau_{i}}{1-\tau_{i}}\right)\).
Solving for \(\tau_{i}\): \[\begin{eqnarray}\theta_{i}&=&\log \left(\frac{\tau_{i}}{1-\tau_{i}}\right)\nonumber\\ \exp(\theta_{i})&=&\frac{\tau_{i}}{1-\tau_{i}}\nonumber\\ (1-\tau_{i})\exp(\theta_{i})&=&\tau_{i}\nonumber\\ \exp(\theta_{i})&=&\tau_{i}(1+\exp(\theta_{i}))\nonumber\\ \frac{\exp(\theta_{i})}{1+\exp(\theta_{i})}&=&\tau_{i}\nonumber \end{eqnarray}\] so that \[1-\tau_{i}=\frac{1+\exp(\theta_{i})}{1+\exp(\theta_{i})}-\frac{\exp(\theta_{i})}{1+\exp(\theta_{i})}=\frac{1}{1+\exp(\theta_{i})}.\] Therefore \[\log(1-\tau_{i})=-\log(1+\exp(\theta_{i}))\] and, since the term \(n_{i}\log(1-\tau_{i})\) corresponds to \(-b(\theta_{i})\), \[b(\theta_{i})=n_{i}\log(1+\exp(\theta_{i})).\]
The remaining term in the pmf is a function of \(y_{i}\) but not of \(\tau_{i}\), so \[c(y_{i},\phi)=\log\binom{n_{i}}{y_{i}}.\] Previously we noted that \(a_{i}(\phi)=1\); more precisely, \(a_{i}(\phi)=\phi\) with \(\phi=1\).
We now verify the mean and the variance. Differentiating \(b(\theta_{i})\) with respect to \(\theta_{i}\) we find that \[\mu_{i}=b'(\theta_{i})=n_{i}\frac{\exp(\theta_{i})}{1+\exp(\theta_{i})}=n_{i}\tau_{i},\] as expected.
\[v_{i}=a_{i}(\phi)b''(\theta_{i})=n_{i}\frac{\exp(\theta_{i})}{(1+\exp(\theta_{i}))^{2}}=n_{i}\tau_{i}(1-\tau_{i}),\] again agreeing with our knowledge of basic statistics.
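As with the Poisson case, the decomposition can be checked numerically; the sketch below (again assuming scipy) verifies the binomial log-pmf identity and the moment formulas:

```python
import numpy as np
from scipy.stats import binom
from scipy.special import gammaln

# Exponential-family decomposition of the binomial log-pmf:
# log f(y) = y*theta - b(theta) + c(y, phi), with theta = logit(tau),
# b(theta) = n*log(1 + exp(theta)), c(y, phi) = log C(n, y), phi = 1.
n, tau = 10, 0.3
theta = np.log(tau / (1 - tau))
y = np.arange(0, n + 1)

log_binom_coef = gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
lhs = binom.logpmf(y, n, tau)
rhs = y * theta - n * np.log1p(np.exp(theta)) + log_binom_coef
print(np.allclose(lhs, rhs))          # True

# b'(theta) = n*tau and a(phi) b''(theta) = n*tau*(1 - tau).
print(n * tau, n * tau * (1 - tau))   # 3.0 2.1
```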
4.4 Canonical Link functions
Recall the exponential family form (2.1): \[f(y_{i})=\exp\left\{\frac{y_{i}\theta_{i}-b(\theta_{i})}{a_{i}(\phi)}+c(y_{i},\phi)\right\}.\] In this form, \(\theta\) is known as the canonical parameter and \(\phi\) as the dispersion parameter. If \(\theta=g(\mu)\) for some function \(g\) of the mean \(\mu\), then \(g(\mu)\) is known as the canonical link function.
| Distribution | Link Name | Link Function | Mean Function |
|---|---|---|---|
| Normal | Identity | \(X\mathbf{\beta}=\mu\) | \(\mu=X\mathbf{\beta}\) |
| Exponential | Inverse | \(X\mathbf{\beta}=\mu^{-1}\) | \(\mu=(X\mathbf{\beta})^{-1}\) |
| Gamma | Inverse | \(X\mathbf{\beta}=\mu^{-1}\) | \(\mu=(X\mathbf{\beta})^{-1}\) |
| Inverse Gaussian | Inverse Squared | \(X\mathbf{\beta}=\mu^{-2}\) | \(\mu=(X\mathbf{\beta})^{-1/2}\) |
| Poisson | Log | \(X\mathbf{\beta}=\log(\mu)\) | \(\mu=\exp(X\mathbf{\beta})\) |
| Binomial | Logit | \(X\mathbf{\beta}=\log\left(\frac{\mu}{1-\mu}\right)\) | \(\mu=\frac{\exp(X\mathbf{\beta})}{1+\exp(X\mathbf{\beta})}=\frac{1}{1+\exp(-X\mathbf{\beta})}\) |
| Multinomial | Logit (per category, against baseline \(K\)) | \(X\mathbf{\beta}_{j}=\log\left(\frac{\mu_{j}}{\mu_{K}}\right)\) | \(\mu_{j}=\frac{\exp(X\mathbf{\beta}_{j})}{1+\sum_{k=1}^{K-1}\exp(X\mathbf{\beta}_{k})}\) |
Knowing the expression for the canonical parameter \(\theta\) in terms of the mean \(\mu\) therefore tells you the canonical link function.
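In most software the canonical link is the default once a family is chosen. The sketch below (assuming the statsmodels library; the data are simulated and the numbers illustrative) fits Poisson and binomial GLMs with their canonical log and logit links:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

n = 500
X = sm.add_constant(rng.normal(size=n))
eta = X @ np.array([0.3, 0.8])        # the linear predictor X beta

# Poisson with its canonical log link: mu = exp(X beta).
y_pois = rng.poisson(np.exp(eta))
print(sm.GLM(y_pois, X, family=sm.families.Poisson()).fit().params)

# Binomial (binary) with its canonical logit link: mu = 1/(1 + exp(-X beta)).
y_bin = rng.binomial(1, 1 / (1 + np.exp(-eta)))
print(sm.GLM(y_bin, X, family=sm.families.Binomial()).fit().params)
```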
4.4.1 Example: Gamma distribution
\[\begin{eqnarray} f(y)&=&\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{\alpha-1}\exp(-\beta y)\ \textrm{for } y, \alpha, \beta >0 \nonumber\\ &=&\exp\left(-y\beta+(\alpha-1)\log y+\alpha\log\beta-\log\left(\Gamma(\alpha)\right)\right)\nonumber\\ &=&\exp\left(\frac{y\frac{-\beta}{\alpha}+\log\beta}{\frac{1}{\alpha}}+(\alpha-1)\log y-\log\left(\Gamma\left(\alpha\right)\right)\right)\nonumber\\ &=&\exp\left(\frac{y\frac{\beta}{\alpha}-\log\beta}{\frac{-1}{\alpha}}+(\alpha-1)\log y-\log\left(\Gamma\left(\alpha\right)\right)\right)\nonumber \end{eqnarray}\] Now if \(\theta=\frac{\beta}{\alpha}\) and \(\phi=\frac{-1}{\alpha}\), so that \(\alpha=\frac{-1}{\phi}\) and \(\beta=\theta\alpha=\frac{-\theta}{\phi}\), then: \[\begin{eqnarray} f(y)&=&\exp\left(\frac{y\theta-\log\left(-\frac{\theta}{\phi}\right)}{\phi}+\left(-\frac{1}{\phi}-1\right)\log y-\log\left(\Gamma\left(-\frac{1}{\phi}\right)\right)\right)\nonumber\\ &=&\exp\left(\frac{y\theta-\log\theta}{\phi}+\frac{\log(-\phi)}{\phi}-\left(\frac{1}{\phi}+1\right)\log y-\log\left(\Gamma\left(-\frac{1}{\phi}\right)\right)\right)\nonumber \end{eqnarray}\] Now recall that \(\mathbf{E}(Y_{i})=\mu_{i}=b'(\theta_{i})\), so comparing with the expression above: \[b(\theta)=\log(\theta)\] so that \[b'(\theta)=\frac{1}{\theta}=\mu=\frac{\alpha}{\beta}.\] The variance is \(a(\phi)b''(\theta)=\frac{-\phi}{\theta^{2}}=\left(\frac{1}{\alpha}\right)\left(\frac{\alpha^{2}}{\beta^{2}}\right)=\frac{\alpha}{\beta^{2}}\). Substituting \(\theta=\frac{1}{\mu}\) parameterises the distribution in terms of its expected value: \[\begin{eqnarray} f(y)&=&\exp\left(\frac{y\frac{1}{\mu}-\log\frac{1}{\mu}}{\phi}+\frac{\log(-\phi)}{\phi}-\left(\frac{1}{\phi}+1\right)\log y-\log\left(\Gamma\left(-\frac{1}{\phi}\right)\right)\right)\nonumber\\ &=&\exp\left(\frac{y\mu^{-1}-\log\mu^{-1}}{\phi}+\frac{\log(-\phi)}{\phi}-\left(\frac{1}{\phi}+1\right)\log y-\log\left(\Gamma\left(-\frac{1}{\phi}\right)\right)\right)\nonumber\\ &=&\exp\left(\frac{y\mu^{-1}+\log\mu}{\phi}+\frac{\log(-\phi)}{\phi}-\left(\frac{1}{\phi}+1\right)\log y-\log\left(\Gamma\left(-\frac{1}{\phi}\right)\right)\right)\nonumber \end{eqnarray}\] indicating that the canonical link function is: \[g(\mu)=\frac{1}{\mu}=\mu^{-1}.\]
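The sign conventions above (a negative dispersion \(\phi=-1/\alpha\)) can be confirmed numerically; the sketch below (assuming scipy) checks the derived exponential-family form against the Gamma log-density:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

# Check the exponential-family form derived above for the Gamma(alpha, beta)
# density (beta a rate parameter), with theta = beta/alpha and phi = -1/alpha.
alpha, beta = 2.5, 1.7
theta, phi = beta / alpha, -1 / alpha
y = np.linspace(0.1, 10, 50)

lhs = gamma.logpdf(y, a=alpha, scale=1 / beta)   # scipy's scale = 1/rate
rhs = ((y * theta - np.log(theta)) / phi + np.log(-phi) / phi
       - (1 / phi + 1) * np.log(y) - gammaln(-1 / phi))
print(np.allclose(lhs, rhs))                     # True

# b'(theta) = 1/theta = alpha/beta = mu, and a(phi) b''(theta) = alpha/beta^2.
print(1 / theta, alpha / beta, -phi / theta**2, alpha / beta**2)
```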
4.4.2 Further Examples
Use the same procedure to show that the canonical link functions for the Poisson and Binomial distributions are those listed in the table above.