1 Introduction
Analysis of variance and covariance (ANOVA / ANCOVA) can be referred to as general linear modelling. The data comprise multiple measurements made on groups of subjects and are assumed to be observations from a normal distribution, usually with constant variance. In addition, the mean of the distribution is assumed to be a linear function of unknown parameters with known coefficients.
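In symbols, these assumptions might be written as follows. The notation is an assumption introduced here for illustration: \(Y_i\) is the \(i\)th observation, \(x_{ij}\) are the known coefficients, \(\beta_j\) are the unknown parameters and \(\sigma^2\) is the constant variance.

\[
Y_i \sim N(\mu_i, \sigma^2), \qquad \mu_i = \sum_{j=1}^{p} x_{ij}\beta_j, \qquad i = 1, \dots, n.
\]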
Generalised linear modelling (GLM) relaxes these assumptions:
- observations may come from a very general class of distributions;
- any twice-differentiable, one-to-one function of the mean (the link function) may be modelled as a linear function of unknown parameters.
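As a sketch of the second point, write \(\mu_i\) for the mean of the \(i\)th observation, \(g\) for the chosen function of the mean, \(x_{ij}\) for the known coefficients and \(\beta_j\) for the unknown parameters (all of this notation is assumed here for illustration). The linear structure then sits on the scale of \(g(\mu_i)\) rather than on \(\mu_i\) itself:

\[
g(\mu_i) = \sum_{j=1}^{p} x_{ij}\beta_j, \qquad i = 1, \dots, n.
\]

For count data, for example, a common choice is \(g(\mu) = \log \mu\), which keeps the fitted mean positive.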
Since the normal distribution belongs to the permitted class of distributions, choosing it together with the identity function shows that general linear modelling is a special case of GLM. Working with normally distributed data has the consequence that decision-making procedures based on the \(\chi^2\), t and F distributions are available. For non-normal GLM we do not have this luxury and must instead rely on a number of asymptotic results, the properties of which are currently only partly understood. Thus GLM provides a broader choice of modelling opportunities than general linear modelling, although the latter offers greater precision in decision making.
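The special-case relationship can be checked numerically. The following is a minimal sketch on simulated data (the data and the object names `fit_lm` and `fit_glm` are illustrative assumptions and are not taken from GLMpart1.R): the same straight-line model is fitted once with `lm()` and once with `glm()` using the normal (gaussian) family and the identity function.

```r
# Simulated data, for illustration only (not from GLMpart1.R)
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)

# General linear model fitted by least squares
fit_lm <- lm(y ~ x)

# The same model expressed as a GLM: normal family, identity link
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# Compare the two sets of coefficient estimates up to numerical tolerance
all.equal(coef(fit_lm), coef(fit_glm))
```

The two fits should give the same parameter estimates; what differs is the inferential output each function reports by default.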
Prior to the advent of GLM, data from non-normal distributions were transformed to approximate normality, for instance by taking the square root of Poisson data, so that the power of normal distribution theory could be utilised. However, these transformations are themselves only asymptotically valid, and the resulting model can appear very strange and artificial.
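A small simulation can illustrate why the square root was used for Poisson data. This is a sketch under assumed settings (the three means and the sample size are arbitrary choices): the variance of raw Poisson counts grows with the mean, whereas the variance of their square roots is roughly constant, and only approximately so when the mean is small.

```r
# Simulated Poisson counts at three arbitrary means (illustrative values)
set.seed(1)
means <- c(5, 20, 80)
n_sim <- 10000

# Variance of the raw counts at each mean
raw_var <- sapply(means, function(m) var(rpois(n_sim, m)))

# Variance of the square-root-transformed counts at each mean
sqrt_var <- sapply(means, function(m) var(sqrt(rpois(n_sim, m))))

raw_var   # increases with the mean
sqrt_var  # stays roughly constant, close to 1/4 for the larger means
```

The stabilisation is only approximate at the smallest mean, which reflects the asymptotic nature of the transformation noted above.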
Nelder and Wedderburn (1972) initially proposed the GLM methodology. Two good references for GLM are:
- Dobson - An Introduction to Generalized Linear Models;
- McCullagh and Nelder - Generalized Linear Models.
The following books cover the application of GLM (and many other statistical areas) from the S-Plus perspective, which is very similar to how one would do things in R:
- Crawley - Statistical Computing: An Introduction to Data Analysis using S-Plus;
- Venables and Ripley - Modern Applied Statistics with S.
The examples of model building and evaluation in R are given as code in GLMpart1.R, and the associated output is in the final section of this document for reference. You are advised to download the R file from Blackboard and run it in your own time. This version of the notes is written in R Markdown, which can be used for reproducible research; the code is embedded within the text of the document.