Introduction to Mixed Modelling: Beyond Regression and Analysis of Variance
Illustrates how the capabilities of regression analysis can be combined with those of ANOVA by the specification of a mixed model. Presents the application of mixed model analysis to a wide range of situations and explains how to obtain and interpret Best Linear Unbiased Predictors (BLUPs). Features a supplementary website containing solutions to exercises, further examples, and links to the computer software systems GenStat and R.
About the Author: Nicholas W. Galwey. A respected consultant and researcher in the pharmaceutical industry with extensive teaching experience.
Table of contents: Preface. The need for more than one random-effect term when fitting a regression line. The need for more than one random-effect term in a designed experiment. Estimation of the variances of random-effect terms. Interval estimates for fixed-effect terms in mixed models. Estimation of random effects in mixed models: best linear unbiased predictors. More advanced mixed models for more elaborate data sets. Two case studies. The use of mixed models for the analysis of unbalanced experimental designs. Beyond mixed modelling. Why is the criterion for fitting mixed models called residual maximum likelihood?
Features: Provides a straightforward introduction to mixed modelling techniques.
Suitable for a wide audience of students and practitioners, from statistics, bioinformatics, medicine, industry and economics.
All models contain random adjustments to the intercept for participants and items ($u_{0p}$ and $u_{0i}$). Model 1, presented in Example 1 above, is similar to model 3, but model 1 also contains the interaction between morph and priming among the fixed effects.
The random effects are still participant-specific and item-specific adjustments to the intercept. The examples above have all dealt with one or more factors that each had only two levels. If a factor has more than two levels, a test for a significant effect of that factor is usually followed by an examination of which levels differ from each other. As in standard ANOVA procedures, this examination can be done with planned comparisons (also called contrasts) or omnibus comparisons (also called post hoc tests).
To illustrate this, the same data set is used as before, but the two factors priming and morph are now combined into one factor, form. Note that the factor form is constructed for didactic purposes only; this analysis will not clearly distinguish between primed and unprimed words, thereby obscuring one of the more important influences on RT. This model has one theoretically relevant predictor, the factor form with three levels.
The model contains an intercept and random effects for participant-specific and item-specific adjustments to the intercept. Two planned comparisons were run, one comparing the levels inflection and derivation, and one comparing the average of these two levels with the stem form. Both planned comparisons were significant: derivations versus inflections, and stems versus the mean of inflections and derivations.
Figure: Syntax for mixed model 4 with specification of post hoc tests and contrasts, line numbers added.
The djmixed syntax for planned comparisons is shown in the figure. As with standard ANOVA contrasts, there should be as many coefficients per contrast as there are levels of the variable. The coefficients in each contrast should sum to zero. The number of contrasts should equal the number of levels minus one, with individual contrasts separated by pipe symbols.
A knowledge of the ordering of the levels of the variable is necessary to design and interpret contrasts. SPSS orders levels either numerically or alphabetically, depending on the values of the variable. It is advisable to use a numerical coding with value labels to avoid surprises. Here, numerical values are used, and the ordering is stem, derivation, and inflection.
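The contrast rules above can be checked mechanically before a model is run. The following is a minimal sketch in Python, not djmixed or SPSS syntax: the coefficient values and the helper names `check_contrasts` and `contrast_estimate` are assumptions made for illustration, and only the level ordering (stem, derivation, inflection) and the two comparisons are taken from the text.

```python
from fractions import Fraction

# Level order as stated in the text: stem, derivation, inflection.
LEVELS = ["stem", "derivation", "inflection"]

# Illustrative coefficients for the two planned comparisons (assumed values,
# not copied from the djmixed figure). Fractions keep the sums exact.
CONTRASTS = {
    "derivation vs inflection": [0, 1, -1],
    "stem vs mean of derivation and inflection": [1, Fraction(-1, 2), Fraction(-1, 2)],
}


def check_contrasts(contrasts, levels):
    """Return a list of rule violations; an empty list means all rules hold."""
    problems = []
    # Rule: number of contrasts equals number of levels minus one.
    if len(contrasts) != len(levels) - 1:
        problems.append(f"expected {len(levels) - 1} contrasts, got {len(contrasts)}")
    for name, coefs in contrasts.items():
        # Rule: one coefficient per level.
        if len(coefs) != len(levels):
            problems.append(f"{name}: needs {len(levels)} coefficients")
        # Rule: coefficients sum to zero.
        elif sum(coefs) != 0:
            problems.append(f"{name}: coefficients sum to {sum(coefs)}, not 0")
    return problems


def contrast_estimate(coefs, level_means):
    """Weighted sum of level means for one contrast."""
    return sum(c * m for c, m in zip(coefs, level_means))


print(check_contrasts(CONTRASTS, LEVELS))
```

Because the coefficients are tied to the level ordering, swapping two levels in `LEVELS` without reordering the coefficients would silently change what each contrast tests, which is the reason the ordering has to be known in advance.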
Note that while the post hoc output shows the variable labels, the output of the planned comparisons does not include this convenience. Post hoc tests were requested in line 7; the option is followed by the name of the variable for which post hoc tests have to be computed.
Because post hoc tests involve multiple comparisons, the familywise alpha has to be controlled. Note that all comparisons are based on expected means, not observed means. This implies that a comparison based on a model that does not fit the data well may result in unreliable post hoc comparisons.
In both regression and ANOVA, the distribution of the residuals can inform us about the overall fit of the model and about the specifics of the fit. The djmixed package can produce four informative plots: a histogram of residuals, a Q-Q plot of observed versus expected residuals, a detrended Q-Q plot, and a plot of normalized residuals by predicted values.
We will look only at the first plot here.
Figure: Histogram of residuals with normal curve superimposed for models with the dependent variable reaction time (RT; left panel) and log-transformed RT (log RT; right panel).
On the one hand, the aim of psycholinguistic studies is usually not to provide a perfectly fitting model of the data but to determine whether certain factors make a significant contribution to the prediction of RT or not.
Under that view, a moderate to small departure from normality should not overly worry us, although it should be reported. However, ill-fitting residuals can be a sign of a model that does not capture the data very well.
A mixed model analysis of log-transformed RTs resulted in the same significance levels as in model 1 but improved the distribution of residuals (right panel of the residual histogram figure). Whether the additional fit gained from log-transformation is important should be decided on a case-by-case basis.
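To give a feel for why a log transform can improve the shape of a right-skewed distribution, here is a small Python sketch. It uses simulated values, not the article's data, and it looks at the raw values rather than at model residuals; the lognormal parameters and the helper names `skewness` and `simulate_rts` are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def simulate_rts(n, seed=1):
    """Hypothetical right-skewed RTs in ms (lognormal); not the article's data."""
    rng = random.Random(seed)
    return [rng.lognormvariate(6.4, 0.3) for _ in range(n)]


rts = simulate_rts(2000)
log_rts = [math.log(rt) for rt in rts]

# A symmetric distribution has skewness near zero; a long right tail gives a positive value.
print(f"skewness of raw RT: {skewness(rts):.2f}")
print(f"skewness of log RT: {skewness(log_rts):.2f}")
```

With real data the improvement would be judged on the residuals of the fitted model, for example with the histogram discussed above, so this sketch only shows the direction of the effect.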
Log transforms are not frequently used in the psychology of language literature, but they are common in neighboring fields such as cognitive modeling and corpus research. One proposed refinement is to subtract an estimate of the minimum RT before transforming, effectively moving the zero point to the right (Rouder et al.).
Figure: Plot of observed primed and unprimed reaction times for each derivational pair, with negative priming (dashed lines) and strong positive priming (solid lines) highlighted.
In almost every case, the stepwise analysis presented above should be sufficient to draw psycholinguistically valid conclusions from the data.
The mixed model can be further extended in two ways, which will be briefly outlined here. The first extension concerns covariates: On some occasions, there are known covariates that may help explain the differences between items (or, more rarely, participants). If there are theoretical reasons to expect that, for example, log frequency will co-determine RTs, this predictor should be included in the model. The mixed models described here are very similar to normal regression models, and an effect of frequency can be added as a predictor in a straightforward way.
The model formula and the djmixed syntax are included in the online Appendix. The second possible extension concerns the way differences between items and participants influence the expected outcomes. In the models so far, the random effect of items and participants has been added to the intercept to indicate whether an item is generally easy or hard. It is possible that the effect of a predictor say, priming also differs between items or between participants. One way to account for that is to include a second random effect for items, which modifies the strength of the effect of priming.
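The difference between an intercept adjustment and a slope adjustment can be made concrete with a short Python sketch. The numbers (intercept, priming effect, standard deviations) and the function name `expected_rt` are hypothetical, and the two item effects are drawn independently here purely for simplicity.

```python
import random


def expected_rt(primed, u0_item, u1_item=0.0, intercept=650.0, priming_effect=-40.0):
    """Expected RT in ms for one item; all numeric defaults are hypothetical.

    u0_item shifts the intercept for this item (generally easy or hard item).
    u1_item shifts the strength of the priming effect for this item.
    primed is 1 for the primed condition and 0 for the unprimed condition.
    """
    return intercept + u0_item + (priming_effect + u1_item) * primed


rng = random.Random(7)
for item in range(3):
    u0 = rng.gauss(0, 30)  # item-specific adjustment to the intercept
    u1 = rng.gauss(0, 15)  # item-specific adjustment to the priming slope
    unprimed = expected_rt(0, u0, u1)
    effect = expected_rt(1, u0, u1) - unprimed
    print(f"item {item}: unprimed = {unprimed:.0f} ms, priming effect = {effect:.0f} ms")
```

With `u1_item` left at zero, every item shows the same priming effect and only the overall speed differs, which corresponds to the intercept-only models used so far.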
A number of statistical complications arise with this type of analysis, and the online Appendix goes into some detail on how to work around these. However, there are further reasons why this type of analysis may not be applicable to most psycholinguistic experiments. First of all, the exact structure of the random effects is rarely a psycholinguistic goal in itself. A model with a random effect on the slope of priming does not give a better theoretical explanation; it merely adds a device for capturing unexplained variance.
A well-chosen covariate is often a better option, since it does add theoretical strength to the model. Second, extracting three or more random effects from the data is demanding. Psycholinguistic data tend to have one observation per participant-item combination, and the numbers of items and participants tested are sizable, but not in the hundreds. The two random effects related to items ($u_{0i}$ and $u_{1i}$) are most often correlated, which makes it harder to arrive at estimates for each of them, necessitating a large number of items and participants.
Third, the extended analysis assumes that the item-related random effects modifying the intercept and the slope of priming are independent influences, which may be correlated. As has been argued by Rouder et al. In sum, there seem to be few compelling reasons to add random effects that modify slopes. For psycholinguistic data, models like those presented above already provide a better description of the data than do classical ANOVA models, and it may well turn out that the extra complications caused by adding random effects modifying slope are rarely necessary in practice.
Authors should run the usual regression diagnostics to determine whether the data were fitted reasonably well or whether further statistical explorations are necessary. This article has presented a simple framework for addressing the issue of random participants and random items in language experiments. The djmixed extension to SPSS should put this mixed models analysis within the reach of every psycholinguist. Mixed models should be used only when the data set is large enough and after outliers and wrongly coded observations have been removed.
Conceptually, a separate regression line is estimated for each level of participants and also for each level of items, so an extreme outlier can have a large influence if the number of observations per participant or the number of observations per item is low. As compared with an ANOVA, the restrictions on the data imposed by mixed modeling are very relaxed, since missing data and unequal cell sizes are not a problem and homoscedasticity is not an a priori requirement either. Mixed models require equality of residual variance; that is, the predictors should capture not only the difference in average RTs, but also any difference in variability of RTs.
For most data sets, this seems a tenable assumption (but see Rouder et al.). Mixed models are a relatively recent extension to the statistical canon, and although the pace of development has slowed down, further improvements to these models and their evaluation will most certainly be found. However, the methods of model evaluation that are suggested here (F-tests and LRT) have shown their merits outside of mixed modeling, and they are implemented in major statistical packages such as SPSS and SAS and are generally recommended in various fields of science.
Faraway suggested using a parametric bootstrap. It is shown that the LRT can be too liberal when testing for the inclusion of a factor with 15 levels in a data set with only 60 observations. In practical psycholinguistic applications, the number of levels of a fixed factor hardly ever exceeds five, so if the general recommendations for sizable numbers of participants and items (and, therefore, observations) are followed, this criticism should not overly concern us.
When models for the inclusion of a fixed effect are compared, the multiple comparison tests are similar to the F-test that was used here in conjunction with the LRT. The z tests can be very conservative, and Raudenbush and Bryk suggested using a t-distribution instead. The issue was side-stepped here by using F-tests (technically, Type 3 F-tests of fixed effects) instead. To evaluate the significance of fixed effects, it has been suggested to eschew F-tests (Baayen; Bates) and use MCMC (Monte Carlo Markov chains), a simulation technique based on Bayesian principles, to approximate the significance of each fixed effect on an analysis-by-analysis basis.
It also requires one to work within a Bayesian inference framework, which has various advantages and disadvantages that fall outside of the scope of this article. In a discussion of which test to use, Faraway recommended the combined use of the F-test and the LRT. Of course, statistics is a scientific discipline just like psycholinguistics, and dissenting opinions and alternative approaches are to be expected. Mixed models, and hierarchical linear models as their special case, are a mature technique, and they have been implemented in the major statistical packages SAS, R, and SPSS.
Mixed models are easy to construct in SPSS and SAS, and the mixed model results are straightforward to understand when the focus remains on the fixed effects.
I would like to thank Nicolas Dumay and Lea Hald for constructive comments on an earlier draft. Thanks to my former colleagues at the University of Kent and especially Joachim Stoeber for their continuing support.
This study has been shaped by a number of workshops on mixed modeling that I have given; thanks to all participants for clarifying to me what makes mixed modeling so hard to understand.
Twice random, once mixed: Applying mixed models to simultaneously analyze random effects of language and participants.
Article first online: 22 August.
Table 1: Dummy coding for a predictor with four levels and corresponding expected reaction times, E(Y).
In other words, the item-specific value $u_i$ adjusts the expected RT to reflect the relative speed of item i. The conceptualization of u as a vector of item-specific adjustments to the modeled RTs has been shown to greatly aid understanding of mixed models. Quite intuitively, all adjustments $u_i$ are assumed to center around a mean according to a normal distribution with a certain variance, as shown in the accompanying figure.
In the left panel, each line represents the effect of priming on one hypothetical item. The priming effect is constant, but some items are inherently faster or slower than others, which leads to a distribution of lines. In the right panel, the distribution of item adjustments around the overall item mean is shown, which is close to normal. This means that the items within one condition are modeled as necessarily similar to each other, with the majority of the items having an adjustment that is close to zero. Outlier items are possible, but they should be less likely the further they are removed from zero.
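The assumption that item adjustments cluster around zero can be illustrated with a few lines of Python. This is a sketch with simulated adjustments: the standard deviation `ITEM_SD` and the number of items are hypothetical, and a large number of items is used only to make the summary statistics stable.

```python
import random
import statistics

ITEM_SD = 30.0  # hypothetical SD of the item adjustments, in ms

rng = random.Random(42)
# One adjustment u_i per hypothetical item, drawn from a normal distribution centered on zero.
adjustments = [rng.gauss(0.0, ITEM_SD) for _ in range(1000)]

mean_u = statistics.fmean(adjustments)
sd_u = statistics.pstdev(adjustments)
# Share of items whose adjustment lies within one SD of zero.
share_within_1sd = sum(abs(u) <= ITEM_SD for u in adjustments) / len(adjustments)

print(f"mean adjustment: {mean_u:.1f} ms")
print(f"SD of adjustments: {sd_u:.1f} ms")
print(f"share within one SD of zero: {share_within_1sd:.2f}")
```

Under a normal distribution roughly two thirds of the adjustments fall within one standard deviation of zero, which is the sense in which most items are modeled as similar and outlier items as increasingly unlikely.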
The procedure and implications of using a mixed model analysis will be demonstrated with an example data set containing priming data obtained from 34 participants. The two factors of interest were priming (the critical word was the first word of a pair, priming absent, or the second word of a pair, priming present) and morph (the critical word was part of an inflectional or a derivational pair). Other properties of the items that might influence the RT were matched.
All stimulus words were part of 31 triplets formed by a base word, one of its inflections, and one of its derivations. Each participant first saw either a derivation or an inflection, followed by the matching base word. This is a real data set, in which I artificially strengthened the effect of morph for didactic purposes.
Table 2: Data for Example 1: average reaction times in milliseconds and standard deviations.
A mixed model was fitted to the data that contained the fixed effects of priming, morph, and their interaction and two random effects accounting for participant-specific and item-specific adjustments to the intercept.
Table 3: Fixed and random effects for model 1, example 1.
Most parts should be self-explanatory. Currently, covariates (interval-level predictors) cannot be included in djmixed. The NAME statement (line 7) is used to give the model a name. Names should be enclosed in quotes but can otherwise be freely chosen.
The plain SPSS syntax for all commands used here is listed in the online Appendix and, for this model, is shown in the accompanying figure. When DJMIXED is not used, the output cannot be directed to two windows, and neither the model summary nor the model comparison that we will encounter later is available.
The second model fitted is a so-called null model, which is an intercept-only model without any predictors. In a mixed model analysis, the null model should contain the random factors as described above: adjustments to the intercept for individual items and individual participants. There were no fixed effects of interest.
Table 4: Random effects for model 2, the null model.
The third model extends the null model with the fixed factors priming and morph, but it does not include their interaction.
No additional random effects are included in this model, but the existing random effects that adjust the intercept for participants and items may change due to the inclusion of the new predictors. This is to be expected: The fixed factors morph and priming should explain some of the variation between items.
Table 5: Fixed and random effects in the third model, main effects only.
The value of the LRT can be statistically evaluated against a chi-squared distribution, using the difference in the number of model parameters as the degrees of freedom.
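The evaluation against a chi-squared distribution can be sketched in Python as follows. The sketch assumes the usual definition of the LRT statistic as the difference in -2 log-likelihood between the smaller and the larger nested model; the -2 log-likelihood values are invented for illustration, while the parameter counts of four and six are those of models 2 and 3. The helper `chi2_sf_even_df` uses a closed form that only holds for an even number of degrees of freedom; a statistics library would be needed for the general case.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def likelihood_ratio_test(neg2ll_small, neg2ll_large, n_params_small, n_params_large):
    """Return (LRT statistic, degrees of freedom, p-value) for two nested models."""
    lrt = neg2ll_small - neg2ll_large
    df = n_params_large - n_params_small
    return lrt, df, chi2_sf_even_df(lrt, df)


# The -2 log-likelihood values below are made up for illustration.
# The parameter counts (four and six) are the ones given for models 2 and 3.
lrt, df, p = likelihood_ratio_test(25010.0, 25000.0, 4, 6)
print(f"LRT = {lrt:.1f}, df = {df}, p = {p:.4f}")
```

A small p-value here would indicate that the larger model describes the data better than the smaller one by more than the two extra parameters would be expected to achieve by chance.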
Model 2 has four parameters, and model 3 has six, so a chi-squared with 2 degrees of freedom should be used.
Table 7: Relationship between the factors priming, morph, and form.
The djmixed syntax for this model is shown in the accompanying figure. The specification of the fixed and random terms follows the same pattern as before. In line 7, post hoc tests are requested, which will be discussed below.
Although the post hoc tests are a theory-free and cautious approach to determining any difference in levels, the application of planned comparisons is more popular. The drawback of using planned comparisons is that any application that is slightly data driven leads to highly inflated alpha rates. In other words, if the planned comparison was determined after obtaining the means (or preliminary means), the alpha rate is much higher than promised.
The histogram of residuals for model 1 is shown in the left panel of the residual histogram figure. Raw data plots and plots of residuals after intermediary models are fit can also be very instructive as to the structure of the data set.
Introduction to Mixed Modelling: Beyond Regression and Analysis of Variance, by N. W. Galwey.
For the present data set, a plot of the observed data for the primed versus unprimed condition for derivational pairs only is shown in the accompanying figure. In this plot, every item set is represented by a single line. Evidence for word-specific adjustments to the intercept can be found at the left edge of the figure, which shows that the item-specific intercepts differ substantially.
This could be unsystematic variation of the efficacy of priming, but facing such data, it is wise to investigate whether there are any factors that may help explain this.
References
Abdi, H. In Salkind (Ed.). Thousand Oaks, CA: Sage.
Baayen, R. Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59.
The subjects as a simple random effect fallacy: Subject variability and morphological family effects in the mental lexicon. Brain and Language, 81, 55–.
Barr, D.
Bates, D. Fitting linear mixed models in R. R News, 5, 27–.
The lme4 package [Computer software manual].
Cheng, C. A mixed-effects expectancy-valence model for the Iowa gambling task. Behavior Research Methods, 41.
Clark, H. The language-as-a-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12.
Coleman, E. Generalizing to a language population. Psychological Reports, 14.
Faraway, J. Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models.
Forster, K. Journal of Verbal Learning and Verbal Behavior, 15.
Hox, J. Applied multilevel analysis. Amsterdam: TT-Publikaties.
Jackson, S.
Janssen, D. Randomisation test in language typology. Linguistic Typology, 10.
Keppel, G.
Kreft, I. Introducing multilevel modeling.
Lee, M. Exemplars, prototypes, similarities and rules in category representation: An example of hierarchical Bayesian analysis. Cognitive Science, 32.
Maxwell, S. Robustness of the quasi F statistic to violations of sphericity. Psychological Bulletin, 99.
Mislevy, R. Exploiting auxiliary information about examinees in the estimation of item parameters. Applied Psychological Measurement, 11, 81–.
Nezlek, J. An introduction to multilevel modeling for social and personality psychology.