Explanatory IRT tutorial with the TAM package in R
In this post, we’ll go through the basics of running latent regression models in TAM. In essence, these are IRT models with covariates. They are especially useful for expanding on inferences or measurement claims made with the Rasch model. While these models are not particularly difficult conceptually, there are a number of moving parts that I hope this post will help clarify.
Part 1. Understanding the Data and the Rasch Model
Get the Data Read in
First, I want to make sure everybody gets the data read into R in the exact same way.
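So everyone ends up with the same objects, here’s a minimal read-in sketch; the file name and the object name dat are placeholders for whatever your local copy uses:

```r
# read the data set; the file name is a placeholder for your local copy
dat <- read.csv("tam_tutorial_data.csv", stringsAsFactors = FALSE)
str(dat)  # expect 1500 rows: id, 40 item columns, treat, proflevel, abilcov
```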
Dataset Description
Once the data is read in, let’s describe the data set.
The data set comes from a general test given to a set of first-year undergrads at UCSB. The test was administered to 1500 students in a large biology class. Approximately half the students are in a class that is run the standard way; the other class uses an altogether different curriculum. Therefore, there’s a treatment and a control group.
The variables:
id
: Random Student ID

Math...
: Math question from the test, scored correct or incorrect (1 or 0)

Science...
: Science-type question, scored correct or incorrect (1 or 0)

ELA...
: Reading comprehension question (science related), scored correct or incorrect (1 or 0)

MathWordProb...
: Math word problem question, scored correct or incorrect (1 or 0)

treat
: 1 if the student was assigned to the treatment class, 0 if assigned to the standard class

proflevel
: Categorical variable representing the proficiency category on a previous chemistry exam. 1 is the lowest, least proficient category; 4 is the most proficient.

abilcov
: Whether the student is a chemistry (3), molecular biology (2), or other (1) major. Categorical.
There are 40 items on the test in total.
Normally, it would be wise to look at descriptives. We’ll skip that since the emphasis is on fitting the models in TAM.
Running the Rasch Model
Let’s run a basic Rasch model with TAM on this data. This should give us an idea of how the items behave before we add covariates.
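A minimal sketch of the fit, assuming the 40 scored items sit in columns 2 through 41 of dat (adjust the indices to your file):

```r
library(TAM)

# pull out the 40 scored item columns (positions are an assumption)
items <- dat[, 2:41]

# Rasch model via marginal maximum likelihood
mod1 <- tam.mml(resp = items)

# infit/outfit statistics for the discussion below
fit1 <- tam.fit(mod1)
fit1$itemfit
```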
The smallest infit value is .96, the largest is 1.03. Therefore, the items are fitting pretty well, though a few wouldn’t meet the acceptable standard under certain stricter fit criteria.
This is all to say, we’re good to start running our latent regressions.
Part 2. Latent Regression
Example Q1. Is there a noticeable difference in general ability (based on the test we’re analyzing) between students who were in the treatment group (coded as 1) and those who were not (coded as 0)?
One of the advantages of the latent regression approach is that it gives you both item- and person-level information and takes measurement error into account. If you were to simply regress the test total score on the group identifier, measurement error would be ignored entirely. And that approach wouldn’t really be a latent variable model: it regresses an observed score on an observed student classification.
Two ways of analyzing group differences
The treat variable places students in two groups, coded 1 and 0. TAM gives us two ways to analyze the difference between these groups.
Method 1
Formula for latent regression:
\[\theta_s = \beta_1*X_{1s} + \epsilon_s\]
where \(\beta_1\) is the regression coefficient for the treatment effect, \(X_{1s}\) takes on a value of 1 if student \(s\) is in the treatment group, and \(\epsilon_s\) is a person-specific error term drawn from a normal distribution. \(\theta_s\) is the student-specific theta value.
With a dichotomous treatment variable, there are a few ways to do this in TAM. The first treats the data as if it comes from two “groups.” This method gets us group-specific variances. So, for instance, we can not only see if the two groups have major differences via their theta estimates, we can also see if their underlying distributions have different “shapes.”
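A minimal sketch of the two-group fit, assuming the items object from Part 1. Note that TAM’s group argument expects integer group labels starting at 1, so treat is shifted up by one:

```r
# Method 1: multiple-group estimation; each group gets its own mean and variance
grp1 <- tam.mml(resp = items, group = dat$treat + 1)
summary(grp1)  # see the "covariances, variances" section of the output
```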
Under the section “covariances, variances,” we see that the two groups have pretty similar distributions.
Method 2
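Method 2 passes the covariate directly into the latent regression through tam.mml’s Y argument. A sketch, again assuming the items object from Part 1:

```r
# Method 2: latent regression of theta on the treatment indicator
lat1 <- tam.mml(resp = items, Y = dat$treat)
summary(lat1)  # the regression coefficient is reported as Y1
```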
| parm | dim | est | StdYX | StdX | StdY |
|---|---|---|---|---|---|
| Intercept | 1 | 0.0000000 | NA | NA | NA |
| Y1 | 1 | 0.6989754 | 0.547852 | 0.3495732 | 1.095436 |
We can see that the latent regression explains about 30% of the variance in the theta estimates (the squared StdYX coefficient: .55² ≈ .30), with the rest attributable to person ability and error.
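To see this at the person level, we can pull the EAP estimates out of both fitted objects; TAM stores them in the person component of the model:

```r
# compare person EAPs from the plain Rasch model and the latent regression
head(cbind(rasch_EAP = mod1$person$EAP, latreg_EAP = lat1$person$EAP))
```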
As we can see, the person parameters (the EAP estimates) are quite different after “adjusting for” whether a student was in the treatment group or the non-treatment group. The formula now looks like:
\[Logit[Pr(X=1|\theta_p, \delta_i)] = \beta_1*treat + \theta_p - \delta_i\]
In this model, the variable treat is often called a “fixed effect.” TAM constrains the model so that the fixed effect for the reference category (no treatment) has a value of zero. So the model, for person 1 (who was in the treatment group) and item 1, looks like this:
- Fixed effect of treat = .70 logits (rounded)
- Student was in the “treatment group,” so treat = 1
- .51 logits is the student’s ability after adjusting for the student being in the treatment group
- Item difficulty for item 1 is -.85 logits
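Plugging those rounded values into the model:

\[Logit = 0.70*1 + 0.51 - (-0.85) = 2.06, \qquad Pr(X=1) = \frac{e^{2.06}}{1 + e^{2.06}} \approx .89\]

So person 1 has roughly an 89% chance of answering item 1 correctly.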
We’ve decomposed the variance of theta.
Adding to the Latent Regression model
We can make the latent regression model more complicated by adding predictors beyond just treatment/not-treatment.
- Create a matrix of covariates.
- Create a latent regression formula object for formulaY1
- Run the model.
We’ll make the model more complicated by predicting theta with a latent regression controlling for proficiency level.
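Here’s a sketch of those three steps using tam.mml’s dataY/formulaY interface. The object names are my own, and proflevel is converted to a factor so TAM creates the dummy codes that appear in the output below:

```r
# 1. covariate data frame
Ydat <- data.frame(treat = factor(dat$treat, labels = c("control", "treat")),
                   proflevel = factor(dat$proflevel))

# 2. latent regression formula
formulaY1 <- ~ treat + proflevel

# 3. run the model
latreg2 <- tam.mml(resp = items, dataY = Ydat, formulaY = formulaY1)
summary(latreg2)
```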
So the model is now:
\[\theta = \beta_1*treat + \beta_2*proflevel + \epsilon\]
And the full model will now look something like:
\[Logit[Pr(X=1|\theta_p, \delta_i)] = \beta_1*treat + \beta_2*proflevel + \theta_p - \delta_i\]

| parm | dim | est | StdYX | StdX | StdY |
|---|---|---|---|---|---|
| Intercept | 1 | 0.0 | NA | NA | NA |
| treattreat | 1 | 0.6 | 0.5885051 | 0.3000734 | 1.176722 |
| proflevel2 | 1 | 0.6 | 0.5132791 | 0.2617163 | 1.176722 |
| proflevel3 | 1 | 0.6 | 0.5123953 | 0.2612657 | 1.176722 |
| proflevel4 | 1 | 1.2 | 1.0035342 | 0.5116929 | 2.353445 |
Here are the estimates with their standard errors:

| est.Dim1 | se.Dim1 |
|---|---|
| 0.0 | 0.0000000 |
| 0.6 | 0.0011625 |
| 0.6 | 0.0016158 |
| 0.6 | 0.0016201 |
| 1.2 | 0.0016713 |
Note that all estimates are statistically significant. The last step in this section is comparing overall model fit.
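The comparison table below can be produced with IRT.compareModels (a function from the CDM package, which loads with TAM), applied to the fitted objects:

```r
# side-by-side log-likelihoods and information criteria
IRT.compareModels(mod1, lat1, latreg2)
```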
| Model | loglike | Deviance | Npars | Nobs | AIC | BIC | AIC3 | AICc | CAIC |
|---|---|---|---|---|---|---|---|---|---|
| mod1 | -31109.02 | 62218.04 | 40 | 1500 | 62298.04 | 62510.57 | 62338.04 | 62300.29 | 62550.57 |
| lat1 | -30923.31 | 61846.63 | 41 | 1500 | 61928.63 | 62146.47 | 61969.63 | 61930.99 | 62187.47 |
| latreg2 | -27663.39 | 55326.77 | 44 | 1500 | 55414.77 | 55648.55 | 55458.77 | 55417.49 | 55692.55 |
The Item Side: The Linear Logistic Test Model (LLTM)
The LLTM is simply a more parsimonious Rasch model. Instead of estimating a difficulty for each individual item, item difficulties are predicted from item-property indicators. For instance, our data has four item types, so each item type gets its own difficulty estimate. However, there are some complications: the model assumes the item properties fully account for each item’s difficulty, and that is not a safe assumption.
The other unfortunate part is that the simplest way to fit this model is to convert the data to “long” format. After that is done, we’ll create an indicator for each item type (a categorical variable denoting math, ELA, science, or word problem items).
To make the data long, we’ll use the gather function from tidyr in the tidyverse package. Then we’ll add the indicator using if_else statements.
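A sketch of the reshaping step; the item-name patterns in the grepl calls are assumptions about how your columns are labeled:

```r
library(tidyr)
library(dplyr)

# one row per person-item combination
long <- dat %>%
  gather(key = "item", value = "resp", -id, -treat, -proflevel, -abilcov)

# categorical item-type indicator built from the item names;
# check "MathWordProb" before "Math" so word problems aren't misclassified
long <- long %>%
  mutate(ittype = if_else(grepl("MathWordProb", item), 4L,
                  if_else(grepl("Math", item), 1L,
                  if_else(grepl("Science", item), 2L, 3L))))  # 3 = ELA
```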
To set up the LLTM, we have to use tam.mml.mfr with a facets formula. TAM will create so-called “pseudo facets” for parameters that don’t have estimates.
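A sketch of the call, assuming the long-format objects created above (the exact facet handling can vary across TAM versions, so treat this as a template rather than the definitive syntax):

```r
# item-type facet; TAM estimates one difficulty per level of ittype
facets <- data.frame(ittype = factor(long$ittype))

lltm <- tam.mml.mfr(resp = long["resp"],   # single response column, long format
                    facets = facets,
                    formulaA = ~ ittype,
                    pid = long$id)
summary(lltm)
```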
OK, so the LLTM fit in TAM will only give you “difficulties” for the particular item properties. Instead of conceptualizing the Rasch model as ability minus item difficulty, the model is now ability minus item-property difficulty. There will only be as many difficulty parameters as item properties you specify, and items can have “crossloadings” on multiple properties.
| parameter | facet | xsi | se.xsi |
|---|---|---|---|
| ittype1 | ittype | -0.1092315 | 0.0167853 |
| ittype2 | ittype | 0.0063631 | 0.0167639 |
| ittype3 | ittype | -1.2873576 | 0.0198834 |
| ittype4 | ittype | 0.0249119 | 0.0167649 |
| psfPF101 | psf | 0.0000000 | 0.0000000 |
| psfPF102 | psf | 0.0000000 | 0.0000000 |
| psfPF103 | psf | 0.0000000 | 0.0000000 |
| psfPF104 | psf | 0.0000000 | 0.0000000 |
| psfPF105 | psf | 0.0000000 | 0.0000000 |
| psfPF106 | psf | 0.0000000 | 0.0000000 |
| psfPF107 | psf | 0.0000000 | 0.0000000 |
| psfPF108 | psf | 0.0000000 | 0.0000000 |
| psfPF109 | psf | 0.0000000 | 0.0000000 |
| psfPF110 | psf | 0.0000000 | 0.0000000 |
Finally, let’s compare the overall fit of the models:
| Model | loglike | Deviance | Npars | Nobs | AIC | BIC | AIC3 | AICc | CAIC |
|---|---|---|---|---|---|---|---|---|---|
| mod1 | -31109.02 | 62218.04 | 40 | 1500 | 62298.04 | 62510.57 | 62338.04 | 62300.29 | 62550.57 |
| lat1 | -30923.31 | 61846.63 | 41 | 1500 | 61928.63 | 62146.47 | 61969.63 | 61930.99 | 62187.47 |
| latreg2 | -27663.39 | 55326.77 | 44 | 1500 | 55414.77 | 55648.55 | 55458.77 | 55417.49 | 55692.55 |
| lltm | -38564.85 | 77129.71 | 5 | 1500 | 77139.71 | 77166.27 | 77144.71 | 77139.75 | 77171.27 |
It is perhaps no surprise that the most complex model, latreg2, has the best fit by every criterion, while the heavily constrained LLTM fits worst of all.