Explanatory IRT tutorial with the TAM package in R
In this post, we’ll go through the basics of running latent regression models in TAM. In essence, these are IRT models with covariates, and they are especially useful for expanding on inferences or measurement claims made with the Rasch model. While not particularly difficult conceptually, these models have a number of moving parts that I hope this post will help clarify.
Part 1. Understanding the Data and the Rasch Model
Get the Data Read in
First, I want to make sure everybody gets the data read into R in the exact same way.
#install packages to get data
#install.packages("tidyverse")
#install.packages("TAM")
#install.packages("lme4")
#install.packages("readr")
#install.packages("optimx")
library(readr)
library(tidyverse)
library(lme4)
library(TAM)
library(optimx)
library(knitr)
#get the data read in. You have to be connected to the internet.
eirtdata <- read_csv("https://raw.githubusercontent.com/danielbkatz/EIRT/master/eirtdata.csv")
#get rid of the extra column
eirtdata <- eirtdata[-1]
#optional, if you want to save the data as a csv file on your computer.
#Note, it adds a column.
#write.csv(eirtdata, "eirtdata.csv")
#make sure these person level covariates are treated as factors
eirtdata <- eirtdata %>% mutate(treat = ifelse(treat==1, "treat", "not-treat"))
eirtdata$treat <- as.factor(eirtdata$treat)
eirtdata$proflevel <- as.factor(eirtdata$proflevel)
eirtdata$abilcov <- as.factor(eirtdata$abilcov)
Dataset Description
Once the data is read in, let’s describe the data set.
The data set comes from a general test given to a set of first-year undergrads at UCSB. The test was administered to 1,500 students in a large biology class. Approximately half the students are in a class run the standard way; the other class uses an altogether different curriculum. Therefore, there's a treatment group and a control group.
The variables:
- id: Random student ID
- Math.1 to Math.10: Math questions from the test, scored correct or incorrect (1 or 0)
- Science.1 to Science.10: Science-type questions, scored correct or incorrect (1 or 0)
- ELA.1 to ELA.10: Reading-comprehension-type questions, science related, scored correct or incorrect (1 or 0)
- MathWordProb.1 to MathWordProb.10: Math word-problem-type questions, scored correct or incorrect (1 or 0)
- treat: 1 if the student was assigned to the treatment class, 0 if assigned to the standard class
- proflevel: Categorical variable representing the proficiency category on a previous chemistry exam. 1 is the lowest, least proficient category; 4 is the most proficient.
- abilcov: Whether the student is a chemistry (3), molecular biology (2), or other (1) major. Categorical.
There are 40 items on the test in total.
#to get a sense of what's in the dataset.
#names() tells us what the column names are
names(eirtdata)
## [1] "id" "Math.1" "Math.2" "Math.3"
## [5] "Math.4" "Math.5" "Math.6" "Math.7"
## [9] "Math.8" "Math.9" "Math.10" "Science.1"
## [13] "Science.2" "Science.3" "Science.4" "Science.5"
## [17] "Science.6" "Science.7" "Science.8" "Science.9"
## [21] "Science.10" "ELA.1" "ELA.2" "ELA.3"
## [25] "ELA.4" "ELA.5" "ELA.6" "ELA.7"
## [29] "ELA.8" "ELA.9" "ELA.10" "MathWordProb.1"
## [33] "MathWordProb.2" "MathWordProb.3" "MathWordProb.4" "MathWordProb.5"
## [37] "MathWordProb.6" "MathWordProb.7" "MathWordProb.8" "MathWordProb.9"
## [41] "MathWordProb.10" "treat" "proflevel" "abilcov"
Normally, it would be wise to look at descriptives. We’ll skip that since the emphasis is on fitting the models in TAM.
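If you do want a quick first look, here's a minimal sketch (items are scored 0/1, so column means are proportions correct):
#proportion correct for each of the 40 items
round(colMeans(eirtdata[2:41]), 2)
#raw total-score distribution
summary(rowSums(eirtdata[2:41]))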
#just to check out the data, are there
#high performing students in both treatment groups?
table(eirtdata$treat, eirtdata$proflevel)
##
## 1 2 3 4
## not-treat 185 214 170 191
## treat 193 169 211 167
Running the Rasch Model
Let’s run some basic IRT using TAM on these data. This gives us a baseline before adding covariates.
#fit the first IRT model. Have to subset the data since it contains person covariates.
#(as written, eirtdata[2:40] uses 39 of the 40 items; eirtdata[2:41] would use all 40)
mod1 <- tam.mml(eirtdata[2:40])
#view fit statistics.
#It's a lot easier to interpret these models if the data fit well,
#and here the data fit quite well.
#Outliers or badly misfitting items would be revealed here.
#get item fit
mod1fit <- tam.fit(mod1)
#item difficulties. head() shows just the first six items' difficulties
head(as.data.frame(mod1$xsi))
## xsi se.xsi
## Math.1 -1.1901775 0.06170396
## Math.2 -0.5384382 0.05552351
## Math.3 0.2070387 0.05424712
## Math.4 0.7624491 0.05714235
## Math.5 0.8450950 0.05786723
## Math.6 -0.1907538 0.05419563
#view the range of fit statistics, particularly infit
range(mod1fit$itemfit$Infit)
## [1] 0.9596799 1.0292587
#Get the half-width of the sample-size-adjusted acceptable infit range (1 +/- this value):
2* (2/sqrt(1500))
## [1] 0.1032796
The smallest infit value is .96, the largest is 1.03, well within the sample-size-adjusted band of 1 ± .10. Items are fitting pretty well, though a few wouldn't meet the acceptable standard under stricter fit criteria.
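If you want to flag any items falling outside that band, something like this should work (a sketch; it assumes the itemfit table returned by tam.fit has a parameter column, as in recent TAM versions):
#flag items whose infit falls outside 1 +/- the adjusted bound
bound <- 2 * (2/sqrt(1500))
fitdf <- mod1fit$itemfit
fitdf[abs(fitdf$Infit - 1) > bound, c("parameter", "Infit")]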
This is all to say, we’re good to start running our latent regressions.
Part 2. Latent Regression
Example Q1. Is there a noticeable difference in general ability (based on the test we’re analyzing) between students who were in the treatment group (coded as 1) and those who were not (coded as 0)?
One of the advantages of the latent regression approach is that it can give you both item- and person-level information while taking measurement error into account. If you were to simply regress the test total score on the group identifier, measurement error would be ignored entirely. And that approach wouldn't really be a latent variable model anyway: it regresses an observed score on an observed student classification.
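For contrast, here's a minimal sketch of that observed-score regression, with no measurement model at all:
#naive alternative: regress the observed total score on the group identifier
totals <- rowSums(eirtdata[2:41])
summary(lm(totals ~ eirtdata$treat))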
Two ways of analyzing group differences
Treat the “treat” variable as placing students in two groups, 1 and 0. We can do this in two ways in TAM.
Method 1
Formula for latent regression:
\[\theta_s = \beta_1*X_{1s} + \epsilon_s\]

where \(\beta_1\) is the regression coefficient for treatment and \(X_{1s}\) takes on a value of 1 if student \(s\) is in the treatment group. The error \(\epsilon_s\) is person specific (though drawn from a normal distribution), and \(\theta_s\) is the student-specific theta value.
With a dichotomous treatment variable, there are a few ways to do this in TAM. The first treats the data as if it comes from two “groups.” This method gives us group variances. So, for instance, we can not only see whether the two groups have major differences in their theta estimates, but also whether their underlying distributions have different “shapes.”
#take the treatment variable and make it its own R object
treat <- eirtdata$treat
#add a simple TAM command
latg <- tam.mml(eirtdata[2:40], group = treat)
summary(latg)
Under the section, “covariances, variances” we see that the two groups have pretty similar distributions.
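You can also pull those pieces out of the fitted object directly; a minimal sketch (in a grouped model, TAM's variance slot holds one estimate per group):
#group-specific latent variances
latg$variance
#quick EAP summary by group
by(latg$person$EAP, treat, summary)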
Method 2
#the next way, which I find to be the most intuitive,
#though you don't get group variances.
#You have to specify a formula:
formulay1 <- ~treat
#and then add dataY which can have many columns.
lat1 <- tam.mml(eirtdata[2:40], dataY = eirtdata$treat, formulaY = formulay1)
#unfortunately, as far as I can tell,
#this is the only way to get standard errors in TAM for your regression coefficients.
se<- tam.se(lat1)
summary(lat1)
# get the regression coefficients; the est column is unstandardized.
#You can save these in an object if you like.
#StdYX, StdX, and StdY are standardized versions of the coefficients.
kable(lat1$latreg_stand$beta_stand)
parm | dim | est | StdYX | StdX | StdY |
---|---|---|---|---|---|
Intercept | 1 | 0.0000000 | NA | NA | NA |
Y1 | 1 | 0.6989754 | 0.547852 | 0.3495732 | 1.095436 |
#get R^2 estimate
lat1$latreg_stand$R2_theta
## [1] 0.3001418
#get standard error
se$beta
## est.Dim1 se.Dim1
## 1 0.0000000 0.00000000
## 2 0.6989754 0.02448985
We can see that the latent regression explains about 30% of the variance in the theta estimates (the rest remains as residual person variance, i.e., error).
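Dividing an estimate by its standard error gives a rough Wald-type z; a quick sketch using the se object from above:
#Wald-type z for the treatment coefficient (the intercept row is fixed at zero)
se$beta[2, "est.Dim1"] / se$beta[2, "se.Dim1"]
#about 28.5, so clearly significant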
#note, this also works below, but gets slightly different estimates
#(a group mean is not held to zero)
#like <- IRT.likelihood(lat1)
#latregcom <- tam.latreg(like, dataY = eirtdata$treat, formulaY = formulay1)
#Let's compare person ability parameters between the standard Rasch model and the latent regression model.
head(mod1$person)
## pid case pweight score max EAP SD.EAP
## 1 1 1 1 23 39 0.05383554 0.3182850
## 2 2 2 1 26 39 0.39026644 0.3432272
## 3 3 3 1 16 39 -0.67112535 0.3122953
## 4 4 4 1 20 39 -0.25948566 0.3351358
## 5 5 5 1 19 39 -0.37144597 0.3318124
## 6 6 6 1 20 39 -0.25948566 0.3351358
#theta estimates
head(lat1$person)
## pid case pweight score max EAP SD.EAP
## 1 1 1 1 23 39 0.523588646 0.3041857
## 2 2 2 1 26 39 0.807582572 0.3265364
## 3 3 3 1 16 39 -0.388009891 0.3182653
## 4 4 4 1 20 39 -0.004515606 0.2869874
## 5 5 5 1 19 39 0.119659588 0.3024843
## 6 6 6 1 20 39 -0.004515606 0.2869874
As we can see, the person parameters (the EAP estimates) are quite different after “adjusting for” whether a student was in the treatment group. The formula now looks like:

\[Logit[Pr(X=1|\theta_p, \delta_i)] = \beta_1*treat + \theta_p - \delta_i\]

In this model, the variable treat is often called a “fixed effect.” TAM constrains the model so that the fixed effect for the reference category (no treatment) has a value of zero. So the model for person 1 (who was in the treatment group) and item 1 looks like this:
- Fixed effect of treat = .70 logits (rounded)
- Student was in the “treatment group,” so the indicator = 1
- .52 logits (the EAP shown above) is the student’s ability after adjusting for being in the treatment group.
- Item difficulty for item 1 is -.85 logits.
We’ve decomposed the variance of theta.
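Putting those numbers together, here's a quick sketch of the model-implied probability that person 1 answers item 1 correctly (rounded values from above, purely illustrative):
#plug the decomposition above into the logit
beta_treat <- 0.70   #fixed effect of treat
theta_p1 <- 0.52     #person 1's adjusted ability
delta_i1 <- -0.85    #item 1 difficulty, as reported above
plogis(beta_treat*1 + theta_p1 - delta_i1)
#about .89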
Adding to the Latent Regression model
We can make the latent regression model more complicated by adding predictors beyond just treatment/not-treatment.
- Create a matrix of covariates.
- Create a latent regression formula object to pass to formulaY
- Run the model.
We’ll make the model more complicated by predicting theta with a latent regression controlling for proficiency level.
So the model is now:
\[\theta = \beta_1*treat + \beta_2*proflevel + \epsilon\]

And the full model will now look something like:

\[Logit[Pr(X=1|\theta_p, \delta_i)] = \beta_1*treat + \beta_2*proflevel + \theta_p - \delta_i\]

#this gets all the person level covariates into a single dataframe
daty <- eirtdata %>% select(treat, proflevel, abilcov)
#create the latent regression formula. Could also add an interaction
formulay2 <- ~ treat + proflevel
#now run the model
latreg2 <- tam.mml(eirtdata[2:40], dataY = daty, formulaY = formulay2)
#get standard errors
selatreg2 <- tam.se(latreg2)
#view latent regression coefficients
kable(latreg2$latreg_stand$beta_stand)
parm | dim | est | StdYX | StdX | StdY |
---|---|---|---|---|---|
Intercept | 1 | 0.0 | NA | NA | NA |
treattreat | 1 | 0.6 | 0.5885051 | 0.3000734 | 1.176722 |
proflevel2 | 1 | 0.6 | 0.5132791 | 0.2617163 | 1.176722 |
proflevel3 | 1 | 0.6 | 0.5123953 | 0.2612657 | 1.176722 |
proflevel4 | 1 | 1.2 | 1.0035342 | 0.5116929 | 2.353445 |
#view standard errors
kable(selatreg2$beta)
est.Dim1 | se.Dim1 |
---|---|
0.0 | 0.0000000 |
0.6 | 0.0011625 |
0.6 | 0.0016158 |
0.6 | 0.0016201 |
1.2 | 0.0016713 |
Note that all estimates are statistically significant. The last step in this section is comparing overall model fit.
comparemod <- CDM::IRT.compareModels(mod1, lat1, latreg2)
kable(comparemod$IC)
Model | loglike | Deviance | Npars | Nobs | AIC | BIC | AIC3 | AICc | CAIC |
---|---|---|---|---|---|---|---|---|---|
mod1 | -31109.02 | 62218.04 | 40 | 1500 | 62298.04 | 62510.57 | 62338.04 | 62300.29 | 62550.57 |
lat1 | -30923.31 | 61846.63 | 41 | 1500 | 61928.63 | 62146.47 | 61969.63 | 61930.99 | 62187.47 |
latreg2 | -27663.39 | 55326.77 | 44 | 1500 | 55414.77 | 55648.55 | 55458.77 | 55417.49 | 55692.55 |
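Both latent regression models show substantially lower AIC and BIC than the unconditional Rasch model. Since mod1 is nested in lat1, TAM's anova method for fitted models can also run a quick likelihood ratio test (a sketch):
#likelihood ratio test: Rasch model vs. latent regression
anova(mod1, lat1)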
The Item Side: The Linear Logistic Test Model (LLTM)
The LLTM is simply a more parsimonious Rasch model. Instead of estimating a difficulty for each individual item, item difficulties are modeled from item property indicators. For instance, our data has four item types, so each item type gets its own difficulty estimate. There is a complication, though: the LLTM assumes the item properties fully account for item difficulty, which is often not a safe assumption.
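In equation form, the LLTM replaces each item difficulty with a weighted sum of property effects:

\[\delta_i = \sum_k q_{ik}*\eta_k\]

where \(q_{ik}\) indicates whether item \(i\) has property \(k\) and \(\eta_k\) is the difficulty contribution of property \(k\). In our case, each item has exactly one property: its item type.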
The other unfortunate part is that the simplest way to fit this model is to convert the data to “long” format. After that is done, we’ll create an indicator for each item type (a categorical variable denoting a math, ELA, science, or word-problem type item).
To make the data long, we’ll use the “gather” function from tidyr in the tidyverse package (gather() is superseded these days; a pivot_longer alternative is sketched after the output below). Then we’ll add an indicator based on if_else statements.
#change the name of the mathwordproblem items
names(eirtdata)[32:41] <- paste0("WordProb.", 1:10)
#note the nested if_else statements.
#Basically, if an item name contains a certain string, such as "Math.",
#it gets the corresponding ittype code via mutate() from dplyr.
eirtlong <- gather(eirtdata, key = "item", value = "response", Math.1:WordProb.10) %>%
mutate(ittype = if_else(grepl(pattern = "Math.", x = item), 1,
if_else(grepl(pattern = "Science.", x = item), 2,
if_else(grepl(pattern = "ELA.", x = item), 3,4))))
head(eirtlong)
## # A tibble: 6 x 7
## id treat proflevel abilcov item response ittype
## <dbl> <fct> <fct> <fct> <chr> <dbl> <dbl>
## 1 1 treat 4 1 Math.1 1 1
## 2 2 treat 3 2 Math.1 1 1
## 3 3 not-treat 1 2 Math.1 1 1
## 4 4 not-treat 2 1 Math.1 1 1
## 5 5 treat 1 2 Math.1 0 1
## 6 6 not-treat 2 1 Math.1 0 1
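As an aside, here's a sketch of the same reshape using the newer pivot_longer and case_when (producing eirtlong2 so we don't clobber the object above):
#same reshape with the newer tidyr/dplyr verbs
eirtlong2 <- eirtdata %>%
  pivot_longer(Math.1:WordProb.10, names_to = "item", values_to = "response") %>%
  mutate(ittype = case_when(grepl("Math.", item) ~ 1,
                            grepl("Science.", item) ~ 2,
                            grepl("ELA.", item) ~ 3,
                            TRUE ~ 4))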
To set up the LLTM, we have to use tam.mml.mfr with a facets formula (formulaA). TAM will create so-called “pseudo facets” for parameters that don’t have estimates.
formulaa <- ~ittype
facets <- as.data.frame(eirtlong[c(5,7)])
#just to make it easier to call
names(facets) <- c("item", "ittype")
#only want to use the response column; person ids are passed via pid
lltm <- tam.mml.mfr(eirtlong[6], facets = facets,
formulaA = formulaa, pid = eirtlong$id)
control <- glmerControl(optimizer = "optimx",
calc.derivs = F, optCtrl = list(method="nlminb", starttests = F, kkt=F))
#same thing but in lme4
lltmme1 <- glmer(data=eirtlong, response~-1 + as.factor(ittype) + (1|id),
family = "binomial", control = control)
#random item in lme4 (I don't know how to do this in TAM)
lltmlmer <- glmer(data=eirtlong, response~-1 + as.factor(ittype) + (1|id) + (1|item), family = "binomial", control = control)
OK, so the LLTM fit in TAM will only give you “difficulties” for the specified item properties. Instead of conceptualizing the Rasch model as ability minus item difficulty, the model is now ability minus item-property difficulty. There will only be as many difficulty parameters as item properties you specify, and an item can even load on more than one property (“crossloadings”); there’s a sketch of that after the lme4 output below.
#get property difficulties
kable(lltm$xsi.facets)
parameter | facet | xsi | se.xsi |
---|---|---|---|
ittype1 | ittype | -0.1092315 | 0.0167853 |
ittype2 | ittype | 0.0063631 | 0.0167639 |
ittype3 | ittype | -1.2873576 | 0.0198834 |
ittype4 | ittype | 0.0249119 | 0.0167649 |
psfPF101 | psf | 0.0000000 | 0.0000000 |
psfPF102 | psf | 0.0000000 | 0.0000000 |
psfPF103 | psf | 0.0000000 | 0.0000000 |
psfPF104 | psf | 0.0000000 | 0.0000000 |
psfPF105 | psf | 0.0000000 | 0.0000000 |
psfPF106 | psf | 0.0000000 | 0.0000000 |
psfPF107 | psf | 0.0000000 | 0.0000000 |
psfPF108 | psf | 0.0000000 | 0.0000000 |
psfPF109 | psf | 0.0000000 | 0.0000000 |
psfPF110 | psf | 0.0000000 | 0.0000000 |
fixef(lltmme1)
## as.factor(ittype)1 as.factor(ittype)2 as.factor(ittype)3 as.factor(ittype)4
## 0.109420105 -0.006163771 1.287211963 -0.024710713
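Note that the lme4 coefficients are on the easiness scale, so their signs are flipped relative to TAM’s difficulty estimates. Also, as mentioned above, an item can carry several properties at once. Here’s a minimal sketch of that idea in lme4, where word problems get both a math and a verbal load (this property coding is invented purely for illustration):
#illustrative crossloading design: word problems carry two properties
eirtlong <- eirtlong %>%
  mutate(mathprop = as.numeric(ittype %in% c(1, 4)),  #math and word-problem items
         sciprop = as.numeric(ittype == 2),           #science items
         verbprop = as.numeric(ittype %in% c(3, 4)))  #ELA and word-problem items
#difficulty is now a sum of the property effects an item carries
lltmcross <- glmer(data = eirtlong, response ~ -1 + mathprop + sciprop + verbprop + (1|id),
                   family = "binomial", control = control)
fixef(lltmcross)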
Finally, let’s compare the overall fit of the models:
kable(CDM::IRT.compareModels(mod1, lat1, latreg2, lltm)$IC)
Model | loglike | Deviance | Npars | Nobs | AIC | BIC | AIC3 | AICc | CAIC |
---|---|---|---|---|---|---|---|---|---|
mod1 | -31109.02 | 62218.04 | 40 | 1500 | 62298.04 | 62510.57 | 62338.04 | 62300.29 | 62550.57 |
lat1 | -30923.31 | 61846.63 | 41 | 1500 | 61928.63 | 62146.47 | 61969.63 | 61930.99 | 62187.47 |
latreg2 | -27663.39 | 55326.77 | 44 | 1500 | 55414.77 | 55648.55 | 55458.77 | 55417.49 | 55692.55 |
lltm | -38564.85 | 77129.71 | 5 | 1500 | 77139.71 | 77166.27 | 77144.71 | 77139.75 | 77171.27 |
It is perhaps no surprise that the model with the most person-level information, latreg2, fits best, while the far more constrained LLTM fits worst by a wide margin.