# Different Kinds of DIF
## A Distinction Between Absolute and Relative Forms of Measurement Invariance
### Daniel Katz

---

# Introduction

+ "Group" assignment is a Catch-22 in statistics/measurement

+ On the one hand, we often need to assign people to groups for various reasons so we can compare apples to apples in a statistical model

+ On the other hand, we often have to run the statistical model to make our classifications

+ In measurement and statistics, identifying meaningful group membership is how we make comparisons (age-grading, diagnoses, regression modelling)

---

# "Fixed Effects" models

+ Sometimes called a "within" estimator

+ In econometrics, we'll include fixed effects - effectively dummy variables - to find an unbiased causal estimate of our "treatment"

+ "covariate adjustment"

+ You can have person fixed effects, one per person across their repeated occasions

+ More common - group-based fixed effects (classrooms, race, gender, ethnicity)

---

# "Fixed Effects" models

+ However, what constitutes a meaningful group? What if your group is a conglomerate of groups?
  + Consequence - may ignore heterogeneity in treatment effects even within groups
  + May inflate or deflate the causal estimate (plus, can you interpret it within a group?)
  
+ General lesson: a group indicator can create more problems than it solves if it is not thought through

+ Problems...

---
# For more on fixed effects and Bias Amplification and Unmasking

![bias_unmasking](bias_amp.jpg)

---

# Differential Item Functioning (DIF)

+ Invariance/DIF analysis starts from the premise that an item is invariant if one can:

1. Match respondents on "ability", a sum score, or a similar matching variable

2. Model the response probabilities for certain items or item categories

3. Note whether the probability of a given response to an item is the same regardless of group membership

(Note that in factor analysis/SEM, one may instead start by testing other parameters, such as factor loadings, before item intercepts, checking that freeing those parameters across groups does not "substantially" change model fit.)
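One common way to operationalize these three steps for a dichotomous item is the Mantel-Haenszel procedure. You stratify respondents by a matching score, build a 2x2 (group x correct) table within each stratum, and pool the odds ratios across strata. The sketch below is minimal and is not necessarily the procedure used later in this talk. The function name and the `"ref"`/`"focal"` labels are illustrative assumptions. In the hypothetical data, the raw proportions correct differ between groups (18/30 vs. 8/25), but after matching the pooled odds ratio is 1.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses):
    """Mantel-Haenszel common odds ratio for one studied item.

    `responses` is an iterable of (matching_score, group, correct) tuples, where
    group is "ref" or "focal" (illustrative labels) and correct is 0 or 1.
    Values near 1.0 suggest no uniform DIF once respondents are matched.
    """
    # One 2x2 table per matching-score stratum, stored as [A, B, C, D]:
    # A = ref correct, B = ref incorrect, C = focal correct, D = focal incorrect
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, correct in responses:
        offset = 0 if group == "ref" else 2
        tables[score][offset + (0 if correct else 1)] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical data: the groups are spread differently across score strata,
# but within each stratum they answer the item correctly at the same rate.
data = (
    [(1, "ref", 1)] * 2 + [(1, "ref", 0)] * 8 + [(1, "focal", 1)] * 4 + [(1, "focal", 0)] * 16
    + [(2, "ref", 1)] * 16 + [(2, "ref", 0)] * 4 + [(2, "focal", 1)] * 4 + [(2, "focal", 0)] * 1
)
print(mantel_haenszel_odds_ratio(data))  # 1.0
```

A common odds ratio is used because it separates DIF from overall group differences (impact) on the matching variable.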

---

# Formalizing
Given participant `i`, item `j`, latent trait value `t`, and selection variable (or group) `v`, measurement invariance holds when:

`$F(X_{ij}=x_{ij}|T=t_i, V_i=v_i) = F(X_{ij}=x_{ij}|T=t_i)$`

Non-invariance occurs when:

`$F(X_{ij}=x_{ij}|T=t_i, V_i=v_i) \neq F(X_{ij}=x_{ij}|T=t_i)$`
---
class: middle
# Strategies
  
+ Typically, items are removed when measurement non-invariance is found

+ There are various philosophies:
  + DIF found: Item removal (no questions asked - testing agencies)
  + DIF found: Item removal if DIF can be explained
  + DIF found: Items removed or altered, or some parameters allowed to vary across groups

---
class: middle

"The authors will be concerned with one specific shift of meaning that
occurs when the concepts of measurement invariance and bias are used in the area of personality and attitude testing. Especially ... when items invoke a frame of reference, for example, by inducing a within-group comparison...." - Borsboom et al., 434

---
class: middle
# More on DIF

+ DIF is classically assumed to be a sign of unintended multidimensionality

+ More precisely, responses are thought to be a function of the target latent construct AND something related to group membership

+ Shealy and Stout (1993) point out that it is not necessarily "group" membership per se that causes DIF, but some other construct on which the two groups differ. This gets confusing when a difference on the latent trait is itself how we classify people (for instance, heritage language ability)

+ This is a measurement threat

---
class: middle

# But...An alternative

+ Absolute and Relative Measurement Invariance

+ We typically look at absolute invariance

+ In some cases when absolute non-invariance is found, relative invariance might still hold

+ Let's consider a scenario...

---
class: middle
.pull-left[
+ Imagine psychological measurement had preceded physical measurement

+ Then we would measure height via self-report]
--
.pull-right[
“I have trouble getting a book from the upper shelves
in a library”

“Sometimes I have to bend over in order to see my face in a mirror”

“I would do well on a basketball team”]

---
class: middle
# DIF: Considering Gender - Absolute (non)Invariance

+ The first two items will not show evidence of DIF

+ The third item will show evidence of DIF

+ DIF will be apparent in the `Basketball` item because of a difference in height distributions (assuming height is important for basketball):
  + Women are on average shorter than men
  + A man who is 5'8" is less likely to endorse the item than a woman who is 5'8"

---
class: middle

# DIF: Considering Gender - Relative (non)Invariance
 .pull-left[
 + Surely, we wouldn't want to throw away this item
 
 + Within the group of men, and within the group of women, this item is informative (we can learn about each group's distribution)
 
 + We can compare men to men and women to women]
--
 .pull-right[
 + What if we standardized everybody's height relative to their own group's mean and SD? `$$\frac{h_p-\mu_h}{\sigma_h}$$`
 + Perhaps we could then re-consider our question
 ]
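Here is a minimal sketch of the height thought experiment. It assumes, hypothetically, that respondents answer the basketball item by comparing themselves only with their own gender group. The group means and SDs, the logistic response curve, and the function names are illustrative assumptions, not estimates. If you condition on absolute height (5'8" = 68 in), you reproduce the DIF described above. If you condition on within-group standardized height, the DIF disappears.

```python
import math

# Hypothetical within-group height distributions in inches (mean, SD); not empirical.
GROUPS = {"men": (70.0, 3.0), "women": (65.0, 3.0)}


def relative_height(height, group):
    """Standardize a height against the respondent's own group: (h - mu) / sigma."""
    mu, sigma = GROUPS[group]
    return (height - mu) / sigma


def p_endorse_basketball(height, group, location=0.0):
    """Assumed response process: endorsement depends only on within-group z,
    through a logistic curve centered at `location`."""
    z = relative_height(height, group)
    return 1.0 / (1.0 + math.exp(-(z - location)))


# Same absolute height, different groups: endorsement differs (absolute DIF).
for group in GROUPS:
    print(group, round(p_endorse_basketball(68.0, group), 3))

# Same relative position (z = 1) in each group: endorsement is identical.
for group, (mu, sigma) in GROUPS.items():
    print(group, round(p_endorse_basketball(mu + sigma, group), 3))
```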
 
---

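# Absolute DIF
`$$P(X_{is}=1|\theta_s, \delta_i, \gamma_i, G_s)=\frac{\exp(\theta_s-\delta_i+\gamma_i G_s)}{1+\exp(\theta_s-\delta_i+\gamma_i G_s)}$$`

where `$G_s$` is the group indicator for person `$s$` and `$\gamma_i$` is the DIF effect for item `$i$`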
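The model above can be evaluated directly. This minimal sketch treats `G` as a 0/1 group indicator, and the function name is illustrative. When `gamma` is 0 the item is absolutely invariant. Any nonzero `gamma` shifts the endorsement probability for one group at the same `theta`.

```python
import math


def rasch_dif_probability(theta, delta, gamma, group):
    """P(X=1 | theta, delta, gamma, G) = exp(theta - delta + gamma*G) / (1 + exp(...)).

    group is 0 (reference) or 1 (focal); gamma is the item's DIF effect.
    """
    eta = theta - delta + gamma * group
    return math.exp(eta) / (1.0 + math.exp(eta))


theta, delta = 0.0, 1.0
# gamma = 0: absolutely invariant, so group membership does not change P.
print(rasch_dif_probability(theta, delta, 0.0, 0), rasch_dif_probability(theta, delta, 0.0, 1))
# gamma = -1: at the same theta, the focal group is less likely to endorse.
print(rasch_dif_probability(theta, delta, -1.0, 0), rasch_dif_probability(theta, delta, -1.0, 1))
```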

---

class: middle
# Recovering Measurement via Relative Measurement
Relative Trait Value: `$z_i = \frac{t_i-\mu_{Tv}}{\sigma_{Tv}}$`  
  
Relative Difficulty: `$\delta_{i_{rel}} = \frac{\delta_{i_{abs}}-\mu_{T_v}}{\sigma_{T_v}}$`
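Under these definitions, suppose a person and an item are both standardized against the same group's distribution. Then the group mean cancels out of the Rasch logit, and only the group SD rescales it. The short derivation below assumes both use group `v`'s mean and SD. It writes `j` for the item (as in the invariance definition) to keep it distinct from the person index `i`:

`$$z_i-\delta_{j_{rel}}=\frac{t_i-\mu_{T_v}}{\sigma_{T_v}}-\frac{\delta_{j_{abs}}-\mu_{T_v}}{\sigma_{T_v}}=\frac{t_i-\delta_{j_{abs}}}{\sigma_{T_v}}$$`

So when `$\sigma_{T_v}=1$`, the relative logit for a given person and item equals the absolute logit.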
---

class: middle
# What about relative DIF?
 + If we accept the setup above, we have to accept the consequence:
 + Absolute invariance holds
 + Relative invariance does not
 + This will happen whenever the ability distributions differ between groups
 
---

# Our example

+ Two "groups" - each normally distributed (I know...)
+ Group 0 ~ `$\mathcal{N}(\mu = 0, \sigma^2=1)$` 
   
+ Group 1 ~ `$\mathcal{N}(\mu = 1, \sigma^2=1)$`
+ Relative abilities are found by standardizing each person's ability relative to their own group's distribution (Group 0 is already standardized)

+ Relative difficulties are found by standardizing item difficulties relative to each group's ability distribution
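This minimal simulation uses the two normal groups above and a single item with an absolute difficulty of 1 logit. Unlike the worked examples, it estimates each group's mean and SD from simulated abilities. The seed, the sample size, and the function names are arbitrary choices for illustration. The item's relative difficulty comes out near 1 for Group 0 and near 0 for Group 1.

```python
import random
import statistics


def standardize_within_group(abilities):
    """Relative ability: each value minus the group mean, divided by the group SD."""
    mu = statistics.mean(abilities)
    sigma = statistics.stdev(abilities)
    return [(a - mu) / sigma for a in abilities], mu, sigma


def relative_difficulty(delta_abs, mu, sigma):
    """Re-express an absolute item difficulty in one group's relative units."""
    return (delta_abs - mu) / sigma


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
groups = {
    0: [rng.gauss(0.0, 1.0) for _ in range(5000)],  # Group 0 ~ N(0, 1)
    1: [rng.gauss(1.0, 1.0) for _ in range(5000)],  # Group 1 ~ N(1, 1)
}

delta_abs = 1.0  # one item with the same absolute difficulty in both groups
for g, abilities in groups.items():
    z, mu, sigma = standardize_within_group(abilities)
    print(g, round(mu, 2), round(sigma, 2), round(relative_difficulty(delta_abs, mu, sigma), 2))
```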

---

class: middle
## Example: Group 0

.pull-left[Person ability = 0 logits  
Item difficulty = 1 logit  
Group ability mean = 0  
Group ability sd = 1]

.pull-right[
`$z_1 = \frac{0-0}{1} = 0$`    
`$\delta_{rel}=\frac{1-0}{1}=1$`    
`$p(X=1) = \frac{exp(0-1)}{1+exp(0-1)}$` 
=27% chance of endorsing the item]

---
class: middle

## Example: Group 1

.pull-left[Person ability = 0 logits  
Item difficulty = 1 logit  
Group ability mean = 1  
Group ability sd = 1]
.pull-right[
`$z_1 = \frac{0-1}{1} = -1$`

`$\delta_{rel}=\frac{1-1}{1}=0$`

`$p(X=1) = \frac{exp(-1-0)}{1+exp(-1-0)}$` = 27% chance of endorsing the item]
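The two worked examples can be reproduced in a few lines using the relative-trait and relative-difficulty formulas. The helper names are illustrative. For a person at 0 logits, z minus the relative difficulty is -1 in both groups, so both give exp(-1)/(1+exp(-1)), about 0.27. The groups differ (about 0.27 vs. 0.50) only when you compare people at the same relative position, z = 0.

```python
import math


def p_endorse(z, delta_rel):
    """Rasch-type endorsement probability on the relative scale: logistic(z - delta_rel)."""
    return math.exp(z - delta_rel) / (1.0 + math.exp(z - delta_rel))


def relative_example(t, delta_abs, mu, sigma):
    """Return (relative trait, relative difficulty, endorsement probability)."""
    z = (t - mu) / sigma
    delta_rel = (delta_abs - mu) / sigma
    return z, delta_rel, p_endorse(z, delta_rel)


# Person ability 0 logits, item difficulty 1 logit, group SD 1.
print(relative_example(0.0, 1.0, mu=0.0, sigma=1.0))  # Group 0
print(relative_example(0.0, 1.0, mu=1.0, sigma=1.0))  # Group 1

# Same relative position (z = 0, i.e., ability at the group mean) in each group.
print(relative_example(0.0, 1.0, mu=0.0, sigma=1.0)[2], relative_example(1.0, 1.0, mu=1.0, sigma=1.0)[2])
```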
---
class: middle 
# Relative Invariance
`$F(X_{ij}=x_{ij}|W=w_i, V_i=v_i) = F(X_{ij}=x_{ij}|W=w_i)$`

Non-invariance occurs when:

`$F(X_{ij}=x_{ij}|W=w_i, V_i=v_i) \neq F(X_{ij}=x_{ij}|W=w_i)$`

where `W` is the relative (within-group) position
---

class: middle
# We've effectively switched units 
 + A known issue in IRT is that the model is not identified without an imposed constraint (for instance, fixing the mean ability or the mean item difficulty at 0)
 
 + In relative units, the same item (absolute difficulty 1 logit) sits 1 SD above the mean of group 0 but at the mean of group 1: its relative difficulty is 1 for group 0 and 0 for group 1
 
 + How do we feel about this?

---
class: middle
# Not an unheard of problem

+ "Value of the dollar"
+ Defined as the number of falafel/pita sandwiches one can purchase with $1

+ Santa Barbara: 1/8th of a sandwich
+ Cairo, Egypt: 3 sandwiches

+ If I want to compare the wealth of a person in Santa Barbara with the wealth of a person in Egypt, what would I do?
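The falafel measure is a change of units. This small sketch re-expresses a hypothetical amount of money in sandwiches, using the two rates on the slide. The function name and the 40-dollar figure are illustrative.

```python
# Sandwiches per dollar, taken from the slide's example.
SANDWICHES_PER_DOLLAR = {"Santa Barbara": 1 / 8, "Cairo": 3.0}


def wealth_in_sandwiches(dollars, place):
    """Re-express a dollar amount in local purchasing units (falafel sandwiches)."""
    return dollars * SANDWICHES_PER_DOLLAR[place]


# The same hypothetical 40 dollars buys very different amounts in the two places.
for place in SANDWICHES_PER_DOLLAR:
    print(place, wealth_in_sandwiches(40, place))
```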
---
# A unit problem

+ In the falafel sandwich example: there's an anchor - the falafel and the country
  + The multidimensionality comes from issues related to the country
  + Can "marginalize out conceptually" by standardizing a person's wealth relative to others in the country
  
+ In the IRT example, what's the anchor? The individual set of items?

+ Group ability distribution is unknown without the items

+ Analysis requires an item that doesn't have absolute invariance

---
class: middle

# Why is this important to me?

### Group Membership
  
"There is nothing outside the text" - Derrida, 1976, p. 158; ***Of Grammatology***
  
"There is no outside-text" (arguably a more literal rendering of the French, "il n’y a pas de hors-texte")

---

# Anything that relies on self-report

+ People are going to compare themselves to their experiences

+ "I may consider myself depressed compared to people around me, but what about compared to other depressed people?" - I'll likely answer in a way that considers this comparison

+ If there is any sort of cutoff score, you have a lower and a higher distribution, so you have to consider relative DIF
---
class: middle
# For instance: 
+ What if you obtain responses from two different cultures?

+ Person from culture A answers thinking relative to their own culture

+ Person from culture B answers thinking relative to their own culture

+ Culture A is "higher" on the latent trait than culture B

+ You would only find this through relative DIF analysis!
  
---
class: middle
# But what about group membership? 
+ If we think of DIF as multidimensionality (except within groups)

+ If we loop back to the very beginning, having the wrong group membership could lead to biased estimates.

+ So relative measurement assumes we have homogeneous groups!

+ Or said another way, the dimensional structure must be the same within an "identified" group...
---

+ Let's say there's an assessment of reading strategy use; we'll call it the SUM (Strategy Use Measure)

--
+ Intended to be multidimensional but treated unidimensionally

--
+ The assessment has a section testing knowledge of English-Spanish Cognates 
  
--

+ While the assessment has 157 items, I've cherry-picked a set of items that exhibit DIF (for didactic purposes)  
  
--

+ Note - this is NOT a fair characterization of the assessment  
---

# I run a Rasch model and find evidence that some items have DIF

(I've selected 9 items, some of which show absolute DIF)

+ Relative measurement not ruled out

+ I need to test whether I can measure within my "identified" groups

+ Heritage Spanish speakers vs. non-heritage Spanish speakers
  
---

# Strategy: trialing a method

+ Use a clustering method (for instance, latent class analysis)  
  
--
+ Predict cluster/class membership by subgroup 
  
--

+ Is there a cluster/class group members are most likely to be part of? Is it as expected based on theory?
  
--

+ Compare the cluster/distribution/class shape most like my view of a "group" to the observed item response proportions

--
+ Re-estimate the cluster with just the heritage Spanish speakers and just the non-heritage Spanish speakers (English speakers)  
  
---
class: middle

# Key point

**For this to work, I need to have a response process theory in mind**

- "Those who read like Spanish Speakers" and "Those who do not"
  
---

## Key items
![Qcog3](QCOG3.jpg)

![Qcog20](QCOG20.jpg)
---

+ 328 heritage Spanish Speakers

+ 999 non-heritage Spanish Speakers

+ Used Latent Class Analysis
  + Exploratory Method
  + Effectively "finds" groups in the data - distributions that are alike
  + Accept or reject based on fit criteria

+ Categorical Latent Variable 
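Latent class analysis treats class membership as a categorical latent variable. For binary items it is commonly fit by expectation-maximization (EM). The pure-Python sketch below is minimal and assumes local independence of items within a class. It is not the software or the settings used for this analysis, which the slides do not specify. The simulated data and the function name are hypothetical. Fit criteria (for example, BIC across different numbers of classes) would be computed from the final log-likelihood.

```python
import math
import random


def lca_em(data, n_classes, n_iter=100, seed=0):
    """Minimal latent class analysis for binary items, fit by EM.

    Assumes local independence: within a class, items are independent Bernoulli
    variables. Returns class weights, per-class endorsement probabilities, and
    the log-likelihood recorded at each iteration (non-decreasing under EM).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting endorsement probabilities, away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(n_classes)]
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent.
        posteriors = []
        loglik = 0.0
        for row in data:
            joint = []
            for k in range(n_classes):
                lik = weights[k]
                for x, p in zip(row, probs[k]):
                    lik *= p if x else 1.0 - p
                joint.append(lik)
            total = sum(joint)
            loglik += math.log(total)
            posteriors.append([j / total for j in joint])
        history.append(loglik)
        # M-step: re-estimate class sizes and item probabilities.
        for k in range(n_classes):
            nk = max(sum(post[k] for post in posteriors), 1e-12)
            weights[k] = nk / len(data)
            for j in range(n_items):
                hits = sum(post[k] * row[j] for post, row in zip(posteriors, data))
                # Keep probabilities strictly inside (0, 1) so logs stay finite.
                probs[k][j] = min(max(hits / nk, 1e-6), 1.0 - 1e-6)
    return weights, probs, history


# Hypothetical data: two response profiles on six binary items (not the SUM data).
rng = random.Random(1)
true_profiles = [[0.9, 0.9, 0.8, 0.2, 0.1, 0.2], [0.2, 0.1, 0.2, 0.8, 0.9, 0.9]]
data = []
for _ in range(400):
    profile = true_profiles[0] if rng.random() < 0.3 else true_profiles[1]
    data.append([1 if rng.random() < p else 0 for p in profile])

weights, probs, history = lca_em(data, n_classes=2)
print([round(w, 2) for w in weights])
print([[round(p, 2) for p in row] for row in probs])
```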
---

---

![sem](semplot.png)

---
class: middle

## Methods
+ Probability of being in a given class given self-identified heritage language

+ Multinomial logistic regression

+ Reference Outcome: Class 3 - the "Spanish reading profile"

+ Reference of the Language Predictor: non-heritage Spanish Speakers
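With one binary predictor (heritage vs. non-heritage), the multinomial logistic model is saturated. Each slope therefore equals an observed log odds ratio against the reference class. The sketch below uses hypothetical counts, not the study's data, and the function name and count layout are illustrative. Fitting the same model in a statistics package should give the same slopes for these counts.

```python
import math


def multinomial_logit_coefs(counts, ref_class, ref_group):
    """Slopes of a multinomial logit with a single binary predictor.

    counts[group][cls] = number of respondents. Because the model is saturated,
    each maximum-likelihood slope equals the observed log odds ratio
    log[(n[other][k] / n[other][ref]) / (n[ref_group][k] / n[ref_group][ref])].
    """
    other_group = next(g for g in counts if g != ref_group)
    coefs = {}
    for cls in counts[ref_group]:
        if cls == ref_class:
            continue
        odds_other = counts[other_group][cls] / counts[other_group][ref_class]
        odds_ref = counts[ref_group][cls] / counts[ref_group][ref_class]
        coefs[cls] = math.log(odds_other / odds_ref)
    return coefs


# Hypothetical counts (NOT the study's data): language group -> {class: count}.
counts = {
    "non-heritage": {1: 100, 2: 100, 3: 50},
    "heritage": {1: 10, 2: 10, 3: 50},
}
print(multinomial_logit_coefs(counts, ref_class=3, ref_group="non-heritage"))
```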
---
class: middle
## Results

+ Compared to non-heritage Spanish speakers, those who identify as heritage Spanish speakers have log-odds of being in class 1 or class 2 (versus reference class 3) that are 3 logits lower.

+ Flipping the reference classes and language groups around:
  + Students who speak Spanish at home have log-odds of being in reference class 3 (the Spanish reading profile), versus any other class, that are 3 logits higher than those of students who do not speak Spanish at home
  
+ 3 logits ~ 95%
+ -3 logits ~ 5%
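The percentages above are the inverse logit of plus and minus 3. This is a small sketch with an illustrative helper name. A multinomial-logit coefficient is a change in log-odds, so exp(3), about 20, is an odds ratio. The probability implied by a 3-logit shift also depends on the baseline log-odds it is added to.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


for logit in (3.0, -3.0):
    # Columns: log-odds, implied probability (from a baseline of 0), odds ratio.
    print(logit, round(inv_logit(logit), 3), round(math.exp(logit), 2))
```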
---
class: middle
## That's promising for relative measurement
+ But let's think about that - it's not 100%  
  
--
+ What do the profiles look like when I run the LCA on each group separately?
  + Model without heritage Spanish speakers
  + Model with only heritage Spanish speakers

---
<img src="DIF_DIF_presi_files/figure-html/unnamed-chunk-12-1.png" width="720" style="display: block; margin: auto;" />
---
<img src="DIF_DIF_presi_files/figure-html/unnamed-chunk-13-1.png" width="720" style="display: block; margin: auto;" />
---
# Final Non-Spanish-Speaker Model Compared to the Full Final Model
.pull-left[
![](DIF_DIF_presi_files/figure-html/unnamed-chunk-14-1.png)
]

## Spanish Speakers Only Compared to the Final Model
.pull-left[
![](DIF_DIF_presi_files/figure-html/unnamed-chunk-16-1.png)
]