Anchoring Vignettes and the Compound Hierarchical Ordered Probit (CHOPIT) Model

Introduction

Self-assessments are a common survey tool, but they can be difficult to analyze because of bias arising from systematic variation in individual reporting styles, known as reporting heterogeneity.

Anchoring vignette questions, combined with the Compound Hierarchical Ordered Probit (CHOPIT) model, allow researchers to address this issue in survey data (King et al. 2004).

This methodology is based on two key identifying assumptions:

• Response consistency (RC)
• Vignette equivalence (VE)

In today's blog we look more closely at the fundamental pieces of this modeling technique, including the:

• Typical data set up.
• Hierarchical Ordered Probit Model (HOPIT).
• Anchoring vignettes.
• Likelihood and identifying assumptions used for estimation.

In addition, we discuss a new test for evaluating the identifying assumptions, introduced by Greene et al. (2020).

Typical Data

In addition to ordinal self-assessment responses, the CHOPIT model requires data from a set of vignette questions. Both sets of questions are asked on the same ordinal scale and filled out by each respondent.

The vignette questions describe hypothetical situations. They provide an anchor and create a common scale that allows you to compare self-assessments across survey participants.

As an example, consider the Survey of Health, Ageing and Retirement in Europe (SHARE), described in Greene et al. (2020). For a self-assessment of pain, respondents were asked:

SHARE Self-Assessment of Pain
Overall in the last 30 days, how much of bodily aches or pains did you have?
□  None    □  Mild    □  Moderate    □  Severe    □  Extreme

There were also several vignette questions for pain (corresponding to different levels of severity) using the same response categories. For example:

SHARE Pain Vignette
Karen has a headache once a month that is relieved after taking a pill. During the headache, she can carry on with her day-to-day affairs. Overall in the last 30 days, how much of bodily aches or pains did Karen have?
□  None    □  Mild    □  Moderate    □  Severe    □  Extreme

Differential Item Functioning (DIF)

Anchoring vignettes are motivated by the issues arising from the fact that two individuals with identical levels of true underlying pain may apply different response scales when answering the self-assessment question. This can lead to reporting different answers, known as Differential Item Functioning (DIF).

Figure 1: Example of DIF in self-assessed pain

In Figure 1:

• The vertical line represents the underlying latent scale for pain.
• The latent scale is continuous, with pain increasing as we move down the line.
• There are 5 categories available for self-reporting pain outcome. These categories require 4 cutoff-points: $\mu_0, \mu_1, \ldots, \mu_3$.

The issue of DIF is demonstrated by:

• The different locations of the individual-specific boundary parameters that identify the cutoff-points.
• The location of these boundary parameters dictates which response box is ticked.
• Both individuals have identical levels of latent pain, but Respondent B reports mild pain while Respondent A reports no pain.
• Without knowledge of the locations of the boundary parameters, researchers would wrongly conclude that B has worse pain than A.

The availability of vignette questions, which all respondents assess, allows us to anchor self-assessments to individual-specific response scales and hence make DIF-adjusted comparisons.
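To make the idea concrete, here is a minimal Python sketch of DIF and of how a shared vignette can reveal it. The cutoff values, the latent pain levels, and the helper `report` are all hypothetical choices for illustration; they are not taken from SHARE or from Figure 1.

```python
from bisect import bisect_right

LABELS = ["None", "Mild", "Moderate", "Severe", "Extreme"]

def report(latent, cutoffs):
    """Return the label of the interval [mu_{j-1}, mu_j) that contains `latent`."""
    return LABELS[bisect_right(cutoffs, latent)]

# Hypothetical individual-specific cutoffs (illustrative values only).
cutoffs_a = [1.0, 2.0, 3.0, 4.0]
cutoffs_b = [0.2, 1.5, 2.5, 3.5]

own_pain = 0.6       # identical latent pain for both respondents
vignette_pain = 0.5  # latent pain described in the vignette, the same for everyone

for name, cutoffs in [("A", cutoffs_a), ("B", cutoffs_b)]:
    print(name,
          "| self:", report(own_pain, cutoffs),
          "| vignette:", report(vignette_pain, cutoffs))
# A | self: None | vignette: None
# B | self: Mild | vignette: Mild
```

Respondent B rates the same vignette more severely than A does, which signals that the gap in their self-reports reflects different response scales rather than different underlying pain.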

In the rest of this blog, we show how to conduct such an analysis, as well as how to test the underlying assumptions of RC and VE.

Methodology

To begin, let's note some key characteristics of our data:

• We have ordered categorical responses, $j = 0, \ldots, J-1$, on our dependent variable.
• These will typically be on Likert-type scales. This isn't required, but it is the most common case.
• $J$, the number of response categories, is typically small (< 10).

The Ordered Probit Model

Data of this kind lend themselves nicely to the Ordered Probit (OP) model (Greene & Hensher 2010).

In the ordered probit model, there is an underlying latent variable

$$y^* = \tilde{x}'\tilde{\beta} + \epsilon_y$$

where

• $\epsilon_y$ is a standard normal disturbance term, and
• all $\tilde{x}$ are observed, with no constant term (though a constant term can be easily accommodated).

The underlying latent variable translates into the actual responses via the mapping

$$y = j \quad \text{if} \quad \mu_{j-1} \leq y^* < \mu_j \quad \text{for } j = 0, \ldots, J-1$$

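where $\mu_{-1} = -\infty$ and $\mu_{J-1} = +\infty$. To ensure well-defined probabilities, we require $\mu_{j-1} < \mu_j \; \forall j$. For example, consider the following boundary values for a five-category outcome. For readability, the categories in this example are numbered 1 to 5, so the index is shifted by one relative to the notation above ($\mu_0 = -\infty$ and $\mu_5 = \infty$):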

| $j$ | $\mu_j$ |
|---|---|
| 0 | $-\infty$ |
| 1 | -0.681 |
| 2 | -0.107 |
| 3 | 0.278 |
| 4 | 1.309 |
| 5 | $\infty$ |

Based on these conditions, we can map the latent variable, $y^*$, to the observed response, $y$:

| $y^*$ | $y$ |
|---|---|
| -0.791 | 1 |
| -0.078 | 3 |
| -0.341 | 2 |
| 0.578 | 4 |
| 1.67 | 5 |
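The mapping in this example can be reproduced with a few lines of Python. This is only a sketch: the function name `to_category` is an illustrative choice, and the categories are numbered 1 to 5 as in the example tables.

```python
from bisect import bisect_right

# Finite cutoffs mu_1, ..., mu_4 from the example (mu_0 = -inf, mu_5 = +inf).
cutoffs = [-0.681, -0.107, 0.278, 1.309]

def to_category(y_star, cutoffs):
    """Return j such that mu_{j-1} <= y_star < mu_j, with categories numbered from 1."""
    # bisect_right counts the cutoffs that are <= y_star.
    return bisect_right(cutoffs, y_star) + 1

latent_values = [-0.791, -0.078, -0.341, 0.578, 1.67]
categories = [to_category(v, cutoffs) for v in latent_values]
print(categories)  # [1, 3, 2, 4, 5]
```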

The ordered probit model has standard probabilities and likelihood functions, which we won't cover in detail here; see, for example, Greene & Hensher (2010).
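For orientation only, the category probabilities take the standard form $P(y = j) = \Phi(\mu_j - \tilde{x}'\tilde{\beta}) - \Phi(\mu_{j-1} - \tilde{x}'\tilde{\beta})$, where $\Phi$ is the standard normal CDF. The sketch below evaluates these probabilities for the example cutoffs above. The index value of 0.25 is a hypothetical number chosen for illustration, and the function names are not from any particular library.

```python
import math

def norm_cdf(t):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def op_probs(index, cutoffs):
    """Ordered probit probabilities for each category, given the index x'beta."""
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    cdf = [norm_cdf(b - index) for b in bounds]
    # P(y = j) = Phi(mu_j - x'beta) - Phi(mu_{j-1} - x'beta)
    return [cdf[j + 1] - cdf[j] for j in range(len(bounds) - 1)]

cutoffs = [-0.681, -0.107, 0.278, 1.309]
probs = op_probs(0.25, cutoffs)  # 0.25 is a hypothetical value of x'beta

for j, p in enumerate(probs, start=1):
    print(f"P(y = {j}) = {p:.3f}")
```

The probabilities are positive and sum to one only because the cutoffs are strictly increasing, which is why the ordering condition matters.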

Hierarchical Ordered Probit Model (HOPIT)

The hierarchical ordered probit modifies the ordered probit model in two key ways:

• It allows for individual-specific boundary parameters, $\mu_{i,j}$, which in turn allows for different reporting scales across individuals.
• The boundary parameters are specified to be functions of a set of observed characteristics $z_i$: $$\mu_{i,j} = f(z_i'\gamma_j)$$

Boundary specification options

A key consideration of the HOPIT model is how to relate the boundary parameters to observed characteristics. We will allow for two options for the boundary specification:

The linear specification: $$\mu_{i,j} = z_i'\gamma_j$$

• Simple to implement.
• Potentially problematic as the requisite ordering of the boundary parameters is not guaranteed, and negative probabilities could result.

The exponential specification: $$\mu_{i,0} = z_i'\gamma_0$$ $$\mu_{i,j} = \mu_{i,j-1} + \exp(z_i'\gamma_j) \text{ for } j \geq 1$$

• More complex to implement.
• Builds on the previous boundaries.
• Ensures that the appropriate ordering is maintained.
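The difference between the two specifications can be illustrated with a short Python sketch. The covariate vector `z` and the coefficient vectors in `gammas` are hypothetical values picked so that the linear form breaks the ordering. Note that the same coefficients play different roles in the two forms, so the resulting boundaries are not meant to be comparable in size.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_boundaries(z, gammas):
    """mu_{i,j} = z_i' gamma_j for every boundary j."""
    return [dot(z, g) for g in gammas]

def exponential_boundaries(z, gammas):
    """mu_{i,0} = z_i' gamma_0 and mu_{i,j} = mu_{i,j-1} + exp(z_i' gamma_j)."""
    mu = [dot(z, gammas[0])]
    for g in gammas[1:]:
        mu.append(mu[-1] + math.exp(dot(z, g)))
    return mu

def is_increasing(mu):
    return all(a < b for a, b in zip(mu, mu[1:]))

# Hypothetical respondent: a constant and one covariate.
z = [1.0, 2.0]
# Hypothetical coefficients for the four boundaries of a five-category scale.
gammas = [[-0.5, 0.1], [0.4, -0.5], [0.2, 0.1], [1.0, 0.2]]

mu_lin = linear_boundaries(z, gammas)
mu_exp = exponential_boundaries(z, gammas)
print("linear     :", [round(m, 3) for m in mu_lin], "ordered:", is_increasing(mu_lin))
print("exponential:", [round(m, 3) for m in mu_exp], "ordered:", is_increasing(mu_exp))
```

Because each exponential increment is strictly positive, the exponential boundaries are increasing for any coefficient values, whereas the linear boundaries depend on the coefficients and covariates happening to line up.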

Anchoring Vignettes

Due to the linear specification of $\mu_{i,0}$ (for both boundary specifications above), unless other restrictions are imposed, $\gamma_0$ and $\tilde{\beta}$ are not separately identified for variables common to both $x$ and $z$.

However, in practice, it can be difficult to determine which variables should be in $x$ and which in $z$. In these circumstances, additional (external) information is required to identify the parameters of the model.

This is where the availability of vignettes in sample surveys plays a crucial role!

Say we have $k = 1, \ldots, K$ vignettes that relate to the self-assessment of interest, where each vignette:

• Has the same available responses as the self-assessment.
• Is determined by the same Generalized Ordered Probit model (as the self-assessment).
• Has a constant term in the outcome equation: $v^*_k = \alpha_k + \tilde{x}'\tilde{\beta}_k + \epsilon_k$
• Has exactly the same boundary specification, parameters and covariates as the main equation of interest.

The introduction of information from the responses to the vignettes, together with the restrictions imposed by the response consistency and vignette equivalence assumptions (see below), allows identification of all the model parameters, even when the variables in $x$ and $z$ are identical.

Note that the vignette equivalence restriction imposes the requirement that $\tilde{\beta}_k = 0$.

Likelihood and Identifying Assumptions

The log-likelihood combines the contribution from the HOPIT component of the self-assessment ($\ln L_{HOPIT}$) with those from the $K$ vignette equations ($\ln L_{V,k}$):

$$\ln L = \ln L_{HOPIT} + \sum_k \ln L_{V,k}$$

The components of the likelihood function are linked via the common boundary parameters ($\gamma$). Due to the "compound" nature of the several HOPIT models here, this combined approach is often called the Compound HOPIT model (CHOPIT).

The approach relies on two assumptions:

1. Response consistency (RC): that the boundary parameters are equivalent across the self-assessment and vignette equations.
2. Vignette equivalence (VE): that the non-boundary components of the vignette equations are driven only by a constant term and random error.

In this CHOPIT model it is now possible to estimate the scale of the vignette equations (remember, the scale is normalized to 1 in the self-assessment of interest), either individually for each vignette or constrained to be equivalent across all vignettes. The latter tends to be the norm when estimating the model.
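The structure of the compound likelihood can be sketched for a single respondent as follows. This is an illustration under stated assumptions, not the estimator from any particular package: it uses the exponential boundary specification, categories numbered 0 to J-1, one common vignette scale `sigma_v`, and hypothetical data and parameter values. All function and variable names are invented for the example.

```python
import math

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def boundaries(z, gammas):
    """Exponential specification of the individual-specific boundaries."""
    mu = [dot(z, gammas[0])]
    for g in gammas[1:]:
        mu.append(mu[-1] + math.exp(dot(z, g)))
    return mu

def log_prob(response, index, mu, scale=1.0):
    """Log probability of an observed category given the latent mean `index`."""
    bounds = [-math.inf] + mu + [math.inf]
    upper = norm_cdf((bounds[response + 1] - index) / scale)
    lower = norm_cdf((bounds[response] - index) / scale)
    return math.log(upper - lower)

def chopit_loglik_i(y, v, x, z, beta, gammas, alphas, sigma_v=1.0):
    """Log-likelihood contribution of one respondent."""
    # RC: the same boundaries are used for the self-assessment and every vignette.
    mu = boundaries(z, gammas)
    # HOPIT component for the self-assessment (scale normalized to 1).
    ll = log_prob(y, dot(x, beta), mu)
    # VE: each vignette equation has only a constant alpha_k in its index.
    for v_k, alpha_k in zip(v, alphas):
        ll += log_prob(v_k, alpha_k, mu, sigma_v)
    return ll

# Hypothetical respondent and parameter values, for illustration only.
x, beta = [0.5, 1.0], [0.3, -0.2]
z = [1.0, 0.5]
gammas = [[-0.8, 0.2], [-0.5, 0.1], [-0.3, 0.0], [0.0, 0.1]]
alphas = [-0.4, 1.0]   # one constant per vignette
y, v = 2, [1, 3]       # self-assessment and two vignette responses

ll = chopit_loglik_i(y, v, x, z, beta, gammas, alphas)
print(round(ll, 4))
```

Summing this contribution over respondents gives the full log-likelihood, which would then be passed to a numerical optimizer.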

Testing the Identifying Assumptions of RC and VE

Greene et al. (2020) suggest two amendments to the CHOPIT model that allow a test for RC and VE:

• Specify the first boundary equation as $\mu_{0,j} = \gamma_{0,k} + \exp(\tilde{z}\tilde{\gamma}_{0,k})$, with the remaining boundaries following the exponential specification given earlier.
• Normalise to 1 the scale of the vignette equations.

This amended specification:

• Lends itself to separate Lagrange-multiplier (or score) tests of RC and VE, together with a joint test.
• Has minimal effect on the estimated parameters, $\tilde{\beta}$, in the outcome equation for the self-assessments compared to the usual HOPIT model.

The tests are not only informative of the identifying assumptions underlying the CHOPIT model but may be used to select appropriate vignettes or combinations thereof where multiple vignettes are available.

Conclusion

In today's blog, we looked more closely at addressing differing reporting scales in self-assessment survey questions, including:

• Typical data set up.
• The Hierarchical Ordered Probit Model (HOPIT).
• Anchoring vignettes.
• A new test for evaluating the RC and VE assumptions of the model.

References

Greene, W., Harris, M. N., Knott, R. & Rice, N. (2020), Specification and testing of hierarchical ordered response models with anchoring vignettes, Journal of the Royal Statistical Society Series A, forthcoming.

Greene, W. & Hensher, D. (2010), Modeling Ordered Choices, Cambridge University Press, Cambridge.

King, G., Murray, C., Salomon, J. & Tandon, A. (2004), Enhancing the validity and cross-cultural comparability of measurement in survey research, American Political Science Review 98(1), 191-207.
