Actual Effort in Cognitive Tasks
Can Item Response Theory help with operationalisation?
In my article titled “What is (perception of) effort? Objective and subjective effort during attempted task performance” I offer clear conceptual definitions of both actual effort (objective) and perception of effort (subjective). Clear conceptual definitions are key for determining whether a given operationalisation of those definitions (i.e., our ways of defining those variables in the context of our research) adequately meets the necessary and sufficient conditions for the theoretical unit of interest.
In this post I will attempt to provide a solution to a problem I have thought about for a while; namely, how best to operationalise actual effort in cognitive tasks. As I will explain, unlike many physical tasks where operationalisation is fairly trivial, it is not so simple for tasks where neither the underlying capacity that disposes an individual to attempt, and perhaps complete, the task nor the demands that the task presents are directly observable.
However, I think that a solution might lie in the measurement theory employed in psychometrics known as Item Response Theory (IRT). Some of its key assumptions, and the parameters of its models, map conceptually well onto those underlying my definition of actual effort. First though, let’s review my definition for context.
Conceptual definition of actual effort
I define actual effort as follows:
“Effort; noun; That which must be done in attempting to meet a particular task demand, or set of task demands, and which is determined by the current task demands relative to capacity to meet those demands, though cannot exceed that current capacity.”
And more specifically following Markus’1 Set Theoretical approach as2:
“Effort (concept);
$E_{jt}$ is the actual effort for any individual $j$ at time $t$, where $C$ and $D$ are the actual capacity and actual demands respectively, and $c_{jt}$ and $d_{jt}$ are the magnitudes of those respectively for individual $j$ at time $t$, where $\Omega$ denotes all possible states of affairs (i.e. combinations of $C$, $D$, $c_{jt}$ and $d_{jt}$), and $B$ denotes the boundary conditions noting it as intensional to all possible types of tasks.”
And which is expressed as a derived ratio, given that capacity and demands have natural origins (capacity can be zero, as can demands, but neither can be less than zero):

$$E_{jt} = \frac{d_{jt}}{c_{jt}} \times 100$$

Where the ratio is expressed as a percentage (%).
Now, the roles of differential demands and capacity are easiest to see in physical tasks. As I note in the paper:
“In a physical task the role of differential demands and capacity are easily considered in that actual effort is determined by the task demands relative to the current capacity to meet task demands. As such, if two individuals were attempting to pick up the same specific absolute load (e.g. 80 kg) the stronger of the two would initially require less actual effort to complete this task. If they had both performed prior tasks that had resulted in a reduction in their maximal strength, then each would require a greater actual effort to complete the task than compared with when their capacity was not reduced. And further, if both continued performing repetitions of this task their maximal strength might continue to reduce insidious to continued attempts to maintain a particular absolute demand, and thus require an increasingly greater actual effort with every individual or continued attempt to meet the task demands. Correspondingly, if the absolute task demands were increased then both individuals would also require greater actual effort to complete the task. Yet for both, the continued attempted performance of the task with fixed absolute demands and insidious reduction of capacity or the increase of absolute demands, task performance would be capped by their maximum capacity at which maximum effort is required. With training though that maximum strength might be increased such that a given absolute task demand now represents relatively less and so requires less actual effort. Further, biomechanical alterations to the task might reduce the absolute demands and thus the actual effort.”
So, let’s say an individual has a current capacity (maximal strength) of 100 kg and attempts to lift a load (demand) of 80 kg:

$$E = \frac{80\ \text{kg}}{100\ \text{kg}} \times 100 = 80\%$$

So, the amount of actual effort required by the individual to lift the load is 80%. Nice and simple.
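In code, this calculation is trivial (a hypothetical helper for illustration, not something from the original post):

# actual effort = demands relative to capacity, as a percentage, capped at 100%
actual_effort <- function(demands, capacity) {
  pmin(100 * demands / capacity, 100)
}

actual_effort(demands = 80, capacity = 100)
## [1] 80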
But what about cognitive tasks? Sure, we can conceivably apply my definition to them if we assume that such tasks present demands that must be met, and that we have some capacity to meet them. In fact, we could draw examples similar to those above… again, here’s what I note:
“Similar examples could be provided for cognitive tasks. For example, if two individuals were attempting to hold a fixed number of items in their working memory, the one who has the larger working memory of the two would require less actual effort to complete this task. However, both individuals would again require greater actual effort to do so in the presence of lingering reduction in cognitive capacity from prior tasks, or from continued attempts to meet the task demands, or from increased absolute task demands (i.e., more items to be held in working memory). Again, training may also improve maximal capacity. Also, cognitive processing alterations (i.e., heuristics; Shah and Oppenheimer, 2008) might reduce task demands and thus the actual effort.”
The problem, however, is actually measuring the capacity being used to perform cognitive tasks, and the demands of those tasks. It’s not as simple as with, say, strength or the load lifted. We have an operationalisation problem for cognitive tasks.
But, as noted, I think the trick to this problem might lie in IRT.
Item Response Theory
For those unfamiliar with IRT, I’ll provide a very brief overview of some key elements that are relevant for this post. But otherwise there are plenty of great texts out there covering its background and history, differences with Classical Test Theory, assumptions, different model types and parameters, how these are estimated, model fit etc3.
Let’s suppose that, for each person we test, we record whether or not they respond correctly to each item in a test.
A key aspect of IRT is that, as a theory, it posits a link between some construct that is a characteristic of an individual, referred to as a trait or ability, and that individual’s performance on a test of that ability: performance is predicted, or explained, by the ability. However, we can’t directly observe this ability itself and instead must infer an estimate of it from observed performance on the test. For this reason the abilities are often referred to as latent. The relationship between the “observable” and the “unobservable” is then described by a mathematical function; these functions are models that make specific assumptions about the test data, and different models imply different assumptions one is willing to make given the nature of the test conducted. For example, a model that has recently become popular due to its flexibility is the four-parameter logistic model (4PL), where the probability of person $j$ responding correctly to item $i$ is:

$$P(X_{ij} = 1 \mid \theta_j) = c_i + (d_i - c_i)\,\frac{1}{1 + e^{-a_i(\theta_j - b_i)}}$$
In this model there are four key parameters, as the name suggests, which reflect the assumptions about the test data. The $b_i$ parameter is the item difficulty4, the point on the ability scale at which the probability of a correct response is midway between the lower and upper asymptotes; $a_i$ is the item discrimination, governing how sharply that probability changes with ability around $b_i$; $c_i$ is the lower asymptote, reflecting the probability of a correct response from guessing alone; and $d_i$ is the upper asymptote, which allows for lapses such that even individuals with very high ability may not always respond correctly.
An Item Characteristic Curve (ICC) is usually used to visualise the relationship between ability and the probability of a correct response to items. So for example, a 4PL model might look something like:
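As a rough illustration of how such curves can be drawn, here is a minimal sketch (the item parameter values are hypothetical, and this is not the post’s original figure code):

library(dplyr)
library(tidyr)
library(ggplot2)

# 4PL item characteristic curve: probability of a correct response given
# ability theta, discrimination a, difficulty b, lower asymptote c (guessing)
# and upper asymptote d (lapsing)
p_4pl <- function(theta, a, b, c, d) {
  c + (d - c) / (1 + exp(-a * (theta - b)))
}

# a few hypothetical items
items <- tibble(
  item = factor(1:3),
  a = c(1.0, 1.5, 0.8),
  b = c(-1.0, 0.0, 1.5),
  c = c(0.00, 0.10, 0.20),
  d = c(1.00, 0.98, 0.95)
)

expand_grid(items, theta = seq(-4, 4, by = 0.1)) |>
  mutate(p = p_4pl(theta, a, b, c, d)) |>
  ggplot(aes(x = theta, y = p, colour = item)) +
  geom_line() +
  labs(x = "Ability (theta)", y = "P(correct response)")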
Different models make different assumptions about these parameters. For example, the simplest one-parameter logistic model (1PL or Rasch model) assumes that all items discriminate equally (fixing $a_i$ to a constant), that there is no guessing ($c_i = 0$) and no lapsing ($d_i = 1$), leaving only the item difficulties $b_i$ and person abilities $\theta_j$ to be estimated.
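Under those assumptions, the 4PL above reduces to:

$$P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + e^{-(\theta_j - b_i)}}$$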
In general though, the application of an IRT model allows test data to be decomposed into an estimate of the characteristic of the individual (i.e., their ability, $\theta_j$) and estimates of the characteristics of the items (e.g., their difficulty, $b_i$).
That’s about as detailed as I am going to get for the purpose of this post. The point I want to make is more conceptual… or at least, it is more about exploring whether or not we can use IRT models as a means of operationalising actual effort in cognitive tasks.
Ability = Capacity ($\theta$); Difficulty = Demands ($b$)
In essence, the argument I am putting forward here is that the two primitives in my conceptual definition of effort, capacity and demands, map onto the two core parameters of IRT models: a person’s latent ability ($\theta$) is their capacity to meet the demands of the test items, and an item’s difficulty ($b$) reflects the demands that item presents.
So I think that we can use IRT models in order to estimate these parameters and then use them to calculate an estimate of the actual effort required by a given individual when attempting a given item, as the ratio of demands to capacity.
However, those who are familiar with typical IRT models might notice a problem here for my proposed solution to operationalisation; the ability and difficulty parameters are usually estimated on an unbounded logit scale with an arbitrary origin, so they can take negative values and lack the natural zero origins that my definition requires of capacity and demands for their ratio to be meaningful.
Fortunately, the choice of which scale to place the parameters on is, to an extent, arbitrary: the latent scale is only identified up to a transformation. So, if we have some way of anchoring the estimated abilities and difficulties to values on a scale that does have a natural origin, we can transform them onto that scale and then calculate the effort ratio. The toy example below does exactly this.
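In symbols, and assuming such an anchoring is available (the intercepts $\alpha$ and slopes $\lambda$ below are illustrative, estimated from whatever directly measured values we can anchor to), the idea is:

$$\hat{c}_j = \alpha_c + \lambda_c \hat{\theta}_j, \qquad \hat{d}_i = \alpha_d + \lambda_d \hat{b}_i, \qquad \hat{E}_{ij} = \min\!\left(\frac{\hat{d}_i}{\hat{c}_j} \times 100,\ 100\right)$$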
Using IRT models for lifting weights
I’m a fan of analogical abduction. I used it in developing my conceptual definition of effort in the first place. So I’m going to use the analogy of a test of the ability ‘strength’ in which an individual attempts to lift different loads. In resistance training, it is pretty common to measure strength this way. We perform what is referred to as a one-repetition maximum (1RM) test, which operationalises strength as the capacity to lift load in a given exercise task. Normally, an individual would perform a warmup and then lift progressively heavier loads5 until the heaviest load they could lift once, and no more, was identified. If we know the maximum load that an individual can lift once, it’s a safe assumption that they can also lift any load weighing less than this. If their 1RM was 100 kg and we asked them to lift 50 kg they’d almost certainly be able to. If we asked them to lift 90 kg, whilst it would be a lot more demanding, they’d still almost certainly manage it. But if we asked them to lift 110 kg they’d almost certainly not be able to.
Given that the outcome of attempting to lift a given load can be considered in a binary manner (that is to say a person either can, or cannot, lift the load given their strength ability), whilst unusual to do so, we could fit an IRT model to such data. This offers an interesting toy example to play with where we actually have a directly measurable ability (strength; the 1RM) and directly measurable item demands (the loads themselves), and so we can compare the actual effort calculated directly from these with the estimates recovered via an IRT model.
Let’s simulate some data6 and see what it looks like. We sample 100 simulated individuals with 1RMs centred around 100 kg, and have each of them attempt 20 loads ranging from 10 kg to 200 kg in 10 kg increments. Each attempt is scored 1 if the load does not exceed their 1RM (i.e., they can lift it) and 0 if it does, and the actual effort for each attempt is calculated directly as the load relative to the 1RM, capped at 100%.
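Something along these lines would generate data of that shape (a plain tidyverse sketch rather than the post’s original {faux} code; the 1RM mean and spread are assumed):

library(dplyr)
library(tidyr)

set.seed(1988)  # arbitrary seed

# assumed settings: 100 people with 1RMs centred around 100 kg, each
# attempting 20 loads from 10 kg to 200 kg in 10 kg increments
n_persons   <- 100
load_values <- seq(10, 200, by = 10)

persons <- tibble(
  person = factor(sprintf("p%03d", 1:n_persons)),
  one_RM = rnorm(n_persons, mean = 100, sd = 10)  # sd assumed
)

dat <- expand_grid(persons, load = load_values) |>
  mutate(
    item = factor(load),
    # actual effort = demands relative to capacity, capped at 100%
    actual_effort = pmin(100 * load / one_RM, 100),
    # a lift succeeds only if the load does not exceed current capacity
    response = as.numeric(load <= one_RM)
  )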
An example of data for an individual showing the first 10 loads looks like this7:
## # A tibble: 10 x 5
## person one_RM item actual_effort response
## <fct> <dbl> <fct> <dbl> <dbl>
## 1 p002 91.5 10 10.9 1
## 2 p002 91.5 20 21.8 1
## 3 p002 91.5 30 32.8 1
## 4 p002 91.5 40 43.7 1
## 5 p002 91.5 50 54.6 1
## 6 p002 91.5 60 65.5 1
## 7 p002 91.5 70 76.5 1
## 8 p002 91.5 80 87.4 1
## 9 p002 91.5 90 98.3 1
## 10 p002 91.5 100 100 0
Now, given the kind of test completed here (i.e., lifting different loads) it is reasonable to use the 1PL model mentioned above because there is no guessing or lapsing, nor do different items differentiate people of different abilities differently (it’s a toy example so we’re building these assumptions in). I’m going to fit the initial 1PL model as a Bayesian mixed effects model with weakly regularising priors using the {brms} package, which I won’t go into detail about here8. I’ll use Bayesian models for the following parts too, with default priors, to keep the approach consistent.
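A 1PL model of this kind can be specified in {brms} as a mixed effects model roughly as follows (a sketch following Bürkner’s general approach; the object name fit_1pl and the priors shown are my assumptions rather than the post’s exact code):

library(brms)

# 1PL model: a global intercept with crossed random intercepts for items
# (loads) and persons, with a Bernoulli likelihood on the logit scale.
# The priors are illustrative weakly regularising choices.
fit_1pl <- brm(
  response ~ 1 + (1 | item) + (1 | person),
  data = dat,
  family = bernoulli(link = "logit"),
  prior = c(
    prior(normal(0, 3), class = Intercept),
    prior(normal(0, 3), class = sd)
  ),
  chains = 4, iter = 2000, warmup = 1000
)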
So we fit the 1PL model and we can plot the ICCs9 for each load which look like this:
So it’s pretty clear that the model recognises that people are more likely to lift lighter loads than heavier loads, and that people with a greater strength ability are more likely to lift a given load. We do have some loads though that are so incredibly easy that pretty much anyone can lift them, and conversely some so incredibly difficult that no one can lift them.
After we fit the 1PL model we can then extract the random effects by person and by item, namely the estimated person abilities ($\theta$) and item difficulties ($\beta$). We can then see how well these latent estimates map back onto the directly measured 1RMs and loads.
So we fit a simple linear model to estimate 1RM from $\theta$:
lm_1RM
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: one_RM ~ theta
## Data: scores_oneRM (Number of observations: 100)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 101.60 0.40 100.82 102.36 1.00 3512 2914
## theta 3.72 0.06 3.61 3.84 1.00 4520 3043
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 4.07 0.30 3.54 4.69 1.00 3747 2774
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
And also to estimate load from $\beta$:
lm_loads
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: load ~ beta
## Data: loads (Number of observations: 20)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 103.07 1.84 99.54 106.85 1.00 2929 2190
## beta 3.66 0.11 3.43 3.88 1.00 3147 2548
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 7.92 1.47 5.64 11.38 1.00 2759 2318
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
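For reference, models with the summaries shown above can be produced by {brms} calls along these lines (a sketch; scores_oneRM holds one row per person with their measured 1RM and estimated $\theta$, and loads holds one row per item with its raw load and estimated $\beta$):

library(brms)

# linear anchoring models with default priors, as noted earlier:
# regress measured 1RM on estimated ability, and raw load on estimated difficulty
lm_1RM   <- brm(one_RM ~ theta, data = scores_oneRM, family = gaussian())
lm_loads <- brm(load ~ beta, data = loads, family = gaussian())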
And visually the fit looks like this:
So, we can now use the intercepts and coefficients from these models to linearly transform the $\theta$ and $\beta$ estimates back onto the raw scales (kg), giving IRT-derived estimates of each person’s capacity and each load’s demands, from which we can calculate an IRT-derived estimate of actual effort in just the same way as before (demands relative to capacity, as a percentage).
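As a sketch of that step (assuming the per-person $\theta$ and per-item $\beta$ estimates have already been joined back onto the data in a frame called dat_irt; fixef() pulls the fitted intercepts and slopes from the anchoring models):

library(brms)
library(dplyr)

b_theta <- fixef(lm_1RM)    # rows: Intercept, theta
b_beta  <- fixef(lm_loads)  # rows: Intercept, beta

dat_irt <- dat_irt |>
  mutate(
    # transform the latent estimates back onto the kg scale
    theta_to_raw = b_theta["Intercept", "Estimate"] + b_theta["theta", "Estimate"] * theta,
    beta_to_raw  = b_beta["Intercept", "Estimate"]  + b_beta["beta", "Estimate"]  * beta,
    # IRT-derived effort: demands relative to capacity, capped at 100%
    irt_effort   = pmin(100 * beta_to_raw / theta_to_raw, 100)
  )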
Now we have two sets of actual effort in our dataset; we have the original actual effort calculated directly from the 1RMs and loads (actual_effort), and the IRT-derived estimate calculated from the transformed $\theta$ and $\beta$ estimates (irt_effort):
## person one_RM item actual_effort theta beta theta_to_raw
## 1 p001 103.3932 10 9.671814 1.25077 -23.9547152 106.2544
## 2 p001 103.3932 20 19.343628 1.25077 -23.9230662 106.2544
## 3 p001 103.3932 30 29.015442 1.25077 -16.8780428 106.2544
## 4 p001 103.3932 40 38.687256 1.25077 -16.8764711 106.2544
## 5 p001 103.3932 50 48.359070 1.25077 -16.8987563 106.2544
## 6 p001 103.3932 60 58.030884 1.25077 -13.2056382 106.2544
## 7 p001 103.3932 70 67.702698 1.25077 -10.1221819 106.2544
## 8 p001 103.3932 80 77.374512 1.25077 -6.3022157 106.2544
## 9 p001 103.3932 90 87.046326 1.25077 -3.8182128 106.2544
## 10 p001 103.3932 100 96.718140 1.25077 -0.2936207 106.2544
## beta_to_raw irt_effort
## 1 15.47079 14.56013
## 2 15.58653 14.66906
## 3 41.34980 38.91583
## 4 41.35555 38.92124
## 5 41.27405 38.84454
## 6 54.77959 51.55511
## 7 66.05562 62.16740
## 8 80.02504 75.31454
## 9 89.10890 83.86370
## 10 101.99815 95.99425
Now, because we have multiple observations for each individual we can fit a mixed effects model with random intercepts and slopes to explore how well the IRT-derived effort (irt_effort) predicts the directly calculated actual effort (actual_effort). Given both are bounded percentages, I use an ordered beta regression model10 for this.
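A sketch of how such a comparison model might be specified with the {ordbetareg} package (which wraps {brms}; both effort variables are rescaled here to 0–1 proportions, and the object and variable names are my assumptions):

library(ordbetareg)
library(dplyr)

comp_dat <- dat_irt |>
  mutate(
    actual_effort_p = actual_effort / 100,  # rescale % to 0-1 proportions
    irt_effort_p    = irt_effort / 100
  )

# ordered beta regression of directly calculated effort on IRT-derived effort,
# with random intercepts and slopes by person
fit_compare <- ordbetareg(
  actual_effort_p ~ irt_effort_p + (1 + irt_effort_p | person),
  data = comp_dat
)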
This doesn’t look too bad to be fair. In fact, while it slightly underestimates at the lower bounds, it’s pretty darn good in my opinion and leaves me feeling fairly confident in using IRT-derived estimates of actual effort in situations where, unlike here, we can’t measure capacity and demands directly.
Summary and Conclusion
Key assumptions underlying IRT models and their parameters, ability and difficulty, map conceptually well onto the assumed primitives, capacity and demands, in my definition of effort. Given this, IRT models seem like a useful approach to the operationalisation of actual effort in cognitive tasks. In fact, using the example of a test involving lifting weights, where we actually know a person’s underlying ability/capacity and the difficulty/demands of each item in the test, the estimates of effort that result from an IRT model are pretty reasonable approximations of the actual effort we can calculate from direct measurements.
I think this approach offers an interesting opportunity to look at actual effort in cognitive tasks. This could be particularly useful in exploring psychophysics in such tasks where we also capture self-reports of perception of effort during each item.
Further, I do not think that such models are limited to cognitive tasks. As I have shown here, we can apply IRT models to tasks we wouldn’t typically think to. Many people have been thwarted in attempts to conceptualise effort in target-based tasks such as dart throwing; I think the use of IRT models could also allow the estimation of actual effort for these tasks.
Markus, K. A. (2008). Constructs, concepts and the worlds of possibility: Connecting the measurement, manipulation, and meaning of variables. Measurement: Interdisciplinary Research and Perspectives, 6(1-2), 54–77. https://doi.org/10.1080/15366360802035513↩︎
I’ll use $j$ here to denote the individual instead of $i$ as I do in the original paper, to be in keeping with the typical notation used in the Item Response Theory models that follow, because $i$ is used for the ‘item’.↩︎
To be fair, the Wikipedia page on IRT provides a pretty good overview as expected. But I do like Fundamentals of Item Response Theory by R. K. Hambleton, H. Swaminathan, and H. J. Rogers as a strong intro text that’s pretty short (only ~150 pages excluding appendices etc.).↩︎
Note, for my purposes here I am going with the ‘difficulty’, $b$, but if we wanted ‘easiness’ we would use $-b$.↩︎
For those unfamiliar, in practice this is normally achieved within 3-5 attempts so as to not allow cumulative fatigue to unduly influence the estimate of maximum strength.↩︎
I used Lisa DeBruine’s great package {faux} for this, which I tend to use for a lot of simulation as it’s so intuitive. Check it out here.↩︎
I’ve deliberately chosen someone with a 1RM < 100 kg so it’s clear that the response is dependent on the relationship between that and the load lifted.↩︎
But, take a look at the great papers here and here by Paul Bürkner, who authored the {brms} package - see here.↩︎
Credit to Solomon Kurz for his great post on wrangling {brms} models to create these ICC plots.↩︎
Nice paper on this recently developed approach to handling bounded variables, along with the {ordbetareg} package that overlays {brms}, from Robert Kubinec here.↩︎