A Vote of Confidence in Assessment Centre Assessors
When we see candidates get quite different scores for the same competency in different exercises, this can create an anxiety that the assessors responsible for those marks are at odds with each other. However, it could equally be that the ratings reflect genuine differences in how each person approaches different situations. Such genuine differences are, after all, a fundamental assumption of interactionism, which holds that behaviour is a function of the person and the situation together.
The normal design of an assessment centre does not allow this issue to be settled. Generally each rating is the result of a single assessor observing a single candidate. As such, it is hard to disentangle the extent to which the rating reflects actual behaviour or erroneous assessor perceptions. However, a recent study (Putka and Hoffman, 2013) used two assessors for each candidate in each exercise. This design allowed the researchers to estimate the variance in ratings due to genuine candidate behaviour as contrasted with the variance due to error.
The results are provocative. Using assessors who had been trained for 32 hours and who were provided with carefully developed behavioural summary scales for each dimension/competency in each exercise, the researchers state that “the vast majority” of the variance in ratings “reflects reliable variance” (p. 122). The largest component of this variance (43–52 per cent across the three centres) was attributable to the assessee by exercise interaction. This ‘exercise effect’ dwarfs the percentage due to the assessee by dimension interaction, estimated at between 0.5 and 1.8 per cent depending on the centre. However, the researchers also found that the genuine (not assessor error) effect of each assessee behaving idiosyncratically in each exercise on each dimension accounted for around 30 per cent of the variance. Furthermore, a genuine halo, that is, doing well (or badly) across both dimensions and exercises, accounted for around 20 per cent of the variance.
The study helps us consider what is happening in an assessment centre. It appears that people vary quite a lot between exercises in their display of each dimension. This is not surprising in view of the vast literature on the variability with which people display personality across situations. It ties in with trait elicitation theory, which examines the power of different situations to elicit a given trait from a given person. It also appears that the general (“didn’t he/she do well?”) factor should not be put down to halo error: it is a genuine phenomenon. Finally, Putka and Hoffman caution against ‘giving up’ on dimensions. Despite the small percentage of variance accounted for by the dimension effect, they note that dimensions are still central to the assessee by dimension by exercise interaction.
In terms of practical implications, it is very important to note the hours of training that assessors received. In our experience, it would be most difficult to persuade the average client to sanction, in effect, a whole week given over to assessor training. It is also important to note that assessors were provided with detailed guidance on rating each competency in each exercise. Nonetheless, the study is encouraging in its evidence of the reliability that can be achieved in an assessment centre. It also, once again, makes clear how crucial it is that the assessment centre exercises are faithful simulations of the at-work situations and that the trait elicitation cues present at work are reflected in the centre. Otherwise you might well end up with a centre made up of exercises that give wonderfully reliable ratings yet bear little relationship to subsequent job performance.
How Can Human Assets Help?
The literature on assessment centres has taken a turn for the complex, while also being illuminating. Papers in academic journals are becoming somewhat impenetrable, and there is a danger that they become analogous to complex financial derivatives: people simply have to take the results on trust, because they are really not sure what the ‘linear random effects model’ used to decompose the variance in assessors’ ratings actually is.
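The idea behind such a decomposition can, however, be made concrete with a toy simulation. The sketch below (in Python, using only invented effect sizes; none of these numbers come from Putka and Hoffman's data) mimics the two-assessors-per-cell design and recovers the variance components with simple method-of-moments arithmetic: within-cell disagreement between the two assessors pins down pure rater error, and the variances of cell means and per-candidate means then separate the exercise interaction from the general ('halo') component.

```python
import random
import statistics

# Toy illustration only: these "true" effect sizes are assumptions,
# not estimates taken from Putka and Hoffman (2013).
random.seed(7)
PERSONS, EXERCISES, ASSESSORS = 200, 4, 2
SD_PERSON, SD_PXE, SD_ERROR = 1.0, 1.2, 0.5

# rating = general candidate effect + candidate-by-exercise effect + assessor error
person_eff = [random.gauss(0, SD_PERSON) for _ in range(PERSONS)]
cells = {}  # (person, exercise) -> ratings from the two assessors
for p in range(PERSONS):
    for e in range(EXERCISES):
        pxe = random.gauss(0, SD_PXE)
        cells[(p, e)] = [person_eff[p] + pxe + random.gauss(0, SD_ERROR)
                         for _ in range(ASSESSORS)]

# Two assessors per cell: within-cell spread estimates pure rater error.
err_var = statistics.mean(statistics.variance(r) for r in cells.values())

# Variance of cell means contains person + interaction + half the error;
# variance of per-candidate means shrinks the interaction by 1/EXERCISES.
cell_means = {k: statistics.mean(v) for k, v in cells.items()}
person_means = [statistics.mean(cell_means[(p, e)] for e in range(EXERCISES))
                for p in range(PERSONS)]
B = statistics.variance(list(cell_means.values()))  # person + pxe + err/2
A = statistics.variance(person_means)               # person + pxe/J + err/(2J)
pxe_var = (B - A) / (1 - 1 / EXERCISES) - err_var / 2
halo_var = B - pxe_var - err_var / 2

total = halo_var + pxe_var + err_var
print(f"exercise interaction: {pxe_var / total:.0%} of rating variance")
print(f"general 'halo':       {halo_var / total:.0%}")
print(f"assessor error:       {err_var / total:.0%}")
```

The point of the sketch is the design, not the numbers: without the second assessor in each cell, rater error and the candidate-by-exercise interaction would be hopelessly confounded in a single residual term.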
As practitioners and leading experts in the field of assessment and development centres, we can bring you the latest academic research and make sure you benefit from it. Much of it is summarised in our book (Woodruffe, 2007), and now there is a new wave of research activity.
Whilst it rests on a complex methodology, research of the sort published in this study needs to be taken into account in designing assessment centres, assessor training and candidate feedback. How many assessor training courses will continue to castigate assessors for a halo effect while not acknowledging that the effect could be genuine? How much feedback will continue to be given in terms of averages across exercises, rather than exploring the nuanced way that candidates have approached each exercise and revealed their competencies?