Copyright © 2017 Meeremans et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Scoring reflex responsiveness and injury of aquatic organisms has gained popularity as a predictor of discard survival. Because this method relies on individual interpretation of scoring criteria, we evaluated its robustness by testing whether multiple protocol-instructed raters with diverse backgrounds (research scientist, technician, and student) produce similar or identical reflex and injury scores for the same flatfish (European plaice, Pleuronectes platessa) after exposure to commercial fishing stressors. Inter-rater reliability among three raters was assessed using a 3-point categorical scale ('absent', 'weak', 'strong') and a tagged visual analogue continuous scale (tVAS, a 10 cm bar with 0 for 'absent' and three labelled sections: 'weak', 'moderate', and 'strong') for six reflex responses, and a 4-point scale for four injury types. Plaice (n = 304) were sampled from 17 research beam-trawl deployments during four trips. Fleiss kappa (categorical scores) and intra-class correlation coefficients (ICC, continuous scores) indicated that inter-rater agreement varied by reflex type (Fleiss kappa ranged from 0.55 to 0.88 and ICC from 67% to 91%), with the least agreement among raters on the extent of injury (Fleiss kappa between 0.08 and 0.27). Despite differences among raters, which did not significantly influence the relationship between impairment and predicted survival, combining categorical reflex and injury scores always produced a close relationship between such vitality indices and observed delayed mortality.
The use of the continuous scale did not improve the fit of these models compared with using the reflex impairment index based on categorical scores. Given these findings, we recommend using a 3-point categorical scale over a continuous scale. We also determined that training, rather than experience, of raters minimised inter-rater differences. Our results suggest that cost-efficient reflex impairment and injury scoring may be considered a robust technique to evaluate lethal stress and damage of this flatfish species on board commercial beam-trawl vessels.
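
As an illustration of the agreement statistic used for the categorical scores, the following is a minimal sketch of the standard Fleiss kappa calculation for one reflex scored by three raters on the 3-point scale. It is not the authors' analysis code: the function name `fleiss_kappa`, the data layout, and the four example fish are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter

# The 3-point categorical reflex scale described in the abstract.
CATEGORIES = ("absent", "weak", "strong")


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss kappa for a list of subjects, each scored by the same number of raters.

    `ratings` is a list of tuples, one per fish, holding one category label per
    rater (hypothetical layout, assumed for this sketch).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who assigned category j to subject i
    counts = []
    for row in ratings:
        tally = Counter(row)
        counts.append([tally[c] for c in categories])

    # Observed agreement per subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Agreement expected by chance from the overall category proportions.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1.0 - p_e)


# Made-up scores for four fish from three raters (illustration only).
example = [
    ("strong", "strong", "strong"),
    ("weak", "weak", "strong"),
    ("absent", "absent", "absent"),
    ("weak", "strong", "strong"),
]
kappa = fleiss_kappa(example)
```

For these made-up scores the mean observed agreement is 2/3 and the chance agreement is 0.375, giving a kappa of 7/15 (about 0.47). A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.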