How Accurate Are Referees? The PR v The Research

March 5, 2018 Daniel Rhodes


When your boss describes a statistic you’ve quoted about the accuracy of referees as “nonsense” then the only way to settle it is by rolling the sleeves up and doing some old-fashioned research. To start, let’s take a look at the particular numbers in question, from the article quoting the 98% accuracy figure for refereeing decisions:

According to the PGMO (Professional Game Match Officials), a Premier League referee makes around 245 decisions per game, three times more than an average player touches the ball over 90 minutes. That’s one decision every 22 seconds.

Approximately 45 of these decisions are technical – whether goal-kicks, corners or throw-ins – leaving around 200 decisions to judging physical contact and disciplinary actions.

Of those 200, around 35 are visible decisions where an action is taken (fouls, restarts), and 165 are non-visible, where play is allowed to continue.

In total, refs make around five errors per game, meaning they are right 98 per cent of the time.

The number of decisions referees have to make has increased by around three per cent in each of the last two seasons, and that is only likely to go up in the coming years as discussion around rule changes intensifies.

The assistant referee makes on average 50 decisions each game; 45 of these are pure offside judgements, with four of these resulting in offside flags. Their accuracy? Again, a staggering 98 per cent. (Sky Sports)
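As a quick sanity check, the arithmetic behind those quoted figures can be reproduced in a few lines. This is only a sketch: the variable names are mine, and I'm assuming a flat 90-minute match with no stoppage time.

```python
# Figures as quoted by Sky Sports from the PGMO; names are illustrative only.
decisions_per_game = 245
errors_per_game = 5
match_seconds = 90 * 60  # assumption: exactly 90 minutes, no stoppage time

# How often does a referee make a decision?
seconds_per_decision = match_seconds / decisions_per_game

# What accuracy does five errors in 245 decisions imply?
accuracy_pct = 100 * (decisions_per_game - errors_per_game) / decisions_per_game

print(f"One decision every {seconds_per_decision:.0f} seconds")
print(f"Implied accuracy: {accuracy_pct:.1f}%")
```

So the headline numbers are at least internally consistent; the question is what gets counted as a "decision" in the first place.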

Background:

Many on this site will be aware of my ‘campaign’ on behalf of the officials in the Premier League this season. There are a few factors involved:

My brother-in-law is a referee in the local leagues, and has been discussing his experiences learning the trade, and then the interactions he has with players, fans and those who rate him after the match.
Constant focus on the officials takes away from the football, as highlighted by Chris Rowland in this article.
Judging officials’ decisions, on a club-based website, is a recipe for bias. Whether we acknowledge it or not, none of us looks at a decision in an emotion-free vacuum. Each call by the referee or assistant referee is first watched while we are all ‘involved’ in the game. Research suggests that once fans make their minds up, even on one viewing, evidence to the contrary is usually ignored or argued against.
The job itself is incredibly difficult, and it is getting harder as the physicality and intensity of the league increase.

Prozone data shows top-division football is 20% faster than five years ago. Last season referees averaged 176 high speed runs (above 20km/h) whereas players averaged 175. They also performed, on average, 50 sprints (above 25km/h) in each game – which is a massive 64 per cent increase on what it was five years ago.

The video below is a fine breakdown of the job – and even how ex-professional footballers at the top level struggle to find accuracy.

This is further illustrated whenever the officials get ‘laymen’ to judge close calls, even when the physical aspect is taken away.

This past week Mike Riley, the head of the Professional Game Match Officials (PGMO), set a challenge for a roomful of football writers. He played a training-ground video showing three examples of a player passing the ball, a defender holding his line and an attacker running at full sprint. It was our job to decide if it was offside and, on each occasion, at least 80% decided the flag should be raised. When the replays were slowed down it showed they were onside every time. Twice it was by the sort of distance that might have seen a manager bursting a blood vessel if it had happened in the Premier League. (Guardian)

The Claim:

There’s a little bit of conflict over the exact figures, as they appear in three separate reports across a span of three years – and bear in mind this is the only place that I can find these figures, other than in online articles.

For full disclosure, I will include ALL the info on officiating in the reports.

Premier League Report – 2011/12

Premier League Report – 2012/13

(The image in the introduction)

Premier League Report – 2013/14

Interestingly, the amount of information seems to reduce over time in the reports, until that tiny snippet in 2013/14, before the whole thing disappeared from the reports themselves. I wonder why that could be?

More importantly, when looking at the apparent research done by the PGMOL, there is zero method in the public arena. Are these figures based on all games? Have they been extrapolated from a study of one match? If the figures do exist, why not publish them for all to see, so they can be verified by external judges? Finally, are they counting “non-visible” decisions just to increase the accuracy and massage the figures?

94% major decisions correct

Fouls, penalties, yellow and red cards

98.6% penalty area decisions correct

99% offside decisions correct

Without a published method, I’m going to have to speculate about the one they used, with the help of some actual video footage.


Video Analysis:

Professional Game Match Officials Limited

This is virtually all the information I could find on them, taken from the official Premier League website, as they don’t have a website of their own.

What is PGMOL responsible for?

Formed in 2001 to improve refereeing standards, the PGMOL group officiate across all the Premier League, English Football League (EFL) and Football Association (FA) Competitions – all three organisations fund it.

The training, development and mentoring of 109 referees and 206 assistant referees – run by Managing Director Mike Riley (a former PGMOL referee) and a team of managers and coaches.

The most high profile officials, the 18 full-time professional Select Group referees.

From the start of season 2016/17, 27 Select Group assistant referees who prominently officiate within the Premier League will also become full-time professional match officials.

How is being a Select Group referee different?

Premier League matches are officiated by Select Group referees and assistant referees.

They meet for a ‘training camp’ twice a month, where they perform physical and technical training sessions, and analyse match videos.

There is a robust system for measuring Select Group performance over the season. Each Premier League match is evaluated by a former senior referee who scrutinises every decision using the match footage and ProZone to measure the officials’ technical performance. Former players and managers (Match Delegates) assess the accuracy and consistency of decision making and their management of the match.

(Premier League Official Site)

Academic Research vs The PGMOL

While the PGMOL claim that “each Premier League match is scrutinised” to assess the accuracy and consistency of decisions, unfortunately all of this is done behind closed doors – and this is the critical point of this article: how can we ever check this? They make statements, but provide zero evidence and expect fans to just trust the information.

Thankfully, across the world, there have been various studies looking into the accuracy of officials’ decisions. We can use these to test the claims of the PGMOL.

List of Studies:

Effect of positioning on the accuracy of decision making of association football top-class referees and assistant referees during competitive matches
Javier Mallo, Pablo Gonzalez Frutos, Daniel Juárez & Enrique Navarro
Pages 1437-1445 | Received 13 Jun 2011, Accepted 09 Jul 2012, Published online: 06 Aug 2012

Errors in judging “offside” in association football: Test of the optical error versus the perceptual flash-lag hypothesis
Werner Helsen, Bart Gilis & Matthew Weston
Pages 521-528 | Accepted 07 Aug 2005, Published online: 18 Feb 2007

Offside decision making of assistant referees in the English Premier League: Impact of physical and perceptual-cognitive factors on match performance
Peter Catteeuw, Bart Gilis, Johan Wagemans & Werner Helsen
Pages 471-481 | Accepted 30 Nov 2009, Published online: 23 Apr 2010

Physical Performance and Decision Making in Association Football Referees: A Naturalistic Study
D.R.D. Mascarenhas, C. Button, D. O’Hare, and M. Dicks
Glyndwr University, Wrexham, Wales; & University of Otago, Dunedin, New Zealand

Call Accuracy and Distance from the Play: A Study with Brazilian Soccer Referees
Mario Cesar de Oliveira, Rogerio Orbetelli & Turibio Leite de Barros Neto
Department of Human Movement Sciences, United Metropolitan Colleges, São Paulo; Department of Physiology, Federal University of São Paulo, São Paulo

Results:

Mallo et al (2012) looked to compare how accurate decision-making was in relation to the positioning of the officials based on a study of matches at the 2009 Confederations Cup. They examined 380 foul play incidents and 165 offside situations.

The error percentage for the referees when indicating the incidents averaged 14%. The lowest error percentage occurred in the central area of the field, where the collaboration of the assistant referee is limited, and was achieved when indicating the incidents from a distance of 11–15 m, whereas this percentage peaked (23%) in the last 15-min match period. The error rate for the assistant referees was 13%.

Both Helsen et al (2007) and Catteeuw et al (2010) focused on offsides, and analysed the optical errors and ‘flash-lag’ (i.e. a moving object is perceived as spatially leading its real position at a discrete instant signalled by a briefly flashed stimulus) element of running the line in a football match. These studies looked at matches across two World Cups and one season of Premier League football. The results, once again, are very different from the PGMOL PR.

The error percentage was 26.2%. During the first 15 min match period, there were significantly more errors (38.5%) than during any other 15 min interval. (Helsen 2007)

Above are the results from the 2002 and 2006 World Cups, and here is the study from England:

The error rate was 17.5% (868 of 4960 situations). As the English assistant referees tended not to signal in doubtful situations (c = 0.91), there was an overall bias towards non-flag errors (773 non-flag errors vs. 95 flag errors). The flash-lag hypothesis could explain all flag errors, whereas the optical-error hypothesis could explain a proportion of the non-flag errors (45.4%). Fatigue, movement speed, and angle of view did not have a detrimental effect on offside decision making. In conclusion, there were fewer flag errors than in the 2002 and 2006 FIFA World Cups, whereas the number of non-flag errors rose.

It seems the standard of offside accuracy is in fact better in the Premier League than at international tournaments, improving from roughly one in four decisions incorrect to roughly one in six. However, getting 82.5% of offside decisions correct is very different to the 99% given by the PGMOL.
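For anyone who wants to check those offside numbers, here is a minimal sketch using only the figures quoted from the two abstracts above. The variable names are my own, and the "one in N" values are simply the reciprocal of each error rate.

```python
# Premier League study (Catteeuw et al): figures quoted in the abstract.
situations = 4960
non_flag_errors = 773
flag_errors = 95

total_errors = non_flag_errors + flag_errors
pl_error_rate = total_errors / situations

# World Cup figure quoted from Helsen et al, as a fraction.
wc_error_rate = 0.262

print(f"Premier League errors: {total_errors} of {situations} = {100 * pl_error_rate:.1f}%")
print(f"Premier League: one error in every {1 / pl_error_rate:.1f} situations")
print(f"World Cups:     one error in every {1 / wc_error_rate:.1f} situations")
print(f"Share of Premier League errors that were non-flag errors: "
      f"{100 * non_flag_errors / total_errors:.1f}%")
```

The reciprocals come out at about 5.7 and 3.8, which is where the rough "one in six" and "one in four" come from.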

Mascarenhas et al (2009) provided perhaps the most interesting set of figures, although with the caveat that it was the New Zealand Football Championship, so maybe the standard of officiating isn’t as high – although that could be a biased and poor assumption on my part. Here’s the abstract:

A panel of independent referees analysed incidents (n = 144) taken from five referees in seven games in the New Zealand Football Championship (2005/06). The match-day referees made accurate decisions on 64% of the incidents, although their accuracy levels were not related to variables such as movement speed, HR, and cumulative distance covered. Interestingly, referees were on average only 51% accurate in the opening fifteen minutes of each half compared to 70% accuracy at all other times.

Again, a completely different picture than that presented by PGMOL, and crucially in this study they also factored in the difficulty of decisions into their research and found this little nugget of information:

There was a strong relationship between incident difficulty and the correctness of the match referees’ decisions (χ²(1, 127) = 15.5, P < .0001). For the 33 most difficult incidents (i.e., viewed most often by the panel) match referees’ decisions were only 36% correct compared with 75% correct for the remaining 94 clips.

As demonstrated in my own video analysis, it seems the PGMOL inflated their own figures by including literally every decision (active or not) taken by an official, even when the correct call is 100% obvious. For example, Player A kicks the ball out of play. If you asked 100 random people off the street who kicked it out and which team gets the throw-in, all 100 would give the correct decision. Crucially, the vast majority of decisions are like this, based on the PGMOL’s parameters. However, once you present a panel of independent officials with a big enough sample of decisions, and break those down by difficulty, the results are very, very different. In fact, in the above study referees got only 36% of the tough decisions correct.
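To show how much the choice of denominator matters, here is a rough sketch. The first part takes the PGMO's own breakdown (245 decisions, of which 200 are judgement calls and 35 are visible) and assumes, purely for illustration, that all five errors fall inside whichever subset is being counted; the PGMO publish no such breakdown, so that is my assumption and not their method. The second part pools the two difficulty groups from the New Zealand study.

```python
def accuracy_pct(decisions, errors):
    """Percentage of decisions that were correct."""
    return 100 * (decisions - errors) / decisions

# HYPOTHETICAL: assume the same five errors sit inside each subset in turn.
errors = 5
subsets = [
    ("all decisions", 245),
    ("judgement decisions", 200),
    ("visible decisions", 35),
]
for label, count in subsets:
    print(f"{label:>20}: {accuracy_pct(count, errors):.1f}% accurate")

# Mascarenhas et al: 33 difficult clips at 36% correct, 94 others at 75% correct.
difficult_n, difficult_acc = 33, 0.36
other_n, other_acc = 94, 0.75
pooled = (difficult_n * difficult_acc + other_n * other_acc) / (difficult_n + other_n)
print(f"{'pooled NZ clips':>20}: {100 * pooled:.1f}% accurate")
```

Under that assumption the same five errors give about 98% when every decision counts and about 86% when only visible decisions count. The pooled New Zealand figure comes out close to the 64% in the abstract, which shows how a large group of easier decisions pulls the overall number up from 36%.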

In fairness to officials, no bias was found with regards the home or away team:

Investigation of the balance of both correct and incorrect decisions (i.e., whether the decisions favoured the home or away team) revealed no bias, despite contrary evidence from previous research. The match day referee and the experienced panel (who were not susceptible to player or crowd coercion) gave a similar distribution of decisions in favour of the home and away teams.

Another study of Brazilian referees by De Oliveira et al (2011) looked at how far from the incident referees were when making their decisions and whether this had any impact on the accuracy:

Abstract: Soccer matches supervised by the São Paulo State Football Federation were recorded and 321 foul calls were analyzed. No significant association was found between the referee’s distance from a foul play and accuracy of the call (p = 0.561). However, there was a significant increase in the number of correct calls in the last 15 minutes of the second half compared with the number of correct calls in the first 30 minutes of the same half (p = 0.003).

Here’s a table with a breakdown of the distance and accuracy of the decision:

Again, the figures tally with the rest of the academic research, being around the seven out of ten level, and certainly not with the PGMOL, who claim 98 out of 100 decisions were accurate. Although there is a spike when referees are positioned 20 to 25 meters away from the action, there is no statistically significant association between distance and accuracy:

The independent review of 321 foul calls by three renowned top-class referees indicated a non-significant trend towards a higher percentage of calls confirmed as correct (80.6%) when the referee administering the match was positioned between 20.1 and 25.0 meters from the play, as shown in Table 1. However, there was no significant association between correct calls and the referee’s distance from the play.

Conclusion:

The headline to this article could easily be: Ninety-Eight Percent of Statements by the PGMOL are Bullshit.

And that would be factually correct … For a number of reasons: they rarely make statements; those they make are presented in glossy end-of-season reports with no method, no link to the research and no chance to challenge any of it; the figures they quote are massaged by questionable methodology; and most importantly of all, none of the research they have ‘published’ tallies with that presented by numerous academic studies. One open-access study found that only 36% of ‘difficult’ decisions were accurate, compared to the 94% of ‘major’ decisions claimed for the Premier League. Across the rest of the research accuracy figures ranged from 55% to 85%, and intuitively that feels to me more realistic. Accuracy will fluctuate across individual referees, and will also fluctuate depending on the volume of difficult decisions that occur in particular matches. As you saw with the Firmino decision against Southampton, Martin Atkinson had his view obscured by numerous players, and had he seen the incident on video then surely the penalty would have been awarded. How the assistant referee failed to see it though is beyond me.

And thus we can conclude with a high degree of certainty that the 98% accuracy figure I initially quoted was, in fact, absolute codswallop.

Video Assistant Referee – Extra Section

In recent days, this decision has been made:

The International Football Association Board (Ifab) “unanimously approved” its (VAR) introduction on a permanent basis after a meeting in Zurich on Saturday…

Ifab is made up of world governing body Fifa and the Football Associations of England, Scotland, Wales and Northern Ireland.

Each FA has one vote to Fifa’s four, with six votes required for a change in the laws.

Saturday’s decision was made after Ifab was presented with the results of independent analysis conducted by Belgian university KU Leuven.

“I would say to the fans, players and coaches that it will have an impact, a positive impact,” said Infantino. “That is what the results of the study show.

“From almost 1,000 live matches that were part of the experiment, the level of the accuracy increased from 93% to 99%. It’s almost perfect.”

Now, in the context of this article and research, that last line is almost a carbon copy of the claims made by the PGMOL. In this case, however, the evidence is available for all to see: we can check it ourselves if we go through the games, and each country trialling it has to assess every single VAR decision made and its impact.

One of the biggest criticisms of VAR has been the time it takes. However, according to this report:

Median check time of the VAR is 20 seconds.

Most checks take place quickly whilst play continues or during the ‘normal time’ of a stoppage e.g. during the goal celebration, so have no impact on the flow of the game.

The average time ‘lost’ due to the VAR, represents under 1% of overall playing time.

Use of VARs has a very small impact on the overall playing time ‘lost’ compared with all other situations where playing time is ‘lost’ (typically: free kicks (9.5%), throw-ins (8%), goal kicks (6%) corner kicks (4.5%), substitutions (3.5%) etc).
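To put those VAR numbers side by side, here is a small sketch using only the percentages quoted above. The names are mine, and because the report only says "under 1%" I treat 1% as a ceiling for the VAR figure, not a measurement.

```python
# Share of playing time 'lost' to each stoppage type, as quoted from the report (%).
time_lost_pct = {
    "free kicks": 9.5,
    "throw-ins": 8.0,
    "goal kicks": 6.0,
    "corner kicks": 4.5,
    "substitutions": 3.5,
}
var_ceiling_pct = 1.0  # assumption: "under 1%" treated as an upper bound

other_total = sum(time_lost_pct.values())
print(f"Listed stoppages: {other_total:.1f}% of playing time")
print(f"VAR: under {var_ceiling_pct:.0f}%, at least {other_total / var_ceiling_pct:.1f} times smaller")

# Infantino's quoted accuracy figures, before and after VAR (%).
accuracy_before, accuracy_after = 93.0, 99.0
error_before = 100 - accuracy_before
error_after = 100 - accuracy_after
relative_cut = (error_before - error_after) / error_before
print(f"Error rate: {error_before:.0f}% down to {error_after:.0f}%, "
      f"a {100 * relative_cut:.1f}% reduction in mistakes")
```

Read that way, a six-point rise in accuracy means the error rate falls from 7% to 1%, at a cost in playing time well below that of any single routine stoppage type listed.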

It feels like the use of VAR in the UK has been very different to the vast majority of trials; it’s almost like they want it to fail. Most of the online polls I’ve seen put public opinion at around 80% against the use of it, so the crucial initial introduction period has been a failure on these shores.

Crucially, we have to persist with it. We have to try to improve it, look at other trials, involve the crowd, speed up the process. Because if not, we are destined to continue to watch football matches and see mistake after mistake that could easily be rectified with a simple look at some video footage.

This is just a summary of the figures, but I highly recommend reading the full IFAB report.

Daniel Rhodes

Deputy Editor - The Tomkins Times