It’s the classic trope wheeled out by those who don’t like football – goalless draws, boring! Even those of us who live and breathe football have to confess we’d rather see a game with goals, even if those goals are all a bit comical, as they were in the 2-2 draw between Man United and Arsenal in the week.
Just how frequent are goalless draws? Mark Lawrenson and Paul Merson spectacularly under-predict them; prior to the current season, Lawro had called just 8 0-0 draws in 2,617 recorded predictions, and Merson just 4 in 1,483 recorded predictions (thanks to @MyFootballFacts and @EightyFivePoints for the data). That's about 0.3% of their predictions, but how does it compare with how often 0-0s actually happen?
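For the record, the pundits' 0-0 prediction rates quoted above work out as follows:

```python
# Quick check of the 0-0 prediction rates quoted above
lawro_rate = 8 / 2617    # 8 goalless draws called in 2,617 predictions
merson_rate = 4 / 1483   # 4 called in 1,483 predictions

print(f"Lawro:  {lawro_rate:.2%}")   # ~0.31%
print(f"Merson: {merson_rate:.2%}")  # ~0.27%
```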
The featured pic for this post shows the frequency per season (northern hemisphere) over the history of data collected on Soccerbase. There’s been quite a bit of variation over time, and perhaps surprisingly for someone who got into football in the late 1980s and early 1990s, that isn’t the period when the most goalless draws were recorded – it’s actually the 1920s.
So we see that since the late 1960s, things have been fairly constant, with (persistent) variation around a mean of about 8%. The middle panel gives the residuals, i.e. the difference each season from that 8% mean. We're now on a run of 12 consecutive seasons below 8%, though not statistically significantly below it. If the downward trend continues, though, we may be looking at a new equilibrium sometime soon.
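To see why a run of seasons below 8% need not be statistically significant, here is a minimal sketch of a one-sample proportion z-test against the long-run 8% level. The match counts here are hypothetical, purely for illustration:

```python
from math import sqrt

# Hypothetical season: 140 goalless draws in 2,000 matches, i.e. 7%
n_matches = 2000
n_goalless = 140

p0 = 0.08                  # long-run mean rate under the null
p_hat = n_goalless / n_matches
se = sqrt(p0 * (1 - p0) / n_matches)   # standard error under the null
z = (p_hat - p0) / se

print(f"z = {z:.2f}")      # about -1.65: below 8%, but not beyond the
                           # conventional two-sided 5% threshold of -1.96
```

So even a full percentage point below the mean, over a couple of thousand matches, can sit inside ordinary sampling variation.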
The art of defending, maybe, is a thing of the past?
The paper asks whether players of the prediction game choosing to revise their scoreline picks leads to more accurate forecasts as match kick-offs approach. At first glance, this ought to be a no-brainer: of course revising scoreline picks should improve forecast accuracy, because as kick-off approaches there is more information about the nature of an upcoming football match, including injuries to key players and even starting lineups. But there are several possible sources of bias which could affect scoreline tips (or judgement forecasts) and their revisions.
In this research, we find that football tipsters should stick with their gut instincts. This appears to hold quite generally when forecasting football match scores, both after accounting for differences in forecasting ability between individuals and for differences in predictability between matches. Revising a forecast (i.e. not sticking with their gut instincts) left tipsters only 80% as likely to forecast the correct scoreline compared with when they stuck with their first prediction. In the cases where players did revise their forecasts, their initial scoreline picks were, on average, just as good as those of players who made no revisions. We also found evidence of why players did worse when they revised: their revisions were excessive, perhaps overreacting to some new and salient piece of information about the upcoming match.
These results have some similarities with those found more widely in the academic literature on behavioural forecasting. We hope to use them to guide possible field experiments among communities of sports tipsters (forecasters). One interesting application is whether or not we can find ways to improve the power of crowds in forecasting.
We are also carrying out other research on how to evaluate the football score judgement forecasts made by tipsters, and how forecasting behaviour and evaluation should respond to differences or changes in the “rules of the game” being played by a sports tipster. In general, we find football matches particularly strange objects to forecast, as we have touched on in this blog before. This is mostly explained by the simple fact that frequently the most likely scoreline in any given football match will conflict with the most likely result. There are other situations where this can also be true, and where forecasts could have somewhat greater socio-economic importance.
Midweek brings a busy schedule in the Football League, with a full set of fixtures in Leagues One and Two. We also continue to refine our conditional forecasts. There's a fuzzy area where all three probabilities (home win, draw and away win) are similar, though usually with the draw a little less likely than either win.
But if the draw is at 28%, the home win at 32% and the away win at 30%, a draw scoreline may well be a better forecast than a scoreline with either team winning. Exactly at what point a draw becomes the right scoreline to predict, relative to a win for either side, is an empirical question, and one we will look at in time.
In the meantime, we make use of the measure of entropy, thanks to Claude Shannon, which is highest when a market is most "undecided", i.e. when each of the three probabilities is closest to 33.3%. If the entropy is above 1.09 (roughly corresponding to all three probabilities lying between 28% and 39%), we conclude the most likely outcome is a draw, and provide a draw scoreline as our prediction. These are our conditional forecasts.
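The rule above can be sketched in a few lines. This assumes natural logarithms, since the maximum entropy for three outcomes is then log(3) ≈ 1.0986, which is consistent with a threshold of 1.09 firing only when the market is very close to undecided; the function names are ours, not part of any model code:

```python
from math import log

def entropy(probs):
    """Shannon entropy (natural log) of an outcome distribution."""
    return -sum(p * log(p) for p in probs if p > 0)

def predict_draw(p_home, p_draw, p_away, threshold=1.09):
    """Call a draw when the match market is sufficiently 'undecided'.

    Max entropy over three outcomes is log(3) ~ 1.0986, so 1.09 only
    triggers when all three probabilities are close to one third.
    """
    return entropy((p_home, p_draw, p_away)) > threshold

print(predict_draw(0.34, 0.33, 0.33))  # True: near-undecided market
print(predict_draw(0.45, 0.25, 0.30))  # False: clear home favourite
```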
In the next two tables, we present our forecasts constructed this way. In League One, the strong starts from Sunderland and Barnsley seem set to continue (2-1 away wins, both with probability 9%, but both with win probabilities above 40%). In League Two, Lincoln and MK Dons may end up stealing a march on Exeter and Stevenage in the early four-way tie at the summit of the league.
Our forecasts for Round 3 of the Championship are in the Table below.
But first, a quick word on how and why we are expanding our range of forecasts:
Over the past couple of weeks, we have extensively evaluated our forecasting model, both on the 2018/19 results so far and on how it would have performed over past seasons.
If we were only interested in forecasting correct scores, then we would happily just report each week on what the “Most likely” scoreline is. However, the model also tells us what the most likely result is, and this will often conflict with our forecast of the most likely score. This is simply because draws are relatively rare among results, but relatively common among scorelines. People care about results, probably more than they care about scores.
Also, we now believe that prediction performance metrics on the BBC Sport and Sky Sports websites, as well as online games such as Superbru, favour conditional forecasts. That is, to perform well on those games, players should first pick what they think is the most likely result, and only then pick the scoreline sticking to that result.
Therefore, we are expanding on our forecasts to reflect all of the above.
In the table below, we now predict:
The Most Likely scoreline, with the % chance of that happening
The % chance of a win by the Home team, P(H), or Away team, P(A) (with 100 minus the sum of those two numbers giving the % chance of a draw)
The Conditional scoreline, if the most likely result happens, with the associated % chance of that happening among all possible scorelines
A bunch of “other” matches took place last night, which we forecast. We also presented conditional probabilities, perhaps a bit more intuitive because while 1-1 is often the most likely scoreline, a draw is hardly ever the most likely result.
After the event, the natural question is: how did we do? The table below shows that our standard forecasts (loads of 1-1s) got 15 right results, and 8 scores (from 46 matches), while our conditional forecasts got 22 right results, and 5 scores.
So what is better? To get more scores, or get more results? This all depends on preferences. The scoring rule of Mark Lawrenson’s forecasts on BBC Sport is 40 points for a score, 10 for a result. The Sky Super Six scoring rule, which we might attribute to Paul Merson’s forecasts, is 5 for a score, 2 for a result, thus valuing a score less than the BBC does. By the Lawro score (scaled down to make it comparable to Sky), our unconditional forecasts got 47, our conditional ones 42. By the Sky score, the difference was one half: 35 to 34.5. If we value a score at only twice a result (rather than 2.5 times as Sky does, or 4 times, as Lawro does), then we get that our conditional forecasts were better, scoring 32 to 31.
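The arithmetic in that comparison is easy to reproduce. Here "results" counts matches where only the result (not the exact score) was correct, which is what makes the totals come out as above:

```python
# Points under a scoring rule: n_results correct results (but not exact
# scores) plus n_scores exact scores.
def points(n_results, n_scores, result_pts, score_pts):
    return n_results * result_pts + n_scores * score_pts

# Unconditional forecasts: 15 results + 8 scores.
# Conditional forecasts:   22 results + 5 scores.

# Lawro's rule (10 result / 40 score), scaled down to 1 / 4
lawro = (points(15, 8, 1, 4), points(22, 5, 1, 4))    # (47, 42)

# Sky's rule (2 result / 5 score), scaled by half to 1 / 2.5
sky = (points(15, 8, 1, 2.5), points(22, 5, 1, 2.5))  # (35.0, 34.5)

# Valuing a score at only twice a result
twice = (points(15, 8, 1, 2), points(22, 5, 1, 2))    # (31, 32)
```

Only under the last rule do the conditional forecasts come out ahead.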
Scoring rules matter, and may well matter for how players play these games. Scoring rules probably also reflect, to some extent, our preferences and beliefs. It's clearly much harder to get an exact score right, so why not reward it much more heavily than a mere result, as Lawro's rule does?
In expanding our model to cope with promotion and relegation, and longer term trends in teams’ strengths, we’re now estimating over more seasons and more divisions. The upshot appears to be that we predict a lot of 1-1 draws. Now, is that a bad thing?
If we look at every single match on the Soccerbase website, we find 11% of matches have finished 1-1, making it the most common score ever. Almost every 1-1 we predict comes with a probability of about 11%. So perhaps it's not the worst thing in the world that our model predicts it quite a lot.
Equally, though, it might be a sign that our model is not really able to distinguish between the kinds of matches that do finish 1-1 and the ones that don’t. It’s all well and good predicting 1-1 every single time, but it’s hardly very insightful.
An alternative is to consider conditional probabilities: probabilities conditional on the most likely result occurring. Despite 1-1 being the most common score, the most common result is a home win: about 46% of the time the home team wins, while draws occur only about 24% of the time. A conditional forecast therefore first determines the most likely result, and then produces as the score forecast the most likely scoreline delivering that particular outcome. In our forecasts for tonight's games in the National League and EFL Cup, we also produce a column with conditional forecasts.
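The two-step logic can be sketched from a scoreline distribution. The probabilities below are purely illustrative (the remaining mass sits on other, less likely scorelines), not output from our model:

```python
# Hypothetical scoreline probabilities, (home goals, away goals) -> prob
probs = {
    (0, 0): 0.06, (1, 1): 0.11, (2, 2): 0.04,
    (1, 0): 0.09, (2, 0): 0.07, (2, 1): 0.10, (3, 1): 0.04,
    (0, 1): 0.06, (0, 2): 0.04, (1, 2): 0.06,
}

def result(score):
    h, a = score
    return "H" if h > a else "A" if h < a else "D"

# Unconditional forecast: the single most likely scoreline
unconditional = max(probs, key=probs.get)              # (1, 1) at 11%

# Most likely result, summing scoreline probabilities by result
result_probs = {}
for score, p in probs.items():
    result_probs[result(score)] = result_probs.get(result(score), 0) + p
best_result = max(result_probs, key=result_probs.get)  # "H"

# Conditional forecast: most likely scoreline given that result
conditional = max((s for s in probs if result(s) == best_result),
                  key=probs.get)                       # (2, 1)
```

So even though 1-1 is the single most likely score here, the conditional forecast is a 2-1 home win, because the home win is the most likely result.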
Tonight there are National League and EFL Cup games. As part of experimenting with potential improvements to our forecasting, we have expanded the set of matches over which we estimate, enabling forecasts of these “other” matches.
We present these abbreviated below (NL = National League, EFLC = EFL Cup, and we hope the teams are still clear). We also present some conditional probabilities, namely the probability of the most likely outcome. This is because while 1-1 is the most likely score, a draw isn't the most likely outcome. Leeds vs Bolton is a great example of this: the most likely score is 1-1 (11%), but the most likely outcome is a home win (45%). Hence the conditional forecast is a 2-1 Leeds win, which seems more sensible, at 9% rather than 11%.
Note one other thing: H & W is Havant and Waterlooville, new entrants to the National League. This means they are in our dataset, and hence our model, for the first time, and as a result their model-generated probability of winning at Dover is 0. This is, of course, inaccurate; even Oldham have a 12% chance of beating Derby, a team two divisions above them. Still some wrinkles to iron out…