Forecasting is a mug’s game; everyone knows this. Nonetheless, we like doing it, especially when it comes to football. We all want to know: how will my team do this weekend? This season? Is promotion realistically on the cards? Will league survival come down to the final round of fixtures?
We are two academic economists interested in forecasting such outcomes. We are up for a challenge, and so our focus is on the particularly tricky task of predicting exact scorelines. We are aware of all the forecasting pitfalls and of how terrible at forecasting we economists are (perceived to be). We do not claim to be experts; we are just a couple of football fans interested in various aspects of forecasting.
Exact scoreline forecasting in football is already a widespread activity. To reasonable fanfare, former professional footballers make weekly forecasts of English Premier League scorelines throughout the season. Websites exist enabling anyone to record their predictions. Bookmakers implicitly make forecasts too when releasing their odds for all the possible match outcomes. We’ve all heard about the “wisdom of crowds”, the idea of combining these various forecast sources. But how do they differ? Are some forecasters better than others?
Football generates a massive volume of timely information, which can be readily collected and analysed. Statistical models are now widely used to understand more about the beautiful game (e.g. when is a short corner better than a ball whipped in under the keeper’s nose?). These models can also be used to forecast individual match results, scorelines, and even the final league table come next May. Can we use them to make forecasts which improve on those currently coming from the crowd?
In fact, we are by no means the first to attempt this in the context of football scorelines. Nonetheless, we have created our own original model which estimates how many goals each team scores in a given match as a function of their own historical attacking and defending abilities, the historical abilities of their opponents, recent form, home advantage, the disruption of international breaks and European matches, and whether the match takes place on a midweek evening in November (though not, as yet, whether that evening was also rainy).
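As a rough illustration only (the functional form of the model is not spelled out here), one common way to turn team abilities into a number of expected goals is a log-linear formula. The function name, the parameters and every number below are hypothetical, and form, international breaks and midweek effects are left out for brevity.

```python
import math


def expected_goals(attack, opp_defence, at_home, base=0.1, home_adv=0.25):
    """Expected goals for one team in one match (illustrative log-linear sketch).

    attack      -- the team's attacking ability (0 = league average)
    opp_defence -- the opponent's defensive ability (0 = league average)
    at_home     -- True if the team is playing at home
    base, home_adv -- made-up constants, not estimates from any real model
    """
    log_rate = base + attack - opp_defence + (home_adv if at_home else 0.0)
    return math.exp(log_rate)


# Hypothetical abilities for a home side and an away side.
home_rate = expected_goals(attack=0.10, opp_defence=0.05, at_home=True)
away_rate = expected_goals(attack=0.20, opp_defence=-0.05, at_home=False)
print(round(home_rate, 2), round(away_rate, 2))
```

In a sketch like this, a stronger attack or a weaker opposing defence raises the expected number of goals, and playing at home adds a fixed boost.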
We use this model to predict the most likely score in upcoming matches. This is no straightforward task. Most weeks it defeats former professionals (and legends) Mark Lawrenson and Paul Merson. In general, even the most likely scoreline has just a 10-15% chance of happening. So, on top of predicting the most likely scores, we give what the model suggests is the probability of them happening.
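To see why even the most likely scoreline is a long shot, here is a minimal sketch that assumes each side's goals follow independent Poisson distributions. That is a common textbook simplification swapped in for illustration, not necessarily what the model above does, and the two expected-goals figures are made up.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when the expected number is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def scoreline_probs(home_rate, away_rate, max_goals=10):
    """Probability of every (home goals, away goals) pair up to max_goals each."""
    return {
        (h, a): poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


# Hypothetical expected goals: the home side 0.9, the away side 1.3.
probs = scoreline_probs(home_rate=0.9, away_rate=1.3)
best = max(probs, key=probs.get)
print("most likely scoreline (home, away):", best)
print("probability:", round(probs[best], 3))
```

With many plausible scorelines sharing the total probability, the single most likely one ends up with only a modest share.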
Reading FC, now managed by former Derby County and Swansea manager Paul Clement, kick off the entire English football season on Friday night. They welcome rookie boss Frank Lampard’s Rams to the Madejski. We find that the most likely final score is a narrow 1-0 win for Derby. Does that mean we were wrong if it finishes 2-1 to Reading instead? In a way, yes. But also no, because we also say that there is only a 13% chance of a 1-0 win to Derby, which means an 87% chance that it is not 1-0 to Derby (though that on its own is an imprecise forecast and not of much interest to anybody). What we are saying is that if the game on Friday night could be replayed 100 times under the exact same conditions, rewinding to kick-off after each final whistle, then we would expect Derby to win about 13 of those games 1-0. It’s a bit like saving where you are in a computer game, playing a match, then going back to the saved version of the game and trying again.
Once we’ve predicted one set of games in the English Premier League or Championship, and the likelihood of their possible outcomes, we can carry on doing it, all the way to the end of the season. We do this by using the model to simulate the entire season forward many times, updating the model estimates after each simulated game, until at the end we arrive at many iterations of the possible final league table. The fraction of times a team appears in the top two league positions at the end of May gives us a prediction of how likely they are to achieve automatic promotion to the Premier League. Similarly, we use the model and simulations to predict the likelihood of each team making the playoffs or suffering relegation.
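The simulation idea can be sketched in a few lines. This is a toy version under loud assumptions: a hypothetical four-team league, made-up scoring rates, Poisson-distributed goals, and no updating of the estimates after each simulated game (which the approach described above does do). All names are invented for the example.

```python
import math
import random
from itertools import permutations

# Hypothetical average goals scored per match by each team (made-up numbers).
SCORING_RATE = {"A": 1.6, "B": 1.4, "C": 1.2, "D": 1.0}
HOME_BOOST = 1.2  # hypothetical multiplier applied to the home side's rate


def sample_goals(rate, rng):
    """Draw a Poisson-distributed goal count using Knuth's method."""
    limit = math.exp(-rate)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def simulate_season(rng):
    """Play every home-and-away fixture once; return teams ranked first to last."""
    points = {team: 0 for team in SCORING_RATE}
    goal_diff = {team: 0 for team in SCORING_RATE}
    for home, away in permutations(SCORING_RATE, 2):
        home_goals = sample_goals(SCORING_RATE[home] * HOME_BOOST, rng)
        away_goals = sample_goals(SCORING_RATE[away], rng)
        goal_diff[home] += home_goals - away_goals
        goal_diff[away] += away_goals - home_goals
        if home_goals > away_goals:
            points[home] += 3
        elif home_goals < away_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Rank by points, then goal difference; remaining ties are broken at random.
    return sorted(
        points,
        key=lambda team: (points[team], goal_diff[team], rng.random()),
        reverse=True,
    )


def top_two_chances(n_sims=2000, seed=1):
    """Fraction of simulated seasons in which each team finishes in the top two."""
    rng = random.Random(seed)
    counts = {team: 0 for team in SCORING_RATE}
    for _ in range(n_sims):
        for team in simulate_season(rng)[:2]:
            counts[team] += 1
    return {team: counts[team] / n_sims for team in counts}


chances = top_two_chances()
for team, chance in chances.items():
    print(team, f"{chance:.1%}")
```

The same counting trick works for any region of the table: swap the top-two slice for the playoff places or the relegation zone to estimate those chances instead.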
After each week of matches over the coming season, we will update our model and our end-of-season predictions. As more information comes to light on the relative strengths of the teams, both these predictions and weekly scoreline forecasts should become increasingly accurate.
There is a serious side to this beyond just making football predictions. This work forms part of our ongoing research on evaluating forecasting methods and behaviour more generally, the results of which we may occasionally also discuss here.
Finally, we must make a standard disclaimer. We, the authors, have no commercial interests, and are publishing in our capacity as private individuals. Neither we as individuals nor the bodies to which we are professionally affiliated are suggesting that anyone should use our predictions to attempt any financial gains.