Wednesday, 6 March 2013

A Brief History of Football Analytics (From a Betting Perspective).

Google Trends is a great tool. Type in a word or phrase and Google will return the popularity of that search term over time. If you enter "Harlem Shake" in 2033, you should see a sharp peak separating two flat, horizontal lines, unless Martin Jol's Fulham has embarked on a 20th anniversary tour. In short, it is an excellent tool for tracking net chatter from the last decade.

"Soccer Analytics" first makes an appearance in May 2008, registering around 50% of peak interest, before hitting its 100% peak in May 2010 and then falling back to near its initial levels in 2013. "Football Analytics" makes an appearance a year earlier, but this term has probably suffered some cross-contamination with America's most popular sport, as played in the NFL. So it is probably wise to actually google the phrase and scan the results to ensure that you are researching the correct subject.

The Poisson distribution is a useful mathematical tool for predicting the likelihood of rare events occurring if you know the average rate at which those events occur. It has long been staple content on the many sites that discuss modelling the number of goals you would expect to see in football/soccer matches. The term "football poisson" predates "soccer analytics" on Google Trends by over two years, and the reason for this disconnect is that much of what could today be termed "soccer analytics" was being presented on out-and-out gambling sites in the early to middle part of the last decade. Therefore, much of the initial groundwork that exists for mainly American-based sports, such as baseball, but appears to be lacking in football, is actually contained on the sub-forums of bigger gambling sites or possibly behind paywalls.
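As a rough illustration of the idea, here is a minimal Python sketch of the Poisson calculation for a single team's goal count. The average of 1.4 goals per match is a made-up figure chosen purely for the example, and the function name is mine rather than anything taken from the sites mentioned.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k events when they occur at the given average rate."""
    return rate ** k * exp(-rate) / factorial(k)


# Hypothetical figure for illustration only: a side averaging 1.4 goals per match.
AVERAGE_GOALS = 1.4

# Chance of the team scoring exactly 0, 1, 2, ... 5 goals in a match.
goal_probs = {k: poisson_pmf(k, AVERAGE_GOALS) for k in range(6)}

for goals, prob in goal_probs.items():
    print(f"{goals} goals: {prob:.3f}")
```

The only input is the average scoring rate, which is why the approach was so accessible: everything else follows from that one number.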

The surge of interest in ways to mathematically describe the major events and outcomes surrounding a football match coincided with the relaxing of betting rules, especially in the UK. A decade or so ago, the minimum requirement when betting on the outcome of a non-live football game was for three games to be included in an accumulator. Therefore, the incentive to produce time-consuming mathematical models to predict the outcome of individual football matches was limited. Even if you found one match where the bookmaker's estimate was still wrong once they had applied an overround, you were required to pair that game with two others, each carrying its own margin in the bookmaker's favour.
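To put some numbers on why the treble rule blunted the incentive, here is a small Python sketch. All of the odds and probabilities are invented for illustration, the selections are assumed to be independent, and the function names are my own.

```python
def overround(decimal_odds):
    """Bookmaker's margin: how far the implied probabilities sum above 100%."""
    return sum(1.0 / odds for odds in decimal_odds) - 1.0


def expected_return(true_probs, decimal_odds):
    """Expected profit per unit staked on an accumulator of independent selections."""
    value = 1.0
    for prob, odds in zip(true_probs, decimal_odds):
        value *= prob * odds
    return value - 1.0


# Hypothetical home/draw/away prices for one match.
print(f"Overround: {overround([2.10, 3.40, 3.60]):.1%}")

# Hypothetical value bet: a true 50% chance priced at 2.10.
print(f"Single: {expected_return([0.5], [2.10]):+.1%}")

# The same bet forced into a treble with two fairly assessed 50% chances priced at 1.90.
print(f"Treble: {expected_return([0.5, 0.5, 0.5], [2.10, 1.90, 1.90]):+.1%}")
```

With these made-up figures the single is worth about +5% per unit staked, but the treble comes out at roughly -5.2%, because the margin on the two extra legs more than swallows the edge on the first.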

Once single bets became the norm, football modelling on the net took off, initially using the Poisson approach, but often extending to neural nets, least squares or individual team ratings based on goals or shots. The Poisson was and remains the most accessible route into pricing the massive array of gambling opportunities presented by the bookmakers. An average rate of goal scoring leads to team-specific numbers, which lead to predicted correct scorelines and on to individual match odds. Easily derived spin-offs include the likelihood of winning individual halves, winning or losing from a goal up against 10 men with 40 minutes to play, or the chances of scoring the first goal. And then on to diagonally inflated bivariate models if you caught the bug.
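The chain from team-specific goal rates to correct scores and match odds can be sketched in a few lines of Python. This is the simplest version, assuming the two teams' goal counts are independent Poisson variables, which is precisely the assumption the more elaborate bivariate models relax. The goal expectancies of 1.6 and 1.1 are hypothetical, as are the function names.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals given an average scoring rate."""
    return rate ** k * exp(-rate) / factorial(k)


def scoreline_prob(home_goals, away_goals, home_rate, away_rate):
    """Chance of an exact scoreline, assuming the two goal counts are independent."""
    return poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)


def match_probabilities(home_rate, away_rate, max_goals=10):
    """Sum scoreline probabilities into home win, draw and away win chances."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = scoreline_prob(h, a, home_rate, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical goal expectancies for the home and away sides.
home, draw, away = match_probabilities(1.6, 1.1)
for label, prob in (("Home", home), ("Draw", draw), ("Away", away)):
    # Fair decimal odds are the reciprocal of the probability, before any overround.
    print(f"{label}: {prob:.3f} (fair odds {1.0 / prob:.2f})")
```

Scorelines above `max_goals` are ignored, so the three probabilities sum to very slightly less than one. For typical goal expectancies the shortfall is negligible.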

I recall first seeing a Pythag for football, corrected for scoring environment, being posted by a well-known name from sabermetrics in the mid-2000s. Home field advantage was widely appreciated from the start, along with continental advantage in World Cups. Regression towards the mean and, by implication, the importance of random variation due to luck were practised, if not explicitly expressed. Most of the posters went under pseudonyms, based on exotic particles from physics (guilty) or pioneering communication satellites. The analytics sites of the present are partly retracing territory already covered by the gambling sites of the last decade.

Gambling organisations are obliged to price up many eventualities, and the emergence of spread betting, where poor judgement from either side is punished by potentially large losses and good judgement is rewarded, has extended the range of events which require modelling. A sensible business needs to know and manage the risks that it is taking. So it is reasonable to assume that bookmakers are confident that they have the best available estimate of the chances of a footballing event actually taking place. Modelling football's basic events is possible because bookmakers, and to a lesser degree punters, have been practising the dark arts on gambling-orientated sites for at least as long as Google Trends has been tracking search results. The remnants of these sites provide a great primer for those interested in the base rate figures for such connected events as goal expectancy and the most likely time of the first goal. As has been stated, this type of analysis describes football as it is, but it really is essential knowledge.

The Sloan Sports Analytics Conference had a gambling panel for a reason.
