Monday, 14 August 2017

Liverpool's Split Personality

Everyone likes a good mystery, and Constantinos Chappas provided the raw material for a great one when he posted this breakdown of Liverpool's 2016/17 points per game against the six teams from Everton upwards and against the remaining 13 sides.

It's a great piece of work from Constantinos, and Liverpool's split personality, playing very well against the title contenders and Everton but less well against lower-class teams, has generated much speculation.

That speculation has generally fallen into two mutually exclusive camps: narrative-based accounts of tactical flaws in Klopp's Liverpool, or odds-based simulations that attempt to explain away the split as mere randomness.

It is unlikely that either approach will wholly account for Liverpool's apparent failure to dispatch mid and lower-table teams with the authority they appeared to reserve for the league's stronger sides.

Football is awash with randomness as well as tactical nuance, so it seems much more likely that a combination of factors contributed to Liverpool's 2016/17 season.

It's a simple task to simulate multiple seasons, often using bookmakers' odds as a proxy for team strength, to arrive at the chance that a side, not necessarily Liverpool, might exhibit a split personality.
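As a rough sketch of the kind of odds-based simulation described above, the Python below assumes we already have win and draw probabilities for every match (for instance, derived from bookmakers' odds) and counts how often a side averages at least half a point per game more against the top group than against the rest. The probabilities, the 0.5 ppg threshold and the function name `simulate_split` are hypothetical placeholders, not figures from Constantinos' breakdown.

```python
import random


def simulate_split(top_probs, rest_probs, n_seasons=10_000, margin=0.5, seed=1):
    """Estimate how often a side averages at least `margin` more points per
    game against the top group than against the rest, purely by chance.

    top_probs, rest_probs: one (p_win, p_draw) pair per match (hypothetical).
    """
    rng = random.Random(seed)

    def points(p_win, p_draw):
        # Draw a single result from the match probabilities
        u = rng.random()
        if u < p_win:
            return 3
        if u < p_win + p_draw:
            return 1
        return 0

    splits = 0
    for _ in range(n_seasons):
        top_ppg = sum(points(w, d) for w, d in top_probs) / len(top_probs)
        rest_ppg = sum(points(w, d) for w, d in rest_probs) / len(rest_probs)
        if top_ppg - rest_ppg >= margin:
            splits += 1
    return splits / n_seasons


# Hypothetical odds-implied probabilities: 12 games v the top group, 26 v the rest
top = [(0.40, 0.28)] * 12
rest = [(0.65, 0.20)] * 26
print(f"Share of seasons with a 0.5+ ppg split: {simulate_split(top, rest):.3f}")
```

Changing `margin` or the input probabilities shows how sensitive the resulting "chance of a split" is to fairly arbitrary choices.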

However, it's a stretch to then conclude either that chance was the overriding factor, or that it can be excluded as a cause, merely because this likelihood falls above or below an arbitrary level of certainty.

There is so much data swirling around football at the moment, particularly ExpG, that it seems helpful to use these numbers to shed some light on Constantinos' intriguing observation.

Rather than a pregame bookmaker's estimate of a side's chances, we have access to ExpG figures for all of Liverpool's 2016/17 matches.

ExpG figures arise from the tactical and talent-based interaction that took place on the field. Spread over 90+ minutes of all 38 games, they perhaps provide a larger sample of events with which to explain a series of game outcomes than simply using 38 individual sets of match odds, however skilfully assembled.

One area where ExpG struggles in a low-scoring sport such as football is in capturing how teams adopt different approaches in pursuit of the maximum number of points.

A side may take a fairly comfortable lead early in a contest and then choose to commit more to defence against a weaker or numerically deficient opponent.

An extreme case was Burnley's win over Chelsea, where early goals allowed the visitors to concede large amounts of ExpG, but just few enough actual goals, to lose the ExpG contest handsomely yet win the match.

ExpG figures are inevitably tainted by real events, such as goals and red cards, but they are still at their most useful when used in conjunction with simulations to describe the range and likelihood of particular events occurring.

Scoring first (and second and third, along with Chelsea going down to 10 men) was a big help to Burnley, and Andrew Beasley has written about the importance of the first goal here, for Pinnacle.

If we look at the size of the ExpG figures for all goal attempts in a game and the order in which they arrived, there may be enough data, not distorted by actual events, to estimate which side was most likely to open the scoring and so be able to dictate more readily how the game evolved.
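One way to turn the size and order of attempts into first-goal chances, sketched here under the simplifying assumption that every attempt scores independently with probability equal to its ExpG: walk through the attempts in chronological order and credit each one with its ExpG multiplied by the probability that nobody has scored yet. The shot list is invented for illustration, and rebounds within the same move are not truly independent, so treat this as an approximation rather than InfoGol's actual method.

```python
def first_goal_probs(shots):
    """Return P(home scores first), P(away scores first), P(goalless).

    shots: chronological list of (team, xg) tuples, team being "home" or "away".
    Assumes each attempt scores independently with probability equal to its xg.
    """
    p_no_goal_yet = 1.0  # probability that nobody has scored before this attempt
    first = {"home": 0.0, "away": 0.0}
    for team, xg in shots:
        first[team] += p_no_goal_yet * xg
        p_no_goal_yet *= 1.0 - xg
    return first["home"], first["away"], p_no_goal_yet


# Hypothetical three-attempt match
home, away, goalless = first_goal_probs([("home", 0.1), ("away", 0.3), ("home", 0.5)])
print(f"{home:.3f} {away:.3f} {goalless:.3f}")  # 0.415 0.270 0.315
```

The three probabilities always sum to one, so a side's expected number of first goals over a season is simply the sum of its per-match "scores first" values.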

In games against the 13 lowest finishing teams, Liverpool took the initial lead 16 times, compared to a most likely figure of 15.

Given the interplay of attempts taken and allowed, Liverpool went 1-0 up, went 1-0 down or stayed goalless throughout about as often as their process deserved.

They fared much better against the top teams.

In those 12 games, Liverpool took a 1-0 lead nine times, compared to a most likely figure of just six based on the ExpG in those games.

An average team, carving out and allowing the same chances as Liverpool, would have repeated this only around 7% of the time.
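Getting from per-match first-goal probabilities to a "most likely figure" and a tail chance such as the 7% above requires the distribution of the total count across games with unequal probabilities (a Poisson binomial distribution). A minimal sketch follows; the twelve placeholder probabilities of 0.5 are hypothetical and are not Liverpool's actual per-game figures.

```python
def count_distribution(probs):
    """Exact distribution of the number of 'successes' (here, games in which a
    side scores first) across independent games with unequal probabilities."""
    dist = [1.0]  # dist[k] = P(exactly k successes in the games seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1.0 - p)  # this game is not a success
            new[k + 1] += pk * p  # this game is a success
        dist = new
    return dist


# Hypothetical per-game probabilities of scoring first in 12 games
probs = [0.5] * 12
dist = count_distribution(probs)
mode = max(range(len(dist)), key=dist.__getitem__)  # most likely count: 6 here
tail = sum(dist[9:])  # chance of scoring first in 9 or more of the 12
print(f"most likely count: {mode}, P(9+): {tail:.3f}")
```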

It's understandable to look to the heights that may be achieved, rather than the lowly foothills left behind.

But based on Liverpool's 2016/17 process from an ExpG and first goal perspective, perhaps their relatively disappointing record against lower grade sides is not the outlier, but rather their exceptional top 6 results are.

Scoring fewer first goals than they actually did in these top of the table clashes would likely decrease their ppg in these games, while inevitably increasing those of their six challengers.

This would shift the top six group gradually to the right in the initial plot, and Liverpool slightly more substantially to the left, until they perhaps formed a more homogeneous group with no outlier.

It's traditional to wind up with "nothing to see, randomness wins again", particularly when one set of data is taken from a small, extreme-inducing sample of just 12 interconnected matches per team.

But we now have the data, a place to look and video to watch, to see whether there is some on-pitch, if possibly transient, cause behind Liverpool finding the net first in big games, or whether the usual suspect in Constantinos' mystery, randomness, does indeed turn out to be the major guilty party.

All data from @InfoGolApp

Tuesday, 8 August 2017

"It's All about The Distribution Part 2"

First, the disclaimer: this isn't a "smart after the event" explanation for Leicester's title season.

It is a look at the occasional nasty or pleasant surprises that can occur, and at the limitations of trying to second-guess these when using a linear, ratings-based model.

Models built around numbers and averages do work extremely well for the majority of teams in the majority of seasons.

But as the financial world found, to the cost of others, neglecting distributions, especially ones that appear normal but hide fatter than usual tails, can leave you unprepared for the once-in-a-lifetime event.
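To illustrate how a distribution can look broadly normal yet hide fatter tails, the sketch below compares a normal distribution with a Student-t distribution (3 degrees of freedom) rescaled to the same unit variance, and counts how often each produces a value more than three standard deviations from the mean. The choice of Student-t, its degrees of freedom and the threshold are assumptions made purely for illustration.

```python
import math
import random


def tail_share(sampler, threshold=3.0, n=100_000, seed=7):
    """Share of n draws whose absolute value exceeds `threshold`."""
    rng = random.Random(seed)
    return sum(abs(sampler(rng)) > threshold for _ in range(n)) / n


def normal(rng):
    return rng.gauss(0.0, 1.0)


def fat_tailed(rng, df=3):
    # Student-t draw (normal / sqrt(chi-square / df)), rescaled to unit variance
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square with df degrees of freedom
    t = z / math.sqrt(chi2 / df)
    return t * math.sqrt((df - 2) / df)


print(f"normal:     {tail_share(normal):.4f}")
print(f"fat-tailed: {tail_share(fat_tailed):.4f}")
```

Both samplers share the same mean and variance, so a summary built only on those two numbers cannot tell them apart, yet the fat-tailed version throws up "three sigma" events several times more often.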

The previous post looked at a hypothetical five-team scenario, where the lowest rated but underexposed side had a much better chance of winning a contest than implied by the respective ratings, simply because the distribution of its potential ratings was markedly different.

Again, in full disclosure, this model wasn't from football: it was a five-runner race run at Uttoxeter, and Team 5 was actually a very lightly raced horse up against exposed rivals.

I assumed that the idea that distributions of potential performance sometimes matter also carries over into football, and the obvious example of an unconsidered team taking a league by storm was Leicester's 2015/16 title-winning season.

I went back to 2014/15 and produced some very simple expected goals ratings for all 20 sides going into the 2015/16 season.

I also looked at how diverse and spread out the performance ratings from 2014/15 were for each side.

The three teams whose performances had fluctuated most, and who might therefore be considered to have a bit more meat in their distribution tails and to be less likely to adhere to their "average" expectations, were the champions Chelsea, West Ham and Leicester.

I then set up a distribution for each team based around their average rating and the standard deviation of their individual game-by-game performances in 2014/15.

I then drew from these tailored distributions to simulate each game of the 2015/16 season, Leicester's title-winning season.
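The two steps above (a distribution for each team built from the mean and standard deviation of its previous-season game ratings, then random draws from it for every fixture) might be sketched as follows. Everything here is a hypothetical stand-in rather than the model used for these posts: the five team names and their (mean, sd) values are invented, performances are assumed to be normally distributed, home advantage is ignored, and `DRAW_MARGIN` is an arbitrary rule for turning two close performance draws into a drawn match.

```python
import itertools
import random
from collections import Counter, defaultdict

# Hypothetical (mean, sd) of each side's game-by-game ratings from the previous season
TEAMS = {
    "Steady A": (0.6, 0.4),
    "Steady B": (0.3, 0.4),
    "Volatile C": (0.0, 1.0),
    "Steady D": (-0.2, 0.4),
    "Steady E": (-0.5, 0.4),
}
DRAW_MARGIN = 0.3  # hypothetical: performances this close together produce a draw


def simulate_season(teams, rng):
    """Play every side home and away, drawing each performance from its own
    normal distribution, and return the final table (best first)."""
    points = Counter({name: 0 for name in teams})
    for home, away in itertools.permutations(teams, 2):
        h = rng.gauss(*teams[home])
        a = rng.gauss(*teams[away])
        if abs(h - a) < DRAW_MARGIN:
            points[home] += 1
            points[away] += 1
        elif h > a:
            points[home] += 3
        else:
            points[away] += 3
    # Random tie-break so sides level on points are not always ordered as listed
    return sorted(teams, key=lambda name: (points[name], rng.random()), reverse=True)


def finishing_positions(teams, n_seasons=5_000, seed=42):
    """Count how often each side finishes in each league position."""
    rng = random.Random(seed)
    tally = defaultdict(Counter)
    for _ in range(n_seasons):
        for pos, name in enumerate(simulate_season(teams, rng), start=1):
            tally[name][pos] += 1
    return tally


tally = finishing_positions(TEAMS)
for name in TEAMS:
    spread = {pos: tally[name][pos] for pos in range(1, len(TEAMS) + 1)}
    print(name, spread)
```

The side with the widest distribution is the one most likely to be spread across the whole table, which is the behaviour described for the in-and-out teams.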

And this is how the Foxes and their fellow in-and-out teams fared in simulations that draw from a distribution, rather than a single rating.


Leicester project as a top-half team who were as likely to finish in the top two as to be relegated, while West Ham put themselves about all over the place, but predominantly in the top half, which is where they ended up.

Chelsea have a minute chance of ending up tenth, so kudos to Mourinho for breaking this particular model.

There are some really interesting figures emerging today, both for teams and players, and usually it's fine to run with the average.

But these averages live in distributions, and when those distributions throw up something inevitable, if unexpected, someone has to pay, as the bankers found out.

"It's All About The Distribution".

You've got five teams.

One is consistently the best team; their recruitment is spot on, with a steady stream of younger replacements ready and able to take over when their stars peak and wane.

Then we've got two slightly inferior challengers, again the model of consistency, with few surprises, either good or bad.

The lowest two rated teams complete the group of five.

The marginally superior of these also turns in performances that only waver slightly from their baseline average.

For the final team, however, we have very limited information about their abilities, partly due to a constantly changing line-up and new acquisitions.

The current team has been assembled from a variety of unfashionable leagues, and we only have a handful of results by which to judge them.

So we group together the initial results of similar, newly assembled teams to create a larger sample with which to describe what we might get from such a team.

Instead of a distribution that resembles those of the four more established teams, we get one that is much more inconsistent. Some such teams did well, others very badly.
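Pooling a cohort in this way can be sketched in a few lines. All of the ratings below are invented placeholders, chosen only to show how a pooled group of lightly raced sides can end up with a lower mean but a much larger standard deviation than an established, consistent team.

```python
from statistics import mean, stdev

# Hypothetical ratings from the first few games of three newly assembled teams
cohort = {
    "new side 1": [1.4, -0.9, 0.2],
    "new side 2": [-1.6, -0.4],
    "new side 3": [1.9, -1.2, 0.1, -0.8],
}
# Hypothetical ratings for one established, consistent team
established = [0.1, 0.3, -0.1, 0.2, 0.0, 0.2, 0.1]

# Pool the small samples into one larger sample to stand in for "Team 5"
pooled = [r for ratings in cohort.values() for r in ratings]

for label, ratings in (("pooled cohort", pooled), ("established", established)):
    print(f"{label:>13}: n={len(ratings)}, mean={mean(ratings):.2f}, sd={stdev(ratings):.2f}")
```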

The distribution of performances for the first four sides is typical of teams from this mini league, whereas the distribution we have chosen to represent the potential upside and downside of this unexposed side is not.

Team 5's distribution has a flatter peak and fatter tails, both good and bad.

The average "ratings" of the five teams are shown below.

Team 5 has the lowest average rating, but by far the largest standard deviation based on the individual ratings of the particular cohort of sides we have chosen to represent them.

As Team 5 is the lowest rated, they're obviously going to finish bottom of the table, a lot, but just to confirm things we could run a simulation based on the distribution of performances for all five teams.

First we need to produce a distribution that mimics the range of performances for each of the five teams, and then draw a random number from each distribution to decide the outcome of a series of contests.

The highest performance number drawn takes the spoils.
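A minimal sketch of this contest simulation: each team draws one performance from its own distribution, the highest draw wins and the lowest finishes last. The means and standard deviations are hypothetical (the original ratings are not reproduced here), and a normal distribution with a large standard deviation stands in for Team 5's flatter, fatter-tailed profile, so the percentages it prints will not match the figures quoted in this post.

```python
import random
from collections import Counter

# Hypothetical (mean, sd) performance ratings: Team 5 has the lowest mean
# but by far the widest spread
TEAMS = {
    "Team 1": (100, 5),
    "Team 2": (97, 5),
    "Team 3": (97, 5),
    "Team 4": (94, 5),
    "Team 5": (92, 12),
}


def run_contests(teams, n=10_000, seed=3):
    """Each contest: every team draws one performance; highest wins, lowest is last."""
    rng = random.Random(seed)
    wins, lasts = Counter(), Counter()
    for _ in range(n):
        draws = {name: rng.gauss(mu, sd) for name, (mu, sd) in teams.items()}
        wins[max(draws, key=draws.get)] += 1
        lasts[min(draws, key=draws.get)] += 1
    return wins, lasts


N = 10_000
wins, lasts = run_contests(TEAMS, n=N)
for name in TEAMS:
    print(f"{name}: first {wins[name] / N:.1%}, last {lasts[name] / N:.1%}")
```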

Run 10,000 simulated contests and Team 5 does come last more frequently than any other side: roughly half the tournaments finish with Team 5 in last position.

However, because their profiled performances are inconsistent and populated by a few very good performances, they actually come first more frequently than might be expected from their average performance rating.

In 10,000 simulations, Team 5 comes first 22% of the time, bettered only by Team 1, whose random draw of ratings based on their more conventional distribution of potential performances grants them victory 36% of the time.

Not really what you'd expect simply from eyeballing the raw ratings.

Team 5, based on the accumulated record of teams that have similar limited data, are likely to be sometimes very bad, but occasionally they can produce excellent results.

Such as Leicester when they were transitioning into a title winning team?

As someone once said at an OptaProForum.......

"It's all about the distribution"

......and simple averages can sometimes miss sub populations that could be almost anything.

Straight-line assumptions, extrapolated from mere averages, will always omit the inevitable uncertainty that surrounds such teams or players, where data is scarce and distribution tails might be fatter than normal.