Tuesday, 26 July 2016

How the Better Sides Tried (and Failed) to Break the Stalemate in 2015/16

One of the more confusing aspects of football analytics is the proliferation of different terminologies, often to describe similar events. There also exists the added problem of definitions meaning different things to different parties.

Game state is a case in point. It is often used to refer to the current score differential, in which case why not just use something like, I don't know, maybe "current score differential"?

To others, it denotes a team in a position that exceeds or falls below its pregame expectations.

Often this will correspond with score differential, but the lines become blurred when the scores are level. And a side in a low scoring sport such as football spends a great deal of time deadlocked.

Here's the average expected number of league points that a side with a 50% chance of winning at kick off will go on to win as the game progresses but remains goalless.

After 40 minutes of goalless football, their likely average haul of league points has fallen by ~8% of their original hopes, to 1.65 points. After 70 minutes stuck at 0-0, they've shed nearly 20% of their potential gains.

So even with no change in score differential, circumstances have moved gradually away from such a favoured side.

By contrast, a side which initially had a 10% chance of winning, and an initial expected points total of 0.53 per game from such contests, will have seen their potential return rise by ~50% to 0.8 expected league points if they reach the 70th minute still locked at 0-0.

So a score differential of zero means different things to different teams, especially when initial team expectation and time elapsed are included.
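The drift described above can be sketched with a very simple model. This is not the model behind the figures quoted in this post; it assumes each side scores as an independent Poisson process at a constant rate, which ignores the rise in scoring rates later in games. The function name and the goal rates (1.6 v 1.1 for a favourite, 0.6 v 2.2 for an underdog) are hypothetical, illustrative inputs only.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def expected_points_if_goalless(rate_for, rate_against, minutes_played,
                                match_length=90, max_goals=10):
    """Expected league points for a side in a game that is still 0-0.

    rate_for / rate_against are full-match goal expectations (assumed inputs).
    Only the remaining time matters, because the score is still level.
    """
    remaining = max(match_length - minutes_played, 0) / match_length
    mean_for = rate_for * remaining
    mean_against = rate_against * remaining

    win = draw = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, mean_for) * poisson_pmf(conceded, mean_against)
            if scored > conceded:
                win += p
            elif scored == conceded:
                draw += p
    # Three points for a win, one for a draw.
    return 3 * win + draw


# Hypothetical favourite and underdog, evaluated while the game stays goalless.
for minute in (0, 40, 70):
    favourite = expected_points_if_goalless(1.6, 1.1, minute)
    underdog = expected_points_if_goalless(0.6, 2.2, minute)
    print(minute, round(favourite, 2), round(underdog, 2))
```

Under these assumptions the favourite's expectation shrinks towards the single point a draw is worth, while the underdog's climbs towards it, which is the same direction of travel as the figures above.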

When looking at how sides play at different score differentials, the relative abilities of the teams can distort any conclusions.

A superior team that falls behind may be more capable of creating higher quality opportunities to equalise compared to an inferior team in the match up, who may need to rely on higher volume, but poorer quality attempts to draw level.

However, if we look at matches that remained goalless until fulltime, we can divide the sample between those who were the favoured team at kickoff and those who were the pregame underdog.

The former sides will see their pregame league points expectation decay with time, but, as the superior team, they are likely to be more willing and able to attempt a positive change.

Based on Opta data from 0-0 games played in the Premier League during 2015/16, here's how expected goals per attempt (a basic indicator of attempt quality) and shot volume changed for favoured teams attempting to turn around increasingly disappointing games as they headed to a 0-0 conclusion.

Expected goals per minute, which combines shot quality and quantity, is used to attempt to show how much effort the favoured side is putting into trying to break the deadlock against their weaker opponent.

The general trend is upwards and corresponds with the initial plot that shows how potential points return decays with every passing minute. Superior teams do seem to be able to increase their efforts to score as goalless matches progress, even allowing for the general increase in scoring as a game progresses.

Of more interest is how they achieve these increased rates.

The plot above includes the average expected goals per attempt and the number of attempts per ten minute interval made by favoured sides in goalless matches in 2015/16. They have been compared to the averages for the sides over the match as a whole, and added time has been scaled up to be consistent with the other nine intervals used.

Expected goals and attempts have also been scaled so they are directly comparable.

For example, during the first ten minute interval, the favoured teams attempted a below average volume of shots compared to the games as a whole, and these attempts also had below average expected goals values.
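As a rough sketch of the kind of scaling described above, the snippet below bins attempts into nine ten minute intervals plus added time, scales added time up to a ten minute equivalent, and expresses both quality and volume as a ratio to the match average (1.0 = average). The exact scaling used for the plot is not spelled out in this post, so the method, the function name and the shot list are all assumptions made for illustration; the shots are invented, not Opta data.

```python
def interval_profile(shots, added_minutes, interval=10, regulation=90):
    """Return one (quality_index, volume_index) pair per interval.

    shots: list of (minute, expected_goals) tuples, minutes counted from 1.
    Any minute above `regulation` is treated as second-half added time
    (an assumption; first-half added time is ignored here).
    """
    if not shots:
        raise ValueError("at least one shot is needed to form an average")

    n_regular = regulation // interval
    counts = [0] * (n_regular + 1)
    xg_sums = [0.0] * (n_regular + 1)
    for minute, xg in shots:
        if minute > regulation:
            idx = n_regular  # added time bucket
        else:
            idx = min(max((minute - 1) // interval, 0), n_regular - 1)
        counts[idx] += 1
        xg_sums[idx] += xg

    lengths = [interval] * n_regular + [added_minutes]
    avg_quality = sum(xg_sums) / sum(counts)  # expected goals per attempt
    avg_volume = sum(counts) / (regulation + added_minutes) * interval

    profile = []
    for count, xg_sum, length in zip(counts, xg_sums, lengths):
        # Scale every bucket to attempts per ten minutes, added time included.
        volume = count / length * interval if length else 0.0
        quality = xg_sum / count / avg_quality if count else None
        profile.append((quality, volume / avg_volume))
    return profile


# Invented shots: (minute, expected goals), with five minutes of added time.
example_shots = [(5, 0.05), (25, 0.15), (55, 0.12), (56, 0.10),
                 (92, 0.04), (93, 0.03), (94, 0.05)]
for label, (quality, volume) in enumerate(interval_profile(example_shots, 5)):
    print(label, quality, round(volume, 2))
```

A quality index is reported as None for an interval with no attempts, since expected goals per attempt is undefined there.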

Between 21-30 minutes, the quality of chances rose above game average, but shot numbers were still below par.

The most potentially productive period to date came between 51-60 minutes, where, as a group in 2015/16, the favoured teams created above average quality opportunities, coupled with slightly above average attempt volumes.

There's then a slight lull in overall, combined goal expectation, before added time heralded a barrage of high volume, but lower than average quality attempts.

In short, quality and quantity around the hour from the frustrated favourites, giving way to frequent, low quality, but potentially productive per minute desperation as the board appears.

Sunday, 3 July 2016

The Biggest Expected Goals Shocks of 2015/16

Expected goals models have slowly gained acceptance into the mainstream of football analytics.

Whether they are entirely attempt based, predicting the likely outcome for an attempt based on its characteristics compared to historical precedent, or non shot based, the aim is the same.

Namely to examine the process of goal scoring in a probabilistic way to attempt to see which teams possess solid fundamental skills that should bring success in the long term, even if they may not always reap their just rewards in the short term.

To mimic actual scorelines, expected goals match summaries are often presented as a cumulative total of the individual expected goals accrued by each team through their efforts in the match.

For example, a side may actually win the game by 2-1, while posting equivalent expected goals totals of say 1.78-1.05.

Intuitively the actual score feels fair and proper. The first named team outscored their opponent both in terms of actual and expected goals and the respective totals are relatively similar.

There are some well documented pitfalls from using cumulative expected goals, notably how a side's expected goals are distributed over their attempts, particularly in terms of so-called big chances.

Simulating each chance is the most obvious way to reacquaint expected goals conclusions with the granular nature of the original data.

In comparing expected goals conclusions to actual score lines we should try to separate those optimistic sides who hope for an occasional goal bonanza by trying their luck often and from distance from those who continually strive to create fewer, gilt edged chances.

On opening weekend, Arsenal created 21 opportunities, cumulatively amassing around 1.7 expected goals compared to 8 visiting West Ham attempts totalling barely half an expected goal.

West Ham still won 2-0.

Simulating each individual attempt in the game results in Arsenal "winning" around 70% of the time and West Ham just 7%, with 23% of iterations drawn. That is an 82% success rate for the Gunners, where draws are counted as half a win.
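A minimal sketch of this kind of attempt by attempt simulation is shown below. The individual shot values from the match are not listed in this post, so the sketch assumes, purely for illustration, that each side's total was spread evenly over its attempts (21 attempts sharing 1.7 expected goals, 8 sharing 0.4). The function name, iteration count and seed are arbitrary choices, and real shot by shot values would shift the percentages.

```python
import random


def simulate_match(home_xg, away_xg, iterations=20000, seed=2016):
    """Replay a match many times from per-attempt scoring probabilities.

    home_xg / away_xg: lists holding one expected goals value per attempt.
    Returns the share of iterations won by the home side, drawn, and won
    by the away side.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    home_wins = draws = away_wins = 0
    for _ in range(iterations):
        # Each attempt is an independent coin flip weighted by its xG.
        home_goals = sum(rng.random() < p for p in home_xg)
        away_goals = sum(rng.random() < p for p in away_xg)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / iterations, draws / iterations, away_wins / iterations


# Assumed even split of each side's total over its attempts.
arsenal_attempts = [1.7 / 21] * 21
west_ham_attempts = [0.4 / 8] * 8

win, draw, loss = simulate_match(arsenal_attempts, west_ham_attempts)
success_rate = win + 0.5 * draw  # draws count as half a win
print(round(win, 3), round(draw, 3), round(loss, 3), round(success_rate, 3))
```

Treating attempts as independent is itself a simplification, since rebounds and follow up efforts from the same move are not independent of one another.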

On the day, West Ham's success rate was a perfect 100%. But even when using actual scorelines there may exist different levels of dominance.

The record margin of victory in a Premier League game was Manchester United's 9-0 thrashing of Ipswich Town in 1995. Norwich and Liverpool in 2015/16 also played out a nine goal game, where Liverpool edged the match 5-4.

Comparing these two actual wins, with very different margins of victory, it is perhaps intuitive to think that Manchester United could only be credited with a 100% success rate, whereas Liverpool's single goal win in a goal feast is perhaps less worthy of a perfect score.

Actual goals scored and allowed can be converted into a more probabilistic final reckoning in a variety of ways and those leaving Carrow Road in late January may not have quibbled had it been suggested that Liverpool might have only deserved a success rate marginally above 50% based on the "anyone could have won" 5-4 final score.

West Ham's 2-0 win at the Emirates should perhaps lie between Liverpool's fortunate 5-4 win and United's record-breaking 9-0.

A success rate of circa 90%, perhaps, in a sport that is usually low scoring, might seem a reasonable estimate for the Hammers' actual scoring and conceding achievement in overcoming Arsenal 2-0.

We now have ways to express actual and expected scores in the same currency of probabilistic success rates, so we can compare the two figures for a single match to see where the divergence is greatest.

And that occurred at the Emirates when Arsenal (1.7 expected goals) lost 2-0 to West Ham (0.4 expected goals).

The season's second biggest disconnect between scoreline and expected goals occurred on the final scheduled day of the season when Stoke (0.4 expected goals) beat West Ham (3.1 expected goals) by 2 actual goals to 1.

What goes around...