Saturday, 18 February 2017

Expected Saves Ageing Curve.

Everyone is probably familiar with the concept of expected goals, assists and saves by now.

A modelled prediction of the likelihood that a player will score, based mainly on the location and type of attempt, is summed over a number of attempts and then compared to his or her actual output.

A player who scores, say, ten goals against a cumulative expected goals tally of eight is therefore considered to have over performed against their expectation.
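As a minimal sketch of that comparison, the snippet below sums per-attempt scoring probabilities and subtracts the total from the goals actually scored. The function name and the per-attempt values are hypothetical, chosen only so that the expected tally comes to eight.

```python
def over_performance(xg_per_attempt, goals_scored):
    """Return actual goals minus cumulative expected goals."""
    expected = sum(xg_per_attempt)
    return goals_scored - expected


# Hypothetical per-attempt scoring probabilities summing to about eight goals.
attempts = [0.1] * 50 + [0.3] * 10
diff = over_performance(attempts, goals_scored=10)
print(round(diff, 2))
```

A positive figure means the player has scored more than the model expected, a negative one fewer.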

The reasons for, and the sustainability of, this over achievement can be many and varied: the player may be a persistently skilled finisher, they may have had a hot finishing run, or the model may be inadequate to fully describe the nuances of real football life. (Although the latter may be mitigated by running goodness of fit tests on out of sample data.)

Instead of merely presenting expected and actual goal numbers ranked by over and under achievement, the same information can be presented in a more graphical form.

Rather than quoting cumulative figures, the granular nature of attempts is respected by running a Monte Carlo simulation over all shots and headers. This produces a range and frequency of potential goals scored, and these distributions are then compared to reality.
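One way such a simulation could be put together is sketched below. It treats each attempt as an independent coin flip weighted by its expected goals value, which is an assumption of this sketch rather than a description of the model behind the plots. The function name, the number of simulations and the attempt values are all hypothetical.

```python
import random
from collections import Counter


def simulate_goal_distribution(xg_per_attempt, n_sims=10_000, seed=0):
    """Simulate every attempt n_sims times; return {goals: frequency}."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        # Each attempt is scored with probability equal to its xG value.
        goals = sum(1 for p in xg_per_attempt if rng.random() < p)
        counts[goals] += 1
    return {g: c / n_sims for g, c in sorted(counts.items())}


# Hypothetical mix of poor, average and good chances.
attempts = [0.05] * 40 + [0.15] * 20 + [0.4] * 5
dist = simulate_goal_distribution(attempts)
most_likely = max(dist, key=dist.get)
print(most_likely, dist[most_likely])
```

The resulting dictionary is the kind of distribution that can be plotted and set against the actual goal tally.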

Here's a recent example that shows Chelsea and, to a lesser degree, Spurs and Arsenal outstripping, in a possibly unsustainable manner, their simulated range of potential goal difference tallies based on the number and quality of chances they have each created and allowed.

The same approach may be used to describe, if not fully predict, the performances of goalkeepers.

In defining the difficulty of the task faced by a keeper it is legitimate to include post shot information, such as placement, strength and whether or not a shot took a deflection. These are additions that may not be repeatable from the shooter's point of view, but do better describe the reality of the keeper's task.

Here's a distribution plot for a number of Premier League goalies in 2016/17. Hull's Jakupovic is most likely to have conceded 15 goals, rather than the nine he actually has, and there is around a 1% chance that the average keeper described by the model would have performed as well or better.
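The "as well or better" figure can be read off a simulation as the share of runs in which the model's average keeper concedes no more goals than the real keeper did. The sketch below assumes each on-target attempt carries a post shot probability of being scored and that attempts are independent. The function name and the attempt values are hypothetical and are not Jakupovic's actual data.

```python
import random


def prob_as_good_or_better(post_shot_xg, goals_conceded, n_sims=20_000, seed=1):
    """Share of simulations in which an average keeper concedes
    goals_conceded or fewer from the same set of attempts."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        conceded = sum(1 for p in post_shot_xg if rng.random() < p)
        if conceded <= goals_conceded:
            hits += 1
    return hits / n_sims


# Hypothetical attempts worth 15 expected goals in total, nine conceded.
faced = [0.3] * 50
p_value = prob_as_good_or_better(faced, goals_conceded=9)
print(p_value)
```

The smaller this share, the harder it is to explain the keeper's record as an average performance against those attempts.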

By contrast, Bravo is having a well documented torrid time at Manchester City, conceding nine more goals than the peak of the simulated distribution of the attempts he has been asked to save.

However, the question remains as to whether these snapshots of "form" represent a longer term up or down tick in the keeper's potential future performance in his current environment or if they will regress towards less extreme levels going forward.

David de Gea is a couple of goals in credit against the model's expectation in 2016/17, and while this is not uncommon for United's keeper, it is possible to find runs of 50 attempts when he would have been classed as under performing, notably in May of 2015 and February 2016.
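A simple way to hunt for such runs is a rolling window over a keeper's attempts in date order, comparing expected goals conceded with actual goals conceded in each window. The sketch below assumes a sliding window, which may differ from how the runs above were defined. The function name and the toy data are hypothetical.

```python
def rolling_credit(post_shot_xg, conceded, window=50):
    """For each run of `window` consecutive attempts, return expected goals
    minus goals conceded. Positive means the keeper is in credit."""
    if len(post_shot_xg) != len(conceded):
        raise ValueError("need one outcome per attempt")
    credits = []
    for start in range(len(post_shot_xg) - window + 1):
        expected = sum(post_shot_xg[start:start + window])
        actual = sum(conceded[start:start + window])
        credits.append(expected - actual)
    return credits


# Toy example with a window of two attempts (1 = goal conceded, 0 = saved).
toy_credit = rolling_credit([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1], window=2)
print(toy_credit)
```

Windows with a negative value are the spells in which the keeper would have been classed as under performing.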

Perhaps most usefully, this simulation approach may open up another way to look at the age at which players in a position generally reach the peak of a particular attribute.

A variety of methods and curves have been used; see here, here and here. Grouping keepers by their rounded age when they did or didn't make a particular save, and then seeing if this enlarged group of ages shows a tendency to over or under perform, may be another route.
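The grouping step might look something like the sketch below, which pools every attempt by the keeper's age rounded to the nearest year and compares expected with actual goals conceded in each pool. It uses a plain difference rather than a full simulation per age group, which is a simplification. The function name, the tuple layout and the sample attempts are hypothetical.

```python
from collections import defaultdict


def credit_by_age(attempts):
    """attempts: iterable of (keeper_age_in_years, post_shot_xg, conceded).
    Returns {rounded_age: expected goals minus goals conceded}."""
    totals = defaultdict(lambda: [0.0, 0])
    for age, xg, conceded in attempts:
        bucket = totals[int(age + 0.5)]  # round half up to the nearest year
        bucket[0] += xg
        bucket[1] += conceded
    return {age: xg - goals for age, (xg, goals) in sorted(totals.items())}


# Hypothetical attempts faced by keepers of different ages.
sample = [(27.6, 0.5, 0), (28.2, 0.25, 1), (33.0, 0.5, 1)]
by_age = credit_by_age(sample)
print(by_age)
```

Positive values mark ages at which keepers as a group conceded fewer goals than the model expected.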

Here's the under (red) and over (green) shot stopping performance of Premier League goalkeepers, sorted by age over multiple seasons.

Notwithstanding the problems of survivor bias for older keepers in this type of traditional plot, there does appear to be a tendency for keepers to over perform an attempt based model in their mid to late 20s, peaking at around 28 (which is consistent with other approaches listed above).

Their under performance in their formative years relative to their older selves, and in the advanced stages of their careers relative to their younger selves, is also typical of ageing curves in general.

This approach of course may be used for other performance related indicators across other playing positions.

Modelled data via InfoGolApp