
Thursday, 25 May 2017

The Ticking Premier League Clock.

With the 2016/17 Premier League season now a wrap there's inevitably a raft of season reviews, both statistical and narrative driven.

Already sides are scrambling to pick apart the squads of the three relegated teams and capture the talent that shone brightest amongst the mediocrity.

Improving your Premier League squad for the upcoming 2017/18 season is an obvious priority. The likely output from your current collection of talent does not stand still, principally because of the ticking of the clock.

It has been well demonstrated that a player's output, as measured by simple metrics or the amount of playing time he is given, first waxes and then wanes (desperately resists obvious pun).

Although there are some positional variations, as well as individuals who possibly fall outside the usual, the peak ages for Premier League players generally lie between 24 and 29.

It is a simple task to chart which teams are well set to enter 2017/18 with a squad that is likely to show an improvement, simply because players who were deemed good enough to be given playing time in 2016/17 are either moving into the sweet spot for age-related peak performance or are remaining within their peak years.

On the flip side, other teams may be anticipating the need to recruit new, younger talent to replace an ageing squad that has produced results acceptable for the club's perceived status in the Premier League pecking order, but which, if left unrefreshed, will likely suffer an age-related decline.



In the table above, the weighted amount of playing time given to players has been grouped by age.

This makes it possible to see which teams have a comfortable buffer of young talent that was deemed good enough to play some part in 2016/17 and, under normal development, will be expected to pick up some of the shortfall from older squad members who may begin to show age-related decline.

It's also possible to wind the clock forward to spot which sides are best placed to cope with these transitions in the absence of new signings.
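To make the mechanics concrete, here's a minimal sketch of how such a forward projection might be wired up. The player names, ages and minutes are entirely invented, as is the assumption that the 24 to 29 band above defines the peak window.

```python
# A toy forward projection: weight each player's 2016/17 minutes by age, then
# add a year to everyone to see how much of the squad's established playing time
# will sit inside an assumed 24-29 peak window next season.
import pandas as pd

PEAK_AGES = list(range(24, 30))  # assumed peak band, 24 to 29 inclusive

squad = pd.DataFrame({
    "player":  ["A", "B", "C", "D", "E"],
    "age":     [20, 23, 26, 29, 32],          # age during 2016/17, invented
    "minutes": [900, 1800, 3100, 2700, 1400],
})

def peak_share(df, years_ahead=0):
    """Share of last season's minutes given to players who will be peak age."""
    in_peak = (df["age"] + years_ahead).isin(PEAK_AGES)
    return df.loc[in_peak, "minutes"].sum() / df["minutes"].sum()

print(f"Peak-age share of 2016/17 minutes:          {peak_share(squad):.0%}")
print(f"Same squad, clock wound forward to 2017/18: {peak_share(squad, 1):.0%}")
```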

Ominously, Chelsea will likely retain the highest proportion of peak age performers, narrowly followed by fellow Champions League participants, Liverpool and Spurs.

By contrast, Manchester City again find themselves with a dearth of peak age performers from their current squad in the upcoming season, suggesting a bout of major squad reconstruction is imminent.

Monday, 22 May 2017

Tony Pulis Is Not A Slacker

Tony Pulis is never short of narratives.

Since the diminutive Welshman announced his presence on the main Premiership stage, guiding an underfunded Stoke team lacking in top flight talent to perennial survival, he's attracted plaudits and brickbats as the master of squeezing the most from meagre resources.

He's acquired manager of the season awards, as well as acrimony for his dour anti-football, laced with innovation, for which all Stoke fans will forever forgive him, especially as it came with the added bonus of infuriating Arsene Wenger.

Slacker, however, is a term rarely associated with Pulis or his three Premier League charges.

Until now.


Visually the evidence appears damning. In the 54 matches a Pulis led side has played after the black line in the graphic, only 45 points have been won.

That's relegation form in every season and the implication is that a manager who once infamously multi-tasked by cancelling Christmas, while also showering, has allowed his team to slacken when a likely survival target has been met.

So do the numbers support the view that a manager whose mantra is "work 'ard" actually relents during April and May?

"Can I have the month off, boss"?

Firstly, there is an element of selective cutoff points that do Pulis no favours in the graphic.

To surpass any target requires a side to either win or draw and in eight out of the nine seasons, Pulis' side reached the line set in the graphic with a win.

Therefore, just as "X has not won at Y since 2014/15, immediately tells you that they did actually win in 2013/14, each period of "rest and reflection" begins immediately after a positive result and that biases your perception of the ensuing games.

Secondly, gaining points is very difficult for mid to lower ranked teams, epitomised by those TP has managed.

It's quite easy to spot runs of 5 or 6 consecutive matches without a win during periods when Pulis was presumably cracking the whip (or wet towel).

Thirdly, the fixture list can get very unbalanced when broken down into segments of anywhere between three and a dozen matches, as has been done in the graphic.

Whether by quirk of the fixture list or by design, Pulis has been sent more games against the Premier League's best and Arsenal in the latter phases of the season.

Rather than lounging on a deckchair, they've been taking on Arsenal (6 times), Man City (4 times, including once immediately after an FA Cup Final), Chelsea (3 times), Everton (3 times), Liverpool (3 times), and Manchester United and Spurs, twice each.

That's a disproportionately larger share of the current top 7 compared to a random draw.

The easiest way to quantify how a side has done over a range of games is to simulate the range of possible points won based around a probabilistic model that doesn't incorporate a "doesn't try when safe" variable.

This approach results in Pulis gaining the actual 45 points his sides accumulated or fewer in around 16% of trials.

So the return is an under performance, certainly, but one that might occur in 16% of simulations simply through the randomness of how points are won.
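For illustration, here's roughly how such a simulation might be set up. The per-match win and draw probabilities below are invented placeholders rather than anything derived from actual pre-match odds, so the output will only loosely echo the 16% quoted above.

```python
# Simulate points from the 54 post-target fixtures, with no "eases off when safe"
# adjustment, and ask how often the total lands at or below the actual 45 points.
import random

random.seed(42)

# (p_win, p_draw) for each fixture; a flat, made-up mid-to-lower-table rating
fixtures = [(0.24, 0.28)] * 54

def simulate_points(fixtures):
    points = 0
    for p_win, p_draw in fixtures:
        r = random.random()
        if r < p_win:
            points += 3
        elif r < p_win + p_draw:
            points += 1
    return points

trials = 10_000
at_or_below = sum(simulate_points(fixtures) <= 45 for _ in range(trials))
print(f"P(45 points or fewer) ≈ {at_or_below / trials:.1%}")
```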


Here's an attempt to cherry-pick a single season where the returns are so low compared to an odds-based distribution of points that randomness is challenged as a possible explanation for the actual points returned in the run-in.

Seven out of the nine seasons are unremarkable; the two exceptions are the most recent campaigns at WBA, but even these two examples have, respectively, a 10% and 7% chance of simply being random deviations from a baseline estimate of WBA's ability over the season.

And with a raft of sides hovering around WBA's performance expectation for points won going into April, the chances improve that someone (not necessarily WBA) will appear to tank their season early.

Even if there is something in the tailing off of a Pulis side in two out of nine seasons, evidence must be presented for the possible causes, which could be plentiful.

Resting players carrying long-term injuries, experimenting with alternative tactical set-ups, blooding inexperienced players, or seeing hot and unsustainable production from niche attacking methods regress towards less extreme levels each deserve scrutiny.

The list is nearly endless and almost universally laudable, but Tone giving the lads a breather would be way, way down my list, even if the data supported the claims.....which it doesn't.

Friday, 19 May 2017

Who's Made Their £Million Wage Earners "Put In A Shift"?

As soon as Omar Chaudhuri starts tweeting words like "bugbear", you know he's onto something that deserves a good going over.

£s per point was deservedly in his sights as a way of determining over or under performance compared to league position, following The Times perpetuating this nonsense.

I've outlined the fatal flaws in this approach in yesterday's blog, and Omar has also suggested improved methodologies on his Twitter timeline.

But it opens up a wider question about the simplified use of readily available data.

Just because something is relatively easy to calculate and appears to be intuitively sensible it doesn't make it immune from being a piece of pernicious hogwash.

In the NFL, strength of schedule prior to the season is regularly estimated by adding the win/loss record from the previous season of the upcoming opponents for each team. This seems sensible, and Excel and csv files are your friend.

However, this too is easily verified GIGO. Do you really think a multitude of easily identified factors that delivered a 2-14 record are going to perpetuate?

Note to the Racing Post, "stop using these numbers in your season preview".

No one minds flawed reasoning, but the greater the potential audience, the greater the responsibility to do some due diligence regarding methods and a willingness to make corrections if needed.

Here's the performance of Premier League teams over the last six, nearly complete, seasons, comparing the proportion of resources outlaid in wages and the similarly weighted rewards in terms of wins and draws against the historical relationship between the two.

A couple of seasons may be missing because I couldn't find the data for a few sides.


Everton, Spurs & Southampton have had a more than fair return for handing out bulky pay packets, as have Bournemouth, with more limited evidence.

Newcastle have managed just one, albeit spectacular, season of over performing against the splashed cash, and Leicester's single over par season was unsurprisingly the largest in the whole six year sample.

Sunderland can at least attempt to eventually over perform in new surroundings in 2017/18.


Here's the individual under/over seasonal wages vs performance for the 11 ever presents over the six seasons.

Tottenham and Everton make a habit of beating expectation, while Arsenal perform to similar relative levels as WBA and Stoke (whose managers' names escape me for the moment).

Tuesday, 16 May 2017

Chelsea Win the Title By Efficient Use of Wages.

The Times is one of the pioneers of quality, statistically based football journalism, notably under their Fink Tank banner.

It's therefore no surprise when their sporting articles not only receive extensive coverage on Twitter and in other news outlets, but also carry a degree of authority based on the legacy of past departed star performers.

One such post appeared on Twitter today and quickly spread via a raft of online newspapers and media, gaining many likes and retweets.



The post repeated a long-performed end-of-season ritual, whereby a side's wage bill is divided by their points total to derive a "cost per point" number.

Following this intuitively comforting calculation, one team is deemed the most wasteful with their millions, in this case Manchester United (£3.6 million per point), and one is crowned the "value for money" team of the year, Spurs (£1.3 million per point).

Title winners Chelsea came 12th out of 17 (the promoted teams were omitted), implying perhaps that they had won the title with a wasteful, inefficient splurging of the chequebook.

But is that really the case?

When Chelsea last won the league in 2014/15, the average wage bill for the 20 teams was around £100 million, ranging from £29 M at Burnley to £217 M at the title winners.

The Blues' wage bill was just under 2 standard deviations above the league average and for that outlay they gained 87 points or a success rate of 0.8 per game if you prefer to express draws as half a win.

The reward for Chelsea spending 2 SD above the league average wage bill was a success rate that was slightly greater than 2 SD above the average success rate for the Premier League.

This relationship holds for multiple seasons and for most teams.

Below is the plot for 2014/15.


The uncomfortable truth for a team wishing to gate crash the top of the Premier League, Leicester excepted, is that a typical title winning season requires a financial outlay in the region of 2 SD's above the league average.

Similarly, stinting on the wage bill inevitably, with some variation to account for luck, innovation, plagues of locusts etc., pitches you into the bottom half of the table.

But the takeaway is that there is a strong relationship between how much you spend compared to the league average and the wage bills of the remaining teams in a Premier League season, and your success rate, again compared to the league average and that of your competitors.

Therefore, any under or over performance in a season should be compared to this historical relationship, rather than a perennial and flawed click bait ritual involving nothing more than long division.
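As a rough sketch of that comparison, assuming nothing more exotic than standard scores: express each club's wage bill and its success rate as deviations from the league average, measured in standard deviations, and look at the gap between the two. The three rows of figures below are illustrative only (the wage extremes are borrowed from the 2014/15 numbers above).

```python
# Compare wage spend and success rate on the same standardised scale.
import statistics

# (club, wage bill in £m, success rate = (wins + draws/2) / games played)
season = [
    ("Chelsea",  217, 0.80),
    ("Man City", 194, 0.75),
    ("Burnley",   29, 0.36),
    # ... the other 17 clubs would be listed here
]

wages = [w for _, w, _ in season]
rates = [r for _, _, r in season]

def z_score(value, values):
    return (value - statistics.mean(values)) / statistics.pstdev(values)

for club, wage, rate in season:
    gap = z_score(rate, rates) - z_score(wage, wages)
    print(f"{club:9s} wage z {z_score(wage, wages):+.2f} | "
          f"success z {z_score(rate, rates):+.2f} | over/under {gap:+.2f}")
```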



Using the raw figures used by The Times (actually from 2015/16, but we're assuming 2016/17 will be similar), Chelsea spent 1.64 SD's above the league average this term.

 That "entitles" them, based on historical precedent to gain a success rate that is around 1.3 SD'a above the league average.

With a week to go, their success rate ((wins + draws/2) / games played) is nearly 1.8 SD's above the league average success rate.

Whereas Chelsea languish two thirds of the way down The Times' value for money table, they've actually won the league, while also being this season's third best over performers in terms of share of money spent and success rate achieved.

Rather than being lambasted in a quality daily, Roman's bean counters, backroom staff and players deserve a huge pat on the back for being the best and efficiently so. Even if there was, as ever, an element of unsustainable good fortune, as well.

Bournemouth fall from 2nd to 4th, Watford from 4th to 8th, Stoke drop from 6th to 10th and Swansea also drop to 15th from 11th. Manchester City rise from 13th in The Times to 6th and Liverpool go from 15th to 9th.

Some teams remain relatively unchanged. Congratulations Spurs, sorry United fans, but consigning to the bin this self-confessed "simplistic" method of ranking over or under achievement is long overdue.

Also check out Omar Chaudhuri's timeline for his views on this "bugbear" and an alternative approach to quantifying inefficiency in wages.


The Championship Playoff Defensive Kingpins.

The second legs of the Championship playoffs are played out over the next couple of days and, while the headlines will inevitably be grabbed by the goalscorers, I thought I'd spread a little love for the workhorses whose job it will be to prevent the net from bulging.

Analytics has unsurprisingly concentrated on attempting to quantify those involved in the actual act of scoring or not.

Expected goals, either from the perspective of the player taking the chance, those who immediately set up the opportunity or the keeper trying to make the save, can begin to quantify the probabilistic process behind the singular outcome.

However, defensive actions are more problematical.

Merely counting actions, without context, can lead to misleading if plausible conclusions.

My first ever blog gloriously demonstrated that the recently promoted Stoke City may have led the league in fouls committed, but once possession and opportunity were factored in it was actually Arsenal that perhaps deserved the title of dirtiest team in the Premier League.

A simple tweak that better reveals the importance of an individual to the defensive actions of his side is to look at the proportion of defensive actions a player has attempted compared to the proportion of playing time he has enjoyed.

These actions may range from attempted tackles to interceptions and clearances.

We can easily add a further layer of information to this gradual contextualisation of defensive stats by including the average position on the field from where each player carries out these actions, as well as how spread out these actions are from this average point on the field.
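A bare-bones sketch of the first part of that calculation, with entirely made-up numbers, might look like this (the 45,540 figure is simply 11 players × 90 minutes × 46 Championship games).

```python
# Defensive quotient: a player's share of his side's defensive actions divided by
# his share of the side's total playing time. A value of 1.0 is an average workload.
def defensive_quotient(player_actions, team_actions, player_minutes,
                       team_minutes=11 * 90 * 46):
    action_share = player_actions / team_actions
    time_share = player_minutes / team_minutes
    return action_share / time_share

# Illustrative: 120 of a team's 900 defensive actions in 2,400 player-minutes
print(round(defensive_quotient(120, 900, 2400), 2))   # ≈ 2.53
```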


Here's the defensive workload undertaken by the players who may line up in this week's Championship playoff second legs.

A defensive quotient, corrected for playing time, of 1.0 suggests a player is participating in an average share of his side's defensive duties, and the larger the average % distance his actions occur from his opponents' goal, the more he's mixing it in the muck and nettles of his own half.

So two predominantly green bars denote lots of defensive actions, mostly in his own half, while red and red equates to a lightly defensively involved player, likely playing much higher up the field.

Great to see Deano still plying his deep lying defensive skills at a high level!

Sunday, 30 April 2017

Nothing To Play For (except the odd £15 Million).

The final couple of matches of the Premier League season have traditionally thrown up a host of matches where there is little for some teams to play for.

They are marooned in mid table, too remote from the title or European places, but relatively, if not mathematically safe from the threat of relegation.

Anecdotally, they are high scoring affairs, where teams care less for the physical risks associated with full-blooded defending, although this weekend to date appears to have deemed attacking play an optional extra.

However, the influx of Sky, BT and overseas rights money has potentially made these hitherto meaningless games a much more lucrative sideshow to the drama at the top and bottom of the Premier League.

A fixed cut from domestic and overseas rights, combined with a performance-related slice and additional extras for more frequent TV appearances, can inflate the end of season TV paycheck by upwards of £10 million, even at this late stage of the campaign.

Some teams are locked within a place or two of their current league position, but for a handful of mid table sides the up or downside can stretch to three or four places in either direction.



The table above has simulated the remainder of the season thousands of times using data from the Infogol App, but rather than plotting the traditional likelihood that each team will finish in a particular position, this has been replaced with the cumulative reward each side would receive if they climbed or fell in the table.

Sunderland are the easiest side to explain.

They are virtually certain to finish bottom, for which they'll receive £98 million, but there is also around a 3% chance they could win another £2 million for the club by ending up second bottom.

For the likes of Stoke, Leicester, West Ham, Watford, Palace, Burnley and Bournemouth, the up and downsides are more widespread. Stoke are as likely to add £121 million to the Coates family billions as they are to humbly submit a mere £105 million.
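The final step behind the table can be sketched like this: take a club's simulated finishing-position probabilities and weight them by the merit payment attached to each place. Both the probabilities and the payment scale below are invented for illustration rather than taken from the actual Premier League distribution.

```python
# Convert simulated finishing positions into an expected end-of-season reward.
position_probs = {14: 0.10, 15: 0.25, 16: 0.40, 17: 0.20, 18: 0.05}  # from simulations
merit_payment = {14: 14.0, 15: 12.0, 16: 10.0, 17: 8.0, 18: 6.0}     # £m per place, invented

expected_reward = sum(prob * merit_payment[pos] for pos, prob in position_probs.items())
print(f"Expected merit payment ≈ £{expected_reward:.1f}m")
```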

So whether the players will play ball or not remains to be seen, but the age where May saw the beach beckoning for swathes of Premier League players may be a thing of the past in the era of new found even greater affluence.

Friday, 28 April 2017

Arsenal's Shooting Accuracy, Nothing To See?

Expected models come in many shapes and flavours, using a variety of inputs and variables, but they almost all relate to goals scored or conceded.

Early expected incarnations also looked at similarly binary events to goals, such as whether an attempt was on target or blocked, but they've never really grabbed the limelight like their bigger, older brother, expected goals.

The methodology is essentially the same as that used in building an expected goals model.

Variables are the usual mix of shot location, type and attack classification, tested on out of sample data.

Here's the number of shots on target compared to their expectation for the top six teams up to the beginning of this month.

The data has been taken from the Infogol App.


It appears to give a fairly straightforward narrative, based around a relatively accepted family of metrics.

Without running a few simulations it would be easy to categorise the six best teams in the Premier League as being either wasteful with the accuracy of their attempts (Manchester City) or more than a little pleased with themselves (Manchester United, perhaps slightly surprisingly).

What seems self evident is that there's nothing really to see with the remaining quartet.

Chelsea, at the time, had six more shots on target than their expectation, albeit from the smallest sample size; very similar numbers to Arsenal's slight under performance and well in line with Liverpool and Spurs.

However, a cursory look at expected figures versus actual achievement to label a side either under achieving/unlucky or over achieving/fortunate often fails to reveal nuances in a side's scoring or shooting profile that are present in the data.

The only team that deviates significantly from the expected SOT model in the above table is Arsenal, despite masquerading as a side mildly under performing when it comes to working the keeper.

Arsenal are actually pretty poor at hitting the target when taking shots that the model deems more likely to miss (usually longer range efforts), while they are a lot better than expected at hitting the target with attempts that the model has decided are much more likely to be on target and require a save.

These two fairly large deviations from the predicted arc of expectation from the model at opposite ends of the likelihood scale roughly balance out giving Arsenal an actual accuracy figure that comes close to matching their predicted figure.
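One way to see this, sketched below with invented attempt data, is to bucket each shot by the model's probability of it being on target and compare expected with actual counts within each bucket; agreement in the aggregate can then be checked for offsetting deviations at either end of the scale.

```python
# Compare expected and actual shots on target within probability buckets.
from collections import defaultdict

# (model probability the attempt is on target, 1 if it actually was on target)
attempts = [(0.15, 0), (0.20, 0), (0.25, 1), (0.35, 0), (0.55, 1),
            (0.60, 0), (0.70, 1), (0.80, 1), (0.85, 1), (0.90, 1)]

buckets = defaultdict(lambda: [0.0, 0])        # lower bound -> [expected, actual]
for prob, on_target in attempts:
    lower = min(int(prob * 5) / 5, 0.8)        # buckets of width 0.2
    buckets[lower][0] += prob
    buckets[lower][1] += on_target

for lower in sorted(buckets):
    expected, actual = buckets[lower]
    print(f"{lower:.1f}-{lower + 0.2:.1f}: expected {expected:.2f}, actual {actual}")
```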

In short, Arsenal may look fine in the aggregate, but they're the only top six team that have individual ranges of actual outcomes that deviate by an interesting amount from this decently robust model for the Premier League.

Wednesday, 26 April 2017

Reading Between the Lines

The 2016/17 regular Championship season is almost done and dusted.

With two games remaining for each team, it is left to Leeds to attempt to gatecrash the play off picture, most likely at the expense of Fulham, and to Blackburn to try to leapfrog out of the final relegation spot to the detriment of either Forest or Birmingham.

What seems all but certain is that Reading (GD currently +1) and Huddersfield (+3) will contest the playoffs for promotion to the Premier League.

Whilst Huddersfield's underlying ExpG stats have gradually gravitated towards their lofty league position, Jaap Stam's Reading remain an uncomfortable enigma for advanced stats.

Ben Mayhew, who runs the excellent EFL orientated site Experimental361, has consistently rated Reading as a lowly Championship team, and Colin Trainor, one of the earliest analysts to develop the concept of ExpG, has also tweeted about the Royals' apparent over achievement.

In addition, the ExpG model which powers Infogol's football app has Reading's underlying stats being consistent with a side in the lower third of the table, rather than striving for the pinnacle.



Here are the rolling six game ExpG differentials for the still active protagonists at the top of the Championship up to the start of April.

Wednesday, along with the two automatic promotion teams, have been the most consistent ExpG teams this term.

Huddersfield, as already noted, have gradually produced underlying stats that are fit for their position, while Leeds and Fulham have been inconsistent, but overall in credit.

The sole exception is Reading, whose underlying ExpG differentials have simply declined, even with a cut off point that omits a 7-1 thrashing at Norwich.

In terms of traditional goal difference stats, they share a similar figure with Derby and Preston, who have nearly 20 fewer points than high flying Reading.

If we look at Reading's ExpG figures, there is little to quibble about in their goal scoring exploits. Their actual goal tally agrees almost exactly with the modelled ExpG values from each goal attempt they have created.


The disconnect is on the defensive side of the ball.

Reading's Exp GD is -13, which would place them in and around the likes of Burton, and still relegation sensitive Forest and QPR. So they have over achieved on the defensive side of the ball.

If you run simulations on all of the attempts Reading have allowed their opponents in this season's Championship, they concede their actual total or fewer around 5% of the time.
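That kind of simulation is simple to sketch: treat every chance conceded as an independent trial with its ExpG value and total up the goals over many replays of the season. The ExpG list and the actual goals-against figure below are invented stand-ins, not Reading's real numbers.

```python
# How often does a season of conceded chances produce the actual total or fewer?
import random

random.seed(7)

conceded_xg = [0.05, 0.18, 0.07, 0.22, 0.03] * 110   # ~550 chances, ~60 ExpG against
actual_conceded = 49                                   # invented actual goals against

trials = 10_000
at_or_below = 0
for _ in range(trials):
    goals = sum(random.random() < xg for xg in conceded_xg)
    if goals <= actual_conceded:
        at_or_below += 1

print(f"P(actual total or fewer) ≈ {at_or_below / trials:.1%}")
```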

The chance of Reading doing as well defensively, given the volume and quality of chances they have conceded, is relatively small, but the chance that someone, somewhere over achieved by as much as the Royals at some time in the recent past, or even just this season, is much larger.

While defence is a group activity, much of the burden falls on the keeper, in Reading's case, the veteran Omani, Ali Al-Habsi.

We may use an alternative ExpG model for keepers, which focuses merely on shots on target and incorporates post shot information, such as placement, power, swerve or deflections, to help us understand if the keeper is playing above the save expectations of such a model.

As with the general ExpG allowed model, Al-Habsi has also over performed. Again, there is only around a 5% chance that an average keeper does as well or better when faced with the shots Reading's keeper has been required to deal with.

The goodness of fit of this attempts on target model can also be tested to see if Al-Habsi is coming close to "breaking" the modelled ExpG or if we can speculate that he has been somewhat lucky.

If we rank the efforts faced by Al-Habsi in terms of difficulty, he has had a particular purple patch when dealing with moderately difficult attempts. He's conceded just 8 goals from chances that were ranked as being between a 55% chance and 80% chance of being scored, compared to an expectation of over 12 goals.

However, over the entire range of chance probabilities he's faced the deviation from the model has around a 15% chance of having occurred by chance.

In short, he's saved more of the chances that the model expected him to save and he's progressively, if slightly unevenly, let in more of the most difficult shots.

A 29 year old Ali Al-Habsi, six years from his peak?

Overall, Reading don't break a variety of ExpG models, even on defence, and while luck is the most likely explanation for their over performance, I wouldn't be so presumptuous as to assume skill differentials or tactical nuances are entirely absent.

However, even if we allow an Al-Habsi inspired Reading a near level GD, which explains why they aren't actually in the bottom third, it still remains puzzling why they aren't drifting aimlessly in mid table with Derby and Preston.

Newcastle are the poster child for out performing their actual GD, claiming around 10 more points than their single figure GD merited in 2011/12.

They share with Reading an imbalance of narrow wins with a handful of wide margin defeats scattered throughout the season.

Reading have won 16 games by a single goal margin, accounting for nearly 60% of their current points total.

This may hint at the near mythical ability to "score when we want", as most exemplified by Manchester United under SAF, Jaap Stam's former employer, but, United perhaps aside, this ability is often difficult to maintain.

The effect is more striking in the Premier League, but a side which relies on single goal wins for a large bulk of their points typically sees their points total decline by around 6% in the subsequent season, perhaps implying their record was more down to unsustainable influences.

Reading have also profited from late goals. As a crude measure, 11 extra points have been won from goals scored after the 87th minute. Another alchemical trait practised almost exclusively in the longish term by SAF.

It would be churlish to brand Reading as merely fate's flavour of the season, but it would equally be unwise to take their current lofty league position entirely at face value.


Tuesday, 25 April 2017

Profiling Open Play Shooters in the 2016/17 Championship

As a follow up to the recent post looking at the average distance from goal that Premier League strikers tend to shoot from and the variability of their preferences, here's the same for the EFL Championship.

The players are arranged in decreasing order of shot volume and only shots from open play are considered. All headers are also omitted.

I've simply ranked the players in terms of distance and variability.

Average distance to goal from the origin of all shots has been used to determine those who take their efforts closest to goal. Currently this honour falls to Tammy Abraham among the top 100 individual shooters by volume.

Tammy also barely strays from this area of preference, bagging him the description of an archetypal goal hanger, along with Scott Hogan.

Variability in the range of shots a player takes has been determined by calculating the amount each shot taken by a player varies from that player's average shooting position.

In contrast, Jacob Butterfield only appears to try his luck from distance.


As a more general guide to each player's likely shooting distance and either single mindedness or willingness to vary their approach, you can refer to the crib sheet below.


Data from Infogol

Saturday, 22 April 2017

Profiling Open Play Shooters in the 2016/17 Premier League.

Among the many attributes Charlie Adam has brought to the Premier League is his willingness to give it a go from distance.

Whereas most cultured midfielders, picking the ball up in their own half, would only consider their passing options, Charlie can often be seen checking out that the opposing goalie is actually paying attention.

Success from such extreme distance is invariably fleeting, if glorious, but Adam's opportunism has extended his average open play shooting distance in 2016/17 to nearly 29 yards from goal.

This places him top (or if you prefer, bottom) in the list of 75 greatest volume shooters when sorted by average distance from goal per attempt.

Andros Townsend, hang your head in shame (a mere 68th).

"Now if I could just tame this damn thing......"

Sadio Mane is currently the league's anti Adam.

The average distance to the centre of the goal for all open play shots attempted by Mane is around 16 yards, nearly half the average racked up by Adam.

These are useful figures, but it is also helpful to know if Mane is an habitual penalty area shooter, or whether no one is safe from Charlie's optimism, even if it is relatively rare to see him threatening to get on the end of a tap in from the six yard box.

We can attempt to answer this additional question about a player's shooting profile by measuring how far each of his attempts strays from his average shooting position for a particular season.

In Mane's case, the answer is not very far.

Out of the top 75 volume shooters in 2016/17, he's ranked 10th for sticking close to his average shooting position of 16 yards from the centre of the goal-line.

Adam, by contrast, is ranked 73rd out of 75 for sticking close to his average shooting position. He pretty much shoots from anywhere and everywhere.
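For clarity, here's a small sketch of how both numbers can be derived from raw shot coordinates. The coordinates are invented, in yards, with the centre of the goal-line at the origin.

```python
# Average shot distance from goal, plus how tightly shots cluster around the
# player's own average shooting position.
import math

shots = [(2, 8), (-3, 10), (1, 6), (4, 12), (0, 9)]   # (x, y) origins of open-play shots

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

goal_centre = (0, 0)
avg_distance = sum(dist(s, goal_centre) for s in shots) / len(shots)

centroid = (sum(x for x, _ in shots) / len(shots),
            sum(y for _, y in shots) / len(shots))
variability = sum(dist(s, centroid) for s in shots) / len(shots)

print(f"Average shot distance: {avg_distance:.1f} yards")
print(f"Spread around his usual spot: {variability:.1f} yards")
```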

Here's the full list.


Aguero shakes out as the league's premier goal hanger. His average attempt is ranked the 7th closest to goal and he sticks vigorously to his hunting ground, with England new boy Defoe running him close.

Mata is the second nearest average shooter, but his 15th ranked variability suggests that he's had a few tap ins from virtually the goal line and he's far from a classic six yard box poacher.

Pogba's a long distance shooter and a relatively high variability rank (low variability) suggests that long distance shots are his gig.

But the prince of long distance shooting with little desire to get into the six yard box is Spurs' Eriksen.

He has a very similar profile to Swansea's Sigurdsson, an ideal transfer target should Spurs require depth in this niche shooting role.

Data from Infogol

Saturday, 18 February 2017

Expected Saves Ageing Curve.

Everyone is probably familiar with the concept of expected goals, assists and saves by now.

A modelled prediction of the likelihood that a player will score, based mainly on the location and type of attempt, is summed over a number of attempts and then compared to his or her actual output.

A player who scores say ten goals against a cumulative expected goals tally of eight is therefore considered to have over performed against their expectation.

The reasons for, and the sustainability of, this over achievement can be many and varied, ranging from the presumption that they are a persistently skilled finisher, to a hot finishing run, to the model being inadequate to fully describe the nuances of real football life (although the latter may be mitigated by running goodness of fit tests on out of sample data).

Instead of merely presenting expected and actual goal numbers ranked by over and under achievement, the same information can be presented in a more graphical form.

Rather than quoting cumulative figures, the granular nature of attempts is respected by using a Monte Carlo simulation for all shots and headers to produce a range and frequency of potential goals scored, and these distributions are then compared to reality.


Here's a recent example that shows Chelsea and to a lesser degree, Spurs and Arsenal outstripping their simulated range of potential goal difference tallies based on the number and quality of chances they each have created and allowed in a possibly unsustainable manner.

The same approach may be used to describe, if not fully predict the performances of goal keepers.

In defining the difficulty of the task faced by a keeper it is legitimate to include post shot information, such as placement, strength and whether or not a shot took a deflection. These are additions that may not be repeatable from the shooter's point of view, but do better describe the reality of the keeper's task.


Here's a distribution plot for a number of Premier League goalies in 2016/17. Hull's Jakupovic is most likely to have conceded 15 goals, rather than the nine he actually has, and there is around a 1% chance that the average keeper described by the model would have performed as well or better.

By contrast, Bravo is having a well documented torrid time at Manchester City, conceding nine more goals than the most likely peak of the simulated distribution of the attempts he has been asked to save.

However, the question remains as to whether these snapshots of "form" represent a longer term up or down tick in the keeper's potential future performance in his current environment or if they will regress towards less extreme levels going forward.

David de Gea is a couple of goals in credit against the model's expectation in 2016/17 and, while this is not uncommon for United's keeper, it is possible to find runs of 50 attempts when he would have been classed as under performing.



Notably in May of 2015 and February 2016.

Perhaps most usefully, this simulation approach may open up another way to look at the age at which a position generally reaches the peak of a particular attribute.

A variety of methods and curves have been used (see here, here and here). Grouping keepers by their rounded age when they did or didn't make a particular save, and then seeing if this enlarged group of ages shows a tendency to over or under perform, may be another route.
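A skeleton of that grouping step, using invented attempt records, might look like the following: pool every attempt faced at a given rounded age and compare the actual saves with the sum of the model's save probabilities.

```python
# Over/under shot-stopping performance, pooled by the keeper's age at each attempt.

# (keeper's rounded age, model save probability, 1 if the attempt was saved)
attempts = [(22, 0.70, 1), (22, 0.55, 0), (28, 0.65, 1), (28, 0.40, 1),
            (28, 0.75, 1), (34, 0.60, 0), (34, 0.80, 1)]

for age in sorted({a for a, _, _ in attempts}):
    group = [(p, s) for a, p, s in attempts if a == age]
    expected = sum(p for p, _ in group)
    actual = sum(s for _, s in group)
    print(f"Age {age}: expected saves {expected:.2f}, actual {actual}, "
          f"over/under {actual - expected:+.2f}")
```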


Here's the under (red) and over (green) shot stopping performance of Premier League goal keepers, sorted by age over multiple seasons.

Notwithstanding the problems of survivor bias for older keepers in this type of traditional plot, there does appear to be a tendency for keepers to over perform an attempt based model in their mid to late 20's, peaking at around 28 (which is consistent with the other approaches listed above).

Their under performance relative to their older selves in their formative years and in the advanced stages of their careers compared to their younger selves is also typical of ageing curves in general.

This approach of course may be used for other performance related indicators across other playing positions.

Modelled data via InfoGolApp

Wednesday, 4 January 2017

How Dominant Are Chelsea's Halftime Record Chasers?

Chelsea travel to Spurs tonight needing a win to equal the record for the number of consecutive Premier League wins.

After 19 games, half a season, they have accumulated 49 points, beaten only by the 2005/06 Chelsea side, who gained 52 points and equalled by the 2003/04 Manchester United team.

In keeping with all of the traditional title challengers in the Premier League, Chelsea have put a lacklustre 2015/16 behind them and improved their expected goals at both ends of the pitch as the season has progressed.


It is an impressive reversal of fortunes, but it is also one shared by their title challengers. Only Arsenal have shown a marked decline in their defensive metrics, and of course Leicester, although the Foxes have been replaced by a resurgent Manchester United.

Chelsea are therefore worthy favourites to regain the title in May 2017. In simulations of the remaining matches, they are odds on to finish top of the pile.

But where does the current halfway house Chelsea side lie in the Premier League roll of honour?

Points won is a natural starting point, but that neglects to account for the closeness and quality of challengers.

A better measure is the points per game won by Chelsea, expressed as a standard score, which attempts to account for how dominant a side has been using the characteristics of this particular season as a benchmark.


Chelsea (2016/17) are currently 2.06 standard deviations above the league average points per game prior to last night's results. Five teams are within 10 points of their current total, albeit after one game more, with the exception of their opponents tonight, Spurs.

By contrast, 2014/15 Chelsea had three fewer points than the current team, but had burned off a lot of challengers, with the exception of Manchester City. So arguably that was a more dominant mid term performance.

Similar comments apply to the other eight teams above Chelsea in the preceding table in terms of standard scores at halfway.

Check out the ultimate performance of the league leaders on Christmas Day based on their standard scores in this post from 2014.