Sunday, 27 December 2015

Express Yourself.

I occasionally write about the efficiency of rugby union kickers using a model that has many things in common with the expected goals models in use in soccer.

The rugby edition uses fewer variables than its cousin, primarily kick location and footedness of the kicker, but it does differ in that many of the simpler attempts have expectations that approach 100%.

Therefore, many kickers have near-perfect conversion rates from a particular distance and angle.

One choice that is common to each model is how to express a player's over or under performance. Typically the percentage above or below the expectation of an average kicker is used, or occasionally a +/- differential expressed in goals, or points in the case of rugby.

Repeatability is always a desirable quality of a metric that professes to capture aspects of player talent. Performance levels may fluctuate for a variety of reasons, injury, aging or simply random variation, but if a model is to be useful we would expect to see some season on season correlation between the metrics we are recording.

Expected goals models profess to show those who are performing above the general average expectation and often this is used to illustrate above average ability, although inevitably buoyed also by luck.

However, often these levels of over performance are not repeated in future seasons, inevitably calling into question the validity and usefulness of a particular model.

Inevitably these models lack all of the inputs to adequately quantify the abilities we are trying to measure, but part of the problem may be down to how the outputs are expressed, especially if some chances are highly likely to be successfully taken.

Imagine an idealised example from soccer.

A player takes five shots at goal, each has a 20% chance of resulting in a goal, so the average expectation is that he scores once. In the field he scores twice, so he's doubled his expected goals, scoring at 200% of an average player's rate.

In terms of goal differential, he's +1 goal.

His next five attempts are much better chances with an 80% chance of scoring, akin to the relatively automatic conversions I see in rugby.

He should score on average four goals, but buoyed by being dubbed a hot striker on BT Sport he converts all five. He's perfect, he couldn't have scored any more.

In terms of differentials, he's now plus 2. The average player would expect to score 5 from his last 10 attempts, but our player has scored 7. He has been rewarded for his perfection by seeing his differential above average increase from +1 after 5 attempts to +2 after 10.

How does an index approach reward his recent spree?

After five 20% attempts, his two goals had him scoring at twice the expected average rate (2 goals instead of just 1). But once we include his arguably more impressive perfect five from five 80% chances, his rate of over performance actually falls from twice the average to 1.4 times the average rate (7 goals instead of an expected 5).

Two ways of expressing the output from an expected goals model. Differentials reward a run of perfection by improving the rating from 1 to 2, while a rate approach decreases the rating from 2 to 1.4.
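The arithmetic behind the two ratings is easy to check. What follows is only a minimal sketch of the idealised example above; the function name and the tuple layout are my own invention for illustration, not part of any particular expected goals model.

```python
def running_ratings(spells):
    """Return (differential, rate index) after each spell of shots.

    Each spell is (number of shots, chance of scoring per shot, goals scored).
    """
    goals = 0
    expected = 0.0
    ratings = []
    for shots, chance, scored in spells:
        goals += scored
        expected += shots * chance
        # Differential: goals above expectation. Index: goals as a multiple of expectation.
        ratings.append((goals - expected, goals / expected))
    return ratings


# Five 20% chances (two scored), then five 80% chances (all five scored).
for differential, index in running_ratings([(5, 0.2, 2), (5, 0.8, 5)]):
    print(f"differential {differential:+.1f}, index {index:.1f}")
```

The differential climbs from +1.0 to +2.0 while the index falls from 2.0 to 1.4, even though the second spell could not have gone any better.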

Intuitively the latter would appear flawed when applied to a player who attempts high value chances and, the quality of your model notwithstanding, how you choose to express the output may impact on your chances of finding year on year correlations.

A third alternative is to run a simulation of the individual expectation for each attempt and see how many trials are as good or better than the result achieved by the player under scrutiny.

Roughly a quarter (about 26%) of true average players would score two or more from five 20% attempts just by luck, but only 11% would manage 7 from the ten attempts described above.

A player who continually finds himself in this lucky subset may simply be better than the average striker, and an approach that accounts for the distribution of the quality of his chances may not see his rate bounce around if he has a bout of successful goal hanging.
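For a handful of attempts, the simulation can be swapped for an exact calculation. The sketch below is that substitute, not the simulation itself: it builds the full distribution of goals by adding one attempt at a time, assuming each attempt is independent. The function names are mine.

```python
def goal_distribution(chances):
    """Probability of scoring 0, 1, 2, ... goals from independent attempts."""
    dist = [1.0]  # before any attempt, zero goals is certain
    for chance in chances:
        updated = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            updated[goals] += prob * (1.0 - chance)  # attempt missed
            updated[goals + 1] += prob * chance      # attempt scored
        dist = updated
    return dist


def prob_at_least(chances, goals):
    """Chance that an average player scores `goals` or more."""
    return sum(goal_distribution(chances)[goals:])


first_five = [0.2] * 5
all_ten = first_five + [0.8] * 5

print(round(prob_at_least(first_five, 2), 3))  # 0.263
print(round(prob_at_least(all_ten, 7), 3))     # 0.111
```

So roughly a quarter of average players match the first spell and about 11% match the full ten attempts. Each attempt keeps its own probability, so a run of easy chances is not treated the same as a run of hard ones.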

Tuesday, 22 December 2015

Home Field Advantage in the 2015/16 Premier League.

The 2015/16 Premier League season has been portrayed as a remarkable one in which the natural order has been upturned. Although as Simon Gleave points out, the only major dislocation from previous years is that Leicester and Chelsea have switched shirts.

Otherwise, the expected strugglers are struggling, the usual title contenders are heading the betting market, if not the actual table and a handful of unfancied mid table fodder has leapt into the top half of the table buoyed by good play, small sample size and a bit of good fortune.

A minor sub plot has been the near equality of home and away results. The raft of away successes early in the season highlighted the apparent supremacy of travelling teams and I suggested that a quarter of the season was insufficient to declare a sea change.

After 170 matches home wins are now back ahead of away victories, but the lead is a narrow one.

Expressed as a success rate, where draws are treated as half a win, away teams are running at 0.49 and home teams, unsurprisingly, at 0.51.

Historically, the trend is for decreasing levels of home field advantage, although there are inevitably peaks and troughs within the general descent. It therefore makes sense to see if a run of 170 matches where home and away teams came close to parity is unusual in the recent past.

HFA, on the wane.
Success rate for away teams in groups of 170 consecutive matches has ranged from lows of 0.34 to highs of 0.48 since 2002, excluding this season.

2008 began with away teams achieving a 0.46 success rate across the opening 170 matches with home teams outscoring their visitors by just over 0.1 of a goal per match.

But over the season as a whole, home teams were on average superior by 0.32 of a goal per game and away sides had a success rate of 0.42.

So evidence for a closing of the gap between host and visitor, but not for parity.

Expected goals for 2015/16 confirm a period of matches where home and away teams have been closely matched, with the former outscoring the latter by just over one tenth of a goal per game.

Simulating all 170 matches results in away sides having an above 0.5 success rate in 16% of the seasons and a success rate as good or better than their actual record in 30% of simulations.

That leaves around 80% of simulated seasons where home teams have the higher success rate and 10% of seasons where that success rate is a healthy 56% or higher.
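For anyone wanting to try this kind of simulation, here is a rough sketch. It assumes each side's goals follow an independent Poisson distribution around its expected goals, which is a simplification and not necessarily how the figures above were produced. The fixture list is hypothetical filler, not real 2015/16 data, and every name in the code is my own.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def away_success_rate(results):
    """Away success rate, with a draw counted as half a win."""
    points = sum(1.0 if away > home else 0.5 if away == home else 0.0
                 for home, away in results)
    return points / len(results)


def simulate(fixtures, actual_away_rate, trials=2000, seed=1):
    """Share of simulated runs where away sides beat 0.5, and where they
    match or better their actual success rate."""
    rng = random.Random(seed)
    above_half = as_good = 0
    for _ in range(trials):
        results = [(poisson(rng, home_xg), poisson(rng, away_xg))
                   for home_xg, away_xg in fixtures]
        rate = away_success_rate(results)
        above_half += rate > 0.5
        as_good += rate >= actual_away_rate
    return above_half / trials, as_good / trials


# Hypothetical: 170 identical fixtures with the hosts a tenth of a goal better.
fixtures = [(1.35, 1.25)] * 170
print(simulate(fixtures, actual_away_rate=0.49))
```

With real per-match expected goals in place of the placeholder list, the two returned shares correspond to the two simulation figures quoted above.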

Again, a continued closing perhaps, rather than an elimination of home field advantage.

The causes of home field advantage, not just in soccer, are not well understood nor universally accepted. Even in the cosseted environment of modern soccer, travel may play a small role, as may crowd support.

And these factors may subtly change from season to season.

However, an important contributor to individual match outcomes is red cards. Eleven versus ten or even nine, is on average a big advantage to the numerically superior team.

Historically away sides suffer more red cards, not particularly because of referee bias, but simply because they are forced into making more tackles.

In 2014/15, Premier League home sides lost 600 playing minutes to red cards compared to 1000 for their visitors. The previous season it was broadly similar, 520 minutes lost by the hosts and 1070 for the away team.

So far in 2015/16 this potent, but relatively rare event is favouring the visitors. Home teams have lost 460 minutes to red cards spread across 11 matches, seven of which have been lost and two drawn.

Away teams have lost 420 minutes.

Home teams may have been unlucky so far based on expected goals in just 170 matches. Refs may not continue to find fault with the home players in a way that is unusual in recent seasons and home field advantage may continue to be a depreciating, but real feature of the current Premier League.

Thursday, 17 December 2015

Chelsea Win In Cyberspace.

Chelsea's defeat at Leicester on Monday night was hardly made more bearable by their moral victory over the title leaders on a myriad of spreadsheets. It certainly failed to impress Roman Abramovich.

From the context-less reaches of hyperspace to the sports pages of the Guardian, the floundering champions gathered three virtual league points in a hard fought, but decisive, expected goals victory that barely required more than one decimal place to confirm their eventual superiority.

In a straight summation of expected goals it is difficult to find a model that didn't rate Chelsea above Leicester on the night despite the Foxes' 2-1 win.

For those who watched the match (and anyone who quotes expected goals or some such, is automatically assumed not to have bothered), the expected goal figures do not pass the eye test.

Part of the problem may arise from incomplete models.

The game was level until the 34th minute, whereupon Leicester took a lead that they increased, saw it reduce, but subsequently kept.

Leicester had five shots on target spread across the 2nd minute to the 48th and none thereafter.

Chelsea had no shots or headers on target until the 62nd minute and four in total, ending with Remy's 77th minute headed goal.

So a "game of two halves".

Game state, score effects, or however you wish to describe them, eventually alter a side's approach to the match. Risk and reward subtly change based on score line, abilities and time remaining.

A side chasing a deficit appears to see their chances of scoring reduced by around 15% compared to the same opportunity from a side that has the lead. Possibly due to different levels of defensive pressure throughout the chance creation process.

So Chelsea's chances may not have been as gilt edged as they appeared merely from shot locations.

Also, closely related events are not additive and Chelsea's two opportunities around the 62nd minute were close enough to have only reasonably been able to deliver a single goal.

If you include these factors on your spreadsheet, the game remains with Chelsea, but they win only around 25% of simulations, with Leicester taking 20% and the remaining 55% ending in a draw.
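Both adjustments can be sketched in a few lines. This is only one possible reading of them: linked shots are collapsed into a single chance that can yield at most one goal, and chances taken while trailing are scaled by 0.85 to reflect the "around 15%" reduction. The shot values are hypothetical, not the actual figures from the Leicester game, and the names are my own.

```python
from math import prod

TRAILING_DISCOUNT = 0.85  # assumption: chances are ~15% less likely when chasing


def linked_chance(shots, trailing=False):
    """Chance that a cluster of closely related shots yields its one possible goal."""
    scale = TRAILING_DISCOUNT if trailing else 1.0
    return 1.0 - prod(1.0 - scale * shot for shot in shots)


def goal_distribution(chances):
    """Probability of 0, 1, 2, ... goals from independent chances."""
    dist = [1.0]
    for chance in chances:
        updated = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            updated[goals] += prob * (1.0 - chance)
            updated[goals + 1] += prob * chance
        dist = updated
    return dist


def outcome_probs(team_a, team_b):
    """Return (team A win, draw, team B win) from each side's list of chances."""
    a_win = draw = 0.0
    for goals_a, prob_a in enumerate(goal_distribution(team_a)):
        for goals_b, prob_b in enumerate(goal_distribution(team_b)):
            if goals_a > goals_b:
                a_win += prob_a * prob_b
            elif goals_a == goals_b:
                draw += prob_a * prob_b
    return a_win, draw, 1.0 - a_win - draw


# Hypothetical shot values: two linked efforts become one chance for the chasing side.
chasing_side = [linked_chance([0.30, 0.25], trailing=True),
                linked_chance([0.12], trailing=True)]
leading_side = [linked_chance([0.08]), linked_chance([0.05])]
print(outcome_probs(chasing_side, leading_side))
```

Summing the two linked shots would credit 0.55 expected goals, whereas treating them as one chance gives under 0.5 even before the trailing discount is applied.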

In Cyberspace no one can hear Mourinho scream. (credit @lubomerkov)
Expected goals is a flexible tool, rather than a true reflection of what the score should have been in the context driven environment of a single 90+ minutes.

It can be used to illuminate the effects of last throws of the tactical dice, such as when Chelsea sent caution to the wind.

For example, once Leicester had a two goal lead we can hazard a guess as to the likelihood that Chelsea's mini barrage of chances could engineer a comeback. Leicester hold on for a win around 40% of the time and draw a similar percentage of simulations, despite not troubling Courtois in their second half display.

We can equally ask how likely Leicester were to score two goals without reply when they were on the offensive front foot during the first 48 minutes and whether that outcome was typical for such a first half performance.

In reviewing a single game, expected goals just adds another layer of information. It's as useful or useless as bringing us news about the dressing room psyche or attempting to second guess a manager's in game intentions, however eloquently and subjectively they are presented.

Thursday, 10 December 2015

Monk Loses the Winning Habit.

At the end of August 2014, two sides, Swansea and Chelsea were vying for the lead in the Premier League table with a 100% record from three matches. Aston Villa were third.

So it represented business as usual for Mourinho and a validation of the soon to be written raft of complimentary articles about the bright new manager in charge at Swansea, Garry Monk.

As of yesterday two of those three teams have seen managerial change and Mourinho's tenure hangs by a Champions League thread.

Monk's hot start to the season was prolonged enough to earn him a place on the Daily Telegraph's shortlist of six for manager of the season, along with the beleaguered Mourinho and the subsequently dispensed with Sam Allardyce.

A near 50% attrition rate in the blink of an eye.

It is becoming increasingly commonplace to acknowledge the role that luck plays in shaping a relatively short, skill based competition, such as a Premier League season.

More data hungry models in late 2014 were already suggesting that Swansea had been relatively shot shy and fortunate even as they remained buoyant, only slightly removed from their August heights.

An abundance of 1-0 wins, seven in total by May, further hinted at a solid mid table side inflated upwards by random, most likely non repeatable events.

Premier League managers, always looking over their shoulders.
Outsiders are never privy to the inner workings of the professional relationships with a football club that may drive change, but it was particularly unfortunate for Monk that an immediate see-sawing of narrowly contested 1-0 games fell so badly for him in 2015/16.

Five such league defeats and a cup exit since August 2015.

Extremes, such as Monk may have benefited from in 2014/15, tend to be less extreme in the future, but fueled by euphoria and congratulatory broadsheets, they tend to become the normal expectation from both the fan base and employer.

Swansea's 14 actual points through 15 games are around a win shy of their most likely total based on shot model simulations, and they have created enough to have had a one in four chance of reaching the 20 or more points that would have invited a more prosperous New Year.

The reality is probably that, in part at least, a straight comparison has been made between the 0.9 points per game this term and the near 1.5 points per match in 2014/15 and knees have been jerked.

Random variation gives and it sometimes cruelly takes away.

Wednesday, 9 December 2015

Rebranding Stoke.

The view that you're as good or bad as your last performance usually flourishes in the online club fora and the soundbite world of football punditry.

So it was hardly surprising that Stoke briefly rose to the dizzy heights of everyone's favourite second team following their comprehensive and visually pleasing defeat of second placed Manchester City on Saturday lunchtime.

Stokealona or my own favourite, Inter City, briefly trended.

The tendency to stereotype teams and players based often on stale evidence from seasons long gone, is a trait that continues to surprise.

On Saturday the realisation gradually dawned on the BT commentary team that even players who remained from the rump of Tony Pulis' ingeniously devised, but widely despised system, could actually participate in a passing based evolution.

Quotes from opposing managers who should really have known better suggesting "We know what to expect from Stoke", while packing the team bus with six foot plus defenders, were amusingly familiar even while the Hughes revolution stumbled uncertainly from possession poor to possession normal.

Pass completion rates for the likes of Cameron, Whelan and Shawcross were poor under Pulis not because those players couldn't pass the ball, but because they were required to implement an approach that at its most extreme coveted distance over retention.

Following the fairly amicable parting of the ways, Pulis' brand of survival at all costs football swept through the lower reaches of the Premier League, first at Palace, later in a delicious irony at WBA, sending the passing stats of competent players plummeting in the process.

Raw shooting differentials failed to spot the trade off between shot quantity and shot location, as Stoke under Pulis invited the opposition to shoot frequently from distance, while they bundled in sufficient goals at the other end from just inside the six yard box.

Shaqiri, along with Afellay, Arnautovic, Bojan, Joselu and Muniesa, "He plays for City!".
Hughes' Stoke has partly borrowed from the Pulis blueprint, recruiting flawed jewels from a wider market. Careers marked by injury, under achievement or a temperament that prefers to invite a post game red card, rather than celebrate a brace of match winning goals, have allowed the assembly of unprecedented talent in the Potteries.

But while plaudits are a welcome change, Stoke's long-term prospects should perhaps be viewed in the context of their accumulated stats. Just as Pulis' Stoke were legitimately better than a swift glance at their shot differentials implied, Hughes' infinitely more entertaining version may be better judged on their statistical achievements this term.

Individual match performance will invariably fluctuate. One or two perceived improved results do not make a trend and Stoke are more usually to be found in the lower half of current league simulations, a handful of expected points below their actual current total of 22.

The Hughes revolution hasn't taken Stoke, puns apart, into the higher echelons of European football. They've merely entered the Premier League mid table tactical mainstream.

Thursday, 3 December 2015

Diego Costa, Head & Shoulders Above the Rest?

There have been some great stats on potential finishing ability posted here on Dan Kennett's twitter feed. Naturally the focus was on Liverpool players, particularly Daniel Sturridge and the post proved timely following Wednesday night's 6-1 away victory at Southampton.

Identifying different levels of finishing ability is always going to be challenging in a sport where scoring opportunities are relatively rare.

Squad rotation, substitution and injuries often deprive strikers of playing time and few manage more than five attempts per 90 minutes.

Even in Dan's comprehensive list of the highest achievers, only a handful of players have exceeded 10,000 minutes since 2011, the time it allegedly takes to master a skill.

Topping the list of currently active Premier League strikers is the recently rested Diego Costa. A 23% conversion rate has been achieved in fewer than 100 attempts, a small sample size compared to the remaining players in Dan's list, who average nearly 300 attempts each.

Therefore, although Costa's conversion rate is well in excess of Aguero and Sturridge, his nearest EPL challengers, there must be a suspicion that his 23% rate is unsustainable long term, compared to just under 15% for the other two.

Who Needs a Fit 16% Striker Every Week!
There are a variety of approaches currently available to try to identify finishing skill.

Expected goal models add an extra level of insight. But they are data hungry and potentially susceptible to rare events, such as deflected shots. Ultimately they only measure a player's deviation from the norm expected by that particular shot based model, which itself is almost certainly incomplete.

It is also rare to see such model based analysis address the likelihood that any over or under performance occurred merely by chance.

If instead we assume the chances presented to these out and out strikers are broadly similar, we can see if the spread of conversion rates is wide enough to imply differing levels of finishing skill within the chosen group.

This approach focuses more on the role of random chance and incorporates sample size, while assuming chance quality is similar for each player.

In short, it is a flawed, polar opposite approach to that of an equally flawed shot based model.

Regressed Conversion Rate for EPL Strikers 2011/12-2015/16.

Data Credit - Dan Kennett.

Player        Regressed Conversion Rate
Costa         0.149
Hernandez     0.148
van Persie    0.147
Aguero        0.147
Sturridge     0.147
...           ...
Defoe         0.142
Lukaku        0.141
Suarez        0.141
Ba            0.139
Bony          0.139

The spread seen in Dan's numbers is just extreme enough to conclude that there is some evidence that finishing skill may exist.

Costa remains the highest rated finisher, but his numbers are regressed by over 90% towards the group average because of his relatively low number of attempts. We have to go to the third decimal place to elevate him above the next four highest players, including Sturridge.

Similarly, the gap between the most and least efficient finisher is now just 1%, rather than the 12% seen in the raw data.
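The mechanics of this kind of regression can be sketched with a simple shrinkage formula, in which a striker's record is padded out with a block of "phantom" attempts converted at the group rate. This is a common way to regress towards a mean and not necessarily the exact calculation used for the table. The group rate, the size of the phantom block and the striker's record below are all hypothetical values chosen for illustration, not Dan's data.

```python
def regressed_rate(goals, attempts, group_rate, phantom_attempts):
    """Conversion rate after adding phantom attempts scored at the group rate."""
    return (goals + group_rate * phantom_attempts) / (attempts + phantom_attempts)


def regression_fraction(attempts, phantom_attempts):
    """Share of the regressed rate that comes from the group average."""
    return phantom_attempts / (attempts + phantom_attempts)


# Hypothetical striker: 23 goals from 100 attempts, in a group converting 14.4%.
# The 1,500 phantom attempts stand in for how little spread the group shows.
raw = 23 / 100
shrunk = regressed_rate(23, 100, group_rate=0.144, phantom_attempts=1500)
print(f"raw {raw:.3f}, regressed {shrunk:.3f}")
print(f"regressed {regression_fraction(100, 1500):.0%} of the way to the group average")
```

The fewer attempts a player has, the larger the share of his regressed rate that is borrowed from the group, which is why a small sample gets pulled in so hard.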

It would be unusual to see a wide range of true finishing abilities at the elite level of a professional sport. 

There may be tentative evidence to suggest that a narrow gap does exist (perhaps traditional scouting could contribute the eye test) and Daniel Sturridge is towards the top of such a pecking order...but can he do it on a bitterly cold night in a minor cup competition in January at the Britannia Stadium!