Friday, 31 May 2013

Football In Hay-On-Wye

The Welsh town of Hay-on-Wye is an unassuming dot on the map just over the Wales-England border in Powys. Difficult to find, it is even harder to leave. The brooding presence of the Brecon Beacons, ten miles to the west, makes the small market town a sat nav black spot, and those who manage to arrive by way of a major road often leave via a more singularly tracked lane, following antiquated road signs indicating Hereford 25 miles (via toll bridge). Apple orchards and hop fields abound.

The town's worldwide fame is due entirely to the shops. Every other one is either a bookshop or a food shop, or quite often both. Every year in May, navigating the town becomes even more difficult when Hay hosts a ten-day literary festival. Giant bubble machines and makeshift organic kitchens spring up on every street corner, every available area of green space suddenly sports at least one yurt and the town's temporary population skyrockets. And when Abergavenny has done with the rain, it sends it over the Brecons to remind everyone they are in Wales.

Deckchairs at Hay-on-Wye, but sometimes without the sunshine.

The festival site is a mini Glastonbury, with Noah and the Whale headlining instead of corporate megaliths such as the Rolling Stones. Daniel Dennett, Will Self and Rowan Williams vie with the likes of Michael Vaughan and Dara Ó Briain to attract audiences to the many tented lecture halls.

This year, following in the footsteps of that other famous American, Bill Clinton, Chris Anderson and his co-author David Sally filled the Digital Stage to promote their new book on football analytics called "The Numbers Game" in a game of one half (with 15 minutes injury time for questions) refereed by Clemency Burton-Hill.

Spotting Chris strolling with his family around the festival site was relatively easy as he was one of the few in attendance who seemed capable of saving a penalty kick. He displays in real life the same passion and good humour that is the hallmark of his excellent blog, "Soccer By The Numbers". Within five minutes of meeting Chris, my wife, a rugby fanatic with little interest in the beautiful game, had demanded I purchase another ticket. By the end of the evening she was interrogating David Sally on the relative merits of the first and second goals in football.

I won't describe the lecture or review the book, because all you really need to know is that the book is essential reading for anyone remotely interested in football (and even die-hard rugby fans). The arguments put forward are relentlessly yet simply constructed, leaving you convinced and eager to learn or discover more.

Football may lag behind other sports in the amount of data that has been available and the scale of analysis that is in the public domain. But one accolade it can now claim is that football has a book that is worthy of a place in the top three books written about sporting analytics.

"The Numbers Game" by Chris Anderson and David Sally.

It really is that good.

Sunday, 26 May 2013

The Last Days Of Tone.

Often those nearest to a situation are the first to realise that something is wrong and few visitors to the Britannia Stadium in 2013 could fail to sense that Tony Pulis, perennial runner up behind Sir Alex Ferguson in the "most secure manager" stakes, was a dead man walking.

The purchase of a footballing, "pass first and pass often" midfielder in Steven N'Zonzi had merely whetted the appetite for a more pleasing and less risk averse football of the type often displayed by visiting teams to the Potteries, but hardly ever by the hosts. The heady "all in it together" days of 2008, when news of newly promoted Stoke's "relegation" reached me while watching county cricket at Derby after a mere 45 minutes of the season had elapsed, were long forgotten as Pulis struggled to implement the long promised change in footballing direction.

Trailing 0-3 to Bolton at half-time at the Reebok was sufficient evidence for Irish bookmaker Paddy Power to pay out on Stoke's certain demise: once in real money, and once in mint flavoured ice cream by way of apology as City comfortably avoided the drop by finishing above their opening day conquerors.

The methods Pulis used have been well chronicled here. Super efficient (by virtue of shot origin) but relatively rare finishing and a team committed to defence (mostly without the ball) were a small price to pay for keeping what was, at best, a mid table Championship side in the top flight. Suddenly, "long throw" no longer immediately conjured up long forgotten footage of Ian Hutchinson's windmill arms.

If cloth was initially cut accordingly, the promise of a three (later four) year plan to play more expansive football was mooted. Teams with limited resources, aims and ultimately achievements often gather most of their points in low scoring encounters, whereas those of grander ambitions display a more rounded ability by gathering greater proportions of their points in higher scoring contests. And as such Stoke's first, memorable taste of home Premiership football was a sham. A 3-2 win over Villa contained typical Potteries goals: one from a set piece (a penalty), one from Ricardo Fuller's rare individualism and an injury time winner via a six yard header from a long throw. But winning points in high scoring affairs was never part of the initial game plan.

Stoke's points gathering profile was stuck firmly in the typical rut of a team destined perennially for the bottom five places, although it was testament to Pulis' implementation of his extreme brand of football, that he never once finished as low as this. Only once, in 2010/11 and including a run to the cup final, did Stoke's exciting, dual wing play lift their points scoring profile closer to the midtable levels that they were actually achieving. A seasonal quest for safety now came with a few pleasing extras.

Adam and Crouch prepare to implement Plan A one last time.
However, the source of Pulis' initial success proved the downfall of his greater ambitions. The Premiership is unforgiving, especially outside the top six, where random fortune can see almost any side relegated. It is not an arena where change, especially dramatic change, can be easily applied. The purchase of more quality and the change to a more pass orientated style showed willingness, but Pulis is rightfully proud of a record that has never seen his sides relegated and ultimately, when points became hard to come by, the cage won out over the more elegant pass and go.

N'Zonzi passed like a "normal" Premiership midfielder (see Devin's Opta post), was voted Stoke's young player of the year and promptly asked to leave. Tony Pulis, however, beat him to the door, but not before maintaining his record of keeping sides afloat in their chosen division. Fittingly, the random element in football favoured the Welshman in this last difficult season as Charlie Adam, an upgrade with no real role, scored a paltry three goals, but scored each one in 1-0 home wins. Three became nine and Stoke survived in 13th, but with the points gathering profile of a bottom five side. Back to the beginning.

The parting of the ways was both amicable (in public at least) and inevitable, and Tony Pulis will remain a major figure in Stoke's 150+ year history. Stoke are currently without a manager but, more importantly, are a side with the rump of two distinct and seemingly incompatible styles. Sorting out the latter will prove much more urgent and difficult than resolving the former, although "Hughes Out" banners have already been moved on from the west car park in anticipation of any imminent announcement.

Tuesday, 21 May 2013

How Important Are "Six Pointers"?

Imagine an alternative end of the season. Wigan and Aston Villa find themselves locked on 40 points going into week 38, trailing a host of teams, each with 42 points. So there is an anxious final 90 minutes in store, not only for Wigan and Villa in the fight to avoid the Premiership's final relegation spot, but also for the raft of sides two points in advance of them.

Except that, because of the vagaries of the fixture list, only Villa and Wigan are at risk of relegation. Villa visit Wigan on the final Sunday and so everyone else is perfectly safe, because the two lowly rivals cannot both get three more points to overhaul the pack. One of the two final day rivals is guaranteed to finish the season in the final relegation spot.

Such considerations quickly become major factors when simulating potential points totals for various teams or scenarios. The fixture list is thoroughly entwined and if Stoke defeat Arsenal at the Britannia in one particular simulation, the result for Arsenal on their travels to the Potteries for this particular iteration must also reflect this.
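As a minimal sketch of that bookkeeping, the hypothetical Wigan v Villa decider can be simulated in a few lines of Python. The outcome probabilities and the function name are illustrative assumptions rather than figures from the model behind this post; the only point is that one draw of the match result updates both teams at once.

```python
import random

# Points going into the final day in the hypothetical scenario above.
START_POINTS = {"Wigan": 40, "Villa": 40}
SAFE_PACK_POINTS = 42  # every other threatened side already has 42

# Assumed (home win, draw, away win) probabilities; illustrative only.
OUTCOME_PROBS = (0.40, 0.28, 0.32)


def simulate_final_day(rng):
    """Simulate Wigan v Villa once and return both teams' final points."""
    outcome = rng.choices(("home", "draw", "away"), weights=OUTCOME_PROBS)[0]
    points = dict(START_POINTS)
    # A single draw of the result feeds both teams, so they stay consistent.
    if outcome == "home":
        points["Wigan"] += 3
    elif outcome == "away":
        points["Villa"] += 3
    else:
        points["Wigan"] += 1
        points["Villa"] += 1
    return points


rng = random.Random(2013)
results = [simulate_final_day(rng) for _ in range(10_000)]

# Count iterations in which both rivals climb above the 42-point pack.
both_overtake = sum(1 for p in results if min(p.values()) > SAFE_PACK_POINTS)
print(both_overtake)
```

Because the two results are tied together, the count of iterations where both sides pass the pack is always zero, whatever probabilities are assumed. Simulating the two teams independently would wrongly allow both to win.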

The hypothetical Wigan/Villa scenario boils down to a winner (or draw) takes all, but the importance of so called "six pointers", matches between teams who are likely to be locked together in the final table, has long been recognized. The most eagerly dissected head to head confrontation this season involved long term rivals Arsenal and Tottenham and, with all title hopes extinguished by January, their fight for fourth place and Champions League football in 2013/14.

With all due respect to Everton, who found themselves sandwiched between the two London sides at the end of January, I am going to merely simulate the post transfer window campaign from the perspective of Arsenal and Spurs, taking particular account of the effect on the simulation of the result of the North London derby played on March 3rd at White Hart Lane. A true "six pointer", with Arsenal still trailing Spurs by 4 points at the time.

Once the window shut in January both sides had 14 games left. Arsenal, it would transpire, had the easier run in. The median position of their opponents at the time of each match during the final third of the season was 12th, compared to a more elevated 9.5 for Tottenham. That advantage was partially counterbalanced by the use Spurs had already made of a less onerous set of fixtures. They led Arsenal at the start of February by three points after each side had played 24 games.

The race for fourth therefore appeared very tight, and so it proved. In the simulations, which account for strength of schedule and simulate matches played from February onwards, Spurs won 51% of the races where there was a clear league points winner. However, 6% of the races ended with both sides tied on league points and Arsenal's already superior goal difference at the start of February would make them much more likely to win the majority of these simulated contests on the tie breaker. If we take this best case scenario, Arsenal now grab fourth spot in 52% of the simulations.
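The step from 51% for Spurs to 52% for Arsenal is easy to lose, so here is the arithmetic spelled out in Python. It uses only the two percentages quoted above; the variable names are mine, and the "best case" assumes Arsenal take every tied race on goal difference.

```python
# Figures quoted in the text, expressed as fractions of all simulations.
tied_share = 0.06          # races that finished level on points
spurs_clear_share = 0.51   # Spurs' share of races with a clear points winner

decided_share = 1.0 - tied_share
spurs_outright = spurs_clear_share * decided_share
arsenal_outright = (1.0 - spurs_clear_share) * decided_share

# Best case for Arsenal: goal difference hands them every tied race.
arsenal_best_case = arsenal_outright + tied_share

print(f"Spurs outright:    {spurs_outright:.1%}")
print(f"Arsenal outright:  {arsenal_outright:.1%}")
print(f"Arsenal best case: {arsenal_best_case:.1%}")
```

Spurs' 51% applies only to the 94% of races with a clear winner, which is a little under 48% of all simulations, so handing Arsenal the tied 6% is enough to tip the balance.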

So how pivotal in the simulations was the head to head encounter in March? In simulations where Spurs beat Arsenal, their share of the Champions League winning spoils rose from around 50% of the simulations to nearly 70%, with almost the absolute reverse being the case on the occasions where the Gunners triumphed. A drawn game saw Tottenham and Arsenal winning virtually identical percentages of the trials.
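To illustrate how conditioning on the derby result works, the sketch below swaps the Monte Carlo simulation for an exact enumeration of points totals, which is easier to check by hand. The per-match win, draw and loss probabilities are invented for illustration and ignore the actual fixture list, so the output will not reproduce the percentages above; only the three-point head start and the 14 remaining games come from the post.

```python
from collections import defaultdict

# Assumed per-match (win, draw, loss) probabilities for the 13 non-derby
# games; illustrative only, with Arsenal given the easier run in.
ARSENAL_PROBS = (0.58, 0.24, 0.18)
SPURS_PROBS = (0.48, 0.26, 0.26)
SPURS_HEAD_START = 3   # Spurs' lead after 24 games, as stated in the text
OTHER_GAMES = 13       # 14 remaining games each, minus the derby


def points_distribution(probs, games):
    """Exact distribution of points won over `games` independent matches."""
    win, draw, loss = probs
    dist = {0: 1.0}
    for _ in range(games):
        nxt = defaultdict(float)
        for pts, p in dist.items():
            nxt[pts + 3] += p * win
            nxt[pts + 1] += p * draw
            nxt[pts] += p * loss
        dist = dict(nxt)
    return dist


def spurs_share(derby_spurs_pts, derby_arsenal_pts):
    """Spurs' share of races with a clear points winner, given the derby."""
    arsenal = points_distribution(ARSENAL_PROBS, OTHER_GAMES)
    spurs = points_distribution(SPURS_PROBS, OTHER_GAMES)
    spurs_ahead = arsenal_ahead = 0.0
    for a_pts, a_p in arsenal.items():
        for s_pts, s_p in spurs.items():
            spurs_total = s_pts + derby_spurs_pts + SPURS_HEAD_START
            arsenal_total = a_pts + derby_arsenal_pts
            if spurs_total > arsenal_total:
                spurs_ahead += a_p * s_p
            elif spurs_total < arsenal_total:
                arsenal_ahead += a_p * s_p
    return spurs_ahead / (spurs_ahead + arsenal_ahead)


for label, derby in (("Spurs win", (3, 0)), ("Draw", (1, 1)), ("Arsenal win", (0, 3))):
    print(f"{label:12s} {spurs_share(*derby):.1%}")
```

Whatever probabilities are plugged in, the derby moves the race by a six point swing between its two decisive outcomes, which is why the winner's share shifts so sharply.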

Bale's opening strike in the North London derby would ordinarily have led to Champions League football.
So in simulations, a head to head result from a near level break in January appears to give a large boost to the winning side. Spurs did in reality beat Arsenal in March, but they still came up short in May. Arsenal were always likely to gather more points than Spurs during the post January run in (they did so in 73% of the simulations), but Spurs' win at White Hart Lane would, more often than not, have been decisive. Expensive post Europa losses to the likes of Fulham may fuel another debate and, combined with Arsenal's impressively fine run in, mean the Gunners can look forward to high quality European action next term.

I'll flesh out the numbers in a later post, but more good stuff based around the race for 4th spot can be found here from Simon Gleave, here from James Grayson and here from Zach Slaton.

Sunday, 19 May 2013

Possession, Opponents And Match Outcome.

One of the early additions to the usually quoted football stats of goals scored and conceded was the amount of possession enjoyed by each side during a match and as such it is almost universally quoted today. It is therefore understandable that much effort has gone into determining the connection between possession and success or otherwise on the field.

Naturally, from a supporter's viewpoint, it feels more secure if your own side has the ball, and similarly the ability to keep possession is often, quite rightly, connected with talent and skill. If we further consider that the most successful and well resourced teams in most major leagues are largely based around a passing, and therefore possession based, style, it is easy to see how increased amounts of possession became intertwined with an increasing likelihood of achieving a favourable outcome.

However, it is increasingly becoming apparent that the relationship between possession and wins is far from straightforward. In this post I outlined an earlier view that possession merely tells you how long teams spent trying to do certain things on the pitch. It doesn't tell you what those actions were, it doesn't tell you how successfully the actions were translated into really important things, such as goals, and it doesn't tell you how effectively each side carried out those tasks. Game states, tactical approaches, relative skill levels between the sides and, of course, randomness decide match outcomes, and how these are played out on the day decides the largely secondary statistical measure that is recorded as possession.

Extremes can often be used to illustrate more subtle differences which appear in all matches but are difficult to spot when the teams play with similar natures and intent. Barca are unlikely to ever compete in everyone's dream matchup against Stoke at the perennially wet and windy Britannia, but pass loving Arsenal provide an adequate proxy for the Catalan giants. The outcome is fairly predictable at the Emirates, but less so at Stoke, where the Potters often claim all three points. But one universal constant persists, namely win, lose or draw, Arsenal always have much, much more of the ball than Stoke.

The match outcome is decided by the interplay of Stoke's direct, set piece centric approach, where defending is a chore undertaken largely without the ball, pitched against Arsenal's weaving, intricate brand of passing. Possession stats merely fall into place at the end of the game as a by product. So if possession in the case of Arsenal and Stoke contests is a predictable variable based around team styles, that is partially hard baked into their contests and is largely independent of match outcome, how do other less stylistically extreme matches fare?

Below I've plotted the amount of actual possession enjoyed by Stoke and Arsenal in every game from 2011/12 against the average possession of their opponents over a representative selection of matches.

The trend is clear and even more prominent across other EPL teams. A side's share of match possession is tied to the historical tendencies of both itself and its opponent. For example, when Stoke meet a side which also employs a tactical approach that shuns possession, then Stoke's share rises, and when Arsenal face similarly possession loving sides their share falls. In short, the possession battle is decided largely before a ball is kicked, and while it can be shifted slightly by such things as red cards, venue and scoreline, it is partly predictable with reference to the past styles of the competing teams.

The Stoke plot particularly highlights the futility of trying to connect possession stats to match outcome without reference to a side's preferred, and presumably most effective playing style. Stoke won eleven games in 2011/12, all achieved with less than 50% possession and they dominated possession in three games, winning none of the three.

League wide the connection persists. Pregame, historical possession stats for both sides, along with venue can predict individual match possession relatively well, as demonstrated by the plot below. Red cards in particular produce distorted extremes, but prior knowledge of each side's possession history leads to an adequate estimation of how often each side will see the ball in a single match.
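One simple way to picture such a pregame estimate is sketched below in Python. The ratio formula, the 1.5 point venue shift and the example averages are all my own illustrative assumptions, not the fitted model behind the plot; the sketch only shows how two historical averages and the venue can combine into a single match estimate.

```python
def expected_possession(team_avg, opponent_avg, at_home, venue_shift=1.5):
    """Estimate a team's possession share (%) in a single match.

    team_avg and opponent_avg are each side's historical average share (%).
    venue_shift is an assumed home advantage in percentage points.
    """
    # Each side "bids" for the ball in proportion to its historical share.
    base = 100.0 * team_avg / (team_avg + opponent_avg)
    shift = venue_shift if at_home else -venue_shift
    # Clamp so extreme inputs cannot leave the 0-100 range.
    return max(0.0, min(100.0, base + shift))


# Hypothetical averages: a direct side (40%), a passing side (60%)
# and a second direct side (42%).
direct_vs_passing = expected_possession(40.0, 60.0, at_home=True)
direct_vs_direct = expected_possession(40.0, 42.0, at_home=True)
passing_away = expected_possession(60.0, 40.0, at_home=False)

print(round(direct_vs_passing, 1), round(direct_vs_direct, 1), round(passing_away, 1))
```

Even this crude version reproduces the pattern described: the direct side sees more of the ball against a like minded opponent than against a passing side, and the two shares in any one match always add up to 100%.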

Stoke have managed to secure virtually all of their Premiership wins with less than 50% of match possession, and to sever the last remaining connection between possession and performance, we need to see if more possession relative to a side's normal, average share, as opposed to the lion's share of possession, correlates to more successful results.

Spurs were in a more celebratory mood when seeing less possession in 2011/12.
Defining performance over a single match is difficult because of the discrete nature of the 3, 1 or 0 points awarded for each possible outcome, but we can partly overcome this by seeing if above average performance compared to pregame estimates is seen where possession figures are also above average for individual teams.

Stoke won 85% of their combined pregame points expectation in matches when they had above average (for them) possession, but 125% of their pregame points expectation when possession fell below their typical average. So more possession was generally an indication that Stoke were doing time consuming actions that, for them, were connected to under performance. Other teams shared this trait, from Everton, Spurs and QPR to both Manchester clubs.
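The calculation behind figures like these is straightforward, and the Python sketch below shows its shape. The six matches are invented for a hypothetical team purely to demonstrate the method; they are not Stoke's results, and the function name is my own.

```python
# Each tuple: (possession %, pregame expected points, actual points).
# Invented numbers for a hypothetical team, purely to show the calculation.
matches = [
    (38.0, 1.2, 3),
    (41.0, 0.9, 0),
    (44.0, 1.5, 1),
    (52.0, 1.6, 1),
    (55.0, 1.4, 1),
    (49.0, 1.0, 1),
]


def performance_by_possession(matches):
    """Return actual/expected points for above- and below-average possession."""
    average = sum(m[0] for m in matches) / len(matches)
    totals = {"above": [0.0, 0.0], "below": [0.0, 0.0]}
    for possession, expected, actual in matches:
        key = "above" if possession > average else "below"
        totals[key][0] += actual
        totals[key][1] += expected
    # Assumes both groups contain at least one match.
    return {key: actual / expected for key, (actual, expected) in totals.items()}


ratios = performance_by_possession(matches)
print({key: round(value, 2) for key, value in ratios.items()})
```

A ratio below 1 means the team fell short of its pregame expectation in that group of matches, and a ratio above 1 means it beat the expectation. The split is made against the team's own average possession, not against 50%.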

Teams spread the length and breadth of the EPL table, from Arsenal and Chelsea to Bolton, WBA and Villa, demonstrated the reverse preference. Possession figures above their average led to better than average results and the less they saw of the ball the poorer their relative performance became.

Once again we are looking at what teams did, how long those actions took and ultimately how successful and lucky they were with their changing approaches. Possession largely appears to be a statistic that more defines a side's stylistic approach to gaining, defending or retrieving a desired result. As a stand alone number, its usefulness fails to survive the inevitable lack of more detailed, context driven investigation.

Saturday, 18 May 2013

What Chance A Premiership Playoff Game At Villa Park?

Arsenal and Chelsea go into the final Sunday of the Premiership season with the prize of automatic qualification for the next year's Champions League still to be decided. Chelsea hold their fate in their own hands and victory at home to Everton will render Arsenal's result away at already safe Newcastle an irrelevance.

However, with final placings decided firstly on points gained, then on goal difference and finally on goals scored, there is a possibility that Chelsea and Arsenal could end up stalemated for third place.

Chelsea Result    Arsenal Result    Chance Of Both Occurring
0-0               2-1               1 in 120
1-1               3-2               1 in 450
2-2               4-3               1 in 14,000
3-3               5-4               1 in 1,600,000
4-4               6-5               1 in 500,000,000

The Premier League have taken the possibility of Arsenal and Chelsea ending up level under all three tiebreakers so seriously that they have provisionally scheduled a playoff game to be played at Villa Park on May the 26th. Above I've listed the combinations of results that will trigger Villa Park to prepare for their biggest game of the season. Of the relevant scores, Arsenal winning 2-1 is their most likely outcome and Chelsea being held to a 1-1 draw has the highest probability of occurring for them.

Fortunately for the fate of Chelsea's American tour, which straddles the chosen date for the playoff, these two most likely individual outcomes don't pair up. A goalless game at Stamford Bridge and a 2-1 win for Arsenal away at Newcastle is the most likely combination and the cumulative chances of a 39th game for both sides comes in at around 1 chance in 90.
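For anyone wanting to check which scorelines trigger the playoff, here is a rough Python sketch. Chelsea's pre-match lead (2 points, 1 on goal difference, 2 on goals scored) is inferred from the table itself. The goal expectations and the use of independent Poisson scores are my own assumptions, so the odds printed will differ from those listed; independent Poisson models are known to undercount low scoring draws.

```python
from math import exp, factorial

# Chelsea's lead over Arsenal before kick-off, as implied by the table:
# 2 points, 1 on goal difference and 2 on goals scored.
LEAD_POINTS, LEAD_GOAL_DIFF, LEAD_GOALS_FOR = 2, 1, 2

# Assumed goal expectations for the four teams; illustrative only.
MEAN_GOALS = {"Chelsea": 1.7, "Everton": 1.0, "Arsenal": 1.6, "Newcastle": 1.1}
MAX_GOALS = 10  # scores above this are ignored as vanishingly unlikely


def poisson(k, mean):
    """Probability of exactly k goals for a Poisson rate `mean`."""
    return exp(-mean) * mean**k / factorial(k)


def points(goals_for, goals_against):
    return 3 if goals_for > goals_against else 1 if goals_for == goals_against else 0


def playoff_combinations():
    """Yield (Chelsea score, Arsenal score, probability) for every dead heat."""
    for cf in range(MAX_GOALS + 1):          # Chelsea goals
        for ca in range(MAX_GOALS + 1):      # Everton goals
            for af in range(MAX_GOALS + 1):  # Arsenal goals
                for aa in range(MAX_GOALS + 1):  # Newcastle goals
                    level = (
                        LEAD_POINTS + points(cf, ca) == points(af, aa)
                        and LEAD_GOAL_DIFF + (cf - ca) == af - aa
                        and LEAD_GOALS_FOR + cf == af
                    )
                    if level:
                        prob = (
                            poisson(cf, MEAN_GOALS["Chelsea"])
                            * poisson(ca, MEAN_GOALS["Everton"])
                            * poisson(af, MEAN_GOALS["Arsenal"])
                            * poisson(aa, MEAN_GOALS["Newcastle"])
                        )
                        yield (cf, ca), (af, aa), prob


combos = list(playoff_combinations())
for chelsea, arsenal, prob in combos[:5]:
    print(chelsea, arsenal, f"1 in {1 / prob:,.0f}")
print(f"Any playoff: 1 in {1 / sum(p for _, _, p in combos):,.0f}")
```

The enumeration confirms the structure of the table: a dead heat needs Chelsea to draw n-n while Arsenal win (n+2)-(n+1), and no other pairing of results leaves the sides level on all three tie breakers.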

Stranger things have happened on the final day of the Premiership season. But for those contemplating a much more outlandish finale to the campaign, a 6-6 draw for Spurs coupled with a 15-0 defeat of Arsenal by Newcastle would result in a playoff between those two sides for the final Champions League spot. The 6-6 draw alone carries around a once in 18,000,000 chance.

Thursday, 16 May 2013

The Art And Talent Of The Corner Kick.

Stoke City's innovative style of play, involving making the very best of the limited assets available, wasn't merely restricted to the Delap/set piece routines so familiar to recent Premiership audiences. Paul Maguire, floating in the near post corners, and Brendan O'Callaghan, providing the headed goal or at worst the delicate flick on, were a staple ingredient of match days at the Victoria Ground in the early to mid eighties. Both are still fondly remembered, especially O'Callaghan, who announced his Stoke City debut with a goal within 10 seconds, as a substitute... from a corner. However, players move on, tactics change and strategies are developed to combat every successful system, and in Stoke's more recent non Premiership past they were regarded as a side that couldn't buy a goal from a corner. Once back in the top flight, Stoke reacquainted themselves with the joys of scoring from set pieces in general and corners in particular.

If converting corners into goals is a talent that is distributed unevenly between teams and therefore, ebbs and flows across the decades, we should be able to see both repeatable team traits across seasons and conversion rates that diverge from those expected if the process was simply centred around the league average in a purely random manner.

The average goal conversion rate from corner kicks in 2011/12 was just over 3%. Highs of 5.5% were seen at Manchester City, lows of zero percent at Villa Park and an average of 200+ corners were attempted per team across the Premiership. Equality of opportunity was guaranteed for each corner at the outset of the kick because they are all taken from near identical pitch placements and the spread of the individual team success rates polarized by City and Villa over the last completed campaign, implies that some teams are more talented corner takers than others. If we attempt to account for the random variation component, we are left with conversion rates that are more indicative of the actual talent of each team and this figure is more likely to predict future performance than the actual conversion rates recorded by a side.

Manchester City were likely to have been good and lucky in 2011/12. So a conversion rate nearer to 4% than their actual figure of 5.5% should really be entered against their name, and likewise a near 2% conversion rate is a more accurate legacy of Villa's corner converting prowess for the 2011/12 season. It is probable that Villa experienced the perfect storm of being both generally poor takers of a corner and unlucky, and improvement through a variety of routes should have been expected in 2012/13.
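One common way to make this kind of adjustment is to blend each team's record with a block of "league average" corners, sketched below in Python. The weight of 333 phantom corners is a hypothetical value I chose because it lands close to the 4% and 2% figures quoted; the post does not state the method or weight actually used. The inputs of roughly 200 corners per team and the league totals come from the text and the footnote.

```python
LEAGUE_RATE = 131 / 4000  # goals per corner, from the footnote's league totals
PRIOR_CORNERS = 333       # assumed weight of the league average; hypothetical


def shrunk_rate(goals, corners, league_rate=LEAGUE_RATE, prior=PRIOR_CORNERS):
    """Pull an observed conversion rate towards the league average."""
    return (goals + prior * league_rate) / (corners + prior)


# Approximate inputs: about 200 corners each, 5.5% and 0% observed.
city = shrunk_rate(goals=11, corners=200)
villa = shrunk_rate(goals=0, corners=200)

print(f"City:  {city:.1%}")
print(f"Villa: {villa:.1%}")
```

The fewer corners a team has taken, the more its estimate leans on the league average. A side with several seasons of corners behind it would keep most of its raw rate.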

The eventual champions scored at least one goal from a corner once in every three matches and their closest challengers, United needed on average an extra match to do likewise. Overall, by Opta's definition, two matches out of every seven in 2011/12 saw at least one goal scored from a corner kick situation.

If we move on to the defensive side of the ball, the same effects are seen. The actual observed conversion rates allowed by each defence are more spread out than you would expect if each defence shared an identical ability to defend corner kicks. Also, by dragging extreme results closer to the league average and giving more weight to the raw figures recorded by sides which faced larger numbers of kicks, we produce numbers which are more predictive of future performance.

The poorest five performers at defending corner kicks in 2011/12 occupied the bottom five slots in the final Premiership table. Corner conversion has often been an avenue to excel at on the route to preserving top flight status, but by neglecting their duties at the other end of the field and leaking goals from corners at rates of at least one goal every three games, both Bolton and Blackburn ultimately suffered relegation. Although Wolves managed to narrowly see off the trifecta, they were only marginally better than Wigan and QPR and also experienced the first of multiple demotions.

Interestingly, the season on season correlation for defensive performance is stronger than the corresponding attacking situation. Possibly the ability to make something happen (score from a corner) attracts more attention than the ability to prevent something from occurring. Therefore proficient corner scoring teams are quickly identified and schemed against in future meetings (Delap's long throw survived as a potent weapon for barely three Premiership seasons and only remained effective thereafter in the unfamiliar territory of the cup competitions).

Villa employ a novel corner defence by attacking the ball.
An attribute such as aerial ability is an obvious advantage when both defending and attacking corner kicks, and there is a weak correlation between corner competence at either end of the field. Above average converters of corners are slightly more likely than random to also be above average defenders of such a set piece. However, the weakness of the relationship hints at the diversity of talents that comprise a successful corner. An excellent delivery, as provided by a Robin van Persie or a Paul Maguire, doesn't help his side defend a corner, but a good header of the ball is an asset at both ends of the pitch.

Raw conversion rates can hint at different talent levels of corner conversion and a relatively strong season on season correlation also implies a repeatable skill is present. But a deeper analysis of corner strategy requires isolation of every associated skill, from ball delivery to off the ball running and even the semi legal art of blocking opponents. As a valuable scoring method, a 3% conversion rate may not initially impress. However, as @analyseFooty suggested in relation to this post, if we consider a corner kick as just another pass, compared to an average pass, it is a devastatingly efficient one!*

All data is taken from the MCFC release of 2011/12 data in conjunction with Opta.

*(on average an EPL team makes 450 passes a game and scores 1.3 goals, of which about 70% are from open play. Therefore conversion rate per pass is of the order of 0.2 to 0.3%. In 2011/12 over 4,000 corners produced 131 goals, therefore, conversion rate is around 3%. Even allowing for general passes which don't carry attacking intent and accepting that not every goal scoring corner is a first contact score, corner conversion rates still easily hold their own).
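The footnote's comparison can be reproduced in a few lines of Python. All inputs are the rounded figures quoted above; the variable names are mine, and since the corner count is given as "over 4,000", the corner rate here is a slight overestimate.

```python
passes_per_game = 450
goals_per_game = 1.3
open_play_share = 0.70

corners = 4000      # "over 4,000", so the true rate is slightly lower
corner_goals = 131

pass_rate = goals_per_game * open_play_share / passes_per_game
corner_rate = corner_goals / corners

print(f"Goals per pass:   {pass_rate:.2%}")
print(f"Goals per corner: {corner_rate:.2%}")
print(f"Corner advantage: {corner_rate / pass_rate:.0f}x")
```

On these rounded inputs a corner is worth somewhere in the region of fifteen or more average passes, which is the sense in which it is a devastatingly efficient one.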

check out Ravi's site at

Tuesday, 14 May 2013

Game States And Team Quality.

In my previous post I looked at how Arsenal's attacking and shooting tendency was tailored towards the particular game and scoreline states in which they found themselves over the 2010/11 season. Arsenal were the pregame favoured team in virtually all of their 38 Premiership matches in that season and it was only in the four matches where they traveled to Liverpool, Chelsea and the two Manchester sides that they went into the contest as underdogs. Consequently, the scoreline state and game states mirrored each other fairly well. A lead was obviously a good game state, a draw could almost always be improved upon compared to pregame expectations and when trailing, the Gunners had both the incentive and almost always the potential ability to turn the scoreboard around.

However, in the case of more mediocre sides, these correlations aren't always as clear cut, especially when the game is stalemated.

The final 2010/11 table was a fairly typical example of the recent Premiership. Manchester United were comfortably crowned champions, Chelsea, Arsenal and Manchester City followed them home in a tight group of three and then came those aspiring to qualify for the Europa League. The mediocre EPL sides then begin to appear, and going into the final round of matches just seven points separated 9th place from 19th. Therefore Aston Villa, 13th after 37 games and 9th a game later, could reasonably be chosen as a typical, run of the mill side.

Villa were the favoured side in just 17 of their 38 games and unlike Arsenal, there would likely have been many more games where a draw would have been an acceptable result for the team from the West Midlands. So where Arsenal's approach would be consistently to tend towards pushing for a go ahead goal, the connection between Villa's scoreline state and game state is likely to be more ambiguous. A current point away to Fulham was most probably acceptable, (although they may harbour thoughts of capturing all three), but one at home to ultimately relegated Blackpool would be much less acceptable. In short, the scoreline states don't coincide as neatly with a side's perceived game state in the case of Villa compared to Arsenal.

Similarly when Villa trailed, their ability to match the desire to improve the scoreline with their capability of achieving that aim is also unlikely to tally with that of Arsenal. Villa trailed at some stage on 19 occasions, against teams who were as determined to hang onto their three points as Villa were to retrieve something from the match. So the change in scoring effort from Villa is likely to be a function of these shifting priorities shown by each side. When the same thing happened on 14 occasions to Arsenal, the Gunners had a more potent attacking force to call on for a more concerted retrieval approach than did the Villans in their various contests.

As with the previous Arsenal analysis, I've used the x, y data of the shot to determine a goal expectation, which in turn leads to an expected long term scoring rate in different scoreline states. At worst, this type of analysis can give an enhanced picture of how Villa tried to play during different phases of matches in that season, and we may also begin to see the interaction between teams without painstakingly plotting minute by minute changes in game state.

Aston Villa's Goal Expectancy From Chances Created In Various Scoreline States, 2010/11.

Scoreline State    Goal Expectation From Chances Created
Ahead              A Goal Every 72 Minutes
Level              A Goal Every 52 Minutes
Behind             A Goal Every 58 Minutes
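For readers curious how "a goal every N minutes" falls out of shot data, here is a minimal Python sketch. The shots, their goal expectations and the minutes spent in each state are all invented for illustration and are not Villa's data; the function name is also my own.

```python
# Each shot: (scoreline state when taken, goal expectation from its x, y position).
# All numbers below are invented for illustration, not Villa's actual data.
shots = [
    ("ahead", 0.05), ("ahead", 0.12),
    ("level", 0.30), ("level", 0.08), ("level", 0.10),
    ("behind", 0.07), ("behind", 0.15), ("behind", 0.03),
]
minutes_in_state = {"ahead": 20.0, "level": 45.0, "behind": 25.0}


def minutes_per_expected_goal(shots, minutes_in_state):
    """Convert summed goal expectation per state into 'a goal every N minutes'."""
    expected = {state: 0.0 for state in minutes_in_state}
    for state, goal_expectation in shots:
        expected[state] += goal_expectation
    # States with no chances created are skipped to avoid dividing by zero.
    return {
        state: minutes_in_state[state] / expected[state]
        for state in minutes_in_state
        if expected[state] > 0
    }


rates = minutes_per_expected_goal(shots, minutes_in_state)
for state, minutes in rates.items():
    print(f"{state:7s} a goal every {minutes:.0f} minutes")
```

Dividing time spent in a state by the goal expectation accumulated there rewards both shot volume and shot quality, which is why a side can shoot nearly as often when trailing and still post a slower rate.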

We see a similar trend to that exhibited by Arsenal. Chance creation and long term scoring rates declined when Villa led, compared to other scorelines. Shots were less frequent and marginally of the lowest quality on average. Interestingly, potential scoring rates were actually highest when games were level: Villa were creating their best and most frequent chances in this scoreline state. Numerically, creation rates only fell away very slightly when they trailed, but quality was noticeably poorer.

All Hands On Defence As Villa Protect A Lead.
These changing rates, coupled with those produced by Arsenal in the same season, hint at the changing dynamics of a football game, where desire and capability are pitched against opponent ability and intent. The game state at level scorelines is likely to be less clear cut in the case of Villa compared to Arsenal. In the former, both sides may still be actively seeking a win, whereas the opponents facing Arsenal are likely to be more uniformly engaged in defending their point. In short, when drawing, Villa are more likely to be facing teams who are also willing to take a chance.

Once Villa trail the eventual priorities are more clear, but as Villa lack the attacking expertise of the top sides, exemplified by Arsenal, their ability to create valuable chances may now be less than they were capable of achieving in a more open situation where both sides may still have been trying to break a stalemate.

Overall the Villa figures show a similar general trend to Arsenal's in 2010/11. Both sides were at their least dangerous in goal scoring terms when already ahead. The differing potency of Arsenal and Villa at level or trailing scoreline states may simply be an artifact of sample size or it may represent a genuine difference between the very best in such situations and the mediocre.

Often in football analysis, such as the relevance of possession, the characteristics of the very best overwhelm the tendencies of the less gifted majority, in turn hiding a more complex reality and this may be the case in determining game states for different teams under the same scoreline, especially stalemates.

Ultimately, game states will have to be defined by the non trivial interplay of relative team quality, current scoreline and time remaining.

Saturday, 11 May 2013

Cranking Up The Goal Expectation When Doing Badly.

A football match is a contest that is constantly and subtly changing in many ways. Goals are the obvious major events that alter the balance by which teams either seek to consolidate an advantageous position or retrieve a potentially losing one. Goals come about through a combination of skill, random chance and no little effort and the varying degrees to which teams choose to attempt to impose these factors on an opponent determine how successful they will be. In this post I looked at how trailing teams are more likely to score than they had been before they conceded the lead. The amount of time remaining is also a contributing factor, but sooner rather than later every team will launch a concerted effort to retrieve a losing position. They don't automatically become the most likely team to claim the next goal, if there is one, but they do, on average, become more dangerous in attack than had previously been the case.

The extra potency shown by such teams could previously only be quantified if their efforts produced a goal, and over large enough samples their scoring rate when trailing can be shown to increase by upwards of 10%. However, by using models that predict goal expectations for individual goal attempts based on the x, y coordinates from where they were made, we can demonstrate how sides, on average, attempt to up their attacking game in certain match situations, either until their opponents succumb, they themselves are caught on the counter attack or the game merely excitingly runs its full course.
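To make the idea concrete, here is a minimal Python sketch of how shot by shot goal expectations might be rolled up into a "goal every N minutes" figure. It is not the model behind these posts: the logistic form, the coefficients and the function names are all assumptions picked purely for illustration, as is the convention that x is yards out from the goal line and y is yards off the centre of the goal.

```python
import math

# Hypothetical coefficients, invented for illustration only (not fitted to any data).
INTERCEPT = 0.3
PER_YARD = -0.1     # log-odds lost per yard of distance from goal
PER_RADIAN = -0.8   # log-odds lost per radian the shot is off centre


def shot_goal_probability(x, y):
    """Assumed convention: x = yards out from the goal line, y = yards off centre."""
    distance = math.hypot(x, y)
    angle = abs(math.atan2(y, x))  # 0 when directly in front of goal
    log_odds = INTERCEPT + PER_YARD * distance + PER_RADIAN * angle
    return 1.0 / (1.0 + math.exp(-log_odds))


def minutes_per_expected_goal(shots, minutes_played):
    """Sum the shot probabilities and express them as 'a goal every N minutes'."""
    expected_goals = sum(shot_goal_probability(x, y) for x, y in shots)
    if expected_goals == 0:
        return math.inf  # no shots, so no expected goals
    return minutes_played / expected_goals
```

Feeding in only the shots taken while a side trailed, along with the minutes spent trailing, would give the kind of state by state rate quoted in the tables that follow.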

Arsenal, being a consistently successful side are less prone to ambiguous, stalemated game states, where doubt lies as to whether or not they are reasonably happy to be on level terms. Original game winning probabilities of around 25% or smaller are the break even point, whereby a side is theoretically content with a point and the vast majority of Arsenal's matches will see them quoted at greater probabilities than this to win at the outset. Therefore, Arsenal are almost certain to push for a winner at some point in almost every tied game unless they are visiting either Manchester club or Stamford Bridge.

Arsenal's Goal Expectancy from Chances Created in Various Scoreline States. 2010/11.

Scoreline State. Ahead. Level. Behind.
Goal Expectation from Chances Created. A Goal every 55 minutes. A Goal every 45 minutes. A Goal every 45 minutes.

The overall level of Arsenal's ability in 2010/11 was on par with a side expected to score, on average, a goal every 51 minutes. The goal expectancy based on the quality and quantity of the chances they created when they led suggests that they then played like a team capable of scoring only once every 55 minutes. So, as a team which had the lead, they moved into a more defensive mode to the detriment of their attacking expectations.
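For anyone who prefers goals per game to minutes per goal, the conversion is simple arithmetic. The short Python sketch below uses only the figures quoted above; the function names are my own and are just for illustration.

```python
# Minutes per goal quoted in the text for Arsenal, 2010/11.
ARSENAL_BASELINE = 51
ARSENAL_BY_STATE = {"Ahead": 55, "Level": 45, "Behind": 45}


def goals_per_90(minutes_per_goal):
    """Convert 'a goal every N minutes' into goals per 90 minutes."""
    return 90.0 / minutes_per_goal


def change_versus_baseline(minutes_per_goal, baseline_minutes_per_goal):
    """Fractional change in scoring rate relative to the baseline rate."""
    return baseline_minutes_per_goal / minutes_per_goal - 1.0


for state, minutes in ARSENAL_BY_STATE.items():
    rate = goals_per_90(minutes)
    change = change_versus_baseline(minutes, ARSENAL_BASELINE)
    print(f"{state}: {rate:.2f} goals per 90, {change:+.1%} versus baseline")
```

On these numbers Arsenal's expected scoring rate when ahead sits about 7% below their overall level, while the 45 minute figure corresponds to a rate about 13% above it.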

The Gunners' urge to improve during level and trailing scoreline states is reflected in their quantity and quality of goal attempts being the equivalent of a long term average scoring rate of a goal every 45 minutes. In 2010/11 they upped the rate of chance creation and partly maintained the quality in a level scoreboard state and upped creation even more, but at the cost of chance quality when behind.

In the absence of goals, we can still show the efforts, sometimes fruitless, made by Arsenal in losing or frustratingly stalemated situations. During the 79 minutes they trailed to Villa in their final home game of the season, Arsenal fired in enough goal attempts of varying quality to have scored at a long term rate of a goal every 30 minutes and their game long potential goal expectancy over the full 90 minutes was an equally urgent goal every 35 minutes. But the randomness of conversion rates saw them merely register an 89th minute consolation, despite their numerous efforts. They lost on the day, but through random variation rather than lack of trying.
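How unlucky was that? A rough feel can be had by treating goals as Poisson distributed, which is my assumption for this sketch rather than something taken from the shot model itself. A goal every 30 minutes over 79 minutes is an expectation of about 2.6 goals, and the Python below asks how often such a side would manage one goal or fewer.

```python
import math


def prob_at_most(goals, expected_goals):
    """P(X <= goals) when X is Poisson distributed with mean expected_goals."""
    return sum(
        math.exp(-expected_goals) * expected_goals ** k / math.factorial(k)
        for k in range(goals + 1)
    )


# Figures from the text: 79 minutes trailing at a rate of a goal every 30 minutes.
expected_while_trailing = 79 / 30
print(f"P(one goal or fewer) = {prob_at_most(1, expected_while_trailing):.2f}")
```

Under that assumption a return of one goal or fewer turns up roughly one time in four, so the result was unkind to Arsenal without being freakish.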

Above I've plotted the overall, theoretical scoring rate suggested by all the chances created by Arsenal in each 2010/11 match against the average of the game state they encountered on the day. In matches where they were consistently chasing their hoped for outcome, they were able to up their attacking output, producing chances that would yield almost a goal every 20 minutes in their home loss to WBA and these bouts of increased effort often remain in the overall game figures. But, as with the Villa game, short term randomness again beat them.

At the opposite end of the plot, they were unable or unwilling to continue to take the fight to United and Chelsea when beating both. Defence was probably more of a priority once the lead had been secured. An early goal from the game's best chance against Wolves allowed Arsenal to dictate much of the game at Molineux. More goals would have been welcome, but weren't essential and a second goal only arrived in injury time as Wolves pressed forward.

It appears that all sides eventually tailor their attacking intent to suit the current score, their own pre game expectations, the quality of the opposition and time remaining and if random distribution of your innate talent is kind enough to gift you a three goal lead at home to Chelsea, there is little need to try to run up the score at the risk of opening up the game. Scoring further goals no longer remains a high priority.

Single game scoring efficiency is a heady mix of match day randomness that infrequently yields significant, talent driven events and the relative abilities of the contestants. High quality chances often fail to result in a goal, whereas poorer quality ones sometimes do, and these partly random outcomes often help to frame how the remainder of the match is played out.

For more interesting work on this essential context driven subject check out Paul's recent post at differentgame.

Friday, 10 May 2013

Strikers Are Better Finishers Than Defenders.

Shooting ability is one of the most coveted of footballing skills and the prices paid for the likes of van Persie and the speculative sums quoted in relation to any potential sale of Gareth Bale confirm this. It has become increasingly possible to quantify shooting talent, first by collecting large amounts of goal attempts and assuming that equality and quality of chances are evened out over such large sets. We can further sub divide these numbers into set piece, open play or headed attempts and finally by using x, y coordinates we can try to ensure that players who consistently shoot from advantageous ranges don't see their numbers artificially inflated. Although we shouldn't overlook the fact that the ability to continually shoot from closer range is probably a talent as well.

Therefore, x and y data analysis offers a promising way to begin to evaluate the relative merits of individual players, although the influence of short term random fluctuation should be appreciated and additional inputs, such as the position of opponents, would also prove helpful.

An individual player who scores more frequently than expected given the position from which he shoots can be considered to have had a successful season, although consistent over performance over a longer period would be needed to confirm a repeatable excellence. However, we can take a more general look at a larger data set by dividing players simply by position to try to confirm that some groups of players are more proficient shooters than others.

Strikers are primarily purchased to score goals, whereas defenders require different talents and so it is reasonable to assume that the former, overall have better shooting capabilities than the latter. If we can confirm a difference in shooting ability between different positional groups, it is also likely that a spread of talent within these separate groups also exists.

 Average Likelihood Of Players Scoring With A Central Shot From 14 Yards.

Player Position. Chance Of Scoring With Shot From 14 Yards. Chance Of Shot Being On Target from 14 Yards.
Defender. 15% 31%
Midfielder. 22% 44%
Striker. 25% 47%

Average Likelihood Of Players Scoring With A Central Shot From 25 Yards.

Player Position. Chance Of Scoring With Shot From 25 Yards. Chance Of Shot Being On Target from 25 Yards.
Defender. 5% 21%
Midfielder. 8% 31%
Striker. 10% 34%

The conversion rates above are derived from regressing x, y coordinated shooting attempts across most of the teams that have recently played Premiership football. All headers have been excluded along with obviously higher value opportunities, such as penalties. By including a term to represent the designated playing position of all players, I have calculated the drop off in observed conversion rates as we go from strikers, through midfielders to defenders at varying distances from the goal and maintaining a central position. Strikers, in general, are almost twice as effective when shooting from just inside the box as are defenders and this repeats at much longer distances.
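One way to picture what a positional term does is to put the tabled scoring chances on the log-odds scale, where a logistic regression works. The Python sketch below is a back of the envelope reading of the two tables, not the regression itself: the assumption that the fit was logistic is mine, and the function names are made up for the example.

```python
import math

# Scoring probabilities for central shots, copied from the two tables above.
SCORING = {
    "Defender": {14: 0.15, 25: 0.05},
    "Midfielder": {14: 0.22, 25: 0.08},
    "Striker": {14: 0.25, 25: 0.10},
}


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def position_offset(position, distance, baseline="Striker"):
    """Gap in log-odds between a position and the baseline at a given distance."""
    return logit(SCORING[position][distance]) - logit(SCORING[baseline][distance])


def implied_slope(position):
    """Change in log-odds per extra yard between the 14 and 25 yard rows."""
    rates = SCORING[position]
    return (logit(rates[25]) - logit(rates[14])) / (25 - 14)


for position in SCORING:
    print(position, round(implied_slope(position), 3),
          round(position_offset(position, 14), 3))
```

The per yard slopes come out close to one another across the three groups, which is what you would expect if distance enters the model once and playing position mainly shifts the whole curve up or down.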

The divide in shooting talent appears clear between groups of players, although defenders may be shooting in situations immediately following a corner, where the defence is more concentrated and as we begin to add terms for defensive pressure to such models, these effects will hopefully become better defined. Conversely, some defenders may not even get the opportunity to impress or otherwise with their shooting prowess. Defenders account for almost 40% of playing time, but contribute just 15% of shots, compared to a near reversal of numbers for strikers. Therefore some of the poorest shooting defenders may be almost completely absent from the sample.

Heading ability is one area where defenders may be expected to close the scoring gap on those more usually charged with the task of scoring goals. However, again strikers and midfielders appear to be both more efficient and more accurate with their efforts than the average defender, although again defenders who do venture forward are more likely to do so at set pieces, where they are bound to attract attention from the best markers.

Average Likelihood Of Players Scoring With A Central Header From 7 Yards.

Player Position. Chance Of Scoring With A Header From 7 Yards. Chance Of Header Being On Target from 7 Yards.
Defender. 20% 28%
Striker/Midfielder. 27% 38%

Methods that confirm widely held beliefs are as important as ones that challenge the status quo, and that the best finishers tend to be strikers and midfielders shouldn't surprise given the premium that such skills attract in the market. A lethal finisher, especially from open play, is much more valuable when playing in a position where he can showcase his rare talent and while the transition from defender to striker is almost certainly a leap too far, the conversion rates of defenders and midfielders are much closer.

Midfielders account for just over 40% of total playing time and account for nearly 50% of a side's shots. So often from a purely shot conversion viewpoint it is simply access to opportunity that distinguishes the number of goals scored by a similarly talented defender compared to a midfielder of near identical shooting abilities. Bale's transition from fullback to goal scoring midfielder may have been partly written in his early stats.
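The access to opportunity point can be put into rough numbers. The Python sketch below rounds the shares quoted in the text ("almost 40%" of playing time and "just 15%" of shots for defenders, "just over 40%" and "nearly 50%" for midfielders), so treat the output as indicative only. The index name is my own invention.

```python
# Rounded shares taken from the text; the exact figures are not given.
PLAYING_TIME_SHARE = {"Defender": 0.40, "Midfielder": 0.40}
SHOT_SHARE = {"Defender": 0.15, "Midfielder": 0.50}


def shot_opportunity_index(position):
    """Share of shots divided by share of playing time (1.0 = a proportionate share)."""
    return SHOT_SHARE[position] / PLAYING_TIME_SHARE[position]


defender = shot_opportunity_index("Defender")
midfielder = shot_opportunity_index("Midfielder")
print(f"Defender {defender:.2f}, Midfielder {midfielder:.2f}, "
      f"ratio {midfielder / defender:.1f}")
```

On those rounded shares a midfielder gets something like three times as many shots per minute played as a defender, before any difference in finishing ability is considered.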

Sunday, 5 May 2013

The 2012/13 Championship Table.

Following on from this post describing how a Premiership table would look if all teams were of equal quality, so that any game would be decided by randomly selecting from the historical home/away and draw probabilities, here are a couple of graphics for the 46 game Championship.

Points Distribution For The Last 20 Championship Seasons.
And for a luck driven table....

Points Distribution For A Championship Comprising 24 Teams Of Equal Quality.

This season's Championship had perennial also-rans Cardiff as champions with 87 points and Bristol City relegated with 41 points. The relegated Premiership clubs, who usually account for a fat right hand tail in historical plots, were scattered to the three winds (in the case of Wolves, all the way to Division One). And there was a peak at around 60 points.

2012/13 seems a much better fit for the latter plot than the former. When true abilities are very closely matched, luck plays a huge role in the final outcome. Scant comfort for the Old Gold shirts.

Thursday, 2 May 2013

The EPL.....All Things Being Equal.

The numbers that define the levels of performance posted by a team or individual over a period of time will be partly a function of their innate talent, but also partly dependent upon random variation around that true talent. The tendency is for exceptional performances, be they good or bad, to arise over limited time scales because of both talent and non repeatable bouts of randomness. Only by gathering copious amounts of data will true ability shine through the randomness, by which time we will probably have to also control for ageing effects. Our overall assessment of an individual's actual quality, therefore, must always allow for the existence of potentially distorting amounts of random variation.

Even when success rates are known with absolute accuracy, random variation will make some coins appear more "talented" than others and in this post I looked at how easy it is to see, and overpay for, noise instead of signal. Correcting for noise is possible when looking at player actions which have just two outcomes, a successful one or a failure. Save or shooting percentage are good examples of this. But as we move onto the contribution of randomness in team results, we have to allow for a third significant outcome in football, namely the draw, the most random of the three possible match outcomes.
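For two outcome events the size of the noise is easy to sketch. The Python below uses the standard binomial result for the spread of an observed success rate; the 10% conversion rate and the 50 shot sample are hypothetical figures chosen only to show the scale involved.

```python
import math


def conversion_rate_standard_error(true_rate, attempts):
    """Standard deviation of an observed success rate over a number of attempts."""
    return math.sqrt(true_rate * (1.0 - true_rate) / attempts)


# Hypothetical player: a true 10% finisher judged on 50 shots.
spread = conversion_rate_standard_error(0.10, 50)
print(f"Observed rate typically within about {spread:.1%} of the true 10%")
```

With a spread of around four percentage points, a true 10% finisher can quite easily look like a 6% or a 14% one over a sample that small, and because the spread shrinks only with the square root of the attempts it takes four times as many shots to halve it.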

The simplest way to outline the effects of random variation and team talent on league outcomes is to compare the distribution of points gained over a season in a league where skill is an undoubted factor against that of a league decided by pure luck, where the talent levels are identical for each team. It isn't quite a coin toss league (because there are three outcomes for each game), but the concept is the same.

I therefore simulated 380 game seasons for a 20 team league, where each team was the equal of the other 19 and home advantage was a fairly typical three or four tenths of a goal. To illustrate, rather than quantify the effect of the actual lop sided division of talent we see in the real Premiership I then compared the characteristics of the 100% Luck league to those of the actual EPL.
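A simulation along these lines can be sketched in a few lines of Python. This is one possible way of doing it rather than a record of the exact method used here: goals are drawn from Poisson distributions, and the scoring means of 1.5 for the home side and 1.15 for the visitors are hypothetical values chosen only so that the gap falls inside the three or four tenths of a goal mentioned above.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw a Poisson distributed goal count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_season(rng, n_teams=20, home_mean=1.5, away_mean=1.15):
    """Play a double round robin between identical teams and return points per team."""
    points = [0] * n_teams
    for home in range(n_teams):
        for away in range(n_teams):
            if home == away:
                continue
            home_goals = poisson_sample(rng, home_mean)
            away_goals = poisson_sample(rng, away_mean)
            if home_goals > away_goals:
                points[home] += 3
            elif home_goals < away_goals:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    return points


season = simulate_season(random.Random(2013))
print(sorted(season, reverse=True))
```

Repeating the call many times and pooling the points totals gives the luck only distribution against which the real league can be compared.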

The next plot shows the frequency of actual points totals accrued by Premiership teams over the 38 game lifetime of the league. Any points deductions, such as those imposed on Portsmouth, have been reinstated.

The scales have been maintained on each plot, so that the flatter, fatter tailed distribution for real life events can be directly compared to the entirely luck driven, theoretical scenario. To get a clearer picture of the differences involved, the final plot superimposes one plot on the other.

The two distributions have clear differences and the actual points totals achieved in the EPL show many more outstanding seasons being recorded by the big four teams than would have resulted by pure chance. The presence of a fatter tail in reality appears to indicate the clear influence of talent on the season long results recorded in the Premiership.

A feel for the differences can be illustrated if we compare 20 seasons of luck driven Premiership seasons with a similar number of real contests. Only a handful of teams have won the EPL title in reality, with Manchester United dominating, but in the extreme parity version hardly any team hadn't won the title. The most "lucky" titles won were three, compared to United's nine in three fewer attempts (well done QPR). Even Stoke lifted the crown as often as Manchester United (twice) in this egalitarian, altered reality. Equally relegation came almost without favour to all.

Points totals probably hold the most interest. The highest points scoring champions are Chelsea with 95 points compared to just 76 in a luck driven contest and champions overall gain an average of 86 points in reality, 20 points more than dictated in an even fight. At the other extreme, Derby's sterling effort in gaining just 11 points is a full 27 points below the average points total for teams finishing bottom of the "luck"