
Sunday 28 July 2013

Predicting the Rare and Significant from the Incidental and Commonplace.

The transfer window remains open and the net is awash with statistical comparisons between present incumbents and perceived replacements, but with a few notable exceptions, much of this fevered speculation fails to address the problems of sample size in the data used. What you see, especially over a single season, is what you got, not necessarily what you'll get. However, even when regression towards the mean is addressed, a much more fundamental issue often remains, namely does the statistic used actually bear any relationship to the ultimate, desired outcome? Do the numbers we are attaching to a player correlate, and hopefully have a causative link, with winning?

Simply because a number is recorded, it doesn't naturally follow that the stat is valuable in evaluating a player, especially in the absence of context, such as the playing style of the team where he has amassed those figures. For example, interceptions are probably a skill. But generally the more interceptions a team makes, the worse they do in the game.

More interceptions (leading to worse results) should hardly be a great selling point for a player. It would be much better to try to identify how situations develop that require an interception to be made and purchase a player who might prevent those situations from arising in the first place. That might be the route more successful, but less interception prone teams have taken.

The great interceptor may look good on paper, but you may be purchasing the equivalent of a signing who is really good at picking the ball out of his own net. Once you require his particular skills, the team has already dug itself into a substantial hole.

So it makes more sense to concentrate on numbers that at least positively correlate to success, rather than throwing everything at the analysis. The choices of these potential killer stats are readily available. For a small monthly fee, a few sites provide reams of individual player stats, sorted on a game by game basis and free, but less exhaustive data can also easily be found. Added context, such as pitch location divided by thirds isn't ideal, but it does exist.

One fairly obscure recorded stat is possession won in the attacking third, essentially a turnover without the opposition threatening your goal. In the NFL this type of turnover is gold dust and a positive differential in this statistic during a single game makes victory extremely likely. Such powerful events are relatively rare in gridiron and so over a mere 16 game regular season, we are unlikely to be able to readily distinguish the lucky from the good.

This particular problem can be eased by looking for a skill that is likely to be connected to this rarer, but valuable one, yet occurs in much larger quantities. Poorer pass completion rates for quarterbacks correlate well with the undesirable "talent" of throwing a less common, drive ending interception and, on the defensive side of the ball, the ability to defend a pass is a good indicator of the rarer event of catching an interception. We'll return to this later.

The Premiership equivalent of winning possession deep in opponent's territory appears to share both the rarity and partly, the worth of the NFL counterpart. To take Everton over the previous two seasons as an example, the correlation between final 3rd turnover differential and a positive match day outcome is significant and strong, despite the relative rarity of the event.

Over the last two seasons, Everton has averaged 2.6 such events per game and allowed their opponents to gain an average of 1.5 turnovers in the Toffees defensive 3rd. Above I've plotted the line of best fit that relates match day final 3rd possession turnover differential to outcome. The red bar represents the likelihood that Everton won the game, whilst the green bar incorporates the draw as well.

Everton bridge the gap between Champions League regulars and the remainder of the Premiership, so they are good enough to overcome a final 3rd possession winning differential of minus 4 and still be more likely than not to take something from the match. Once their opponents enjoy a positive differential of around eight or more, the Toffees' chance of a win drops below 10%.

By contrast, tussles for possession in the middle third are much more numerous, but this particular differential in possession turnover shows virtually no correlation with match result. Despite the added opportunities for the better teams to demonstrate their skill and drive a gap in the respective successes posted by themselves and their opponents, it is rarer events, slightly closer to the defended goal, that appear to determine where the spoils are most likely to go.

Post game knowledge of the differential between each side's possession steals in the middle third gives you absolutely no extra information as to the likely winner of a match involving Everton since August 2011. Simply picking the generic win % is likely to be as close to predicting the outcome of a group of otherwise anonymous matches as trying to glean information from the middle 3rd possession differentials from each game.

So we have more numerous, but apparently non predictive, middle 3rd possession interchanges and rare, but indicative final third events. Therefore, recruiting a player with final 3rd capabilities would appear to be desirable because of the strong correlation of that stat to a positive match outcome. Unfortunately, few players reach double figures for successfully winning possession in the final 3rd, even over two seasons of regular playing time.

To solve this inconvenience in trying to accurately establish players of final 3rd potential, we need to briefly return to the NFL. Interceptions, primarily by cornerbacks and safeties, are rare, but defended passes, whereby the ball is knocked away from the intended receiver, much less so. The skillsets needed to perform both tasks are very similar: good hand-eye coordination and speed, coupled with athleticism. So it shouldn't be surprising that defended passes are strongly correlated to full blooded interceptions.

And the same is true of a soccer player winning the ball in the middle and final third. The skills required are virtually identical and the two stats are extremely strongly correlated with each other. Therefore, instead of trying to work with extremely small, often single figure sample sizes from the final 3rd, we can use the strongly related, middle third possession figures as a proxy for the trait that appears to drive match results, but turns up barely twice a match for a team of up to 13 outfielders.

How Winning Final 3rd Possession Relates to Winning Middle 3rd Possession, Everton 2011-13.

Playing Position. Number of Successful Final 3rd Events per Middle 3rd Events.
Defenders. 5 per 100.
Midfielders. 10 per 100.
Strikers. 10 per 30.

Now instead of drawing information from events that in the case of Everton occur only five times every two matches, we are in a position to take a player's seasonal statistics from correlated events that happen over 20 times a match for each team. Winning possession in the middle 3rd may not correlate to winning, but it is tied intimately to the ability to win possession in the final 3rd..... and that stat does correlate to match success.

Choosing between two players with similar playing time, who have both regained possession just three times in the final third, becomes slightly easier if we also know that one has recorded 60 similar actions in the middle third compared to just 20 for the second player. The 60/20 middle third split gives the former a potential edge over the latter that is absent in the tied single figure sample of the stat that has really attracted our scouting interest.
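As a rough illustration of how the proxy might be used, the short Python sketch below scales a player's middle 3rd possession wins by the positional rates from the Everton table above. The function name is made up for the example, and treating both of the hypothetical players as midfielders is an assumption; the 60 and 20 counts are the ones quoted in the text.

```python
# Positional conversion rates quoted in the table above (Everton, 2011-13):
# final 3rd possession wins per middle 3rd possession wins.
FINAL_PER_MIDDLE = {
    "defender": (5, 100),
    "midfielder": (10, 100),
    "striker": (10, 30),
}


def expected_final_third_wins(position: str, middle_third_wins: int) -> float:
    """Scale a player's middle 3rd wins by his position's conversion rate."""
    final_events, middle_events = FINAL_PER_MIDDLE[position]
    return middle_third_wins * final_events / middle_events


# Hypothetical pair from the text: both have 3 final 3rd wins, but 60 vs 20
# middle 3rd wins. Treating both as midfielders is an assumption.
for name, middle in (("Player A", 60), ("Player B", 20)):
    print(name, expected_final_third_wins("midfielder", middle))
```

On these rates the first player would be expected to turn up around six final 3rd wins and the second around two, a gap that the tied count of three apiece completely hides.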

Whether or not Everton's final 3rd possession winning stats are an integral part of just their match winning makeup, a spurious correlation or a strong Premiership wide theme, isn't the most important part of this post. Sometimes if you need to estimate a rare, yet valuable occurrence, it is better to look elsewhere at a more mundane, apparently unimportant, but highly correlated and much more commonplace alternative. That way you don't have to wait ten years until a potential transfer target bulks out his CV with rare and valuable actions.

Friday 26 July 2013

Unpeeling The Shooting Onion.

The quality of a foot shot in football is decided by a variety of parameters. This list, not necessarily an exhaustive one, includes shot location, the power of the attempt, the placement of the shot and the defensive pressure on the shooter. Most of these listed can, at least in some general way be recorded and quantified to allow further analysis. However, other possibly significant factors, such as pitch condition, player fatigue and quality of the previous pass are less easy to tabulate.

It is easy to overestimate the amount of control a player has over the execution of a shot. Variables, such as defensive pressure present an obvious and changing irritant for the shooter, but constants within the context of a single shot, such as shot distance may also apply some influence on the type and hence the quality of the attempt.

For example, distance shooting often requires a more powerfully struck shot, not only to make blocks by outfielders less likely, but also to get the ball to its target quickly enough to prevent the keeper from correctly positioning himself. But the requirement for greater power may relieve the player of some control of other aspects of the shot, such as placement.

Given sufficient time a keeper's movement and positioning allows him or her to cover the entire goal. But shots taken closer to goal may be hit with less power, yet arrive quickly enough at an area of the goal that the keeper is unable to reach in the time available. Therefore, distance shots may require power to hurry them on their way and closer efforts may rely more on guile and craft to consistently direct the ball to its intended target.

I've plotted the correlations between these more commonly charted shot characteristics and various shot outcomes. We may then try to build up a clearer picture of how they are likely to relate to outcomes at the sharp end of the effort and to allow speculation as to whether the shooter is controlling the relationship or if the initial inputs are at least partly hard baking the outcome.





Shot strength is broadly divided into powerfully, normally and weakly hit shots and the three types show a gradual, significant trend as we get closer to goal. The perpendicular distance from the goal in yards is used to denote the x,y coordinate of each attempt.

Most shots are hit with normal power (obviously, from the name alone), but the percentage increases as we near the goal. The decline as we enter the box is keenly felt in the proportion of strongly hit shots. Over a quarter of shots are given a real thump from 30 yards, but less than 5% from the edge of the six yard box. Are we seeing the data display the need to get the ball to the target quicker being proportionally more important from distance? Closer in, placement appears to become more important and presumably more profitable than raw power.

The interesting category is weakly hit shots, surely never an advantage for the ball striker. The steady increase in the proportion of weak shots as we approach the goal is possibly due to the higher intensity of defensive pressure or the eagerness to get a shot away from close range regardless of quality. Miscues will always happen, but something appears to be encouraging them in the six yard box.


The next graph is essentially a "high or low" plot for shot placement dependent on the strength of the shot. Shots from distance that are on target tend to be more likely to hit the goal plane vertically above the half way point. Possibly another manifestation of distance shooting being more powerful and tending to go high compared to placement orientated closer efforts.

If we imagine a goal kick as an extreme variation of the long distance shot, it is hit with power (to prevent opponents blocking the effort) and it is hit aerially (again to reduce blocking, but also to carry the required distance, maximum range being achieved with a trajectory of 45 degrees to the horizontal). By contrast a pass or a short goal kick is hit with just enough power and more placement to avoid an opponent, but safely reach its destination.

Long distance shots may be the goalkicks of the shooting world and shorter ranged shots are the passes.

We can further refine the shooting tendencies by looking at the preferred destination for on target efforts in regards to corners of the net, either top or bottom, again sorted by shot distance. Shots from distance are nearly twice as likely to arrow towards the top corner than the bottom. Is this a controlled effort by the striker to try to hit the most difficult area for the keeper to reach or merely the natural outcome of the relative inability to keep powerfully struck, long range efforts from flying to an elevated height?





The final graph looks at the sacrifice in accuracy you give up by hitting a more powerful shot compared to a normal one. Powerful shots are more likely to miss or be blocked than weaker, or normally hit efforts.

To summarize, from distance (30 yards), powerfully struck shots are at their peak, they take the aerial route more often, hit the top rather than the bottom corners of the goal, but lack the "on target" accuracy of normally struck efforts. A player can certainly choose how hard he hits a shot, but that the attempts then more frequently take to the air may be at least partly down to the difficulty in controlling the flight of a powerful shot, rather than conscious aim taking.

Once we home in on the penalty box, the areas of the goal that a keeper cannot cover adequately in the timescale of a normal shot rapidly increase. Hitting top corners or going for power is no longer a priority if "passing" the ball into an area of the goal that a keeper's angles have left exposed carries a higher likelihood of success.

Shooting appears to be a constant interplay of trade offs between power, placement and a multitude of other variables, with distance partly dictating the required approach and then possibly having a say in the most likely end point of the effort. It is easier to manufacture a shot from distance, but the skillset and rewards may differ from those needed from closer to goal and it is easy to speculate that the spread of talent for players taking shots may well differ at different distances from goal. Are there more Bales than goal poaching Linekers in the general striking population?

The introduction of more variables certainly widens the scope for evaluating shooters. Ultimately though, at the moment, we don't even really know how much of the process is fully in control of the feet of the shooter.


Wednesday 24 July 2013

Robin van Persie Really Isn't Afraid to Miss.

A year ago I wrote a couple of posts about the shooting characteristics of Robin van Persie under the general heading of "strikers shouldn't be afraid to miss". The data was from 2010/11, so the Dutchman was still an Arsenal legend, rather than a dangerous opponent from a championship seeking rival who makes a once a season visit to the Emirates stadium.

To summarize the post, when comparing the chances of a van Persie goal effort resulting in a score against location corrected efforts for the other main Arsenal shooters, the Dutchman was clearly superior. However, in terms of accuracy, it was his group of colleagues that were more likely to hit the spot and no matter where the shot originated from, van Persie was slightly more likely to miss the target completely.

I speculated that the rest of Arsenal's strikeforce was taking the mantra of "make the goalie work" too rigidly, taking fewer risks and earning more guaranteed applause by failing to shoot towards the less accessible areas of the goal for fear of missing. In short, van Persie's shot placement was more adventurous, more prone to missing the target completely, but ultimately more rewarding in terms of goals scored, as a result of his efforts struck towards the periphery of the goal generally being more difficult to save.

I intended to compare van Persie's shot placement to see if he was indeed working with finer margins, but instead the post became a general look at shot placement as well as shot power on the regularly seen location based shooting models. That post turned up last month as Power and Placement, so now is a good time to revisit the original van Persie venture.

Placement is an obvious factor in determining the outcome of a shot, although the effect of shot power is likely to have a much greater influence on the final result. From data used in the previous link, shots hit from identical pitch positions, but aimed closer to the frame of the goal, increase the likelihood of scoring. But when going from a weak to a more normally paced shot from the same spot, the improvement in expectation is generally many times greater. Shot pace is likely to be a major factor in calculating goal expectancy, but it just isn't generally available, so we will have to include placement alone, largely ignorant of the potential effect of shot power.

Although both keepers and strikers have a "natural" side, it's unlikely, given the numerous body shapes a player has to adapt to get a shot away that one top corner is very much different from the other. Therefore, to see how van Persie's shot placement may have enhanced his performance compared to a shot location based model alone I summed his total top corner, bottom corner and central attempts and saw how they compared in terms of frequency to those of similar contemporaries, namely the top scorers from all the other Premiership sides from 2008 onwards and his fellow Arsenal strikers over the same time scale. This approach also boosts sample size.

Percentage of Shots Hitting Particular Areas of the Goal, 2008/13.

Player/Team. Top Corners. Lower Corners. Top Central. Lower Central.
R. van Persie. 15% 54% 8% 23%
Arsenal exc. RvP. 12% 50% 7% 31%
League Ave. Top Scorers. 13% 50% 8% 29%

The impressions from the original post appear to be confirmed by more granular data. Robin van Persie does appear to find the top and bottom corners proportionally more often than his now former goalscoring teammates, as well as beating the pooled average of the top scorer from each club that has been a permanent member of the Premiership since 2008. Only Villa's cumulative pool of leading scorers has bested van Persie's ability to hit the corners over the last five completed seasons. For those fond of anecdotal evidence, the Dutch striker has also hit the woodwork 17 times in the last two seasons, perhaps indicating that he is likely to be shooting with an extremely narrow margin between probable glory and individual shooting failure.

The premise of the original post appears sound: by daring to miss, RvP seems to be boosting his scoring rate at the expense of his accuracy.

van Persie outperforms a simple model based around just shot location. If we then incorporate shot placement into an updated version and van Persie (or any other player) consistently outperforms that version, we need to entertain the idea of including other variables, such as shot power. And then onto defensive pressure or player movement and quality of assist. Models always fail to include every possible factor, but as long as we know what is included, they remain valuable. What's absent is the likely cause of discrepancies and this can lead to fruitful, further investigation.

Perfecting shooting models is an onion that's going to take some peeling.

Tuesday 23 July 2013

Home Specialists Are Largely Random.

One of the universally accepted truths about professional team sports is the existence of a home field advantage or the ability of sides to produce consistently better results in the familiar comforts of their own stadium compared to those on their travels. The causes of hfa are likely to be many and varied and a few have been examined in this blog. The sport with the greatest scope to match a causative agent to the observed effect is the NFL, where pitch dimension and uniform regulations are among the few constants on any given Sunday (or Monday or Thursday...)

Artificial turf dwellers, as well as domed teams, can draft track athletes as wide receivers and quarterbacks with booming arms to take full advantage of their benign home conditions, but may find the harsher realities of a mid December trip to Lambeau Field, Wisconsin less productive. Natural advantages, such as the thinning, stamina sapping, mile high air in Denver, have traditionally made the Broncos a harder nut to crack at home compared to when they are the visitors, once the usual six point swing from switching venues is allowed for. Since 1989, Denver has enjoyed hfa of around 4.5 points, compared to a league average of 2.8 over the same period.

Green Bay, the tenants of Lambeau, have, since 2009 done even better than the Broncos and posted a hfa of over 5 points a game in an era where the average league figure is in gradual decline. The stadium at Lambeau may never host a Super Bowl, but their harsh climate can guarantee a better than average home performance. Or maybe not.

If we instead start counting in 1989, when presumably Wisconsin winters were just as harsh and hfa across the years was 2.8 points, the Packers won their home matches by an average of 7.2 points and their road trips by 1.2 points. So over the long haul the Pack's hfa is around 3.2 points. Still better than league average, but well short of the near unconverted touchdown seen in recent years.

So the question now becomes: is Green Bay's recent, elevated hfa simply a random, small sample fluctuation from the long term league mean, to which their results from 1989 onwards have more closely adhered, or can their fans risk frost bite along with their sport, safe in the knowledge that the harsh weather is disproportionately helping their side and confounding their opponents? If it's the latter, it is only for eight matches a year of sometimes sub zero football, discounting the playoffs.

Simulating an NFL game isn't easy, they really should have awarded one point for a touchdown and scrapped the rest of the scoring events, so for the remainder of this hfa post I'll return to football or soccer, to make the distinction clearer.

Expressing a side's talent differential compared to their opponents can usefully be done in terms of the average number of goals they might concede and score in a contest. A fairly typical match between two equally matched sides would see the hosts scoring around 1.5 goals and the visitors about 1. From here it is a short, but tedious step to simulate all manner of match outcomes. The final score and, more usefully, the margin of victory or defeat is most helpful in determining the range of hfa that might be seen over a simulated 38 match EPL season once random chance intervenes in matches with a perfectly normal hfa.


In the plot above, I've simulated 10,000 38 game seasons for an average Premiership side and I've assumed that they enjoy a typical home field advantage of 4 tenths of a goal in each home game they played compared to each of their 19 away matches. HFA is typically expressed as comparisons between success rates (games won + games drawn/2 as a percentage of total matches played) at home and on the road or by venue specific goal differences. I've chosen to use goal difference and have plotted seasonal averages from the simulation grouped in tenths of a goal per game over the 38 game campaign.

The most common average HFA value seen over a season in the simulation is, not surprisingly, centered around 4 tenths of a goal. But the variation in recorded values for a side with the usual level of HFA baked into the figures is perhaps surprising. A significant minority of the time, a 0.4 HFA side can, over the short course of 19 home and 19 away matches, record a "negative" HFA; in other words, they produce better results as visitors than they do as hosts.

The plot also contains similarly rare, but significant occasions where their home results are vastly superior to their away ones. These particular seasons are nailed on to be described as home specialists, yet the outcomes are merely the relatively unlikely product of a normal HFA season being replayed enough times for the right hand tail of the plot to show itself.
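For anyone wanting to replay this kind of experiment, here is a minimal Python sketch. It is not the code behind the plot. It assumes goals are Poisson distributed, that the average side scores 1.45 and concedes 1.05 goals per home game with the two rates swapped away from home (numbers chosen here simply to give a true HFA of 0.4), and that seasonal HFA is measured as half the gap between home and away goal difference per game. The function names and the random seed are illustrative.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def season_hfa(rng, scored=1.45, conceded=1.05, games_per_venue=19):
    """Return one season's measured HFA in goals per game.

    At home the side scores `scored` and concedes `conceded` on average;
    away the two rates swap, so the true HFA is scored - conceded (0.4 here).
    These scoring rates are assumptions, not figures from the post.
    """
    home_gd = sum(poisson(rng, scored) - poisson(rng, conceded)
                  for _ in range(games_per_venue))
    away_gd = sum(poisson(rng, conceded) - poisson(rng, scored)
                  for _ in range(games_per_venue))
    # Halve the home-minus-away gap, since each venue contributes one HFA.
    return (home_gd - away_gd) / (2 * games_per_venue)


def simulate(seasons=10_000, seed=2013):
    """Simulate many 38 game seasons for a side with a perfectly normal HFA."""
    rng = random.Random(seed)
    return [season_hfa(rng) for _ in range(seasons)]


results = simulate()
mean_hfa = sum(results) / len(results)
negative_share = sum(hfa < 0 for hfa in results) / len(results)
print(f"mean HFA {mean_hfa:.2f}, seasons with negative HFA {negative_share:.1%}")
```

Grouping `results` into tenths of a goal gives a histogram of the same general shape as the one described, with the occasional "away specialist" in the left hand tail and the "home specialist" in the right.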

Finding real life doppelgangers of these rare, but fully expected deviations from a side's true HFA isn't difficult. Twelve seasons ago, West Ham won 75% of their total points at home, compared to a league average of just over 60% and slightly further back in May 1998 Crystal Palace bowed out of the Premiership having collected just 33% of their meagre total of points at Selhurst Park. And every year numerous sides delight their travelling fans, but then regularly frustrate their "stay at home" majority the following weekend.

Neither WHU nor Palace continued the trends of their anomalous seasons in the subsequent campaign. West Ham fell to 53% of points gained at home and Palace went on a neat WHU like home spree in gaining over 70%. As we added more seasons, the long term average HFA for each side trended towards more usual levels.

A team can record levels of performance that appear far removed from their underlying talent in small sample sizes and in this regard, HFA is no different to other documented stats. Regression towards the mean, rather than external forces, is invariably the greatest influence on season to season HFA correlation and if you look often enough, a team with an underlying normal HFA is going to turn in a "home specialist" looking 38 game performance.

It is unlikely to stand the test of time.

Sunday 21 July 2013

Skill and Luck in Cricket.

The cricket season is now in full swing, even if the Ashes is virtually a done deal, so I thought I'd trawl the web and crunch a few cricketing numbers. As a sport, cricket is much more akin to baseball than it is to an invasive team sport such as football. Therefore, whereas team statistics is the more rewarding starting point for the latter, cricket lends itself more to individual player analysis.

Evaluating player talent in football is complicated not only by the support given by teammates to statistical headline grabbers, such as goal scorers, but also by the lack of a rigid playing format, which means that a soccer player can shoot from any position on the pitch, whereas a bowler is restricted to repeatedly hurling the ball down a consistent length of pitch. Therefore a bowler's figures are the product of a controlled, repeatable trial, but a goal scorer may be benefiting from taking his shots from varying field positions.

Cricket is about much more than simply taking wickets. Batsmen finding scoring difficult at one end of the pitch may be tempted to take more risks at the other and present wickets to the lesser bowler in the process. However, it would be hard to argue that the best bowlers aren't also prolific wicket takers. So I'm going to dip my toe into cricket stats and look at wicket taking rates for test bowlers and particularly the likely spread in talent across this stat.

The method follows a familiar pattern, the average wicket taking rate for a group of bowlers is taken and then the spread of this rate within the group is compared to that expected from random chance. The more smeared out reality is from the hypothetical, entirely luck driven expectation, then the greater the spread of real, actual talent is likely to be within that group.
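The comparison can be sketched in a few lines of Python. This is only an outline of the general approach under stated assumptions, not the exact calculation used for the figures in this post: each ball is treated as an independent trial with the group's overall wicket rate, and whatever observed variance is left after removing the binomial "luck" variance is attributed to talent. The function name and the input format are made up for the example.

```python
def talent_spread(records):
    """Split the observed variance in wicket rates into luck and talent.

    records: list of (wickets, balls) pairs, one per bowler.
    Returns (observed_variance, luck_variance, talent_variance), all in
    units of (wickets per ball) squared.
    """
    total_wickets = sum(w for w, _ in records)
    total_balls = sum(b for _, b in records)
    group_rate = total_wickets / total_balls

    rates = [w / b for w, b in records]
    mean_rate = sum(rates) / len(rates)
    observed = sum((r - mean_rate) ** 2 for r in rates) / len(rates)

    # If every bowler shared the group rate, each ball would be a coin flip
    # with that probability, so a bowler's rate would vary by p(1-p)/balls.
    luck = sum(group_rate * (1 - group_rate) / b for _, b in records) / len(records)

    # Assumption: whatever variance luck cannot explain is credited to talent.
    return observed, luck, max(0.0, observed - luck)
```

Feeding in a list of career figures returns the three variances; the larger the talent share, the less a raw rate needs pulling back towards the group average.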

Perhaps unsurprisingly to anyone who has picked up a cricket bat, bowling, and specifically in this case the rate of wicket taking, is a talent. And a spread still exists at the highest echelons of the game. Such is the amount of cricket played in the modern era and the vast number of "trials" the top bowlers have taken part in, a top player's raw rate requires very little regression towards the mean to improve our estimation of his likely true ability.

Dale Steyn is the overall leader in the rate of wicket taking in test cricket with over 300 wickets from over 13,000 deliveries and very few test players change positions from their raw percentages, even when sample size is allowed for. Glenn McGrath has a marginally inferior strike rate to England's Darren Gough, but he manages to creep ahead into 9th overall once sample size and regression towards the mean are applied. A rare Aussie victory over England this summer.

Australia in happier cricketing times.
An average, elite test bowler takes a wicket every 60 balls and before you can begin to trust the evidence of your own eyes and believe that the performance you are seeing represents more skill than good or bad fortune, a bowler should be observed to bowl something in the region of 222 overs of test cricket. So a player who bowls poorly or well on debut hasn't been given anywhere near the opportunity to demonstrate his likely test career path.
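One common way of applying this kind of figure, sketched below in Python, is to pad a bowler's record with that many overs of average bowling before working out his strike rate. Treating the 222 overs as the padding amount is an assumption about how the number is used, and the function name is invented for the example; the 60 balls per wicket and 222 overs are the figures quoted above.

```python
BALLS_PER_OVER = 6


def regressed_balls_per_wicket(wickets, balls,
                               group_balls_per_wicket=60.0,
                               regression_overs=222):
    """Shrink a bowler's raw strike rate toward the group average.

    Adds `regression_overs` of average bowling to the bowler's record, so a
    bowler with exactly that many overs is weighted half on his own figures
    and half on the group's. This padding approach is an assumption.
    """
    padding_balls = regression_overs * BALLS_PER_OVER
    padding_wickets = padding_balls / group_balls_per_wicket
    return (balls + padding_balls) / (wickets + padding_wickets)


# Hypothetical debutant: 5 wickets from 20 overs looks like a world beater
# on raw figures (24 balls per wicket), far less so once regressed.
print(round(regressed_balls_per_wicket(5, 120), 1))
```

A bowler with thousands of deliveries behind him barely moves when the padding is added, which is why the top of the test list hardly changes. Passing 38 and 180 for the last two arguments would give a one day version based on the figures quoted for that format.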

If we repeat the exercise for one day international cricket, again we see a spread of wicket taking rates that differs significantly from a random spread based around the group mean, which indicates the presence of repeatable, human talent, although in this case slightly less so than in test cricket. Wicket taking on the part of bowlers and wicket retention for the batsmen are less important in these day long run chases, so it is understandable that less skill and more luck is involved in taking wickets in a late order slog.

In the curtailed form of cricket an elite bowler strikes on average once every 38 balls, so despite the greater element of chance, this overall enhanced strike rate means that a bowler will most likely demonstrate his talent over and above his good or bad fortune after about 180 overs. Still a substantial number of matches given the limits on the number of overs a bowler may bowl in a single one day international.

Regressed Bowling Strike Rates in Test and One Day Cricket.

Test Cricket. Player. Reg. Strike Rate (Balls/Wicket). One Dayers. Player. Reg. Strike Rate (Balls/Wicket).
D Steyn (SA). 42.3 B Lee (A). 30.0
Waqar Younis (P). 44.4 Waqar Younis (P). 31.0
M Marshall (WI). 47.5 Saqlain Mushtaq (P). 31.1
A Donald (SA). 47.8 A Donald (SA). 32.1
F S Trueman (E). 50.1 S Malinga (SL). 32.1
Sir R Hadlee (NZ). 51.3 Shoaib Akhtar (P). 32.1
J Garner (WI). 51.5 M Johnson (A). 32.3
M Holding (WI). 51.6 S Broad (E). 33.2
G McGrath (A). 52.2 M Ntini (SA). 33.2
D Gough (E). 52.3 A Agarkar (I). 33.4
D Lillee (A). 52.5 N Bracken (A). 33.8
J Thomson (A). 53.4 A Flintoff (E). 34.0


If we finally compare the talent required to effect certain dismissals across the two formats of cricket, test match play invariably allows for the talent to shine through. The top leg before wicket takers have the time and less pressure to prevent runs in test cricket to demonstrate their art fully. A list of the top lbw wicket takers in one day matches is more likely to contain a lucky player than one with a deadly, inswinging yorker. Likewise, when it comes to hitting the stumps or inducing a stumping, talent is more diffused than in the short game.

However, one category of dismissal appears to show a greater spread of talent in odi's than in test cricket, namely wickets by way of catches. The days of odi's being contested by the same team as turned out in the test arena are long gone. Boycott opening for England in a 60 over match is unthinkable nowadays. Specialist fielders/batsmen are now commonplace. The talent spread may be catching bowlers who were fortunate to play with excellent catching teammates. So ultimately individual cricket stats, as with football, also owe some level of input to the players surrounding you.


Saturday 20 July 2013

Crosses As A Retrieval Tactic.

As a continuation of the posts based around how a side's attacking tendencies change depending upon the current state of the match, here's a short post on crossing frequency. As in previous posts, game state is measured by referencing the initial expectation of the side going into the match against the reality of the ebb and flow of the contest. Therefore, an inferior side that loses by the only score in the final moments will have been more than happy with the state of the contest for the majority of the 90 minutes, if somewhat disgruntled by the final cruel twist. Their victorious opponents will have experienced roughly the opposite emotions and satisfaction.

Geoff Cameron acquaints himself with the art of crossing.
Open field crossing is often seen as a desperate, last refuge of the less skilled sides, who lack the required technique to unlock defences. The reluctance of such teams as Barcelona to throw the ball into the mixer, even in times of greatest need, is perhaps indicative of a lack of the correct type of players to exploit an often aerial route, but also of a belief that continuing with a ground based, passing approach provides superior long term rewards.

To see if Premiership sides in 2011/12 shared that strategic outlook or succumbed to the temptation of quick, easy, but low grade opportunities, I plotted each side's expectation corrected game state in individual home matches during 2011/12 against its crossing frequency.


A game state value of around 1 indicates that the match panned out close to pre game expectations. Blackburn may have won as narrow favourites, but they didn't romp away with the victory. The nearer we get to zero, the worse the reality matched early morning hopes and the more likely Blackburn were to move to a more offensive approach. In the home defeat by Newcastle, for instance, the visitors took an unchallenged lead after 12 minutes and Blackburn fired in an above average 20 crosses in a bid to retrieve the scoreline. The line of best fit predicts just over a quarter of that total in games they dominated, denoted by a game state score of around 2.

If Blackburn's reliance on crosses as a means to reverse their fortunes is hardly a surprise, the extent to which it was practised by home sides over the 2011/12 season may be. Despite a broad cross section of managerial and playing talent, virtually every side increasingly called upon the cross as a key ingredient when they found themselves under performing in matches.

The David Moyes led Everton demonstrated an even tighter correlation of cross frequency and game state, shunning the tactic when comfortable, but following Blackburn's lead when struggling. His current side Manchester United also followed suit, although with less fanaticism. Only Bolton, Wigan (managed by Moyes' successor at Everton) and possibly predictably Arsenal, showed no real tendency to cross more when performing disappointingly. A strong positive increase in the proportion of final third passes for the Gunners in games where they consistently experienced poor game states hints at their preferred mode of retrieval.

The general case in 2011/12 saw a side increase their open play crossing by around 60% as they went from games they comfortably won to games that they struggled badly to turn around, although as Blackburn and Everton demonstrate, there were team variations within the strong trend.

Friday 19 July 2013

Spotting Age Related Decline in Strikers.

One of the simplest concepts to grasp, but the hardest to measure, is the effect of age on an individual player's statistical output. Precocious youth is noteworthy mostly because of its rarity and every player heading into his fourth decade will surely know that his best seasons are almost certainly behind him.

Many will intuitively believe that the peak of a professional sportsman's career arrives around his mid twenties, especially in a physically demanding sport such as football. However, providing evidence for this claim is often more difficult. Managerial selection that favours mid twenties players at the expense of either extreme provides some confirmation, but attempts to quantify this are fraught with problems.

Counting or averaging headline performance indicators, such as goals scored by strikers, often produces a pleasing curve that obligingly peaks around the expected age groupings. But selection bias is an ever present danger, as youngsters deemed not yet good enough to start and veterans no longer good enough to merit selection are inevitably absent. Exceptional youthful and aged talent, such as Messi and van Persie respectively, can easily dominate the apparent contribution of usually under represented age groups and, depending on how the data is presented, create false peaks where none generally exist.

Also, because football almost uniquely has a global reach, some exceptional players can leave the EPL, not because they are declining, but because they are exceptional. Ronaldo's goalscoring exploits as a 20 something at United would hang heavy on that particular age group, but wouldn't be represented in a cumulative approach in the later age grouped performances, because he is now in Spain.

One way to try to eliminate the influence of statistical freaks is to plot the change in key indicators. Messi's scoring achievements may be big enough to skew an age related plot based on cumulative totals, but for him to continue on an upward curve, he, like every other player in a sample, needs to better his previous year's performances.

While a player continues to improve a rate statistic, such as goals per minute played, he can be considered to be still climbing towards his statistical, age based peak. Once that indicator begins to fall consistently compared to previous achievements, he can be considered to be in decline, although this shouldn't make his departure from his league of choice imminent. A declining Messi may well be superior to most other players in La Liga. Inevitable decline should be balanced against how high the player has set the bar by his earlier performances.

However, this approach also founders because fluctuating player statistics often have various likely causes. A player may see his goals per minute rate fluctuate for reasons other than just maturing or ageing. Individuals regularly move between clubs, different sides may employ different attacking formations and new, improved team mates may compete for goals.

High profile loanees, such as Sturridge and Lukaku at lower scoring Bolton and WBA respectively, scored a higher percentage of team goals than they can expect to score at their parent club. A higher percentage of a smaller goal total may not equate to a lower percentage of a higher scoring team's total once they return to their parent club or, in the case of Sturridge, move on to a higher rated side. So the goal environment in which they play and the relative attacking abilities of the players around them may be expected to change individual scoring rates in addition to any expected ageing effect.

We can try to eliminate possible fluctuations to a player's goal scoring rates caused by a change of scenery by sticking with players who remained at the same club for a period of five or more seasons. Examples include Dennis Bergkamp, who spent his mid twenties to his mid thirties at Arsenal, and less prominent strikers, such as Deon Burton during his five seasons at Derby.

In the plot above I've taken seasonal rates for goals per minute of every striker who stuck with the same side for at least five consecutive seasons since 1990 and recorded the fall or rise in that statistic as each player went from one birthday to the next.
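The paired season idea can be sketched in a few lines. This is only an illustration of the general technique and not the code behind the plot: the records below are hypothetical placeholders, the field layout is my own, and the sketch assumes the filtering for five or more consecutive seasons at one club has already been done.

```python
from collections import defaultdict

# Hypothetical records: (player, club, age in that season, goals, minutes played).
seasons = [
    ("Striker A", "Club X", 22, 8, 2000),
    ("Striker A", "Club X", 23, 10, 2200),
    ("Striker A", "Club X", 24, 10, 2300),
    ("Striker B", "Club Y", 23, 12, 2500),
    ("Striker B", "Club Y", 24, 11, 2400),
]

def paired_changes(records):
    """Map each age to the proportional changes in goals per minute
    recorded by players going from that age to the next at the same club."""
    rates = {}
    for player, club, age, goals, minutes in records:
        if minutes > 0:
            rates[(player, club, age)] = goals / minutes
    changes = defaultdict(list)
    for (player, club, age), rate in rates.items():
        next_rate = rates.get((player, club, age + 1))
        if next_rate is not None and rate > 0:
            changes[age].append(next_rate / rate - 1.0)
    return dict(changes)

def mean_change_by_age(records):
    """Average the paired changes within each starting age."""
    paired = paired_changes(records)
    return {age: sum(vals) / len(vals) for age, vals in sorted(paired.items())}

result = mean_change_by_age(seasons)
for age, change in result.items():
    print(f"{age} to {age + 1}: {change:+.1%}")
```

Because each change is measured against the same player's own previous season, a single prolific scorer only contributes his rise or fall, not his raw total.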

Improvement appears to stop, and decline in scoring rate for individual strikers begins to set in, just before a player hits 25. So, on average for this group (who are distinguished from the rest of the EPL's strike force more or less arbitrarily, by their reasonable loyalty to one club), goal rates improve until 23 and plateau for a year before beginning to tail off as the player enters his mid to late twenties.

Similarly, from the line of best fit, the average decrease in scoring rate for a player going from 34 years old to 35 is around ten percent of his figure recorded in the season when he was 34. If he is/was an exceptional talent, his higher base level may keep him in the league. But often by now, only the very best can fight off younger challengers or imports from abroad, who may also be in decline, but from a higher perceived level.

By comparing pairs of performance years, controlled for employer, the effect of a player carrying raw, extreme talent (even if it is declining) into less populated age brackets is avoided.

The trends though are general. Some players may prolong their scoring streak, but the league trend is often a powerful influence, eventually.

Ageing effects undoubtedly exist, but we may require the less turbulent setting of a one team clubman to be able to possibly identify evidence within the numbers. The ability to better project the likely scoring feats of players who are transferred into better or weaker teams, sometimes abroad, resulting in them playing contrasting tactical game plans is the logical first step before trying to seek evidence in the wider playing population.

Tuesday 16 July 2013

Gerhard Tremmel, An Above Average Keeper.

As the calendar meanders from the end of one season to the pre-season of another, the lull in on-field action allows for the handing out of awards to the statistical leaders from 2012/13. The extension of the usual percentage based statistics to include additional x,y coordinated data has widened the scope of the discussion as well as hopefully improving our grasp of where the talent may lie, but simple shot and save percentage can still be a powerful tool.

The leader in save % for Premiership goalkeepers during 2012/13 was Everton's Jan Mucha, with an impressive 92.3% save percentage and languishing in last spot was Norwich's Lee Camp, who saved just 20% of the on target shots that he faced.

Of course the missing ingredient from this short list of the supposed best and worst of Premiership stoppers is the number of shots the two polar opposites faced. Mucha faced 13 shots and saved 12, while Camp, depending on the source, faced just five shots and conceded four times in three games, two of which were from the bench.

So the best and worst keeper, from a save % perspective were also keepers who faced relatively few shots on target. The average number of on target shots faced by a Premiership keeper in 2012/13 was a shade over 80. The raw table of save % admirably demonstrates the ability of very small samples to possibly produce extremes of good or bad percentages that may not be fully representative of the likely skill levels of an individual player.

The feast or famine of the small sample size is further demonstrated by the save % for Swansea's Gerhard Tremmel. He chases home Mucha with a save % of 81% from a below average number of shots in 2012/13, but had a Lee Camp like disaster the previous year when in a single performance at the Britannia he faced two Stoke shots and failed to stop either. In the absence of a numerical context, Tremmel has gone from last place to nearly first in just a season.

Because of the small sample problem, it is usual to impose a qualifying threshold for trials and ignore evidence below a certain number. Conceding two from two, even against a traditionally lethal Stoke, shouldn't be sufficient to condemn a keeper as the league's worst, but it is nevertheless information and we should try to use it to make an informed judgement about Tremmel's likely talent level.

Prior to his Swansea debut at Stoke, Tremmel had played in the Bundesliga, most notably at Energie Cottbus and least notably for Hertha reserves. Over 200 matches in an unforgiving position therefore demonstrates a degree of competence, even if a single match in February of the 2011/12 season hardly allowed him a chance to demonstrate that ability in the Premiership.

Even if we were ignorant of his career in Germany and Austria, his mere presence in a single Premiership game is sufficient to mark him as a talented keeper. If, prior to his Swansea debut, we grade him as an average Premiership keeper, we are more likely to get close to an unknown keeper's later potential. Conceding goals to Crouch and Upson should only result in a minor downgrade, rather than overreaction to two events and demotion to the foot of the goalkeeping table.

In 2012/13 we have thirty times the amount of information on Tremmel, who played in 14 Premiership games, but is this sufficient to propel a journeyman keeper to near the top of the Premiership keeper standings? As with his 0% save percentage from 2011/12, we need to look at his 81% figure in the context of all the available information.

He is still at Swansea, therefore his Stoke performance wasn't considered typical of the levels Swansea think him capable of. So our conclusion that he is slightly below average for a Premiership keeper is probably still valid. He faced around 60 shots in 2012/13, below the league average and well below a career figure, so we should entertain the idea that 81%, while closer to his true, likely save % than his duck egg from 2011/12, is still a figure that requires confidence limits. Only then can we be reasonably sure that the spread of our estimate of his ability encompasses what we may expect if he enjoys a long term run in the side.
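One way to put rough confidence limits on a save percentage is a Wilson score interval, optionally followed by shrinking the raw figure towards a prior. The sketch below is a generic illustration and not necessarily the method used for the figures in this post. The 49 saves from 60 shots is my approximation of "81% from around 60 shots", and the prior rate of 0.70 and prior weight of 80 are purely hypothetical placeholders.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        raise ValueError("need at least one shot faced")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return centre - half_width, centre + half_width

def shrink_towards_prior(successes, trials, prior_rate, prior_weight):
    """Blend the observed rate with a prior worth prior_weight pseudo shots."""
    return (successes + prior_rate * prior_weight) / (trials + prior_weight)

# Assumed counts: about 49 of 60 in 2012/13 plus 0 of 2 at Stoke in 2011/12.
saves, shots = 49 + 0, 60 + 2
low, high = wilson_interval(saves, shots)

# Hypothetical prior: a 70% keeper, weighted as if seen for 80 shots.
regressed = shrink_towards_prior(saves, shots, prior_rate=0.70, prior_weight=80)

print(f"raw {saves / shots:.1%}, interval {low:.1%} to {high:.1%}, "
      f"regressed {regressed:.1%}")
```

The interval here uses only the Premiership shots, so it will be wider than a range that also draws on a long career elsewhere, and the regressed figure moves with whatever prior is chosen.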

In the absence of shot location data and without detailed knowledge of Swansea's commitment to defence, 2012/13 should probably result in an updated opinion of Tremmel's ability that now places him as an above average keeper, but we don't yet have enough evidence to declare him the best EPL keeper, even for last season.

Single season data, especially if the player isn't a regular is merely a sample of that player's average ability and one particular sample may fall well above or well below his typical performance levels. The limited evidence of Tremmel's 15 game Premiership career projects that his long term save% will most likely lie somewhere between 82% (which would be exceptional and unlikely if we included his Austrian/Bundesliga stats) and 71% (which would be more in keeping with his slightly above average standing as Swansea's number two keeper). A career path and current age when even keepers begin to decline, suggest he is a good, but not great keeper who is unlikely to be an upgrade on Michel Vorm, Swansea's current first choice.

Just as one bad day at Stoke doesn't make you a poor keeper, one good, possibly luck or shot location driven run in mid season doesn't make you the league's best.

Sunday 14 July 2013

Ageing Profiles in the Premiership.

Another old guest post that hasn't been linked to this blog. Here I take a look at some general ageing profiles in the Premiership and highlight the precocious performances of Lukaku and Sterling.

Friday 12 July 2013

Youth Versus Experience On The Field of Play.

It is surprising to recall that for a large proportion of its history, football has been played without the benefits of substitutions. In an earlier era, when robust tackling was both the norm and almost always went unpunished, it was often the perpetrators of foul play that benefited from a numerical advantage following a game ending tackle. Nowadays, the sinned against side gets a large number of potential replacements to choose from and their opponents get a red or yellow card.

Initially, in the days well before squad numbers, 12 was universally the number of the sub. A player whose versatility was often prized above his raw technical talent because a side was allowed but a single roll of the dice. Tactical alterations, by necessity, took a back seat to the threat of enforced replacement because of injury. Mick Bates was a prime example at Leeds, for those who remember the 60's and 70's.

Nowadays substitutions are much more about tactical change or the virtually premeditated replacing of tired legs or brains with equally talented players from a squad system that has grown well beyond the constraints of a 1 to 11 numbering system for the starters and 12 for the single, track suited replacement. A lack of innovation that possibly denied Jimmy Greaves, suited and football booted, involvement in a Wembley World Cup Final has evolved from its sticking plaster beginnings to become an essential, present day part of managerial match day planning.

All Change.
We can attempt to isolate the effect of substitutions on match result, although this is problematical. A substitution from one side can be countered or not by one from the opposition and comparisons of historical norms of performance will already have substitution effects hard baked into the baseline.

However, more fundamental information can be mined relatively easily, most notably whether substitutions have any age related bias. I've previously looked at age effects on easily measured attributes, such as goal keeping skill and age decayed playing time and scoring patterns, so I'll use a similar approach here.

As anyone knows who has seen her number lifted aloft by the fourth official, there are two sides to every substitution and the format of much of the data readily allows for evaluating age profiles for the replaced players rather than those who are subbed onto the pitch. The age profile of the starting 11 is readily obtainable. Therefore, we can know with accuracy the make up of the pool of players from which the manager is choosing his replacee. By contrast, profiling the replacement requires accurate knowledge of the make up of the entire and always partially unused bench and that is much less common information.

Footballers appear to reach a peak of performance in their late twenties and so it is not surprising to see that this age group has disproportionately more players active in Premiership sides. Extremes of youth and age, while present are much less numerous. Therefore, to try to add context to any age related substitution strategy common within the EPL, we need to know the relative proportions of each age bracket within a starting eleven across a reasonable sample of matches.

Once we move on to see how frequently a particular age of player is substituted, more problems arise.

Firstly, some substitutions are forced, rather than tactical.

Secondly, a player subbed for the final minute to receive a standing ovation for an excellent performance or in the case of Jamie Carragher in May, an outstanding career, is different to a replacement on the hour, possibly because of poor form or an anticipated improved contribution from the newly introduced colleague.

Thirdly, keepers not only have slightly atypical age related performance profiles, they are also very rarely replaced for reasons other than injury.

To combat these issues, we can eliminate keepers from the data, hope that injury replacement is largely dwarfed by tactical, manager inspired change and vanity subs can be allowed for by using the proportion of time absent from the field in the case of the subbed individual, rather than merely counting every subbed out player incident as being equal.

We are now in a position to try to see if some age groups are subbed out of a game proportionally more or less often than you may expect compared to their initial representation amongst the starting eleven.



Unfortunately, the plot is cluttered to try to maintain information, but essentially the paired red and green columns show the proportion of players of one age group who were present in the starting eleven and thus available to be subbed out of the game (red column) and the rate at which they actually suffered substitution, measured as minutes absent from play as a proportion of all playing minutes lost by replaced players (green column).

For example, 24 year old players comprised 6.8% of starters, but accounted for 8.6% of the time spent sitting on the bench as a replaced player. By contrast 29 year olds were 9.5% of the starting population, but suffered just 7.3% of the substitutions.
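The comparison between the two columns boils down to a simple ratio, sketched below. The two pairs of percentages are the ones quoted above; the function names and the 90 minute match length are my own assumptions for illustration.

```python
def minutes_lost(sub_minute, match_length=90):
    """Minutes a replaced player misses. A final minute 'vanity' sub
    therefore counts for far less than a change on the hour."""
    return max(match_length - sub_minute, 0)

def substitution_index(starter_share, minutes_lost_share):
    """Above 1: the age group is subbed out more than its share of starters
    would suggest. Below 1: it is left on the pitch more often."""
    return minutes_lost_share / starter_share

# (share of starters %, share of all minutes lost by replaced players %)
age_groups = {24: (6.8, 8.6), 29: (9.5, 7.3)}

indices = {age: substitution_index(*shares) for age, shares in age_groups.items()}
for age, index in indices.items():
    print(f"age {age}: index {index:.2f}")
```

On these figures the 24 year olds lose around a quarter more playing time than their presence in the starting eleven would predict, while the 29 year olds lose around a quarter less.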

The salient point is depicted by the orange and black arrows. The black arrow shows the age groups that were replaced more frequently than would have occurred from a purely random selection process and the orange arrows show the groups that were allowed by the manager to stay on the pitch more frequently and possibly for longer on the occasions when they were replaced.

To cut through the detailed definitions, younger players tend to get subbed out of a match more frequently than older ones, once you account for the relative frequency of each age bracket.

These samples come from all EPL sides for a significant portion of 2012/13. If they survive the necessary assumptions and are repeated in larger samples, they offer an insight into the choices being made on the bench during a game. The managers appear to be making a conscious decision that balances youthful energy, but possible inexperience and lack of peak technical ability, against possibly lower fitness levels, but much more experience. They appear to come down in favour of removing youth (even if they are replacing it with more youth) and keeping faith with experience as the game enters its tactically active final half hour.

Players that are still playing past the general ageing peak are a biased selection of the very best, so it is likely that their contribution in a sport of technical, cerebral and physical demands will overall outweigh that of younger, and not yet fully realised talent.

Alternatively, the plot may do no more than demonstrate a safety first attitude that placates senior pros and fans alike at the expense of a potentially more profitable and adventurous selection policy.

Wednesday 10 July 2013

Reina versus Mignolet. A Shot Analysis.

In this guest post, I take a look at when keepers in general appear to reach their peak and how this may have impacted on Liverpool's decision to splash out on Simon Mignolet to provide stiff competition for Pepe Reina. As a direct comparison between the two keepers, I then see how Mignolet may have dealt with the shots faced by Reina in 2012/13 and then repeat the exercise for Reina and all attempts faced by Mignolet.

Tuesday 9 July 2013

Headers As A Valued Scoring Method.

About this time last year I wrote a piece in defence of the art of crossing the ball at a time when crossing was usually randomly juxtaposed with "aimless" and was considered the first and often also the last resort of the technically limited, be it Stoke in the Premiership or England on the international stage.

Appealing though the sentiment may be, aesthetics win no extra points. Crosses, despite their inefficiencies and possible reliance on more luck than perceived inevitability, remain an essential component of a balanced attack and unashamedly, a route to mid table respectability for those sides that are less able to attract sufficient numbers of technically skilled players to try to play Barcelona lite. There is also enough of a spread in crossing completion rates to entertain crossing as a repeatable talent.

Inevitably, headers, the most visible product of a cross have shared the opprobrium heaped on their precursor. So let's try to rehabilitate headers.

The first knock that headers have to try to repel is that they are less likely to result in a goal than a similarly positioned shot and this is easy to confirm. Self collected shot expectancy models are currently limited in their usefulness by an almost complete absence of defensive data, but they are sufficient to speculate as to the likely pecking order for goal attempt types. Head the ball from almost any position in and around the box and a similarly positioned attempt with the feet in open play will, on average, prove to be a more rewarding proposition.




If we use shots generated from open play as the standard bearer for goal attempts and compare the goal expectancy for other types of attempts to this most usual product of a ground based attack, we can illustrate the relative potency of different scoring attempts. Headed goal attempts from either set pieces or open play spread inside and outside of the area, on average see their likelihood of scoring decreased to around a third of that for a shot from the exact same pitch position. But is this sufficient evidence within a proper context to dismiss headers completely ?

The standout goal attempt in the above plot is a direct shot from a free kick, which is around twice as likely to produce a goal as the cultural elite's weapon of choice, a shot from open play. (A penalty kick, if included, would be off the scale shown here.) So under this initial cross examination, open play shots beat headers, but direct free kicks then beat open play shots (and penalty kicks would trump everything).

But only in this context-less vacuum of a goal expectation model.

Arrow length denotes goal expectancy for each average attempt type.
Direct free kicks fail to confirm their early promise in the race to be the most productive method of scoring. Firstly, although a direct free kick from a central position on the edge of the box is almost twice as likely to result in a score as an identical shot from open play, that is the limit of a direct free kick's scoring ambitions. By its nature, a direct free kick cannot venture into the penalty area.

A free kick struck, first time, from the average position for all direct attempts in my sample has about a one chance in twenty of producing a goal, much better than an identical effort from open play, where success rates fall to nearly one in 50. So despite the impressive initial plot, once we add context that includes real life experience, we realize that we are merely doubling our chances on a long shot. Reality bites even deeper if we then include the relative rarity of direct free kicks, which account for around 5% of a side's total goal attempts. Try as some teams might, it doesn't appear possible to build a strategy around creating large numbers of direct free kicks outside of the box.

Frequency of attempt and average position from where those attempts originate in reality, along with the ability of teams to alter those parameters are the bare essentials to consider before we should be passing judgement on a method of attempted scoring. Shots generated from open play use their ability to be performed inside the penalty area to get closer to the goal, but not by very much. While headers, be they created in open play or from more attack friendly set plays, on average, threaten the no man's land for keepers between the penalty spot and the six yard line, open play shots barely creep past the edge of the box itself.

In short, headers appear to get you closer to the intended target, but defenders or anxious strikers are kept at arm's length when teeing up a shot. The advantage of creating chances closest to the goal appears to lie with headed opportunities. Of the four different attempt categories, by virtue of the distance and width from the goal from where they originate, the average header, following a set play, has the best chance of providing a goal. Relatively distant and defensively harried, open play shots, based on average shot position, are now the least likely to provide a goal from their average point of execution.

Only frequency of attempts can now restore open play shots and elevate them back above headed attempts.

And it does. Open play shots account for around 60% of goal attempts, while headers, from set or open play, account for just 20%. If we add frequency of attempt to average position, open play shots account for around half a goal in a typical game, around twice the tally for the category of headers described here. So shots, in this ballpark assessment, are on average a superior scoring method compared to headers, but the contribution of the latter remains significant and greatly outstrips that made by other goal attempts, such as direct free kicks.
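The two effects, how often an attempt type occurs and how much each one is worth, can be separated with some back of the envelope arithmetic. The shares and the half a goal figure are the rounded numbers quoted above; the 0.25 for headers is simply my reading of "around twice the tally", so treat the result as a ballpark.

```python
# Rounded figures from the text; 0.25 is inferred from "around twice the tally".
attempt_share = {"open_play_shots": 0.60, "headers": 0.20}
goals_per_game = {"open_play_shots": 0.50, "headers": 0.25}

def goals_per_unit_share(kind):
    """Goals per game divided by share of attempts. Total attempts per game
    is common to both types, so ratios of this value are ratios of
    average goals per attempt."""
    return goals_per_game[kind] / attempt_share[kind]

relative_yield = (goals_per_unit_share("headers")
                  / goals_per_unit_share("open_play_shots"))

print(f"average header is worth about {relative_yield:.1f} open play shots")
```

So on these rounded numbers the average header is the more valuable single attempt, and open play shots only come out on top in total because there are roughly three times as many of them.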

Chill Guys. It's "only" going to be a header.
Headers, as has been mooted, contribute to a diverse attacking strategy. Their goal contribution is not insignificant, and ignorance (through lack of detailed data) of the outcome of breakdowns in attempts to generate open play shooting opportunities may obscure an advantageous residual value for a failed cross. Failed crosses are there for all to see, but may result in possession being maintained or quickly regained, while failed attempts to create open play shots may go disguised and unrecorded as simple losses of the ball. We simply don't have enough detailed data to know with any authority.

Judging the effectiveness of headed goal attempts merely on the indisputable evidence that shooting is almost universally preferable to a header from the same spot on the pitch is to ignore the realities of the game. Headers are a starting point for the less technically adept to hone in the hope that continued, but limited success may attract better quality players, but it may also provide the wrinkle that prevents pass orientated sides from drifting into tactical stagnation.

Saturday 6 July 2013

Serving Strategies and Top Tennis Players.

A short follow up post to yesterday's look at how close a generic, top tennis player is getting to the crossover point, where two full blooded serves should, in theory, give you a better chance of winning a point on your own serve than the current, universally used strategy of one fast serve, followed, if necessary, by a safer, slower one.

The stats required are available at such sites as http://www.atpworldtour.com although the figure for second serve accuracy isn't generally listed and has to be derived from other stats, such as the number of double faults.
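As a rough sketch of that derivation: a second serve is only hit after a missed first serve, and a double fault is a missed second serve, so second serve accuracy can be backed out from three counts. The function and argument names below are my own and the counts are hypothetical, not figures taken from the site.

```python
def second_serve_accuracy(service_points, first_serves_in, double_faults):
    """Share of second serves that landed in.

    Assumes every missed first serve is followed by exactly one second
    serve, so second serves attempted = service points - first serves in.
    """
    second_serves = service_points - first_serves_in
    if second_serves <= 0:
        raise ValueError("no second serves were attempted")
    return 1 - double_faults / second_serves

# Hypothetical tournament totals for one player.
accuracy = second_serve_accuracy(service_points=100,
                                 first_serves_in=64,
                                 double_faults=4)
print(f"second serve accuracy: {accuracy:.1%}")
```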

I've taken full tournament figures for players that progressed to the later stages of various tournaments so far in 2013 and others of general interest. The smallish sample sizes should be enough to catch the general trend, but would need further data collection or heavy regressing for the actual figures to carry greater authority.

Likelihood of Winning A Service Point Using Either A Conventional or Aggressive Strategy.

Player/Surface.        Fast/Slower Serve.   Two Fast Serves.   % Advantage For Conventional Strategy.
Murray/Australian.     0.674                0.667              1.0
Djokovic/Australian.   0.742                0.694              7.0
Federer/Australian.    0.708                0.696              1.7
Murray/Wimbledon.      0.717                0.701              2.2
Djokovic/Wimbledon.    0.745                0.722              3.2
Janowicz/Wimbledon.    0.731                0.707              3.4
Verdasco/Wimbledon.    0.710                0.684              3.7
Lisicki/Wimbledon.     0.611                0.602              1.5
Bartoli/Wimbledon.     0.627                0.617              1.5
Sharapova/French.      0.611                0.574              6.4
S Williams/French.     0.708                0.672              5.4
Nadal/French.          0.678                0.668              1.6

Every player I've looked at would have been better off using the conventional fast/slower service strategy, assuming that the stats they displayed in the various tournaments were repeatable and typical of their true abilities. Tennis players, it seems, know exactly what they are doing...at the start of the point at least.

Thursday 4 July 2013

Conventional Tennis Strategy Wins Out.

Aside from partnering John McEnroe to seven Grand Slam doubles titles, Peter Fleming was also noted for his uncompromising second serve in which accuracy was sacrificed for power. Although the tactic of using power on both first and second serves is open to anyone, it is seldom used apart from as an occasional surprise, or desperation tactic. The payoff between a reduced second serve success rate and an increased chance of winning the subsequent rally because of the greater power of the initial shot of that rally has yet to be tested in top class competition.

The changed aesthetics of the game that would result from a likely shorter, serve volley spectacle, especially on the faster grass courts of Wimbledon, would probably not be welcomed by spectators, but we can use maths to see how close the current game has come to making the adoption of such a tactic potentially profitable.

To calculate the likelihood of winning a point under the two different serving scenarios, the traditional fast serve followed by a slower one and one using two full blooded "first" serves, we need to know four things.
These are:

The Probability of a Fast Serve not being a Fault, (F(nf)).
The Probability of a Slower Serve not being a Fault, (S(nf)).
The Probability of winning the point following a successful Fast Serve, (F(wp))
The Probability of winning the point following a successful Slower Serve, (S(wp)).

These probabilities are easily pulled for the top players from the various tennis stats websites. For F(nf) the top ranked players hit around 64%, S(nf) is in the region of 90% and F(wp) and S(wp) are 75% and 56% respectively.

Now we need to map out the two routes for each scenario that lead to a player winning a point. For the traditional fast/slow routine she can serve a successful fast serve (with probability of 0.64) and win the subsequent point (with probability of 0.75). The probability of this occurring is 0.64 x 0.75 = 0.48.

In addition she can serve a first serve fault (with probability (1-0.64)) followed by a successful slower second serve (with probability of 0.9) and then win that subsequent point (with a probability of 0.56). Overall, this route to a winning point has a probability of 0.36 x 0.9 x 0.56 = 0.18, giving a combined likelihood for both routes of 0.66.

The adventurous use of two fast serves follows the same sequence as above, except that the slower second serve is replaced by another fast, first serve style delivery (which lands in with probability of 0.64) and carries an enhanced probability of winning the point of 0.75 instead of just 0.56. Overall this commitment to speed and brevity gives the generic, top class player a point winning probability of a marginally inferior 0.65.
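The two routes to winning a point can be written out as a short calculation. This is only a sketch of the arithmetic above, using the generic top player figures quoted in the post; the function and variable names are my own.

```python
def p_win_point(first_in, first_win, second_in, second_win):
    """Probability of winning a service point over both routes.

    Route 1: first serve lands in and the server wins the point.
    Route 2: first serve is a fault, second serve lands in, server wins.
    """
    via_first = first_in * first_win
    via_second = (1 - first_in) * second_in * second_win
    return via_first + via_second


# Generic figures quoted above for top ranked players.
F_NF, S_NF = 0.64, 0.90  # fast / slower serve not a fault
F_WP, S_WP = 0.75, 0.56  # point won after a successful fast / slower serve

conventional = p_win_point(F_NF, F_WP, S_NF, S_WP)  # fast, then slower
two_fast = p_win_point(F_NF, F_WP, F_NF, F_WP)      # fast, then fast again

print(f"Conventional: {conventional:.3f}")
print(f"Two fast serves: {two_fast:.3f}")
```

Swapping in a player's own four probabilities gives the player specific version of the same comparison.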


Wimbledon can look forward to longish rallies for the moment.
Therefore, by happy accident rather than design, the current court dimensions and tennis equipment lead to the conventional, more rally friendly serving strategy having a small, but cumulative and often repeated advantage over a power orientated alternative. Federer's dominance may be declining, but longer, second serve induced rallies, especially in the men's game and on the faster courts, are still hanging on in there.

I've added some player specific figures from 2013 here.

Efficiency Stats and Team Talent.

It is relatively easy to obtain quite detailed team or individual statistics for most of the major European leagues, most notably the Premiership. Sources and interpretations do vary, therefore it is important that consistency of source is ensured before conclusions are made. The new data is at its most useful when presented in the form of efficiency figures. The margins within the Premiership are often extremely tight and a team that can perform a task a couple of percentage points better than their rivals may have a decisive edge in fulfilling their aims.

Team style and tactics of course vary greatly between teams in the same division, so any difference in how effectively side A hits the target with its goal attempts compared to side B is more likely to reflect a clash of styles than a raw talent differential. Side A may merely engineer shooting or heading opportunities from closer to the goal or in markedly different proportions to side B. Therefore if we look at such figures across all teams, inevitably a difference will be apparent, but often the resolution will require detailed knowledge of the approach favoured by each side.

A much simpler method is to look at efficiency stats, such as the ability of one single side to hit the target, spread over multiple seasons to see if there is a significant, attributable change in this rate over time.

The rate at which a side performs tasks such as getting their shots to hit the desired target will of course fluctuate over multiple seasons through random variation, even if there is no change in the side's inherent talent levels. However, if the seasonal rates change to a larger extent than should be expected from chance fluctuations alone, we can begin to suspect that something or someone within the side has engineered this change. Looking for possibly significant changes in a largely stable team format over a limited number of seasons should be possible.
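One common way to ask whether seasonal rates vary by more than chance alone would allow is a chi-square test of homogeneity across the seasons. The sketch below is not necessarily the method used for the figures in this post, and the season counts are hypothetical numbers invented purely to show the mechanics.

```python
def chi_square_homogeneity(successes, attempts):
    """Chi-square statistic testing one common success rate across seasons."""
    pooled = sum(successes) / sum(attempts)  # rate if nothing ever changed
    stat = 0.0
    for hit, n in zip(successes, attempts):
        exp_hit = n * pooled
        exp_miss = n * (1 - pooled)
        stat += (hit - exp_hit) ** 2 / exp_hit
        stat += ((n - hit) - exp_miss) ** 2 / exp_miss
    return stat


# 5% critical value for chi-square with 4 degrees of freedom (5 seasons - 1).
CRITICAL_5PCT_DF4 = 9.488

# Hypothetical on target counts and total attempts for five seasons.
on_target = [160, 165, 185, 190, 200]
attempts = [500, 510, 520, 515, 540]

stat = chi_square_homogeneity(on_target, attempts)
print(f"chi-square = {stat:.2f}, exceeds 5% threshold: {stat > CRITICAL_5PCT_DF4}")
```

A statistic below the threshold means the seasonal rates are consistent with a single, unchanging underlying rate. One above it suggests something other than chance may be at work, although it says nothing about what.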

I've used simple "success/fail" shooting data for the twelve ever present Premiership teams over the last five seasons, so the teams involved are as follows: Manchester City, United, Everton, Liverpool, Spurs, Arsenal, Chelsea, Wigan, Sunderland, Fulham, Villa and Stoke. All data looks at each side from an attacking perspective because, unfortunately, defensive shooting numbers are much less easy to come by or collect.

Attempts On Target.

Half of the sides appeared to show seasonal fluctuations that weren't wholly consistent with the usual random fluctuation in their ability to find the target with all of their goal attempts, given their average success rate over the five seasons. The biggest likelihood that we are seeing a real effect came at Manchester United, where the change in accuracy was accompanied by a near relentless improvement from a likely true accuracy of just under 32% in 2008/09 to 37% in 2012/13. The first major improvement came in 2010/11; this was repeated in 2011/12 and consolidated at peak levels in 2012/13.

Sunderland appeared to show the next highest levels of fluctuation, recording five year highs in 2009/10 of 34.5%, but immediately following up with lows of 28% the next season. Fulham showed and maintained distinct improvement in shooting accuracy from the start of 2011/12.


Teams Showing Change. (Most Likely to Least) Season High Accuracy Rate. Season Low Accuracy Rate.
Manchester United. 37.0% in 2012/13. 32.4% in 2009/10.
Sunderland. 34.6% in 2009/10. 28.6% in 2010/11.
Fulham. 35.2% in 2011/12. 30.2% in 2008/09.
Wigan. 32.9% in 2012/13. 29.1% in 2008/09.
Liverpool. 33.2% in 2010/11. 30.4% in 2009/10.
Spurs. 33.7% in 2012/13. 31.7% in 2010/11.

All of the remaining six sides exhibited shooting accuracy figures over the five seasons that did vary, but not by enough for random chance alone to be ruled out as the sole cause.

Converting On Target Attempts.

If we now move on to conversion rates for on target shots, remember that we are not looking for the most or least efficient teams, but for sides that have shown greater than expected efficiency fluctuations over a five year period. It may then be possible to attempt to isolate some of the potential causes for the observed change.

Only three teams showed sufficient variation to raise suspicion that on field actions or a different squad make up had led to real, repeatable and controllable change over seasons: Liverpool, Chelsea and Manchester City. The best evidence from the raw conversion figures came at Anfield, but there was similarly strong output from the other two sides. The most significant conclusion here is that the vast majority of sides showed no demonstrable increase or decrease in their goal expectancy from the shots they did manage to get on target.

Teams Showing Change. (Most Likely to Least) Season High Conversion Rate. Season Low Conversion Rate.
Liverpool. 31.5% in 2008/09. 22.3% in 2011/12.
Manchester City. 35.3% in 2011/12. 29.6% in 2012/13.
Chelsea. 33.0% in 2009/10. 27.6% in 2008/09.

Having Attempts Blocked.

In the previous two categories higher percentage rates are desirable, but it is more ambiguous here. Seeing a shot blocked tends to be bad news overall, although if your opponent chooses to block a shot that is going wide at least the ball remains in play and available to all. The likelihood of an effort being blocked is often a function of shot type, shot distance and therefore, also game state. So there are more likely to be causes that may not be directly attributable to mere team approach in this case.

Seven of the sides appeared to be able to alter their opponents' blocking ability outside of random variation over the five year period. Again there were no borderline cases, so in these seven cases the fluctuations may have a cause if you are prepared to look hard enough. In the table below, again remember that a high rate of blocking is probably a bad thing for the team trying to score.

Teams Showing Change. (Most Likely to Least) Season High Attempt Blocked Rate. Season Low Attempt Blocked Rate.
Liverpool. 28.1% in 2009/10 and 2012/13. 22.8% in 2008/09.
Chelsea. 28.6% in 2008/09. 24.1% in 2011/12.
Manchester United. 29.7% in 2008/09. 23.5% in 2012/13.
Everton. 29.8% in 2010/11. 23.2% in 2008/09.
Manchester City. 29.2% in 2010/11. 24.1% in 2008/09.
Aston Villa. 27.0% in 2012/13. 23.1% in 2008/09.
Arsenal. 27.8% in 2012/13. 24.6% in 2010/11.

Converting Clear Cut Chances.

Villa took good advantage of clear chances in 2012/13, although this one got away.
We are down to just three years of completed seasonal data in this final category. However, the more restricted definition of a clear cut opportunity may make the smaller sample size similarly valuable because of the more consistent repeatability of each goalscoring "trial". There is very strong evidence that Manchester City's ability to convert so called clear chances has changed even over three seasons, along with much less convincing clues for Sunderland and Villa, with Arsenal and Liverpool sandwiched in between.


Teams Showing Change. (Most Likely to Least) Season High Clear Chance Conversion Rate. Season Low Clear Chance Conversion Rate.
Manchester City. 42.0% in 2011/12. 30.0% in 2012/13.
Liverpool. 42.3% in 2010/11. 31.5% in 2011/12.
Arsenal. 39.6% in 2012/13. 31.3% in 2010/11.
Aston Villa. 38.9% in 2012/13. 31.5% in 2010/11.
Sunderland. 35.9% in 2012/13. 29.5% in 2011/12.

This post is as much about the sides that don't appear in any of the above tables as it is about those that do. Stoke's shooting accuracy ranged from 25% in 2011/12 to 29.5% the previous year, but given their league low shooting attempts, combined with data from the other three seasons, those figures are perfectly consistent with being a product of their average rate over the five season period of 28%. You may be embarking upon a fool's errand if you try too hard to rationalize the Potters' drop from 2010/11 to 2011/12.
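To get a feel for why Stoke's swing is unremarkable, here is a rough sketch of the band that chance alone might produce around a 28% true rate. The 380 attempts per season is a hypothetical round figure chosen for illustration, not Stoke's actual count, and the two standard error band is a simple normal approximation rather than the test used above.

```python
import math


def chance_band(rate, attempts, z=2.0):
    """Approximate range of seasonal rates expected from chance alone.

    Normal approximation to the binomial: rate +/- z standard errors.
    """
    se = math.sqrt(rate * (1 - rate) / attempts)
    return rate - z * se, rate + z * se


# 0.28 is the five season average quoted above; 380 attempts is an assumption.
low, high = chance_band(0.28, 380)
print(f"Expected seasonal range: {low:.1%} to {high:.1%}")

for season, observed in (("2010/11", 0.295), ("2011/12", 0.25)):
    print(season, "inside band" if low <= observed <= high else "outside band")
```

The fewer attempts a side has, the wider this band becomes, which is why a low shooting side can post a seemingly large drop without any change in ability.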

This post has been about where and when to look, and where not to look, for substantial change in teams. It also highlights that an apparent drop in efficient output may be simply down to random chance, with little change in team ability. A side may appear to falter in a recorded stat, but the underlying talent remains and will show itself if given more time.