Sunday, 28 July 2013

Predicting the Rare and Significant from the Incidental and Commonplace.

The transfer window remains open and the net is awash with statistical comparisons between present incumbents and perceived replacements, but with a few notable exceptions, much of this fevered speculation fails to address the problem of sample size in the data used. What you see, especially over a single season, is what you got, not necessarily what you'll get. However, even when regression towards the mean is addressed, a much more fundamental issue often remains: does the statistic used actually bear any relationship to the ultimate, desired outcome? Do the numbers we are attaching to a player correlate, and hopefully have a causative link, with winning?

Simply because a number is recorded, it doesn't naturally follow that the stat is valuable in evaluating a player, especially in the absence of context, such as the playing style of the team where he has amassed those figures. For example, interceptions are probably a skill. But generally the more interceptions a team makes, the worse they do in the game.

More interceptions (leading to worse results) should hardly be a great selling point for a player. It would be much better to try to identify how situations develop that require an interception to be made and purchase a player who might prevent those situations from arising in the first place. That might be the route that more successful, but less interception-prone, teams have taken.

The great interceptor may look good on paper, but you may be purchasing the equivalent of a signing who is really good at picking the ball out of his own net. Once you require his particular skills, the team has already dug itself into a substantial hole.

So it makes more sense to concentrate on numbers that at least positively correlate with success, rather than throwing everything at the analysis. Candidates for these potential killer stats are readily available. For a small monthly fee, a few sites provide reams of individual player stats, sorted on a game-by-game basis, and free, but less exhaustive, data can also easily be found. Added context, such as pitch location divided into thirds, isn't ideal, but it does exist.

One fairly obscure recorded stat is possession won in the attacking third, essentially a turnover without the opposition threatening your goal. In the NFL this type of turnover is gold dust and a positive differential in this statistic during a single game makes victory extremely likely. Such powerful events are relatively rare in gridiron and so, over a mere 16-game regular season, we are unlikely to be able to readily distinguish the lucky from the good.

This particular problem can be eased by looking for a skill that is likely to be connected to the rarer, but valuable, one, yet occurs in much larger quantities. Poorer pass completion rates for quarterbacks correlate well with the undesirable "talent" of throwing the less common, drive-ending interception, and on the defensive side of the ball, the ability to defend a pass is a good indicator of the rarer event of catching an interception. We'll return to this later.

The Premiership equivalent of winning possession deep in the opponent's territory appears to share both the rarity and, partly, the worth of its NFL counterpart. To take Everton over the previous two seasons as an example, the correlation between final 3rd turnover differential and a positive match day outcome is significant and strong, despite the relative rarity of the event.

Over the last two seasons, Everton have averaged 2.6 such events per game and allowed their opponents an average of 1.5 turnovers in the Toffees' defensive 3rd. Above I've plotted the line of best fit that relates match day final 3rd possession turnover differential to outcome. The red bar represents the likelihood that Everton won the game, whilst the green bar incorporates the draw as well.

Everton bridge the gap between Champions League regulars and the remainder of the Premiership, so they are good enough to overcome a final 3rd possession winning differential of minus 4 and still be more likely than not to take something from the match. Once their opponents enjoy a positive differential of around eight or more, the Toffees' chance of a win drops below 10%.

By contrast, tussles for possession in the middle third are much more numerous, but this particular differential in possession turnover shows virtually no correlation with match result. Despite the added opportunities for the better teams to demonstrate their skill and open up a gap between their own successes and those of their opponents, it is the rarer events, slightly closer to the defended goal, that appear to determine where the spoils are most likely to go.

Post-game knowledge of the differential between each side's possession steals in the middle third gives you absolutely no extra information as to the likely winner of a match involving Everton since August 2011. Simply using the generic win % is likely to predict the outcomes of a group of otherwise anonymous matches about as well as trying to glean information from each game's middle 3rd possession differential.
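For anyone wanting to run this kind of check on their own match logs, here is a minimal sketch. It assumes the relationship is measured with a plain Pearson correlation between each game's differential and the points taken, which may not be exactly the measure behind the plot above, and the two short lists are invented purely so the code runs. They are not Everton's figures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation in one series, so nothing to correlate
    return cov / sqrt(var_x * var_y)

# INVENTED toy data, one entry per match (hypothetical, for illustration only):
# turnover differential in a given third, and league points taken (0, 1 or 3).
final_third_diff = [-4, -2, -1, 0, 1, 2, 3, 5]
points = [0, 0, 1, 1, 3, 1, 3, 3]

print(round(pearson(final_third_diff, points), 2))
```

Feeding the same function the middle 3rd differentials in place of `final_third_diff` would show whether that stat carries any signal about the result. A value close to zero is the "no extra information" case described above.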

So we have more numerous, but apparently non predictive, middle 3rd possession interchanges and rare, but indicative final third events. Therefore, recruiting a player with final 3rd capabilities would appear to be desirable because of the strong correlation of that stat to a positive match outcome. Unfortunately, few players reach double figures for successfully winning possession in the final 3rd, even over two seasons of regular playing time.

To solve this inconvenience in trying to accurately identify players of final 3rd potential, we need to briefly return to the NFL. Interceptions, primarily by cornerbacks and safeties, are rare, but defended passes, whereby the ball is knocked away from the intended receiver, much less so. The skillsets needed to perform both tasks are very similar: good hand-eye coordination and speed, coupled with athleticism. So it shouldn't be surprising that defended passes are strongly correlated with full-blooded interceptions.

And the same is true of a soccer player winning the ball in the middle and final third. The skills required are virtually identical and the two stats are extremely strongly correlated with each other. Therefore, instead of trying to work with extremely small, often single figure sample sizes from the final 3rd, we can use the strongly related, middle third possession figures as a proxy for the trait that appears to drive match results, but turns up barely twice a match for a team of up to 13 outfielders.

How Winning Final 3rd Possession Relates to Winning Middle 3rd Possession, Everton 2011-13.

Playing Position. Number of Successful Final 3rd Events per Middle 3rd Events.
Defenders. 5 per 100.
Midfielders. 10 per 100.
Strikers. 10 per 30.
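
Read as conversion rates, the figures in the table can be turned into a rough expectation for any player. The sketch below is only an illustration: the function name is my own, the rates are copied from the table and so describe Everton in 2011-13 only, and the 40 middle 3rd wins in the example is a made-up input.

```python
# Rates copied from the table above:
# (final 3rd wins, per this many middle 3rd wins), Everton 2011-13.
RATES = {
    "defender": (5, 100),
    "midfielder": (10, 100),
    "striker": (10, 30),
}

def expected_final_third_wins(position, middle_third_wins):
    """Scale a player's middle 3rd possession wins by his positional rate."""
    final, per_middle = RATES[position]
    return middle_third_wins * final / per_middle

# Hypothetical player: a midfielder with 40 middle 3rd possession wins.
print(expected_final_third_wins("midfielder", 40))
```

The same 40 middle 3rd wins would imply far more final 3rd wins for a striker than for a defender, which is why the conversion has to be done by position.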

Now, instead of drawing information from events that in the case of Everton occur only five times every two matches, we are in a position to take a player's seasonal statistics from correlated events that happen over 20 times a match for each team. Winning possession in the middle 3rd may not correlate with winning, but it is tied intimately to the ability to win possession in the final 3rd... and that stat does correlate with match success.

Choosing between two players with similar playing time, who have both regained possession just three times in the final third, becomes slightly easier if we also know that one has recorded 60 similar actions in the middle third compared to just 20 for the second player. The 60/20 middle third split gives the former a potential edge over the latter that is absent in the tied single figure sample of the stat that has really attracted our scouting interest.
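
A rough way to see why the larger counts are more trustworthy is sketched below. It rests on an assumption of mine rather than anything measured here: that possession wins arrive roughly like Poisson events, so the noise in a count of n events is about the square root of n, or 1/sqrt(n) as a fraction of the count.

```python
from math import sqrt

def relative_noise(count):
    """Approximate relative standard error of an event count (Poisson assumption)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return 1 / sqrt(count)

# Counts taken from the two-player comparison above.
samples = [
    ("final 3rd, either player", 3),
    ("middle 3rd, second player", 20),
    ("middle 3rd, first player", 60),
]
for label, count in samples:
    print(f"{label}: {count} events, noise of roughly {relative_noise(count):.0%}")
```

Under that assumption a count of three is uncertain by more than half its own size, while a count of 60 is uncertain by a little over a tenth, so the 60/20 gap is far less likely to be down to luck than any difference we might read into the tied final 3rd figures.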

Whether Everton's final 3rd possession winning stats are an integral part of just their own match winning makeup, a spurious correlation or a strong Premiership-wide theme isn't the most important part of this post. Sometimes, if you need to estimate a rare, yet valuable occurrence, it is better to look elsewhere at a more mundane, apparently unimportant, but highly correlated and much more commonplace alternative. That way you don't have to wait ten years until a potential transfer target bulks out his CV with rare and valuable actions.


  1. Very interesting post. I never thought about using a proxy which doesn't correlate to winning, but it makes a lot of sense.

  2. Yeah, very interesting!
    Would love to see some player statistics on the proxy metric.

  3. Hi anon,
    I've got a post written that looks at player passes as a better indicator of chances created. It should be up soon. I'm also putting together one for interceptions both by individuals and teams in the NFL.