Wednesday, 30 January 2013

Do Rugby Kickers Have A Best Side ?

Everyone agrees on the greatest rugby union try of all time. It was scored in a losing cause, by a scratch side, contained at least one forward pass and began with "Phil Bennett covering..." and ended with "This is Gareth Edwards. A dramatic start. What a score!"

It really should have been worth more than four points.

There's less of a market for the best drop goal of all time, but with all due respect to Joel Stransky in 1995, few would look beyond Jonny Wilkinson's 2003 World Cup winning punt. As half of the Australian side loitered just on the wrong side of legality and the two hemispheres held their collective breath, Matt Dawson decided against another sniping run into the heart of the opposition defence and, with just 20 seconds of extra time remaining, slung a pass back to the waiting Wilko. Positioned midway between the centre spot and the left touchline, Wilkinson calmly curled the ball between the uprights... with his right foot.

Sports players invariably have a preferred foot when kicking a ball. From personal experience, everyone will recognise the awkward mechanics when kicking a ball, especially a dead ball, with their less favoured foot. In his final season at Arsenal, Fabregas attempted just eight shots at goal with his left foot compared to 60 with his right, a clear preference and in the same season, Van Persie used his head for scoring attempts more often than he took aim with his right boot.

So you would expect that Wilkinson would have trusted the most important kick of his England career to his natural, strongest foot. Except he didn't. While he has no doubt practiced extensively to become a two footed player (in the best sense of the phrase), he is a natural left footed striker of the ball and it is very difficult to find an image on Google where he isn't kicking with his left foot.

The reason he chose to kick the ball right footed may lie in the field position. He was probably about 20 metres in from the left hand touchline, therefore if he had used his more natural left foot, the angle of the attempt, combined with the natural expected hook on the ball would have probably seen the ball tend towards the right hand upright and potential disappointment. By using his right boot, providing the kick started inside the right hand upright, the natural trajectory of the flight of the ball would take the attempt towards the posts rather than away.

James Hook warms up with a right footed kick from the left touchline.

It was even rumoured that Sir Clive Woodward, during his brief stint at Southampton football club, had encouraged penalty takers to aim for the post and allow the curl of the shot to place the ball just inside the upright. Maybe a similar process was at work in the England 2003 rugby World Cup squad?

The wider implication of Wilkinson's choice of foot is that kicking against the grain of the ball's natural in flight tendency should make right footed conversions progressively more difficult as the kick gets closer to the right hand touchline, compared to the same kick attempted by a left footed player. Using conversion and penalty kicking data courtesy of Opta, I've re-run the regression in this post that looked at the expected success rates of kickers by field position of the attempt. I've included another term to differentiate between attempts when the natural flight of the ball is towards the posts (right footed kickers from the left touchline and left footed kickers from the right touchline) and when it is across the face of the target (right footed from the right and left footed from the left).
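As a rough illustration of the extra term described above, the sketch below codes each kick with a 0/1 flag for whether the ball's natural curl bends towards the posts. The function name and the 'L'/'R' and 'left'/'right' labels are assumptions made for this example and are not taken from the Opta data or the original regression.

```python
def favourable_side(foot: str, side: str) -> int:
    """Return 1 when the natural curl of the kick bends towards the posts, else 0.

    foot: 'R' or 'L', the kicker's preferred foot (hypothetical labels)
    side: 'left' or 'right', the half of the pitch the kick is taken from
    """
    if foot not in ("L", "R") or side not in ("left", "right"):
        raise ValueError("foot must be 'L' or 'R'; side must be 'left' or 'right'")
    # Right footers from the left and left footers from the right are "favourable".
    return int((foot == "R" and side == "left") or (foot == "L" and side == "right"))


# Hypothetical kicks: (preferred foot, side of the pitch)
kicks = [("R", "left"), ("R", "right"), ("L", "left"), ("L", "right")]
flags = [favourable_side(foot, side) for foot, side in kicks]
print(flags)  # [1, 0, 0, 1]
```

A flag like this would sit alongside the distance and touchline terms as one more input to the regression.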

Distance to Posts and from Touchline (metres).    46 and 1.    46 and 5.    46 and 10.
Success Rate from Favourable Side.                44.4%        51.5%        60.0%
Success Rate from Unfavourable Side.              39.4%        46.4%        55.0%

The idea that there is a favourable side to kick from appears to be supported by the results of the regression. In the table above, I've used a distance of 46 metres to the posts as a yardstick and the likely success rates have been calculated from over 2000 kicks taken from 10 metres or closer to the touchline. Kickers' foot preference was taken from YouTube or still images from Google.

Kicking from the preferred side from virtually the touchline increased the success rate by nearly 13% in relative terms (five percentage points) compared to potentially kicking across the face of the posts. A typical kicker would land around 39 kicks per hundred attempts from his unfavoured side, but that figure would increase to 44 if he swapped touchlines. Similar splits were recorded as the ball was moved further infield, although the relative improvement declined as the kick moved closer to the centre of the goals. At 10 metres from the touchline the relative difference between conversion rates had fallen to 9%. The chance that the difference between sides was merely down to random variation was around 1 in 50, so it is likely that we are seeing a real effect, and the effect persists for all distances of kicks from near to the sideline.
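The percentages quoted above are relative improvements rather than percentage point gaps. A short sketch, using only the success rates from the table (the variable and function names are my own), shows both measures side by side:

```python
# Success rates (%) from the table, keyed by metres in from the touchline.
favourable = {1: 44.4, 5: 51.5, 10: 60.0}
unfavourable = {1: 39.4, 5: 46.4, 10: 55.0}


def uplift(metres_in):
    """Return (percentage point gap, relative improvement in %) for one row."""
    fav, unfav = favourable[metres_in], unfavourable[metres_in]
    points = fav - unfav
    relative = 100 * (fav / unfav - 1)
    return round(points, 1), round(relative, 1)


for metres_in in (1, 5, 10):
    print(metres_in, uplift(metres_in))
# 1 (5.0, 12.7)
# 5 (5.1, 11.0)
# 10 (5.0, 9.1)
```

The gap stays close to five points throughout, so the relative gain shrinks only because the base success rate rises as the kick moves infield.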

As we move nearer to the centre of the pitch, statistical significance for the favoured kicking term is lost. Kicks a metre either side of a perpendicular line through the halfway line, in other words nearly dead centre to the posts, show a much reduced advantage for kicks attempted from the favoured side. However, there is a nearly 70% chance that this difference has arisen just through chance.

As far as I can tell, Wilkinson despite his right footed drop goal, always converts kicks with his favoured left. Again possibly a throwback to a static ball being harder to tame than a moving one, but it is possible that he was also playing the angled percentages when he connected with that ball over a decade ago. In short, the numbers appear to support the theory of a favoured kicking side. However, all of the theory goes out of the window, if you are as good as Dan Carter and you can do this!

Crunching The Numbers For Super Bowl 47.

Super Bowl XLVII takes place on Sunday evening or possibly Monday morning for some, so this will be the last NFL post for over eight months. For this guest post I've broken down the passing and rushing efficiency stats for each side and then combined them to produce an expected margin of victory based on the actual outcome of hundreds of similar matchups spread over the previous decade.

The game looks like producing a competitive contest. To find out who I think will win by just over a field goal, follow this link

Monday, 28 January 2013

What Chance Bradford City ?

Prior to last season, the average end of season league position for Swansea since the 2000/01 season was 56th in the league pyramid. The comparable figure for Bradford, their Capital One Cup opponents later next month, was 58th. Despite this near identical average record, over the ten seasons or so since 2000, the respective fortunes of both sides have taken very different courses.

The Yorkshire side began the decade by quietly slipping out of the top flight, 16 points short of safety, after just two seasons of Premiership football. There then followed a descent into the lower leagues, punctuated by two spells in administration and culminating in relegation to the bottom flight at the end of the 2006/07 season.

Swansea have taken a mirror image trip in the opposite direction, climbing from the lower reaches of the Football League to briefly share playing time with Bradford in League One. They were closest during the 2005/06 season, when they finished sixth and eleventh respectively, before Swansea pushed on to Premiership promotion for the 2011/12 season.

The Respective Rise & Fall of Swansea and Bradford.

If you want to check the league position of other sides over the decade check out this interactive tableau graph.

The League Cup has often provided the lesser clubs a route to silverware and European participation. Cardiff represented Wales last season, this time as underdogs to Kenny Dalglish's eventual winners, Liverpool and prior to that Birmingham denied Arsenal a trophy in the Blues' relegation season. However, this year provides one of the biggest mismatches for English domestic football's first trophy decider of the season.

The records of inter divisional games in such competitions as the FA and League Cups, as well as the minor lower league knockout competitions, have provided a large number of matches involving teams from different leagues. We can therefore use the scorelines from this interlocking format to produce a numerical comparison of the relative qualities of sides from vastly different positions within the league pyramid.

But an alternative method can be used involving calculating how much the top of one league overlaps with the bottom of the next highest league. When teams are promoted or relegated between divisions we can compare their season long records in the two divisions to again quantify the difference in quality as we move either up or down the pyramid. For example, teams who have dropped out of the Premiership have, on average seen their goal difference improve by around a goal per game in the less challenging Championship.

It is therefore possible to relate Swansea's current goal difference per game to that of a typical relegated side, which in turn can be equated to an average goal difference when they revert to being a Championship side. The process can be continued down through the divisions until we reach League Two, where Bradford City currently reside. So instead of viewing each league as a separate entity, we can look at the four divisions as a single group of 92 teams where there are varying amounts of overlap. Therefore, although there are currently 70 places between Swansea and Bradford, the effective gap from a quality standpoint is less because, for example, teams at the bottom of the Premiership will occupy a similar standing to some of the teams at the head of the Championship.

Our final step to quantifying the actual difference in quality between Swansea and Bradford is to first determine the effective number of places between the sides, having accounted for league overlap and then use historical results to find the average difference in class in goals that a one place change in position in the league hierarchy represents.

The average difference between sides next to each other in the lower half of the Premiership and the rest of the Football league is around 0.05 of a goal and the number of places between Swansea and Bradford, once doubly occupied positions are allowed for is only half the actual difference in numerical order. Combining these figures, we find that the current difference between the two sides is 1.73 of a goal.
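The arithmetic behind that figure can be sketched as below. The inputs are the rounded numbers quoted in the text (70 places, an overlap that halves the gap and 0.05 of a goal per place), so the result lands near, rather than exactly on, the 1.73 goals quoted, which presumably rests on unrounded values. The function and parameter names are my own.

```python
def goal_supremacy(places_apart, overlap_factor=0.5, goals_per_place=0.05):
    """Estimate the goal gap between two sides from their league positions.

    places_apart:    raw number of places between the teams in the 92 club list
    overlap_factor:  share of that gap left after allowing for league overlap
    goals_per_place: average goal difference between adjacent sides
    """
    effective_places = places_apart * overlap_factor
    return effective_places * goals_per_place


print(round(goal_supremacy(70), 2))  # 1.75
```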

Swansea, well to the front to taste Cup glory in 2013.

Goals are an extremely useful measure of the relative abilities of teams, but on a single match basis, win, draw and loss probabilities convey more information. The Poisson distribution, which gives the likelihood of an event occurring a given number of times when we know the average rate at which that event is expected to happen, can be used to model individual correct scores and ultimately the probability of match outcomes.

The limitations of the Poisson approach are well known, particularly the underestimation of the draw, but typically a side with an expected advantage in the region of 1.73 goals would win such a game 74% of the time, draw 18% and lose 8%.
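A minimal sketch of the Poisson step follows, assuming each side's goals are independent and that the match averages 2.6 goals in total. That total is my assumption and is not a figure from the post. Because a plain independent Poisson understates the draw, the output will be in the same region as the 74%, 18% and 8% quoted above without matching them exactly.

```python
from math import exp, factorial


def poisson_pmf(goals, rate):
    """Probability of exactly `goals` goals when the expected number is `rate`."""
    return exp(-rate) * rate ** goals / factorial(goals)


def match_probabilities(supremacy, total_goals=2.6, max_goals=15):
    """Return (win, draw, lose) for the favourite under independent Poissons.

    supremacy:   favourite's expected goals minus the underdog's
    total_goals: assumed expected goals in the match (hypothetical default)
    max_goals:   scorelines above this are ignored as vanishingly unlikely
    """
    favourite_rate = (total_goals + supremacy) / 2
    underdog_rate = (total_goals - supremacy) / 2
    if underdog_rate <= 0:
        raise ValueError("total_goals must exceed supremacy")

    win = draw = lose = 0.0
    for fav_goals in range(max_goals + 1):
        for dog_goals in range(max_goals + 1):
            p = poisson_pmf(fav_goals, favourite_rate) * poisson_pmf(dog_goals, underdog_rate)
            if fav_goals > dog_goals:
                win += p
            elif fav_goals == dog_goals:
                draw += p
            else:
                lose += p
    return win, draw, lose


win, draw, lose = match_probabilities(1.73)
print(f"win {win:.1%}, draw {draw:.1%}, lose {lose:.1%}")
```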

All League Cup ties are currently decided on the day or night, first after 90 minutes, then over 30 extra minutes, if required and finally by penalties. So the 8% chance given to Bradford over normal time only accounts for one of the three routes by which they may lift the trophy at Wembley in a month's time.

To take the game into extra time, Bradford need to draw over ninety minutes, an 18% chance based on interlocking league form and applying a Poisson to the expected goal difference. Once the game reaches extra time, Swansea's pregame advantage will be a fraction of their initial supremacy, due to the now truncated nature of the contest, but they will still be the most likely winners by a considerable margin. The Swans would be about a coin toss to prevail in the extra 30 minutes, a draw would be about a 39% chance and Bradford would have a similar chance of winning as they had in the original 90 minute stretch.

The third possible route to victory for either side involves drawing at both full time and extra time and then winning a shootout, for which I've allowed each side a 50% chance of being successful, despite Bradford's apparent prowess from 12 yards.

Chances of Each Team Winning the League Cup By Every Available Route.

TEAM.             Win in 90 Minutes.    Win in ET.    Win on Penalties.
Swansea City.     74%                   9.4%          3.5%
Bradford City.    8%                    1.6%          3.5%

Overall, a magnificent feat by Bradford City to reach the first showcase final of the season, but their 13% overall chance means they are still long outsiders to parade the cup in February.
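The route by route figures can be reproduced by chaining the probabilities, as in the sketch below. The 90 minute numbers and the 39% extra time draw come from the text. The extra time win chances of 52% and 9% are my own back-solved assumptions, chosen to be consistent with "about a coin toss" and with the table.

```python
def cup_win_routes(p_win_90, p_draw_90, p_win_et, p_draw_et, p_win_pens=0.5):
    """Return the chance of lifting the cup in 90 minutes, in ET and on penalties."""
    in_90 = p_win_90
    in_et = p_draw_90 * p_win_et                    # must first draw over 90 minutes
    on_pens = p_draw_90 * p_draw_et * p_win_pens    # must draw twice, then win the shootout
    return in_90, in_et, on_pens


swansea = cup_win_routes(0.74, 0.18, p_win_et=0.52, p_draw_et=0.39)
bradford = cup_win_routes(0.08, 0.18, p_win_et=0.09, p_draw_et=0.39)

print([round(100 * p, 1) for p in swansea])   # [74.0, 9.4, 3.5]
print([round(100 * p, 1) for p in bradford])  # [8.0, 1.6, 3.5]
print(round(100 * sum(bradford), 1))          # 13.1
```

Summing Bradford's three routes gives the 13% overall chance quoted above.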

Friday, 25 January 2013

Rating Rugby Union Kickers By Kick Difficulty

Rugby Union, much like football, is a fluid, fast moving sport, which requires a large amount of context related data to adequately describe on field events. However, rugby also has a reasonably large array of discrete events, such as kicks at goal, where the object of the play is well defined. The quality of shooting chances in football can be calculated with reasonable clarity by using such information as the x and y field coordinates of the event and the same can be done using a logistic regression approach for penalty and conversion attempts in rugby.

 OptaPro have kindly supplied kicking data from a wide range of rugby competitions from 2010 through to 2012, including English domestic league matches, European and domestic cup games, Internationals and Six Nations matches as well as domestic and Tri Nations Southern Hemisphere contests. Some of the data also included age restricted matches from the under 20’s Six Nations and the IRB under 20’s World Championship.

A model for the expected success rate for kicks at goal can be constructed using knowledge of the position, distance and angle from where the attempt was taken, combined with the actual outcome. We can then use this scoring expectation to not only rate the records of individual kickers more accurately by accounting for the relative difficulty of the kicks they attempted, but also chart the career development of young players. We can also begin to investigate the drop off in kicking effectiveness, both generally and individually with increasing distance and this opens up the development of in game strategies which may decide whether a side decides to kick for goal or kicks for the corner.

Common sense tells us that increasing distance from goal and an increasingly wide kicking angle will make a kick less likely to be successful. For example, a kick twenty metres out and in front of the posts, based on the data from 2010-2012, will be successfully kicked about 95% of the time. However, move the ball to the touchline, so that the ball is still twenty metres from the tryline, and the kick becomes much more difficult. A combination of the angle and the increased actual distance to the posts now makes the kick just a 60% chance to be converted by an average kicker.
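To see why the touchline kick is so much harder, the geometry can be sketched as below. It assumes a 70 metre wide pitch (so the touchline is 35 metres from the centre line, the maximum the laws allow) and uprights 5.6 metres apart, and it ignores the height of the crossbar. The function name and inputs are illustrative and are not part of the original model.

```python
from math import atan2, degrees, hypot

POST_GAP_M = 5.6  # distance between the uprights


def kick_geometry(metres_from_tryline, metres_from_centre):
    """Return (distance to the middle of the posts, angular width of the target).

    metres_from_tryline: how far out the ball is placed
    metres_from_centre:  sideways offset from a line through the middle of the posts
    """
    distance = hypot(metres_from_tryline, metres_from_centre)
    # Angle to each upright, measured from the line running straight downfield.
    near_post = atan2(metres_from_centre - POST_GAP_M / 2, metres_from_tryline)
    far_post = atan2(metres_from_centre + POST_GAP_M / 2, metres_from_tryline)
    return distance, degrees(far_post - near_post)


central = kick_geometry(20, 0)      # in front of the posts
touchline = kick_geometry(20, 35)   # same depth, from the touchline (assumed 70 m pitch)
print(f"central:   {central[0]:.1f} m, target {central[1]:.1f} degrees wide")
print(f"touchline: {touchline[0]:.1f} m, target {touchline[1]:.1f} degrees wide")
```

Under these assumptions the touchline kick travels roughly twice as far and the kicker sees only about a quarter of the target width.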

 I’ve initially included all players in the sample in setting a baseline for kicking ability, so the group includes promising youngsters, such as Owen Farrell, who played in the IRB under 20’s World Championship, as well as for the full England team during the course of the data and also includes household names in the later stages of their careers, such as Jonny Wilkinson and Andy Goode.

The analysis compares the expected success rates for kicking attempts, comprising both penalties and conversions, based on the field position for each kick. I’ve used an arbitrary cut off point of a minimum of 75 attempts to qualify for inclusion in the final list, because small sample sizes can often produce extreme outliers which can make players appear either much better or much worse than they actually are. Most Southern Hemisphere players are omitted from this initial analysis.

Wales’ Leigh Halfpenny tops this fleeting glimpse at recent kicking ability. The data omitted the Celtic league clashes, so his figures originate from International, World Cup and European games and his number of attempts just exceeded 100. Judged on kicking field position, an average expected return was 71 successes from 107 kicks at goal and Wales’ full back and now fulltime kicker made 88, almost 25% higher than an average player from the dataset would expect to achieve.

Expected Successes. Actual Successes. % above Expected.
Leigh Halfpenny.
Charlie Hodgson.
Tom Homer.
Daniel Carter.
Dan Biggar.
Greig Laidlaw.
Ian Humphreys.
Dimitri Yachvili.
Ronan O'Gara.
Jonny Wilkinson.
Jonathan Sexton.
Andy Goode.
Stephen Jones.
Jimmy Gopperth.

Halfpenny originally took just the long range kicks for club and country and his total attempts have a much higher proportion of kicks from extreme distance than most others on the list. Over 16% of his kick attempts traveled 50 metres or more in the air and if we further break down his actual success rate compared to expectation, we see that he truly excels at these longer kicks. An average player’s expected success rate for Halfpenny’s kicks of 40 metres or more would have been 24 successes from 49 attempts and Wales’ full back actually made an astonishing 38. Overall Halfpenny is a slightly above average kicker at distances short of 40 metres, but exceptional beyond that. The quality and range of his kicks combine to make him the top kicker.
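The rating itself is simple arithmetic once each kick has a modelled chance of success: the expected total is the sum of those chances, and the kicker is judged on how far his actual total sits above or below it. The sketch below uses Halfpenny's quoted totals. The three per kick probabilities are hypothetical and only illustrate the summing step.

```python
def expected_successes(kick_probabilities):
    """Expected number of successful kicks is the sum of each kick's chance."""
    return sum(kick_probabilities)


def percent_above_expected(expected, actual):
    """Percentage by which actual successes exceed (or trail) expectation."""
    return 100 * (actual / expected - 1)


# Hypothetical trio of kicks with modelled success chances of 95%, 60% and 38%.
print(round(expected_successes([0.95, 0.60, 0.38]), 2))  # 1.93

# Halfpenny, all kicks: 71 expected, 88 made.
print(round(percent_above_expected(71, 88)))  # 24
# Halfpenny, 40 metres or more: 24 expected, 38 made.
print(round(percent_above_expected(24, 38)))  # 58
```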

Player.              Attempts.    Expected Successes.    Actual Successes.    % Above Expectation.
Shorter than 40 metres.
Ronan O'Gara.        123          100                    110                  +10
Jonathan Sexton.     170          137                    140                  +2
Longer than 40 metres.
Ronan O'Gara.        30           16                     16                   0
Jonathan Sexton.     63           33                     42                   +27

The inter play between a kicker’s range and his accuracy can also be demonstrated by comparing two current Irish internationals. O’Gara is nearing the end of his international career, while Sexton is just beginning his. The numbers from the table above can help to pinpoint their respective kicking strengths and weaknesses over the last couple of seasons.

Proportionally Sexton attempts more kicks beyond 40 metres than does O’Gara and in addition, the Leinster man is much more proficient at these longer attempts. In short, O’Gara, at this stage of his career is just an average long range kicker, while Sexton is very good. However, the reverse is true at distances shorter than 40 metres, although the gap in performance is narrower than at longer distances. Both are above the group average, but this time it is the Munster man who comes out on top.

Moving down to those who underperformed in the limited dataset, we find a mixture of seasoned kickers and young players.  Priestland’s presence possibly shows how Halfpenny became Wales’ fulltime kicker, rather than simply their long range specialist. The Scarlets player failed to match the average group expectation at all distances, although the figures are likely to be less extreme in a larger sample.

Jonny Wilkinson, still a top kicker.

The presence of younger players in this under achieving selection should be expected. Age appears to be a significant factor in a kicker’s development. From the data presented, a typical under 20 player would be expected to hit a 54 metre conversion 27% of the time from in front of the posts, compared to 38% for a typical Aviva Premiership kicker. The size of the talent gap begins to narrow for central kicks as we move closer to the posts and from 20 metres out the gap is down to 95% for seniors compared to 91% for younger players. Tentatively we may conclude that the difference is at least partly down to less leg strength, rather than lack of talent for the younger group.

Player.               Attempts.    Expected Successes.    Actual Successes.    % Below Expectation.
Rhys Priestland.      83           60                     50                   -16
George Ford.          158          113                    101                  -11
Matthew Morgan.       84           65                     59                   -9
Tom Heathcote.        118          88                     80                   -9
Nicky Robinson.       298          214                    196                  -9
Sam Vesty.            75           58                     53                   -8
Jeremy Staunton.      78           59                     56                   -6
Billy Twelvetrees.    132          95                     91                   -4
Ignacio Mieres.       188          138                    133                  -4
Ryan Davis.           105          77                     76                   -2
Ryan Lamb.            269          194                    192                  -1
Alex Goode.           100          75                     74                   -1
Olly Barkley.         215          154                    154                  0
Toby Flood.           369          275                    277                  +1

This analysis illustrates how simple numbers can be used to enhance visual and anecdotal evidence. Kicker development can be evaluated more easily, along with strengths and weaknesses which can then be used to target improved performance.

Received wisdom can also be tested. It is commonly supposed that right footed kickers are better able to convert touchline kicks from the left than from the right because the natural trajectory of the strike is towards the posts in the former case. Using this data there is a small, but statistically significant difference between conversion rates from different sides of the pitch, amounting to about a 2% improvement when a kicker kicks from his more “natural side”.

We can also begin to investigate whether Southern Hemisphere kickers have a better record than a similar group of Northern Hemisphere counterparts, either through better skills or the advantages of occasionally kicking at altitude. Super 15 kickers did post slightly better records than their Aviva counterparts, but not by a significant amount. The ultimate aim of such studies is to provide more informed decision making during matches. Players and teams are obviously making such decisions already, such as declining to kick for points if the kicker feels the distance is beyond his boot strength, but Opta are making it achievable to quantify such decisions.

If a coach had accurate information on his kicker’s likely success from a certain point on the field, it is a simple process to convert this to a point expectation for that kick. For example a penalty with a 50% chance of success has a point expectation of 1.5 points. If instead the decision was made to kick to touch for a lineout near to the opponent’s try line, this new phase of play would have to yield a try around one time in ten attempts, along with the chance of an easier penalty kick to make the new decision equivalent in terms of the average expected points of the kick attempt. Such decision making processes, where the risk and reward of different decisions are evaluated and balanced are fast becoming an accepted way of coaching in similar sports such as the NFL.
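The comparison in this paragraph can be sketched as an expected points calculation. The 50% penalty and the one in ten try rate come from the text. The 35% chance of winning an easier penalty and its 95% success rate are hypothetical numbers chosen purely to show where the break even point sits, and the conversion after a try is ignored for simplicity.

```python
PENALTY_POINTS = 3
TRY_POINTS = 5


def kick_value(p_success):
    """Expected points from kicking the penalty at goal."""
    return PENALTY_POINTS * p_success


def lineout_value(p_try, p_later_penalty, p_later_kick):
    """Expected points from kicking to touch instead.

    p_try:           chance the lineout phase ends in a try
    p_later_penalty: chance the phase instead yields an easier penalty
    p_later_kick:    chance that easier penalty is then kicked
    """
    return TRY_POINTS * p_try + PENALTY_POINTS * p_later_penalty * p_later_kick


print(kick_value(0.5))                           # 1.5
print(round(lineout_value(0.10, 0.35, 0.95), 2))  # 1.5
```

With a try one time in ten worth only half a point on average, the remaining point has to come from the easier penalties, which is why both parts of the lineout option matter.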

 Most importantly, from a spectator viewpoint, these numbers can settle arguments and by more accurately rating kickers by kick difficulty we can confirm that Jonny Wilkinson remains a kicker of the highest quality.

For new visitors to this post, especially from Australia, I've done a follow up post on the kicking in the First Lions test here

Sunday, 20 January 2013

Logistic Regression and Predicting Sporting Outcomes.

There is often a subtle difference between why a team wins a particular match and why they might win a match in the future. Sometimes teams win games because of hugely significant and obviously important events, such as red cards for their opponents or penalty kicks awarded. However, these events tend to occur rarely and predicting when they may feature in the future cannot be done with any great confidence.

So to predict the likely outcome of a future sporting event, we need to use more common events, which correlate well with both themselves and match outcome. We are looking at goals or shots if you prefer in football and yardage stats in the NFL. In this guest post I look at how we can use logistic regression to predict game outcomes in the NFL.

Follow this link to read more.

For a similar approach for soccer, check out Zach Slaton's post at A Beautiful Numbers Game.

Wednesday, 16 January 2013

...And Then There Were Four. The NFL Championship Games.

Firstly, I'd like to reassure any readers who prefer their brand of football to come without points, that this blog is not turning by stealth into one based around American Football. However from 2001 to 2007 I posted almost exclusively about gridiron, so I couldn't resist the temptation to follow this year's playoffs for just one more post. Or possibly two.

Sunday will see the NFC Championship game decided when Atlanta host San Francisco and three and a half hours later, New England entertain Baltimore for the AFC crown in a match that will finish around 3 am Monday, UK time.

As in soccer, the simplest metrics are often also the most powerful and the ability to efficiently move the ball, either on the ground or through the air in the NFL acts as an excellent proxy for team ability, especially if the unbalanced schedule is allowed for.

How one defense matches up with the opposing offense also has a big say in the long term outcome of NFL games. A potent passing attack can become merely pedestrian in the face of an excellent passing defense, whilst a run of the mill thrower can have a career day if faced by a poor secondary. Raw ability matters in the NFL, but it is the matchups that go some way to deciding the likely result and if you compile enough pregame matchups and compare them to actual results you can produce a passable predictive model.

Nate Silver was brave enough to put his predictive credentials gained in the field of politics and to a lesser degree baseball on the line on TV this weekend by plumping for a Seattle/New England Superbowl on February 3rd. That prediction is no longer an option following Seattle's demise in Atlanta and predictably Silver's reputation as a soothsayer is seen as tarnished in some quarters because his prediction was wrong. However, all any predictive model can do is produce estimates of how likely an event is to happen. An event may be predicted to happen 60% of the time, but that also means it won't happen 40% of the time. 40% is the minority event, but it is still going to happen fairly regularly.

A couple of statistically based models made Seattle narrow favourites in Atlanta (I made them 52% favourites), but the numbers were close enough to call the game a coin toss. So if we called heads and saw a tail, we wouldn't be too surprised. Disappointed maybe, but not surprised. That's pretty much what occurred in Atlanta: a single trial in which the flipped coin failed to go that extra half revolution in the last 25 seconds. A failed prediction on the day, but only time and numerous repeats of similar predictions can validate the model from which the single prediction originated.

Silver himself tweeted a version of Billy Beane's famous quote from Moneyball, "my sh*t doesn't work in the playoffs" at the end of the regular season. The truth is that no one's sh*t works that well in small sample sizes, which is what the playoffs are. Ten games and a (usually) neutral venue, inter conference showdown is all there is, and random luck and a sudden death format are going to have a huge influence on individual games and the final destination of the Vince Lombardi Trophy.

I've looked back at every playoff prediction made by my corrected yardage based model from 2001 to 2007, excluding the actual Superbowl games which were all played on a neutral field. Some of the individual results make the model appear absurd. For example Pittsburgh to beat New England in the AFC Championship games of 2001 and 2004, both with Steeler chances of around 70% and the Patriots won both games. Baltimore (66%) to beat Tennessee in the 2003 AFC wildcard game, the Titans won and Dallas (74%) to see off the Giants in the 2007 NFC Divisional game, Manning junior won.

Individual games aren't a good advert for predictive models because they exaggerate their effectiveness if they do well over a run of half a dozen games or they appear useless if they continually pick favourites in the short term which then go on to lose. However, the same model that appeared to overrate the Steelers and discount the Pats also predicted 43 home wins from the 70 playoff games from 2001-2007 and the actual number was 44. The more times you run a predictive model, the more information you gather as to the effectiveness of that model. Predicting the result of an election is much easier than predicting the result of one NFL game, or even ten.

With that thinly disguised excuse over, let's move on to each game.

NFC Championship Game.

Team.              Offensive Run Efficiency.    Offensive Pass Efficiency.    Defensive Run Efficiency.    Defensive Pass Efficiency.
San Francisco @    1.15                         1.15                          0.86                         0.86
Atlanta.           0.86                         1.05                          1.12                         1.04

Win % for Atlanta. 33%    Point Spread. +5

We have the same NFC West superiority on show here as last week. Offensive numbers in excess of 1.0 indicate that the team is superior to an average offense once the strength of opponents has been accounted for. The 49ers have faced some fairly poor run defenses, which have allowed 4.4 yards per carry, but the west coast side's 5.1 yards per carry average indicates that they are still well above average. They show a similar level of quality in their passing game.

The matchup with Atlanta's defense doesn't improve matters for the hosts. On this side of the ball numbers above 1.0 indicate that a defense is below average. Atlanta have allowed teams which averaged 4.3 yards per carry in their combined 250+ games to rush for 4.8 yards per carry when faced with the Falcons' defense. The Falcons' passing defense, while an improvement, is also the wrong side of par.
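One plausible reading of these efficiency figures is a simple ratio of what a team achieved to what its opponents usually allow (for offense) or usually gain (for defense). That reading is my assumption and the model may well apply further corrections. The sketch below applies it to the yards per carry numbers quoted in the text.

```python
def adjusted_efficiency(team_yards_per_play, opponents_usual_yards_per_play):
    """Ratio of a team's yards per play to its opponents' usual figure.

    For offense, above 1.0 is better than average.
    For defense, above 1.0 means the unit concedes more than opponents usually gain.
    """
    return team_yards_per_play / opponents_usual_yards_per_play


# 49ers rushing offense: 5.1 yards per carry against defenses allowing 4.4.
print(round(adjusted_efficiency(5.1, 4.4), 2))  # 1.16
# Falcons rushing defense: 4.8 allowed to teams that usually average 4.3.
print(round(adjusted_efficiency(4.8, 4.3), 2))  # 1.12
```

The second figure matches the table, while the first comes out a shade above the 1.15 shown, presumably because the quoted yards per carry are themselves rounded.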

...and a 67% Chance of a Superbowl Appearance in 2012.

After combining the paired statistics, the numbers make ugly reading for the hosts. They look likely to struggle moving the ball on offense, especially on the ground and face a San Francisco side which should be able to move the ball easily, particularly on the ground. Similar matchups from the past indicate that the visitors should win such a game about 67 times out of 100, by an average of 5 points. A bag, two red balls and one slightly off coloured red ball should suffice for a crude match simulation.

AFC Championship Game.

Team.           Offensive Run Efficiency.    Offensive Pass Efficiency.    Defensive Run Efficiency.    Defensive Pass Efficiency.
Baltimore @     1.04                         1.00                          0.96                         0.97
New England.    0.97                         1.17                          0.93                         1.16

Win % for the Pats. 60%    Point Spread. -3.5

Nothing much has changed about the Pats since they upset the Rams in 2001. They are rarely above average at running the ball, care very little about defensive vulnerability, even through the air, and dare you to keep up with Tom Brady. Just as the Falcons of recent times have outplayed their stats by maintaining a, usually regression prone, high third down conversion rate (2nd best, 6th and 3rd since 2010), the Pats are also better than their raw stats. Outstandingly talented quarterbacks are often the reason. Manning performed the same trick for years at Indianapolis by continually taking a poor defense and limited running game deep into the post season, cementing each win by continuing to pass the football. It is no surprise to see New England favoured by just over a field goal, when experience tells you the margin should probably be at least twice that.

At quarterback for Baltimore, Flacco has spent his career trying to make the step up to elite, but remains in limbo just off top class. He's been average this season. Defensively Baltimore are above average, but some way off the frighteningly good unit of recent times. Age is finally taking a toll.

Matchups indicate that both sides will be above average through the air, but struggle on the ground. Home field accounts for much of the advantage predicted by the model for the Patriots, but Belichick puts the ball in Brady's hands often and opponents find themselves having to deal with a greater than usual aerial assault. Baltimore will probably have to throw more and run less than they are accustomed to just to keep up, and the Patriots' half of Silver's Superbowl prediction would appear to be a reasonable expectation come Monday morning in the UK.

Monday, 14 January 2013

Grabbing Victory/Defeat from The Jaws of Defeat/Victory.

Sporting contests have to get the scoring system finely balanced between providing ample scoring events to keep the neutral interested and keeping games close enough to produce the kind of comebacks that were seen this weekend in both the EPL and the NFL. Soccer takes the route where scoring is difficult enough that a goal is an event in itself and teams are rarely able to race away on the scoreboard. The NFL opts for more frequent, slightly easier, tiered scoring, where caution whilst leading can be punished and adventure whilst trailing well rewarded.

The first unlikely comeback of the weekend came at Reading, where the home side trailed by two goals to a WBA side currently drifting slowly back to earth after a hot start to the season. Just two wins in the last ten matches have reinforced the belief that the Baggies aren't quite ready to challenge for Champions League football, but they were still far enough ahead of Reading to be marginal pregame favourites.

0-1, Lukaku, 19'
0-2, Lukaku, 69'
1-2, Kebe, 82'
2-2, Le Fondre (pen), 88'
3-2, Pogrebnyak, 90'

Two goal leads are usually well defended (see here and 5addedminutes ) and with the clock at 81 minutes, a come from behind win was an extremely remote possibility. Late goals, scored in a losing position quickly shed value as the clock ticks towards fulltime and Reading's 1.5% chance of winning when they pulled a goal back at 1-2 in the 82nd minute had fallen back to below 1% just prior to their equaliser.

From Zero to 92% in Nine Minutes. Reading's Chances Just Prior to Each Goal.

Scoreline. Chance of a Reading Win. Chance of a Draw.
0-2, 81 mins. 0.1% 2%
1-2, 87 mins. 0.7% 10%
2-2, 89 mins. 9% 82%
After Third Goal, 3-2, 90 mins. 92% 8%

Dramatic wins appear even more so if they occur in a high profile match and although the three points may prove vital to Reading's survival chances come May, the impact of Saturday's result was partly diluted by its appearance on a mundane Premiership weekend.

Equally dramatic, but of a much higher profile was Seattle's late comeback, followed immediately by an even later collapse in the Divisional round of the NFC playoffs in Atlanta. In addition to playing last week at home, while Atlanta rested, West coast Seattle had traveled from the Pacific to the Eastern time zone, leaving their body clocks feasting on breakfast as Atlanta built up a 20-0 halftime lead.

As suggested here, Seattle's above average passing and running offense gave them potentially a very good matchup against Atlanta's below average defense and, belatedly, the Seahawks became competitive on both sides of the football. The NFC West side's chances of winning their contest never fell to the levels seen by Reading, but their lowest in game position came just before the end of the 3rd quarter as Atlanta took a 27-7 lead following Ryan's pass to Snelling.

Seattle's In Game Highs and Lows.

Scoreline. Time Remaining. Chance of a Seattle Win.
7-27, 1st & 10 (Seattle) at own 20. 2 min 11 secs, 3rd Q 2%
28-27, 1st & 10 (Atlanta) at own 28. 25 secs, 4th Q 93%

Reading's late winner was seen as almost inevitable by some, but their chances at 2-2 weren't much better than 10%. Many felt Seattle were equally destined to convert their last 2 minute drive into a game leading score and it was Wilson's sack escaping toss to Lynch for a 24 yard gain to the Atlanta 3 with 44 seconds left that finally, if fleetingly made Seattle favourites to progress to the NFC Championship game.

86% quickly became 93% as Lynch completed the drive by the slimmest of possible margins. Unfortunately, 40 yards in 25 seconds, even against a defense which statistically has your number is perfectly possible against a soft prevent strategy and two passes and one field goal later, the comeback had died.

Seattle will look back on those two completed Ryan passes, when with time running out their defense should have been daring Ryan to beat them instead of sitting back and hoping for a mistake, but two late second quarter visits to the Atlanta 11 were equally decisive. Even at third and long with little time left or fourth and 1, a team of Seattle's quality should be averaging between two and four points from such field positions, but on each occasion they came up scoreless in a game they lost by two.

Sport at its best for the neutral, rather more extreme emotions for the respective supporters.

Sunday, 13 January 2013

When Does NFL Quality Shine Through.

America's love affairs with celebrity and sport collide in a couple of weeks when Superbowl XLVII takes to the stage to decide the world champions of American football. Audience figures rival those for truly inclusive sports, such as Olympic games and FIFA world cup finals and for one night only a game played almost exclusively on one continent holds centre stage. The sporting contest on the field usually manages to eclipse a 45 minute halftime show, complete with rock concert.

Despite the shared name, American football is a much easier sport to dissect numerically than association football. Significant measurable events which have a major impact on match outcome are much more numerous (a side will attempt to move the ball between 50 and 60 times per game) and the extreme stop start nature of the contest ensures that every inch of territory gained or lost comes with an attached change in game winning probability. How efficiently a team moves the ball either on the ground or through the air as measured by yards per attempt is an excellent proxy for team ability, enabling match predictions to go hand in hand with a couple of easily digestible headline statistics.

Almost as attractive from an analytical viewpoint are the inevitable strength of schedule issues which persist throughout a competition where 32 teams compete in a minimum of 16 games each. During this curtailed season, sides play other sides either once, twice or occasionally three times, but more often, not at all. Correcting raw efficiency stats for this constantly shifting strength of schedule can reveal just how good a side is, whereas traditional win loss records in a 16 game season can often fall short.

Let's Play Some Football !

In this guest post, I outlined the stats which are most useful in predicting match outcomes in the NFL and as the post season rapidly reaches a conclusion it seems an ideal opportunity to re visit the topic. Running the football comes with a much higher guarantee of retaining possession, but gains are often steady if unspectacular and passing the ball carries higher risks of incompletion or interception, but potentially has bigger rewards.

The average gain from running the football usually falls around four yards per carry and it is tempting to consider a side which averages in excess of four yards as above average. However, a side may have faced a large number of particularly porous run defenses and their rushing efficiency may be inflated by poor opponents. In reality they may actually be a below average running team.

This problem is most easily solved by comparing the average yards gained by a side to the average yards per attempt allowed by the combined pool of games played by their opponents to date. Tampa's 4.41 yards per carry over the year appears impressive, but it was achieved against defenses which allowed an average of 4.41 yards per carry, making the Bucs no more than an average rushing side in 2012.
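The correction described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the corrected figure is simply a team's yards per attempt divided by the average allowed by its opponents, which is consistent with the 1.00 centred efficiency tables in this post. The function name is illustrative only.

```python
def corrected_efficiency(team_yards_per_attempt, opponents_allowed_per_attempt):
    """Ratio of a team's yards per attempt to the average its opponents allow.

    Assumption: a simple ratio, so 1.0 is league average for the schedule
    faced, above 1.0 is better than average and below 1.0 is worse.
    """
    if opponents_allowed_per_attempt <= 0:
        raise ValueError("opponents' average must be positive")
    return team_yards_per_attempt / opponents_allowed_per_attempt


# Tampa Bay's 2012 rushing figures quoted in the text.
tampa_rushing = corrected_efficiency(4.41, 4.41)
print(f"Tampa corrected run efficiency: {tampa_rushing:.2f}")
```

The same function applies unchanged to passing and to the defensive side of the ball, where a figure below 1.0 is the desirable one.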

Correcting efficiency stats on both the offensive and defensive side of the ball for opponent ability quickly produces a very potent indicator. The number of observations taken quickly reaches three figures and after just four games a team's previous opponents will already have accumulated combined data from 16 matches, an entire season for a single team. Performance indicators taken in this way also begin to stabilize very quickly. The best teams after a month of the season as measured by corrected efficiency rates are very often the teams involved in the post season lottery.

The uses of such rushing and passing stats are many and varied. At a team level strengths and weaknesses are readily seen and by pairing one team's set of stats with their opponents and regressing actual game outcomes against these numbers, pregame winning and losing estimates can be produced for future games.

In September of last year in the linked post, I ranked all thirty two teams based on their corrected rushing and passing abilities on both sides of the ball after just four games had been played. Baltimore ranked number one and now contest the AFC championship game after beating Denver in overtime, San Francisco were second and they are also in the NFC championship game. Houston, ranked third after a month, also contest an AFC divisional game in New England and overall six of the eight teams who made it through to the divisional round were ranked in the top 10 as early as October.

Using Efficiency Stats To Predict Game Outcome.

Team. Offensive Run Efficiency. Offensive Pass Efficiency. Defensive Run Efficiency. Defensive Pass Efficiency.
Seattle @ 1.13 1.18 1.04 0.88
Atlanta. 0.86 1.05 1.12 1.04
Win % for Atlanta. 45% Points Spread. +2

Seattle visit Atlanta later today. Seattle have been the biggest improvers since I last looked at these figures back in October. Their corrected passing stats have gone from being below average to well above average. The 'hawks are below average at defending the run, allowing 104% of average yardage, but very good against the pass, where they allow just 88% of average passing yardage. Proportionally it is better to be good at defending the pass or passing the ball compared to identical levels of competence against the run.  The current NFL is an offensive game, but it is also increasingly an aerial one as well.

The bottom line, backed up by a couple of similar models is that Seattle is the better side here, even on the road. The caveats are that Atlanta along with Houston's opponents New England have regularly been able to out perform such models. Atlanta maintained an unlikely third down ability a few years back and New England consistently beat their stats (although Spygate may partly explain their better than expected results). Also Atlanta's bye week isn't incorporated into the final number.

Small sample sizes are no friend to models based on probable outcomes and you can't get smaller than one game, but Seattle have a shot at producing an all NFC West Championship game later today. Whatever the result, corrected efficiency stats are frequently potent tools in predicting who gets to the playoffs, whatever the outcome of the sudden death, post Christmas lottery.

Friday, 11 January 2013

The Premiership Title Race. What Chance Manchester City ?

Over much of its life the Premiership title race has been a contest lacking in runners. It is never less than absorbing for much of its course, even if you can merely admire the brilliance of a runaway winner, but the preseason list of potential winners would rarely qualify for each way betting were it actually a horse race.

Midseason is an ideal time to examine those contenders at the head of the table with an eye to quantifying the likely advantage that the current leaders have gained. More league matches have now been played than remain to be contested and only the January transfer window remains as a means to bridge the gap between Premiership triumph and mere Champions League qualification.

You have to go back five completed seasons before the team leading after 21 games failed to see through the task to May. On that occasion Arsenal led the table with 50 points, two points clear of eventual champions, Manchester United and six clear of Chelsea, who also toppled the Gunners for second place. Therefore, current table toppers, Manchester United enter 2013 in an extremely strong position, seven points clear of near neighbours, City and an impressive 13 in front of their next nearest contenders, Tottenham.

 Recent history would suggest that Sir Alex Ferguson should be celebrating yet another title come early May and that view is unsurprisingly reflected in the prices. However, forecasting events has to be couched in terms of uncertainty and probability. An event may be extremely likely to happen, but nothing is certain until all alternative outcomes have been completely extinguished.

Estimating United’s chances of wresting the title from City is dependent upon the size of their current points advantage, but smaller, more subtle influences are also at play. For example, a seven point lead is welcome at any time of the season, but the number of nearest challengers can make the gap seem more or less comfortable. A single pursuer turns the contest into a match bet, but four challengers, each within seven points, are potentially a much bigger threat.

A related factor is the overall quality of the division as a whole. The chasing pack will view a game against the current leaders as a vital opportunity to claw back their lead. But increasingly the challengers will hope that other teams, unconnected with the title race will be able to muster a performance capable of also toppling the quarry. A team’s overall quality will be made up of a small number of performances which eclipse their usual levels of performance and the chasers ideally want lesser teams to produce such performances against the leaders, but not the chasing pack. No one single game can be said to have won or lost a title, but there is no denying the huge influence Wigan’s surprise 1-0 win over Manchester United in April 2012 had on the increasingly closely fought title race. On that particular day, Wigan produced a performance well in excess of their overall season long ability and blew the 2011/12 title race wide open.

So to begin to make an informed decision regarding the likely destination of the Premiership trophy, we need to have views and information on the number of games left to be played, the quality of the leaders and the chasing pack, the number of credible challengers and lastly, the quality of the bit part players, who may potentially take a hand, without being actively involved in the finish.

Many factors are easily defined. Average number of games played denotes the stage in the season and points per game accrued by teams is a good proxy for ability and requires little strength of schedule correction at around halfway. The spread of ability within the division and closeness of the challengers can be adequately described by converting each team’s points per game figures to standard scores. Such conversions incorporate the average points per game total of all teams in the league and also how dispersed each individual total is around the mean.
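As a rough illustration of the standard score conversion, the sketch below turns a set of points per game figures into z-scores. It assumes the population standard deviation is used (the post does not say which form was chosen) and the five team league is entirely hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(points_per_game):
    """Map each team's points per game to a standard score within its league."""
    values = list(points_per_game.values())
    league_mean = mean(values)
    # Assumption: population SD, treating the league as the whole population.
    league_sd = pstdev(values)
    if league_sd == 0:
        raise ValueError("all teams have identical points per game")
    return {team: (ppg - league_mean) / league_sd
            for team, ppg in points_per_game.items()}


# Hypothetical halfway figures, for illustration only.
example_league = {"A": 2.4, "B": 2.0, "C": 1.4, "D": 1.0, "E": 0.7}
for team, score in standard_scores(example_league).items():
    print(f"{team}: {score:+.2f}")
```

A leader whose score sits well clear of the second highest is both ahead on points and unusually strong relative to the spread of the division, which is the combination the conversion is meant to expose.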

Runaway leaders who are also vastly superior to the overall talent pool are therefore, more easily identified. For example, Chelsea’s standard score for their points per game halfway average in 2004/05 was 2.43, with Liverpool their nearest challengers with a score of just 1.35. The most congested midterm title race of recent years was 2007/08 when United were locked together with Liverpool, followed closely by Chelsea.

To win the title, the champions, rather obviously must finish above every other team in the league. So by using our estimations of each side’s ability compared to those of their immediate challengers and also the spread of quality throughout the rest of the league, we can create a variety of regressions to attempt to predict the finishing order in 2012/13 using historical precedents from the previous Premiership tables at halfway.

No one method perfectly captures all the inputs we have available. We can for example estimate the chances of United finishing above City given the current state of the table, but this approach then has to account for the admittedly small possibility that Spurs may then finish above both of them.

Alternatively, we can use the halfway position of each contender and their four nearest challengers as the inputs and actual finishing position as the output. Overall each method tries to capture at least some of the quality of the title contender and the quality of the challengers and in the table below I've averaged the likelihood of the top seven going on to win the title using a variety of such methods. No one method gives the current leaders a greater than 78% chance of winning in May.

United's Title Chances Given the Strength of their Challengers.

Team. Title Chance.
Manchester United. 71%
Manchester City. 22%
Spurs. 3%
Chelsea. 2%
Everton. 1%
Arsenal. 1%
WBA. 0.5%

United, as expected are strong favourites and failure to lift the title would be a major shock, although not as big as Newcastle's post January slump in 1995-96 when they ceded a six point lead, made more formidable by their rivals having mostly already played their 23rd games of the season compared to Newcastle's 21. Newcastle should have been considered 80% favourites at that stage.

None of these figures account for anticipated transfer dealings. Everton would be more likely to be net sellers in the window than Chelsea and many would consider WBA's odds in excess of 250/1 still to be optimistic in the extreme. In addition these odds will have potential for rapid fluctuation as games begin to slip away.
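To compare the model's title chances with quoted prices, a probability can be turned into fair fractional odds. The sketch below is a minimal illustration that ignores any bookmaker margin. The function name is made up for the example and the probabilities are the averaged figures from the table above.

```python
def fair_fractional_odds(probability):
    """Odds against an outcome, as the X in 'X/1', with no margin applied."""
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    return (1 - probability) / probability


# Averaged title chances taken from the table in this post.
title_chances = {
    "Manchester United": 0.71,
    "Manchester City": 0.22,
    "WBA": 0.005,
}
for team, chance in title_chances.items():
    print(f"{team}: {fair_fractional_odds(chance):.1f}/1")
```

On these numbers a 0.5% chance corresponds to a fair price of about 199/1.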

The predictions are based on the life of the Premiership and the make up of the league was different in the 90's compared to now when a handful of teams dominate the title race. Hopefully some of the variety has been accounted for in the analysis and City fans will be equally hopeful that the '95-96 season, where Newcastle were caught by a certain other team from Manchester, will be repeated 17 years later.

For a much more detailed look at the title race check out Simon Gleave here.

Thursday, 10 January 2013

Frank Lampard, Goal Scoring Midfielder.

All the signs appear to indicate that Frank Lampard will be the latest of the triumphant Champions League winning Chelsea side to leave the Bridge. In this guest post I've looked at the midfielder's remarkable scoring record since he made his debut for the Blues and taken a detailed look at his three most recent seasons for any signs of a letup in his goal scoring abilities.

To read the full post follow this link. 

Wednesday, 9 January 2013

How Big A Shock Was Bradford 3 Aston Villa 1 ?

Seventy places may separate Aston Villa from their deserved League Cup conquerors Bradford City, but few would claim that the Yorkshire side should now be considered superior to Paul Lambert's beleaguered kids. The seventy place gap may not accurately reflect the current gulf in class, but Villa are still superior to Bradford and the second leg may confirm this view.

What Tuesday night's game does demonstrate is that teams can and do produce performances that are some way above or some way below the average level of their usual performance. Football is a low scoring sport and therefore random chance will be a big contributor to a single match result. Over a larger run of matches a side's true levels of talent will begin to come to the fore in the win/draw and loss column and the best teams will show average levels of performance that more accurately reflect their real ability. That Bradford can defeat Villa merely indicates the levels to which Villa's standards can fall and Bradford's can rise on one particular gameday in an environment where a couple of outstanding saves can be immediately followed by a second goal for the less accomplished team.

To visualize the peaks and troughs that can be expected in leagues and for particular teams it is first necessary to establish an expected level of performance for each team based on reliable indicators, ideally recorded over a reasonable length of time. There are numerous ways to forecast the most likely outcome when two sides meet at a particular venue ranging from methods involving goal scoring and conceding rates corrected for opponent strength to correlating performance indicators with previous match outcomes. Invariably, and most usefully we can eventually express the likely outcome of a match in terms of the average number of goals one team would be superior to their opponents should the game be repeatedly replayed.

Any reasonably proficient prediction system should eventually mirror a side's actual performance, especially if a continual under or over performance is taken as a sign that the overall strength of a team has shifted. If we are satisfied that we have a good prediction method, we can use a side's match by match goal difference compared to the pre game, expected goal difference to show the extremes of performance, both good and bad, that typical teams are capable of.

For example if a team is a 1 goal favourite and wins by three clear goals, they could be considered to have over performed by a margin of two goals, while a draw would indicate under performance to the tune of one goal. By collecting and plotting the frequency of such outcomes, the spread of a team's actual performance can be charted.
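The arithmetic in that example is just the actual margin minus the expected margin. A minimal Python sketch, with an illustrative function name:

```python
def performance_residual(actual_goal_difference, expected_goal_difference):
    """Goals by which a team beat (positive) or missed (negative) expectation."""
    return actual_goal_difference - expected_goal_difference


# A one goal favourite who wins by three clear goals.
print(performance_residual(3, 1.0))  # 2.0

# The same favourite held to a draw.
print(performance_residual(0, 1.0))  # -1.0
```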

Above I've plotted the frequency at which Tottenham beat or failed to beat a goal based pre game estimation of their chances in each match from the 2010-11 EPL season. The residual expectations are collected in half goal bands, so the zero column counts each occasion where the actual margin by which Spurs won or lost the game fell between 0.25 of a goal above or below expectation. The column immediately to the right of the nearly central zero column consists of the six occasions during the season when Spurs over performed by between 0.251 and 0.75 goals when compared to their expected pre game performance.

The majority of Tottenham's performances in 2010/11 cluster around their pre game estimates, a sign that the model used is tracking reality fairly well. Two matches, a 1-0 defeat at home to Wigan and a 3-1 defeat away at Blackpool, fall into the category at the extreme left comprising matches where the Spurs side under achieved by between 2.25 and 2.75 goals.

Overall the model does a good job of describing Tottenham and one season's worth of results produces over 20 results that were predicted with reasonable pre game accuracy. The half dozen games at either extreme show the frequency with which Spurs turned in atypically good or bad performances and the distance from the zero line indicates roughly how extreme those outcomes actually were.

If we plot a similar graph for every EPL game played by every home team since the beginning of the 2005-06 season, a similarly well defined curve results. The majority of matches fall very close to the average margins of victory predicted beforehand.

Again the buckets into which each game is placed are half a goal wide and once again the more extreme actual margins of victory or defeat compared to pre game estimates appear with increasing scarcity. Aston Villa were considered a shade over 1.2 goals superior to Bradford on most reliable indicators before kick off at Valley Parade, so a defeat by two goals represented an under achievement of 3.2 goals on the night or a similar over achievement by their hosts. Using the crude sampling bins, that would have placed the Bradford Villa result into a bin comprising home teams who had played between 2.75 and 3.25 goals better than expected, as indicated by the arrow on the plot above, an occurrence which was played out just 26 times out of 2280 matches in the EPL between August 2005 and May 2011.
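The binning step can be sketched as follows. This is an illustrative reconstruction rather than the code behind the plots: it assumes half goal bands centred on multiples of 0.5, so that the zero band runs from 0.25 below to 0.25 above expectation, and it sends a residual sitting exactly on a boundary to the higher band, which is an arbitrary choice.

```python
import math

BIN_WIDTH = 0.5  # half a goal, as in the plots described in this post


def bin_centre(residual, width=BIN_WIDTH):
    """Centre of the band containing the residual (actual minus expected margin)."""
    # Assumption: boundary values such as exactly 0.25 fall into the higher band.
    return math.floor(residual / width + 0.5) * width


# Bradford, at home, were rated about 1.2 goals inferior and won by two.
bradford_residual = 2 - (-1.2)
centre = bin_centre(bradford_residual)
low, high = centre - BIN_WIDTH / 2, centre + BIN_WIDTH / 2
print(f"residual {bradford_residual:.1f} falls in the {low:.2f} to {high:.2f} band")

# How often EPL home teams landed in that band, using the counts in the text.
print(f"{26 / 2280:.1%} of matches")
```

Counting the number of matches that share each band centre, for instance with `collections.Counter`, would give the column heights for a plot of this kind.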

Sunday, 6 January 2013

How Well Do Individual Scoring Rates Survive A Change Of Scenery ?

The opening of the transfer window always brings with it a huge amount of speculation concerning the possible destination of unsettled, out of favour or much sought after players. Many of these apparently "done deals" fail to materialize and the only real certainty surrounding the January spending spree is that the window will "slam shut" at the end of the final day.

Previously much of the pairing of players to their ideal destination was done on gut feeling, but the ever increasing amount of individual data has now enabled even the casual fan to pick out that tough tackling midfield enforcer who will propel his side to either safety or greater heights. So have the readily available shooting, scoring and tackling statistics made completing the jigsaw an altogether simpler process ?

Goals are widely used to define strikers and in this post on super subs, I used the number of goals scored by Dzeko for Manchester City as a proportion of all goals scored whilst he was on the pitch in an attempt to quantify his contribution in both roles. This method is attractive because it partly accounts for quality of opportunity, a rout against a poor side will see other players also getting on the score sheet and it also allows for the heightened rate of scoring later in matches. Also a player scoring two goals in a first half isn't penalized if he is absent for the second period and further goals are added to the total.
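The proportional measure described above is a simple share. The sketch below is a minimal Python illustration. The function name and the goal counts are hypothetical and do not come from any real player's record.

```python
def on_pitch_goal_share(player_goals, team_goals_while_on_pitch):
    """Share of the team's goals, scored while the player was on the pitch,
    that the player scored himself. Returns None if the team did not score."""
    if player_goals > team_goals_while_on_pitch:
        raise ValueError("a player cannot outscore his team's on pitch total")
    if team_goals_while_on_pitch == 0:
        return None  # the share is undefined without any team goals
    return player_goals / team_goals_while_on_pitch


# Hypothetical counts, for illustration only.
share = on_pitch_goal_share(player_goals=6, team_goals_while_on_pitch=20)
print(f"{share:.0%}")
```

Because the denominator only counts goals scored while the player was on the pitch, a striker withdrawn at half time is not penalized for goals his team mates add in his absence.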

By normalizing the scoring environment, while Dzeko's team mates remain reasonably constant, we can try to judge if there is likely to be any difference between Dzeko, the starter and Dzeko, the sub. We can further develop this approach to try to estimate how a newly acquired striker may fit into a new team, especially as measured by the bottom line of goals scored.

Estimating player ability is always difficult (ageing and survivor bias, for example, are hardly ever addressed), but the most glaring problem in this instance is that a transfer target is playing for a different team, alongside players of differing abilities and with varied tactical priorities, compared to the side which is pursuing him. A player may score a high proportion of the goals claimed by a struggling Premiership side, but that may be because he is head and shoulders above his striking team mates. The numbers may be telling you that he is "too good" for his present side, but is he good enough to play for a potential suitor ? A potential buyer needs to know how he is likely to perform for them and raw scoring exploits are unlikely to provide reliable conclusions. Are you buying a squad player or an upgrade ?

The case of Daniel Sturridge highlights how we may be able to make more informed predictions using the  scoring exploits a player achieves whilst playing at different clubs. Sturridge has performed in the top flight at the then mid table Manchester City, Championship winning Chelsea, struggling Bolton, by virtue of the loan system and now appears to have found a level where appearances and the opportunity to showcase his talent will be guaranteed at Liverpool.

During his three seasons at City, he scored 30% of his side's goals whilst he was on the pitch. At Chelsea the figure dropped to just below 20% and in a half season loan at Bolton it shot up to nearly 50%. Sturridge's opportunities have been limited, so sample sizes are small, but his "talent" as a striker appears to rise as the overall quality of his side falls. His proportional strike rate was exceptional at Bolton, very good at City and good at Chelsea.

The reality is probably that he was close to being the same player at all three clubs, especially during his time at Chelsea and Bolton. He shone outstandingly at The Reebok because he was the best attacking player on the team and as such was able to dominate his fellow strikers. Once he arrived back at Chelsea he was partly eclipsed by better strikers and his proportional rate of goals scored fell. Had Chelsea purchased, instead of merely recalling Sturridge from Bolton on the basis of his 50% strike rate, they would have been disappointed if they had expected that headline stat to be repeated at The Bridge.

In the table above, I've added the proportion of goals scored by a variety of strikers who have played for multiple seasons at Premiership clubs of differing overall quality. The strikers include Crouch, Adebayor and Bent in addition to Sturridge and the clubs involved range from Chelsea and Arsenal, down to struggling EPL sides like Southampton and Charlton. The relationship which appears to exist for Sturridge also shows up in this larger group of similarly talented strikers, each of whom has been traded for eight figure transfer fees over the course of the sample. They tend to score a smaller proportion of the goals scored as they move to bigger and better clubs.

When playing for struggling EPL sides, denoted by success rates in the region of 0.35, the line of best fit indicates that such strikers are likely to account for between 40 and 50% of the team goals scored whilst they are on the pitch. When making the step up to a Champions League quality side, their contribution then typically falls to around 20% because they are probably surrounded by a bigger pool of goal scoring talent.

Crouch, Scorer of 22% of Liverpool's Goals when playing compared to 35% for Stoke.

Collecting the proportional scoring records of players provides an added level of information which may prove valuable in predicting future performance at other clubs. Knowledge of a player's record at his previous clubs may enable a side looking to buy in the transfer market to estimate the proportion of goals a new signing would typically contribute at a different class of club and, more importantly, whether this figure will go hand in hand with an overall increase in total goals for the team. Sturridge's presence in the Bolton lineup also coincided with a 10% increase in total goals scored compared to their previous 30 games, although this needs to be confirmed in much larger sample sizes.

Demba Ba's record at West Ham and Newcastle has seen him score around 40% of his side's goals as an active player at teams with a combined success rate in the region of 0.44, well in line with expectations. If he continues to hug the regression line, Chelsea may have purchased a player capable of scoring 15% of the goals at a club of their standing. Further research may show whether this will be an upgrade at the London club, although the reduced price tag for such an apparent talent would already appear to be good value.

Luis Suarez, Daniel Sturridge's intended strike partner at Liverpool, also fits neatly onto the line of best fit. He has scored around 30% of the goals when he's played at Anfield, just the figure you would expect for a top striker playing for a team with an overall success rate of 0.54 spread over two part and one completed season. The figures for Sturridge and Suarez therefore point to Liverpool having now acquired two top class strikers. The hope will be that the pair will produce strike rates nearer to 20%, provide scoring opportunities for other team mates through assists or by occupying opposition defences and that Liverpool's success rate as measured in wins and draws will increase in line with the profile of their new striking threat.

Liverpool's previous big money swap saw Andy Carroll join from Newcastle. Carroll and Newcastle spent 2009/10 in The Championship, but during his time playing in the Premiership, Newcastle's success rate hovered just below 0.4 and a top striking prospect playing for such a struggling side should have scored around 45% of the team's goals. Carroll only accounted for 35%, so on that basis he was a much bigger gamble than the one they've taken on Sturridge, who cost less and has better numbers.

This methodology is reasonably straightforward when used for goals, but it is equally feasible to apply it to add context to other on field actions that could otherwise potentially lead to misleading conclusions. Looking at the number or even the proportion of successful passes or assists may be just the start of the selection process and not the endgame.

Also check out Danny Pugsley, who takes a look at the subject here.

Thursday, 3 January 2013

The Usefulness Of Tackles.

Anyone who doubts the elevated regard in which attacking play is held in comparison to defensive talent need only watch an episode of Match of the Day. The day's worst defensive performance invariably appears first in the running order, while the best defensive show is aired, usually with poor grace in the show's final minutes.

Analysis of attacking play is also more heavily developed compared to the defensive side of the ball. Goals are the obvious end product of attacking intent, they alone decide the final match outcome and there is a well defined and easily quantifiable chain of evidence leading from goal creation to realisation.

Shots and shots on target are one step removed from an actual goal and correlate well with winning matches.  Assists come next in the chain, again providing a comfortingly strong correlation with success and it is only when we step even further back into the scoring process to look at mere, run of the mill passes that we begin to experience a more confused correlation between quantity and quality of execution and end product.

At the sharp end of goal scoring there is no compromise, a goal is scored (usually) when the ball crosses the line, but how we arrive at the successful conclusion can take on a multitude of different tactical approaches. Therefore, we eventually reach a point in the scoring process where there is no catch all statistic that can describe with equal clarity a possession based Barcelona approach or a direct, long ball plan.

Shots and assists can be used to illustrate a team's previous successes and partly predict their future expectations, but passes are more indicative of team tactics and such diversity often defies easy or relevant measurement.

Defensively the evidence chain is much shorter and the potential for confused and weak correlation is reached much quicker. Interceptions and tackles are widely regarded as the currency by which defenders are valued, but the reality is that these stats are much more the product of how a team is set up than of how talented a side's defenders are. Tackles are the passes of the defensive world, they are not the equivalent of shots or assists. Tackles per minute were an irrelevance to the likes of Paolo Maldini because there are more ways to prevent a goal than there are to score one.

Bolton, A Successful Tackle or Just Buying Time?

A good starting point in trying to understand the part tackles play in the game is to see how often teams commit to a tackle. The risk involved in tackling is twofold. A mistimed effort can lead to cards and a failed attempt usually sees the tackler out of position, if not removed entirely from the rest of the attacking move. Sorting tackles by outfield playing position can give a reasonable picture of where a team is making their challenges and in the table below I've corrected the figures to allow for playing time. Defenders or midfielders accrue over twice the playing time of strikers, partly through tactical substitutions and because they are more numerous in the lineup.

A straightforward totting up of tackles would therefore always see strikers trailing the other two positions, so I've corrected cumulative totals from the 2011/12 season to produce normalized per game figures for all three positions. The numbers should depend on the amount of tackling demanded of each position by each club and the requirement imposed on them by the opposition.
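As a rough sketch of the kind of correction described above, the snippet below turns a season total into a rate for an imaginary side made up entirely of one position. The raw totals, the minutes played and the scaling to ten outfield players are all assumptions made for illustration; the underlying counts and the exact formula behind the table are not given here.

```python
# Sketch of a playing-time correction for tackle counts.
# Every input number below is invented for illustration only.

OUTFIELD_PLAYERS = 10  # assumption: scale up to a full outfield side


def tackles_per_team_game(tackles, minutes_played, outfield=OUTFIELD_PLAYERS):
    """Tackles per 90 minutes for a side made up entirely of one position."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    per_player_90 = tackles / minutes_played * 90
    return per_player_90 * outfield


# Hypothetical season totals: (tackles, minutes played) by position.
season = {
    "defenders": (320, 14400),
    "midfielders": (330, 14400),
    "strikers": (70, 6300),
}

rates = {pos: tackles_per_team_game(t, m) for pos, (t, m) in season.items()}
for pos, rate in rates.items():
    print(f"{pos}: {rate:.1f}")
```

With these made-up inputs the strikers' raw total is less than a quarter of the defenders', but once playing time is allowed for their rate is half the defenders' rather than a quarter.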

         Tackling Rates Per 90 minutes by Position, EPL Sides 2011/12.

Team                Defenders   Midfielders   Strikers
Aston Villa                24            23          9
Manchester United          23            25         10
Arsenal                    22            21          9
Norwich                    22            19          6
Manchester City            22            18         10
Chelsea                    21            22         10
Stoke City                 21            18          9
QPR                        21            23          8
Wigan                      20            21          7
Sunderland                 20            25         10
Swansea                    20            22          9
Bolton                     19            21         16
Newcastle                  19            25          8
Liverpool                  19            22         13
Spurs                      19            23         10
Wolves                     18            17         10
Blackburn                  18            20         13
Everton                    17            19          8
WBA                        16            23          9
Fulham                     15            19         19

The baseline figure for team tackles is just under 19 per game. So Fulham's strikers were asked to tackle back at near league average team rates last term, almost twice the rate required of strikers as a group. A quality they may have sacrificed this term with the acquisition of Berbatov. They were followed in the tackling stakes by Bolton, anecdotally a side which defends from the front. In contrast, Stoke strikers are expected to contribute well below average tackling numbers, although regular Stoke watchers will know that Pulis demands pressure before contact as a tactic. Indeed Stoke's midfield also had the second lowest number of attempted tackles in 2011/12 and are only just in the top half of the rankings for defenders. Tackles are a component of defending, but not the only one and in the case of some teams such as Stoke, they are not the primary one.
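The comparison of Fulham's strikers with strikers as a group can be checked directly from the table. The sketch below uses a simple unweighted mean across the 20 clubs, which is an assumption; a mean weighted by minutes played could give a slightly different figure.

```python
# Strikers' tackles per 90 minutes, copied from the 2011/12 table above.
striker_rates = {
    "Aston Villa": 9, "Manchester United": 10, "Arsenal": 9, "Norwich": 6,
    "Manchester City": 10, "Chelsea": 10, "Stoke City": 9, "QPR": 8,
    "Wigan": 7, "Sunderland": 10, "Swansea": 9, "Bolton": 16,
    "Newcastle": 8, "Liverpool": 13, "Spurs": 10, "Wolves": 10,
    "Blackburn": 13, "Everton": 8, "WBA": 9, "Fulham": 19,
}

# Unweighted mean across clubs (assumption: no weighting by minutes played).
league_average = sum(striker_rates.values()) / len(striker_rates)
fulham_ratio = striker_rates["Fulham"] / league_average

print(f"Average striker rate: {league_average:.2f}")
print(f"Fulham relative to average: {fulham_ratio:.2f}x")
```

On this basis the average striker rate is a little over 10 per game and Fulham's 19 is roughly 1.9 times that.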

Manchester United's commitment to tackles in 2011/12, both overall and particularly from defenders and midfielders, is well illustrated. Last year's runners up required their midfield players to be able to tackle often, an attribute used by a variety of other teams ranging from Villa, Sunderland and QPR to Spurs, Chelsea and Newcastle. By normalizing tackling stats to the equivalent of a team consisting entirely of strikers, midfielders or defenders, season long playing styles by position become more transparent and may better highlight qualities required from various sides.

Overall there is virtually no correlation between tackle numbers as frequently exemplified by tackles per minute figures for individual teams and the ability to prevent goals, again indicating that raw tackle numbers are a product of tactics rather than proficiency. We need to look at tackling efficiency to see how effective these tactics are and how successful tackles may contribute towards goal prevention and ultimately help teams to win.

A successful tackle at worst slows down an opponent's passing sequence and ideally teams would like to see a positive correlation between efficient tackling rates and goal prevention. Overall tackle success rates usually lie between 70% and 80%.

          Percentage Tackling Success Rates by Position, EPL 2011/12.

Team                Defenders   Midfielders   Strikers
Arsenal                  75.1          72.9       80.8
Aston Villa              73.9          76.9       79.5
Blackburn                72.8          75.5       80.0
Bolton                   75.3          69.9       77.1
Chelsea                  77.1          76.5       82.8
Everton                  75.1          80.9       75.0
Fulham                   78.5          76.0       76.9
Liverpool                74.0          73.2       75.3
Manchester City          75.7          73.2       67.7
Manchester United        77.1          71.3       87.3
Newcastle                75.3          73.0       75.9
Norwich                  68.7          73.7       78.4
QPR                      73.1          72.0       69.2
Stoke                    72.1          68.0       72.0
Sunderland               78.1          75.7       69.6
Swansea                  73.8          71.2       71.0
Tottenham                78.8          71.6       69.2
WBA                      73.4          77.6       78.8
Wigan                    72.3          73.4       85.3
Wolves                   71.9          76.8       80.6
Overall                  74.6          74.0       76.9

It's initially surprising to see strikers as the most efficient group of tacklers, with Manchester United's attackers particularly outstripping their defensive team mates. However, this highlights the danger of taking statistics at face value. Strikers are outnumbered by other positions by around two to one in on field presence, so the smaller sample sizes are more likely to lead to extreme values either above or below the norm. Hence United's 87% success rate may not be repeated in larger samples, bringing rates closer to those enjoyed by the defence.
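To put a rough size on the sample effect, the sketch below attaches an approximate 95% interval to a success rate using the normal approximation to the binomial. The attempt counts are hypothetical, chosen only to give rates close to 87% and 77%; only percentages are quoted here, not raw counts.

```python
import math


def success_rate_interval(successes, attempts, z=1.96):
    """Approximate 95% interval for a success rate (normal approximation)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    half_width = z * math.sqrt(p * (1 - p) / attempts)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: a small striker sample and a larger defender sample.
striker_low, striker_high = success_rate_interval(48, 55)      # about 87%
defender_low, defender_high = success_rate_interval(270, 350)  # about 77%

print(f"Strikers:  {striker_low:.1%} to {striker_high:.1%}")
print(f"Defenders: {defender_low:.1%} to {defender_high:.1%}")
```

With these assumed counts the strikers' interval is about twice as wide as the defenders', which is the sense in which an extreme figure from a small group deserves more caution.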

The higher overall figure for strikers as a group is likely to be because they can pick the tackles they attempt and sometimes opt for other defensive tactics, such as forcing a pass by closing down an opponent. Midfielders and particularly defenders are often required to attempt tackles that are the last ditch alternative to allowing a decisive pass or shot on goal. In short, tackle difficulty isn't well represented in these raw figures.

The correlation between team tackle efficiency sorted by position and goal prevention is non-existent for strikers and, possibly surprisingly, also for midfielders. It's only when we look at just defenders that a reasonable correlation develops.

The efficiency of defensive tackles is the strongest tackling indicator of goal prevention, beating out overall rates, rates for the other two nominal positions and raw tackle counts for defences. The relative weakness of even this correlation does indicate that other factors, most probably interceptions, applying pressure to induce errors and goalkeeping ability, are also significant.
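For readers who want to run the same kind of check on their own data, here is a minimal sketch of a Pearson correlation between defenders' tackle success rates and goals conceded. The five pairs of values are invented for imaginary teams and are not the 2011/12 figures; they only show the mechanics.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented values for five imaginary teams, NOT real 2011/12 data.
defender_success_pct = [70.0, 72.0, 74.0, 76.0, 78.0]
goals_conceded = [70, 55, 60, 45, 40]

r = pearson_r(defender_success_pct, goals_conceded)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

A negative r means higher success rates go with fewer goals conceded, and r^2 gives the share of the variation in one measure that moves in line with the other.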

We can use a similar approach from an attacking perspective by recording the rates at which opponents complete tackles against individual teams to see if an ability to make defenders miss tackles leads to increased scoring rates.

The correlation in this instance is weaker, but the indication is that an ability of a team to make defenders proportionally miss more tackles tends to bring rewards with increased scoring. Low scoring sides from 2011/12 such as Stoke, Wolves, Villa, Swansea and Sunderland allowed opponents to successfully complete upwards of 75% of tackles attempted by defenders, while more free scoring sides such as Newcastle, Spurs and Arsenal made defenders miss more frequently. Both Manchester clubs were big outliers, accounting for the reduced r^2 values, with City particularly being capable of scoring extremely freely despite facing near average tackle rates from opposing defenders. Possibly an indication that City excel at overcoming other defensive tactics which are omitted from mere tackling data.

Both graphs can be combined to see how important tackles made and tackles received may be to the majority of EPL sides. Teams such as Chelsea, Arsenal, Manchester United, Spurs and Newcastle had defenders who tackled at above average rates and had attackers who induced opposing defenders to tackle at below average rates. This seemingly heady mix was translated into impressive finishing positions. However, Bolton and to a lesser degree QPR produce similar splits, but without the successful results. So once again we could conclude that tackle statistics are noteworthy, but incomplete predictors and possibly partly flawed.

To hint at why tackles aren't as indicative of success as say shots, we can look at Bolton's game against Manchester City from last term. Overall, Bolton made 19 tackles, 16 of which were deemed successful for a tackle rate of 84%. However, in just 11 of those 16 successful tackles were Bolton able to gain possession and try to develop a footballing move of their own. On five occasions Manchester City retained possession because the ball was put out of play for either a corner or a throw. It's churlish to deem an excellent touchline tackle as a failure, but tackles do come with varying degrees of success. Five times City were able to continue the move from very close to where a successful tackle was made, dropping Bolton's success rate from 84% to 58%. Similarly for City, they recorded nine successes from ten tackles, but on three occasions Bolton retained possession from the resulting throw and a 90% strike rate fell to 60%.
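The arithmetic behind those two sets of percentages is sketched below, using only the counts quoted above. The function name and the split into "successful" and "possession won" are labels chosen for this illustration rather than an established metric.

```python
def tackle_rates(attempted, successful, possession_won):
    """Return (success rate, possession-winning rate) as fractions of attempts."""
    if attempted == 0 or not 0 <= possession_won <= successful <= attempted:
        raise ValueError(
            "expected attempted > 0 and 0 <= possession_won <= successful <= attempted"
        )
    return successful / attempted, possession_won / attempted


# Counts quoted above for Bolton v Manchester City.
bolton = tackle_rates(attempted=19, successful=16, possession_won=11)
city = tackle_rates(attempted=10, successful=9, possession_won=6)

for name, (success, won) in (("Bolton", bolton), ("Manchester City", city)):
    print(f"{name}: {success:.0%} successful, {won:.0%} won possession")
```

Both rates share the same denominator, all tackles attempted, so the gap between them is the share of tackles that were successful but left the ball with the opposition.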

Just as shot based models improve as we add details such as x, y coordinates, tackling models may benefit from reference to the immediate aftermath of the challenge.

Defence is always more complex to describe numerically. Ways to defend your goal are more numerous, varied and sometimes, in the case of pressing, difficult to record compared to scoring, which ultimately boils down to a shot or header at goal. Tackling efficiency appears to be a preferable marker compared to frequent but less successful tackle attempts, and by grading different levels of success for individual tackles we may creep closer to identifying the great defenders from the merely good, while appreciating that some teams demand different approaches to defence where tackling may not be the primary priority.