Saturday, 30 June 2012

Searching For A Winger.

The last post highlighted the need to treat rate statistics drawn from small samples with a lot of caution. Football managers, both real and imaginary, are currently looking at the numbers produced by last year's crop of professionals with a view to adding them to their actual or fantasy line-up for the 2012/13 campaign. Because more in-depth individual statistics are now readily available, the first stop for the aspiring SAFs is the data provided by the numerous Opta resellers.

And there will almost certainly be a bite-sized chunk of numerical data to give form to whichever quality you are looking to invest in with your new signing. A tackle rate of 89% from 30 total tackles may indicate that some of the success enjoyed by last year's defence was down to this individual's contribution, but once that figure starts being used to predict future performance, the picture becomes much less clear. A tough-tackling defender may be just what your fantasy or real team needs, but do these kinds of raw numbers give you much confidence that this is what you will get if you press the "buy" button?

Let's imagine you want to purchase a throwback winger from the 1980s, possessed of dribbling skills and the ability to pick out a cross with pinpoint accuracy. A Mark Chamberlain, proud father of The Ox, for example. EPL Index usefully provides the number and success rate of crosses for all players from last season's EPL, as well as their total dribbles and their success rate in that category.

We saw in the previous post that small sample sizes in this type of success-or-failure statistic can quickly compromise the predictive quality of the resulting rates. Fewer attempts mean a larger proportion of random variation in the observed figures, so to produce a figure that better represents expected future performance we have to add a fairly hefty portion of the league average to the observed rate. The larger the number of trials, the less we need to regress toward the mean of the whole sample.

Regressed Crossing Rate Stats from 2011/12 EPL.

Player | Total Crosses | Successful Crosses | Observed Success Rate % | Regression | Regressed Rate %
David Fox | 92 | 38 | 41.3 | 34% | 35.1
Luka Modric | 122 | 46 | 37.7 | 29% | 33.6
James Morrison | 84 | 33 | 39.3 | 37% | 33.5
Mark Gower | 95 | 35 | 36.8 | 34% | 32.3
Barry Bannan | 100 | 35 | 35.0 | 33% | 31.2
Anthony Pilkington | 108 | 37 | 34.2 | 31% | 30.9
Michael Kightly | 108 | 35 | 32.4 | 31% | 29.6
Wes Hoolahan | 55 | 19 | 34.5 | 47% | 29.3
Florent Malouda | 93 | 30 | 32.2 | 34% | 29.2
Jermaine Pennant | 228 | 68 | 29.8 | 18% | 28.7
Stilian Petrov | 65 | 21 | 32.3 | 43% | 28.5
Chris Eagles | 161 | 48 | 29.8 | 23% | 28.3
Population Average | | | 23.4 | |

Above are the twelve most successful crossers taken from EPL data for 2011/12. Some are midfielders, whilst others are more traditional wide players. The top-ranked cross-efficient player was Norwich's David Fox, who was successful with over 41% of his 92 attempts compared to a sample average success rate of 23.4%. Around 50 crossing attempts are needed for the contribution from random chance to equal that from player talent, so Fox's 92 crosses imply that his cross conversion rate is more heavily influenced by his talent than by luck. Even so, the make-up of the population from which Fox's score is taken indicates that his observed conversion rate needs to be regressed about 34% towards the group mean. So in Fox you may get the EPL's most accurate crosser of a ball, but you are extremely unlikely to get a conversion rate anywhere near his observed 41.3% in 2012/13.

Applying a regression not only pulls the more extreme results toward more realistic levels; it also changes the ranking order in cases where impressive conversion rates have been achieved over relatively large samples. For example, Pennant's corrected rate leapfrogs Petrov's because the latter's higher raw rate was achieved in only 65 trials, compared to Pennant's 228. Pennant's observed numbers are regressed only 18%, compared to 43% for Petrov.
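The regression percentages in the crossing table behave like the common shrinkage weight k / (n + k). Here n is the number of attempts and k is the number of attempts at which luck and talent contribute equally, about 50 crosses according to the post. The post doesn't state its formula, so treat this as an assumption. It does reproduce the table's regression column to within a point or so. A minimal sketch:

```python
# Sketch: shrink an observed success rate towards the league average.
# Assumption: weight on the average = k / (n + k), where k is the number
# of attempts at which luck and talent contribute equally.

LEAGUE_AVG_CROSS = 23.4   # population average success rate (%), from the table
PARITY_CROSSES = 50       # approx. attempts for luck/talent parity, from the text


def regressed_rate(successes: int, attempts: int,
                   league_avg: float = LEAGUE_AVG_CROSS,
                   k: float = PARITY_CROSSES) -> tuple[float, float]:
    """Return (regression fraction, regressed success rate in %)."""
    observed = 100.0 * successes / attempts
    weight = k / (attempts + k)               # share of league average mixed in
    return weight, (1 - weight) * observed + weight * league_avg


players = {
    "David Fox": (38, 92),
    "Jermaine Pennant": (68, 228),
    "Stilian Petrov": (21, 65),
}

for name, (s, n) in players.items():
    w, rate = regressed_rate(s, n)
    print(f"{name:18s} observed {100 * s / n:5.1f}%  regress {w:4.0%}  -> {rate:5.1f}%")
```

With these inputs Pennant's regressed rate finishes ahead of Petrov's, just as in the table, because 228 attempts earn far less shrinkage than 65.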

Regressed Dribbling Rate Stats 2011/12 EPL.

Player | Total Dribbles | Successful Dribbles | Observed Success Rate % | Regression | Regressed Rate %
Steven N'Zonzi | 25 | 22 | 88.0 | 29% | 76.7
Mikel Arteta | 23 | 20 | 87.0 | 30% | 75.3
Mark Davies | 60 | 46 | 76.7 | 14.5% | 72.7
Ramires | 59 | 43 | 72.9 | 14.5% | 69.4
Sandro | 22 | 17 | 77.3 | 31% | 68.3
Joe Allen | 50 | 36 | 72.0 | 17% | 68.1
Ryan Giggs | 30 | 22 | 73.3 | 25% | 67.1
Nigel Reo-Coker | 42 | 30 | 71.4 | 19% | 67.0
Assou-Ekotto | 44 | 31 | 70.5 | 18.5% | 66.4
Patrice Evra | 75 | 51 | 68.0 | 12% | 65.7
Nani | 79 | 53 | 67.1 | 11% | 65.0
Charlie Adam | 36 | 25 | 69.4 | 22% | 64.9

Applying the same technique to the 2011/12 dribbling statistics drags Steven N'Zonzi's 88% success rate, achieved from just 25 attempts, down to 76.7%. Meanwhile, the larger samples behind Ramires's and Nani's rates allow them to leapfrog Sandro and Adam respectively in the rankings.

Top Combinations of Crossing and Dribbling Stats, 2011/12 EPL Players.

Player | "True" Dribbling Success Rate % | Dribbling Rank | "True" Crossing Success Rate % | Crossing Rank
Luka Modric | 60.6 | 22 | 33.9 | 2
Joe Allen | 65.7 | 5 | 27.2 | 27
Mikel Arteta | 70.0 | 3 | 26.4 | 37
Wes Hoolahan | 58.7 | 32 | 29.6 | 9
Tomas Rosicky | 62.1 | 13 | 26.8 | 30
Maynor Figueroa | 56.8 | 38 | 30.7 | 7
Benoit Assou-Ekotto | 63.4 | 9 | 25.8 | 48
James Morrison | 54.6 | 55 | 33.8 | 3
Jermaine Pennant | 55.3 | 48 | 28.8 | 11
Stilian Petrov | 55.1 | 49 | 28.7 | 12
Leighton Baines | 55.6 | 42 | 27.4 | 22
Gareth Barry | 55.4 | 44 | 27.4 | 23
Florent Malouda | 52.2 | 73 | 29.4 | 10
Juan Mata | 59.5 | 27 | 25.5 | 57
Ryan Giggs | 63.9 | 10 | 24.4 | 82
Seb Larsson | 56.0 | 41 | 25.8 | 52
Kieran Richardson | 50.6 | 83 | 27.6 | 19
Gylfi Sigurdsson | 53.6 | 65 | 26.2 | 41
Alejandro Faurlin | 55.3 | 47 | 25.2 | 60
Branislav Ivanovic | 52.1 | 75 | 26.4 | 38

Having dampened down unrealistic expectations and reshuffled the respective leaderboards, all that's required to draw up a shortlist of potential wingers is to combine the crossing and dribbling ranking tables. As this post is more about demonstrating the effect of regressing observed values, I've simply arranged the players according to the sum of their rankings in each discipline. Of the top twenty combined outfielders, WBA's James Morrison is the top-ranked player who has operated in a wide role in 2011/12, and his particularly impressive projected crossing percentage gets him the vote ahead of Stoke's Jermaine Pennant.
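The order of the combined table matches a plain sum of the dribbling and crossing ranks, with a lower total being better. The sketch below uses a handful of rows copied from that table. It doesn't handle ties, because the post doesn't say how they were broken.

```python
# Sketch: order players by dribbling rank + crossing rank (lower is better).
# Ranks are copied from the combined table; tie-breaking is not handled.
ranks = {
    "Luka Modric": (22, 2),
    "Joe Allen": (5, 27),
    "Mikel Arteta": (3, 37),
    "Wes Hoolahan": (32, 9),
    "James Morrison": (55, 3),
    "Jermaine Pennant": (48, 11),
}


def combined_order(table: dict[str, tuple[int, int]]) -> list[str]:
    """Return player names sorted by the sum of their two ranks."""
    return sorted(table, key=lambda name: sum(table[name]))


for name in combined_order(ranks):
    dribble, cross = ranks[name]
    print(f"{name:16s} {dribble:3d} + {cross:3d} = {dribble + cross}")
```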

Seasonal player statistics perfectly describe what has occurred, but if they are used in raw "as is" form to predict what will happen next, they often lead to inflated expectations.

Friday, 29 June 2012

How Unlikely Were Balotelli's Two Goals from Three Strikes?

No prizes for knowing the story of the semi-final. No doubt spurred on by the view of a largely trophyless Alan Shearer that he had achieved nothing, Mario scored both goals in Italy's dismissal of the Germans. His first came via a magnificent Cassano assist. It's true that Mario had to deftly position himself on the wrong side of Badstuber, but the cross was just inside the sweet spot for headers. Once inside the six-yard box a header starts to become more difficult to save than a shot, a point not lost on Neuer, who accepted his fate as soon as the ball left Mario's forehead.

The German keeper had less excuse for tamely capitulating to the second goal. Shots from 18 yards out and around 5 yards right of centre aren't natural goalscoring chances, typically having a goal expectancy of around 1 in 10. But Mario caught the ball on the sweet spot of its bounce and found the target just inside the right upright with both power and swerve.

As nights go it was almost perfect. The exception was Mario's only other shot of the match, from his most difficult opportunity, where he was odds-on to miss the target and duly obliged. At least he spared himself the dilemma of whether or not to celebrate a Euro semi-final hat-trick.

The Likelihood of Mario's Three Chances Resulting in Goals.

Outcome from Mario's 3 Shots | Likelihood of Outcome
Three Goals | 1 in 1000
Two Goals | 1 in 25
One Goal | 1 in 3
Zero Goals | 2 in 3
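A table like this comes from giving each shot its own goal expectancy and treating the shots as independent. The number of goals then follows a Poisson binomial distribution. The post only quotes a figure (about 1 in 10) for the second goal. The per-shot probabilities below are therefore hypothetical placeholders, and the output is not expected to match the table exactly.

```python
# Sketch: probability of 0..n goals from independent shots, each with its
# own chance of scoring (a Poisson binomial distribution).
# The probabilities used below are HYPOTHETICAL placeholders.


def goal_distribution(probs: list[float]) -> list[float]:
    """Return [P(0 goals), P(1 goal), ..., P(n goals)]."""
    dist = [1.0]                          # before any shot: 0 goals for certain
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)      # this shot misses
            new[goals + 1] += prob * p        # this shot scores
        dist = new
    return dist


# Hypothetical: close-range header, the ~1-in-10 strike, a poor chance.
shots = [0.2, 0.1, 0.05]
for goals, prob in enumerate(goal_distribution(shots)):
    print(f"{goals} goals: {prob:.3f} (about 1 in {1 / prob:.0f})")
```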

Thursday, 28 June 2012

Where Does the Luck End and the Skill Begin?

Sooner or later any blog discussing the use of advanced statistics in any sport will give a hat tip to Moneyball, the catch-all phrase that describes the rise of an alternative view of reality that originated in baseball. The concept was popularised by Michael Lewis in the book of the same name, practised by front offices, most notably that of Oakland general manager Billy Beane, and presumably immortalised on film by a Hollywood A-lister in Brad Pitt. It is guaranteed at least a footnote as a primer in the development of football analysis. Much of the nuts and bolts of advanced baseball analysis has taken place out of the glitzy spotlight, appearing instead on internet blogs and Usenet bulletin boards, and it's in these places rather than at the local multiplex that much of the hard-edged research is found.

The baseball movement grew through three things: better access to, and quality of, raw data; the development of descriptive and predictive metrics that replaced older, tried-and-trusted ones; and a proper understanding of the limitations and pitfalls that come with the use of such data. At the moment football's approach is attempting to tackle the first of these and is in danger of being flooded by a wave of new statistics, but it is largely ignoring the limitations of its newfound knowledge.

Fortunately, football's current weakness is very much baseball's previous strength. Issues of sample size, and of the reliability of the conclusions we can draw from samples of different sizes, are the bedrock of much advanced baseball analysis. Regression towards the mean is a powerful and necessary component of any advanced analysis, and much of the foot-slogging has already been done.

The nature of baseball overall is very different to the fluidity of football, but football does have set-piece, success-or-failure events that can be incorporated into baseball's approach to evaluating the role of skill and luck. The interplay of luck and talent is easy to visualise; the maths is less so, and rather tedious. Therefore, I'll concentrate on the former and skip over the latter, much of which can be found on any sabermetric blog.

A player's or team's recorded performance over a period of time will be a combination of skill and luck. In smaller samples the random element will predominate, while in larger ones it is talent that will begin to shine through. A conversion rate of one goal from three shots is very impressive in the context of one game. But if that is the only piece of information we have about a player or team, what does it tell us about future performance?

The answer will depend on a variety of factors. How a player performs in a particular statistic is valuable information that increases in reliability as we increase our sample size, but we must also include how the population as a whole performs. There are many more average performers than truly outstanding or poor ones. So if we have limited information about a player, our overall projections will be more accurate if we assume he is more likely to be average than extremely good or bad.

How deeply we regress an individual observation towards the population mean also depends upon the reliability of the observed parameter. If the spread of talent within the overall population is small, then an extreme observed value for an individual is more likely to be down to the intervention of random chance. In that case more weight should be given to the population's average than to the individual's score.

Below I've shown the typical amounts by which a team's shooting efficiency statistics should be regressed towards the population mean, for data taken from six seasons of the EPL up to and including 2010/11.

Amount of Regression to the Mean Needed to be Applied to EPL Data, 2005-2011.

[Chart: regression required plotted against number of shots and number of goals, with Man City and Stoke City labelled.]

Twenty-nine teams played at least one season in the EPL over the period, and between them they produced over 53,000 shots and almost 6,000 goals. Arsenal, of course, were present for all six seasons and accounted for 3,238 shots and 428 goals, a strike rate of 13.2%. An average team attempting that many shots would only have been expected to score 360 goals, so the Gunners outscored the average by almost 70 goals. Their individual observed rate was more than 3 standard deviations above the average for the group of teams, so they were extreme outliers. However, they recorded that figure over a substantial number of trials, so their observed rate is likely to be a credible record of their true ability at converting shots into goals. Once all the maths has shaken out, only 8.5% of the group's average conversion rate of 11.1% is combined with 91.5% of Arsenal's actual rate recorded over 3,238 trials. This drags Arsenal's observed rate over the six years only slightly towards the league average.

By contrast, Burnley spent just one season in the top flight and managed only 406 shots on goal, a record that is more likely to contain random noise than Arsenal's more numerous body of work. In predicting a representative conversion rate for Burnley during their brief stay in the EPL, it is better to add a larger portion of the league's average conversion rate. In this case that means a 42.7/57.3 split of league average to observed rate, which pushes Burnley's below-average observed rate up towards the 11.1% average.
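The Arsenal and Burnley splits are consistent with weighting the league average by k / (n + k), with k set to roughly the 300-shot goals-per-shot parity point the post gives for teams. That is an assumption about the method, not the author's stated formula. Burnley's goal tally isn't given, so only its weight is computed here.

```python
# Sketch of the team-level regression.
# Assumption: weight on the league average = k / (n + k), with k ~ 300 shots
# (the luck/skill parity point for goals per shot).

LEAGUE_RATE = 0.111     # 2005-2011 EPL average goals per shot
PARITY_SHOTS = 300      # approx. shots for luck/skill parity


def regression_weight(shots: int, k: float = PARITY_SHOTS) -> float:
    """Share of the league average mixed into the observed rate."""
    return k / (shots + k)


def regressed_conversion(goals: int, shots: int) -> float:
    """Observed goals per shot, shrunk towards the league average."""
    w = regression_weight(shots)
    return (1 - w) * goals / shots + w * LEAGUE_RATE


print(f"Arsenal: weight {regression_weight(3238):.1%}, "
      f"regressed rate {regressed_conversion(428, 3238):.1%}")
print(f"Burnley: weight {regression_weight(406):.1%}")
```

This gives about 8.5% for Arsenal and about 42.5% for Burnley, close to the 8.5% and 42.7% quoted above.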

If we regress all observed rates for all teams in this way, the overall predictive quality of the new rates will on average be better than merely using the actual observed rates. A small but worthwhile improvement.

An extremely useful by-product of regressing observations towards the population mean is that it requires us to calculate the relative contributions of luck and talent to a variable as sample sizes increase. After a small number of trials luck predominates, and as the number of trials rises talent gains the upper hand in defining the size of the observations. It's possible, if algebraically tedious, to calculate fairly accurately the point at which the two contributions are equal. Below I've tabulated, for various measures, the number of attempts a team requires for this luck/skill parity to exist.
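One standard way to find the parity point is a method-of-moments split of the observed spread. This is a sketch of a common sabermetric approach, not necessarily the author's own algebra. The observed variance of success rates is treated as talent variance plus binomial (luck) variance, where luck variance after n trials is about p(1 − p)/n. Luck and talent are equal when n = p(1 − p) / talent variance. The four-team input below is hypothetical, and using the mean number of attempts is a simplification when teams' attempt counts differ.

```python
# Sketch: estimate the luck/skill parity point k by method of moments.
#   observed variance of rates = talent variance + luck variance
#   luck variance after n trials ~= p(1 - p) / n
#   parity: p(1 - p) / k == talent variance  =>  k = p(1 - p) / talent variance
from statistics import mean, pvariance


def parity_point(successes: list[int], attempts: list[int]) -> float:
    rates = [s / n for s, n in zip(successes, attempts)]
    p = sum(successes) / sum(attempts)            # pooled population rate
    luck_var = p * (1 - p) / mean(attempts)        # binomial noise at typical n
    talent_var = pvariance(rates) - luck_var
    if talent_var <= 0:
        return float("inf")                        # spread looks like pure luck
    return p * (1 - p) / talent_var


# Hypothetical example: four teams, 100 shots each.
k = parity_point([5, 10, 15, 20], [100, 100, 100, 100])
print(f"parity after about {k:.0f} attempts")
```

The resulting k can then be plugged into the k / (n + k) weight used for regressing each individual rate.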

How Many Observations Are Needed Before Skill Starts To Shine Through.

Team Skill | Number of Attempts | Number of Games
Goals per Shot | 300 | 24
Goals per Shot on Target | 190 | 29
Shots on Target per Shot | 390 | 30

It appears that an EPL team needs around 300 shots before its goal haul is an equal product of talent and luck; in more understandable terms, that's about 24 games. (Most goals-based models for predicting future outcomes use at least this number of matches.) Shooting accuracy reaches parity after 390 attempts, roughly 30 games. You'll need to watch a similar number of games before you can conclude that a team's conversion rate from on-target shots is down more to skill than chance.

Of much more interest from a scouting perspective is knowing when an individual player's performance begins to be driven by his talent rather than by an unequal sharing with good or bad fortune. Branding a player as poor on the strength of an insufficiently large sample (think The Beatles) is just as bad as purchasing a "superstar" riding short-term luck who later turns out to be mediocre. From data limited to scorers from the last three EPL seasons, I get a figure of around 45 shots before skill begins to overtake luck in the quest for goals, which is likely to be in excess of 22 games for the average goalscorer. Perhaps pertinently, Wenger is reputed to watch a player over at least 30 matches before he commits to buying or not. So we certainly seem to be in the right ballpark in terms of teasing skill from randomness.

Regression towards the mean, the single most important aspect of football analysis...

Monday, 25 June 2012

England 1 Italy 2. Euro 2012 Quarter Final (A Shot Analysis).

Unfortunately the headline scoreline isn't actually correct. Italy didn't put England out of their misery inside the 90 minutes. That allowed Hodgson's embarrassed fans to catch all of Rihanna's set live from the Hackney Marshes on BBC3, and spared Yann Kermorgant a "that could have been me" moment when Pirlo took that penalty kick.

Even once-every-two-years football fans could see that England had allowed themselves to be totally dominated by their Italian opponents. The barrage of match stats that arrived along with shot after Italian shot only reinforced the idea that the Three Lions should have been well beaten in regulation time. A 3:1 shot ratio of 27 to 9 had been pushed out to 4:1 by the end of extra time as Italy added 9 more shots without reply, yet the scoreline remained stubbornly at 0-0.

Luck intervenes in all sporting events, and its contribution can be keenly felt in single matches. On any given Sunday night, teams can bury one chance when marginally offside while blazing a similar opportunity high over the bar when remaining legal, and vice versa.

So let's imagine that the first 93 minutes on Sunday are a reasonable representation of how an Italy v England match under Roy Hodgson would pan out if the teams met over a limitless series of matches. What reward, on average, would Italy gain from producing such an apparently dominant performance, especially if they kept repeating their superior shot ratio?

In a previous post I developed a goal expectancy for average shots based on the position on the pitch from which the shot was taken. The model also predicted how likely a shot was to hit the target and require a save, or to be blocked by a defensive player. The data used wasn't extensive, comprising shots largely from the top six teams in the EPL and Stoke City. Therefore, the model was likely to be biased towards very efficient teams. Despite their regular mid-to-lower finishing positions, Stoke have maintained a ruthless level of scoring efficiency in their four seasons in the EPL, without which they would almost certainly have been relegated. In short, the model is likely to represent high-scoring teams (again, Stoke would be high scoring if they had more than their paltry possession figures).

Games between highly ranked countries, especially in the knockout stages of a competition, are very likely to be lower scoring than the environment from which our model originates. So it is probable that our conclusions will overestimate the average likelihood of goals in our simulated re-runs of the Italy v England spectacle. But we are only working in ballpark figures anyway, and a crude correction can be made at the end.

I've taken a screenshot from FourFourTwo's Euro 2012 stats app, initially showing England's nine shots from Sunday's game, and labelled each shot. I've then entered the coordinates of each shot into my regression model to calculate how likely it is that a typical shot from that position will score, miss the goal or be blocked near to source. From the outputs we can first get a feel for the quality of the chances. Were England attempting low-percentage, long-range efforts or were they carving out clear-cut chances?

The game's best chance overall appears to have fallen to England's Glenn Johnson (shot 5) early in the match. Not only did the chance fall centrally, giving the player the largest shooting area to aim for, but it was also just six yards out. In the sample behind my model, chances of this type, ranging from one-on-ones to crowded goalmouths and tap-ins, are converted 34% of the time. They are also more likely than not to be on target and are relatively difficult to block. So the most likely outcome was an on-target shot that wasn't blocked but was saved. And that's what happened.

Without seeing the actual match footage, Rooney's chance (shot 1) would appear more likely to have produced a goal, based on the regression alone, as it was taken from slightly closer to the goal. However, this effort excellently demonstrates the limitations of a data-dump method of analysis that includes limited variables. Rooney had his back to goal, the ball was above and behind him, and it required an overhead attempt. So the real likelihood of a goal is far below the estimate derived from a two-dimensional plot of a three-dimensional, fluid event. All models can deceive, as can shot maps.

Expectancy Values for Each of England's Shots Versus Italy. Euro 2012.

[Table not recovered. Columns: England Shot Number | Probability of Shot Being On Target | Probability of Shot Being Blocked | Probability of Goal Being Scored | Cumulative Probability]

Rooney's effort aside, the other eight England attempts were more typical of the chances that predominate in the dataset from which the model was built. If we add up the probabilities of each of the nine shots resulting in a goal, a block or a forced save, England would on average over many repetitions have seen 3 shots on target, 2 blocked and one goal. This allows for the model's failure to capture the difficulty of shot one and for the elevated scoring environment in which the model was originally constructed. The reality of the night saw just one shot on target, 3 blocked and zero goals.

Expectancy Values for Each of Italy's Shots Versus England. Euro 2012.

Italy Shot Number | Probability of Shot Being On Target | Probability of Shot Being Blocked | Probability of Goal Being Scored
1 | 0.42 | 0.13 | 0.08
2 | 0.22 | 0.31 | 0.02
3 | 0.52 | 0.17 | 0.33
4 | 0.45 | 0.21 | 0.21
5 | 0.20 | 0.34 | 0.01
6 | 0.28 | 0.37 | 0.06
7 | 0.49 | 0.16 | 0.23
8 | 0.25 | 0.30 | 0.03
9 | 0.28 | 0.23 | 0.03
10 | 0.27 | 0.23 | 0.03
11 | 0.34 | 0.22 | 0.06
12 | 0.25 | 0.29 | 0.03
13 | 0.27 | 0.36 | 0.05
14 | 0.28 | 0.35 | 0.05
15 | 0.48 | 0.16 | 0.21
16 | 0.44 | 0.17 | 0.14
17 | 0.47 | 0.18 | 0.20
18 | 0.29 | 0.31 | 0.05
19 | 0.27 | 0.18 | 0.02
20 | 0.33 | 0.31 | 0.09
21 | 0.23 | 0.24 | 0.01
22 | 0.27 | 0.27 | 0.03
23 | 0.34 | 0.29 | 0.09
24 | 0.31 | 0.31 | 0.07
25 | 0.26 | 0.37 | 0.05
26 | 0.26 | 0.37 | 0.04
27 | 0.25 | 0.37 | 0.04
Total | 8.7 | 7.2 | 2.3
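The totals row is the sum of each column. By linearity of expectation, those sums are the expected numbers of shots on target, blocks and goals from Italy's 27 attempts. The sketch only reproduces that raw sum. It does not apply the later adjustment for the two follow-up chances after Hart's parry or the crude correction for the model's high-scoring origins.

```python
# Sketch: expected counts are the column sums of per-shot probabilities.
# Each tuple is (P(on target), P(blocked), P(goal)) for one Italy shot.
italy = [
    (0.42, 0.13, 0.08), (0.22, 0.31, 0.02), (0.52, 0.17, 0.33),
    (0.45, 0.21, 0.21), (0.20, 0.34, 0.01), (0.28, 0.37, 0.06),
    (0.49, 0.16, 0.23), (0.25, 0.30, 0.03), (0.28, 0.23, 0.03),
    (0.27, 0.23, 0.03), (0.34, 0.22, 0.06), (0.25, 0.29, 0.03),
    (0.27, 0.36, 0.05), (0.28, 0.35, 0.05), (0.48, 0.16, 0.21),
    (0.44, 0.17, 0.14), (0.47, 0.18, 0.20), (0.29, 0.31, 0.05),
    (0.27, 0.18, 0.02), (0.33, 0.31, 0.09), (0.23, 0.24, 0.01),
    (0.27, 0.27, 0.03), (0.34, 0.29, 0.09), (0.31, 0.31, 0.07),
    (0.26, 0.37, 0.05), (0.26, 0.37, 0.04), (0.25, 0.37, 0.04),
]

# zip(*italy) turns the list of rows into three columns.
on_target, blocked, goals = (sum(col) for col in zip(*italy))
print(f"Expected on target {on_target:.1f}, blocked {blocked:.1f}, goals {goals:.1f}")
```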

By contrast with England, Italy positively peppered their opponents' goal with shots, whilst also coming up empty. However, their superficially impressive numbers are bulked up with a large number of long-range, low-expectancy attempts. Shot 3 was positionally on a par with Glenn Johnson's best effort, and shots 4, 7, 15 and 17 were inferior but should be classed as reasonable chances. In addition, two very good chances followed one another after a Hart parry of a Balotelli shot. Such efforts should be treated as a continuation of a single event, which slightly depresses the cumulative goal expectancy. The remaining efforts were mainly outsiders running for pride.

For those having trouble finding De Rossi's wonderfully struck, sliced dipper that hit England's post after 3 minutes, it's shot 6. Unlikely to be on target (technically it wasn't) and quite likely to be blocked, it would have been a most unlikely opening goal, but a great one.

Having made similar adjustments in the case of Italy, their shots would on average be expected to produce 8 on target (actual 6), 7 blocked (actual 12, so well done England) and 2 goals (actual zero, well done Joe Hart).

Italy dominated the game, but England created three very good chances and generally shied away from the kind of speculative long-distance efforts that characterised around three-quarters of Italy's attempts. Rather than ask both teams to keep re-enacting such an absorbing contest, we can round down the cumulative probabilities to get the most probable scoreline. On that basis, the quality and quantity of last night's shots suggest that 2-1 to the Azzurri shouldn't have been a surprising result... assuming Italy didn't take the lead and then imitate the reactive approach of their opponents.

Hence the title of the post...

Saturday, 23 June 2012

Shooting Efficiency. Covering All The Angles.

Knowing the kind of numbers that a league-average team is likely to produce over a range of in-game performance indicators is an almost essential tool in analysing a football team. For example, the average number of goals per EPL game has hovered around 2.5 in recent years, split into about 1.4 goals for the home side and 1.1 for the visitors. Therefore, if a team has scored an average of 2 goals per home game over a representative number of matches, say a whole season, then we can be fairly confident that they possess an above-average attack. Knowledge of the average immediately places an individual team either above or below that mean mark in the pecking order.

Unfortunately, while this kind of data is easy to collect for commonplace measurements such as goals, once we move towards even common advanced stats such as shot conversion, we begin to encounter limited and incomplete data sets. Therefore, in trying to produce an average conversion rate for shots spread over the length and breadth of the pitch, I am not going to pretend that the following analysis is definitive. It is at best ballpark, and biased towards the kind of teams for which I have collected chalkboard shot data.

Unsurprisingly, we've seen in various posts that distance from goal is one of the main factors defining how likely it is that a shot or header will result in a goal, or even whether it ends up on target. The further the ball has to travel to reach the goal, the less potent and less accurate those attempts become. However, I'd now like to go a step further and incorporate the actual position on the pitch from which the attempt originated, thus capturing the effect that shooting angle has on conversion efficiency. In addition, I'll try to show which shots are more likely to be blocked, giving a route to evaluating the effectiveness or otherwise of individual teams' defensive actions.

Most shots originate from the central area of the pitch at varying distances from the goal. So I've mapped each goal attempt using the centre point of the goal line (roughly where the keeper stands to face a penalty) as the origin. The shot's widthwise position gives one coordinate, and its perpendicular distance from the goal line gives the lengthwise coordinate. For example, a penalty would have a width coordinate of zero yards and a length coordinate of 12 yards.
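The post regresses outcomes on the two raw coordinates. A common alternative is to derive distance to the goal centre and the angle the goal mouth presents to the shooter, since those map directly onto "distance" and "angle". The sketch below computes both from the post's coordinate system, assuming a goal 8 yards wide. It illustrates feature construction only and is not the author's model.

```python
# Sketch: turn the post's (width, length) shot coordinates into distance and
# shooting angle. Origin = centre of the goal line; goal assumed 8 yards wide.
import math

GOAL_WIDTH = 8.0  # yards between the posts


def shot_geometry(width_yds: float, length_yds: float) -> tuple[float, float]:
    """Return (distance to goal centre in yards, angle between posts in degrees)."""
    half = GOAL_WIDTH / 2
    distance = math.hypot(width_yds, length_yds)
    # Vectors from the shot to each post: (-half - w, -l) and (half - w, -l).
    # Their cross product is GOAL_WIDTH * l; their dot product is w^2 + l^2 - half^2.
    angle = math.atan2(GOAL_WIDTH * length_yds,
                       width_yds ** 2 + length_yds ** 2 - half ** 2)
    return distance, math.degrees(angle)


# Penalty spot, in line with a post, 7 yards wide, and a central 30-yarder.
for width, length in [(0, 12), (4, 12), (7, 12), (0, 30)]:
    dist, ang = shot_geometry(width, length)
    print(f"width {width:>2} yd, length {length:>2} yd: "
          f"distance {dist:5.1f} yd, angle {ang:4.1f} deg")
```

Moving wide at a fixed 12 yards shrinks the visible goal mouth from about 37 degrees to under 30. This fits the falling conversion rates described in the following paragraphs.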

I've regressed the outcome of every shot in my data set (around 4,000 attempts) against the coordinates of the shot's origin to obtain the likelihood that the shot will be blocked, be on target or result in a goal. This gives a way of calculating an average goal or shot-on-target expectancy for any shot made in the EPL from any position on the field. With access to much larger data sets, it will ultimately allow comparisons between league-average values for shot accuracy and conversion and those of individual teams or players. It can also allow us to evaluate how unlikely a player is to score certain goals, and at the end of this post I'll try to put a numerical value on the two Goal of the Season contenders from 2011/12.

Rather than present a regression equation that is almost certainly biased towards my sample of teams, I'll show graphically how the likelihood of success falls away as attackers are forced to shoot from wider and wider angles. I've used distances of 12 yards out (level with the penalty spot), 18 yards and finally 30 yards. At each distance I've compared a central shot (in line with the penalty spot), one 4 yards to the left or right of centre (in line with one of the uprights) and finally one taken 7 yards wide of the centre of the goal.

How the Likelihood of Scoring Decreases as Shooters are Forced Wide.

Modelling from the available data, the benefit of forcing an attacker laterally across the pitch prior to a shot is well demonstrated. A shot from around the penalty spot will result in a goal about once in every four attempts, but the conversion rate halves if a player has to shoot from level with the spot but seven yards wider. The changing angle and the potentially longer distance the shot needs to travel to reach the goal line would appear to reduce the goal expectancy of such attempts. The effect is present at all distances. On average 24 shots are required per goal from 30 yards out in a central position, but this leaps to over 50 if the attacker is pushed wide of the perpendicular line through the uprights. Shots unwisely taken from 30 yards out and level with the perpendicular of the edge of the area succeed once in every 300 tries.

How the Likelihood of Shots Being On Target Decreases as Shooters are Forced Wide.

For shots on target I've included attempts made from as wide as the perpendicular of the penalty box, because including them still keeps the heights in the bar charts informative. This time the drop-off in potency is less pronounced as the origin of the attempt moves wider, but the same effect is demonstrated. The previous graphs are of interest from an attacking perspective. One way to gauge the defensive contribution of a team or player, or to identify a defensive tactical approach, may be to see whether teams are blocking more shots than usual.

How the Likelihood of Shots Being Blocked Decreases as Shooters are Forced Wide.

When it comes to blocking shots, defensive players are much more adept at getting in the way of longer-range efforts. At least a third of long-range attempts receive some kind of intervention before they have a chance to threaten the goalkeeper, and blocking becomes increasingly difficult as shooters approach and enter the box. Perhaps defenders and attackers are aware that a player intent on blocking has to commit to the action. The contest closer to goal may therefore have become more about defenders staying on their feet and attackers maintaining possession until a clear opportunity arises.

These tentative first steps may ultimately produce goal expectancy values for every region of the pitch and a method to evaluate both attackers and defenders. The number of players around the ball, the scoring method (head or feet) and even which side of the goal and which foot was used for the shot are obvious parameters that will change the raw prediction of success. But for the moment the regressions can give a raw flavour of how likely an attempt was to result in a goal. Two EPL Goal of the Season candidates stand out this season: Cisse's "wrong-footed" strike against Chelsea and Crouch's acrobatic effort against the Champions.

Using the coordinates of Cisse's shot and the regressions, an effort from that position on the pitch would result in a goal around once in every 100 attempts. Crouch's effort scores at about the same frequency, so we can't split the goals in terms of how unlikely they were. The skill required to execute the attempts was also comparable, although Crouchie was much more involved in his own set-up. Cisse's certainly had the wow factor, and Crouch's had Stoke's trademark long-ball game mixed in with skilful one-touch control. Sometimes even advanced analysis can't split players, and ultimately we have to fall back on aesthetics.

However, we can say with certainty that Crouch's goal is a long way from being the most unlikely scoring event seen at the Britannia Stadium. That accolade goes to Wigan's Maynor Figueroa, whose quickly taken free kick from inside his own half would produce a goal once in every 3,000 or so efforts.

Thursday, 21 June 2012

Is Ronaldo's Euro 2012 Improvement Just Down to Confidence?

The explosion of interest in, and availability of, both player and team statistics means that many football articles now lean heavily on numbers to either drive or validate their narrative. An invitation to read one such piece dropped into my Twitter feed a few days ago, as Prozone teamed up with the BBC website to give an insight into the performance of Portugal's Ronaldo over the three group games at Euro 2012.

The article was brief and to the point. Ronaldo had apparently gained in confidence since the first group game, Portugal's 1-0 defeat at the hands of Germany, and was now showing the kind of potency he had demonstrated with Real Madrid in their title-winning season. The website then provided a raft of Prozone data designed to reinforce the point. Most notable were Ronaldo's shooting and scoring exploits in the final match against the Dutch and his increasing time in possession from the German game, through the Danish game, to the final Dutch game.

Superficially, the points were solid, if unsurprising, and appeared to be backed up by legitimate use of statistics.

The BBC's site first states that Ronaldo's confidence has increased as the group stage has progressed, but it doesn't demonstrate how it is measuring "confidence", nor does it tell us what level his confidence was at at the start of the tournament, following his astonishingly impressive season with Real Madrid. Fairly high, I would suspect. Furthermore, it doesn't explain how, in its eyes, poor performances against Germany and Denmark can lead to an increase in confidence against the Dutch. Maybe the choice of words was poor and the writers are merely pointing out that his level of performance has risen as the tournament has progressed.

But this generous interpretation of the article also has problems. Firstly, we are talking about just three games, so sample size issues exist. It's tempting to look at steadily increasing or decreasing player statistics over a short timespan and associate these movements with nebulous qualities such as growing confidence, but the reality is usually much more mundane and random. Looked at over longer timescales, players or teams will produce similar levels of performance, but within these runs of matches particularly good or bad patches will occur, and these are not automatically indicative of an immediate decline or improvement.

The tactical approach of a team's opponents will also play a large part in how isolated sequences of games play out. The first group game of an international tournament is characteristically low scoring because teams are reluctant to risk defeat. Opening World Cup and Euro group games average nearer 2 goals per game, whereas final group matches are much more open and average nearer 3 goals per game. Therefore it's hardly surprising to see an attacker such as Ronaldo struggle for goals and possession in a first group game against Germany, but prosper in a final group game against a Holland team who needed to win by at least two goals to stand any chance of progressing. In short, the priorities of Ronaldo's first and last group opponents were almost polar opposites.

It's also tempting to compare Ronaldo's much more frequent appearances for his club side with his appearances at international level. Football is ultimately a team game, and during his earlier career at Manchester United and latterly at Real Madrid, Ronaldo has played in sides who were almost always odds-on to win their games. By contrast, Portugal, especially in the final stages of World or European competitions, are unlikely to find themselves odds-on to win a game in 90 minutes and could occasionally find themselves the outsider of the three possible outcomes. In the opening Portuguese game, the average weighted finishing position of the teams the German players starred for last season was 3rd, across a combination of the German and Spanish leagues. For the Portuguese team it was 5th, across a combination of the Portuguese and Russian leagues. So the German team was significantly superior to Ronaldo's Portugal, and to expect him to dominate the game in the manner of his club form was unrealistic at best.

More detailed raw data is great, but match situation, opponent strength, context and an awareness of the randomness of small sample sizes make it even better. The BBC's blog let down the wider statistical blogging community with its use of context-free numbers to bulk out mere opinion. Lee Dixon alone deserves credit for briefly mentioning that the open Dutch approach may have contributed to Ronaldo's "improvement", but I'd expect nothing less from a former Stoke full-back.

Lee Dixon, an insightful pundit in the making.

The Ronaldo who lines up for Portugal in the quarter-finals will be the same player who starred in La Liga and played out the group games, and his performance will be related much more to the tactical approach of both sides, the fluctuating scoreline and the relative abilities of the teams than to a journalist's opinion of his level of confidence.

Monday, 18 June 2012

How Likely Is a 2-2 Scoreline Between Spain and Croatia?

Putting aside for one moment the irony of Italy fretting over the possibility of Spain and Croatia deliberately playing out a mutually beneficial 2-2 draw, there is always the possibility that both teams may "call off the dogs" if the game naturally arrived at that scoreline. So, for any anxious supporters of the Azzurri, here are the chances that a fair game between Spain and Croatia will see the scoreline tied at 2-2, at ten-minute intervals.

Italian Football Fans off duty at the rugby.

How Likely Is the Spain v Croatia Game to Be Tied at 2-2 During the 90 Minutes?

Minutes Elapsed   Chance of the Score Being 2-2
10                1 in 33,000
20                1 in 2,500
30                1 in 600
40                1 in 200
50                1 in 100
60                1 in 70
70                1 in 45
80                1 in 30
90                1 in 25
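The post doesn't explain how these odds were calculated. As a rough sketch of one way to produce figures of this kind, the code below assumes that each side's goals arrive independently at a constant rate (a simple Poisson model), using hypothetical rates of 1.8 goals per 90 minutes for Spain and 0.8 for Croatia. Those rates are illustrative assumptions, not the inputs behind the table, so the output only loosely follows it: about 1 in 26 at full time, but about 1 in 17,000 after ten minutes, where a constant-rate model diverges more noticeably.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """Probability of exactly k events when the expected count is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)


def prob_scoreline_at(minute: float, rate_a: float, rate_b: float,
                      goals_a: int = 2, goals_b: int = 2) -> float:
    """P(score is goals_a-goals_b after `minute` minutes), assuming each side
    scores independently at a constant rate given in goals per 90 minutes."""
    fraction = minute / 90
    return (poisson_pmf(goals_a, rate_a * fraction)
            * poisson_pmf(goals_b, rate_b * fraction))


# Hypothetical scoring rates in goals per 90 minutes (illustrative only).
SPAIN_RATE = 1.8
CROATIA_RATE = 0.8

for minute in range(10, 100, 10):
    p = prob_scoreline_at(minute, SPAIN_RATE, CROATIA_RATE)
    print(f"{minute:>2} minutes: 1 in {1 / p:,.0f}")
```

A more realistic version would let the scoring rates vary with the time and the scoreline, but the constant-rate model is enough to show why a 2-2 score is so unlikely early on: both sides must already have scored twice.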

When To Shoot Smart and When To Shoot Often.

71% of France's shots against England, 15 out of 21, were attempted from outside the penalty area, and one of those long-range efforts saw Nasri beat his club keeper, Joe Hart, with a cut shot that dipped late and low into the bottom corner. I haven't checked France's long-term shooting preference under Blanc, but I'd be surprised if such an extreme shoot-on-sight policy on Monday is indicative of their usual style. The further away you are from goal, the less likely you are to score or hit the target. Teams do become slightly more likely to shoot from distance if they are trailing or drawing against sides they feel they should defeat, so perhaps that, coupled with England's tendency to get men south of the ball, contributed to Les Bleus' tactics.

If we take last year's EPL season as a guide, goal attempts from inside the box outnumbered those taken from distance, 6,100 for the former and 4,841 for the latter, and the average league conversion rates confirm the relative difficulty of scoring from greater distances: 14.5% for attempts inside the box compared to 3.7% for those outside. These ratios of course vary between games and between clubs, but in view of the large drop-off in conversion rates, it's worth asking which teams are at least trying to balance their attempts towards an optimum.

Realistically, our attempts won't be entirely successful, for both practical and methodological reasons, so it's sensible to list the obstacles first. Firstly, there's shot quality. Not all shots are as easy to save as others, even from the same area of the pitch, and there's no reliable way to measure a shot's difficulty anyway. Opinion was evenly split between Hart being at fault and simply being beaten by a dipping topspin shot in England's first Euro 2012 game. So even if we witness every shot personally, our opinions will often be as diverse as those of boxing judges. The shooter and the keeper are the best judges of a shot's quality, and their voiced opinions are likely to be biased by the outcome of the attempt.

Secondly, we know with absolute clarity the fate of 14.5% of shots from inside the box in last season's EPL: they went in the goal. But we know next to nothing about the other 5,214 attempts. They could have been saved, gone wide or been blocked. They could have resulted in a change of possession, or led to a corner, a goalmouth melee and a subsequent goal. So we know the benefit of a successful shot, but we can only guess at the average benefit of an unsuccessful one.

Lastly, the alternative to attempting a long-range effort isn't automatically trying one closer to goal. We may be able to deduce that a team is shooting too often and with poor efficiency from distance, but they may need to develop the play from ten potential long-range efforts to produce one extra effort from inside the box. Overall, their scoring could become even less prolific because of their laboured efforts to engineer a shot from closer range.

Goal Attempts for Each Team from the EPL 2011/12 Season.

Team          Inside Box           Outside Box
              Attempts   Goals     Attempts   Goals
Man City      452        73        288        20
Arsenal       419        67        222        10
A Villa       260        30        179        8
Sunderland    242        31        217        13
Man Utd       394        73        253        15
Wigan         263        29        257        13
WBA           301        36        245        10
Liverpool     394        39        277        7
Stoke         240        34        138        3
Everton       297        42        225        8
Spurs         371        52        331        14
Newcastle     256        44        238        12
Swansea       252        36        221        8
Bolton        277        40        222        6
Wolves        279        37        194        3
Blackburn     241        39        214        7
Fulham        289        42        257        6
Norwich       280        47        243        5
Chelsea       349        60        324        5
QPR           244        35        296        7

The number of goals a team scores over a season is a function of how often it shoots and its conversion rate for turning those shots into goals. Therefore, in simplistic terms, a team can try to increase the number of goals it scores by shooting more, by shooting more effectively, or ideally both. As we've divided shots into long- and shorter-range attempts, we can see whether shooting efficiency or shot count is the better indicator of goals scored in each case.

If we plot the number of goals each team scored from outside the box first against its number of long-range shots and then against its long-range conversion rate, we find that the number of long-range goals a team scores is much more strongly related to conversion rate than to number of attempts.

The number of long-range goals a team scores is strongly linked to having players who are adept at shooting from distance; merely shooting more often may not result in many extra goals, especially if the extra shots are being taken by less skilled long-range shooters. Last season at least, Chelsea scored from just 1.5% of their shots from outside the box, and they were similarly below the league average for conversion rate in 2010/11. A team whose players can only score at that lowly rate from distance would expect to score around four goals from outside the box from an average number of attempts. Chelsea managed five last year, but only because they attempted around 80 more long-range shots than an average side. Unless those shots were providing extra value to go along with their tiny goals tally, such as corners or rebound shots closer to goal, Chelsea would probably have been better off using the possession to try to create a shot from inside the box, or alternatively restricting their long-range efforts to a core group of proven specialists in an attempt to drive up their conversion rate.
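The Chelsea arithmetic can be checked directly from the team table. A minimal sketch, assuming an "average side" simply means the league's 4,841 long-range attempts split evenly across the 20 clubs:

```python
# Long-range (outside the box) figures from the EPL 2011/12 team table.
LEAGUE_OUTSIDE_ATTEMPTS = 4841
NUM_TEAMS = 20
CHELSEA_OUTSIDE_ATTEMPTS = 324
CHELSEA_OUTSIDE_GOALS = 5

# Assumption: an "average side" takes an equal share of the league's attempts.
average_attempts = LEAGUE_OUTSIDE_ATTEMPTS / NUM_TEAMS
chelsea_rate = CHELSEA_OUTSIDE_GOALS / CHELSEA_OUTSIDE_ATTEMPTS

# Goals Chelsea's conversion rate would yield from an average number of attempts,
# and how many more attempts than that Chelsea actually took.
goals_at_average_volume = average_attempts * chelsea_rate
extra_attempts = CHELSEA_OUTSIDE_ATTEMPTS - average_attempts

print(f"Chelsea long-range conversion rate: {chelsea_rate:.1%}")
print(f"Attempts by an average side: {average_attempts:.0f}")
print(f"Goals at Chelsea's rate from average volume: {goals_at_average_volume:.1f}")
print(f"Chelsea's extra attempts over an average side: {extra_attempts:.0f}")
```

This prints a conversion rate of 1.5%, about 242 attempts for an average side, 3.7 goals at Chelsea's rate from that volume, and 82 extra attempts, in line with the figures quoted above.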

If we now look at shots from inside the box, the situation is reversed: there is a stronger positive correlation between the number of shots a team takes and the number of goals it scores than between conversion rate and goals. So, in contrast to long-range goals, the main, reliable driving force behind increasing your scoring rate is to take more shots. Whereas only proven talent with a well-established conversion rate should be encouraged to shoot from long range, once we get inside the 18-yard box the experience of last season's EPL suggests that shots should be taken at every opportunity by whoever they fall to.
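The post doesn't say which measure of association sat behind the plots. As a sketch, assuming a plain Pearson correlation across the 20 clubs in the team table, both comparisons can be reproduced, and the same column totals confirm the league-wide conversion rates of 14.5% and 3.7%:

```python
import math

# (team, attempts inside box, goals inside box, attempts outside box, goals outside box)
EPL_2011_12 = [
    ("Man City", 452, 73, 288, 20), ("Arsenal", 419, 67, 222, 10),
    ("A Villa", 260, 30, 179, 8), ("Sunderland", 242, 31, 217, 13),
    ("Man Utd", 394, 73, 253, 15), ("Wigan", 263, 29, 257, 13),
    ("WBA", 301, 36, 245, 10), ("Liverpool", 394, 39, 277, 7),
    ("Stoke", 240, 34, 138, 3), ("Everton", 297, 42, 225, 8),
    ("Spurs", 371, 52, 331, 14), ("Newcastle", 256, 44, 238, 12),
    ("Swansea", 252, 36, 221, 8), ("Bolton", 277, 40, 222, 6),
    ("Wolves", 279, 37, 194, 3), ("Blackburn", 241, 39, 214, 7),
    ("Fulham", 289, 42, 257, 6), ("Norwich", 280, 47, 243, 5),
    ("Chelsea", 349, 60, 324, 5), ("QPR", 244, 35, 296, 7),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def goal_correlations(attempts, goals):
    """Return (r of goals vs attempts, r of goals vs conversion rate)."""
    rates = [g / a for a, g in zip(attempts, goals)]
    return pearson(attempts, goals), pearson(rates, goals)


in_att = [row[1] for row in EPL_2011_12]
in_goals = [row[2] for row in EPL_2011_12]
out_att = [row[3] for row in EPL_2011_12]
out_goals = [row[4] for row in EPL_2011_12]

# League-wide conversion rates from the column totals.
print(f"Inside box:  {sum(in_goals)}/{sum(in_att)} = {sum(in_goals) / sum(in_att):.1%}")
print(f"Outside box: {sum(out_goals)}/{sum(out_att)} = {sum(out_goals) / sum(out_att):.1%}")

r_in_att, r_in_rate = goal_correlations(in_att, in_goals)
r_out_att, r_out_rate = goal_correlations(out_att, out_goals)
print(f"Inside box:  r(attempts) = {r_in_att:.2f}, r(conversion) = {r_in_rate:.2f}")
print(f"Outside box: r(attempts) = {r_out_att:.2f}, r(conversion) = {r_out_rate:.2f}")
```

On these 20 data points, conversion rate tracks long-range goals more closely than shot count does, while inside the box shot count is the closer match, which is the pattern described above. With only 20 clubs and a single season, the coefficients themselves carry plenty of noise.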

The picture is far from complete. We can probably calculate roughly how often a 30-yard shot results in a goal, but we are still a long way from knowing how the likelihood of a goal changes if the player chooses to make a pass instead of taking a shot. So at the moment we are merely counting a shot as a success if a goal is scored and as a failure if the effort remains scoreless.

How teams try to improve their scoring can vary between sides, either through new signings bringing better skills or established players discovering newfound talents. But the previous analysis may indicate that, in general, teams are best served by leaving long-range efforts to a select few and allowing a free-for-all, shoot-on-sight policy when chances fall inside the box.

If we return briefly to Nasri's strike against Hart and England, the Frenchman's conversion rate from outside the box over the last four EPL seasons, first with Arsenal and then with Manchester City, is 5 from 111 attempts (around 4.5%, against last season's league average of 3.7%), making him likely to be well above the league average, especially if we factor in the four additional shots that hit the woodwork. That makes him a prime candidate to be one of France's designated long-range shooters.
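A rate built on 111 attempts still carries a fair amount of noise, so one way to temper it is to shrink it towards the league average before comparing. The sketch below assumes a simple shrinkage estimate that adds a hypothetical number of "league average" attempts (`PRIOR_WEIGHTS`) to Nasri's record; the post doesn't specify such a weight, so a few values are tried. The league rate is taken as 180 goals from 4,841 long-range attempts, the column totals of the 2011/12 team table, and the woodwork hits are ignored.

```python
LEAGUE_GOALS, LEAGUE_ATTEMPTS = 180, 4841  # EPL 2011/12 long-range totals
NASRI_GOALS, NASRI_ATTEMPTS = 5, 111       # four EPL seasons, outside the box

# Hypothetical prior weights: how many league-average attempts to blend in.
PRIOR_WEIGHTS = (50, 100, 200)


def shrunk_rate(goals, attempts, league_rate, prior_attempts):
    """Blend an observed rate with the league rate, treating the league as if
    it contributed `prior_attempts` extra attempts at the league rate."""
    return (goals + league_rate * prior_attempts) / (attempts + prior_attempts)


league_rate = LEAGUE_GOALS / LEAGUE_ATTEMPTS
raw_rate = NASRI_GOALS / NASRI_ATTEMPTS
print(f"League rate {league_rate:.1%}, Nasri raw rate {raw_rate:.1%}")
for prior in PRIOR_WEIGHTS:
    estimate = shrunk_rate(NASRI_GOALS, NASRI_ATTEMPTS, league_rate, prior)
    print(f"prior weight {prior:>3}: {estimate:.1%}")
```

Whatever weight is chosen, the estimate lands between the league rate and Nasri's raw figure, so he stays above average, though by less than the raw 4.5% suggests.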

Saturday, 16 June 2012

How the Euro 2012 Group Games May Have Turned Out.

Each country has now played two matches at Euro 2012, and the probable qualifiers from each of the four groups are beginning to emerge. However, the results from a pair of games are likely to contain a fair amount of luck, and teams may owe their position at the top or the bottom of the tables to a couple of fairly random events. One way to see which teams have enjoyed a smooth passage and which haven't is to compare the points each team actually took with the highest chances it reached, during each game, of taking all three points or none.

Italy, for example, have found themselves in very good positions in both of their matches, but they haven't seen either game through to the end. Just before Spain and Croatia each grabbed an equaliser, Italy had respectively a 62% and a 78% chance of taking all three points from those games.

Sweden are already going home, having failed to pick up a point in their opening two matches despite leading in both and owning a near 70% chance of taking all three points in each game.

The figures below are much more descriptive of what has occurred than predictive of what may happen in the last group games or the knockout phase. But they do illustrate how vulnerable teams are to luck in such competitions. At their low point in the Italy game, Spain were more likely to take nothing than they had been, at their high point, to take all three points, and they eventually finished with a point. And England had a 68% chance of returning either pointless or with all three points at different stages of their match with Sweden, yet they somehow took home all the spoils.

Making too much of "form" shown over just two games often distorts our view of a team's underlying level of talent. The true talent of each team at Euro 2012, as indicated by a weighted sample of results over a much longer timescale, probably hasn't changed much with the addition of two extra results.

The Highest Chances of Taking either 3 or 0 Points Achieved by Teams at Euro 2012 After Two Group Games.

Country           Chance of 3 Points (%)    Chance of 0 Points (%)    Actual Points
                  1st Game  2nd Game        1st Game  2nd Game        1st Game  2nd Game
Germany           100       100             21        34              3         3
Spain             54        100             62        8               1         3
Russia            100       75              28        31              3         1
Italy             62        78              54        32              1         1
England           68        100             36        68              1         3
France            36        100             68        28              1         3
Croatia           100       32              32        78              3         1
Czech Republic    28        100             100       35              0         3
Portugal          21        100             100       28              0         3
Poland            94        31              59        75              1         1
Sweden            67        68              100       100             0         0
Denmark           100       28              63        100             3         0
Ukraine           100       28              67        100             3         0
Greece            59        35              94        100             1         0
Holland           63        34              100       100             0         0
R.O.I.            32        8               100       100             0         0