## Tuesday, 4 October 2011

### Predicting and Explaining: How to Use Statistics in Soccer

Many of the posts on this blog rely on using readily available soccer stats to predict, as accurately as possible, the likely course a game will take. By prediction I do not mean the ability to foresee precisely what will happen in the next ten minutes of a game. Instead, stats can help you understand the most likely course a match will take, while accepting that much less likely outcomes may still occur.

In this post I've therefore gathered together some game statistics, and I will show how they relate to match results, both in the past and in the future.

The first choice to make is how to measure success in soccer. Wins are the obvious and overwhelmingly popular selection, but counting wins alone ignores the draws that make up around 28% of games. I have therefore used success rate, which is calculated by taking the number of wins plus half the number of draws as a proportion of total games played. To illustrate, a team playing 10 games, winning 3 and drawing 1, will have a success rate (SR) of (3 + 0.5)/10, or 0.35. A team winning all 10 matches will have an SR of 1.0, and a team that draws all 10 (a relatively unlikely event) will have an SR of 0.5.
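The calculation is simple enough to write down as a small function. This is only a sketch of the definition above; the function name `success_rate` and its argument names are my own choices for illustration.

```python
def success_rate(wins: int, draws: int, games: int) -> float:
    """Wins plus half the draws, as a fraction of games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    if wins < 0 or draws < 0 or wins + draws > games:
        raise ValueError("wins and draws must be non-negative and fit within games")
    return (wins + 0.5 * draws) / games


# The three cases described in the text.
print(success_rate(wins=3, draws=1, games=10))   # 0.35
print(success_rate(wins=10, draws=0, games=10))  # 1.0
print(success_rate(wins=0, draws=10, games=10))  # 0.5
```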

The next step is to discover whether there is a strong connection between a given statistic and match result, and how influential that stat is in determining the game outcome. This is most easily achieved by plotting a regression graph. Wikipedia provides numerous explanations of regression analysis, but at a basic level two things matter. The closeness of the points to the regression line indicates the strength of the correlation. The slope of the line, which is the rate at which success rate changes for a given change in the statistic we are examining, gives an idea of how influential each stat is.
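For readers who would rather see the two quantities computed than read them off a graph, here is a minimal sketch in plain Python. It assumes an ordinary least-squares line and the usual Pearson correlation coefficient. The helper names `pearson_r` and `fit_line` are hypothetical, and the team figures are invented purely for illustration; they are not the EPL data used in this post.

```python
from math import sqrt


def _sums(xs, ys):
    """Return the means and the centred sums needed by both helpers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(xs, ys):
    """Strength of the linear relationship, between -1 and 1."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when x never changes")
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up numbers: goal difference per game and success rate for six teams.
goal_diff = [-0.9, -0.4, -0.1, 0.2, 0.6, 1.1]
success = [0.28, 0.39, 0.47, 0.55, 0.63, 0.76]

r = pearson_r(goal_diff, success)
slope, intercept = fit_line(goal_diff, success)
print(f"correlation = {r:.2f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The correlation tells you how tightly the points hug the line, while the slope tells you how much success rate moves per unit change in the stat. A stat can have a steep slope and a weak correlation, or the other way round, which is why both are worth looking at.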

Lastly, we have to make the very important distinction between using match statistics to explain events that have happened in previous matches and using them to predict events that will occur in the future. To illustrate the point, consider a contrived situation where a team has lost its last 5 games, has a success rate of zero, and received an early red card in each of those games. Red cards, as we have shown in an earlier post, are detrimental to a team's chances, reducing a team's expected goal difference by about 1.5 goals over the course of 90 minutes. The team's low success rate over those 5 games can therefore be attributed in part to the red cards they received. However, we can only use red cards in predicting future results if the number of red cards received in the past is strongly correlated with the number of red cards a team is expected to receive in the future. And as we will see, there is no such correlation.
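One way to test whether a stat carries forward is to correlate each team's figure from one period with the same team's figure from the next. The sketch below shows the idea in Python. The function name `persistence` is hypothetical, the team labels and numbers are invented for illustration only, and this is not necessarily the exact procedure behind the results in this post.

```python
from math import sqrt


def persistence(earlier: dict, later: dict) -> float:
    """Pearson correlation of a stat across two periods, matched by team.

    Only teams present in both periods are used.
    """
    teams = sorted(set(earlier) & set(later))
    if len(teams) < 2:
        raise ValueError("need at least 2 teams present in both periods")
    xs = [earlier[t] for t in teams]
    ys = [later[t] for t in teams]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a stat never varies")
    return sxy / sqrt(sxx * syy)


# Invented example: red cards per team in two consecutive periods.
red_cards_earlier = {"Team A": 2, "Team B": 0, "Team C": 1, "Team D": 3, "Team E": 1}
red_cards_later = {"Team A": 1, "Team B": 2, "Team C": 1, "Team D": 0, "Team E": 2}

print(f"period-to-period correlation: {persistence(red_cards_earlier, red_cards_later):.2f}")
```

A value near zero would suggest the stat tells you little about what comes next, however well it explains what has already happened. A value near 1.0 would suggest teams that were high in one period tend to stay high in the next.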

I collected data from the last five completed EPL seasons and averaged such things as success rate, shots on target, and corners gained and allowed over both the first and second halves of the season. Each half season of 19 games gives a reasonable sample size to work with, and the split also provides a way to see if statistics remain relatively stable, at least within a season. In the table below, I've listed various statistics and how they relate to success rate. The figure in the "correlation" column indicates how strongly the two values are related: the closer the number is to 1.0 (ignoring the sign), the stronger the correlation. A negative sign before the correlation number merely indicates that an increase in the variable goes with a decrease in the success rate. For example, if you allow more goals, generally speaking you accumulate fewer wins or draws.

| Game Variable | Strength of Correlation to Success Rate |
| --- | --- |
| Goal Difference | 0.89 |
| Goals Conceded | -0.69 |
| Goals Scored | 0.68 |
| Shots by Team | 0.43 |
| Shots by Opponents | -0.43 |
| Shots on Target | 0.41 |
| Shots Allowed on Target | -0.41 |
| Corners Conceded | -0.30 |
| Corners Gained | 0.26 |
| Opponents Yellow Cards | 0.15 |
| Fouls Conceded | -0.10 |
| Fouls by Opponents | 0.06 |
| Yellow Cards Received | -0.04 |
| Opponents Red Cards | 0.04 |
| Red Cards Shown | 0.006 |

No one should be too surprised to find that goal difference is the stat most strongly correlated with success rate. Scoring more goals and conceding fewer obviously gives you a great chance of winning matches. As we can see from the plot below, the relationship between the two is very strong, and an increase in goal difference of 20% leads to an average increase in success rate of about 5%.

As we move down the table the correlations become weaker. Simply considering goals scored or goals conceded in isolation is still a relatively good indicator of likely success on the field, but it isn't as strong as a combination of the two. The lack of a salary cap in the EPL is an obvious factor in goals scored or conceded alone still being a good indicator. EPL teams aren't compelled to spend heavily on one side of the ball at the expense of the other, and top teams are free to spend as much as they can afford on attackers and defenders. Outstanding players, of whatever position, tend not to stay at unsuccessful teams for long. A successful team therefore tends to end up with very good attackers and defenders, while a mediocre one has mediocre players throughout. This leads to goal scoring ability or defensive ability being a good proxy for overall team strength. Teams that score a lot of goals but also concede large numbers are relatively rare.

Shot attempts, shots on goal and corners have a weaker, but still noticeable, relationship to success rate. As with goals scored previously, we may be witnessing a chain of correlations: corners gained, say, correlate with goals scored, which correlate with goal difference and finally with success rate. In short, all of these "minor" stats appear to have a correlation that ultimately leads to goal difference, and they can be used as a weaker proxy for attacking or defensive ability.

To demonstrate the relationship between corners gained and goals scored, here's the regression line for the last 5 EPL seasons.

Therefore, if we are presented with a series of soccer stats and want to form an informed opinion as to how good or bad a team is, we can deduce that an above average number of corners gained, goals scored or shots taken at goal is likely to indicate an above average side. But the most informative piece of knowledge to have is the one most people would have intuitively chosen, namely average goals scored minus goals allowed. Yellow cards and fouls would appear to be much less significantly related to match result when viewed over the EPL as a whole. A team facing concerted pressure from their opponents may be forced to commit more fouls, and this may be an indicator that they are more likely to lose the game. However, this could be balanced by the team in front seeing yellow for disrupting their opponents' attempts to equalise or for time wasting, or by the cards they received while the game was still level.
Red cards are insufficiently common to register in a league-wide, season-long survey, but they are of course hugely significant in the individual games where they occur.

In a short follow-up post, I'll show you how these stats endure between teams and how useful they can be as a way of predicting, rather than merely explaining, match results.