Tuesday, 30 August 2016

The 2016 NFL Regular Season Done & Dusted in Excel.

The NFL’s back and so are the LA Rams, so here’s how I'd go about modelling the 2016 season.

Firstly, you need a rating for each team in the new season.

Previous season’s data is always a good starting point, but the NFL regular season is a relatively short 16 games, so wins and losses from 2015 can be heavily influenced by luck, or random variation to give it a less provocative name.

Winning lots of games by narrow margins, winning more or fewer games than your points scoring/conceding merits and feeding off lots of turnover ball are usually big indicators that your win/loss record may regress towards the league average of 8 wins in the upcoming season.

These traits are often described as "knowing how to win", but they are almost always just random... or possibly cheating.
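The "points scoring/conceding" flag is usually checked against a Pythagorean expectation. A minimal sketch is below; the 2.37 exponent is an assumption (a figure commonly quoted for the NFL, not one taken from this post), and the team in the example is hypothetical.

```python
def pythagorean_win_pct(points_for: float, points_against: float, exponent: float = 2.37) -> float:
    """Win share that a team's points scored and conceded would normally produce.

    The 2.37 exponent is an assumed NFL value, not one taken from this post.
    """
    scored = points_for ** exponent
    conceded = points_against ** exponent
    return scored / (scored + conceded)


def luck_gap(actual_wins: int, points_for: float, points_against: float, games: int = 16) -> float:
    """Actual wins minus Pythagorean wins; a big positive gap flags likely regression."""
    return actual_wins - games * pythagorean_win_pct(points_for, points_against)


# Hypothetical team: 12 wins from an even points tally of 350 scored, 350 conceded.
print(round(luck_gap(12, 350, 350), 1))  # 4.0
```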

Based on how teams with these flags against their record did in the subsequent season over the last decade, the Carolina Panthers (15-1) would be expected to regress to around 10 wins in 2016.

The four-win Chargers should be dragged upwards to just over 7 wins.

These projections are around where the Panthers and the Chargers are quoted in the season total wins markets. So we’ve got a decent starting point that’s potentially stripped out some of the unsustainable luck that went into the 2015 regular season.
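The projections above come from how flagged teams actually fared in the following season, and that historical data isn't reproduced here. As a stand-in, the sketch below assumes a single, hypothetical "keep" factor (the share of a team's deviation from 8 wins that survives into next season) applied to every team. It only illustrates regression towards the mean; with keep=0.3 it lands close to, but not exactly on, the flag-based figures.

```python
LEAGUE_AVERAGE_WINS = 8.0  # a .500 record over 16 games


def regress_wins(wins: float, keep: float) -> float:
    """Pull last season's wins towards the league average.

    `keep` is a hypothetical share of the deviation that persists next season.
    """
    return LEAGUE_AVERAGE_WINS + keep * (wins - LEAGUE_AVERAGE_WINS)


for team, wins in [("Panthers", 15), ("Chargers", 4)]:
    print(team, round(regress_wins(wins, keep=0.3), 1))
# Panthers 10.1
# Chargers 6.8
```

A uniform factor treats every team alike, whereas the flag-based approach pulls harder on records built on close games or turnovers, which is why the Chargers come out slightly higher in the post.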

Here are the predicted wins for all 32 teams, based on last year's luck-based indicators.

Next we need a way to predict individual match results.

The NFL is blessed because we have Bill James’ Pythagorean Log 5 method, which takes winning percentage and home field advantage and spits out the odds of a home and away win. 

(In Excel it looks like this: =((E2)*(1-G2)*$B$1)/((E2)*(1-G2)*$B$1+(1-E2)*(G2)*(1-$B$1)), where E2 is the home team's win%, G2 is the away team's win% and B1 is the league-wide home win %, currently around 0.57.)
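The same calculation as a Python function, a sketch that maps the Excel cells one for one (E2 to `home_pct`, G2 to `away_pct`, B1 to `home_edge`); the function and parameter names are illustrative, not from the post.

```python
def home_win_prob(home_pct: float, away_pct: float, home_edge: float = 0.57) -> float:
    """Log5 with home advantage, mirroring the Excel formula.

    home_pct  -> E2, the home team's projected win %
    away_pct  -> G2, the away team's projected win %
    home_edge -> B1, the league-wide home win %
    """
    home_term = home_pct * (1 - away_pct) * home_edge
    away_term = (1 - home_pct) * away_pct * (1 - home_edge)
    return home_term / (home_term + away_term)


# Week one: the 0.27 49ers host the 0.39 Rams.
p_sf = home_win_prob(0.27, 0.39)
print(round(p_sf, 3), round(1 - p_sf, 3))  # 0.434 0.566
```

Two evenly matched teams return exactly the home edge (0.57), which is a quick sanity check that the home advantage enters the way the spreadsheet intends.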

The Panthers are projected as a 10-6 team, so 10/16 or a 0.625 team.

You put the estimated win% for each team into each matchup and get the estimated home win% for that match as the output.

Do this for every regular season game. 

The projected 0.39 Rams go to the 0.27 SF 49ers on Monday night, week one. 

Last season the Rams were in St Louis, but if we stick our regressed new-season win% figures into Bill’s formula, we get LA at 1.8 (a shade odds-on) to win on the money line in SF, or a spread of around 3 points in LA’s favour.

LA are currently quoted as 2.5-point favourites, so again, we’ve got a decent model.
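Moving between a win probability and a money line price is a simple reciprocal. A minimal sketch, assuming fair odds (the bookmaker's margin is ignored), using the Rams figures from the Log5 step:

```python
def decimal_odds(prob: float) -> float:
    """Fair decimal odds for a win probability, ignoring the bookmaker's margin."""
    return 1 / prob


def implied_prob(decimal: float) -> float:
    """Win probability implied by fair decimal odds."""
    return 1 / decimal


print(round(decimal_odds(0.566), 2))  # 1.77, i.e. about 1.8
print(round(implied_prob(1.8), 3))    # 0.556, the 0.56 used in the simulation
```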

Next we need to simulate all 256 games from week one to week seventeen.

Again, it’s easy in Excel.

We judge that LA will win in SF with a probability of 0.56, based on our money line odds of 1.8. If we stick a random number between 0 and 1 alongside this estimated win probability and it falls below 0.56, we grant LA the win; otherwise it’s a SF win.

Again we can do this for every regular season game and we’ve simulated one NFL season for 2016.

Ideally we’d like to repeat this a couple of thousand times, and once again even Excel obliges, in less than a minute on a decent computer.
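The steps above (rate each fixture with Log5, draw a uniform random number per game, repeat the whole season) can be sketched in Python. This is a toy version under stated assumptions: four hypothetical teams with made-up ratings playing a double round-robin instead of the real 32-team, 256-game schedule, ties ignored, and a seeded generator standing in for Excel's RAND().

```python
import random
from collections import Counter, defaultdict

HOME_EDGE = 0.57  # league-wide home win %, as in the post

# Hypothetical ratings and a toy double round-robin of (home, away) fixtures;
# a real run would use all 32 teams and the actual 256-game schedule.
RATINGS = {"A": 0.69, "B": 0.55, "C": 0.45, "D": 0.31}
SCHEDULE = [(h, a) for h in RATINGS for a in RATINGS if h != a]


def home_win_prob(home: float, away: float, edge: float = HOME_EDGE) -> float:
    """Log5 with home advantage, the same form as the Excel formula."""
    h = home * (1 - away) * edge
    a = (1 - home) * away * (1 - edge)
    return h / (h + a)


def simulate_season(rng: random.Random) -> Counter:
    """Play every fixture once: the home side wins if a uniform draw falls below its probability."""
    wins = Counter({team: 0 for team in RATINGS})
    for home, away in SCHEDULE:
        p_home = home_win_prob(RATINGS[home], RATINGS[away])
        winner = home if rng.random() < p_home else away
        wins[winner] += 1
    return wins


def simulate_many(n: int, seed: int = 2016) -> dict:
    """Repeat the season n times; map each team to a Counter of its win totals."""
    rng = random.Random(seed)
    dist = defaultdict(Counter)
    for _ in range(n):
        for team, w in simulate_season(rng).items():
            dist[team][w] += 1
    return dist


N = 10_000
dist = simulate_many(N)
for team in RATINGS:
    mean_wins = sum(w * c for w, c in dist[team].items()) / N
    print(team, round(mean_wins, 2))
```

Keeping the whole distribution of win totals per team, rather than just the mean, is what allows questions like "how often does the best team miss a winning season?" to be answered afterwards.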

Check out this soccer post for the basic method.

We now have a range of season wins for all 32 teams, based on a reasonably robust new season rating and their actual intertwined schedule.

Seattle has the best projection in 2016; they are expected to average just over 11 wins.

However, the simulations illustrate the range of outcomes that are possible even for the likely best team in the NFL just due to the randomness in a short 16 game season.

From the plot of the outcomes from 10,000 iterations of the 2016 season, there is an 8% chance that Seattle will not have a winning season. Small, but certainly not insignificant.
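How a team averaging around 11 wins can still miss a winning season is easy to reproduce for a single team. The sketch below uses 16 hypothetical per-game win probabilities (they sum to 11), so it will not recover the post's 8% or 1 in 1,000 figures exactly. It also shows that, with independent games, a perfect season is simply the product of the 16 probabilities.

```python
import math
import random

# Hypothetical per-game win probabilities for one strong team over 16 games
# (they sum to 11, an 11-win average); in practice each comes from the Log5 step.
GAME_PROBS = [0.80, 0.75, 0.60, 0.70, 0.65, 0.72, 0.58, 0.78,
              0.66, 0.74, 0.62, 0.69, 0.71, 0.64, 0.76, 0.60]


def season_outcomes(probs, n=10_000, seed=2016):
    """Simulate n seasons; return P(8 or fewer wins) and P(winning every game)."""
    rng = random.Random(seed)
    totals = [sum(rng.random() < p for p in probs) for _ in range(n)]
    not_winning = sum(t <= 8 for t in totals) / n  # 8-8 or worse, ties ignored
    perfect = sum(t == len(probs) for t in totals) / n
    return not_winning, perfect


not_winning, perfect_sim = season_outcomes(GAME_PROBS)
# With independent games, 16-0 is just the product of the per-game probabilities.
perfect_exact = math.prod(GAME_PROBS)
print(f"P(no winning season) ~ {not_winning:.3f}")
print(f"P(16-0): simulated {perfect_sim:.4f}, exact {perfect_exact:.4f}")
```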

More positively, there’s around a 1 in 1,000 chance they go 16-0. Bookies will currently give you around 499/1.

It’s around 200/1 that someone goes 16-0 in 2016; again, you might only get 66/1 on that.
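Comparing a model probability with a fractional price shows whether a bet has value. A minimal sketch, ignoring the bookmaker's margin and treating "around 200/1" as a 1 in 201 chance:

```python
def fractional_to_prob(numerator: float, denominator: float = 1) -> float:
    """Implied probability of fractional odds such as 499/1 (margin ignored)."""
    return denominator / (numerator + denominator)


def expected_return(model_prob: float, numerator: float, denominator: float = 1) -> float:
    """Expected profit per unit staked, if the model probability is correct."""
    win_profit = numerator / denominator
    return model_prob * win_profit - (1 - model_prob)


# Seattle 16-0: model about 1 in 1,000 against a 499/1 price.
print(round(fractional_to_prob(499), 4))      # 0.002
print(round(expected_return(0.001, 499), 3))  # -0.5
# Anyone 16-0: model about 200/1 against a 66/1 price.
print(round(expected_return(1 / 201, 66), 3))  # -0.667
```

Both prices imply a perfect season is more likely than the simulations suggest, so on these numbers each bet loses around half or more of the stake in the long run.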

Ratings can be updated as the season progresses or you can add your own tweaks at any time, such as subjective adjustments to account for current rosters.

Finally, here are the finishing probabilities for the NFC West, based on 10,000 league simulations and appropriate tiebreakers.

Fairly close to the quoted odds for the top two, with the Rams and 49ers shortened by the books in case either "does a Leicester".

Seattle and Arizona are the most likely to nab the top seedings in the NFC, along with Cincinnati, New England and Pittsburgh in the AFC. But you could probably have guessed that.

The Jets have about a 14% chance of winning 12 or more games, which might be an AFC outsider worth running with. 

Dallas and Detroit each have a 5% chance of doing likewise in the NFC and earning a high postseason seeding.

Monday, 29 August 2016

On The Rebound.

Expected goals are designed to look at the process of scoring, rather than the singular outcome on the day.

I've previously written about how a few big chances aren't equivalent to the same expected goals total spread over more shots. The latter trades the possibility of scoring a larger number of goals for a greater likelihood of scoring at least once.

Another acknowledged problem when using cumulative expected goals to represent a side's achievements is the treatment of quickfire attempts from rebounding shots.

Often the chances are created well inside the box, sometimes leading to cumulative expected goals totals that exceed 1 for a connected opportunity that could at best only result in a single score.

Choosing cutoff points is always subjective; after all, every action in a real match is connected to some degree, from the first kick to the last. But in the table below I've charted the percentage of shots for each team in last year's La Liga that came within 10 seconds of their previous attempt.

Over 90% of attempts were made at least 30 seconds after the preceding effort, so the majority of attempts are preceded by a lengthy phase of general play.

The average in 2015/16 for La Liga as a whole was 6%, but Eibar, Espanyol and Real Betis heeded the call to "follow up" with greater regularity.
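Flagging quick follow-ups like those charted above only needs each team's shot times. A minimal sketch, assuming one list of match-clock times (in seconds) per team per match, so that gaps never span two matches; the times in the example are hypothetical.

```python
def quick_follow_ups(shot_times, window=10.0):
    """Count shots taken within `window` seconds of the team's previous shot in one match."""
    times = sorted(shot_times)
    quick = sum(1 for prev, cur in zip(times, times[1:]) if cur - prev <= window)
    return quick, len(times)


def season_rebound_share(matches, window=10.0):
    """Pool counts over a team's matches and return the share of quick follow-ups."""
    quick = total = 0
    for shot_times in matches:
        q, t = quick_follow_ups(shot_times, window)
        quick += q
        total += t
    return quick / total if total else 0.0


# Two hypothetical matches: 2 of 6 and 2 of 4 shots are quick follow-ups.
matches = [[100, 104, 600, 1500, 1507, 3000], [250, 2600, 2608, 2615]]
print(season_rebound_share(matches))  # 0.4
```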

The problem of rebounds inflating a side's apparent attacking potential can be reduced by simulating chances, while limiting each connected sequence to a maximum of one goal.

For example, two chances in quick succession, each with an individual scoring probability of 0.5, don't guarantee a goal on average, as their cumulative total of 1 may suggest. Instead, you score from three quarters of such related opportunities, because the only way to fail is to miss both (1 - 0.5 × 0.5 = 0.75).

Real Betis hone their blocking skills against a top Premier League team.

We can demonstrate the difference between the two sets of circumstances by accounting for and then ignoring the connected events in a match simulation.

In 2015/16 Eibar drew a late season fixture with Betis, 1-1.

The visitors, Betis, had six shots, one of which was the game's biggest chance. They scored from it, nicely illustrating the value of creating the odd gilt-edged opportunity.

Eibar had 21 attempts, 16 of which had a goal probability of less than 10%.

Five Eibar attempts came within 10 seconds of an earlier attempt. Four combined low-value attempts with more valuable ones, but the final salvo united two attempts that were each marginally odds-on to be scored.

Cumulatively, Eibar's expected goals almost reached three compared to just over one for Real Betis.

Although score effects also played a part, the hosts would appear to have been unlucky to not gain three points.

Simulations based on attempts confirm this impression.

A straight simulation of all 27 attempts in the game gives Eibar more goals in 83% of the iterations, with each side's average goals per simulated game equal to its cumulative expected goals.

However, once you treat rebounds as connected events, Eibar's share of victories falls to 77%, and their average goals scored falls likewise, from their cumulative expected goals of 2.9 to just 2.6.
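The two simulation modes can be sketched side by side. The shot values below are hypothetical (not the actual Eibar and Betis data): each inner list is one sequence, either a lone shot or attempts taken within about 10 seconds of each other. With `cap_rebounds=True` a sequence stops at its first goal; otherwise every attempt is drawn independently, as in a straight simulation.

```python
import random

# Hypothetical expected goals values, not the real match data. Each inner list is
# one sequence: a lone shot, or attempts taken within ~10 seconds of each other.
HOME = [[0.05]] * 12 + [[0.08, 0.30], [0.55, 0.52]]
AWAY = [[0.60], [0.10], [0.05]]


def simulate_goals(sequences, rng, cap_rebounds):
    """Goals from one simulated match; with cap_rebounds a sequence yields at most one goal."""
    goals = 0
    for sequence in sequences:
        for xg in sequence:
            if rng.random() < xg:
                goals += 1
                if cap_rebounds:
                    break  # the ball is in the net, so there is no follow-up shot
    return goals


def home_win_share(home, away, cap_rebounds, n=20_000, seed=1):
    """Share of simulations the home side wins outright, plus its average goals."""
    rng = random.Random(seed)
    wins = total_goals = 0
    for _ in range(n):
        h = simulate_goals(home, rng, cap_rebounds)
        a = simulate_goals(away, rng, cap_rebounds)
        wins += h > a
        total_goals += h
    return wins / n, total_goals / n


for capped in (False, True):
    share, avg = home_win_share(HOME, AWAY, capped)
    print(f"capped={capped}: home wins {share:.0%}, avg goals {avg:.2f}")
```

The uncapped average converges on the plain sum of expected goals, while the capped one replaces each sequence's total with the probability of scoring at least once, so sides that generate lots of rebounds lose the most.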

Expected goals do provide insight into a side's ability or achievement in a single match, but occasionally they overrate or underrate the teams at the extremes.

Friday, 26 August 2016

48 Games into the Championship.

The Championship may be only four match days old, but granular data on the state of the teams is beginning to pile up.

Over 1,000 goal attempts have been made, more than 300 of which required the keeper to at least try to make a save, and Shane Duffy has already scored three league goals, although none for his actual employers.

Prediction is a constant balancing act between using recent data and larger samples that inevitably contain information from previous seasons, when a side may have had a very different lineup.

Huddersfield currently sit top of the Championship, while Newcastle, the short-priced preseason favourites, are closer to the relegation zone on points than they are to the top of the pile.

The betting markets do not expect this situation to last: Newcastle still head the market, and the current leaders are given around a 4% chance of remaining in their elevated position.

Fans of Huddersfield will no doubt relish their current position and perhaps dream that they are deserved pacesetters at this early stage, much as Crystal Palace, Swansea and Leicester supporters did early in the 2015/16 Premier League season.

So is there any useful information to be gained from a sample size of just four games?

Many will be familiar with the idea that individual matches are rife with luck, and that looking at the process of chance creation, rather than just the relatively infrequent outcomes, can be more predictive.

Huddersfield currently have a goal difference of +3, the smallest possible when collecting 10 points from four matches, and they have won each of their three victories by a single goal.

They've taken only slightly more attempts than they've faced, and expected goals based on shot type and position suggest that, on average, they might score 4.5 goals and allow 5.5 from such chances.

They have a negative expected goal difference (-1) after four matches, only the 16th best record this season.

Small numbers of matches can also mean very different strengths of schedule for different sides, and Huddersfield have played a reasonably taxing first four games: relegated Villa and Newcastle, along with Barnsley and Brentford.

Using the interlocking collateral form of all 24 sides and their expected goal differentials from Opta-sourced data, the solution that best describes the events of the 48 games to date places Huddersfield as the 12th best team in terms of strength-of-schedule-corrected expected goals.

Newcastle are second under this approach, behind only Brighton.

All three promoted teams are comfortable inside the top 10, along with the likes of Wolves, Fulham, Derby, QPR and perhaps surprisingly, Reading.

Blackburn prop up whichever approach you use, with Nottingham Forest and Birmingham enjoying more elevated league positions than their shooting and schedule perhaps merit.
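The post doesn't spell out how the interlocking collateral form is solved. One common way to correct for strength of schedule is a Simple Rating System style fit, in which each side's rating equals its average expected goal difference plus the average rating of the opponents it has faced. The sketch below is that technique on three hypothetical teams, ignores home advantage, and is not necessarily the approach behind the rankings above.

```python
from collections import defaultdict


def srs_ratings(games, iterations=500):
    """Strength-of-schedule-adjusted ratings from expected goal differences.

    games: (team_a, team_b, xg_diff) tuples, with xg_diff = team_a's xG minus team_b's.
    Each rating converges to: average xG difference + average opponent rating.
    """
    fixtures = defaultdict(list)  # team -> [(opponent, xG difference from team's side)]
    for a, b, diff in games:
        fixtures[a].append((b, diff))
        fixtures[b].append((a, -diff))
    ratings = {team: 0.0 for team in fixtures}
    for _ in range(iterations):
        target = {
            team: sum(diff + ratings[opp] for opp, diff in played) / len(played)
            for team, played in fixtures.items()
        }
        # Damping (half old, half new) stops the updates oscillating on some schedules.
        updated = {t: 0.5 * ratings[t] + 0.5 * target[t] for t in ratings}
        mean = sum(updated.values()) / len(updated)
        ratings = {t: r - mean for t, r in updated.items()}  # centre ratings on zero
    return ratings


# Hypothetical, perfectly consistent results: A is 1 goal better than B, B 1 better than C.
games = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 2.0)]
for team, rating in sorted(srs_ratings(games).items(), key=lambda kv: -kv[1]):
    print(team, round(rating, 2))
```

Because every rating leans on its opponents' ratings, a side that has faced strong opposition (as Huddersfield have) gets credit for it, which is why raw and adjusted rankings can differ so much after only a handful of games.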

It's early days for the 24-team league, and Huddersfield fans should perhaps take a screenshot of this early table for posterity.