Wednesday 22 February 2012

Compiling Match Odds (Part II)

In the first part of this series (HERE), we looked at Goal Supremacy. This produced match odds based on the goal difference of each team in question over the last six matches. Taking a match at random (and it really was random), the compiled odds weren't too far away from the odds that the bookmakers were offering. So this would indicate that we've made a decent enough start on the path to generating our own match odds.

This was all fine and dandy, but it's not too difficult to see some problems with this first method. Chief amongst these is that bookmakers are really pretty good at pricing up matches (the bastards), and if we perform our own pricing just based on goals scored and conceded, we'll probably not gain any real ground. Relying only on such an obvious and basic method may not throw up any value bets for us to get our teeth into.

Also, as discussed in the comments after the initial posting, there is the issue with the type of goal scored. Should a goal scored against Man Utd or Man City be given the same points value as a goal scored against Wolves or Bolton? And not discussed but just as relevant is whether to use all the goals for all the past six matches. By that I mean should we use all the last six matches, home and away? Or should we just use the last six home matches for the home team and the last six away matches for the away team? Yes, yet another weighting issue to consider. Sorry about that.

In essence, we can say that Goal Supremacy was a good start at trying to price up a match, but if we're serious about our pursuit of finding value, then we probably need to delve just a little bit further.

The next method, detailed below, sadly does not address all the issues just outlined, but does provide another way of looking at football data and another way of compiling match odds. This can only help us to build up an overall calculation. Later methods should begin to address more of our concerns.


Dyads and Triads

You'll be pleased to know that this is nothing to do with gang culture. Unfortunately I cannot remember where I stole this method from, so am not able to give due credit and reference to the originator. This doesn't really matter. I'm sure whomever I stole it from probably stole it themselves. That's how these things work. I do seem to remember, however, that this was a system devised by some professor or other when he was trying to formulate an effective way to play the Pools.

So what is a dyad and a triad? Nothing too dramatic: a dyad is a sequence of two matches and a triad is a sequence of three matches. If we look at dyads first, we can see that from the three possible results, we can acquire six possible outcomes depending on whether we're dealing with the home team or the away team:

  1. Home Win    (HW)
  2. Home Draw   (HD)
  3. Home Loss   (HL)
  4. Away Win    (AW)
  5. Away Draw   (AD)
  6. Away Loss   (AL)

A dyad is a combination of any two of these, and the triad is that pair plus the predicted third result. These groupings have been looked at in detail over the course of several seasons, and back-tested to predict the result of the third match in the sequence.

Now my stolen... erm, I mean acquired, data for this method was taken from 10 years' worth of English football league results, and looked at the percentage of home wins, draws and away wins based on every single dyad combination. For example, the triad result for the dyad "HW, AL" showed the following:

   Home Win = 47.38%, Home Draw = 26.19%, Home Loss = 26.43%

Of course, this is not too far from the long-term average anyway, but it's these relationships and the results recorded for each combination that are going to be the basis of our predictions, allowing us to draw up probabilities for each outcome. Opportunities will arise when the dyads throw up odds away from the norm.
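
For anyone wondering how a table like this could be rebuilt from raw results, here is a minimal Python sketch. It is not the method behind the figures above (those were taken from an existing source); it simply assumes that a dyad is two consecutive results for one team and that we tally whether that team won, drew or lost its next match. The function names, team names and the tiny fixture list are all made up purely for illustration.

```python
from collections import Counter, defaultdict


def label_result(team, home_team, away_team, home_goals, away_goals):
    """Label one match from `team`'s point of view: HW, HD, HL, AW, AD or AL."""
    if team == home_team:
        venue, scored, conceded = "H", home_goals, away_goals
    elif team == away_team:
        venue, scored, conceded = "A", away_goals, home_goals
    else:
        raise ValueError(f"{team} did not play in this match")
    if scored > conceded:
        outcome = "W"
    elif scored == conceded:
        outcome = "D"
    else:
        outcome = "L"
    return venue + outcome


def tally_triads(labels):
    """For each dyad in a chronological list of labels, count the next result.

    Assumption: only the W/D/L part of the third match is tallied.
    """
    counts = defaultdict(Counter)
    for first, second, third in zip(labels, labels[1:], labels[2:]):
        counts[(first, second)][third[1]] += 1
    return counts


def triad_percentages(counter):
    """Turn the raw counts for one dyad into percentages."""
    total = sum(counter.values())
    return {outcome: 100 * n / total for outcome, n in counter.items()}


# Invented fixtures, oldest first: (home team, away team, home goals, away goals)
matches = [
    ("Reds", "Blues", 2, 0),
    ("Greens", "Reds", 1, 1),
    ("Reds", "Greens", 0, 1),
    ("Blues", "Reds", 0, 3),
    ("Reds", "Blues", 2, 0),
    ("Greens", "Reds", 1, 1),
    ("Reds", "Greens", 2, 1),
]

labels = [label_result("Reds", *match) for match in matches]
counts = tally_triads(labels)

# The latest match comes last in the dyad, as in the post.
current_dyad = tuple(labels[-2:])

print(labels)
print(current_dyad)
print(triad_percentages(counts[("HW", "AD")]))
```

With ten seasons of results you would run this over every team and pool the counts before converting them to percentages; seven matches are obviously far too few to mean anything.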

Right, so when looking at Goal Supremacy, we used the upcoming Arsenal v Spurs match. So let's now use the same match for this method.

   Arsenal's current dyad (EPL only) is: HW, AW
   Spurs' current dyad (EPL only) is:   AD, HW

Note that the latest match always comes last in the dyad (this is important). Using the data available, this equates to:

   Home = 49.30%, Away = 25.10%, Draw = 25.60%

And if we divide 1 by each of these percentages (written as a decimal, so 49.30% becomes 0.4930), we get match odds of:

   Home = 2.03, Away = 3.98, Draw = 3.91
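
The division is easy to check in a few lines of Python. This is just a sketch of the arithmetic: the function name `fair_odds` is my own invention, and the draw figure is assumed to be the 25.60% that corresponds to the 3.91 quoted for the draw.

```python
def fair_odds(probabilities):
    """Convert outcome probabilities (as fractions of 1) into decimal odds."""
    return {outcome: round(1 / p, 2) for outcome, p in probabilities.items()}


# Percentages for the Arsenal v Spurs dyads, written as fractions of 1.
triad = {"Home": 0.4930, "Draw": 0.2560, "Away": 0.2510}

# Sanity check: the three outcomes should account for 100% between them.
assert abs(sum(triad.values()) - 1.0) < 1e-9

odds = fair_odds(triad)
print(odds)  # {'Home': 2.03, 'Draw': 3.91, 'Away': 3.98}
```

Because the three percentages add up to 100%, these are "fair" odds with no bookmaker margin built in, which is worth remembering when comparing them with quoted prices.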

It's about now that you may be groaning and saying to yourself, "Well, how the hell am I going to generate all this data that he's talking about? I might as well stop reading now."

That's your decision, of course, but I would like to just mention that I will be providing all this data for you in a spreadsheet. Please find the link to download the file at the end of this post. The data comes in the form of an Excel workbook. Inside the workbook are the following worksheets:
  • "Config" - contains each combination to be looked up, with a lookup number (defined as the "dyad_lookup" name)
  • "Triad HW" - every dyad combination that resulted in a home win triad over the last ten years
  • "Triad AW" - every dyad combination that resulted in an away win triad over the last ten years
  • "Triad D" - every dyad combination that resulted in a draw triad over the last ten years
  • "Analysis" - contains an example of how I perform the lookup. Examine cells H9, I9 and J9 for the lookup itself. If you're an Excel wiz, then you may have a superior method to the one I've used.

And that's it. This method, even if used on its own, is still good enough to raise any predictions well above random. Good luck.

Download the data in Excel spreadsheet format =>  HERE

18 comments:

  1. Good article Eddie. I'll be emailing you for a copy of the spready.

    As I'm not a Blogger user I'm not familiar with how you would go about uploading media/files to your blog and offering them for download via a link. The following site suggests you can't do it directly, but it is possible indirectly by hosting the files you want to give away on another site and then posting a link to that site in your blog post...but you probably already knew that. Anyway, here's where I read that...

    http://www.google.com/support/forum/p/blogger/thread?tid=1a24f84153cdf305&hl=en

    All the best

    Swearbox

  2. Thanks for the URL, Swearbox. I have tried Google Docs, but there seems to be a bug in the uploading. I may try another site as the host for the upload.

    Cheers
    Eddie.

  3. I don't think it's possible to host non-image files on blogspot blogs. There are lots of sites that provide hosting services though - many free for small volumes (which would include spreadsheets such as yours). Google for "free hosting service" to see a selection. Never used one personally, so I can't offer a recommendation.

  4. Thanks Mike. You're right. Blogspot does not support this, so I'll have to get the file uploaded somewhere else.

  5. I can stick it on one of my sites if you want, and email you the URL. No charge obviously.

  6. Eddie - going back to goal supremacy, what is your preferred method for the last 6 games? i.e. last 6 both home and away, or last 6 home for home team and last 6 away for away team? I am leaning towards the latter.

    Cheers

  7. Boro and Ken, thanks to both of you for offering to host the file for me. That's very kind of both of you. Before I received your offers, I had already plonked it in "dropbox" and that seems to work okay.

    Cloppa, I'll answer your question in my next post.

    Cheers
    Eddie.

  8. Re: Cloppa

    What about maybe a weighted average, favouring the latter (last 6 home games for home team, last 6 away games for away team)

    I would maybe start with a 66/33 weighting (in favour of the latter method) and see what yields the more accurate results.

    If you set it all up in a spreadsheet, you can have odds created at different weightings (e.g. 50/50, 33/66, 66/33, 25/75, 75/25, etc.) and then compare them to the actual odds (after accounting for overround). After about 1,000 games or so, I suspect you would be able to get an idea of what sort of weighting looks more accurate overall, then start to tweak from there.

    Might be something I look at for the upcoming season.

  9. Cloppa, I realise that this is late and may not be read, and also that I haven't seen soccerdude's reply. However, my view is that you should use the last 6 games regardless of whether they are home games or away games. Otherwise you are double accounting for home advantage: once through historic home advantage, i.e. the + 0.4647, and once through home form, i.e. the likelihood of improved goal difference for home games only. That is just my slant on it though, and your experience may tell you different?!

  10. Probably is a bit late, Anon.

    For what it's worth, I agree with you. I think it should be overall games and not a weighted set.

  11. Due to a CPU failure I lost all my data on dyads, and while looking back into it I found this.
    I had been working on dyads some years back and can't recall the book.
    I think it was taken from Football Fortunes. I'm in the process of looking through my loft for the book, as I'm sure I still have it, and the Excel sheet you've shared looks very much like the ones in the book. Before I lost everything I was trying to do the same for triads, but got stuck on setting the data out in Excel and looking for the % of, say, HW,HL,HW instead of the dyad HW,HL.
    I've done this as a start: =IF(E2>F2,"HW",IF(E2<F2,"HL","HD"))
    It is based on the sheets downloaded from football data:
    RND Date HomeTeam AwayTeam FTHG FTAG
    1 18.8.12 Arsenal Sunderland 0 0
    E being FTHG and F being FTAG. The tricky bit is setting up the metrics and getting it all into Excel, as I would like to have a sheet that does it for the last 6, 5 and 4 as well as 2 and 3, to see if it predicts match outcomes better. Anyway, thanks for the sheet. If I get a working sheet up and running in Excel I'll let you know; it may be of some use to other readers.

  12. Hi Anon

    Now you mention it, I think this probably is the data from Football Fortunes as I definitely do have that book - although like you, I'm not sure where it is these days.

    Can't quite work out what you're trying to achieve, but if you just want the percentage of how many HW, HL, HW a particular team has, then my advice would be to use either countif() or sumproduct().

  13. Found the book. Whoever you borrowed it from used this book; it was a bloke called Frank George, and it was taken from a book called Football Pools and How to Win. I can't find any reference to it anywhere on Google.
    The odds on your sheets are taken from 1985/1995. At the time, 2005-ish I think, when I was looking into these, I was after refreshing the data to see if it had changed much from the 85/95 period to 95/05,
    but my Excel was terrible and I gave up.
    In 2009 I tried again and got a working sheet that could look up past results using a sheet like yours.
    I then had a page that looked up the past 2 results for both teams, which popped up the said %, but as before I couldn't be sure if the data was reliable. That's when my CPU died.
    As for what I am trying to achieve, I would like to know how the % were obtained so I can update them from 85/95 to 02/12 to see if they are the same,
    but my Excel is not much better and I can't seem to grasp how to get the data for each dyad from a 10-year period.

  14. For example, the triad result for the dyad "HW, AL" showed the following:

    Home Win = 47.38%, Home Draw = 26.19%, Home Loss = 26.43%

    How did you end up with these percentages please ?

  15. And also for these dyads how did you end up with those 3 percentages ?

    Arsenal's current dyad (EPL only) is: HW, AW
    Spurs' current dyad (EPL only) is : AD, HW

    Note, that the latest match always comes last in the dyad (this is important). Using the data available, this equates:

    Home = 49.30%, Away = 25.10%, 25.60%

  16. How do we get the Over/Under odds for a soccer match ?

  17. Hi Betvo

    Apologies for neglecting your comments. It wasn't intentional.

    As mentioned in the post itself, the percentages were all stolen from some book (Football Fortunes I think). They probably need refreshing at some point.

    With regards to your last question, look at this post:

    http://footytradingposts.blogspot.co.uk/2012/01/poisson-for-dummies.html

    Cheers

  18. Hi SoccerDude, thank you very much for your kind reply. I really appreciate your time and interest.

    I want to ask a question because this is somewhat blurred to me: is using Excel's Poisson(X,Mean,Cumulative) function the same as the equation you used in that article?

    One last question: with the Poisson model, can we price both 1X2 and over/under? How can we price the over/under with the Poisson model if it is to be used for the 1X2?

    Thanks
