World Cup Elo Part 3: Results of Toy Model for Group Stage
Part 1 can be found here, and Part 2 here.
Over the winter, I built a straightforward Elo rating system to model soccer/football national teams during the World Cup. Elo rating systems are used all around us, but most people never delve into the details of how they are constructed, which makes it difficult to draw precise inferences from them, so this is a useful exercise. Now that the group stage of the 2018 World Cup has wrapped up, I’ll take a quick look at how the Elo rating system would have performed had it been used as a betting system.
Major Caveats
The resulting Elo model should not be considered truly predictive. It operates with a huge restriction, utilizing only the data from past World Cups and the qualifying rounds (ignoring other international competitions). It also doesn’t draw in any outside information such as injury news or player retirements. Clearly, we could improve the predictive ability of our model by including more information. The advantage of this simple approach is that the resulting model is readily interpretable. It simply represents our best estimate of a team’s strength based on its World Cup and qualifying results, and seeks to quantify those results in the most accurate way possible (because simply noting the wins and losses is woefully insufficient). Incorporating injury information is valuable, but there’s no straightforward way to do it, and it would be a very subjective process. Thus, we could improve the specific predictions of our model, but we would lose the ability to properly interpret the results. Instead, I think it’s more valuable to have a model that has a narrow but precise use. This provides a baseline for our own subjective analysis (such as the loss of players to injury, or rumors about a team being poorly prepared, or not giving it their all during qualifying matches).
Even with this narrow approach, a vast number of decisions are needed to implement an Elo rating system. While we try our best to leave subjective analysis outside of our model, many of these decisions do not have a clear right answer. In a past post (linked here), we outlined some of these decisions. They include home field advantage, the structure of the temporal weighting of results (i.e. the “K factor”), and whether to use margin of victory rather than just the match result. We selected our final parameters using a variety of validation techniques, performing a search of the parameter space and comparing how predictive our model is along the way (compared to the observed results tempered with betting markets). This is just a brief summary, expanded upon in other posts. The point is that we end up with an implementation of the Elo rating system that performs well for the peculiar situation of football national teams in the World Cup.
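For readers who have not seen the earlier posts, here is a minimal sketch of where two of those decisions (home field advantage and the K factor) enter a textbook Elo update. This is the generic formula, not necessarily the exact implementation used for this model: the function names, the default `k=20.0`, and the `home_advantage` value are placeholders, and the handling of draws and margin of victory is left out entirely.

```python
def expected_score(rating_a, rating_b, home_advantage=0.0):
    """Textbook Elo expected score for team A (between 0 and 1).

    home_advantage is a hypothetical rating bonus added to team A.
    """
    diff = rating_a + home_advantage - rating_b
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0, home_advantage=0.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the "K factor": how strongly one result moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b, home_advantage))
    # Whatever A gains, B loses, so the total rating pool is conserved.
    return rating_a + delta, rating_b - delta


# Two evenly rated teams on neutral ground; A wins.
print(update_ratings(1500.0, 1500.0, 1.0))
```

A larger K makes recent results count for more, which is why its choice amounts to a decision about temporal weighting.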
With that preamble out of the way, the group stage has just wrapped up, and it’s interesting to take a look at the results. Now, simply comparing the actual match result to the percentage chance we predicted for that result isn’t terribly insightful. Our model will always select the “favorite”, so this would just tally up the total number of upsets. In fact, as a draw in football is essentially always less likely than the stronger team winning, this means that we would consider every draw to be an “incorrect” prediction. What we want to quantify is how shocking our model found those upsets relative to other norms.
Evaluating its Performance
The natural way to evaluate the Elo rating model is to consider the results of using it to blindly bet on the matches. Again, it cannot be said enough that there’s little reason to think that the raw model would be an accurate tool to beat the betting markets. Bettors take into account all information available, and our model solely uses past results. However, it’s a useful measure to see whether these predictions are at least plausible. Our final caveat here is that a null result doesn’t mean all that much. If the betting markets were perfectly efficient, picking bets at random would be just as effective as any other strategy. Obviously, betting markets are not perfectly efficient; they represent our best collective wisdom, which may well be misguided. But it’s important to remember that we would be surprised to find any betting strategy result in huge losses over a long period. The most telling “sanity check” of our model is that, by and large, the odds it estimates are quite similar to the implied odds of the betting markets. This shows that the Elo rating system is at least a plausible model of team strength, and that it should be a valid starting point for subjective amendment (taking into account other factors beyond past team performance).
Our simple betting game goes as follows. For each of the 48 games played in the 2018 group stage, we use our model to compute the likelihood of either team winning, or a draw occurring. We compare those probabilities to the implied odds of those results based on the moneylines given by OddsPortal, and we select the outcome where our predicted probability is the most above the implied probability. Then, we record what would have happened had we placed a $100 bet on that outcome. Thus, if we are incorrect, we lose $100, and if we’re right, our winnings are based on the given moneyline.
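The betting game can be sketched in a few lines of Python. The function names are made up for illustration, and two details are assumptions on my part: that “most above” means the largest difference between model probability and implied probability, and that the implied probabilities are taken straight from the moneylines without removing the bookmaker’s margin.

```python
def implied_probability(moneyline):
    """Convert an American moneyline to the implied probability."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def profit_if_won(moneyline, stake=100.0):
    """Profit (excluding the returned stake) on a winning bet."""
    if moneyline < 0:
        return stake * 100 / -moneyline
    return stake * moneyline / 100


def pick_bet(model_probs, moneylines):
    """Index of the outcome where the model most exceeds the market.

    Both sequences are ordered (win 1, draw, win 2).
    """
    edges = [p - implied_probability(m) for p, m in zip(model_probs, moneylines)]
    return max(range(len(edges)), key=lambda i: edges[i])


def settle(model_probs, moneylines, actual, stake=100.0):
    """Profit or loss of the $100 bet, given the actual outcome index."""
    choice = pick_bet(model_probs, moneylines)
    if choice == actual:
        return profit_if_won(moneylines[choice], stake)
    return -stake


# Russia vs. Egypt from the table: Russia (index 0) won.
print(round(settle([0.566, 0.231, 0.203], [-105, 257, 330], actual=0), 1))
```

Under these assumptions the sketch reproduces the Russia vs. Egypt row (a bet on Russia at -105, paying about $95.2) and the Germany vs. Mexico row (a bet on Mexico at +624).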
Results
Team 1 | Team 2 | Model: P(Win 1) | Model: P(Draw) | Model: P(Win 2) | Line: Win 1 | Line: Draw | Line: Win 2 | Profit ($) |
---|---|---|---|---|---|---|---|---|
Russia | Saudi Arabia | 0.562 | 0.233 | 0.206 | -217 | 334 | 807 | -100 |
Russia | Egypt | 0.566 | 0.231 | 0.203 | -105 | 257 | 330 | 95.2 |
Russia | Uruguay | 0.462 | 0.262 | 0.276 | 193 | 208 | 177 | -100 |
Saudi Arabia | Egypt | 0.36 | 0.289 | 0.351 | 445 | 263 | -130 | 445 |
Saudi Arabia | Uruguay | 0.278 | 0.262 | 0.46 | 1700 | 573 | -455 | -100 |
Egypt | Uruguay | 0.274 | 0.261 | 0.465 | 744 | 309 | -196 | -100 |
Portugal | Spain | 0.307 | 0.273 | 0.421 | 338 | 227 | 104 | -100 |
Portugal | Morocco | 0.586 | 0.225 | 0.189 | -143 | 270 | 504 | -100 |
Portugal | Iran | 0.528 | 0.243 | 0.23 | -167 | 309 | 539 | -100 |
Spain | Morocco | 0.641 | 0.207 | 0.152 | -278 | 417 | 927 | -100 |
Spain | Iran | 0.587 | 0.225 | 0.188 | -476 | 586 | 1832 | -100 |
Morocco | Iran | 0.307 | 0.273 | 0.42 | 123 | 198 | 317 | 317 |
France | Australia | 0.536 | 0.241 | 0.224 | -370 | 623 | 944 | -100 |
France | Peru | 0.583 | 0.226 | 0.191 | -164 | 287 | 561 | -100 |
France | Denmark | 0.447 | 0.266 | 0.287 | 106 | 181 | 434 | -100 |
Australia | Peru | 0.407 | 0.276 | 0.316 | 193 | 254 | 149 | -100 |
Australia | Denmark | 0.287 | 0.266 | 0.447 | 369 | 244 | -110 | -100 |
Peru | Denmark | 0.25 | 0.251 | 0.498 | 271 | 211 | 131 | 131 |
Argentina | Iceland | 0.635 | 0.209 | 0.156 | -303 | 441 | 1034 | -100 |
Argentina | Croatia | 0.516 | 0.246 | 0.237 | 108 | 229 | 312 | -100 |
Argentina | Nigeria | 0.583 | 0.226 | 0.191 | -200 | 426 | 499 | -100 |
Iceland | Croatia | 0.257 | 0.254 | 0.488 | 341 | 299 | -120 | -100 |
Iceland | Nigeria | 0.309 | 0.274 | 0.417 | 172 | 210 | 199 | 199 |
Croatia | Nigeria | 0.428 | 0.271 | 0.301 | -147 | 279 | 514 | -100 |
Brazil | Switzerland | 0.561 | 0.233 | 0.206 | -208 | 343 | 711 | -100 |
Brazil | Costa Rica | 0.556 | 0.234 | 0.209 | -476 | 571 | 1817 | -100 |
Brazil | Serbia | 0.571 | 0.23 | 0.199 | -213 | 368 | 661 | -100 |
Switzerland | Costa Rica | 0.352 | 0.289 | 0.36 | -167 | 270 | 663 | -100 |
Switzerland | Serbia | 0.367 | 0.287 | 0.346 | 196 | 204 | 175 | 196 |
Costa Rica | Serbia | 0.371 | 0.286 | 0.343 | 436 | 243 | -119 | -100 |
Germany | Mexico | 0.536 | 0.24 | 0.224 | -204 | 358 | 624 | 624 |
Germany | Sweden | 0.56 | 0.233 | 0.207 | -213 | 366 | 664 | -100 |
Germany | South Korea | 0.698 | 0.187 | 0.115 | -588 | 740 | 1837 | -100 |
Mexico | Sweden | 0.381 | 0.283 | 0.335 | 121 | 240 | 259 | 259 |
Mexico | South Korea | 0.547 | 0.237 | 0.216 | -143 | 281 | 473 | -100 |
Sweden | South Korea | 0.523 | 0.244 | 0.233 | 127 | 204 | 293 | 127 |
Belgium | Panama | 0.608 | 0.218 | 0.174 | -455 | 577 | 1662 | -100 |
Belgium | Tunisia | 0.488 | 0.254 | 0.258 | -303 | 419 | 1104 | -100 |
Belgium | England | 0.309 | 0.274 | 0.418 | 272 | 184 | 147 | 272 |
Panama | Tunisia | 0.259 | 0.255 | 0.486 | 337 | 278 | -114 | -100 |
Panama | England | 0.14 | 0.201 | 0.659 | 1773 | 548 | -455 | -100 |
Tunisia | England | 0.216 | 0.237 | 0.547 | 755 | 323 | -204 | -100 |
Poland | Senegal | 0.397 | 0.279 | 0.324 | 152 | 206 | 233 | 233 |
Poland | Colombia | 0.204 | 0.232 | 0.563 | 249 | 246 | 121 | 121 |
Poland | Japan | 0.314 | 0.276 | 0.411 | 171 | 215 | 195 | -100 |
Senegal | Colombia | 0.179 | 0.221 | 0.6 | 419 | 282 | -132 | 75.8 |
Senegal | Japan | 0.283 | 0.264 | 0.452 | 165 | 199 | 221 | -100 |
Colombia | Japan | 0.512 | 0.248 | 0.24 | -112 | 234 | 411 | 411 |
In the end, our net profit is a whopping $106, after wagering $4,800 total. Obviously, over a sample size of 48 games, this is quite an insignificant result. But it’s actually slightly more impressive than it initially seems, because the betting lines take into account the rake of the sportsbook. If we picked randomly in a perfectly efficient market, we would on average have about a -5% return on investment (ROI). So our observed 2% ROI could charitably be considered an improvement of about 7 percentage points over that baseline, but obviously this is still not terribly significant over such a small sample.
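As a quick check of that arithmetic, the snippet below sums the Profit column of the table above, in row order. The variable names are illustrative, and the -5% figure is simply the rough random-betting baseline quoted above, not something computed from the lines.

```python
# Profit column of the results table, one row per group (A through H).
profits = [
    -100, 95.2, -100, 445, -100, -100,
    -100, -100, -100, -100, -100, 317,
    -100, -100, -100, -100, -100, 131,
    -100, -100, -100, -100, 199, -100,
    -100, -100, -100, -100, 196, -100,
    624, -100, -100, 259, -100, 127,
    -100, -100, 272, -100, -100, -100,
    233, 121, -100, 75.8, -100, 411,
]

stake = 100
wagered = stake * len(profits)        # $4,800 over 48 matches
net = sum(profits)                    # about $106
roi = net / wagered                   # about 2.2%
wins = sum(1 for p in profits if p > 0)

random_baseline = -0.05               # assumed ROI of random betting
improvement = roi - random_baseline   # about 7 percentage points

print(f"{wins} winning bets out of {len(profits)}")
print(f"net ${net:.0f} on ${wagered} wagered, ROI {roi:.1%}")
print(f"{improvement:.1%} above the assumed baseline")
```

Note that only 14 of the 48 bets won, so the small profit rests entirely on a handful of long-shot payouts.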
This is far from revolutionary, but seeing the predictions turn a small profit above the rake is neat, as our model bet on all games rather than just the ones it had confidence in. One would hope that this result could be improved upon by actually using the information available leading up to the World Cup, such as squad selection, fitness rumors, and injury absences (however, beating the betting markets tends to be trickier than people expect, even with additional information).
Our most profitable selection was Mexico’s win over Germany, which had an incredible line of +624. Our system would never give a competent team like Mexico such a low chance of winning, given the historical likelihood of upsets at the World Cup. This is a nice reminder of how large a sample would be necessary to draw any inferences about an actual betting strategy. Our system is sometimes selecting events which the betting lines put at below 15%, and believing that the true number is slightly higher. We would lose money on this kind of bet the vast majority of the time, even if we actually had a significant edge on the field.
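To make that last point concrete, here is the Mexico bet worked through under the (generous) assumption that the model’s 22.4% is the true probability. The variable names are illustrative only.

```python
def implied_probability(moneyline):
    """Convert an American moneyline to the implied probability."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


line = 624        # moneyline on a Mexico win over Germany
model_p = 0.224   # model's P(Mexico win), assumed here to be the truth
stake = 100

market_p = implied_probability(line)   # roughly 0.138, i.e. below 15%
loss_rate = 1 - model_p                # share of the time the bet loses

# Expected profit per bet: win `line` dollars with probability model_p,
# lose the stake otherwise.
expected_profit = model_p * line - loss_rate * stake

print(f"market implies {market_p:.1%}, model says {model_p:.1%}")
print(f"bet loses {loss_rate:.1%} of the time")
print(f"expected profit per ${stake} bet: ${expected_profit:.2f}")
```

Even with an edge this large, the bet would lose more than three times out of four, so a run of 48 matches says very little about whether the edge is real.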
Summary
In sum, this Elo rating system is not intended to be predictive, but to be a foundation for quantifying the team strength shown in past World Cup and qualifying games. It’s promising to see that the resulting predicted outcome percentages are close to those implied by the betting odds prior to the match. When we use the resulting model as a betting strategy for every single group stage match, we see that it turns a small profit beyond the rake taken by the bookies, although over a 48-game sample this effect is much too small to prove anything beyond the rating system being at least plausible.