Probability Outcomes in the 2018 FIFA World Cup

Dec. 17, 2021
Lumivero
Published: Dec. 17, 2021

Veteran oil and gas industry analyst and decision-making expert Steve Begg garnered some global media attention prior to the FIFA World Cup when he used his expertise with @RISK to build a model to simulate outcomes for the popular, month-long international tournament.

Background

Steve Begg has spent most of his career dealing with uncertainty in the oil and gas business, initially with BP and Halliburton and lately on the teaching side of the industry in his adopted home of Australia. But the native of Northern Ireland recently took a temporary, and arguably more fun, detour when he combined his lifelong passion for soccer and his expertise with Monte Carlo simulation to build an @RISK-powered model to generate outcomes for the 2018 FIFA World Cup.

A professor and former Head of the University of Adelaide’s School of Petroleum whose research and teaching focus on decision-making under uncertainty and the psychological and judgmental factors that influence it, Begg created a probability model to estimate the chances of particular outcomes occurring during the tournament. Though the difference between the uncertainty in a team’s playing ability and their chance of winning is perhaps a subtle one to many, it is one of the things that make Begg’s exercise unique.

“The difficulty is in knowing how to propagate uncertainty in something we can assess, the teams’ playing ability, through to an assessment of their chance of advancing to various stages of the tournament, ultimately to the final. This is what Monte Carlo simulation enables us to do.”

In the end, the model assigned the highest probability of winning to world number two-ranked Brazil (15.4%), not to number one-ranked defending champion Germany (13.32%). Of the top 10 teams ranked by Begg’s model, only Germany (part of Group F, the consensus “Group of Death”) and tenth-ranked Poland failed to advance to the Round of 16.

An “Uncertain” Approach

“The outcomes of many decisions we make are uncertain because of lack of information and things outside of our control,” Begg says. “Uncertainty is crucial in predicting the chance of an oil or gas field being economic. In the World Cup, it determines the many ways the whole tournament might play out – there are nearly 430 million possible outcomes of the Group Stage alone. What makes it so hard to predict is not just uncertainty in how a team will perform in general, but random factors that can occur in each match.”
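
One reading that reproduces the “nearly 430 million” figure, offered here as an assumption rather than as Begg’s stated derivation, is to count only which team wins each group and which finishes second. With four teams per group there are 4 × 3 = 12 ordered winner and runner-up pairs, and eight independent groups give 12 to the power of 8. A short Python check:

```python
import math

TEAMS_PER_GROUP = 4
GROUPS = 8

# Ordered (winner, runner-up) pairs in one group: 4 choices for the winner,
# then 3 remaining choices for the runner-up.
pairs_per_group = math.perm(TEAMS_PER_GROUP, 2)

# Assumption: an "outcome" of the Group Stage is just the set of eight
# (winner, runner-up) pairs, ignoring scores and third/fourth places.
group_stage_outcomes = pairs_per_group ** GROUPS
print(group_stage_outcomes)  # 429981696
```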

Begg’s approach was to model enough possibilities to estimate the chance of any particular team progressing. FIFA world rankings, essentially determined through a relatively simple system in which points are earned through victories on the pitch, are but one part of predicting a team’s success in a tournament, and a rather simplistic one at that. Due to the complex World Cup tournament format, which places the qualifying teams in eight groups of varying difficulty, with prescribed rules as to how the winners progress, Begg constructed a sophisticated model that incorporated both the known (tournament structure) and the unknown or uncertain (team performances). The latter included what he called “tournament form” (how well a team will play, on average, over the course of the finals) and “match form” (the extent to which the team plays better or worse than its tournament form in any given match).

PERT Distribution

For each of his 100,000 simulations, Begg used the PERT probability distribution function (PDF) to describe uncertainty in tournament form. “The PERT distribution is easy to use because it just requires three numbers, a minimum, maximum and most likely,” Begg says.

The “most likely” value was derived from FIFA rankings over the past four years, supplemented by Begg’s own knowledge of international soccer to account for factors like recent “friendlies” played. The biggest change he made was to give Russia a higher score than its FIFA ranking suggested, due to its home advantage. (With a victory over heavily favored Spain in a penalty shootout on July 1, Russia moved on to the quarterfinals, an outcome to which the model had given a 10.9% probability.)

The PERT minimum and maximum values were assigned based on the most likely values. For lower-ranking teams, Begg skewed the distribution upwards based on the theory that they have a greater chance of playing better than their rankings suggest (as it turned out for Russia and Japan) than playing worse. The higher-ranking teams had the reverse: a greater chance of playing worse than their ranking (as it turned out for Germany and Argentina) than outperforming it. Middle-ranked teams had a more symmetrical distribution.
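
Begg built his model with @RISK functions in Excel; the Python sketch below is only a stand-in to illustrate the idea. It uses the common definition of the PERT distribution as a beta distribution rescaled to the interval from minimum to maximum. The function names and the three sets of form scores are hypothetical, chosen to show an upward skew, a downward skew and a symmetric case.

```python
import random


def sample_pert(minimum, most_likely, maximum, rng=random):
    """Draw one value from a PERT distribution (a rescaled beta distribution)."""
    span = maximum - minimum
    alpha = 1 + 4 * (most_likely - minimum) / span
    beta = 1 + 4 * (maximum - most_likely) / span
    return minimum + span * rng.betavariate(alpha, beta)


def pert_mean(minimum, most_likely, maximum):
    """Analytic mean of the PERT distribution."""
    return (minimum + 4 * most_likely + maximum) / 6


# Hypothetical (minimum, most likely, maximum) form scores, not Begg's values.
TEAMS = {
    "lower-ranked": (40, 50, 80),    # long upper tail: skewed upwards
    "middle-ranked": (55, 70, 85),   # symmetric around the most likely value
    "higher-ranked": (60, 90, 100),  # long lower tail: skewed downwards
}

rng = random.Random(2018)
for label, params in TEAMS.items():
    draws = [sample_pert(*params, rng) for _ in range(20_000)]
    simulated = sum(draws) / len(draws)
    print(label, round(simulated, 2), "analytic mean:", round(pert_mean(*params), 2))
```

In this sketch the mean sits above the most likely value for the lower-ranked team and below it for the higher-ranked team, which is the asymmetry described above.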

For each match, each team’s “match form” was drawn from a truncated normal probability distribution function (PDF) whose mean was that simulation’s tournament form, with a standard deviation of one-tenth of the mean.
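
A minimal Python sketch of that draw follows. The article does not say where the normal distribution was truncated, so the cut-off at three standard deviations either side of the mean is an assumption, as are the function and parameter names. Rejection sampling is used simply because it is easy to read.

```python
import random


def sample_match_form(tournament_form, rng=random, rel_sd=0.10, n_sd=3.0):
    """Draw match form from a normal distribution centred on tournament form.

    rel_sd: standard deviation as a fraction of the mean (one-tenth, per the article).
    n_sd:   truncation width in standard deviations (an assumption, not from the article).
    Expects tournament_form > 0.
    """
    sd = rel_sd * tournament_form
    low = tournament_form - n_sd * sd
    high = tournament_form + n_sd * sd
    while True:
        value = rng.gauss(tournament_form, sd)
        if low <= value <= high:  # reject draws outside the truncation bounds
            return value


rng = random.Random(7)
print([round(sample_match_form(80.0, rng), 1) for _ in range(5)])
```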

Begg then assigned the total number of goals scored in a match from a discrete PDF, derived from the number of goals scored in all of the matches played in the last three World Cups. The total goals were then divided between the two teams based on their relative match form. In the 100,000 simulations of the event’s first match, which saw Russia defeat Saudi Arabia by the unusually high score of 5-0, the model picked that exact score 91 times.
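
The scoring step might be sketched in Python as below. Two things here are assumptions and not taken from the article: the weights for total goals are placeholders rather than real World Cup frequencies, and each goal is awarded independently to a team with probability equal to its share of the combined match form. The article says only that goals were divided according to relative match form.

```python
import random

# Placeholder weights for total goals in a match. These are NOT real
# World Cup frequencies; substitute counts from historical matches.
TOTAL_GOALS = [0, 1, 2, 3, 4, 5, 6, 7]
WEIGHTS = [8, 18, 25, 22, 14, 8, 3, 2]


def simulate_score(form_a, form_b, rng=random):
    """Return (goals_a, goals_b) for one simulated match."""
    total = rng.choices(TOTAL_GOALS, weights=WEIGHTS, k=1)[0]
    share_a = form_a / (form_a + form_b)
    # Assumed splitting rule: each goal goes to team A with probability share_a.
    goals_a = sum(rng.random() < share_a for _ in range(total))
    return goals_a, total - goals_a


rng = random.Random(14)
print([simulate_score(85.0, 60.0, rng) for _ in range(5)])
```

Under this rule a lopsided score such as 5-0 needs both a high total and every goal falling to one side, which is why such scores appear only rarely across many simulations.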

For the Group Stage, the order in the table (including goal difference and goals scored) was computed and the top two teams moved on to the next round according to the competition rules. The same process was used for all subsequent rounds. If there was a draw (tie) in a later round, then the winner of the penalty shootout was drawn from a Bernoulli distribution (a discrete distribution with only two outcomes, success or failure) whose mean was the teams’ relative form.
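
The table ordering and the shootout draw can be sketched in Python as follows. The sketch ranks teams by points, then goal difference, then goals scored, and ignores further tie-breakers in the competition rules. Treating “relative form” as team A’s form divided by the sum of both forms is an assumption, and the group, scores and names are made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Row:
    team: str
    points: int = 0
    goals_for: int = 0
    goals_against: int = 0

    @property
    def goal_difference(self) -> int:
        return self.goals_for - self.goals_against


def record_match(table, team_a, goals_a, team_b, goals_b):
    """Update both rows: 3 points for a win, 1 point each for a draw."""
    a, b = table[team_a], table[team_b]
    a.goals_for += goals_a
    a.goals_against += goals_b
    b.goals_for += goals_b
    b.goals_against += goals_a
    if goals_a > goals_b:
        a.points += 3
    elif goals_b > goals_a:
        b.points += 3
    else:
        a.points += 1
        b.points += 1


def top_two(table):
    """Rank by points, then goal difference, then goals scored."""
    ranked = sorted(
        table.values(),
        key=lambda r: (r.points, r.goal_difference, r.goals_for),
        reverse=True,
    )
    return [r.team for r in ranked[:2]]


def shootout_winner(team_a, form_a, team_b, form_b, rng=random):
    """Bernoulli draw: team A wins with probability equal to its share of form."""
    p_a = form_a / (form_a + form_b)
    return team_a if rng.random() < p_a else team_b


# Hypothetical group with made-up scores.
table = {name: Row(name) for name in ("A", "B", "C", "D")}
for match in [("A", 2, "B", 0), ("C", 1, "D", 1), ("A", 1, "C", 1),
              ("B", 3, "D", 0), ("A", 0, "D", 0), ("B", 2, "C", 1)]:
    record_match(table, *match)
print(top_two(table))
print(shootout_winner("A", 80.0, "B", 70.0, random.Random(1)))
```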

Degrees of Belief

Begg stored all of the winners after each round in order to calculate the probabilities of a team progressing based on 100,000 simulations (one million simulations produced no significant differences) – which he says took only five minutes on his laptop. He also calculated the probability of the World Cup Final being between any two teams.
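
Turning stored winners into probabilities is a matter of counting, as in the Python sketch below. The standard-error formula is the usual one for an estimated proportion; it is added here as a plausible explanation of why ten times as many simulations changed little, and is not something the article attributes to Begg. The function names and the tiny list of winners are hypothetical.

```python
import math
from collections import Counter


def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent simulations."""
    return math.sqrt(p * (1 - p) / n)


def estimate_probabilities(winners):
    """Map each team to (estimated probability, standard error)."""
    n = len(winners)
    result = {}
    for team, count in Counter(winners).items():
        p = count / n
        result[team] = (p, standard_error(p, n))
    return result


# Hypothetical stored winners from four simulations.
print(estimate_probabilities(["A", "A", "B", "A"]))

# Sampling error for a probability near the 15.4% reported for Brazil.
for n in (100_000, 1_000_000):
    print(n, round(100 * standard_error(0.154, n), 3), "percentage points")
```

At 100,000 simulations the sampling error for a probability of that size is already on the order of a tenth of a percentage point, so a larger run can only refine the later decimal places.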

“It’s important to realize that probability is subjective. It depends on what information you have. There’s this tendency for people who do this kind of work to obsess on data,” Begg says. “You might argue that these simulations are the most useful when you have no data at all. But you do need to understand your uncertain quantities well enough to assign a probability distribution that reflects your degree of belief in what the outcomes might be. What’s crucial is that neither the information nor your reasoning is biased.”

Although he’ll continue updating his model until the World Cup ends on July 15, Begg is already back at his paying job, where he’s been using @RISK since the mid-1990s for technical and business uncertainty assessments to support decision-making.

“At one point the nature of my work changed to things that could be tackled in spreadsheets, like economic evaluations and simple production models, so I adopted @RISK to model their uncertainty. From an experienced-user perspective, I really liked being able to use an @RISK function, just like any other Excel function, without having to go through a series of input screens or boxes,” Begg says.

“When I teach Monte Carlo simulation I do it native in Excel, so that my students in industry and at the University can see how easy it is and that there is nothing mysterious about the process – but it is cumbersome. They are then delighted to find out how much quicker and simpler it is to do it with @RISK.”
