Goldman Sachs model to predict World Cup game results didn’t come close (bloomberg.com)
245 points by rodionos on July 16, 2018 | 130 comments


> And in any case, the model only generated probabilities of winning a game and advancing, and no team was given more than an 18.5 percent chance of winning the World Cup.

> [...]

> But Goldman Sachs' misfire is perhaps the most curious.

The model said that there is a lot of uncertainty, and as it happens, it was entirely correct. A World Cup chance of 18.5 percent means that 4 out of 5 times the team will not win, and the fact that this was the highest chance of any team does not say much about the model.

And in general, this is one instance of the well-practiced journalistic technique of waiting for the results first and then defining a bar afterwards, criticizing the performance according to standards that did not exist when it happened. (I guess in this case it is even worse: we could construct a reasonable test of how the model performed. I have the suspicion that such a test was in the original paper and that the journalist either did not understand it or, more likely, chose to ignore it in favor of writing a better story.)


Their model also had France as the 2nd most likely winner, Belgium 5th, and England 7th. Three of their top 7 made the semi-finals, and they had the eventual winner as second most likely, ahead of Germany. They actually predicted the Brazil/Belgium quarter-final, but got the winner wrong: Brazil had 27 shots, 9 on target, and 59% possession; Belgium had only three shots on target and converted two of them to win.

They overranked Germany, and underranked Croatia. Nearly every other person in the world did the same.

Look how disingenuous the Bloomberg article is. "Goldman Sachs updated the model throughout the tournament. It predicted a Brazil-Spain final on June 29 and Brazil-France on July 4. Its most recent prediction had England and Belgium squaring off for the cup. Both were eliminated in the semifinals." But their actual Brazil-France prediction had 8 teams left, and the winners of that round were all in the top 5. https://twitter.com/GoldmanSachs/statuses/101448576794142720... They even had Croatia over England, and France over Belgium.


> Brazil had 27 shots, 9 on target, and 59% possession; Belgium had only three shots on target and converted two of them to win.

A modern model would account for the fact that those numbers alone mean nothing, because they don't. Those are the numbers broadcasters reluctantly put on a screen for entertainment value, but they don't have real analytical power because they have no comparative metric.

How up or down were each of those numbers against previous wins and losses for each team?

What was Brazil's conversion from on-target shots before the tournament?

What was Belgium's success/failure rate on on-target shots they were defending against?

Likewise the other way around: were Brazil guilty of particularly poor defending? Were Belgium finding ways of making on-target shots count against all opposition, or was it luck on this game?

Any human analyst could tell you going into that game that Belgium were "lucky" and scoring freely beyond expectations, able to make more of fewer opportunities. Likewise, the consensus from most experts was that Brazil were guilty of mild complacency; the team was young and not yet formed into a strong unit (rather still just 11 strong individuals at any one point in time), and their on-target shots - whilst frequent - had a lower probability of turning into goals due to distance, power, position, etc.

So why did the Goldman model not pick that up?

I actually think they did pretty well, all things considered, but I'd love to see whether they did any runs on previous World Cups to check their thinking, and whether they over-fitted a little to a couple of key metrics. I think the lack of metrics from previous games might mean they relied on some headline numbers, but there's more they could have done to get a better model here...

Still, it's not their job is it? Just a bit of fun... which is a good job, because I find it just a little bit amusing.


Some teams/coaches like possession, others do not. If a team plays a dominance-based game, eventually their defenders will be (almost) in the opponent's half. When this happens, things get precarious, and a loss of possession can be punished by a counter. That counter needs to be executed as fast as possible. Teams that are ahead often retreat and let the opponent have the ball precisely so they can break out like that. It just means possession doesn't really say anything on its own. Belgium went ahead against Brazil with a bit of luck, and then let Brazil have the ball. Belgium's second goal was a classic counter-punch. After that, Brazil was allowed to have the ball while Belgium tried to control the game. Regarding odds, Belgium was number 3 in the world when the game was played, Brazil was number 2. Obviously, it was not going to be a walkover for anyone.

If you look at both Belgium/England games, you see number 2 against number 12. The ranking was respected there.

https://www.fifa.com/fifa-world-ranking/ranking-table/men/in...

It used to be a silly ranking system, but it's Elo-based these days, so it's not too shabby.


On the contrary, they mean a lot. Shots on target are the proxy you have (other than goals, of course) to derive which team dominated more. As a matter of fact, if you follow the sport, you will know most coaches will be satisfied if the shots-on-target count is good, even if in one particular game no goals are scored. The tragic thing for Brazil is that the WC is a short, direct-elimination tournament. One bad game and you are gone.


Sure, those are just basic stats and could probably be improved, but they do reflect the reality that Brazil should have won; they got unlucky with an own goal, and they made some key mistakes at critical times, failing to finish great chances.

You're not going to find a statistical approach that will account for the subtleties that led to this outcome. The problem with soccer stats in general is that everything hinges on low-frequency events based on subtle differences of timing and space.

Basketball by comparison is much more stat-rich, and there are a lot of cool advanced analytics, but even still they are full of gaps that are obvious to any expert watching the game. Afterwards maybe you can find the statistical signature of something you saw, but then you risk overfitting again, just the same as soccer.


> You're not going to find a statistical approach that will account for the subtleties that led to this outcome. The problem with soccer stats in general is that everything hinges on low-frequency events based on subtle differences of timing and space.

I think this deserves to be elaborated a bit: a game in which 1 is a good score, and often a game-winning score, is never going to be accurately predicted based on a statistical approach, because scoring is too rare for a statistical approach to work well. Low scores mean that individual games have an extremely large element of chance.

Imagine one team is about 4% better than another team; they should be favored about 51-49 to score a point. If a game scored 300 points, that difference would be perceptible within one game. But to resolve the same difference accurately in games that score 3 points each takes many, many, many games.
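A quick back-of-the-envelope simulation (mine, not from the thread) makes the gap concrete, assuming each scoring event independently goes to the slightly better team with probability 0.51:

    import random

    def game_win_rate(points_per_game, p=0.51, n_games=100_000):
        """Fraction of games won by the better team; ties counted as half."""
        wins = 0.0
        for _ in range(n_games):
            a = sum(random.random() < p for _ in range(points_per_game))
            if a * 2 > points_per_game:
                wins += 1
            elif a * 2 == points_per_game:
                wins += 0.5
        return wins / n_games

    print(game_win_rate(300))  # ~0.61: the edge is visible within one game
    print(game_win_rate(3))    # ~0.515: a single low-scoring game is nearly a coin flip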


> The model said, that there is a lot of uncertainty, and as it happens, it was entirely correct. A World Cup chance of 18.5 percent means, that 4 out of 5 times the team will not win, and that that is the highest chance does not say much about the model.

But do you need a sophisticated model and lots of so-called "AI" to arrive at the conclusion that there's a lot of uncertainty?? The point of the model is to reduce uncertainty, not find that it's there and do nothing about it.


The point of the model is absolutely not to reduce uncertainty, it is to quantify it, which are two very different things. No model reduces uncertainty in a probabilistic sense.

And no, you don’t need statistics or machine learning to say “there is a lot of uncertainty”, but you do in order to quantify that uncertainty.


I think the right way to measure the correctness of the model is to compare it with various other predictions:

-Predictions from the general public

-Predictions from football experts

-Predictions from other mathematical models

For example: If over time, the new model is 5% better than the best of the old models, then it's very good.

Doesn't make much sense to compare it with reality and jump to the conclusion that the model doesn't work because no prediction can be 100% accurate.
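For what it's worth, the standard way to run that comparison is a proper scoring rule such as the Brier score, applied to every source's stated probabilities over the same set of games. A minimal sketch, with made-up numbers:

    def brier(probs, outcomes):
        """Mean squared error between forecasts and 0/1 outcomes; lower is better."""
        return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

    # Hypothetical win probabilities from three sources for the same five games:
    public  = [0.90, 0.80, 0.50, 0.70, 0.60]
    experts = [0.70, 0.60, 0.50, 0.60, 0.50]
    model   = [0.65, 0.70, 0.45, 0.55, 0.60]
    actual  = [1, 0, 1, 1, 0]  # what really happened

    for name, probs in [("public", public), ("experts", experts), ("model", model)]:
        print(name, round(brier(probs, actual), 3))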


Say I have two models - model A returns around 20% likelihood that the top team wins the world cup, and model B returns around 80% likelihood. I use both of the modeling techniques a few thousand times in various parallel universes, and both of them are exactly right - 20% of 20% predictions result in a win, and so on. Despite them both quantifying uncertainty accurately, isn't model B still better?


Think about the actual uncertainty, since anything can happen in the game itself. The easiest way to look at this is to play the game 100 times in a row (preferably in parallel universes, as you say). If team A wins 60% of those games, then that caps the ability to predict the result. You can predict a die roll to be 6 with a certainty of 17%. You can't do any better.


Say I have a bag of dice, one of each of the usual D&D denominations (d4, d6, d8, d10, d12, d20). I draw one at random, ask the models for predictions, and roll it. Model A ignores the information about which one I drew, and predicts the correct marginal distribution of rolls (an 8.75% chance of rolling a 6: the average of 1/d over all six dice, where the d4 contributes zero). Model B correctly processes the information about which one I drew, and predicts the correct distribution given that information (I drew the d6, so a 17% chance of rolling a 6). Both models give correct results overall, but Model B has higher probabilities on average, and I would say it is a better model.

A model should be judged both on how accurately it characterizes its uncertainty and how much evidence it's able to successfully make use of.
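A small simulation of the dice-bag setup (a sketch of the parent's thought experiment, not anything from the article) shows both models are calibrated, but B earns the better average log score:

    import math, random

    DICE = [4, 6, 8, 10, 12, 20]

    def trial():
        die = random.choice(DICE)
        roll = random.randint(1, die)
        # Model A ignores the draw: marginal probability of this roll over all dice.
        p_a = sum(1.0 / d for d in DICE if roll <= d) / len(DICE)
        # Model B conditions on the die actually drawn.
        p_b = 1.0 / die
        return math.log(p_a), math.log(p_b)

    n = 100_000
    scores = [trial() for _ in range(n)]
    print("model A:", sum(a for a, _ in scores) / n)  # more negative: worse
    print("model B:", sum(b for _, b in scores) / n)  # closer to zero: better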


You can do better if you have foreknowledge or retroactive foreknowledge of the outcome of the die roll, which is the obvious suggestion of jtolmar's comment. If I know the recorded outcomes of a sequence of die rolls, I can have models that predict those outcomes to any accuracy I want. But they're not doing it by measuring the uncertainty involved in prospectively rolling the die.


No, because if the underlying phenomenon happened 20% of the time, that’s what you want your model to predict. The point of the model is to describe reality as accurately as possible. So a model that predicts a particular outcome to happen 80% of the time, and the outcome actually does happen 80% of the time, isn’t any better or worse than a model that predicts an outcome to happen 20% of the time that happens 20% of the time.


Uncertainty is a truism; that's why people want to use a prediction algo. Did the system do better on results it was more certain about?

When predicting the result of an A-or-B contest, the bar is already defined. Either the system gets it right or it doesn't; if it gets it right more often than not, then (despite this being poor grounds mathematically, on a small result pool) the popular press will report it as successful.

IMO if matches become easy to predict then rules will change to reduce that predictability.


> Predicting the result of an A or B contest the bar is already defined.

I disagree: If team A has a 10-30% chance of winning, and A pulls off the upset, the correct answer was not "A Wins" it was "B has a 70-90% chance of winning".

For Goldman Sachs' investments, the bar is not to predict that A wins or that B wins, it's to predict the probability and variance regarding which team will win. Of course, from a single upset game, it's impossible to tell whether these estimates are correct. You'd need to see the success or failure of many trials.
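Concretely, "many trials" means a calibration check: bucket the model's stated probabilities and compare each bucket's empirical win rate. A sketch with synthetic data (assuming, for illustration, a perfectly calibrated model):

    import random
    from collections import defaultdict

    random.seed(0)
    forecasts = []  # (stated probability that the favourite wins, did it win)
    for _ in range(10_000):
        p = random.uniform(0.5, 0.9)
        forecasts.append((p, random.random() < p))  # calibrated by construction

    buckets = defaultdict(list)
    for p, won in forecasts:
        buckets[round(p, 1)].append(won)

    for b in sorted(buckets):
        outcomes = buckets[b]
        print(f"stated ~{b:.1f}: actual {sum(outcomes) / len(outcomes):.2f} over {len(outcomes)} games")

A real model would be judged by how far each bucket drifts from that diagonal.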


The problem is that the 2018 World Cup is not a repeatable event. Neither are most open-market trades (presumably the point of this whole PR stunt being to show that their quants are good at making smart bets in the markets) but they're a LOT closer.

Soccer is a pretty data-poor environment, or at least was historically. Before movement trackers, there was very little data to play with. With movement tracking data slowly building up, I suspect that soccer analytics will soon have their "Moneyball" moment the way baseball did.

The reason baseball got there sooner is that, even without advanced player movement tracking, baseball is a data-rich environment. There are ~2500 MLB games played per year in the 30-team era, and we have at least box scores going back to the late 19th century for most professional games, and pitch-by-pitch data going back to the eighties. In addition, a lot of the most important data is cleaner in nature (pitcher-batter match-ups) and also abundant (compare ~200 pitches in a baseball game to ~15 shots on goal in a soccer game, to take a guess at the order of magnitude).

Computing power can help squeeze more information from the soccer data we collect going forward, but there is a century or more of player tracking data that we can just never ever have, since it wasn't being collected. We know Babe Ruth's batting line but we will never have the soccer equivalent of UZR for Pele. I don't know if there is a retrosheet-equivalent effort for soccer to collect stats from old film, but that would be one way to partially bridge the gap.


> The problem is that the 2018 World Cup is not a repeatable event. Neither are most open-market trades...

The 2018 World Cup is not a repeatable event, Elon Musk buying $10M of Tesla shares is not a repeatable event, and Donald Trump winning the 2016 presidential election is not a repeatable event. Therefore, to meaningfully discuss any of these in the context of probabilities and confidence intervals, we must generalize them: to any soccer game, any stock purchase, any election, adjusting our priors as we go. It does make the mathematics a lot less pure.


Wasn't Leicester City a "moneyball" team? A zero-to-hero club with a roster of modestly salaried players who had statistical synergy? I don't follow the Premier League much, but from what I remember hearing about it, they bucked the trend of spending tens/hundreds of millions on megastars to solo-carry the team.


There are many Premier League teams doing a lot more than Leicester when it comes to statistical analysis.

They did indeed win the league with a budget far below many of the normal contenders, but it was a mixture of good management, luck, a few players having the breakout seasons of their careers which took them to the point where only big teams can now afford them, and a few other players having great runs of form that saw them playing better than they would before or after.

Despite the elements of luck, it was an incredible achievement. But the following season they were back to being a team with no realistic chance of competing for the title, and were actually in a relegation fight to stay in the top division.


Literally no data producing phenomenon is a repeatable event, outside of controlled experiments.


The model totally sucked against betting odds, and if you had used the model probabilities to price bets, you would have lost a lot of money against even an average bookmaker.

Score it yourself against implied probabilities from Betfair, for example, and marvel at the suckage.


But Goldman Sachs are the kings of predicting uncertainty! This is their whole business! They make billions making predictions amid the murky, uncertain waters of the global economy. Would you argue that the global economy is more uncertain than soccer? I'd say so. How is it that they can find success in the market but not in soccer?

I think this is a smoke signal. Soccer is corrupt; you can't predict the winner unless you know what's being passed around under the table. Goldman Sachs does these predictions so people read between the lines to see how corrupt it is.

My argument is: "Goldman is amazing at statistical analysis and they routinely practice it on much tougher models (the global economy), so they should have no problem predicting a simpler model (soccer). But since they drastically failed at predicting soccer, then there must be an equally drastic variable missing from their predictions. Since we can trust Goldman to use all available public information in their analysis, there must be critical information that is hidden from the public which affects the outcomes". I make some assumptions, but it's fairly sound, no?


Goldman's business model is not to predict the future. Goldman has 2 business models: 1) transfer risk, 2) provide advice. For #1, it's a middleman. For #2, it's paid for brain power, experience and speed.


Unclear if your comment is tongue in cheek, but assuming that you're serious, I'd encourage you to give a listen to a podcast episode like this: https://soundcloud.com/bettheprocess/episode-35-ted-knutson.

In the world of sports betting/analytics, you have baseball and basketball at the forefront, and then American football, soccer, and hockey (roughly in that order).

Off the top of my head, there are several reasons why the latter three sports have all lagged behind:

-Lack of data

It wasn't until the last 4-5 years that widely available, affordable, and accurate data for soccer matches existed. Companies like Opta have accomplished this by outsourcing the watching of games and the manual tagging of events, which was made possible by the advent of cheap cloud computing.

It should be self-evident why tracking the position and actions of 22 players is more complicated than something like baseball, where for the most part you are looking at one pitcher vs. one batter, much of which can be automated with computer vision that tracks pitch position, speed, and spin.

-Complexity

It's no accident that baseball was the first sport to be revolutionized by analytics. Most of the time, it's a static game, with a clearly defined action set. I.e. do I swing at the pitch or not. Do I throw a fastball or not. Do I attempt to steal a base or not.

In games like American football, soccer, and hockey, you have anywhere from 12-22 players on the field at a time. Tracking what the players without the ball or the puck are doing is a difficult task technically, as is quantifying their impact. Concepts like expected goals and expected goals added are recent ones.

-Sample size

Typical elite soccer leagues see each team play each other twice. In England and Spain, this means you have 38 games per season.

Baseball has a 162-game season plus playoff games, basketball has an 82-game season plus playoff games, etc. Couple that with the fact that quality data has only been collected for a few years, and you get other problems.

In basketball and baseball, the effects of aging on player performance and statistics are fairly well understood now. We can generally calculate the 5-year market value of a player, etc. In the other sports I mentioned, we don't yet have that kind of time-series data to be able to make those judgements.

--

Specific to the World Cup, there are other reasons why you may find it hard to predict results.

-Team chemistry and style

Even though the World Cup is the most high-profile soccer event in the world, most players spend only 1-3 months a year with their national teams. Their "day jobs" with their club teams take up most of their playing time and attention.

As anyone who has played the game Football Manager will know, managing a national team is a tough job. You have no say over how the players are practicing when they're away from you, and no control over the physical condition in which they arrive at the World Cup. This year, there was barely a month between the end of the regular European seasons and the start of the World Cup.

In that month's time, you have to get at least 11 players who have not played with each other, to learn your style of play. Do you want to play a pressing style? Are you attempting a slow buildup, or trying long balls? Etc. etc.

-Home field advantage

In baseball and basketball, most modern statistical models account for home field advantage. Having 60,000 Russian fans chanting and heckling likely played a role in the team's ability to upset Spain, particularly during penalty kicks.

This goes back to the sample issue. How many times before have Spain played Russia IN Russia in front of a large crowd? Probably never.

---

All this is to say, cut Goldman some slack. There are a number of non-nefarious reasons why you may expect a soccer model to produce some spectacular miscues.


On top of all that, as a low-scoring game, soccer is inherently more random, and therefore harder to predict.


Ok, I understand this - that soccer has many variables and it is difficult to create a model with all of these variables. But my point is, the global economy has way more variables than soccer. Way way way way more variables. At least 7.5 billion of them.

So would you argue that creating a statistical model of soccer is harder than creating one for global economies? I think it's harder to model economies.

I'm not even trying to give Goldman a hard time! I'm saying that Goldman probably put together a very accurate model of "soccer", but we aren't watching an accurate model of soccer; we're watching the corrupted one where the players and skills don't matter.


I think we have to be very clear on what economic "models" Goldman uses.

If you're talking about GDP growth forecasting, or forecasting unemployment numbers, these are ultimately questions of aggregation. Yes, there are 7.5 billion people, but at the end of the day each individual agent's actions don't make a tremendous difference for an aggregate measure like GDP. During periods of low volatility, as we are currently experiencing, it's really not all that impressive to forecast the unemployment rate +/- 0.25%, or GDP growth within 0.5%.

If you're talking about their market-making and trading businesses, they've had some horrendous quarters recently as well (http://www.businessinsider.com/goldman-sachs-just-had-a-hist...). A very small portion of Goldman's business is taking an opinionated stance; most of their income comes through relatively low-risk market-making activities.

And let's not forget that during the 2008 financial crisis, certain departments within the company correctly wagered against credit default swaps, while others had exposure to subprime mortgages. The company still needed an injection of capital from Warren Buffett and the US Treasury to weather the crisis. Point being, they aren't clairvoyant oracles.

---

Regarding your last point, which was also made in your original comment, you seem to be claiming some form of what economists call "omitted variable bias", and seem to be hypothesizing that the "omitted variable" is corruption or cheating.

From the purely technical standpoint of building models, the tiny samples (https://www.theringer.com/soccer/2018/7/11/17557720/world-cu...) and the nature of the "data" being collected means that there are plenty of other explanations, like incorrectly estimated parameters or measurement error.

If you're trying to suggest that there is corruption or cheating in soccer, please point to a concrete example of a team in a critical game receiving a disproportionate number of calls. Unsure if you're aware, but this was the first World Cup with instant video replays for the referees to use. Had this replay been in use more widely in international soccer, the US might've qualified for this World Cup (https://deadspin.com/u-s-a-out-of-world-cup-on-phantom-goal-...), England might've won/tied that pivotal 2010 World Cup game (https://en.wikipedia.org/wiki/Ghost_goal#England_v_Germany_a...), etc.

Soccer may have had a sordid past with the picking of host countries, but the trends in the actual game itself point to technology reducing the ability of referees to make blatantly terrible calls.


Thanks for the replies and the detailed sources, it's interesting to read!

> Point being, they aren't clairvoyant oracles.

Yeah, my argument was weak in that regard. They aren't anywhere close to perfect or accurate, I'll admit.

> you seem to be claiming some form of what economists call "omitted variable bias"

Yes! Is that what it's called?

> please point to a concrete example of a team in a critical game receiving a disproportionate number of calls

Corruption doesn't have to be that explicit. Maybe key players or coaches are paid to perform poorly? It doesn't always come down to the ref. But I admit I have no examples.


> you can't predict the winner unless you know what's being passed around under the table

Pretty sure you just inadvertently identified why GS is so “great” at predicting economic movements.


Leonid Bershidsky and a lot of other journalists laughing at Goldman Sachs' incorrect predictions seem to miss the point.

The World Cup predictions from Goldman Sachs (and also UBS) are a form of recreation and entertainment with machine learning. It's an expression of quant nerd humor.

Analogous intellectual games would be engineers devising ridiculous Rube Goldberg contraptions[1] or programmers building "enterprise" FizzBuzz[2].

(I think it would add to the fun if GS uploaded their raw data and models to Github for others to play with.)

>It certainly didn't predict the final opposing France and Croatia on Sunday.

True, but it did predict France having a better chance of winning overall, though handicapped by a tougher draw. It also predicted France beating Croatia in the round of 16 instead of in the final. The pdf says:

>While Germany is more likely to get to the final, France has a marginally higher overall chance of winning the tournament,

[1] https://en.wikipedia.org/wiki/Rube_Goldberg_Machine_Contest#...

[2] https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...


On the other hand, this is a predictive task that has defined outcomes and clear historical data - by my understanding, it is easier than commercial uses of machine learning [at least, easier to measure the effectiveness].

It's also Goldman Sachs and UBS choosing to attach their names to these and stake some reputation on these predictions. If they had hit the bullseye, they would be lauding these results.


It may be easier to measure the effectiveness (give a confidence level for the prediction), but just because there is clear historical data and defined outcomes, that does not mean you will be able to predict a particular outcome with any high level of certainty.

For example, imagine a tournament with a large number of participants, where the winner is picked simply by fairly choosing a single random participant.

If I then gave you all the perfect historical data going back decades, you could do statistical analysis and determine that the winner is completely random and therefore the probability of success, for any particular participant, is p~=(1/n), where n is the number of participants. Your confidence in correctly predicting any particular outcome will drop as n rises.

Not everything can be easily predicted just because you have enough data.
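You can check that thought experiment in a few lines (a sketch, with the winner drawn uniformly at random): no strategy learned from history beats the 1/n baseline.

    import random

    n = 32                                                   # participants
    history = [random.randrange(n) for _ in range(10_000)]   # decades of past winners

    # "Learn" from history: always predict the most frequent past winner.
    best_guess = max(set(history), key=history.count)

    future = [random.randrange(n) for _ in range(10_000)]
    hit_rate = sum(w == best_guess for w in future) / len(future)
    print(hit_rate, "vs baseline", 1 / n)  # both ~0.031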


Yup, the worst thing is that if they had got it right it would have been more or less due to pure chance, and it would have led to business flowing their way!


Yeah, I don't know...this feels a bit like if we ended up correct, we were clearly serious, but if we ended up incorrect, it was clearly a joke.


People conflate statistics with actual results more often than not, and I think those reporting on such stories, and maybe even the original authors, might fall for this.

It was not wrong to say Hillary had a 95% chance of winning the presidential election, but the confidence was low and that value still allowed for the opposite result to happen.

Also, football has a lot of variance between team capability and end results. The better team might (and does) lose often, especially when it goes to penalty shootouts.

With basketball, the stronger team will easily score more in most cases.


According to 538, Hillary had a 70% chance. Yes. Many people interpreted that as meaning 70% would vote for Clinton. I had to explain the meaning of "70% chance" to people who should know better. I think they just heard the number and didn't give it a second thought.


In general, people don't think.


Basketball has a lot of points scored so one or two freak plays or lucky bounces aren't likely to affect an outcome. And there are a lot of games so even a star player having an off night or two isn't likely to affect the outcome of a season.


> had a 95% chance [...] but the confidence was low

So she had 95% chance of winning with 50% probability or what?


So this is something that people don't seem to grok quite well, and it really depends on the type of statistical analysis used.

Say you make the assumption that the quantity being estimated is truly fixed: that there's some true value for the force of gravity or some true value for the number of people that vote for X or Y.

The second assumption that comes along is that the stochasticity observed comes from your perspective of observation, and not from the ground truth. To be more blunt: of all the observations you could make, 95% of them would be expected to yield the result observed... but the ground truth is still fixed. Gravity has a fixed quantity, despite your experimental error, and you may have been lucky enough to observe it in your sample.

Predicting elections with frequentist methods has this same characteristic, except the observed quantity itself shapeshifts and even lies... so then there are other complications that need to be dealt with.

This is where that 50% feeling comes from. There are two outcomes, and one will be true. Your data analysis just tells you that if you repeat your procedure, you'd expect 95% of those results to give you the outcome you observed.


If you expect to get it right (in this particular prediction, Clinton to win) with 95% probability, what does it mean to say that this 95% is with low confidence or with high confidence?


Not OP, but that opens a whole different can of worms. “Confidence” has a specific meaning in the context of statistical theory, and specifically in a particular flavor of statistics called “frequentism”. I won’t get into what is involved in frequentism and how it differentiates itself from the alternative, Bayesianism, but essentially “confidence” refers to a measure that really says more about the statistical methodology used to arrive at the estimated value (in this case, that Hillary had a 95% chance of winning) than about the value itself. This makes it a bit esoteric and something that people misinterpret all the time.

Basically, confidence refers to a hypothetical scenario in which the data-gathering process were to be repeated and the same analysis done: X% of the confidence intervals (essentially, the +/- bounds around your estimate) will contain the true value of what you are trying to estimate.

So in this hypothetical scenario, we say we have the power to go back in time and recollect the polling data in 2016 and run the same analysis used to arrive at that 95% number. And let’s say we use this power over and over again, a very large number of times. Then 95% of the error bounds we construct should contain the true value of the probability Hillary wins, whatever that is.

The thing is that those error bounds can be huge. You can have 95% confidence that the probability that Hillary wins is between 3% and 98%, for example. You can also have 10% confidence that the probability of a Hillary win is between 94% and 96%. Without the confidence intervals, a “confidence level” doesn’t say much. It’s also predicated on the assumption you haven’t screwed up your data collection process or analysis methodology. And if you are predicting something will occur with a probability of 95%, and it doesn’t, that doesn’t automatically mean you are wrong, but the likelihood of you having screwed something up is definitely higher.
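The repeated-sampling idea is easy to demonstrate (a sketch of the general mechanism, with an assumed true vote share, not 2016 data): simulate many re-runs of a poll and count how often the 95% interval covers the truth.

    import random

    true_share = 0.52                      # assumed true vote share
    n, runs, covered = 1000, 10_000, 0
    for _ in range(runs):
        votes = sum(random.random() < true_share for _ in range(n))
        p_hat = votes / n
        se = (p_hat * (1 - p_hat) / n) ** 0.5
        covered += p_hat - 1.96 * se <= true_share <= p_hat + 1.96 * se
    print(covered / runs)  # ~0.95: the guarantee is about the procedure, not any one poll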


I agree that this is a different can of (nasty) worms.

The message I replied to said:

> It was not wrong to say Hillary had a 95% chance of winning the presidential election,

Frequentist inference cannot be interpreted as a probability unless one goes through some (often misunderstood, as you pointed out) contortions. In your scenario where you have 95% confidence of something it would be wrong to say that Clinton had a 95% chance of winning.


The way I see it (as someone who knows nothing about statistics), confidence would be the difference between 9/10 vs 900/1000. And/or how much effort you spend ramming a square peg in a round hole to have a prediction model.

You have a lot of data about donkeys vs elephants. But this contest is between a mule and a mammoth. If you assume a mule is equivalent to a donkey and a mammoth is equivalent to an elephant, the mule has 95% odds in its favor. But you recognize the assumptions so your prediction doesn't have a high confidence.


60% of the time, it works every time - Brian Fantana


There are two probabilities in question: The first, of course, is the probability of victory. The second is the probability that the first probability is correct.

Consider: If someone offered to give you $2 every time a fair coin toss came up heads, or take $0.50 every time it came up tails, you'd be foolish not to take that bet as many times as you can, because you know that the coin has exactly a 50% chance of coming up heads.

However, if it was an unfair coin, you'd want to know the degree to which it was unfair, and you'd have to measure it. How much do you trust those measurements? You might say that you're 90% sure that the coin has a 40-60% chance of coming up heads, or give a probability of 2% that a $1.04 to $0.96 wager would be profitable while a $1.03 to $0.97 wager would be unprofitable.

Hillary had a 95% chance to win the election. But on top of the fact that 1 in 20 times she'd lose that election if that really was the probability, the 95% number was uncertain because the measurements were difficult to pin down - maybe she'd have lost 1 in 40 times, or maybe she'd have lost 1 in 5 times. All we know now is that she lost, and that many of the assumptions and measurements the pollsters had to make concerning factors like voter turnout, nationalism, corruption, foreign interference, debate results, and fundraising turned out to be inaccurate.

With unfair coin measurements, you can get very accurate numbers with just a handful of tests. When predicting election results or World Cup games, you're much less likely to make an accurate estimate. The confidence is an estimate of how likely that estimate is to be accurate.
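One standard way to put numbers on that second-level uncertainty (a Bayesian sketch with made-up flip counts, not the pollsters' method) is a Beta posterior over the coin's bias:

    # After h heads in n flips, with a uniform Beta(1, 1) prior,
    # the posterior over P(heads) is Beta(1 + h, 1 + n - h).
    h, n = 23, 40                        # hypothetical measurements
    a, b = 1 + h, 1 + (n - h)
    mean = a / (a + b)
    sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    print(f"P(heads) is roughly {mean:.2f} +/- {sd:.2f}")   # ~0.57 +/- 0.08

With only 40 flips the interval is wide; more flips shrink it. Elections don't give you more flips.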


There are two probabilities if you want to make it so in your model. In the coin example it may make sense: you model the coin as a binomial probability and you can estimate it. You can repeat the events, it makes sense to talk about frequencies, and you can improve your estimate of the parameter.

In the election model it's not clear to me what's to gain by saying that there are two probabilities (or more) instead of one. There is one single event.

>Hillary had a 95% chance to win the election. But on top of the fact that 1 in 20 times she'd lose that election if that really was the probability,

Which is the only thing that matters if we say that the probability was 95%.

> the 95% number was uncertain because the measurements were difficult to pin down - maybe she'd have lost 1 in 40 times, or maybe she'd have lost 1 in 5 times.

You have lost me here. Did she have a 95% chance to win or not?

If this 95% is uncertain, because it could have been 97.5% or 80%, then the probability would be the weighted average of those numbers and not 95%. And if it was so uncertain that nothing was known at all it would be 50%.

Consider the following cases:

a) You are going to flip a coin that I know is completely fair. I would say that the probability of heads is 50%.

b) You have flipped a coin that I know is completely fair. Nobody knows what has been the result. I would say that the probability of heads is 50%.

c) You have flipped a coin that I know is completely fair. You know the result but I don't. I would say that the probability of heads is 50%.

In some cases you would say that I'm 100% right in my assessment of the probability being 50%, while in others the actual probability is either 100% (with probability 50%) or 0% (with probability 50%). This seems irrelevant as far as my statement about the probability being 50% is concerned.


It is captured in the 95%, but that was probably a bit overestimated (and there are always unknown biases, and improbable events can happen).

What's wrong is thinking a 95% chance of winning means they will win.


So it was not captured in the (overestimated) 95% :-) But I agree, the most likely thing, even if the probability was perfectly known, is not always what happens.


Sort of. The problem is that people get confused when the same statistical methodology they use in other situations is applied to a measure that is itself a percentage.

For example, if you are estimating the height of a male in the US, you would collect data on US males and get the average. But unless you surveyed every male in the US, there is some error associated with your estimate. So you would either construct error bounds (a frequentist approach) or a probability distribution (a Bayesian approach) around the mean height. So your results may dictate that the mean height of the American male is 5’11, plus or minus 2 inches. Those two inches represent uncertainty around your data collection. That’s the exact same thing that is done here, but with a percentage instead of a height. Outlets may predict Hillary winning at 95%, but the reality is their methodology should provide a plus-minus value around that. The problem is that few of them actually report that.

But it gets more confusing. That error bound is only around the mean. Pick a random guy out and not only will he likely not be 5’11, there is a decent chance he will be outside of that range of 5’9 - 6’1. You will get 5’7 guys and 6’4 guys pretty commonly. In the case of the election, it may actually be true that Hillary had a chance between, say, 93% and 97% of winning. But even if that is the case, she will still lose between 3-7% of the time. And since we only have one reality to observe, we can’t know if she lost simply because we saw that 3-7% realized, or because the people coming up with that number screwed up. That’s why groups like 538 deserve more leeway. When they say that Donald Trump has a 30% chance of winning, and he does win, that’s not that crazy, and there is much less reason to assume they screwed something up than there is for the people who predicted a 5% chance of Trump winning. It’s possible those models were right, but it's much less likely.
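The mean-versus-individual distinction is easy to see numerically (a sketch with synthetic heights, not survey data):

    import random, statistics

    random.seed(1)
    heights = [random.gauss(71, 3) for _ in range(400)]  # inches, synthetic

    mean = statistics.fmean(heights)
    sd = statistics.stdev(heights)
    se = sd / len(heights) ** 0.5

    print(f"mean height: {mean:.1f} +/- {1.96 * se:.2f}  (interval for the mean)")
    print(f"individuals: {mean:.1f} +/- {1.96 * sd:.1f}  (spread of actual people)")

The first interval shrinks as you survey more people; the second doesn't.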


The World Cup is about the worst sporting event for data-led predictions like this; far too much can rely on a few events that are basically a coin flip. It would be interesting to see how the predictions went for something like the Premier League table.


The Premier League runs over far too long a period of time, with variables that can change completely and unpredictably (managers getting sacked, players leaving/joining, etc).

The reason events like the World Cup are far more interesting is that they take place over a shorter period of time.

I think the problem here isn't the event but rather the sport. Something like snooker or tennis will offer the same brevity over the period but with chance playing a less significant role due to the number of games played per match.

That all said, if my years of watching snooker have taught me anything, it's that people are not machines and thus will perform vastly differently from day to day depending on what mood they're in.


Interesting observations here about how team sports have a bunch of extra opportunities for randomness. Do you know if there's anything equivalent to Fargo Rate for snooker (or other individual sports like tennis)?


I'd like to see some scientific evidence of this (i.e. using multiple experiments, null hypothesis, etc.)


I think that you have to come at this from an analytical direction rather than an empirical one; the problem is that everyone can say that the approaches you have used to show how difficult it is to model the World Cup are just the wrong ones, and that you need to redo the experiment.

Analytically, the difference between the Premier League and the World Cup is that you have momentum and continuity in the Premier League, while the World Cup is essentially one shot. So in the PL, team A will play teams E and G and H before it plays team B; team B may play teams E and H and Q (which played G). Team A may be winning games that your strength model says they should lose, team B may be losing games... and so on and so on. There is more evidence that might matter. More importantly, you can be wrong quite a lot of the time in a season and still be right at the end of it (as the bounces of the ball even out over time). Not so much in the World Cup: one goal knocks you out and there is no coming back! Basically, the World Cup demands an algorithm that works with less evidence and with a much higher degree of accuracy.


> far too much can rely on a few events that are basically a coin flip

Can you give us some examples?


As there are relatively few goals, anything that can turn a goal into not a goal or vice versa can have a massive impact on the game. For example the penalty decision against Croatia in the final. Another thing that adds to the randomness is the chance that a key player may be sent off or injured.


To make that specific -- in 44 of 64 games, and in every single penalty shoot-out, turning one goal into not a goal or vice-versa would've changed the outcome.


It is not so simple, because goals in football are not independent of each other. A team that scores first has the opportunity to play more cautiously and go for more counter-attacks.


For example: Croatia got to the final via two penalty shootouts, which is very much like coin flipping.


Can't you put probabilities on that though? What you said is basically "impossible to predict, because one or the other might score more goals, so it's like coin flipping."


You can put probabilities on it, but the chance a game ends tied is rather high to begin with. 25% of this year's knockout games were draws, and a 1/4 chance that the game is decided by a coin flip is already enough to ruin predictions.
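A crude Poisson sketch (assuming ~1.3 goals per team per game, a made-up but plausible figure) shows why the draw rate is so high in the first place:

    from math import exp, factorial

    def poisson(k, lam):
        return lam ** k * exp(-lam) / factorial(k)

    lam = 1.3  # assumed goals per team per game
    p_draw = sum(poisson(k, lam) ** 2 for k in range(15))  # both sides score k
    print(f"P(draw) ~ {p_draw:.2f}")  # ~0.26, close to the 25% observed this year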


Someone handling the ball to block a shot, a tackle that goes in harder in a temper flare, a penalty that is saved/missed.


It's an incredibly low scoring sport with a single-elimination bracket. A fluke goal can swing the whole bracket.

In the NBA, NHL, or MLB, seven game series tend to even out the variance, so the best team usually wins. And even in NCAA basketball, there's enough scoring that any individual play loses significance.


People love to beat up on these companies because of this stupid World Cup prediction. Yes, Goldman is a giant vampire squid wrapped around the face of humanity (Matt Taibbi's quote). But it turns out it's really just great marketing for their research teams.

Also, I've seen some people say (not in this forum) that banks now look stupid because they're in the business of making predictions and they can't even get the World Cup right. Guess what? Banks make no money on predictions. They make money on flows and on taking spreads on trades they do with clients. Any research or prediction is meant to be a catalyst for that trade.


>Banks make no money on predictions. They make money on flows and taking spreads on trades they do with clients.

You're mostly right, but to further clarify: an investment bank like Goldman Sachs gets its revenue mostly from "market making" spreads, but it also has activities that depend on predictions, such as proprietary trading (before the Volcker Rule shut it down) and GSAM (Goldman Sachs Asset Management). GSAM is basically a hedge fund for their wealthy clients' money. They will run predictions on macro trends, using data like interest rates, commodities, indexes, etc., to help them pick stocks for their portfolios.

As the pdf noted, the World Cup data models and simulations came from Adam Atkins of GSAM.


The Volcker Rule shut down approximately zero proprietary trading on Wall Street. Any articles you can point me to were merely media stunts by their respective firms.

The rule was too complex and onerous to be implementable. Case in point: it's already being rolled back. Partly because of the current administration, but more because it was just a poorly written and poorly thought-out idea to start with.


The line between market making and prop trading is blurrier than you think. Whenever you quote a price, you're implicitly making a prediction about the future of the market.


I am pretty sure banks make money on predictions if they get people to follow them.


Yes... This is less a prediction and more a legal form of front running.


In what way is this a legal form of front running?


Everything Goldman Sachs does is front running and it's legal because they bribed all the regulators. QED.


I am somewhat shocked that GS would jump into the prediction business of the World Cup, even as a joke. The risk of people getting the wrong idea about the prediction, and about GS itself, is too great, even with a perfectly defensible model.

This is an enterprise for bookies, not Goldman Sachs.


FYI - I worked at Goldman Sachs and then a hedge fund for a decade. On the Capital Markets / Trading side, you are literally a bookie. In fact the nomenclature is "you have a book." You are setting trading spreads based on where you think things will go. Depending on the market, your work may be more or less statistical and you're trying to gain a statistical advantage.


Off-topic: were you able to retire after that decade?


Not OP, but I worked in finance as a trader for more than half a decade. I've made more per hour since I switched to software development in NYC.

I think people have a strange view of finance. Most people aren't paid obscene amounts of money in finance, just like most software developers don't make the salary of a senior developer at Big Tech. They also work an obscene number of hours. During earnings season, I would be at my desk by 5am and work 80+ hours per week. Nowadays, it's a rarity to go more than 50. My brother currently works at Big Bank, and makes more than I do on an absolute basis, but I definitely make more than he does hourly. I get to work at 9:30-10, he gets to work at 7:30-8. I get home at 6:30-7:00, he gets home at 8:30-9. He works at least a half day every Sunday; I enjoy my hobbies. I'm also commenting on HN at 11:00...

Most of my college friends still work in finance. I make more money than a few of them based on overly honest drunken conversations, and we're all more than 10 years into our careers. There is a glass ceiling in tech that is a lot more all-encompassing, but it's not like it doesn't exist in other industries. There are only so many higher-up positions, and most people burn out (or aren't capable of competing) before they even get in position for that promotion. The running joke when someone was getting poor performance reviews was "That's it, I'm moving to Vermont to open an antique store".

For some more comparison, I grew up in a 1%er town in the suburbs of NY. The average lawyer family lived in nicer houses than the average finance family, who in turn lived in nicer houses than the average medicine family. However, the most expensive house was owned by the CFO of Big Bank. Income is very right-skewed in finance.


Short Answer: No, but very comfortable.

Long Answer: Full retirement is hard; healthcare is a pain in the US. You can't really "save" for it in the US, it can swallow all your savings, so you'll always need some job or another to cover healthcare and catastrophic needs. That said, you can very easily down-shift once you have a house, savings, etc.

Longer Answer: Could have, if I wanted to -- but you always give something up in exchange. These jobs will take everything you give them (time, health, life) and give back a decent percentage (income.) But you cannot easily dial up or down the work, it comes in chunks and you have to complete it. My life was increasingly unhinged at 27 and I decided to jump off the treadmill after seeing a colleague continue to work through his mother's terminal illness and death. Inertia and greed are a toxic combination. Numerous colleagues were on drugs, uppers, anti-depressants, etc. One died from stress (heart attack in his 30s.)

I chose to get married, have two kids. I switched to a pure tech job (now an ML product owner at a Series A pure tech firm.) We have dinner together almost every single day. Weekends are completely ours. We go to the park most warm days. We take 3 to 4 vacations a year, many with my mom as well. There is a decent amount of work but I can choose when to do it (unlike Wall St.) and the work is longer-term and I can dial it up/down as family requires. I sit outside and read during lunch. I turn off the markets when I step out of work.

Many of my colleagues were easy millionaires by ~30 and multi-millionaires if they stuck around till their mid-30s and were focused. Many others blew through their bonuses (or snorted them away) and ended up with nothing, just living bonus to bonus. It also depends on the job (business/deal side vs quant side vs tech -- the money is a waterfall across the 3 sections.) As with all industries, you get ripped off if you don't fight for your share of the pie. Plenty of people avoid conflict and live comfortable lives and nothing more. I also saw several C++ programmers/managers earn double-digit millions of dollars over several years; one earned over 100MM USD over the course of his time at the hedge fund (public records, check out AIG-FP https://en.wikipedia.org/wiki/AIG_bonus_payments_controversy)

I think I did well and hopefully don't have to worry about poverty anymore. You either get lucky (early FB employee, hot product at xyz.com) or you have to give up something. I haven't seen someone truthfully say they got money and family and happiness all together.


Surely your book is meant to be hedged though?


Well, you've got exposures on both sides (and you keep the tiny difference between what you buy at and what you sell at, in aggregate.) I'm not sure how gambling bookies work, but I'm assuming it's very simple principles... aren't gambling bookies essentially market makers, just like Wall Street market makers?


Yeah, when I started as a trader it was pretty much a given that you'd be interested in poker and sports betting. The "training" would consist of the senior traders asking you to make markets in just about every imaginable sports event, an after-work poker game, and a good few questions about things that aren't normal bets (how many burgers do you think you could eat?).


The "Ludic Fallacy" strikes again [0].

> The ludic fallacy, identified by Nassim Nicholas Taleb in his 2007 book The Black Swan, is "the misuse of games to model real-life situations."

...

> The alleged fallacy is a central argument in the book and a rebuttal of the predictive mathematical models used to predict the future – as well as an attack on the idea of applying naïve and simplified statistical models in complex domains. According to Taleb, statistics is applicable only in some domains, for instance casinos in which the odds are visible and defined.

Both of Taleb's books, "The Black Swan" and "Fooled by Randomness", are an interesting take on such models. Meanwhile, most economists know about "Knightian Uncertainty" [1], which distinguishes risk from uncertainty.

> "Uncertainty must be taken in a sense radically distinct from the familiar notion of Risk, from which it has never been properly separated.... The essential fact is that 'risk' means in some cases a quantity susceptible of measurement, while at other times it is something distinctly not of this character; and there are far-reaching and crucial differences in the bearings of the phenomena depending on which of the two is really present and operating.... It will appear that a measurable uncertainty, or 'risk' proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all."

[0] https://en.wikipedia.org/wiki/Ludic_fallacy

[1] https://en.wikipedia.org/wiki/Knightian_uncertainty


Damn, do I dislike Nassim Taleb. I don't think I've ever heard him say anything deep. That Wikipedia article is an excellent example.

In [0] you have the following:

> The ludic fallacy, identified by Nassim Nicholas Taleb in his 2007 book The Black Swan, is "the misuse of games to model real-life situations."

And he gives an example of this:

> One example given in the book is the following thought experiment. Two people are involved:

> Dr. John who is regarded as a man of science and logical thinking

> Fat Tony who is regarded as a man who lives by his wits

> A third party asks them to "assume that a coin is fair, i.e., has an equal probability of coming up heads or tails when flipped. I flip it ninety-nine times and get heads each time. What are the odds of my getting tails on my next throw?"

> Dr. John says that the odds are not affected by the previous outcomes so the odds must still be 50:50.

> Fat Tony says that the odds of the coin coming up heads 99 times in a row are so low that the initial assumption that the coin had a 50:50 chance of coming up heads is most likely incorrect. "The coin gotta be loaded. It can't be a fair game."

> The ludic fallacy here is to assume that in real life the rules from the purely hypothetical model (where Dr. John is correct) apply. Would a reasonable person bet on black on a roulette table that has come up red 99 times in a row (especially as the reward for a correct guess is so low when compared with the probable odds that the game is fixed)?

So Nassim Taleb wanted to discuss "using games to model real-life situations", and to demonstrate the pitfalls he uses two characters. He _portrays_ the characters as "man of logical thinking" vs "man who lives by his wits", but as we'll see, he's missing one dimension in his characterization.

The first problem here is that implicitly he's suggesting to the reader that the decisions of the "man of logical thinking" represent the pitfalls of "applying games to model real-life situations", whereas the other guy's decisions represent... it's not specified, but clearly something with a better outcome.

The second problem is that he conflates "applying something you read in some textbook to real life without thinking" with "modelling real life". He suggests to the reader that those two people are "logical" vs "instinct", but they're not. They're a dumb guy who knows math vs a smart guy who doesn't know math. _Obviously_ real life is more complex than your textbook examples, and so the smart guy is going to win, because his fuzzy heuristics beat the first guy's decisions, which are optimal only within his flawed model. An actual smart and logical person would update his model based on new evidence (i.e. "I was told that this coin was 50-50, but the chance of what I just saw is so small that it's more likely I was simply lied to") and then use math to make predictions and beat the guy who's smart but doesn't know math.
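That update is one line of Bayes' rule (a sketch, with an assumed prior and an assumed bias for the loaded coin):

    p_loaded_prior = 0.01    # assumed: even a tiny prior suspicion of a loaded coin
    p_heads_loaded = 0.99    # assumed: a loaded coin that almost always lands heads

    like_fair = 0.5 ** 99           # probability of 99 heads from a fair coin
    like_loaded = p_heads_loaded ** 99

    posterior = (p_loaded_prior * like_loaded) / (
        p_loaded_prior * like_loaded + (1 - p_loaded_prior) * like_fair
    )
    print(posterior)  # ~1.0: "the coin gotta be loaded" is also the Bayesian answer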

So ironically, he wants to portray the dangers of using over-simplified models and to do that he uses an example where he obscured one dimension.

Nassim Taleb is really good at rhetoric but light on substance.

[0] https://en.wikipedia.org/wiki/Ludic_fallacy


Nassim Taleb is basically a stopped clock. He's pretty big on pointing out how we're all prone to find illusory correlations (not his discovery) and he's a great promoter of Kahneman and Tversky, but there are other areas where he is clearly out of his depth. It's beyond obvious, for instance, that he's never gotten past Popper in his studies of the philosophy of science. Unfortunately, his disciples are (ironically) quite terrible at thinking for themselves and buy into his demagoguery.

Basically a book by Nassim Taleb is an incoherent summary of the books that Nassim Taleb has read within the past year, with a few morsels of recycled insight here and there.


Agreed. His ideas are often completely incoherent and occasionally devolve into a mixture of pseudoscience and philosophical babble. He is also difficult to argue against because he is essentially the statistics equivalent of a nihilist.

I’m not sure why there are so many people who take him seriously.


If anything, this is a clear illustration of the poor use of probabilistic prediction. When used for investments you have many outcomes; if the model is any good, you will get most of them right. In the World Cup you have very few, even if you count all games played. I'm definitely not excusing Goldman Sachs here: they should have known better than to try to predict this. There was only a tiny chance this could be great advertisement for their model.


> they should have known better than to try to predict this.

There's no downside, only free publicity. If they, by good fortune and a following wind, get it right - then the publicity is incredible. If it's wrong they laugh and say "well, better stick to predicting what we're good at!" and they still get a shitload of headlines and awareness of their product.

This was not a mistake.


Well, there's the downside of articles like this pointing out that they've had 4 years to work on their models and they've got worse.


PS: If you want to build or train your own model or make predictions, you can find open (structured) data about all World Cups in the football.db; see https://github.com/openfootball/world-cup and https://github.com/openfootball/world-cup.json. Enjoy the beautiful game.


The predictions were not so bad. At least one of the favourites won in the end. GS had France winning with 11.3% probability, second to Brazil with 18.5%. UBS was less fortunate, they had Germany (24%), Brazil (19.8%), Spain (16.1%) and England (8.5%) before France (7.3%).

I compared the logloss for their predictions with the "uniform" benchmark (giving each team 1/32 probability of winning, 1/16 probability of getting to the finals, etc) and the results are the following (if I transcribed the data properly):

Logloss for getting to each stage:

                           GS     UBS   bench
    Second round        0.495   0.495   0.693
    Quarter-finals      0.463   0.459   0.562
    Semi-finals         0.310   0.327   0.377
    Final               0.231   0.269   0.234
    World Cup winner    0.097   0.113   0.139

The performance of the models was OK until Croatia got to the final. This hurt UBS especially, as they predicted less than a 0.9% probability of that event (compared to 2.1% in Goldman's model).

Edit: these would have been the "best case" scores, i.e. if the highest-probability teams had qualified for each round (ignoring that this may be impossible due to the structure of the tournament; same stage order as above):

    GS:   0.432   0.302   0.220   0.141   0.079

    UBS:  0.365   0.251   0.176   0.111   0.070

UBS could potentially achieve a lower logloss because it had more extreme predictions.
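
For anyone who wants to reproduce these numbers, the computation is just the average binary cross-entropy per team. A minimal sketch (the per-team probabilities are transcribed by hand from the banks' reports):

    import math

    def logloss(probs, outcomes):
        # Average binary cross-entropy over all 32 teams for one stage.
        # probs: model's P(team makes it through the stage); outcomes: 1/0.
        return sum(-(y * math.log(p) + (1 - y) * math.log(1 - p))
                   for p, y in zip(probs, outcomes)) / len(probs)

    # Uniform benchmark for the second round: 16 of 32 teams advance, so
    # every team gets p = 1/2 and the logloss is ln(2) ~ 0.693.
    print(logloss([0.5] * 32, [1] * 16 + [0] * 16))  # 0.6931...

The same call with p = 1/4, 1/8, 1/16 and 1/32 reproduces the other benchmark values above (0.562, 0.377, 0.234, 0.139).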


Isn’t this a little like flipping a coin four times - getting heads four times in a row, and looking at your friend and saying “but you told me the odds were 50/50 each flip?!”


Yes, it is.


This article didn't compare the Goldman Sachs model to any other models-- why not compare it with sports betting odds? Would Goldman have made or lost money betting their model was better than the crowd?


Or compare it with the fivethirtyeight blog predictions.


>Soccer, with the many factors that affect game outcomes — players’ injuries and intra-team conflicts, the refereeing, the weather, coaches’ errors and moments of inspiration — remains only a tightly-regulated game involving a few dozen people. The behavior and performance of big corporations, entire industries and nations is arguably even more difficult to model based on data about the past.

The author entirely misses how such models work: the larger the entity, the more statistics and averages kick in, and as a result a better model can be built.


Depends on the complexity of the interactions between variables. There are plenty of examples where we have excellent local models, but make (comparatively) worse prediction at scale. A pretty classic example is biology - we have excellent knowledge about how genotypes work and their interactions in cells, but models of phenotypes are typically expensive, error-prone, or non-existent.


I watched quite a few matches and among the things I saw in the matches but not in any statistics are:

- motivation (Germany and Croatia were the two extremes here, no idea how to measure it)

- team cohesion (number of articles in a few journals questioning the team cohesion, maybe also articles about individual players)

- creativity in offense (maybe measurable via "target missed from close distance" + "ball passed across the front of the goal")

- number of errors in defense that didn't lead to a goal

- percentage of times ball possession was lost while moving from their own goal to the opponent's area (England was really bad here against Croatia)


> - creativity in offense (maybe measurable via "target missed from close distance" + "ball passed across the front of the goal")

This one would benefit possession-based teams, so it would fail to give decent odds to the current world and European champions (France and Portugal respectively), who don't play a possession game. Of course it's possible they're outliers, but we'll never know.


These kind of show what makes predicting football particularly difficult. I like the ideas, and I think we (or, more likely, some ML algorithm) could come up with the set of conditions that show why France prevailed against the specific opposition at this specific World Cup ... but I suspect that those conditions would be pretty unique and invalid for Euro 2020, WC 2022 etc.

As you identified, motivation could be pretty hard to measure ... but even if we could it might be a pretty poor predictor anyway. France in the early stages didn't look very motivated, while England and Colombia looked pretty lively.

Team cohesion - the German team were pretty consistent (not dazzling, but consistent) and we know how that ended. Again France didn't really impress until the latter stages of the WC.

Creativity in offense - I guess it can indicate a sort of calm or confidence in front of goal, but it can also be seen as pretty negative. For example, Arsenal a few years back came under fire for having plenty of possession in the 18-yard box but failing to convert. Spain's confident quick pass-and-move "tiki-taka" was ever-present and has in my eyes been impotent in the last few years (and, more importantly as a neutral viewer, very frustrating to watch).

Defensive errors that didn't lead to a goal could be a nice indicator of the ability of a defence to pick up after each other's mistakes - but at the same time the errors that do lead to goals (e.g. Croatia's second goal in the final) are relatively rare, and a lack of a goal could just point to the opposing team's inability to convert, due to a poorly organised attack or a lack of opportunism from their strikers.

I'm not sure what you mean by the last one, but I think this could be a nice one - if you mean "times you lost possession in your own half". A profligate midfield and defence is bound to ship goals; I doubt there are many teams that can either fight back after trailing by a goal or two, or score enough to maintain a reasonable buffer.

I applaud the effort though - it takes more creativity and care to think of some new angles (like you did) than to think of some possible counter examples (like I did)!


Thank you for the warm words. I guess the reason is my occupation, plus the fact that I just spent my last few weeks watching many games with family and friends.

> I'm not sure what you mean with the last one, but I think this could be a nice one - if you mean "times you lost possession in your own half"

Almost. England lost the ball frequently (>50+x% of the time, with a large x, as far as I could see) due to the keeper sending out long balls. I'd like to measure that somehow. It could be done via the number of seconds in possession after a goal kick, an indicator of whether a hypothetical 85% marker of the field was reached, or by measuring whether the ball was successfully passed at least 5 times (or the move resulted in a goal).
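
A rough sketch of the pass-count version, assuming a hypothetical per-event data feed (real providers use their own schemas):

    def goal_kick_retention(events, min_passes=5):
        # Fraction of goal kicks after which the kicking team completed
        # at least `min_passes` passes before losing the ball (a crude
        # proxy for "built up play" vs "hoofed it away"). `events` is an
        # ordered list of dicts like:
        #   {'type': 'goal_kick'|'pass'|'turnover', 'team': 'ENG', 'completed': True}
        kept = total = 0
        for i, e in enumerate(events):
            if e['type'] != 'goal_kick':
                continue
            total += 1
            passes = 0
            for f in events[i + 1:]:
                if f['team'] != e['team']:
                    break  # possession lost to the other team
                if f['type'] == 'pass' and f.get('completed'):
                    passes += 1
            if passes >= min_passes:
                kept += 1
        return kept / total if total else 0.0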


Ahhhh I see. Actually this is something I've really been curious about myself - whether the better strategy overall for a keeper returning the ball into open play (from a goal kick or from hand) is to just boot it as far up-field as possible or to pass it short to a defender or a midfielder sitting deep.

Interestingly, something like this is a tactic used in rugby (https://www.youtube.com/watch?v=cbti6mLvSJs). I used to play a lot of football when I was younger, and at our level (waaaay down the Scottish league pyramid), against tired, hungover or generally weak opposition, keeping them under pressure by dominating the territorial game while sacrificing possession was criminally underrated. Usually if you could keep hammering them for 60 minutes and had the legs to step up a gear in the last 30 or so, you could grab a valuable goal or two :-)


> Thanks to the use of more granular data, made possible by AI, this year’s model should have worked better than the 2014 one.

> If anything, it worked worse.

"If anything"? All the results are available, so it would be easy to put a precise number on this. Measure the Bayesian regret, or just report the winnings if you had used the GS model to bet on the outcomes. Unless it reports some concrete numbers, this article is garbage.

It doesn't report any concrete numbers.
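
The "report the winnings" version is trivial to write down (a toy sketch; the odds and the match here are made up, and draws and the group format are ignored):

    def bet_returns(predictions, bookie_odds, results, stake=1.0):
        # Flat-stake the model's pick in every match and tally the P&L.
        # predictions: {match: team the model favoured}
        # bookie_odds: {match: {team: decimal odds}}
        # results:     {match: actual winner}
        pnl = 0.0
        for match, pick in predictions.items():
            pnl -= stake
            if results[match] == pick:
                pnl += stake * bookie_odds[match][pick]
        return pnl

    # Toy example with made-up odds:
    preds = {'BRA-BEL': 'BRA'}
    odds = {'BRA-BEL': {'BRA': 1.7, 'BEL': 4.8}}
    print(bet_returns(preds, odds, {'BRA-BEL': 'BEL'}))  # -1.0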


Soccer is a sport with a big random component. This is probably why it is so exciting: an average team can beat a better team.

The reason is easy to see: the game can be decided by one, two or three key plays. Compare that to basketball, where to win a game you have to consistently score more and defend better. Rarely is a basketball game decided by one or two plays; that only happens when the game is already very tight.
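
A toy illustration of the point, treating scores as Poisson counts (a common first-order model for soccer goals, admittedly crude for basketball) and giving the favourite a 1.5x scoring rate:

    import math
    import random

    def poisson(mu):
        # Knuth's algorithm for sampling a Poisson-distributed count.
        L, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= random.random()
            if p <= L:
                return k
            k += 1

    def upset_rate(mu_fav, mu_dog, n=10_000):
        # P(underdog outscores the favourite outright).
        return sum(poisson(mu_dog) > poisson(mu_fav) for _ in range(n)) / n

    print(upset_rate(1.5, 1.0))   # ~0.26: the worse team wins outright often
    print(upset_rate(150, 100))   # ~0.001: same ratio, many more "plays"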


I put money on Belgium (12.0 decimal odds) and Croatia (15.0) after the group stages, where some form was visible, combined with knowledge that they had some of the world's best players.

The odds shortened as the tournament progressed, and I was able to hedge, as the shorter odds made lay betting profitable.

(High variance in football outcomes means there's no guarantee of profit, I don't bet big sums.)
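
To make the mechanics concrete, greening up works roughly like this (illustrative numbers, not my actual stakes; exchange commission ignored):

    def green_up(back_stake, back_odds, lay_odds):
        # Lay stake that locks in the same profit whether the team
        # wins or not.
        lay_stake = back_stake * back_odds / lay_odds
        win_profit = back_stake * (back_odds - 1) - lay_stake * (lay_odds - 1)
        lose_profit = lay_stake - back_stake
        return lay_stake, win_profit, lose_profit

    # E.g. back Croatia at 15.0 for $10; if the lay price later drops
    # to 3.0, laying $50 guarantees $40 either way.
    print(green_up(10, 15.0, 3.0))  # (50.0, 40.0, 40.0)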


This answer is very useful and contains proper strategy advice :)

If someone were to bet during the round of 16 - say, $1 on each of the bottom 8 teams and $2 on each of the top 8 - the strategy would most likely yield a small profit or a small loss, rather than a total loss.


It's a fool's errand to predict high-variance events like football games.


Only predict events that are easy to predict, never fail!


Financial modeling is about risk-adjusted return. Because GS knows it cannot determine with certainty the outcome of a given investment, it diversifies and hedges. Most of all, GS is a market maker, the equivalent of a bookie. To say that GS’s models “didn’t come close” is to ignore all the ways in which such a grading scheme differs from GS’s actual business model. If their WC prediction effort was anything more than a fun-spirited PR project, it was likely that GS wanted to keep its employees engaged and adding business value during a World Cup they would otherwise certainly have spent the month watching.


In addition to the many other problems with this article, I would like to point out that if, somehow, Goldman Sachs had managed to create a model that could accurately predict the results, the game of soccer would have to be changed to make it more unpredictable somehow. It is intrinsic to the nature of sport that, in order to be entertaining, there has to be a realistic chance for more than one team to win. Not many people (even from the winning country) would bother watching if it were accurately predictable.


Good... There was this discussion thread a few days back on HN:

https://news.ycombinator.com/item?id=17509407

Did this investment bank use the same set of algorithms that they use for financial predictions?

...And then I remember there was this octopus[1] who used to predict winners with 85% accuracy.

[1]https://en.wikipedia.org/wiki/Paul_the_Octopus


You'd have to run this World Cup thousands of times by simulation; running it a single time and concluding that the results are not in line with the model is meaningless and silly.

It's as silly as saying that my near-perfect model of a coin toss (approximately 50/50) is wrong because a series of 10 coin tosses shows different results from my model. The model is not any less correct.
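
For illustration, this is what "running it thousands of times" looks like - a toy sketch with a bare knockout bracket and a made-up win-probability function, nothing like GS's actual model:

    import random

    def simulate_knockout(teams, win_prob, n_sims=10_000):
        # Monte Carlo a single-elimination bracket and return each
        # team's share of titles. len(teams) must be a power of two.
        titles = {t: 0 for t in teams}
        for _ in range(n_sims):
            field = list(teams)
            while len(field) > 1:
                field = [a if random.random() < win_prob(a, b) else b
                         for a, b in zip(field[::2], field[1::2])]
            titles[field[0]] += 1
        return {t: n / n_sims for t, n in titles.items()}

    # With coin-flip matches every team in an 8-team bracket lands near
    # 12.5% -- yet any single tournament still crowns exactly one champion.
    print(simulate_knockout(list("ABCDEFGH"), lambda a, b: 0.5))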


It's as good a time as any to plug EA's simulation results: https://www.easports.com/fifa/news/2018/ea-sports-predicts-w...


Duh. Looks like there's a fundamental misunderstanding of how statistics works all around. The probability of an event does NOT predict a particular outcome. Ever. It only says that if the experiment is performed again and again, like a few thousand times, then the event will occur in roughly that fraction of the trials.

If I toss a fair coin you cannot predict the next outcome. You can only say that if I toss the coin 1000 times, then close to 500 of them are going to turn up heads, and another 500 or so tails.
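
("Close to 500" can even be made precise: the head count is binomial with standard deviation sqrt(1000 * 0.5 * 0.5) ~ 15.8, so about 95% of experiments land within ~32 of 500. A quick check:

    import random

    runs = [sum(random.random() < 0.5 for _ in range(1000)) for _ in range(2000)]
    print(sum(abs(heads - 500) <= 32 for heads in runs) / len(runs))  # ~0.95

)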

It was stupid of Goldman Sachs or whoever to predict an outcome. It was stupid of anyone else to lend credence to that prediction.

Hopefully, Goldman Sachs is not relying on prediction of singular outcomes to make their investment decisions. I don't think they are. Probably just marketing brouhaha to ride the soccer wave. Although I'm not sure if that worked as expected.


> It was stupid of Goldman Sachs or whoever to predict an outcome.

If you read the actual report they did[0], they never claimed that any single outcome was more than 18.5% likely.

[0]: http://www.goldmansachs.com/our-thinking/pages/world-cup-201...


I agree completely with your opening remarks.

>"You can only say that if I toss the coin a 1000 times, then close to 500 are going to turn up heads, and another 500 are going to turn up tails."

Sometimes you can do that and every single flip will be heads. It's unlikely, and across zillions of universes you'd only find it once - but we don't have a pool of universes that we can sample statistically.


They don’t have an edge like they do in their bread-and-butter markets; combine that with a small sample size and you get a high probability of a single year of sports predictions falling over like this.


If GS actually had to bet money, their business model would more likely be to sell a bit of each of the higher-probability losers (less risk) rather than buy big on a projected winner (higher risk).


This site is a good counter-example of website optimization: while it uses many assets, so a CDN domain makes sense, it spreads them out thinly. It loads over 100 CSS files, most of which are below 1K. Similarly it loads approximately 30 JS scripts, most of which are just a few K each. This is mitigated to a large extent by using HTTP/2, which permits a few dozen or so parallel requests, but it still means that a repeated load of the page takes 2-3 seconds. (Without HTTP/2 this would probably take ages, since browsers open at most a few connections to each origin.) There is also almost no difference between reloading with and without the cache.


In the world of models, increasing precision does not necessarily increase accuracy.


In case anyone was interested, here is a table of how likely the model thought each team was to make it through each stage[0], along with the stage the team actually went out in and the probability the model gave to that particular outcome (i.e. [probability of making it through the final stage they made it through] - [probability of making it through the stage they went out in]).

                Groups  Round_16  Quarters  Semis  Finals    Out_In  Probability
        Brazil   87.5%     60.8%     42.0%  27.9%   18.5%  Quarters        18.8%
        France   81.4%     58.4%     36.6%  19.9%   11.3%       Won        11.3%
       Germany   80.5%     49.5%     30.5%  18.8%   10.7%    Groups        19.5%
      Portugal   75.2%     52.8%     32.2%  17.3%    9.4%  Round_16        22.4%
       Belgium   78.5%     51.1%     27.7%  15.8%    8.2%     Semis        11.9%
         Spain   72.3%     50.1%     28.8%  15.4%    7.8%  Round_16        22.2%
       England   73.1%     46.6%     24.4%  13.4%    6.5%     Semis        11.0%
     Argentina   79.7%     44.2%     24.1%  11.8%    5.7%  Round_16        35.5%
      Colombia   74.9%     37.3%     17.0%   8.5%    3.7%  Round_16        37.6%
       Uruguay   74.4%     34.6%     17.2%   7.2%    3.2%  Quarters        17.4%
        Poland   68.5%     30.5%     12.8%   5.8%    2.3%    Groups        31.5%
       Denmark   47.8%     26.3%     12.4%   5.2%    2.0%  Round_16        21.5%
        Mexico   52.0%     23.2%     10.5%   4.9%    1.9%  Round_16        28.8%
        Sweden   45.9%     19.4%      8.3%   3.7%    1.3%  Quarters        11.1%
          Iran   35.4%     18.1%      7.2%   2.6%    0.8%    Groups        64.6%
          Peru   37.3%     17.2%      6.8%   2.5%    0.8%    Groups        62.7%
     Australia   33.5%     15.4%      6.3%   2.3%    0.7%    Groups        66.5%
        Russia   47.9%     16.3%      6.0%   2.0%    0.7%  Quarters        10.3%
       Croatia   49.8%     16.9%      6.3%   2.1%    0.6%    Finals         4.2%
   Switzerland   52.8%     15.9%      6.1%   2.0%    0.6%  Round_16        36.9%
       Iceland   45.2%     15.1%      5.6%   1.8%    0.5%    Groups        54.8%
    Costa_Rica   36.8%     13.3%      4.7%   1.6%    0.5%    Groups        63.2%
        Serbia   32.9%     12.1%      4.5%   1.5%    0.5%    Groups        67.1%
         Japan   36.5%     12.8%      3.8%   1.3%    0.4%  Round_16        23.7%
  Saudi_Arabia   43.4%     12.7%      4.2%   1.3%    0.4%    Groups        56.6%
       Tunisia   35.2%     13.3%      4.1%   1.3%    0.4%    Groups        64.8%
         Egypt   34.4%      8.7%      2.5%   0.7%    0.2%    Groups        65.6%
   South_Korea   21.6%      5.9%      7.1%   0.5%    0.2%    Groups        78.4%
       Morocco   17.1%      6.8%      1.8%   0.5%    0.1%    Groups        82.9%
       Nigeria   25.2%      6.5%      1.7%   0.4%    0.0%    Groups        74.8%
       Senegal   20.1%      4.9%      1.2%   0.3%    0.0%    Groups        79.9%
        Panama   13.2%      3.3%      0.5%   0.1%    0.0%    Groups        86.8%
[0]: Exhibit 2 in http://www.goldmansachs.com/our-thinking/pages/world-cup-201...

Edit: Fix copy-paste errors and atrocious maths.
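
In code, the last column is meant to be computed like this (the row format is hypothetical; the example numbers are Germany's row above):

    STAGES = ['Groups', 'Round_16', 'Quarters', 'Semis', 'Finals']

    def exit_probability(row):
        # Probability the model assigned to a team going out exactly
        # where it did: P(through the previous stage) minus P(through
        # the exit stage). `row` maps stage names to P(making it
        # through that stage), plus 'Out_In', the stage the team was
        # actually eliminated in.
        out = row['Out_In']
        if out == 'Won':
            return row['Finals']
        i = STAGES.index(out)
        prev = row[STAGES[i - 1]] if i > 0 else 1.0
        return prev - row[out]

    # Germany, eliminated in the groups: 1.0 - 0.805 = 0.195.
    print(exit_probability({'Groups': 0.805, 'Round_16': 0.495,
                            'Quarters': 0.305, 'Semis': 0.188,
                            'Finals': 0.107, 'Out_In': 'Groups'}))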


Japan went out in Round_16

Croatia went out in Finals

And I do not understand what the last column means (except for France and teams out in group phase)


Urgh, I hate that you can't edit HN comments.

First two were just me making a mistake because I write that in manually.

That last column makes no sense. It was supposed to be the probability that the model gave to the outcome that occurred, but I got the maths wrong.


There's an edit window of a couple of hours, which has probably just passed. We've opened it up again so you can go ahead and edit.


Brilliant, thank you very much. Although it might be a tad late now.


Great work :)

So, all in all, the only teams for which the predicted probability was more than 1/2 were teams that went out in the groups. That is a little underwhelming.

Ah, for Croatia, I believe, it should read 1.5%.


GarbageIn = ML = GarbageOut


I worked at GS; soccer/football prediction is not their forte.


While I agree that it's somewhat silly to try and predict a World Cup winner like this (and I suspect it was just a bit of fun anyway), there is one other reason that could explain why all these attempts got it so wrong.

Cheating.

Before people start booing, let's not forget where this tournament is being held, and all the other nefarious things that country has been up to recently.


FIFA has been corrupt for decades. Although supposedly it's been cleaned up since Blatter was removed, it is doubtful the institutional corruption has been eliminated completely. The only question is how pervasive it is.



