But Goldman Sachs are the kings of predicting uncertainty! This is their whole business! They make billions finding certainty in the murky, uncertain waters of the global economy. Would you argue that the global economy is more uncertain than soccer? I'd say so. How is it that they can find success in the market but not in soccer?
I think this is a smoke signal. Soccer is corrupt; you can't predict the winner unless you know what's being passed around under the table. Goldman Sachs does these predictions so people read between the lines to see how corrupt it is.
My argument is: "Goldman is amazing at statistical analysis and they routinely practice it on much tougher models (the global economy), so they should have no problem predicting a simpler model (soccer). But since they drastically failed at predicting soccer, then there must be an equally drastic variable missing from their predictions. Since we can trust Goldman to use all available public information in their analysis, there must be critical information that is hidden from the public which affects the outcomes". I make some assumptions, but it's fairly sound, no?
Goldman's business model is not to predict the future. Goldman has 2 business models: 1) transfer risk, 2) provide advice. For #1, it's a middleman. For #2, it's paid for brain power, experience and speed.
In the world of sports betting/analytics, you have baseball and basketball at the forefront, and then American football, soccer, and hockey (roughly in that order).
Off the top of my head, there are several reasons why the latter three sports have all lagged behind:
-Lack of data
It wasn't until the last 4-5 years that accurate soccer match data became widely available and affordable. Companies like Opta have accomplished this by outsourcing the watching of games and the manual tagging of events, which was made possible by the advent of cheap cloud computing.
It should be self-evident why tracking the position and actions of 22 players is more complicated than something like baseball, where for the most part you are looking at one pitcher vs. one batter, much of which can be automated with computer vision that tracks pitch position, speed, and spin.
-Complexity
It's no accident that baseball was the first sport to be revolutionized by analytics. Most of the time, it's a static game with a clearly defined action set. E.g., do I swing at the pitch or not? Do I throw a fastball or not? Do I attempt to steal a base or not?
In games like American football, soccer, and hockey, you have anywhere from 12-22 players on the field at a time. Tracking what the players without the ball or the puck are doing is a difficult task technically, as is quantifying their impact. Concepts like expected goals and expected goals added are recent ones.
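For anyone unfamiliar with expected goals (xG): roughly, a model assigns each shot a probability of being scored based on historical shots from similar situations, and a team's xG is the sum of those probabilities. A minimal sketch, assuming the per-shot probabilities are already given (the numbers below are made up for illustration, not from Opta or any real model):

```python
from math import prod

def expected_goals(shot_probs):
    """A team's xG: the sum of per-shot scoring probabilities."""
    return sum(shot_probs)

def prob_at_least_one_goal(shot_probs):
    """Chance of scoring at least once, ASSUMING shots are independent."""
    return 1 - prod(1 - p for p in shot_probs)

# Hypothetical shots: two long-range efforts, one header, one close-range chance.
shots = [0.03, 0.05, 0.12, 0.45]
print(f"xG = {expected_goals(shots):.2f}")
print(f"P(at least one goal) = {prob_at_least_one_goal(shots):.2f}")
```

The independence assumption is a simplification (a rebound after a saved shot is clearly not independent of the first shot), which is part of why these metrics are still being refined.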
-Sample size
In typical elite soccer leagues, each team plays every other team twice. In England and Spain, with 20 teams per league, this means 38 games per season.
Baseball has a 162-game season plus playoffs, basketball an 82-game season plus playoffs, etc. Couple that with the fact that quality data has only been collected for a few years, and you get other problems.
In basketball and baseball, the effects of aging on player performance and statistics are fairly well understood now. We can generally calculate the 5-year market value of a player, etc. In the other sports I mentioned, we don't yet have that kind of time series data to be able to make those judgements.
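To put a rough number on the sample-size point: the uncertainty in an estimated win rate shrinks with the square root of the number of games. A minimal sketch, assuming each game is an independent win/not-win trial for a genuinely average team (true win rate 0.5); real soccer has draws and varying opponents, so this is only illustrative:

```python
from math import sqrt

def win_rate_standard_error(p, n_games):
    """Standard error of a win-rate estimate from n independent games."""
    return sqrt(p * (1 - p) / n_games)

# Assume an average team (true win rate 0.5) purely for illustration.
for league, n in [("Premier League / La Liga", 38), ("MLB", 162)]:
    se = win_rate_standard_error(0.5, n)
    print(f"{league}: {n} games -> +/- {1.96 * se:.3f} (95% interval half-width)")
```

That works out to roughly +/- 0.16 after a 38-game season versus +/- 0.08 after 162 games, so a soccer season tells you about half as much per team, before even accounting for the shorter data history.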
--
Specific to the World Cup, there are other reasons why you may find it hard to predict results.
-Team chemistry and style
Even though the World Cup is the most high-profile soccer event in the world, most players spend only 1-3 months a year with their national teams. Their "day jobs" with their club teams take up most of their playing time and attention.
As anyone who has played the game Football Manager will know, managing a national team is a tough job. You have no say over how the players are practicing when they're away from you, and no control over the physical condition in which they arrive at the World Cup. This year, there was barely a month between the end of the regular European seasons and the start of the World Cup.
In that month's time, you have to get at least 11 players who have not played with each other to learn your style of play. Do you want to play a pressing style? Are you attempting a slow buildup, or trying long balls? Etc. etc.
-Home field advantage
In baseball and basketball, most modern statistical models account for home field advantage. Having 60,000 Russian fans chanting and heckling likely played a role in Russia's ability to upset Spain, particularly during penalty kicks.
This goes back to the sample issue. How many times before have Spain played Russia IN Russia in front of a large crowd? Probably never.
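To illustrate how a model might "account for" home advantage, here is a minimal sketch using a textbook independent-Poisson goals model. This is a swapped-in generic technique, not Goldman's model; the scoring rates and the 1.3 home multiplier are hypothetical numbers, and in practice that multiplier would have to be estimated from many matches at that venue, which is exactly the data that doesn't exist:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals arrive at average rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

def match_probabilities(home_rate, away_rate, home_advantage=1.0, max_goals=10):
    """Return (P(home win), P(draw), P(away win)) under independent Poisson goals.

    home_advantage multiplies the home side's scoring rate (1.0 = neutral venue).
    Scorelines above max_goals are ignored, so the total is very slightly below 1.
    """
    lam_home = home_rate * home_advantage
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Hypothetical rates: the host is the weaker side (0.9 goals/game vs 1.6).
neutral = match_probabilities(0.9, 1.6)
at_home = match_probabilities(0.9, 1.6, home_advantage=1.3)
print("neutral venue:  ", [round(x, 2) for x in neutral])
print("with home boost:", [round(x, 2) for x in at_home])
```

Even at a neutral venue, the weaker side here wins roughly one match in five, and a draw is likely enough that penalty shootouts come up often in knockout rounds. "Spectacular miscues" are built into low-scoring sports.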
---
All this is to say, cut Goldman some slack. There are a number of non-nefarious reasons why you may expect a soccer model to produce some spectacular miscues.
Ok, I understand this - that soccer has many variables and it is difficult to create a model with all of these variables. But my point is, the global economy has way more variables than soccer. Way way way way more variables. At least 7.5 billion of them.
So would you argue that creating a statistical model of soccer is harder than creating one for global economies? I think it's harder to model economies.
I'm not even trying to give Goldman a hard time! I'm saying that Goldman probably put together a very accurate model of "soccer", but we aren't watching an accurate model of soccer; we're watching the corrupted one where the players and skills don't matter.
I think we have to be very clear on what economic "models" Goldman uses.
If you're talking about GDP growth forecasting, or forecasting unemployment numbers, these are ultimately questions of aggregation. Yes, there are 7.5 billion people, but at the end of the day each individual agent's actions don't make a tremendous difference for an aggregate measure like GDP. During periods of low volatility, as we are currently experiencing, it's really not all that impressive to forecast the unemployment rate +/- 0.25%, or GDP growth within 0.5%.
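The aggregation point can be made concrete: if individual contributions are noisy but independent, the relative noise of the total shrinks like 1/sqrt(N). A minimal sketch with hypothetical agents, each contributing 1.0 on average with a standard deviation as large as the mean:

```python
import random
import statistics

def relative_noise_of_total(n_agents, trials=200, seed=0):
    """Coefficient of variation (stdev / mean) of the sum of n independent noisy agents."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Each hypothetical agent contributes 1.0 on average, with large individual noise.
        totals.append(sum(rng.gauss(1.0, 1.0) for _ in range(n_agents)))
    return statistics.stdev(totals) / statistics.mean(totals)

for n in (10, 100, 5_000):
    print(n, round(relative_noise_of_total(n), 4))
```

The independence assumption is doing a lot of work here. When agents' behavior becomes correlated, as in a crisis, the noise stops cancelling out, which is consistent with forecasts only looking easy during periods of low volatility.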
If you're talking about their market-making and trading businesses, they've had some horrendous quarters recently as well (http://www.businessinsider.com/goldman-sachs-just-had-a-hist...). A very small portion of Goldman's business is taking an opinionated stance; most of their income comes through relatively low-risk market-making activities.
And let's not forget that during the 2008 financial crisis, certain departments within the company correctly bet against the subprime mortgage market (largely through credit default swaps), while others were still exposed to it. The company still needed an injection of capital from Warren Buffett and the US Treasury to weather the crisis. Point being, they aren't clairvoyant oracles.
---
Regarding your last point, which was also made in your original comment, you seem to be claiming some form of what economists call "omitted variable bias", and seem to be hypothesizing that the "omitted variable" is corruption or cheating.
From the purely technical standpoint of building models, the tiny samples (https://www.theringer.com/soccer/2018/7/11/17557720/world-cu...) and the nature of the "data" being collected means that there are plenty of other explanations, like incorrectly estimated parameters or measurement error.
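Both omitted variables and measurement error can produce a badly wrong estimate without anything hidden or nefarious going on. A minimal simulation, using made-up coefficients (1.0, 0.8, 0.5) and a hypothetical unobserved factor z, shows both effects on a simple least-squares slope:

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

rng = random.Random(42)
n = 20_000

# Hypothetical data-generating process: y depends on x (effect 1.0) and on an
# unobserved factor z (effect 0.8), and z is correlated with x (z = 0.5 * x + noise).
xs = [rng.gauss(0, 1) for _ in range(n)]
zs = [0.5 * x + rng.gauss(0, 1) for x in xs]
ys = [1.0 * x + 0.8 * z + rng.gauss(0, 1) for x, z in zip(xs, zs)]

# 1) Omitted variable bias: regressing y on x alone absorbs part of z's effect.
#    In expectation the slope is 1.0 + 0.8 * 0.5 = 1.4, not the true 1.0.
omitted = ols_slope(xs, ys)

# 2) Measurement error: y truly depends only on x (effect 1.0), but x is observed
#    with noise of equal variance, which shrinks the slope toward 1.0 * 1 / (1 + 1) = 0.5.
ys_clean = [1.0 * x + rng.gauss(0, 1) for x in xs]
xs_noisy = [x + rng.gauss(0, 1) for x in xs]
attenuated = ols_slope(xs_noisy, ys_clean)

print("true effect of x: 1.0")
print(f"omitting z:       {omitted:.2f}")
print(f"noisy x:          {attenuated:.2f}")
```

The omitted z could just as easily stand for something innocuous like team chemistry, and with only a handful of World Cup matches per team, the noise in every input is large.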
If you're trying to suggest that there is corruption or cheating in soccer, please point to a concrete example of a team in a critical game receiving a disproportionate number of calls. Unsure if you're aware, but this was the first World Cup with instant video replays for the referees to use. Had this replay been in use more widely in international soccer, the US might've qualified for this World Cup (https://deadspin.com/u-s-a-out-of-world-cup-on-phantom-goal-...), England might've won/tied that pivotal 2010 World Cup game (https://en.wikipedia.org/wiki/Ghost_goal#England_v_Germany_a...), etc.
Soccer may have had a sordid past with the picking of host countries, but the trends in the actual game itself point to technology reducing the ability of referees to make blatantly terrible calls.
Thanks for the replies and the detailed sources, it's interesting to read!
> Point being, they aren't clairvoyant oracles.
Yeah, my argument was weak in that regard. They aren't anywhere close to perfect or accurate, I'll admit.
> you seem to be claiming some form of what economists call "omitted variable bias"
Yes! Is that what it's called?
> please point to a concrete example of a team in a critical game receiving a disproportionate number of calls
Corruption doesn't have to be that explicit. Maybe key players or coaches are paid to perform poorly? It doesn't always come down to the ref. But I admit I have no examples.