Basically, when the wiki says "2A because A is the smaller amount" and "A/2 because A is the larger amount", things are okay, but we're in dangerous territory. If we ever reintroduce uncertainty, the imperfect-information scenario we are in is going to force us to acknowledge that we don't know which game we are in. Which means we don't have just 2A, but an information set {2A, A}. And we don't have just A/2, but an information set {A/2, A}. In game theory, if two subtrees share an information set, then the information sets are equal to each other. You can't tell which subtree you are in, so the information set contains both subgames. You can think of one as factual and the other as counterfactual, and vice versa, but the elements in that set need to be equal. When we get to step seven, we do a calculation that reintroduces our uncertainty: we multiply by the probability of being in each case. However, we don't actually know we are in one case, and we don't know we are in the other. So combining these two situations leads to the counterfactual part of subgames emerging. So we have {A, 2A} and {A/2, A}. The sets aren't equal. We can cherry-pick parts of them, like grabbing A and comparing it to 2A. Now we have an identity rule that lets us say 2A = A, so 2 = 1. Our contradiction is found.
> This is an unsatisfactory answer, because it provides an "alternate path" to the correct answer, without demonstrating which step went wrong in the "incorrect path" that leads to the wrong answer.
Eh, I mean, I guess I'm cheating by applying an interpretation they aren't trying to use. This is why I detest the "no true Scotsman" setup of the question. We can create an identity that maps their wrong step into my formalism, where what they did is wrong in the way I claim it is. That they didn't "try to do it" doesn't mean I can't see that they did do it. But apparently - even though game theory notation neatly avoids these pitfalls - we need to stick to the footgun notation. That just seems stupid to me. If you want to avoid the problem, use the notation that makes avoiding the error trivial. I really like the other person's analogy to type errors, because it's such a similar idea to what I'm saying; they just use a different part of math to assert it. There are ways to make this decision simply fail to typecheck. Don't want these kinds of errors? Be stricter about your typing so as to prevent them.
> the counterfactual terms that get realized into the same position, but which weren't written out, claim that Z = 2Z and that Z = Z/2.
Nobody makes this claim, directly or indirectly. If you got this result by formalizing the plain English statement into a proof assistant, then the error was introduced during the step where you interpret the plain English into a formal statement. If the claim "Z = 2Z" was inherent to the plain English version of the problem, then it wouldn't be possible to run the wager as a simulation, but it is. The entire Two Envelopes problem - including step 7 - is possible to simulate in code: https://pastebin.com/jPyZVrkx
What the simulation demonstrates (see the sketch after this list) is:
- It's perfectly possible to define "expected value of the final envelope in relation to the value of the firstly-chosen envelope", as the Wikipedia page describes. No contradiction exists that would prevent this computation.
- It's also perfectly possible to define "expected value of the final envelope in absolute terms".
- These are different goals to maximize. A rational actor should maximize goal 2 ("expected value of the final envelope in absolute terms") in this variant of the game. It's also possible to construct another variant where a rational actor should NOT maximize goal 2, but should instead maximize goal 1 ("expected value of the final envelope in relation to the value of the firstly-chosen envelope").
- The error in Wikipedia's "compelling line of reasoning" appears to be at step 8, where they conclude that "they stand to gain by swapping". This implies that a rational actor should maximize goal 1, when in fact a rational actor should maximize goal 2. That is the error in Wikipedia's line of reasoning.
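For concreteness, here is a minimal re-sketch of that simulation (not the linked pastebin; the prior over the smaller amount and the trial count are assumptions made purely for illustration):

```python
import random

random.seed(0)
N = 1_000_000
ratio_sum, diff_sum = 0.0, 0.0

for _ in range(N):
    z = random.uniform(1, 100)     # assumed prior over the smaller amount
    envelopes = [z, 2 * z]
    random.shuffle(envelopes)
    chosen, other = envelopes
    ratio_sum += other / chosen    # goal 1: value relative to first choice
    diff_sum += other - chosen     # goal 2: absolute gain from switching

print(f"E[other / chosen] = {ratio_sum / N:.4f}")   # -> 1.25, well defined
print(f"E[other - chosen] = {diff_sum / N:.4f}")    # -> 0.0, no gain
```

Both expectations are computable without contradiction; they simply answer different questions.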
> When we get to step seven, we do a calculation that reintroduces our uncertainty: we multiply by the probability of being in each case. However, we don't actually know we are in one case, and we don't know we are in the other. So combining these two situations leads to the counterfactual part of subgames emerging. So we have {A, 2A} and {A/2, A}. The sets aren't equal. We can cherry-pick parts of them, like grabbing A and comparing it to 2A. Now we have an identity rule that lets us say 2A = A, so 2 = 1. Our contradiction is found.
I'm not a mathematician, so I don't understand expressions like "counterfactual part of subgames emerging" or "there is an identity implied by being under imperfect information". I appreciate you writing back at length, but the majority of what you wrote went straight over my head. The impression I have is that you formalized the problem in a manner that led to a contradiction. If this contradiction were inherent to the original English problem statement, it wouldn't have been possible for me to simulate the problem. But it is. So it seems to me that there is no inherent contradiction in the English problem statement; the contradiction was introduced during the formalization step.
> I detest the "no true Scotsman" setup of the question [...] If you want to avoid the problem, use the notation that makes avoiding the error trivial.
I wouldn't describe the setup as a "no true Scotsman" scenario, because the setup defines what counts as a Scotsman: determine which step in the line of reasoning is incorrect, why, and under what conditions. I appreciate that you tried to do this when you identified step 7 and explained what you think is wrong with it. This is also what I tried to do in my explanation, with regards to step 8 and the comparison between the two goals.
Besides, it's trivial to conclude that nothing can possibly be gained by switching (in absolute money terms, which is the goal a rational actor should choose to maximize). It would be boring and unsatisfying to accept a simple answer that explains how to get the "correct answer" without explaining what made it a paradox in the first place. It would be akin to looking at an illusion that displays a man or a cliff depending on how you look at it, then shouting "this is not an illusion! it's just a cliff! there is no man in this picture!"
> It would be boring and unsatisfying to accept a simple answer that explains how to get the "correct answer" without explaining what made it a paradox in the first place
Well, this is just a place where they do the math wrong. We can quibble about why, but it isn't a paradox. We both agree they do it wrong, and we even overlap on the reason why: we both think they aren't comparing the envelopes relative to all the subgames they're in, and we both think that's required to reason correctly. So let's say we get to the correct relative EV of 0.
If both are worth the same, well, why not always switch? They are the same EV, right?
IMO this is the real paradox. We have a false equivalence. EV(policy, game) is not equal to EV(envelopes), because for the policy `ALWAYS_SWITCH` we get the logical contradiction that undefined = EV(envelope), or, if you take the limit with a discount under a different formalism, that 0 = EV(envelope).
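A quick way to see the distinction (a minimal sketch; the envelope value and trial count are assumptions for illustration): every policy with a finite number of switches has the same EV as the envelopes, while `ALWAYS_SWITCH` never terminates, so its EV is not the envelope EV at all.

```python
import random

def play(num_switches: int, z: float = 100.0) -> float:
    """Play one round, switching a fixed, finite number of times."""
    envelopes = [z, 2 * z]
    random.shuffle(envelopes)
    held = 0
    for _ in range(num_switches):
        held = 1 - held        # switching just toggles which envelope we hold
    return envelopes[held]

# Every finite switching policy has the same EV (1.5 * z on average);
# ALWAYS_SWITCH is num_switches -> infinity, which never pays out.
random.seed(0)
for k in range(4):
    ev = sum(play(k) for _ in range(200_000)) / 200_000
    print(f"switch {k} times: EV ≈ {ev:.1f}")
```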
When you tackle this problem starting there, rather than at the other error, the entire approach changes, because we need to modify the problem so that we can calculate EV with respect to a policy. The approach changes enough that it really annoys me, because I have this intuitive feeling that I'm changing the solution so much that I'm no longer working within the spirit of the puzzle.
That is why it feels like "no true Scotsman" to me; to use a metaphor, they gave me a coloring book and told me to color something beautiful, but that I had to stay in the lines - and the lines are of a skunk, and I want to draw a sunrise. I don't want to correct just the one particular step. I hate the framework that forces me into the confusion by falsely implying that if I get the right envelope EV, I know my EV.
> If both are worth the same, well, why not always switch?
I feel like this is a detour that we can avoid by adding a small cost to switching.
> I hate the framework that forces me into the confusion by falsely implying that if I get the right envelope EV, I know my EV.
I love it. I love it in the same way I love the trick of a magician who fools me. Also, the answer of "two EVs" is particularly satisfying for me, because it resembles a similar confusion in poker tournaments where you need to account for chip EV and dollar EV separately.
> I feel like this is a detour that we can avoid by adding a small cost to switching.
I don't think it's a detour we actually want to avoid; the implications are really fascinating and help us understand how to solve games more generally.
I agree with adding a cost as a way to do it. Actually, that is why I've been saying it is zero or undefined. You know about dynamic programming, right? Well, one of the reasons it was invented is related to what we're talking about right now. There are these things called the Bellman equations. They can look like this when you think of things in terms of a Markov decision process.
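A standard textbook form, for reference (π here is the policy and γ the discount, both of which come up below):

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\, V^{\pi}(s')\big]$$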
Since they're recursively defined, you can compute them more quickly by caching the computation back to front than by computing them front to back. Anyway, putting aside that bit of trivia, do you see the symbol that looks a bit like a y? That's the discount factor. When we have an infinite sequence, like we would if we kept switching, you can set that term to some constant, for example 0.9999999999999. That lets you take the limit, because the discounted sequence is obviously going to converge to zero.
Check out the formula again and notice one of the things it does that the Wikipedia article doesn't. Do you see the symbol for pi? That refers to the concept of a policy function - a strategy - so that you compute what the agent would get if they played the game, rather than the envelope contents. Under this formalism we're doing an argmax over the policy function in order to get the policy that has the highest expected value.
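To make the argmax-over-policies idea concrete, here's a minimal sketch applied to this game; the discount, the envelope values, and the three candidate policies are assumptions I'm making for illustration, not a full formalization:

```python
# Evaluate each candidate policy's (discounted) value and take the argmax.
gamma = 0.9999999999999        # the discount, "the symbol like a y"
z = 100.0
ev_envelope = 1.5 * z          # expected contents of a uniformly chosen envelope
MANY = 10**18                  # stand-in for "an unbounded number of switches"

policies = {
    "KEEP": ev_envelope,                         # collect the envelope now
    "SWITCH_ONCE": gamma * ev_envelope,          # one deferred, discounted step
    "ALWAYS_SWITCH": gamma**MANY * ev_envelope,  # -> 0 in the limit
}

best = max(policies, key=policies.get)
print(best)   # KEEP: the argmax policy under this discounted formalism
```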
> I love it. I love it in the same way I love the trick of a magician who fools me.
If you're actually interested in this sort of thing, I suggest checking out the poker research papers by Noam Brown. He gave a talk at NIPS 2017, where he won a best paper award. In 2019 he was runner-up for best scientific advancement of the year. His work is about applying game simplification to poker to create a blueprint of the game which is simpler, solving the blueprint using a modified form of counterfactual regret minimization, and then refining the solution during actual play with reach subgame solving. You might have heard of his work, because he was involved in the whole "AI is now better than humans at poker, even no-limit" breakthrough. I have some belief, but not certainty, that his paper contains a correction of someone who made this style of error. He mentions in the research paper that there was a mistake in another paper where they didn't account for the way subgames influence each other. I see that as the problem here, but I haven't read the other paper, so I don't know if it was the same category of error. Find the paper he referenced, read through it, and see if you're tricked. It is potentially a kind of real-world Two Envelopes problem, a way to see whether your tools for reasoning about these things are actually helping you avoid the error in more complicated situations - though, since you disagree with me that this is about the handling of imperfect information (or more precisely, the counterfactuals - there is a variant on Wikipedia where they removed the probabilities, but counterfactual reasoning still applied to resolve the paradox), maybe you won't see them as related errors.
> You know about dynamic programming, right? Well, one of the reasons it was invented is related to what we're talking about right now. There are these things called the Bellman equations.
Yes, I've done dynamic programming in competitions. I'm not familiar with Bellman equations, and the explanation you provided about convergence, policy functions, etc. went over my head, sorry.
> If you're actually interested in this sort of thing, I suggest checking out the poker research papers by Noam Brown [...] He mentions in the research paper that there was a mistake in another paper where they didn't account for the way subgames influence each other. I see that as the problem here, but I haven't read the other paper, so I don't know if it was the same category of error. Find the paper he referenced, read through it, and see if you're tricked. It is potentially a kind of real-world Two Envelopes problem, a way to see whether your tools for reasoning about these things are actually helping you avoid the error in more complicated situations - though, since you disagree with me that this is about the handling of imperfect information (or more precisely, the counterfactuals - there is a variant on Wikipedia where they removed the probabilities, but counterfactual reasoning still applied to resolve the paradox), maybe you won't see them as related errors.
This sounds really interesting! It was a huge deal in the poker scene when the poker AI developed by Brown and Sandholm defeated pro human players. I read the first few pages of the paper now, but the paper becomes very math-heavy after that, and I don't have the necessary background to understand the notation they use. That said, I suspect the "mistake" in the way "subgames influence each other" that you referenced was of a far simpler kind - the kind that Brown explains in chapter 2, which he concludes with:
> This shows that a player’s optimal strategy in a subgame can depend on the strategies and outcomes in other parts of the game. Thus, one cannot solve a subgame using information about that subgame alone. This is the central challenge of imperfect-information games as opposed to perfect-information games.
Although Brown says "imperfect-information games" here, he actually means a specific type of imperfect-information game: the type where the opponent's strategy is not fixed. We're talking about games where your opponent can change their strategy in order to exploit weaknesses in your strategy. This property was a key requirement of the "coin toss" example game that he provided in chapter 2. If you modified the coin toss game such that the opponent's strategy was fixed, then the situation would change completely. Crucially, the Two Envelopes game is not one of those games where the "opponent" can adapt their strategy according to yours. The strategy of the opponent is fixed in the Two Envelopes game. That's why you can solve a single subgame in the Two Envelopes game independently of other subgames, even though it's an imperfect-information game. If you are skeptical of this claim, we can verify it by simulation.
> the explanation you provided about convergence, policy functions, etc. went over my head, sorry.
I was agreeing with you when I talked about convergence. You said that adding a cost would solve the problem, and I agree that it does. I think you were probably thinking of a literal cost like "one cent". If you choose a cost like that, then switching an infinite number of times has an infinite cost. You could instead think of the cost as a fraction of your expectation: the cost is 1% of whatever you end up getting back. Now if you switch an infinite number of times, the value you end up with goes to zero. That might seem counterintuitive, but recall that 1/3 is .3333 repeating, so when you sum 1/3 + 1/3 + 1/3 you get .9999 repeating - yet 1/3 + 1/3 + 1/3 is equal to one. Being infinitely close to something is, in effect, being the thing you are infinitely close to. Even though we never get the reward, we know the multiplier is becoming infinitely close to zero. People call it "converging" when an infinite sequence approaches a definite real value. We know, before ever seeing the value, that we'll be multiplying it by zero. So we can refactor the equation to 0 * EV(switch) and use the identity 0x = 0 to declare the result to be 0. Thus, the calculation converges to zero.
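A few lines to watch the compounding numerically (the 1% figure is the assumed per-switch cost from above):

```python
# After n switches, a 1% per-switch cost leaves a multiplier of 0.99**n.
for n in [1, 10, 100, 1_000, 10_000]:
    print(f"n = {n:6d}  multiplier = {0.99 ** n:.6f}")
# The multiplier converges to 0, so the limit is 0 * EV(switch) = 0.
```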
> Although Brown says "imperfect-information games" here, he actually means a specific type of imperfect-information game: the type where the opponent's strategy is not fixed. We're talking about games where your opponent can change their strategy in order to exploit weaknesses in your strategy.
This isn't really the central problem of imperfect information. Consider that in a perfect information game, your opponent will also adjust their strategy to exploit weakness in your strategy. So if it was the central challenge, why does it happen in both? You can rule it out as what he was referring to, because it doesn't discriminate between the two types of games. The central challenge in imperfect-information games is that you have to play with respect to your information set, not the subgame you are in. So the policies and outcomes of other subgames influence the expected value of the subgame you are in. In perfect information, the only game in your information set is the subgame you are in, so you only have to play with respect to that subgame. That is what makes perfect information different from imperfect information. It discriminates between the two game types.
> This isn't really the central problem of imperfect information. Consider that in a perfect information game, your opponent will also adjust their strategy to exploit weakness in your strategy. So if it was the central challenge, why does it happen in both? You can rule it out as what he was referring to, because it doesn't discriminate between the two types of games.
The type of issue I was referring to does not occur in perfect information games.
As a practical example, consider the concept of "balancing your range" in poker. If you play poker without doing that - if you play with a purely exploitative strategy where you are only trying to maximize your EV for each hand - then your strategy will be very easily exploitable by an adaptive opponent. You will frequently end up in river situations where your opponent can deduce whether you have a strong or weak hand, so they can fold to your strong hands and bluff you out of weak hands. In contrast, if you attempt to "balance your range" - that is, consider all the subgames - then you won't end up in these situations as badly. For example, when you make a particular river bet, your opponent might deduce that you have a strong hand 70% of the time and a bluff 30% of the time (as opposed to 100% and 0%).
This issue does not exist in perfect information games. Yes, my definition of "adjusting your strategy to your opponent's strategy" was overly broad for defining this issue. But if you think about the poker example, where a poker player might make a decision like "I need to bluff with this part of my range, because I need to support my strong hands [other subgames] by having some bluffs in my range in this situation" - you won't find a corresponding example in perfect information games like chess. This class of problems is unique to imperfect-information games in which an opponent is allowed to adapt their strategy to yours. If you fix the opponent's strategy, the issue disappears. If you turn the game into a perfect-information game, the issue disappears. Both requirements must be present for this issue to exist.
I thought that Noam Brown was talking about this issue in chapter 2. He discussed a simple coin toss game where one player picked a strategy, and then the other player adapted by taking the optimal (exploitative) strategy against them. Then the first player changed their strategy, and the second player adapted again. And then he described a balanced (GTO) strategy. Then he said this as a conclusion:
> This shows that a player’s optimal strategy in a subgame can depend on the strategies and outcomes in other parts of the game. Thus, one cannot solve a subgame using information about that subgame alone. This is the central challenge of imperfect-information games as opposed to perfect-information games.
I thought that this corresponds perfectly to my poker example. If it doesn't, and it means something completely different, ok, sure. I'm not a mathematician. I can't even read the notation that's used in subsequent parts of the paper.
> I thought that this corresponds perfectly to my poker example. If it doesn't, and it means something completely different, ok, sure. I'm not a mathematician. I can't even read the notation that's used in subsequent parts of the paper.
I think you understood his point very well. I just think you're making a mistake in trying to recast his claim from "imperfect-information games have this property" to "this narrow subset of imperfect-information games has this property". Both imperfect-information games with an opponent and imperfect-information games without an opponent have the property of needing to play as if you are in multiple subgames, because the definition of the problem is that you don't know which subgame you are in. His claim was that the policy in one subgame can influence the EV of another subgame - and in this problem, it does. I believe you're thinking of the EV of the envelope when you think you are proving this game doesn't have that property via calculation.
To see that his claim applies to this game, consider the case where P(Switch)=1. Being able to calculate the EV of the envelope's contents is a bit different from solving the subgame. Here, we have two subgames, E1 and E2. We can know a priori what the expected values of the envelopes' contents in E1 and E2 are. But if you change your policy in E1, it changes your expected value in E2. If you doubt this, remember the core of the paradox again - choose to always switch, and your EV is no longer the EV of the envelopes. Ergo, the EV of the subgame E1 is dependent on the strategies and policies of another subgame, E2.
> I think you understood his point very well. I just think you're making a mistake in trying to recast his claim from "imperfect-information games have this property" to "this narrow subset of imperfect-information games has this property". Both imperfect-information games with an opponent and imperfect-information games without an opponent have the property of needing to play as if you are in multiple subgames, because the definition of the problem is that you don't know which subgame you are in.
If we take the poker example and modify it by "locking" our opponent's strategy, then this property is removed. Suddenly the optimal strategy for us no longer includes any GTO-like thinking such as "balancing our range"; we should simply maximize our EV for each hand "in a vacuum" without any thought to other subgames. We no longer care how our range looks to our opponent, because they are no longer able to change their strategy.
> To see that his claim applies to this game, consider the case where P(Switch)=1. The policy you chose in one subgame just changed the EV of another subgame.
Sorry, but I don't understand this. This sounds to me like we fix the probability of switching to 1, which means that we end up in an infinite loop and the outcome cannot be computed. I don't understand this premise, nor its implications for the EV of the other subgame.
> This is a subtle distinction that I mentioned earlier - the EV of the envelope as in the wikipedia problem isn't the EV of the subgame. So you're not solving the subgame if you figure out the EV of the envelope.
The expression you use "EV of the envelope" is ambiguous in this context. I'm not sure if you mean EV relative to the value of the total amount of money in the game, or if you mean EV relative to the value of the firstly-chosen envelope.
When we're talking about "solving a game", we're talking about finding the optimal decisions within a game to maximize the expected value from the game as a whole. In the Two Envelopes game we have just one decision: switch or not. So we need to find out whether EV(switch) > EV(stay), where both EVs are relative to the total amount of money in the game (not relative to the value of the firstly-chosen envelope). We don't have to be able to compute the "EV of the envelope"; we only need to know the relative difference between the EVs of these two actions. There are many ways of computing them, and all of those ways lead us to the conclusion that the EV of both actions is zero. I'm not aware of any "incorrect" way of computing those EVs such that we would get a nonzero result.
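For reference, the simplest of those computations written out, with Z denoting the smaller of the two amounts: with probability 1/2 you hold Z and switching gains +Z; with probability 1/2 you hold 2Z and switching gains -Z.

$$E[\text{gain from switching}] = \tfrac{1}{2}(2Z - Z) + \tfrac{1}{2}(Z - 2Z) = \tfrac{Z}{2} - \tfrac{Z}{2} = 0$$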
I guarantee you that your solution is going to have to incorporate a set somewhere that includes both subgames. You might forget it does, because you simplify to a scalar, but it is going to be there. In perfect information, it isn't there. In imperfect information it is.
> I guarantee you that your solution is going to have to incorporate a set somewhere that includes both subgames. You might forget it does, because you simplify to a scalar, but it is going to be there. In perfect information, it isn't there. In imperfect information it is.
I'm having a lot of trouble identifying what exactly it is that we disagree about. If you have identified what it is that we disagree on, can you please formulate the disagreement as a wager that can be simulated with code? That way we can easily resolve the disagreement (or conclude that we actually don't have a disagreement).
Your opponent's strategy in poker is fixed; they will always play the Nash equilibrium strategy, and they will never play another strategy. Their strategy is fixed. They will never change it from this setting.
You've claimed that subgame-perfect play can be calculated without respect to the subgames you aren't in - that you can make a choice purely on the basis of the EV of the subgame you are in.
I disagree. I think you still need to account for every subgame you might be in, as if you were in all of them.
Let the subgame you are in be the one where you have KK and your opponent has AA. However, obviously, you only know that you have KK.
Therefore, you should be able to compute the strategy which is the best response to 37 suited, and according to your logic it should be equal to the best response to AA. After all, you have no means of determining which subgame you are in, so you have to give the same response in both subgames.
So compute the best response for KK against AA, and prove that this is also the best response against 37 suited.
However, you've claimed you don't need to calculate this with respect to other subgames. So your computation for 37 suited and your computation for AA must not be equal to each other - if they are, then you share terms: you calculated them with respect to each other.
Let Br = Best response.
Write a program which shows Br(p1, p2, I[KK] vs AA) != Br(p1, p2, I[KK] vs 37s) and Br(p1, p2, I[KK] vs AA) = Br(p1, p2, I[KK] vs 37s) simultaneously. (That is to say, both your policy and your opponent's policy are fixed.)
My contention is that you can't do this. You claim you can. You are free to use a simpler variant of poker - Kuhn poker - so that the computation becomes more tractable.
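To give a flavor of what such a computation looks like, here is a minimal sketch of a single infoset decision in a Kuhn-poker-like spot. All the numbers (antes, the opponent's fixed betting frequencies) are assumptions chosen for illustration. The point it shows: the decision lives at the information set, and its EV aggregates over the hidden subgames, weighted by how often the fixed opponent strategy reaches each of them.

```python
# We hold Q; the opponent holds J or K with equal prior probability.
# Opponent's FIXED strategy: always bet K, bluff J half the time.
# Facing a bet, our information set is "Q, facing a bet" - it contains both
# hidden subgames (opponent has J / opponent has K), and we must choose ONE
# action for the whole set, not one action per subgame.
ANTE, BET = 1, 1
p_bluff_J, p_bet_K = 0.5, 1.0   # assumed fixed opponent frequencies

# Reach-weighted posterior over the hidden subgames, given that we saw a bet
reach_J = 0.5 * p_bluff_J
reach_K = 0.5 * p_bet_K
p_J = reach_J / (reach_J + reach_K)   # P(opponent holds J | bet) = 1/3
p_K = 1.0 - p_J

# Net-chip EVs of our two candidate actions at this information set
ev_call = p_J * (ANTE + BET) + p_K * -(ANTE + BET)  # +2 vs J, -2 vs K
ev_fold = -ANTE                                     # forfeit our ante

print(f"P(J|bet)={p_J:.3f}  EV(call)={ev_call:.3f}  EV(fold)={ev_fold:.3f}")
print("best response at this infoset:", "call" if ev_call > ev_fold else "fold")
```

Note that the single best response depends on the opponent's behavior in both hidden states at once; there is no separate "best response versus J" and "best response versus K" to be had at the infoset.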
Ah, looks like I misunderstood the definition of the term "subgame". I thought that this would be one subgame: "I have KK preflop on the button, and my opponent's range is X". And another subgame might be: "I have 75o preflop on the button, and my opponent's range is X". From my opponent's perspective, these 2 situations occur on the same level of the game tree, and my opponent has no way of distinguishing between them (at this stage of the game tree) due to the hidden information, thus I thought they would be called subgames. But you're telling me that the concept of subgame applies in both directions (not only imperfect information from my opponent's perspective, but also imperfect information from my perspective). So when I say "my opponent's presumed range is X", that sentence doesn't describe 1 subgame, it actually describes multiple subgames.
My earlier point was that if an opponent's strategy is fixed, then I don't have to balance how I play that KK against how I play that 75o. I can play each hand "in a vacuum", with a strategy where the EV of each hand is maximized. Note that this idea is only relevant if my opponent is not playing perfect GTO (contrary to your example). For example, my opponent may have a weakness where their preflop fold frequency is the same regardless of whether I raise 3BB or 2BB. Against this weakness the optimal play would be to raise 3BB with KK and raise 2BB with 75o. Obviously, this would be a bad strategy if my opponent were allowed to adapt their strategy to me, but it's the optimal play if my opponent's strategy is locked.
To clarify further:
- "exploitative style" poker strategy is made at the level "My hand is 96s, and my opponent's range is presumed to be Y"
- "GTO style" poker strategy is made at the level "My range is X, and my opponent's range is presumed to be Y"
If the opponent's strategy is fixed, then our optimal strategy is to play 100% exploitative. If the opponent's strategy is not fixed, then the optimal strategy will be a mix with elements from both exploitative and GTO strategies. (Again, note that this only makes sense if the opponent is not playing perfect GTO. If the opponent is playing perfect GTO, then obviously the optimal strategy is to also play GTO.)
I thought that Noam Brown was making a point related to this concept. It seems that I was mistaken and his point was related to something else entirely.
> But you're telling me that the concept of subgame applies in both directions (not only imperfect information from my opponent's perspective, but also imperfect information from my perspective). So when I say "my opponent's presumed range is X", that sentence doesn't describe 1 subgame, it actually describes multiple subgames.
Yes; consider a chess move - every move is a subgame, a subtree of that game. He just can't safely say "tree", because there are games that are better described as a graph than as a tree. Actually, the Two Envelopes game is such a game. The action "switch" in Two Envelopes leads to the subgame in which you are at I[null] and need to decide whether to keep or switch - the same situation you started in. Switch again and you arrive at the subgame I[null]. It contains itself.
> If the opponent's strategy is fixed, then our optimal strategy is to play 100% exploitative. If the opponent's strategy is not fixed, then the optimal strategy will be a mix with elements from both exploitative and GTO strategies. (Again, note that this only makes sense if the opponent is not playing perfect GTO. If the opponent is playing perfect GTO, then obviously the optimal strategy is to also play GTO.)
> That's why you can solve a single subgame in the Two Envelopes game independently of other subgames, even though it's an imperfect-information game.
Obviously the EV of envelope 1 is the contents of envelope 1. If you know the probability of reaching it, you can calculate the expected value of that envelope by multiplying by the probability of reaching it. But why are you multiplying by 1/2? Probability is defined in terms of sets. What is the set that makes it 1/2? Does that set contain only the subgames that are part of the subgame you are in?
> I'm not a mathematician, so I don't understand expressions like "counterfactual part of subgames emerging" or "there is an identity implied by being under imperfect information". I appreciate you writing back at length, but the majority of what you wrote went straight over my head.
# Defining Counterfactuals
Consider a fair coin flip. You have {Heads, Tails}. Let's assume you are going to get heads - take it as a given, that is what actually happens. It actually happens. It is factual. However, for the purposes of analysis, sometimes it doesn't really matter that we know what happened; we need to consider all the cases that didn't happen. Tails, in our analysis, would be the counterfactual. These two situations, the factual heads and the counterfactual tails, are associated with each other. There is a set {H, T} that contains both of them. E.g., P(Heads) = |{H}| / |{H, T}|.
When I say counterfactual, I'm referring to the events that we didn't assume to be factual, but which we want to keep track of. Probability kind of drops these terms when it says "assume", because |{H}| / |{H, T}| becomes |{H}| / |{H}|, which is equal to one. This is perfectly fine from a math perspective. The thing is, just as moving from 5/x = y to 5 = xy is valid, it has some assumptions built into it - namely that x can't be zero. Our counterfactuals are a bit like that x, because they impose an easily hidden constraint on what it is valid to do. If you reintroduce uncertainty about whether or not you are in H, you have to do so in a way that takes you back to having the set {H, T}.
Let me show you a practical example of that to make the point very clear:
Let the value of heads be one and the value of tails be zero. P(Heads) = 0.5. But assume heads on the same coin flip: P(Heads | assumption) = 1.0. Assume heads again on the same coin flip: P(Heads | assumption) = 1.0. Now, since P(Heads) is actually true with 1/2 probability: EV(Heads) = 1/2 * P(Heads | assumption) + 1/2 * P(Heads | assumption)?
Well, it is certainly true that 1/2 * P(Heads | assumption) is a correct term, so you can't say this is wrong on the basis of the assumptions alone. Being very precise, the problem is that the neglected counterfactual wasn't handled. Every term leading up to the equation was technically true, but obviously we just did something really weird, right? And to be very precise, we neglected the counterfactual associated with heads.
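The same example in a few lines of code (values H = 1, T = 0 as assumed above):

```python
# Coin flip with assumed values H = 1, T = 0.
outcomes = {"H": 1.0, "T": 0.0}

# Correct EV: weight each element of the full set {H, T}
ev_correct = 0.5 * outcomes["H"] + 0.5 * outcomes["T"]   # 0.5

# Neglected-counterfactual version: both branches of the weighted sum reuse
# the assumption P(H | assumption) = 1.0, so tails never appears anywhere
p_h_given_assumption = 1.0
ev_weird = 0.5 * p_h_given_assumption + 0.5 * p_h_given_assumption  # 1.0

print(ev_correct, ev_weird)   # 0.5 vs 1.0: every term "true", result absurd
```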
# Defining subgames
Most games can be written out as a game tree. I showed one in my previous post. When you move down the tree, you are in a subgame of that tree.
In a perfect information game like chess, if you are in a subgame - a portion lower on the tree - the other parts of the tree don't matter anymore. Your results are independent of the rest of the game tree.
In imperfect information, though, just because you are in a subtree doesn't mean you know which subtree you are in. Your actual view into the game is through the information you have. Just like in the {H, T} case, where you have to consider the potential that you have either H or T, in a subgame you have to consider the potential that you are in every subgame that is reachable given the information you've seen.