
> Although Brown says "imperfect-information games" here, he actually means a specific type of imperfect-information games: the type where the opponent's strategy is not fixed. We're talking about games where your opponent can change their strategy in order to exploit weaknesses in your strategy.

This isn't really the central problem of imperfect information. Consider that in a perfect-information game, your opponent will also adjust their strategy to exploit weaknesses in your strategy. So if that were the central challenge, why does it happen in both? You can rule it out as what he was referring to, because it doesn't discriminate between the two types of games. The central challenge of imperfect-information games is that you have to play with respect to your information set, not the subgame you are in. So the policies and outcomes of other subgames influence the expected value of the subgame you are in. In perfect information, the only game in your information set is the subgame you are in, so you only have to play with respect to that subgame. That is what makes perfect information different from imperfect information: it discriminates between the two game types.



> This isn't really the central problem of imperfect information. Consider that in a perfect-information game, your opponent will also adjust their strategy to exploit weaknesses in your strategy. So if that were the central challenge, why does it happen in both? You can rule it out as what he was referring to, because it doesn't discriminate between the two types of games.

The type of issue I was referring to does not occur in perfect-information games.

As a practical example, consider the concept of "balancing your range" in poker. If you play poker without doing that - if you play with a purely exploitative strategy where you are only trying to maximize your EV for each hand - then your strategy will be very easily exploitable by an adaptive opponent. You will frequently end up in river situations where your opponent can deduce whether you have a strong or weak hand, so they can fold to your strong hands and bluff you out of weak hands. In contrast, if you attempt to "balance your range" - that is, consider all the subgames - then you won't end up in these situations as badly. For example, when you make a particular river bet, your opponent might deduce that you have a strong hand 70% of the time and a bluff 30% of the time (as opposed to 100% and 0%).
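To put toy numbers on this (all values hypothetical: pot = 1, bet = 1, the bettor holding the nuts or air equally often, `bettor_ev` and `best_response_call` invented for the illustration), here's a quick Python sketch. A bettor who never bluffs or always bluffs is punished by a best-responding caller, while the classic 2:1 value-to-bluff mix does strictly better no matter what the caller does:

```python
# Toy river spot (hypothetical numbers): pot = 1, bet = 1.
# Bettor holds the nuts or air with equal probability, always bets the nuts,
# and bluffs air with probability b. The opponent best-responds.

def bettor_ev(b, c):
    """EV for the bettor who bluffs with prob b against a caller who calls with prob c."""
    nuts = c * 2 + (1 - c) * 1          # bet nuts: +2 if called, +1 if opponent folds
    air = b * (c * -1 + (1 - c) * 1)    # bluff: -1 if called, +1 if fold; checking air = 0
    return 0.5 * nuts + 0.5 * air

def best_response_call(b):
    """Opponent calls iff calling beats folding, given bluff frequency b."""
    p_value = 1 / (1 + b)               # P(bettor has nuts | bettor bet)
    ev_call = -1 * p_value + 2 * (1 - p_value)
    return 1.0 if ev_call > 0 else 0.0

for b in (0.0, 0.5, 1.0):
    c = best_response_call(b)
    print(f"bluff prob {b}: bettor EV vs best response = {bettor_ev(b, c):.2f}")
```

With no bluffs the caller simply folds every bet and the bettor only ever collects the pot with value hands (EV 0.50); with the balanced range (bluffs = 1/3 of all bets) the caller is indifferent and the bettor makes 0.75 whatever the caller does.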

This issue does not exist in perfect information games. Yes, my definition of "adjusting your strategy to your opponent's strategy" was overly broad to define this issue. But if you think about the poker example, where a poker player might make a decision like "I need to bluff with this part of my range, because I need to support my strong hands [other subgames] by having some bluffs in my range in this situation" - you won't find a corresponding example in perfect information games like chess. This class of problems is unique to imperfect-information games in which an opponent is allowed to adapt their strategy to yours. If you fix the opponent's strategy, the issue disappears. If you turn the game into a perfect-information game, the issue disappears. Both requirements must be present for this issue to exist.

I thought that Noam Brown was talking about this issue in chapter 2. He discussed a simple coin toss game where one player chose a strategy, and then the other player adapted by taking the optimal (exploitative) counter-strategy. Then the first player changed their strategy, and the other player adapted again. And then he described a balanced (GTO) strategy. Then he said this as a conclusion:

> This shows that a player’s optimal strategy in a subgame can depend on the strategies and outcomes in other parts of the game. Thus, one cannot solve a subgame using information about that subgame alone. This is the central challenge of imperfect-information games as opposed to perfect-information games.

I thought that this corresponds perfectly to my poker example. If it doesn't, and it means something completely different, ok, sure. I'm not a mathematician. I can't even read the notation that's used in subsequent parts of the paper.
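That adapt/counter-adapt loop is easy to reproduce in a few lines. This is a stand-in using matching pennies rather than the thesis's actual coin-toss game, just to show the shape of it: iterated pure best responses never settle, while the 50/50 mixed strategy leaves the opponent with nothing to exploit:

```python
# Matching pennies: P1 wants the coins to match, P2 wants a mismatch.

def p1_payoff(a, b):
    return 1 if a == b else -1      # zero-sum: P2's payoff is the negation

def p2_best_response(p1_move):
    return "T" if p1_move == "H" else "H"

def p1_best_response(p2_move):
    return p2_move

# Pure best responses chase each other forever:
p1, history = "H", []
for _ in range(6):
    p2 = p2_best_response(p1)
    history.append((p1, p2))
    p1 = p1_best_response(p2)
print(history)   # the pure strategies cycle and never settle

# Against the mixed (GTO-like) strategy -- P1 plays H and T 50/50 --
# every P2 reply earns exactly 0, so there is nothing left to exploit:
for p2 in ("H", "T"):
    p2_ev = 0.5 * -p1_payoff("H", p2) + 0.5 * -p1_payoff("T", p2)
    print(p2, p2_ev)
```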


> I thought that this corresponds perfectly to my poker example. If it doesn't, and it means something completely different, ok, sure. I'm not a mathematician. I can't even read the notation that's used in subsequent parts of the paper.

I think you understood his point very well. I just think you're making a mistake in trying to recast his claim from "imperfect-information games have this property" to "this narrow subset of imperfect-information games has this property". Both imperfect-information games with an opponent and imperfect-information games without an opponent have the property of needing to play as if you are in multiple subgames, because the definition of the problem is that you don't know which subgame you are in. His claim was that the policy in one subgame could influence the EV of another subgame - and in this problem, it does. I believe you're thinking of the EV of the envelope when you think you are proving this game doesn't have that property via calculation.

To see that his claim applies to this game, consider the case where P(Switch)=1. Being able to calculate the EV of the envelope's contents is a bit different from solving the subgame. Here, we have two subgames, E1 and E2. We can know a priori what the expected values of the envelopes' contents in E1 and E2 are. But if you change your policy in E1, it changes your expected value in E2. If you doubt this, remember the core of the paradox again - choose to always switch and your EV is no longer the EV of the envelopes. Ergo, the EV of the subgame E1 is dependent on the strategies and policies of another subgame, E2.


> I think you understood his point very well. I just think you're making a mistake in trying to recast his claim from "imperfect-information games have this property" to "this narrow subset of imperfect-information games has this property". Both imperfect-information games with an opponent and imperfect-information games without an opponent have the property of needing to play as if you are in multiple subgames, because the definition of the problem is that you don't know which subgame you are in.

If we take the poker example and we modify it by "locking" our opponent's strategy, then this property is removed. Suddenly the optimal strategy for us no longer includes any GTO-like thinking such as "balancing our range", we should simply maximize our EV for each hand "in a vacuum" without any thought to other subgames. We no longer care how our range looks to our opponent, because they are no longer able to change their strategy.
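As a toy illustration (hypothetical numbers: pot = 1, bet = 1, nuts/air equally often, the opponent's call frequency frozen at 40%): once the opponent can't adapt, the bettor's EV is linear in the bluff frequency, so the best reply is a corner - bluff every air hand or none of them, with no mixing or balancing:

```python
# Toy river spot, but with the opponent's strategy "locked":
# they call 40% of the time no matter how we construct our range.

def bettor_ev(b, c):
    """EV when bluffing air with prob b against a fixed call prob c."""
    nuts = c * 2 + (1 - c) * 1          # bet nuts: +2 if called, +1 if folded to
    air = b * (c * -1 + (1 - c) * 1)    # bluff: -1 if called, +1 if folded to
    return 0.5 * nuts + 0.5 * air

FIXED_CALL = 0.4   # hypothetical frozen opponent

# EV is linear in b, so the maximizer is an extreme point, not a mix:
best_b = max((0.0, 1.0), key=lambda b: bettor_ev(b, FIXED_CALL))
print(best_b, bettor_ev(best_b, FIXED_CALL))
```

Here bluffing always is best (the opponent under-calls), and the answer would flip to never bluffing if the frozen call frequency were above 50% - in neither case is there any reason to balance.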

> To see his claim applies to this game consider the case where P(Switch)=1. The policy you chose in one subgame just changed the EV of another subgame.

Sorry, but I don't understand this. This sounds to me like we fix the probability of switching to 1, which means that we end up in an infinite loop and the outcome cannot be computed. I don't understand this premise, nor its implications for the EV of the other subgame.

> This is a subtle distinction that I mentioned earlier - the EV of the envelope as in the wikipedia problem isn't the EV of the subgame. So you're not solving the subgame if you figure out the EV of the envelope.

The expression you use, "EV of the envelope", is ambiguous in this context. I'm not sure if you mean EV relative to the total amount of money in the game, or EV relative to the value of the first-chosen envelope.

When we're talking about "solving a game", we're talking about finding the optimal decisions within a game to maximize the expected value from the game as a whole. In the Two Envelope game we have just one decision: switch or not. So we need to find out whether EV(switch) > EV(stay), where both EVs are relative to the total amount of money in the game (not relative to the value of the first-chosen envelope). We have many ways to conclude that both of these actions have expected value zero. We don't have to be able to compute the "EV of the envelope". We only need to know the relative difference between the EVs of these 2 actions. There are many ways of computing them, and all of those ways lead us to the conclusion that the EV of both actions is zero. I'm not aware of any "incorrect" way of computing those EVs such that we would get a nonzero result.
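A quick Monte Carlo check of that claim (the uniform draw for the smaller amount is an arbitrary assumption, just to generate envelope pairs): measured against the mean of the two envelopes, always-stay and always-switch both average out to zero:

```python
import random

def play(switch, trials=100_000, seed=0):
    """Average winnings relative to the mean of the two envelopes' contents."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.uniform(1, 100)             # hypothetical amount; envelopes hold x and 2x
        envelopes = [x, 2 * x]
        pick = rng.randrange(2)
        if switch:
            pick = 1 - pick
        total += envelopes[pick] - 1.5 * x  # value relative to the mean of both envelopes
    return total / trials

print(play(switch=False), play(switch=True))  # both hover around 0
```

With a shared seed the two runs see the same deals, so their averages are exact negatives of each other, and each is statistical noise around zero.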


> without any thought to other subgames

I guarantee you that your solution is going to have to incorporate a set somewhere that includes both subgames. You might forget it does, because you simplify to a scalar, but it is going to be there. In perfect information, it isn't there. In imperfect information it is.


> I guarantee you that your solution is going to have to incorporate a set somewhere that includes both subgames. You might forget it does, because you simplify to a scalar, but it is going to be there. In perfect information, it isn't there. In imperfect information it is.

I'm having a lot of trouble identifying what exactly it is that we disagree about. If you have identified what it is that we disagree on, can you please formulate the disagreement as a wager that can be simulated with code? That way we can easily resolve the disagreement (or conclude that we actually don't have a disagreement).


Okay.

Your opponent's strategy in poker is fixed; they will always play the Nash equilibrium strategy, they will never play another strategy. Their strategy is fixed. They will never change it from this setting.

You've claimed that subgame-perfect play can be calculated without respect to the subgames you aren't in, because you can make a choice on the basis of the EV of the subgame you are in without respect to the subgames you aren't in.

I disagree. I think you still need to account for every subgame you might be in, as if you are in all of them.

Let the subgame you are in be the one where you have KK and your opponent has AA. However, obviously, you only know that you have KK.

Therefore, you should be able to compute the strategy which is the best response to 37 suited, and according to your logic it should be equal to the best response to AA. After all, you have no means of determining which subgame you are in, so you have to have the same response in both subgames.

So compute the best response for KK to AA and prove that this is also the best response to 37 suited.

However, you've claimed you don't need to calculate this with respect to other subgames. So your computation for 37 suited and your computation for AA must not be equal to each other - if they are, then they share terms: you calculated them with respect to each other.

Let Br = Best response.

Write a program which shows Br(p1, p2, I[KK vs AA]) != Br(p1, p2, I[KK vs 37s]) and Br(p1, p2, I[KK vs AA]) = Br(p1, p2, I[KK vs 37s]) simultaneously. (That is to say, both your policy and your opponent's policy are fixed.)

My contention is that you can't do this. You claim you can. You are free to use a simpler variant of poker - Kuhn poker - so that the computation becomes more tractable.
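Not the wager itself, but here's roughly what a best-response computation for P1 in Kuhn poker looks like against one fixed P2 policy (the P2 probabilities below are made up for illustration, not an equilibrium). Note that every EV in it averages over the opponent's two possible cards - i.e. each decision is made once per information set, spanning both subgames you might be in:

```python
# Kuhn poker: three cards 0 < 1 < 2, each player antes 1, bet size 1.
# P1 checks or bets; facing a check P2 checks or bets; facing a bet the
# other player calls or folds; a check-check or a call goes to showdown.

# Hypothetical fixed P2 policy (NOT an equilibrium):
P2_BET_AFTER_CHECK = {0: 1/3, 1: 0.0, 2: 1.0}   # P(bet | P1 checked)
P2_CALL            = {0: 0.0, 1: 1/3, 2: 1.0}   # P(call | P1 bet)

def showdown(me, opp, stake):
    return stake if me > opp else -stake

def best_response_value(me):
    """Best-response EV for P1 holding card `me` against the fixed P2 policy.

    Every quantity below averages over the two cards P2 might hold --
    i.e. over both subgames inside P1's information set."""
    opps = [c for c in range(3) if c != me]

    # Option 1: bet. P2 calls (showdown for 2) or folds (P1 wins the ante).
    ev_bet = sum(P2_CALL[o] * showdown(me, o, 2) + (1 - P2_CALL[o]) * 1
                 for o in opps) / 2

    # Option 2: check. If P2 checks behind, showdown for 1. If P2 bets,
    # P1 must pick ONE reply for the whole info set (card, "check-bet"),
    # because P1 cannot tell which opponent card it is facing.
    def ev_check(reply):
        total = 0.0
        for o in opps:
            pb = P2_BET_AFTER_CHECK[o]
            vs_bet = showdown(me, o, 2) if reply == "call" else -1
            total += (1 - pb) * showdown(me, o, 1) + pb * vs_bet
        return total / 2

    return max(ev_bet, ev_check("call"), ev_check("fold"))

for card in range(3):
    print(card, best_response_value(card))
```

Even with both policies fixed, there is no way to write `ev_check` without summing over the opponent's possible holdings - the set containing both subgames shows up in the computation whether you want it to or not.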


Ah, looks like I misunderstood the definition of the term "subgame". I thought that this would be one subgame: "I have KK preflop on the button, and my opponent's range is X". And another subgame might be: "I have 75o preflop on the button, and my opponent's range is X". From my opponent's perspective, these 2 situations occur at the same level of the game tree, and my opponent has no way of distinguishing between them (at this stage of the game tree) due to the hidden information, thus I thought they would be called subgames. But you're telling me that the concept of subgame applies in both directions (not only imperfect information from my opponent's perspective, but also imperfect information from my perspective). So when I say "my opponent's presumed range is X", that sentence doesn't describe 1 subgame, that sentence actually describes multiple subgames.

My earlier point was that if an opponent's strategy is fixed, then I don't have to balance how I play that KK compared to how I play that 75o. I can play each hand "in a vacuum", in a strategy where the EV of each hand is maximized. Note that this idea is only relevant if my opponent is not playing perfect GTO (contrary to your example). For example, my opponent may have a weakness where their preflop fold frequency is the same regardless of whether I raise 3BB or 2BB. Against this weakness the optimal play would be to raise 3BB with KK and raise 2BB with 75o. Obviously, this would be a bad strategy if my opponent was allowed to adapt their strategy to me, but it's the optimal play if my opponent's strategy is locked.

To clarify further:

- "exploitative style" poker strategy is made at the level "My hand is 96s, and my opponent's range is presumed to be Y"

- "GTO style" poker strategy is made at the level "My range is X, and my opponent's range is presumed to be Y"

If the opponent's strategy is fixed, then our optimal strategy is to play 100% exploitative. If the opponent's strategy is not fixed, then the optimal strategy will be a mix with elements from both exploitative and GTO strategies. (Again, note that this only makes sense if the opponent is not playing perfect GTO. If the opponent is playing perfect GTO, then obviously the optimal strategy is to also play GTO.)

I thought that Noam Brown was making a point related to this concept. It seems that I was mistaken and his point was related to something else entirely.


> But you're telling me that the concept of subgame applies in both directions (not only imperfect information from my opponent's perspective, but also imperfect information from my perspective). So when I say "my opponent's presumed range is X", that sentence doesn't describe 1 subgame, that sentence actually describes multiple subgames.

Yes; consider a chess move: every move is a subgame - a subtree of that game. He just can't safely say "tree", because there are games that are better described as a graph than as a tree. Actually, Two Envelope is such a game. The action "switch", in Two Envelope, leads to the subgame in which you are at I[null] and need to decide whether to keep or switch - the same situation as you started in. Switch again and you arrive at the subgame I[null] again. It contains itself.

> If the opponent's strategy is fixed, then our optimal strategy is to play 100% exploitative. If the opponent's strategy is not fixed, then the optimal strategy will be a mix with elements from both exploitative and GTO strategies. (Again, note that this only makes sense if the opponent is not playing perfect GTO. If the opponent is playing perfect GTO, then obviously the optimal strategy is to also play GTO.)

I agree.



