On reflection, it bites you on this particular problem too: under the formalism mentioned above, when you take the limit of P(Switch=1), the discounted value of the recursive switch case is zero.
So for every strategy selection except the one the article chooses, the EV is what I mentioned before. But for that last one, it is zero.
It's the same problem: they aren't walking the graph of the game tree. But it's a bit different from my simplified version of the game that shows where their EV calculations went super wrong, because the actual game tree is of infinite length and there is /never/ a node with SWITCH that terminates in a reward: every SWITCH recurses to another decision point.
You can solve it using various formalisms by adding a discount factor. That lets the limit go to zero for the recursive cases. I guess technically it's undefined since the game never ends, but I don't consider being stuck in a loop to be rational [1]. You really have to let your action probabilities vary as you solve imperfect information problems. It happens in MDPs and MCCFR for a reason: you don't get the correct action without accounting for the way your policy choice affects the outcome distribution. Your expectation depends on your policy; it doesn't exist in a void.
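To make that concrete, here's a minimal sketch (my own toy model, not the article's game): a decision point where STAY pays a reward and SWITCH recurses to an identical decision point. With a discount factor gamma < 1, any policy that sometimes stays converges to a positive EV, while the pure always-switch policy's discounted EV is gamma^depth times nothing, i.e. zero in the limit.

```python
def ev(p_switch: float, gamma: float, depth: int, stay_reward: float = 1.0) -> float:
    """Discounted EV of a policy that switches with probability p_switch,
    truncated after `depth` recursions (the never-terminating tail contributes 0)."""
    if depth == 0:
        return 0.0  # truncate the infinite SWITCH chain
    # STAY terminates with a reward; SWITCH recurses into a discounted copy of the game.
    return (1 - p_switch) * stay_reward + p_switch * gamma * ev(p_switch, gamma, depth - 1, stay_reward)

# A mixed policy converges to a positive fixed point: V = 0.5 / (1 - 0.5 * 0.9)
print(ev(0.5, 0.9, 200))  # ~0.9091
# The pure always-switch policy never reaches a terminal reward, so its EV is 0.
print(ev(1.0, 0.9, 200))  # 0.0
```

Notice the EV is a function of the policy itself, which is the whole point: evaluating "the EV of switching" without fixing a policy over the recursive tree is where the article's calculation goes wrong.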
That last statement should be very obvious in more complex situations. That it isn't obvious in this situation speaks to the problem's deceptive simplicity.
[1]: Unlike people who think humans are irrational for having cognitive biases.