
On reflection, it bites you on this particular problem too: under the formalism mentioned above, with a discount factor, the value of the recursive switch case goes to zero in the limit as P(Switch) goes to 1.

So for every strategy selection except the one the article chooses, the EV is what I mentioned before. But for that last one it is zero.

It's the same problem: they aren't walking the graph of the game tree. But it's a bit different from my simplified version of the game that shows where their EV calculations went super wrong, because the actual game tree is of infinite length and there is /never/ a node with SWITCH that terminates in a reward: every SWITCH recurses to another decision point.

You can solve it using various formalisms by adding a discount factor. That lets the limit go to zero for the recursive cases. I guess technically it's undefined since the game never ends, but I don't consider being stuck in a loop to be rational [1]. You really have to let your action probabilities vary as you solve imperfect information problems. It happens in MDPs and MCCFR for a reason: you don't get the correct action without accounting for the way your policy choice affects the outcome distribution. Your expectation depends on your policy; it doesn't exist in a void.

That last statement, well, it should be very obvious in more complex situations. That it isn't obvious in this situation speaks to the problem's deceptive simplicity.
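
To make the discount-factor point concrete, here is a rough sketch of a toy version of the game, not the article's exact setup. The assumptions are mine: every decision point is identical, KEEP ends the game with a fixed reward `keep_reward`, SWITCH pays nothing and recurses to the same decision point, and each recursion is discounted by `gamma` < 1. Under those assumptions the value of a policy that switches with probability p satisfies V = (1 - p) * R + p * gamma * V. The function and parameter names are made up for illustration.

```python
def policy_value(p_switch: float, keep_reward: float = 1.0, gamma: float = 0.9) -> float:
    """Closed-form solution of V = (1 - p) * R + p * gamma * V.

    Assumes gamma < 1, so the denominator never hits zero, even at p = 1.
    """
    return (1.0 - p_switch) * keep_reward / (1.0 - p_switch * gamma)


def policy_value_by_unrolling(
    p_switch: float,
    keep_reward: float = 1.0,
    gamma: float = 0.9,
    depth: int = 10_000,
) -> float:
    """Evaluate the same policy by walking a truncated game tree.

    Starts at the deepest node (assumed to be worth 0) and backs the
    value up to the root, one decision point at a time.
    """
    value = 0.0
    for _ in range(depth):
        value = (1.0 - p_switch) * keep_reward + p_switch * gamma * value
    return value


# The expectation depends on the policy: as P(Switch) approaches 1,
# the discounted value of the recursive case shrinks to zero.
for p in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(p, policy_value(p), policy_value_by_unrolling(p))
```

In this toy model the always-switch policy never reaches a node that pays out, so its value is exactly zero, while any policy that sometimes keeps has a positive value. That is the sense in which the EV can't be computed without fixing the policy first.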

[1]: Unlike people who think humans are irrational for having cognitive biases.


