The reward is zero on all transitions except those on which the gambler reaches his goal, when it is +1. The state-value function then gives the probability of winning from each state. Figure 4.6 shows the successive sweeps of value iteration, and the final policy found, for the case of p = 0.4.

Why does the optimal policy for the gambler's problem have such a curious form? In particular, for capital of 50 it bets it all on one flip, but for capital of 51 it does not. Why is this a good policy?

termination with capital of 0 and 100 dollars, giving them values of 0 and 1 respectively. Show your results graphically as in Figure 4.6. Are your results stable as θ → 0?
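The programming exercise above can be sketched as a straightforward value-iteration loop. This is a minimal sketch, not the book's reference solution: the goal of 100, the heads probability p_h = 0.4, the threshold name theta, and the tie-breaking rule (smallest winning stake) are all assumptions left open by the exercise.

```python
def gambler_value_iteration(p_h=0.4, goal=100, theta=1e-9):
    """Value iteration for the gambler's problem (a sketch).

    V[s] estimates the probability of reaching `goal` from capital s.
    States 0 and `goal` are dummy terminal states with values 0 and 1.
    """
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # reaching the goal is worth +1

    while True:
        delta = 0.0
        for s in range(1, goal):
            best = 0.0
            # A stake is limited by the capital held and the distance to the goal.
            for a in range(1, min(s, goal - s) + 1):
                q = p_h * V[s + a] + (1 - p_h) * V[s - a]
                best = max(best, q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (asynchronous) update
        if delta < theta:
            break

    # Greedy policy w.r.t. V; ties broken toward the smallest stake.
    policy = [0] * (goal + 1)
    for s in range(1, goal):
        best_a, best_q = 0, -1.0
        for a in range(1, min(s, goal - s) + 1):
            q = p_h * V[s + a] + (1 - p_h) * V[s - a]
            if q > best_q + 1e-12:
                best_q, best_a = q, a
        policy[s] = a if False else best_a
    return V, policy
```

With p_h = 0.4 the resulting policy reproduces the curious shape discussed above: at a capital of 50 the greedy stake is the entire 50, since winning that single flip reaches the goal with probability 0.4, which no more timid staking scheme can exceed under a subfair coin.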


