The reward is zero on all transitions except those on which the gambler reaches his goal, when it is +1. The state-value function then gives the probability of winning from each state. Figure 4.6 shows the successive sweeps of value iteration, and the final policy found, for the case of p = 0.4.

Why does the optimal policy for the gambler's problem have such a curious form? In particular, for capital of 50 it bets it all on one flip, but for capital of 51 it does not. Why is this a good policy?

termination with capital of 0 and 100 dollars, giving them values of 0 and 1 respectively. Show your results graphically as in Figure 4.6. Are your results stable as θ → 0?
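The programming exercise above can be sketched as a straightforward value-iteration loop. This is a minimal sketch, not the book's reference solution: the goal of 100, the heads probability p_h = 0.4, the threshold name theta, and the tie-breaking rule (smallest winning stake) are all assumptions left open by the exercise.

```python
def gambler_value_iteration(p_h=0.4, goal=100, theta=1e-9):
    """Value iteration for the gambler's problem (a sketch).

    V[s] estimates the probability of reaching `goal` from capital s.
    States 0 and `goal` are dummy terminal states with values 0 and 1.
    """
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # reaching the goal is worth +1

    while True:
        delta = 0.0
        for s in range(1, goal):
            best = 0.0
            # A stake is limited by the capital held and the distance to the goal.
            for a in range(1, min(s, goal - s) + 1):
                q = p_h * V[s + a] + (1 - p_h) * V[s - a]
                best = max(best, q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (asynchronous) update
        if delta < theta:
            break

    # Greedy policy w.r.t. V; ties broken toward the smallest stake.
    policy = [0] * (goal + 1)
    for s in range(1, goal):
        best_a, best_q = 0, -1.0
        for a in range(1, min(s, goal - s) + 1):
            q = p_h * V[s + a] + (1 - p_h) * V[s - a]
            if q > best_q + 1e-12:
                best_q, best_a = q, a
        policy[s] = a if False else best_a
    return V, policy
```

With p_h = 0.4 the resulting policy reproduces the curious shape discussed above: at a capital of 50 the greedy stake is the entire 50, since winning that single flip reaches the goal with probability 0.4, which no more timid staking scheme can exceed under a subfair coin.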


