rl-book

Table of Codes

Code 1-1 Check the observation space and action space of the environment
Code 1-2 Closed-form agent for task MountainCar-v0
Code 1-3 Play an episode
Code 1-4 Test the performance by playing 100 episodes
Code 1-5 Check the observation space and action space of the task MountainCarContinuous-v0
Code 1-6 Closed-form agent for task MountainCarContinuous-v0
Code 2-1 Example of solving the Bellman expectation equation
Code 2-2 Example of solving the Bellman optimality equation
Code 2-3 Import the environment CliffWalking-v0 and check its information
Code 2-4 Find state values and action values using Bellman expectation equations
Code 2-5 Find optimal values using linear programming (LP)
Code 2-6 Find an optimal deterministic policy from optimal action values
Code 3-1 Check the metadata of FrozenLake-v1
Code 3-2 Play an episode using the policy
Code 3-3 Calculate the episode rewards of the random policy
Code 3-4 Implementation of policy evaluation
Code 3-5 Evaluate the random policy
Code 3-6 Policy improvement
Code 3-7 Improve the random policy
Code 3-8 Policy iteration
Code 3-9 Use policy iteration to find the optimal policy and test it
Code 3-10 Value iteration
Code 3-11 Find the optimal policy using the value iteration algorithm
Code 4-1 Play an episode
Code 4-2 On-policy MC evaluation
Code 4-3 Visualize a 3-dimensional np.array that can be indexed by a state
Code 4-4 On-policy MC update with exploring starts
Code 4-5 MC update with soft policy
Code 4-6 Policy evaluation based on importance sampling
Code 4-7 Importance sampling policy optimization with soft policy
Code 5-1 Initialize and visualize the task
Code 5-2 SARSA agent
Code 5-3 Train the agent
Code 5-4 Expected SARSA agent
Code 5-5 Q-learning agent
Code 5-6 Double Q-learning agent
Code 5-7 SARSA($\lambda$) agent
Code 6-1 Import the environment MountainCar-v0
Code 6-2 The agent that always pushes right
Code 6-3 Tile coding
Code 6-4 SARSA agent with function approximation
Code 6-5 SARSA($\lambda$) agent with function approximation
Code 6-6 Experience replayer
Code 6-7 DQN agent with target network (with TensorFlow)
Code 6-8 DQN agent with target network (with PyTorch)
Code 6-9 Double DQN agent (with TensorFlow)
Code 6-10 Double DQN agent (with PyTorch)
Code 6-11 Dueling network (with TensorFlow)
Code 6-12 Dueling network (with PyTorch)
Code 6-13 Dueling DQN agent (with TensorFlow)
Code 6-14 Dueling DQN agent (with PyTorch)
Code 7-1 On-policy VPG agent (with TensorFlow)
Code 7-2 On-policy VPG agent (with PyTorch)
Code 7-3 On-policy VPG agent with baseline (with TensorFlow)
Code 7-4 On-policy VPG agent with baseline (with PyTorch)
Code 7-5 Off-policy PG agent (with TensorFlow)
Code 7-6 Off-policy PG agent (with PyTorch)
Code 7-7 Off-policy PG agent with baseline (with TensorFlow)
Code 7-8 Off-policy PG agent with baseline (with PyTorch)
Code 8-1 Action-value AC agent (with TensorFlow)
Code 8-2 Action-value AC agent (with PyTorch)
Code 8-3 Advantage AC agent (with TensorFlow)
Code 8-4 Advantage AC agent (with PyTorch)
Code 8-5 Eligibility-trace AC agent (with TensorFlow)
Code 8-6 Eligibility-trace AC agent (with PyTorch)
Code 8-7 Replayer for PPO
Code 8-8 PPO agent (with TensorFlow)
Code 8-9 PPO agent (with PyTorch)
Code 8-10 Calculate conjugate gradients (with TensorFlow)
Code 8-11 Calculate conjugate gradients (with PyTorch)
Code 8-12 NPG agent (with TensorFlow)
Code 8-13 NPG agent (with PyTorch)
Code 8-14 TRPO agent (with TensorFlow)
Code 8-15 TRPO agent (with PyTorch)
Code 8-16 OffPAC agent (with TensorFlow)
Code 8-17 OffPAC agent (with PyTorch)
Code 9-1 Ornstein–Uhlenbeck (OU) process
Code 9-2 DDPG agent (with TensorFlow)
Code 9-3 DDPG agent (with PyTorch)
Code 9-4 TD3 agent (with TensorFlow)
Code 9-5 TD3 agent (with PyTorch)
Code 10-1 Closed-form solution of LunarLander-v2
Code 10-2 Closed-form solution of LunarLanderContinuous-v2
Code 10-3 SQL agent (with TensorFlow)
Code 10-4 SQL agent (with PyTorch)
Code 10-5 SAC agent (with TensorFlow)
Code 10-6 SAC agent (with PyTorch)
Code 10-7 SAC with automatic entropy adjustment (with TensorFlow)
Code 10-8 SAC with automatic entropy adjustment (with PyTorch)
Code 10-9 SAC with automatic entropy adjustment for continuous action space (with TensorFlow)
Code 10-10 SAC with automatic entropy adjustment for continuous action space (with PyTorch)
Code 11-1 Closed-form solution of BipedalWalker-v3
Code 11-2 ES agent
Code 11-3 Train and test ES agent
Code 11-4 ARS agent
Code 12-1 Closed-form solution of PongNoFrameskip-v4
Code 12-2 Wrapped environment class
Code 12-3 Categorical DQN agent (with TensorFlow)
Code 12-4 Categorical DQN agent (with PyTorch)
Code 12-5 QR-DQN agent (with TensorFlow)
Code 12-6 QR-DQN agent (with PyTorch)
Code 12-7 Quantile network (with TensorFlow)
Code 12-8 Quantile network (with PyTorch)
Code 12-9 IQN agent (with TensorFlow)
Code 12-10 IQN agent (with PyTorch)
Code 13-1 The environment class BernoulliMABEnv
Code 13-2 Register the environment class BernoulliMABEnv into Gym
Code 13-3 $\epsilon$-greedy policy agent
Code 13-4 Evaluate average regret
Code 13-5 UCB1 agent
Code 13-6 Bayesian UCB agent
Code 13-7 Thompson sampling agent
Code 14-1 The constructor of the class BoardGameEnv
Code 14-2 The member functions is_valid(), has_valid(), and get_valid() in the class BoardGameEnv
Code 14-3 The member function get_winner() in the class KInARowEnv
Code 14-4 The member functions next_step() and get_next_state() in the class BoardGameEnv
Code 14-5 The member functions reset(), step(), and render() in the class BoardGameEnv
Code 14-6 Exhaustive search agent
Code 14-7 Self-play
Code 14-8 Replay buffer of AlphaZero agent
Code 14-9 Network of AlphaZero agent (with TensorFlow)
Code 14-10 Network of AlphaZero agent (with PyTorch)
Code 14-11 AlphaZero agent (with TensorFlow)
Code 14-12 AlphaZero agent (with PyTorch)
Code 15-1 The environment class TigerEnv for the task “Tiger”
Code 15-2 Register the environment class TigerEnv
Code 15-3 Optimal policy when the discount factor $\gamma=1$
Code 15-4 Belief value iteration
Code 15-5 Point-based value iteration (PBVI)
Code 16-1 Adjust the camera
Code 16-2 Visualize the interaction with the environment
Code 16-3 Experience replayer for state–action pairs
Code 16-4 BC agent (with TensorFlow)
Code 16-5 BC agent (with PyTorch)
Code 16-6 GAIL-PPO agent (with TensorFlow)
Code 16-7 GAIL-PPO agent (with PyTorch)