Code 1-1  Check the observation space and action space of the environment
Code 1-2  Closed-form agent for the task MountainCar-v0
Code 1-3  Play an episode
Code 1-4  Test the performance by playing 100 episodes
Code 1-5  Check the observation space and action space of the task MountainCarContinuous-v0
Code 1-6  Closed-form agent for the task MountainCarContinuous-v0
Code 2-1  Example of using the Bellman expectation equation
Code 2-2  Example of solving the Bellman optimality equation
Code 2-3  Import the environment CliffWalking-v0 and check its information
Code 2-4  Find state values and action values using Bellman expectation equations
Code 2-5  Find optimal values using the linear programming (LP) method
Code 2-6  Find an optimal deterministic policy from optimal action values
Code 3-1  Check the metadata of FrozenLake-v1
Code 3-2  Play an episode using the policy
Code 3-3  Calculate the episode rewards of the random policy
Code 3-4  Implementation of policy evaluation
Code 3-5  Evaluate the random policy
Code 3-6  Policy improvement
Code 3-7  Improve the random policy
Code 3-8  Policy iteration
Code 3-9  Use policy iteration to find the optimal policy and test it
Code 3-10  Value iteration (VI)
Code 3-11  Find the optimal policy using the value iteration algorithm
Code 4-1  Play an episode
Code 4-2  On-policy MC evaluation
Code 4-3  Visualize a 3-dimensional np.array that can be indexed by a state
Code 4-4  On-policy MC update with exploring starts
Code 4-5  MC update with soft policy
Code 4-6  Policy evaluation based on importance sampling
Code 4-7  Importance sampling policy optimization with soft policy
Code 5-1  Initialize and visualize the task
Code 5-2  SARSA agent
Code 5-3  Train the agent
Code 5-4  Expected SARSA agent
Code 5-5  Q-learning agent
Code 5-6  Double Q-learning agent
Code 5-7  SARSA($\lambda$) agent
Code 6-1  Import the environment MountainCar-v0
Code 6-2  The agent that always pushes right
Code 6-3  Tile coding
Code 6-4  SARSA agent with function approximation
Code 6-5  SARSA($\lambda$) agent with function approximation
Code 6-6  Experience replayer
Code 6-7  DQN agent with target network (with TensorFlow)
Code 6-8  DQN agent with target network (with PyTorch)
Code 6-9  Double DQN agent (with TensorFlow)
Code 6-10  Double DQN agent (with PyTorch)
Code 6-11  Dueling network (with TensorFlow)
Code 6-12  Dueling network (with PyTorch)
Code 6-13  Dueling DQN agent (with TensorFlow)
Code 6-14  Dueling DQN agent (with PyTorch)
Code 7-1  On-policy VPG agent (with TensorFlow)
Code 7-2  On-policy VPG agent (with PyTorch)
Code 7-3  On-policy VPG agent with baseline (with TensorFlow)
Code 7-4  On-policy VPG agent with baseline (with PyTorch)
Code 7-5  Off-policy PG agent (with TensorFlow)
Code 7-6  Off-policy PG agent (with PyTorch)
Code 7-7  Off-policy PG agent with baseline (with TensorFlow)
Code 7-8  Off-policy PG agent with baseline (with PyTorch)
Code 8-1  Action-value AC agent (with TensorFlow)
Code 8-2  Action-value AC agent (with PyTorch)
Code 8-3  Advantage AC agent (with TensorFlow)
Code 8-4  Advantage AC agent (with PyTorch)
Code 8-5  Eligibility-trace AC agent (with TensorFlow)
Code 8-6  Eligibility-trace AC agent (with PyTorch)
Code 8-7  Replayer for PPO
Code 8-8  PPO agent (with TensorFlow)
Code 8-9  PPO agent (with PyTorch)
Code 8-10  Calculate the conjugate gradient (with TensorFlow)
Code 8-11  Calculate the conjugate gradient (with PyTorch)
Code 8-12  NPG agent (with TensorFlow)
Code 8-13  NPG agent (with PyTorch)
Code 8-14  TRPO agent (with TensorFlow)
Code 8-15  TRPO agent (with PyTorch)
Code 8-16  OffPAC agent (with TensorFlow)
Code 8-17  OffPAC agent (with PyTorch)
Code 9-1  Ornstein–Uhlenbeck (OU) process
Code 9-2  DDPG agent (with TensorFlow)
Code 9-3  DDPG agent (with PyTorch)
Code 9-4  TD3 agent (with TensorFlow)
Code 9-5  TD3 agent (with PyTorch)
Code 10-1  Closed-form solution of LunarLander-v2
Code 10-2  Closed-form solution of LunarLanderContinuous-v2
Code 10-3  Soft Q-learning (SQL) agent (with TensorFlow)
Code 10-4  Soft Q-learning (SQL) agent (with PyTorch)
Code 10-5  SAC agent (with TensorFlow)
Code 10-6  SAC agent (with PyTorch)
Code 10-7  SAC with automatic entropy adjustment (with TensorFlow)
Code 10-8  SAC with automatic entropy adjustment (with PyTorch)
Code 10-9  SAC with automatic entropy adjustment for continuous action space (with TensorFlow)
Code 10-10  SAC with automatic entropy adjustment for continuous action space (with PyTorch)
Code 11-1  Closed-form solution of BipedalWalker-v3
Code 11-2  Evolution strategy (ES) agent
Code 11-3  Train and test the ES agent
Code 11-4  Augmented random search (ARS) agent
Code 12-1  Closed-form solution of PongNoFrameskip-v4
Code 12-2  Wrapped environment class
Code 12-3  Categorical DQN agent (with TensorFlow)
Code 12-4  Categorical DQN agent (with PyTorch)
Code 12-5  QR-DQN agent (with TensorFlow)
Code 12-6  QR-DQN agent (with PyTorch)
Code 12-7  Quantile network (with TensorFlow)
Code 12-8  Quantile network (with PyTorch)
Code 12-9  IQN agent (with TensorFlow)
Code 12-10  IQN agent (with PyTorch)
Code 13-1  The environment class BernoulliMABEnv
Code 13-2  Register the environment class BernoulliMABEnv with Gym
Code 13-3  $\epsilon$-greedy policy agent
Code 13-4  Evaluate average regret
Code 13-5  UCB1 agent
Code 13-6  Bayesian UCB agent
Code 13-7  Thompson sampling agent
Code 14-1  The constructor of the class BoardGameEnv
Code 14-2  The member functions is_valid(), has_valid(), and get_valid() in the class BoardGameEnv
Code 14-3  The member function get_winner() in the class KInARowEnv
Code 14-4  The member functions next_step() and get_next_state() in the class BoardGameEnv
Code 14-5  The member functions reset(), step(), and render() in the class BoardGameEnv
Code 14-6  Exhaustive search agent
Code 14-7  Self-play
Code 14-8  Replay buffer of the AlphaZero agent
Code 14-9  Network of the AlphaZero agent (with TensorFlow)
Code 14-10  Network of the AlphaZero agent (with PyTorch)
Code 14-11  AlphaZero agent (with TensorFlow)
Code 14-12  AlphaZero agent (with PyTorch)
Code 15-1  The environment class TigerEnv for the task “Tiger”
Code 15-2  Register the environment class TigerEnv
Code 15-3  Optimal policy when the discount factor $\gamma=1$
Code 15-4  Belief VI
Code 15-5  Point-based value iteration (PBVI)
Code 16-1  Adjust the camera
Code 16-2  Visualize the interaction with the environment
Code 16-3  Experience replayer for state–action pairs
Code 16-4  Behavior cloning (BC) agent (with TensorFlow)
Code 16-5  Behavior cloning (BC) agent (with PyTorch)
Code 16-6  GAIL-PPO agent (with TensorFlow)
Code 16-7  GAIL-PPO agent (with PyTorch)