Code 1-1  Check the observation space and action space of the environment
Code 1-2  Closed-form agent for the task MountainCar-v0
Code 1-3  Play an episode
Code 1-4  Test the performance by playing 100 episodes
Code 1-5  Check the observation space and action space of the task MountainCarContinuous-v0
Code 1-6  Closed-form agent for the task MountainCarContinuous-v0
Code 2-1  Example of using the Bellman expectation equation
Code 2-2  Example of solving the Bellman optimality equation
Code 2-3  Import the environment CliffWalking-v0 and check its information
Code 2-4  Find state values and action values using Bellman expectation equations
Code 2-5  Find optimal values using the linear programming (LP) method
Code 2-6  Find an optimal deterministic policy from optimal action values
Code 3-1  Check the metadata of FrozenLake-v1
Code 3-2  Play an episode using the policy
Code 3-3  Calculate the episode rewards of the random policy
Code 3-4  Implementation of policy evaluation
Code 3-5  Evaluate the random policy
Code 3-6  Policy improvement
Code 3-7  Improve the random policy
Code 3-8  Policy iteration
Code 3-9  Use policy iteration to find the optimal policy and test it
Code 3-10  Value iteration (VI)
Code 3-11  Find the optimal policy using the value iteration algorithm
Code 4-1  Play an episode
Code 4-2  On-policy MC evaluation
Code 4-3  Visualize a 3-dimensional np.array that can be indexed by a state
Code 4-4  On-policy MC update with exploring starts
Code 4-5  MC update with soft policy
Code 4-6  Policy evaluation based on importance sampling
Code 4-7  Importance sampling policy optimization with soft policy
Code 5-1  Initialize and visualize the task
Code 5-2  SARSA agent
Code 5-3  Train the agent
Code 5-4  Expected SARSA agent
Code 5-5  Q-learning agent
Code 5-6  Double Q-learning agent
Code 5-7  SARSA($\lambda$) agent
Code 6-1  Import the environment MountainCar-v0
Code 6-2  The agent that always pushes right
Code 6-3  Tile coding
Code 6-4  SARSA agent with function approximation
Code 6-5  SARSA($\lambda$) agent with function approximation
Code 6-6  Experience replayer
Code 6-7  DQN agent with target network (with TensorFlow)
Code 6-8  DQN agent with target network (with PyTorch)
Code 6-9  Double DQN agent (with TensorFlow)
Code 6-10  Double DQN agent (with PyTorch)
Code 6-11  Dueling network (with TensorFlow)
Code 6-12  Dueling network (with PyTorch)
Code 6-13  Dueling DQN agent (with TensorFlow)
Code 6-14  Dueling DQN agent (with PyTorch)
Code 7-1  On-policy VPG agent (with TensorFlow)
Code 7-2  On-policy VPG agent (with PyTorch)
Code 7-3  On-policy VPG agent with baseline (with TensorFlow)
Code 7-4  On-policy VPG agent with baseline (with PyTorch)
Code 7-5  Off-policy PG agent (with TensorFlow)
Code 7-6  Off-policy PG agent (with PyTorch)
Code 7-7  Off-policy PG agent with baseline (with TensorFlow)
Code 7-8  Off-policy PG agent with baseline (with PyTorch)
Code 8-1  Action-value AC agent (with TensorFlow)
Code 8-2  Action-value AC agent (with PyTorch)
Code 8-3  Advantage AC agent (with TensorFlow)
Code 8-4  Advantage AC agent (with PyTorch)
Code 8-5  Eligibility-trace AC agent (with TensorFlow)
Code 8-6  Eligibility-trace AC agent (with PyTorch)
Code 8-7  Replayer for PPO
Code 8-8  PPO agent (with TensorFlow)
Code 8-9  PPO agent (with PyTorch)
Code 8-10  Calculate the conjugate gradient (with TensorFlow)
Code 8-11  Calculate the conjugate gradient (with PyTorch)
Code 8-12  NPG agent (with TensorFlow)
Code 8-13  NPG agent (with PyTorch)
Code 8-14  TRPO agent (with TensorFlow)
Code 8-15  TRPO agent (with PyTorch)
Code 8-16  OffPAC agent (with TensorFlow)
Code 8-17  OffPAC agent (with PyTorch)
Code 9-1  Ornstein–Uhlenbeck (OU) process
Code 9-2  DDPG agent (with TensorFlow)
Code 9-3  DDPG agent (with PyTorch)
Code 9-4  TD3 agent (with TensorFlow)
Code 9-5  TD3 agent (with PyTorch)
Code 10-1  Closed-form solution of LunarLander-v2
Code 10-2  Closed-form solution of LunarLanderContinuous-v2
Code 10-3  Soft Q-learning (SQL) agent (with TensorFlow)
Code 10-4  Soft Q-learning (SQL) agent (with PyTorch)
Code 10-5  SAC agent (with TensorFlow)
Code 10-6  SAC agent (with PyTorch)
Code 10-7  SAC with automatic entropy adjustment (with TensorFlow)
Code 10-8  SAC with automatic entropy adjustment (with PyTorch)
Code 10-9  SAC with automatic entropy adjustment for continuous action space (with TensorFlow)
Code 10-10  SAC with automatic entropy adjustment for continuous action space (with PyTorch)
Code 11-1  Closed-form solution of BipedalWalker-v3
Code 11-2  Evolution strategy (ES) agent
Code 11-3  Train and test the ES agent
Code 11-4  Augmented random search (ARS) agent
Code 12-1  Closed-form solution of PongNoFrameskip-v4
Code 12-2  Wrapped environment class
Code 12-3  Categorical DQN agent (with TensorFlow)
Code 12-4  Categorical DQN agent (with PyTorch)
Code 12-5  QR-DQN agent (with TensorFlow)
Code 12-6  QR-DQN agent (with PyTorch)
Code 12-7  Quantile network (with TensorFlow)
Code 12-8  Quantile network (with PyTorch)
Code 12-9  IQN agent (with TensorFlow)
Code 12-10  IQN agent (with PyTorch)
Code 13-1  The environment class BernoulliMABEnv
Code 13-2  Register the environment class BernoulliMABEnv with Gym
Code 13-3  $\epsilon$-greedy policy agent
Code 13-4  Evaluate average regret
Code 13-5  UCB1 agent
Code 13-6  Bayesian UCB agent
Code 13-7  Thompson sampling agent
Code 14-1  The constructor of the class BoardGameEnv
Code 14-2  The member functions is_valid(), has_valid(), and get_valid() in the class BoardGameEnv
Code 14-3  The member function get_winner() in the class KInARowEnv
Code 14-4  The member functions next_step() and get_next_state() in the class BoardGameEnv
Code 14-5  The member functions reset(), step(), and render() in the class BoardGameEnv
Code 14-6  Exhaustive search agent
Code 14-7  Self-play
Code 14-8  Replay buffer of the AlphaZero agent
Code 14-9  Network of the AlphaZero agent (with TensorFlow)
Code 14-10  Network of the AlphaZero agent (with PyTorch)
Code 14-11  AlphaZero agent (with TensorFlow)
Code 14-12  AlphaZero agent (with PyTorch)
Code 15-1  The environment class TigerEnv for the task “Tiger”
Code 15-2  Register the environment class TigerEnv
Code 15-3  Optimal policy when the discount factor $\gamma=1$
Code 15-4  Belief VI
Code 15-5  Point-based value iteration (PBVI)
Code 16-1  Adjust the camera
Code 16-2  Visualize the interaction with the environment
Code 16-3  Experience replayer for state–action pairs
Code 16-4  Behavior cloning (BC) agent (with TensorFlow)
Code 16-5  Behavior cloning (BC) agent (with PyTorch)
Code 16-6  GAIL-PPO agent (with TensorFlow)
Code 16-7  GAIL-PPO agent (with PyTorch)