Hands-On Reinforcement Learning with Python
If you're a machine learning developer or deep learning enthusiast interested in artificial intelligence and want to learn about reinforcement learning from scratch, this book is for you. Some knowledge of linear algebra, calculus, and the Python programming language will help you understand the concepts covered in this book.
Brand: 中圖公司
Listed: 2021-06-18 18:03:01
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 to produce and distribute it.
- Cover
- Copyright information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the author
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Introduction to Reinforcement Learning
- What is RL?
- RL algorithm
- How RL differs from other ML paradigms
- Elements of RL
- Agent
- Policy function
- Value function
- Model
- Agent environment interface
- Types of RL environment
- Deterministic environment
- Stochastic environment
- Fully observable environment
- Partially observable environment
- Discrete environment
- Continuous environment
- Episodic and non-episodic environment
- Single and multi-agent environment
- RL platforms
- OpenAI Gym and Universe
- DeepMind Lab
- RL-Glue
- Project Malmo
- ViZDoom
- Applications of RL
- Education
- Medicine and healthcare
- Manufacturing
- Inventory management
- Finance
- Natural Language Processing and Computer Vision
- Summary
- Questions
- Further reading
- Getting Started with OpenAI and TensorFlow
- Setting up your machine
- Installing Anaconda
- Installing Docker
- Installing OpenAI Gym and Universe
- Common error fixes
- OpenAI Gym
- Basic simulations
- Training a robot to walk
- OpenAI Universe
- Building a video game bot
- TensorFlow
- Variables, constants, and placeholders
- Variables
- Constants
- Placeholders
- Computation graph
- Sessions
- TensorBoard
- Adding scope
- Summary
- Questions
- Further reading
- The Markov Decision Process and Dynamic Programming
- The Markov chain and Markov process
- Markov Decision Process
- Rewards and returns
- Episodic and continuous tasks
- Discount factor
- The policy function
- State value function
- State-action value function (Q function)
- The Bellman equation and optimality
- Deriving the Bellman equation for value and Q functions
- Solving the Bellman equation
- Dynamic programming
- Value iteration
- Policy iteration
- Solving the frozen lake problem
- Value iteration
- Policy iteration
- Summary
- Questions
- Further reading
- Gaming with Monte Carlo Methods
- Monte Carlo methods
- Estimating the value of pi using Monte Carlo
- Monte Carlo prediction
- First visit Monte Carlo
- Every visit Monte Carlo
- Let's play Blackjack with Monte Carlo
- Monte Carlo control
- Monte Carlo exploration starts
- On-policy Monte Carlo control
- Off-policy Monte Carlo control
- Summary
- Questions
- Further reading
- Temporal Difference Learning
- TD learning
- TD prediction
- TD control
- Q learning
- Solving the taxi problem using Q learning
- SARSA
- Solving the taxi problem using SARSA
- The difference between Q learning and SARSA
- Summary
- Questions
- Further reading
- Multi-Armed Bandit Problem
- The MAB problem
- The epsilon-greedy policy
- The softmax exploration algorithm
- The upper confidence bound algorithm
- The Thompson sampling algorithm
- Applications of MAB
- Identifying the right advertisement banner using MAB
- Contextual bandits
- Summary
- Questions
- Further reading
- Deep Learning Fundamentals
- Artificial neurons
- ANNs
- Input layer
- Hidden layer
- Output layer
- Activation functions
- Deep diving into ANN
- Gradient descent
- Neural networks in TensorFlow
- RNN
- Backpropagation through time
- Long Short-Term Memory RNN
- Generating song lyrics using LSTM RNN
- Convolutional neural networks
- Convolutional layer
- Pooling layer
- Fully connected layer
- CNN architecture
- Classifying fashion products using CNN
- Summary
- Questions
- Further reading
- Atari Games with Deep Q Network
- What is a Deep Q Network?
- Architecture of DQN
- Convolutional network
- Experience replay
- Target network
- Clipping rewards
- Understanding the algorithm
- Building an agent to play Atari games
- Double DQN
- Prioritized experience replay
- Dueling network architecture
- Summary
- Questions
- Further reading
- Playing Doom with a Deep Recurrent Q Network
- DRQN
- Architecture of DRQN
- Training an agent to play Doom
- Basic Doom game
- Doom with DRQN
- DARQN
- Architecture of DARQN
- Summary
- Questions
- Further reading
- The Asynchronous Advantage Actor Critic Network
- The Asynchronous Advantage Actor Critic
- The three As
- The architecture of A3C
- How A3C works
- Driving up a mountain with A3C
- Visualization in TensorBoard
- Summary
- Questions
- Further reading
- Policy Gradients and Optimization
- Policy gradient
- Lunar Lander using policy gradients
- Deep deterministic policy gradient
- Swinging a pendulum
- Trust Region Policy Optimization
- Proximal Policy Optimization
- Summary
- Questions
- Further reading
- Capstone Project – Car Racing Using DQN
- Environment wrapper functions
- Dueling network
- Replay memory
- Training the network
- Car racing
- Summary
- Questions
- Further reading
- Recent Advancements and Next Steps
- Imagination augmented agents
- Learning from human preference
- Deep Q learning from demonstrations
- Hindsight experience replay
- Hierarchical reinforcement learning
- MAXQ Value Function Decomposition
- Inverse reinforcement learning
- Summary
- Questions
- Further reading
- Assessments
- Chapter 1
- Chapter 2
- Chapter 3
- Chapter 4
- Chapter 5
- Chapter 6
- Chapter 7
- Chapter 8
- Chapter 9
- Chapter 10
- Chapter 11
- Chapter 12
- Chapter 13
- Other Books You May Enjoy
- Leave a review - let other readers know what you think (updated: 2021-06-18 19:12:48)