PyTorch 1.x Reinforcement Learning Cookbook
Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use. With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using the multi-armed bandit algorithm and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game. By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.
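To give a flavor of the recipes that follow, here is a minimal, self-contained sketch of tabular Q-learning, one of the techniques named above. The toy chain environment, hyperparameter values, and variable names are illustrative assumptions made for this page, not code taken from the book.

```python
# A minimal sketch of tabular Q-learning on a toy 5-state chain.
# The environment and all hyperparameters here are illustrative
# assumptions, not a recipe from the book.
import random

N_STATES = 5           # states 0..4; state 4 is terminal
ACTIONS = [0, 1]       # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

# Q-table: one row per state, one entry per action
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def step(state, action):
    """Move along the chain; only reaching the last state pays a reward."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][action])
        state = next_state

# After training, action 1 (move right) should dominate in every state.
for s, values in enumerate(Q):
    print(s, values)
```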
Table of Contents (273 sections)
- Cover Page
- Title Page
- Copyright and Credits
- PyTorch 1.x Reinforcement Learning Cookbook
- About Packt
- Why subscribe?
- Contributors
- About the author
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Sections
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Get in touch
- Reviews
- Getting Started with Reinforcement Learning and PyTorch
- Setting up the working environment
- How to do it...
- How it works...
- There's more...
- See also
- Installing OpenAI Gym
- How to do it...
- How it works...
- There's more...
- See also
- Simulating Atari environments
- How to do it...
- How it works...
- There's more...
- See also
- Simulating the CartPole environment
- How to do it...
- How it works...
- There's more...
- Reviewing the fundamentals of PyTorch
- How to do it...
- There's more...
- See also
- Implementing and evaluating a random search policy
- How to do it...
- How it works...
- There's more...
- Developing the hill-climbing algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Developing a policy gradient algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Markov Decision Processes and Dynamic Programming
- Technical requirements
- Creating a Markov chain
- How to do it...
- How it works...
- There's more...
- See also
- Creating an MDP
- How to do it...
- How it works...
- There's more...
- See also
- Performing policy evaluation
- How to do it...
- How it works...
- There's more...
- Simulating the FrozenLake environment
- Getting ready
- How to do it...
- How it works...
- There's more...
- Solving an MDP with a value iteration algorithm
- How to do it...
- How it works...
- There's more...
- Solving an MDP with a policy iteration algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Solving the coin-flipping gamble problem
- How to do it...
- How it works...
- There's more...
- Monte Carlo Methods for Making Numerical Estimations
- Calculating Pi using the Monte Carlo method
- How to do it...
- How it works...
- There's more...
- See also
- Performing Monte Carlo policy evaluation
- How to do it...
- How it works...
- There's more...
- Playing Blackjack with Monte Carlo prediction
- How to do it...
- How it works...
- There's more...
- See also
- Performing on-policy Monte Carlo control
- How to do it...
- How it works...
- There's more...
- Developing MC control with epsilon-greedy policy
- How to do it...
- How it works...
- Performing off-policy Monte Carlo control
- How to do it...
- How it works...
- There's more...
- See also
- Developing MC control with weighted importance sampling
- How to do it...
- How it works...
- There's more...
- See also
- Temporal Difference and Q-Learning
- Setting up the Cliff Walking environment playground
- Getting ready
- How to do it...
- How it works...
- Developing the Q-learning algorithm
- How to do it...
- How it works...
- There's more...
- Setting up the Windy Gridworld environment playground
- How to do it...
- How it works...
- Developing the SARSA algorithm
- How to do it...
- How it works...
- There's more...
- Solving the Taxi problem with Q-learning
- Getting ready
- How to do it...
- How it works...
- Solving the Taxi problem with SARSA
- How to do it...
- How it works...
- There's more...
- Developing the Double Q-learning algorithm
- How to do it...
- How it works...
- See also
- Solving Multi-armed Bandit Problems
- Creating a multi-armed bandit environment
- How to do it...
- How it works...
- Solving multi-armed bandit problems with the epsilon-greedy policy
- How to do it...
- How it works...
- There's more...
- Solving multi-armed bandit problems with the softmax exploration
- How to do it...
- How it works...
- Solving multi-armed bandit problems with the upper confidence bound algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Solving internet advertising problems with a multi-armed bandit
- How to do it...
- How it works...
- Solving multi-armed bandit problems with the Thompson sampling algorithm
- How to do it...
- How it works...
- See also
- Solving internet advertising problems with contextual bandits
- How to do it...
- How it works...
- Scaling Up Learning with Function Approximation
- Setting up the Mountain Car environment playground
- Getting ready
- How to do it...
- How it works...
- Estimating Q-functions with gradient descent approximation
- How to do it...
- How it works...
- See also
- Developing Q-learning with linear function approximation
- How to do it...
- How it works...
- Developing SARSA with linear function approximation
- How to do it...
- How it works...
- Incorporating batching using experience replay
- How to do it...
- How it works...
- Developing Q-learning with neural network function approximation
- How to do it...
- How it works...
- See also
- Solving the CartPole problem with function approximation
- How to do it...
- How it works...
- Deep Q-Networks in Action
- Developing deep Q-networks
- How to do it...
- How it works...
- See also
- Improving DQNs with experience replay
- How to do it...
- How it works...
- Developing double deep Q-Networks
- How to do it...
- How it works...
- Tuning double DQN hyperparameters for CartPole
- How to do it...
- How it works...
- Developing Dueling deep Q-Networks
- How to do it...
- How it works...
- Applying Deep Q-Networks to Atari games
- How to do it...
- How it works...
- Using convolutional neural networks for Atari games
- How to do it...
- How it works...
- See also
- Implementing Policy Gradients and Policy Optimization
- Implementing the REINFORCE algorithm
- How to do it...
- How it works...
- See also
- Developing the REINFORCE algorithm with baseline
- How to do it...
- How it works...
- Implementing the actor-critic algorithm
- How to do it...
- How it works...
- Solving Cliff Walking with the actor-critic algorithm
- How to do it...
- How it works...
- Setting up the continuous Mountain Car environment
- How to do it...
- How it works...
- Solving the continuous Mountain Car environment with the advantage actor-critic network
- How to do it...
- How it works...
- There's more...
- See also
- Playing CartPole through the cross-entropy method
- How to do it...
- How it works...
- Capstone Project – Playing Flappy Bird with DQN
- Setting up the game environment
- Getting ready
- How to do it...
- How it works...
- Building a Deep Q-Network to play Flappy Bird
- How to do it...
- How it works...
- Training and tuning the network
- How to do it...
- How it works...
- Deploying the model and playing the game
- How to do it...
- How it works...
- Other Books You May Enjoy
- Leave a review - let other readers know what you think