Reinforcement Learning with TensorFlow
If you want to get started with reinforcement learning using TensorFlow in the most practical way, this book will be a useful resource. The book assumes prior knowledge of machine learning and neural network programming concepts, as well as some understanding of the TensorFlow framework. No previous experience with reinforcement learning is required.
Table of contents (254 chapters)
- Cover
- Title Page
- Copyright and Credits
- Reinforcement Learning with TensorFlow
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Deep Learning – Architectures and Frameworks
- Deep learning
- Activation functions for deep learning
- The sigmoid function
- The tanh function
- The softmax function
- The rectified linear unit function
- How to choose the right activation function
- Logistic regression as a neural network
- Notation
- Objective
- The cost function
- The gradient descent algorithm
- The computational graph
- Steps to solve logistic regression using gradient descent
- What is Xavier initialization?
- Why do we use Xavier initialization?
- The neural network model
- Recurrent neural networks
- Long Short Term Memory Networks
- Convolutional neural networks
- The LeNet-5 convolutional neural network
- The AlexNet model
- The VGG-Net model
- The Inception model
- Limitations of deep learning
- The vanishing gradient problem
- The exploding gradient problem
- Overcoming the limitations of deep learning
- Reinforcement learning
- Basic terminologies and conventions
- Optimality criteria
- The value function for optimality
- The policy model for optimality
- The Q-learning approach to reinforcement learning
- Asynchronous advantage actor-critic
- Introduction to TensorFlow and OpenAI Gym
- Basic computations in TensorFlow
- An introduction to OpenAI Gym
- The pioneers and breakthroughs in reinforcement learning
- David Silver
- Pieter Abbeel
- Google DeepMind
- The AlphaGo program
- Libratus
- Summary
- Training Reinforcement Learning Agents Using OpenAI Gym
- The OpenAI Gym
- Understanding an OpenAI Gym environment
- Programming an agent using an OpenAI Gym environment
- Q-Learning
- The Epsilon-Greedy approach
- Using the Q-Network for real-world applications
- Summary
- Markov Decision Process
- Markov decision processes
- The Markov property
- The S state set
- Actions
- Transition model
- Rewards
- Policy
- The sequence of rewards - assumptions
- The infinite horizons
- Utility of sequences
- The Bellman equations
- Solving the Bellman equation to find policies
- An example of value iteration using the Bellman equation
- Policy iteration
- Partially observable Markov decision processes
- State estimation
- Value iteration in POMDPs
- Training the FrozenLake-v0 environment using MDP
- Summary
- Policy Gradients
- The policy optimization method
- Why policy optimization methods?
- Why stochastic policy?
- Example 1 - rock paper scissors
- Example 2 - state aliased grid-world
- Policy objective functions
- Policy Gradient Theorem
- Temporal difference rule
- TD(1) rule
- TD(0) rule
- TD(λ) rule
- Policy gradients
- The Monte Carlo policy gradient
- Actor-critic algorithms
- Using a baseline to reduce variance
- Vanilla policy gradient
- Agent learning pong using policy gradients
- Summary
- Q-Learning and Deep Q-Networks
- Why reinforcement learning?
- Model based learning and model free learning
- Monte Carlo learning
- Temporal difference learning
- On-policy and off-policy learning
- Q-learning
- The exploration exploitation dilemma
- Q-learning for the mountain car problem in OpenAI gym
- Deep Q-networks
- Using a convolution neural network instead of a single layer neural network
- Use of experience replay
- Separate target network to compute the target Q-values
- Advancements in deep Q-networks and beyond
- Double DQN
- Dueling DQN
- Deep Q-network for mountain car problem in OpenAI gym
- Deep Q-network for Cartpole problem in OpenAI gym
- Deep Q-network for Atari Breakout in OpenAI gym
- The Monte Carlo tree search algorithm
- Minimax and game trees
- The Monte Carlo Tree Search
- The SARSA algorithm
- SARSA algorithm for mountain car problem in OpenAI gym
- Summary
- Asynchronous Methods
- Why asynchronous methods?
- Asynchronous one-step Q-learning
- Asynchronous one-step SARSA
- Asynchronous n-step Q-learning
- Asynchronous advantage actor critic
- A3C for Pong-v0 in OpenAI gym
- Summary
- Robo Everything – Real Strategy Gaming
- Real-time strategy games
- Reinforcement learning and other approaches
- Online case-based planning
- Drawbacks to real-time strategy games
- Why reinforcement learning?
- Reinforcement learning in RTS gaming
- Deep autoencoder
- How is reinforcement learning better?
- Summary
- AlphaGo – Reinforcement Learning at Its Best
- What is Go?
- Go versus chess
- How did Deep Blue defeat Garry Kasparov?
- Why is the game tree approach no good for Go?
- AlphaGo – mastering Go
- Monte Carlo Tree Search
- Architecture and properties of AlphaGo
- Energy consumption analysis – Lee Sedol versus AlphaGo
- AlphaGo Zero
- Architecture and properties of AlphaGo Zero
- Training process in AlphaGo Zero
- Summary
- Reinforcement Learning in Autonomous Driving
- Machine learning for autonomous driving
- Reinforcement learning for autonomous driving
- Creating autonomous driving agents
- Why reinforcement learning?
- Proposed frameworks for autonomous driving
- Spatial aggregation
- Sensor fusion
- Spatial features
- Recurrent temporal aggregation
- Planning
- DeepTraffic – MIT simulator for autonomous driving
- Summary
- Financial Portfolio Management
- Introduction
- Problem definition
- Data preparation
- Reinforcement learning
- Further improvements
- Summary
- Reinforcement Learning in Robotics
- Reinforcement learning in robotics
- Evolution of reinforcement learning
- Challenges in robot reinforcement learning
- High dimensionality problem
- Real-world challenges
- Issues due to model uncertainty
- What's the final objective a robot wants to achieve?
- Open questions and practical challenges
- Open questions
- Practical challenges for robotic reinforcement learning
- Key takeaways
- Summary
- Deep Reinforcement Learning in Ad Tech
- Computational advertising challenges and bidding strategies
- Business models used in advertising
- Sponsored-search advertisements
- Search-advertisement management
- Adwords
- Bidding strategies of advertisers
- Real-time bidding by reinforcement learning in display advertising
- Summary
- Reinforcement Learning in Image Processing
- Hierarchical object detection with deep reinforcement learning
- Related works
- Region-based convolution neural networks
- Spatial pyramid pooling networks
- Fast R-CNN
- Faster R-CNN
- You Only Look Once
- Single Shot Detector
- Hierarchical object detection model
- State
- Actions
- Reward
- Model and training
- Training specifics
- Summary
- Deep Reinforcement Learning in NLP
- Text summarization
- Deep reinforced model for Abstractive Summarization
- Neural intra-attention model
- Intra-temporal attention on input sequence while decoding
- Intra-decoder attention
- Token generation and pointer
- Hybrid learning objective
- Supervised learning with teacher forcing
- Policy learning
- Mixed training objective function
- Text question answering
- Mixed objective and deep residual coattention for Question Answering
- Deep residual coattention encoder
- Mixed objective using self-critical policy learning
- Summary
- Further topics in Reinforcement Learning
- Continuous action space algorithms
- Trust region policy optimization
- Deterministic policy gradients
- Scoring mechanism in sequential models in NLP
- BLEU
- What is BLEU score and what does it do?
- ROUGE
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Last updated: 2021-08-27 18:52:42