- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:17
MABP – a classic exploration versus exploitation problem
Several multi-armed bandit problem (MABP) environments have been created for OpenAI Gym, and they are well worth exploring for a clearer picture of how the problem works. We will not solve a bandit problem from scratch in this book, but we will examine some solutions in detail and discuss their relevance to epsilon decay strategies.
The main thing to bear in mind when solving any bandit problem is that we are trying to discover the optimal outcome in a system by balancing two needs: exploring the environment to gain new knowledge, and exploiting the knowledge we already have. In effect, we learn as we go, taking advantage of what we already know while we continue to learn more.
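To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy agent with a decaying epsilon, run against a simulated bandit. It is not code from this book and does not use the OpenAI Gym environments. The function name `run_bandit`, the payout probabilities, and the decay schedule are all assumptions chosen for illustration.

```python
import random


def run_bandit(true_probs, steps=5000, epsilon=1.0, decay=0.999,
               min_epsilon=0.01, seed=0):
    """Epsilon-greedy agent on a simulated Bernoulli bandit (illustrative only).

    true_probs: hypothetical win probability of each arm, unknown to the agent.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # number of times each arm was pulled
    values = [0.0] * n_arms  # running average reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pull a random arm to gain new knowledge.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The arm pays 1 with its true probability, otherwise 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the running average for this arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

        # Decay epsilon so the agent explores less as it learns more.
        epsilon = max(min_epsilon, epsilon * decay)

    return values, counts, total_reward


if __name__ == "__main__":
    values, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated values:", [round(v, 2) for v in values])
    print("pull counts:", counts)
    print("total reward:", total)
```

Early on, epsilon is close to 1 and almost every pull is exploratory. As epsilon decays, the agent increasingly exploits its estimates, so most pulls should end up on the arm it believes is best. A slower decay spends more pulls on exploration and a faster one commits sooner, at the risk of settling on a worse arm.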