MABP – a classic exploration versus exploitation problem

Several multi-armed bandit problem (MABP) environments have been created for OpenAI Gym, and they are well worth exploring to get a clearer picture of how the problem works. We will not solve a bandit problem from scratch in this book's code, but we will examine some solutions in detail and discuss how they relate to epsilon decay strategies.

The main thing to bear in mind when solving any bandit problem is that we are always trying to find the optimal outcome in a system by balancing two needs: exploring the environment to gain new knowledge, and exploiting the knowledge we already have. In effect, we learn as we go, taking advantage of what we already know while we continue to gather new information.
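
To make the explore/exploit balance concrete, here is a minimal sketch of an epsilon-greedy agent with a decaying epsilon. It is an illustration only and not the book's code: the function name `run_epsilon_greedy`, its parameters, the Bernoulli (win/lose) arms, and the multiplicative decay schedule are all assumptions made for this example, and it uses a hand-rolled bandit rather than one of the Gym environments.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=1.0,
                       decay=0.995, min_epsilon=0.01, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy action selection.

    true_means: hypothetical win probability of each arm (unknown to the agent).
    epsilon:    probability of exploring; shrinks by `decay` after every step
                but never drops below `min_epsilon`.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment pays 1 with the arm's hidden probability, else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

        # Decay epsilon: explore a lot early, exploit more later.
        epsilon = max(min_epsilon, epsilon * decay)

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated means:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
    print("total reward:", total)
```

Early on, epsilon is close to 1 and the agent samples the arms almost at random; as epsilon decays, it spends more of its pulls on the arm it currently believes is best. Keeping a small floor (`min_epsilon` here) is one possible design choice that lets the agent keep correcting its estimates instead of committing to an early guess.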
