
References

[1] Schacter,D.,Gilbert,D.,Wegner,D.,et al.Psychology:European Edition[M].Worth Publishers,2011.

[2] Mitchell,T.M..The Discipline of Machine Learning[R].Technical Report CMU-ML-06-108,Carnegie Mellon University,2006.

[3] Murphy,K.P..Machine Learning:A Probabilistic Perspective[M].MIT Press,Cambridge,MA,2012.

[4] Bishop,C.M..Pattern Recognition and Machine Learning (Information Science and Statistics)[M].Secaucus,NJ,USA:Springer-Verlag New York,Inc.,2006.

[5] Sutton,R.S.,Barto,A.G..Reinforcement Learning:An Introduction[M].Cambridge,MA:MIT Press,1998.

[6] Kaelbling,L.P.,Littman,M.L.,and Moore,A.W..Reinforcement Learning:A Survey[J].Journal of Artificial Intelligence Research,1996,4:237-285.

[7] Poole,D.,Mackworth,A.K..Artificial Intelligence:Foundations of Computational Agents[M].Cambridge University Press,2010.

[8] Kirk,D.E..Optimal Control Theory:An Introduction[M].Mineola,NY:Dover Publications,2004.

[9] Bertsekas,D.P..Dynamic Programming and Optimal Control,2nd Edition[M].Belmont,MA:Athena Scientific,1995.

[10] Sutton,R.S.,Barto,A.G.,and Williams,R.J..Reinforcement Learning is Direct Adaptive Optimal Control[J].IEEE Control Systems Magazine,1992,12(2):19-22.

[11] Busoniu,L.,Babuška,R.,Schutter,B.D.,et al.Reinforcement Learning and Dynamic Programming Using Function Approximators[M].CRC Press,Inc.,2010.

[12] Chen Chunlin.Autonomous Learning and Navigation Control of Mobile Robots Based on Reinforcement Learning[D].Hefei:University of Science and Technology of China,2006.

[13] Peters,J.,Schaal,S..Policy Gradient Methods for Robotics[C].In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems,2006:2219-2225.

[14] Tesauro,G..TD-Gammon,a Self-Teaching Backgammon Program,Achieves Master-Level Play[J].Neural Computation,1994,6(2):215-219.

[15] Abe,N.,Kowalczyk,M.,Domick,M.,et al.Optimizing Debt Collections Using Constrained Reinforcement Learning[C].16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,2010:75-84.

[16] Williams,J.D.,Young,S..Partially Observable Markov Decision Processes for Spoken Dialog Systems[J].Computer Speech and Language,2007,21(2):393-422.

[17] Li Qiong,Guo Yufeng,Jiang Yanhuang.An Intelligent I/O Scheduling Algorithm Based on Reinforcement Learning[J].Computer Engineering & Science,2010(7):58-61.

[18] Zhang Shuiping.Application of On-Policy Reinforcement Learning Algorithms to Optimal AGC Control of Interconnected Power Grids[D].Guangzhou:South China University of Technology,2013.

[19] Liu Zhiyong,Ma Fengwei.Online Reinforcement Learning Control of Urban Traffic Signals[C].Proceedings of the 26th Chinese Control Conference,2007.

[20] Zu Linan.Research on Autonomous Cooperative Control and Reinforcement Learning for Multi-Robot Systems[D].Changchun:Jilin University,2007.

[21] Chen Xin,Wei Haijun,Wu Min,et al.Multi-Agent Tracking Learning in Continuous Spaces Based on Gaussian Regression[J].Acta Automatica Sinica,2013,39(12):2021-2031.

[22] Lee,D.,Choi,M.,and Bang,H..Model-Free Linear Quadratic Tracking Control for Unmanned Helicopters Using Reinforcement Learning[C].5th International Conference on Automation,Robotics and Applications (ICARA),2011.

[23] Valasek,J.,Doebbler,J.,Tandale,M.D.,et al.Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles[J].IEEE Transactions on Systems Man and Cybernetics Part B,2008,38(4):1014-1020.

[24] Crespo,A.,Li,W.,and Timoszczuk,A.P..ATFM Computational Agent Based on Reinforcement Learning Aggregating Human Expert Experience[C].IEEE Forum on Integrated and Sustainable Transportation Systems,2011.

[25] Xie,N.,Hachiya,H.,and Sugiyama,M..Artist Agent:A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting[J].IEICE Transactions on Information and Systems,2012,E96-D(5).

[26] Silver,D.,Huang,A.,Maddison,C.J.,et al.Mastering the Game of Go with Deep Neural Networks and Tree Search[J].Nature,2016,529(7587):484-489.

[27] Thrun,S.,Burgard,W.,and Fox,D..Probabilistic Robotics (Intelligent Robotics and Autonomous Agents)[M].The MIT Press,2005.

[28] Kober,J.,Bagnell,J.A.,and Peters,J..Reinforcement Learning in Robotics:A Survey[J].International Journal of Robotics Research,2013,32(11):1238-1274.

[29] Deisenroth,M.P.,Neumann,G.,and Peters,J..A Survey on Policy Search for Robotics[J].Foundations and Trends in Robotics,2013,2(1-2):1-142.

[30] Cheng,G.,Hyon,S.H.,Morimoto,J.,et al.CB:A Humanoid Research Platform for Exploring Neuroscience[J].Advanced Robotics,2007,21(10):1097-1114.

[31] Watkins,C.,Dayan,P..Q-learning[J].Machine Learning,1992,8(3-4):279-292.

[32] Sutton,R.S..Learning to Predict by the Methods of Temporal Differences[J].Machine Learning,1988,3(1):9-44.

[33] Rummery,G.A.,Niranjan,M..On-Line Q-Learning Using Connectionist Systems[R].Technical Report CUED/F-INFENG/TR 166,Cambridge University Engineering Department,1994.

[34] Gao Yang,Chen Shifu,Lu Xin.A Survey of Reinforcement Learning Research[J].Acta Automatica Sinica,2004,30(1):86-100.

[35] Jiang Guofei,Gao Huiqi,Wu Cangpu.Convergence Analysis of Grid Discretization Methods in Q-Learning[J].Control Theory & Applications,1999,16(2):194-198.

[36] Jiang Guofei,Wu Cangpu.Inverted Pendulum Control Based on the Q-Learning Algorithm and BP Neural Networks[J].Acta Automatica Sinica,1998,24(5):662-666.

[37] Lagoudakis,M.G.,Parr,R..Least-Squares Policy Iteration[J].Journal of Machine Learning Research,2003,4(6):1107-1149.

[38] Chen Xingguo.Research on Reinforcement Learning Algorithms Based on Value Function Estimation[D].Nanjing:Nanjing University,2013.

[39] Sugiyama,M.,Hachiya,H.,Towell,C.,et al.Geodesic Gaussian Kernels for Value Function Approximation[J].Autonomous Robots,2008,25(3):287-304.

[40] Hachiya,H.,Akiyama,T.,Sugiyama,M.,et al.Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning[J].Neural Networks,2009,22(10):1399-1410.

[41] Akiyama,T.,Hachiya,H.,Sugiyama,M..Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning[J].Neural Networks,2010,23(5):639-648.

[42] Sugiyama,M.,Hachiya,H.,Kashima,H.,et al.Least Absolute Policy Iteration:A Robust Approach to Value Function Approximation[J].IEICE Transactions on Information and Systems,2010,E93-D(9):2555-2565.

[43] Schaal,S.,Peters,J.,Nakanishi,J.,et al.Learning Movement Primitives[C].Springer Tracts in Advanced Robotics.Siena,Italy:Springer,2004.

[44] Bagnell,J.A.,Schneider,J.G..Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods[C].IEEE International Conference on Robotics and Automation,2001.

[45] Kober,J.,Peters,J..Policy Search for Motor Primitives in Robotics[J].Machine Learning,2011,84(1):171-203.

[46] Ng,A.Y.,Kim,H.J.,Jordan,M.I.,et al.Autonomous Helicopter Flight Via Reinforcement Learning[J].Advances in Neural Information Processing Systems,2004,16.

[47] Ng,A.Y.,Jordan,M.I..PEGASUS:A Policy Search Method for Large MDPs and POMDPs[C].In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence,2000:406-415.

[48] Sehnke,F.,Osendorfer,C.,Rückstieß,T.,et al.Parameter-Exploring Policy Gradients[J].Neural Networks,2010,23(4):551-559.

[49] Williams,R.J..Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning[J].Machine Learning,1992,8(3-4):229-256.

[50] Kakade,S..A Natural Policy Gradient[J].Advances in Neural Information Processing Systems(NIPS),2002.

[51] Dayan,P.,Hinton,G.E..Using Expectation-Maximization for Reinforcement Learning[J].Neural Computation,1997,9(2):271-278.

[52] Peters,J.,Schaal,S..Natural Actor-Critic[J].Neurocomputing,2008,71(7-9):1180-1190.

[53] Barto,A.G.,Mahadevan,S..Recent Advances in Hierarchical Reinforcement Learning[J].Discrete Event Dynamic Systems,2003,13(4):341-379.

[54] Zhou Wenji,Yu Yang.A Survey of Hierarchical Reinforcement Learning[J].CAAI Transactions on Intelligent Systems,2017,12(5):590-594.

[55] Du Wei,Ding Shifei.A Survey of Multi-Agent Reinforcement Learning[J].Computer Science,2019,46(8):1-8.

[56] Liu Quan,Zhai Jianwei,Zhang Zongchang,et al.A Survey of Deep Reinforcement Learning[J].Chinese Journal of Computers,2018,41(1):1-27.

[57] Zhao Dongbin,Shao Kun,Zhu Yuanheng,et al.A Survey of Deep Reinforcement Learning,with Discussion on the Development of Computer Go[J].Control Theory & Applications,2016,33(6):701-717.

[58] Osa,T.,Pajarinen,J.,Neumann,G.,et al.An Algorithmic Perspective on Imitation Learning[J].Foundations and Trends in Robotics,2018,7(1-2):1-179.

[59] Sermanet,P.,Xu,K.,and Levine,S..Unsupervised Perceptual Rewards for Imitation Learning[J].arXiv preprint arXiv:1612.06699,2016.

[60] Maeda,G.J.,Neumann,G.,Ewerton,M.,et al.Probabilistic Movement Primitives for Coordination of Multiple Human-robot Collaborative Tasks[J].Autonomous Robots,2017,41(3):593-612.

[61] Zhang Kaifeng,Yu Yang.A Survey of Learning-from-Demonstration Methods Based on Inverse Reinforcement Learning[J].Journal of Computer Research and Development,2019,56(2):254-261.

[62] Li Shuailong,Zhang Huiwen,Zhou Weijia.A Survey of Imitation Learning Methods and Their Applications in Robotics[J].Computer Engineering and Applications,2019,55(4):22-35.

[63] Pan,S.J.,Yang,Q..A Survey on Transfer Learning[J].IEEE Transactions on Knowledge and Data Engineering,2010,22(10):1345-1359.

[64] Wang Hao,Gao Yang,Chen Xingguo.Transfer in Reinforcement Learning:Methods and Progress[J].Acta Electronica Sinica,2008,36(S1):39-43.

[65] Finn,C.,Abbeel,P.,and Levine,S..Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks[C].In Proceedings of the 34th International Conference on Machine Learning,2017:1126-1135.

[66] Todorov,E.,Erez,T.,and Tassa,Y..MuJoCo:A Physics Engine for Model-Based Control[C].2012 IEEE/RSJ International Conference on Intelligent Robots and Systems,2012:5026-5033.
