
References

[1] Schacter,D.,Gilbert,D.,Wegner,D.,et al.Psychology:European Edition[M].Worth Publishers,2011.

[2] Mitchell,T.M..The Discipline of Machine Learning[R].Technical Report CMU-ML-06-108,Carnegie Mellon University,2006.

[3] Murphy,K.P..Machine Learning:A Probabilistic Perspective[M].MIT Press,Cambridge,MA,2012.

[4] Bishop,C.M..Pattern Recognition and Machine Learning (Information Science and Statistics)[M].Secaucus,NJ,USA:Springer-Verlag New York,Inc.,2006.

[5] Sutton,R.S.,Barto,A.G..Reinforcement Learning:An Introduction[M].Cambridge,MA:MIT Press,1998.

[6] Kaelbling,L.P.,Littman,M.L.,and Moore,A.W..Reinforcement Learning:A Survey[J].Journal of Artificial Intelligence Research,1996,4:237-285.

[7] Poole,D.,Mackworth,A.K..Artificial Intelligence:Foundations of Computational Agents[M].Cambridge University Press,2010.

[8] Kirk,D.E..Optimal Control Theory:An Introduction[M].Mineola,NY:Dover Publications,2004.

[9] Bertsekas,D.P..Dynamic Programming and Optimal Control,2nd Edition[M].Belmont,MA:Athena Scientific,1995.

[10] Sutton,R.S.,Barto,A.G.,and Williams,R.J..Reinforcement Learning is Direct Adaptive Optimal Control[J].IEEE Control Systems Magazine,1992,12(2):19-22.

[11] Busoniu,L.,Babuška,R.,Schutter,B.D.,et al.Reinforcement Learning and Dynamic Programming Using Function Approximators[M].CRC Press,Inc.,2010.

[12] Chen Chunlin.Autonomous Learning and Navigation Control of Mobile Robots Based on Reinforcement Learning[D].Hefei:University of Science and Technology of China,2006.

[13] Peters,J.,Schaal,S..Policy Gradient Methods for Robotics[C].In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems,2006:2219-2225.

[14] Tesauro,G..TD-Gammon,a Self-Teaching Backgammon Program,Achieves Master-Level Play[J].Neural Computation,1994,6(2):215-219.

[15] Abe,N.,Kowalczyk,M.,Domick,M.,et al.Optimizing Debt Collections Using Constrained Reinforcement Learning[C].16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,2010:75-84.

[16] Williams,J.D.,Young,S..Partially Observable Markov Decision Processes for Spoken Dialog Systems[J].Computer Speech and Language,2007,21(2):393-422.

[17] Li Qiong,Guo Yufeng,Jiang Yanhuang.An Intelligent I/O Scheduling Algorithm Based on Reinforcement Learning[J].Computer Engineering & Science,2010(7):58-61.

[18] Zhang Shuiping.Application of On-Policy Reinforcement Learning Algorithms to Optimal AGC Control of Interconnected Power Grids[D].Guangzhou:South China University of Technology,2013.

[19] Liu Zhiyong,Ma Fengwei.Online Reinforcement Learning Control of Urban Traffic Signals[C].Proceedings of the 26th Chinese Control Conference,2007.

[20] Zu Linan.Research on Autonomous Cooperative Control and Reinforcement Learning for Multi-Robot Systems[D].Changchun:Jilin University,2007.

[21] Chen Xin,Wei Haijun,Wu Min,et al.Multi-Agent Tracking Learning in Continuous Spaces Based on Gaussian Regression[J].Acta Automatica Sinica,2013,39(12):2021-2031.

[22] Lee,D.,Choi,M.,and Bang,H..Model-Free Linear Quadratic Tracking Control for Unmanned Helicopters Using Reinforcement Learning[C].5th International Conference on Automation,Robotics and Applications (ICARA),2011.

[23] Valasek,J.,Doebbler,J.,Tandale,M.D.,et al.Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles[J].IEEE Transactions on Systems Man and Cybernetics Part B,2008,38(4):1014-1020.

[24] Crespo,A.,Li,W.,and Timoszczuk,A.P..ATFM Computational Agent Based on Reinforcement Learning Aggregating Human Expert Experience[C].IEEE Forum on Integrated and Sustainable Transportation Systems,2011.

[25] Xie,N.,Hachiya,H.,and Sugiyama,M..Artist Agent:A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting[J].IEICE Transactions on Information and Systems,2012,E96-D(5).

[26] Silver,D.,Huang,A.,Maddison,C.J.,et al.Mastering the Game of Go with Deep Neural Networks and Tree Search[J].Nature,2016,529(7587):484-489.

[27] Thrun,S.,Burgard,W.,and Fox,D..Probabilistic Robotics (Intelligent Robotics and Autonomous Agents)[M].The MIT Press,2005.

[28] Kober,J.,Bagnell,J.A.,and Peters,J..Reinforcement Learning in Robotics:A Survey[J].International Journal of Robotics Research,2013,32(11):1238-1274.

[29] Deisenroth,M.P.,Neumann,G.,and Peters,J..A Survey on Policy Search for Robotics[J].Foundations and Trends in Robotics,2013,2(1-2):1-142.

[30] Cheng,G.,Hyon,S.H.,Morimoto,J.,et al.CB:A Humanoid Research Platform for Exploring Neuroscience[J].Advanced Robotics,2007,21(10):1097-1114.

[31] Watkins,C.,Dayan,P..Q-learning[J].Machine Learning,1992,8(3-4):279-292.

[32] Sutton,R.S..Learning to Predict by the Methods of Temporal Differences[J].Machine Learning,1988,3(1):9-44.

[33] Rummery,G.A.,Niranjan,M..On-Line Q-Learning Using Connectionist Systems[R].Technical Report CUED/F-INFENG/TR 166,Cambridge University Engineering Department,1994.

[34] Gao Yang,Chen Shifu,Lu Xin.A Survey of Reinforcement Learning Research[J].Acta Automatica Sinica,2004,30(1):86-100.

[35] Jiang Guofei,Gao Huiqi,Wu Cangpu.Convergence Analysis of Grid Discretization Methods in Q-Learning[J].Control Theory & Applications,1999,16(2):194-198.

[36] Jiang Guofei,Wu Cangpu.Inverted Pendulum Control Based on the Q-Learning Algorithm and BP Neural Networks[J].Acta Automatica Sinica,1998,24(5):662-666.

[37] Lagoudakis,M.G.,Parr,R..Least-Squares Policy Iteration[J].Journal of Machine Learning Research,2003,4(6):1107-1149.

[38] Chen Xingguo.Research on Reinforcement Learning Algorithms Based on Value Function Estimation[D].Nanjing:Nanjing University,2013.

[39] Sugiyama,M.,Hachiya,H.,Towell,C.,et al.Geodesic Gaussian Kernels for Value Function Approximation[J].Autonomous Robots,2008,25(3):287-304.

[40] Hachiya,H.,Akiyama,T.,Sugiyama,M.,et al.Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning[J].Neural Networks,2009,22(10):1399-1410.

[41] Akiyama,T.,Hachiya,H.,Sugiyama,M..Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning[J].Neural Networks,2010,23(5):639-648.

[42] Sugiyama,M.,Hachiya,H.,Kashima,H.,et al.Least Absolute Policy Iteration:A Robust Approach to Value Function Approximation[J].IEICE Transactions on Information and Systems,2010,E93-D(9):2555-2565.

[43] Schaal,S.,Peters,J.,Nakanishi,J.,et al.Learning Movement Primitives[C].Springer Tracts in Advanced Robotics.Siena,Italy:Springer,2004.

[44] Bagnell,J.A.,Schneider,J.G..Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods[C].IEEE International Conference on Robotics and Automation,2001.

[45] Kober,J.,Peters,J..Policy Search for Motor Primitives in Robotics[J].Machine Learning,2011,84(1):171-203.

[46] Ng,A.Y.,Kim,H.J.,Jordan,M.I.,et al.Autonomous Helicopter Flight Via Reinforcement Learning[J].Advances in Neural Information Processing Systems,2004,16.

[47] Ng,A.Y.,Jordan,M.I..PEGASUS:A Policy Search Method for Large MDPs and POMDPs[C].In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence,2000:406-415.

[48] Sehnke,F.,Osendorfer,C.,Rückstieß,T.,et al.Parameter-Exploring Policy Gradients[J].Neural Networks,2010,23(4):551-559.

[49] Williams,R.J..Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning[J].Machine Learning,1992,8(3-4):229-256.

[50] Kakade,S..A Natural Policy Gradient[J].Advances in Neural Information Processing Systems(NIPS),2002.

[51] Dayan,P.,Hinton,G.E..Using Expectation-Maximization for Reinforcement Learning[J].Neural Computation,1997,9(2):271-278.

[52] Peters,J.,Schaal,S..Natural Actor-Critic[J].Neurocomputing,2008,71(7-9):1180-1190.

[53] Barto,A.G.,Mahadevan,S..Recent Advances in Hierarchical Reinforcement Learning[J].Discrete Event Dynamic Systems,2003,13(4):341-379.

[54] Zhou Wenji,Yu Yang.A Survey of Hierarchical Reinforcement Learning[J].CAAI Transactions on Intelligent Systems,2017,12(5):590-594.

[55] Du Wei,Ding Shifei.A Survey of Multi-Agent Reinforcement Learning[J].Computer Science,2019,46(8):1-8.

[56] Liu Quan,Zhai Jianwei,Zhang Zongchang,et al.A Survey of Deep Reinforcement Learning[J].Chinese Journal of Computers,2018,41(1):1-27.

[57] Zhao Dongbin,Shao Kun,Zhu Yuanheng,et al.A Survey of Deep Reinforcement Learning,with Discussion on the Development of Computer Go[J].Control Theory & Applications,2016,33(6):701-717.

[58] Osa,T.,Pajarinen,J.,Neumann,G.,et al.An Algorithmic Perspective on Imitation Learning[J].Foundations and Trends in Robotics,2018,7(1-2):1-179.

[59] Sermanet,P.,Xu,K.,and Levine,S..Unsupervised Perceptual Rewards for Imitation Learning[J].arXiv preprint arXiv:1612.06699,2016.

[60] Maeda,G.J.,Neumann,G.,Ewerton,M.,et al.Probabilistic Movement Primitives for Coordination of Multiple Human-robot Collaborative Tasks[J].Autonomous Robots,2017,41(3):593-612.

[61] Zhang Kaifeng,Yu Yang.A Survey of Learning-from-Demonstration Methods Based on Inverse Reinforcement Learning[J].Journal of Computer Research and Development,2019,56(2):254-261.

[62] Li Shuailong,Zhang Huiwen,Zhou Weijia.A Survey of Imitation Learning Methods and Their Applications in Robotics[J].Computer Engineering and Applications,2019,55(4):22-35.

[63] Pan,S.J.,Yang,Q..A Survey on Transfer Learning[J].IEEE Transactions on Knowledge and Data Engineering,2010,22(10):1345-1359.

[64] Wang Hao,Gao Yang,Chen Xingguo.Transfer in Reinforcement Learning:Methods and Progress[J].Acta Electronica Sinica,2008,36(S1):39-43.

[65] Finn,C.,Abbeel,P.,and Levine,S..Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks[C].In Proceedings of the 34th International Conference on Machine Learning,2017:1126-1135.

[66] Todorov,E.,Erez,T.,and Tassa,Y..MuJoCo:A Physics Engine for Model-Based Control[C].2012 IEEE/RSJ International Conference on Intelligent Robots and Systems,2012:5026-5033.
