[1] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 2018.
[2] LI Y. Deep reinforcement learning: an overview[J]. arXiv:1701.07274, 2017.
[3] NOAEEN M, NAIK A, GOODMAN L, et al. Reinforcement learning in urban network traffic signal control: a systematic literature review[J]. Expert Systems with Applications, 2022, 199: 116830.
[4] ABDOOS M, BAZZAN A L C. Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory[J]. Expert Systems with Applications, 2021, 171: 114580.
[5] 肖硕, 黄珍珍, 张国鹏, 等. 基于SAC的多智能体深度强化学习算法[J]. 电子学报, 2021, 49(9): 1675-1681.
XIAO S, HUANG Z Z, ZHANG G P, et al. Deep reinforcement learning algorithm of multi-agent based on SAC[J]. Acta Electronica Sinica, 2021, 49(9): 1675-1681.
[6] BRUNKE L, GREEFF M, HALL A W, et al. Safe learning in robotics: from learning-based control to safe reinforcement learning[J]. arXiv:2108.06266, 2021.
[7] SINGH B, KUMAR R, SINGH V P. Reinforcement learning in robotic applications: a comprehensive survey[J]. Artificial Intelligence Review, 2022, 55(2): 945-990.
[8] MIAO Y, BLUNSOM P, SPECIA L. A generative framework for simultaneous machine translation[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 6697-6706.
[9] CHEN M, LIU W, WANG T, et al. A game-based deep reinforcement learning approach for energy-efficient computation in MEC systems[J]. Knowledge-Based Systems, 2022, 235: 107660.
[10] KIRAN B R, SOBH I, TALPAERT V, et al. Deep reinforcement learning for autonomous driving: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(6): 4909-4926.
[11] CHEN J, LI S E, TOMIZUKA M. Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(6): 5068-5078.
[12] KUMARI A, TANWAR S. A reinforcement-learning-based secure demand response scheme for smart grid system[J]. IEEE Internet of Things Journal, 2022, 9(3): 2180-2191.
[13] 郭宪. 深入浅出强化学习: 原理入门[M]. 北京: 电子工业出版社, 2018.
GUO X. Head first reinforcement learning: an introduction to the principles[M]. Beijing: Publishing House of Electronics Industry, 2018.
[14] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[15] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the 2016 AAAI Conference on Artificial Intelligence, Phoenix, Feb 12-17, 2016. Menlo Park: AAAI, 2016: 2094-2100.
[16] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[C]//Proceedings of the 4th International Conference on Learning Representations, San Juan, May 2-4, 2016.
[17] WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning, New York, Jun 19-24, 2016: 1995-2003.
[18] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3): 229-256.
[19] PETERS J, SCHAAL S. Policy gradient methods for robotics[C]//Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, Oct 9-15, 2006. Piscataway: IEEE, 2006: 2219-2225.
[20] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[C]//Advances in Neural Information Processing Systems 12. Cambridge: MIT Press, 2000: 1008-1014.
[21] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. arXiv:1509.02971, 2015.
[22] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning, New York, Jun 19-24, 2016: 1928-1937.
[23] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv:1707.06347, 2017.
[24] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of the 35th International Conference on Machine Learning, Stockholm, Jul 10-15, 2018: 1856-1865.
[25] HESSEL M, MODAYIL J, VAN HASSELT H, et al. Rainbow: combining improvements in deep reinforcement learning[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2018: 3215-3222.
[26] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[27] BERNER C, BROCKMAN G, CHAN B, et al. Dota 2 with large scale deep reinforcement learning[J]. arXiv:1912.06680, 2019.
[28] HEESS N, TB D, SRIRAM S, et al. Emergence of locomotion behaviours in rich environments[J]. arXiv:1707.02286, 2017.
[29] HA D, SCHMIDHUBER J. World models[J]. arXiv:1803.10122, 2018.
[30] WATTER M, SPRINGENBERG J T, BOEDECKER J, et al. Embed to control: a locally linear latent dynamics model for control from raw images[C]//Advances in Neural Information Processing Systems 28, Montreal, Dec 7-12, 2015: 2746-2754.
[31] LIU X, ZHANG F, HOU Z, et al. Self-supervised learning: generative or contrastive[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(1): 857-876.
[32] QI C, ZHU Y, SONG C, et al. Self-supervised reinforcement learning-based energy management for a hybrid electric vehicle[J]. Journal of Power Sources, 2021, 514: 230584.
[33] BANIJAMALI E, SHU R, BUI H, et al. Robust locally-linear controllable embedding[C]//Proceedings of the 2018 International Conference on Artificial Intelligence and Statistics, Playa Blanca, Apr 9-11, 2018: 1751-1759.
[34] KINGMA D P, WELLING M. Auto-encoding variational Bayes[J]. arXiv:1312.6114, 2013.
[35] DOERSCH C. Tutorial on variational autoencoders[J]. arXiv:1606.05908, 2016.
[36] HAFNER D, LILLICRAP T, FISCHER I, et al. Learning latent dynamics for planning from pixels[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019: 2555-2565.
[37] HAFNER D, LILLICRAP T, BA J, et al. Dream to control: learning behaviors by latent imagination[J]. arXiv:1912.01603, 2019.
[38] HAFNER D, LILLICRAP T, NOROUZI M, et al. Mastering Atari with discrete world models[J]. arXiv:2010.02193, 2020.
[39] 林景栋, 吴欣怡, 柴毅, 等. 卷积神经网络结构优化综述[J]. 自动化学报, 2020, 46(1): 24-37.
LIN J D, WU X Y, CHAI Y, et al. A review on structural optimization of convolutional neural networks[J]. Acta Automatica Sinica, 2020, 46(1): 24-37.
[40] GELADA C, KUMAR S, BUCKMAN J, et al. DeepMDP: learning continuous latent space models for representation learning[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019: 2170-2179.
[41] HINDERER K. Lipschitz continuity of value functions in Markovian decision processes[J]. Mathematical Methods of Operations Research, 2005, 62: 3-22.
[42] ASADI K, MISRA D, LITTMAN M. Lipschitz continuity in model-based reinforcement learning[C]//Proceedings of the 35th International Conference on Machine Learning, Stockholm, Jul 10-15, 2018: 264-273.
[43] ZHANG A, MCALLISTER R T, CALANDRA R, et al. Learning invariant representations for reinforcement learning without reconstruction[C]//Proceedings of the 9th International Conference on Learning Representations, May 3-7, 2021.
[44] DULAC-ARNOLD G, DENOYER L, PREUX P, et al. Fast reinforcement learning with large action sets using error-correcting output codes for MDP factorization[C]//Proceedings of the 2012 Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bristol, Sep 24-28, 2012. Berlin, Heidelberg: Springer, 2012: 180-194.
[45] DIETTERICH T G, BAKIRI G. Solving multiclass learning problems via error-correcting output codes[J]. Journal of Artificial Intelligence Research, 1995, 2: 263-286.
[46] LAGOUDAKIS M G, PARR R. Reinforcement learning as classification: leveraging modern classifiers[C]//Proceedings of the 20th International Conference on Machine Learning. Menlo Park: AAAI, 2003: 424-431.
[47] DULAC-ARNOLD G, EVANS R, VAN HASSELT H, et al. Deep reinforcement learning in large discrete action spaces[J]. arXiv:1512.07679, 2015.
[48] CHANDAK Y, THEOCHAROUS G, KOSTAS J, et al. Learning action representations for reinforcement learning[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019: 941-950.
[49] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning, Lille, Jul 6-11, 2015: 1889-1897.
[50] BENGIO Y, COURVILLE A, VINCENT P. Representation learning: a review and new perspectives[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828.
[51] AYACHI R, SAID Y, ATRI M. A convolutional neural network to perform object detection and identification in visual large-scale data[J]. Big Data, 2021, 9(1): 41-52.
[52] GIBADULLIN R F, PERUKHIN M Y, ILIN A V. Speech recognition and machine translation using neural networks[C]//Proceedings of the 2021 International Conference on Industrial Engineering, Applications and Manufacturing, Sochi, May 17-21, 2021. Piscataway: IEEE, 2021: 398-403.
[53] LAURIOLA I, LAVELLI A, AIOLLI F. An introduction to deep learning in natural language processing: models, techniques, and tools[J]. Neurocomputing, 2022, 470: 443-456.
[54] PATEL H, UPLA K P. A shallow network for hyperspectral image classification using an autoencoder with convolutional neural network[J]. Multimedia Tools and Applications, 2022, 81(1): 695-714.
[55] SABOKROU M, FATHY M, HOSEINI M. Video anomaly detection and localisation based on the sparsity and reconstruction error of auto-encoder[J]. Electronics Letters, 2016, 52(13): 1122-1124.
[56] CHANG Y, TU Z, XIE W, et al. Clustering driven deep autoencoder for video anomaly detection[C]//Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 329-345.
[57] HAMMOUCHE R, ATTIA A, AKHROUF S, et al. Gabor filter bank with deep autoencoder based face recognition system[J]. Expert Systems with Applications, 2022, 197: 116743.
[58] 李耿增. 基于变分自编码器的图像压缩[D]. 北京: 北京邮电大学, 2021.
LI G Z. Image compression based on variational autoencoder[D]. Beijing: Beijing University of Posts and Telecommunications, 2021.