Analysis and Integration of Heterogeneous Feedback Signals for Reinforcement Learning

Journal of Frontiers of Computer Science and Technology, 2012, Vol. 6, Issue 4: 366-376. DOI: 10.3778/j.issn.1673-9418.2012.04.009

YU Xueli, LI Zhi, ZHOU Changneng, CUI Qian, HU Kun

1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
2. School of Information Engineering, Qingdao Binhai University, Qingdao, Shandong 266555, China

Published online: 2012-04-01

Abstract

This paper examines whether visual and auditory signals can act together to improve memory and reasoning in a game-based professional rescue training system for high-risk industries. Using the semi-Markov game (SMG) model, it proposes a hierarchical reinforcement learning framework and algorithm for cooperative multi-agent systems, and builds a heterogeneous multi-agent system composed of a visual processing agent, an auditory processing agent, and a human agent. It argues that analyzing and generalizing the properties of coherent audio-visual feedback signals is a challenging task, and one that determines how heterogeneous signals can be integrated in reinforcement learning. On this basis, the paper proposes a biased-information learning algorithm for integrating heterogeneous feedback signals, which greatly reduces the state search space and alleviates the curse of dimensionality inherent in reinforcement learning. Following the principle of systematic desensitization from psychotherapy, the paper also designs a mood-personality-stimulus-regulation (MPSR) model and a personalized rendering algorithm for terrorist scene (PRATS) to strengthen the psychological resilience of rescue team members. Experiments verify the effectiveness of the algorithms.

Key words: reinforcement learning, heterogeneous, feedback signal, audio-visual coherency

Cite this article: YU Xueli, LI Zhi, ZHOU Changneng, CUI Qian, HU Kun. Analysis and Integration of Heterogeneous Feedback Signals for Reinforcement Learning[J]. Journal of Frontiers of Computer Science and Technology, 2012, 6(4): 366-376.

