Journal of Frontiers of Computer Science and Technology ›› 2012, Vol. 6 ›› Issue (4): 366-376. DOI: 10.3778/j.issn.1673-9418.2012.04.009

• Academic Research •

Analysis and Integration of Heterogeneous Feedback Signals for Reinforcement Learning

YU Xueli (余雪丽), LI Zhi (李志), ZHOU Changneng (周昌能), CUI Qian (崔倩), HU Kun (胡坤)

  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
  2. School of Information Engineering, Qingdao Binhai University, Qingdao, Shandong 266555, China
  • Published online: 2012-04-01

Abstract: This paper examines whether visual and auditory signals can act together to improve people's memory and reasoning in game-based professional rescue training systems for highly dangerous industries. Using a semi-Markov game (SMG) model, it proposes a hierarchical reinforcement learning framework and algorithm for cooperative multi-agent systems, and builds a heterogeneous multi-agent system composed of a visual processing agent, an auditory processing agent, and a human agent. The paper argues that analyzing and generalizing the properties and characteristics of audio-visually coherent feedback signals is a very challenging task, because these properties determine how heterogeneous signals can be integrated in reinforcement learning. On this basis, it proposes a biased-information learning algorithm that integrates heterogeneous feedback signals; the algorithm greatly reduces the state search space and alleviates the curse of dimensionality inherent in reinforcement learning. Following the principle of systematic desensitization from psychotherapy, the paper also designs a mood-personality-stimulus-regulation (MPSR) model and a personalized rendering algorithm for frightening scenes (named in the paper "personalized rendering algorithm for terrorist scene", PRATS), intended to improve rescue team members' psychological resilience. Experiments verify the effectiveness of the algorithms.

Key words: reinforcement learning, heterogeneous, feedback signal, audio-visual coherency