Self-competitive Hindsight Experience Replay with Penalty Measures
WANG Zihao (王子豪), QIAN Xuezhong (钱雪忠), SONG Wei (宋威)
Journal of Frontiers of Computer Science and Technology, 2024, (5): 1223-1231. DOI: 10.3778/j.issn.1673-9418.2303031