计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (2): 261-279.DOI: 10.3778/j.issn.1673-9418.2107040

• 综述·探索 • 上一篇    下一篇

强化学习求解组合最优化问题的研究综述

王扬, 陈智斌+(), 吴兆蕊, 高远   

  1. 昆明理工大学 理学院,昆明 650000
  • 收稿日期:2021-07-12 修回日期:2021-09-16 出版日期:2022-02-01 发布日期:2021-09-24
  • 通讯作者: + E-mail: chenzhibin311@126.com
  • 作者简介:王扬(1997—),男,山东烟台人,硕士研究生,主要研究方向为组合最优化、强化学习、深度强化学习等。
    陈智斌(1979—),男,云南文山人,博士,副教授,硕士生导师,主要研究方向为组合最优化、图理论、近似算法、运筹学。
    吴兆蕊(1998—),女,云南昭通人,硕士研究生,主要研究方向为组合最优化。
    高远(1997—),女,辽宁丹东人,硕士研究生,主要研究方向为组合最优化。
  • 基金资助:
    国家自然科学基金(11761042)

Review of Reinforcement Learning for Combinatorial Optimization Problem

WANG Yang, CHEN Zhibin+(), WU Zhaorui, GAO Yuan   

  1. Faculty of Science, Kunming University of Science and Technology, Kunming 650000, China
  • Received:2021-07-12 Revised:2021-09-16 Online:2022-02-01 Published:2021-09-24
  • About author:WANG Yang, born in 1997, M.S. candidate. His research interests include combinatorial optimization, reinforcement learning, deep reinforcement learning, etc.
    CHEN Zhibin, born in 1979, Ph.D., associate professor, M.S. supervisor. His research interests include combinatorial optimization, graph theory, approximate algorithms and operations research.
    WU Zhaorui, born in 1998, M.S. candidate. Her research interest is combinatorial optimization.
    GAO Yuan, born in 1997, M.S. candidate. Her research interest is combinatorial optimization.
  • Supported by:
    National Natural Science Foundation of China(11761042)

摘要:

组合最优化问题(COP)的求解方法已经渗透到人工智能、运筹学等众多领域。随着数据规模的不断增大、问题更新速度的变快,运用传统方法求解COP问题在速度、精度、泛化能力等方面受到很大冲击。近年来,强化学习(RL)在无人驾驶、工业自动化等领域的广泛应用,显示出强大的决策力和学习能力,故而诸多研究者尝试使用RL求解COP问题,为求解此类问题提供了一种全新的方法。首先简要梳理常见的COP问题及其RL的基本原理;其次阐述RL求解COP问题的难点,分析RL应用于组合最优化(CO)领域的优势,对RL与COP问题结合的原理进行研究;然后总结近年来采用RL求解COP问题的理论方法和应用研究,对各类代表性研究所解决COP问题的关键要点、算法逻辑、优化效果进行对比分析,以突出RL模型的优越性,并对不同方法的局限性及其使用场景进行归纳总结;最后提出了四个RL求解COP问题的潜在研究方向。

关键词: 强化学习(RL), 深度强化学习(DRL), 组合最优化问题(COP)

Abstract:

The solution methods for combinatorial optimization problem (COP) have permeated to the fields of artificial intelligence, operations research, etc. With the scale of data increasing and the speed of problem updating being faster, the traditional method of solving the COP is challenged in computational speed, precision and generali-zation ability. In recent years, reinforcement learning (RL) has been widely used in driverless, industrial automation and other fields, showing strong decision-making and learning ability. Thus, many researchers have strived to use RL to solve COP, which provides a novel method for solving these problems. This paper firstly introduces the common COP problems and the basic principles of RL. Then, this paper elaborates the difficulties of RL in solving COP, analyzes the advantages of RL in combinatorial optimization (CO) field, and studies the principle of the combina-tion of RL and COP. Subsequently, this paper summarizes the theoretical methods and applied researches of solving COP problems utilizing RL in recent years. In order to highlight the superiority of RL model, this paper also com-pares and analyzes the key points, algorithmic logic and optimization effect of various representative researches in solving COP problem, and sums up the limitations of different methods and their application fields. Finally, this paper proposes four potential research directions.

Key words: reinforcement learning (RL), deep reinforcement learning (DRL), combinatorial optimization problem (COP)

中图分类号: