Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (3): 460-469. DOI: 10.3778/j.issn.1673-9418.1903068

• Artificial Intelligence •


Heterogeneous Two-Population Ant Colony Algorithm Based on Heuristic Reinforcement Learning

LIU Zhongqiang, YOU Xiaoming, LIU Sheng   

  1. College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  2. School of Management, Shanghai University of Engineering Science, Shanghai 201620, China
  • Online: 2020-03-01  Published: 2020-03-13


Abstract:

A heterogeneous two-population ant colony algorithm based on heuristic reinforcement learning is proposed to address the tendency of the traditional ant colony algorithm to fall into local optima and converge slowly when solving the travelling salesman problem (TSP). The colony is divided into a main population and a sub-population: the main population is responsible for solution construction and pheromone update, while the sub-population replaces solutions in the main population's solution set as it constructs its own. At the beginning of the algorithm, a heuristic operator adaptively controls the communication frequency between the two populations, and the mode of solution exchange is governed by a deviation degree coefficient. In the early stage, the optimal solution of the sub-population replaces a random solution of the main population, increasing the diversity of solutions; meanwhile, a reinforcement learning mechanism is introduced to adaptively reward the pheromone on the optimal path of the main population, so as to increase the probability that the optimal common path is selected later. In the later stage, the optimal solution of the sub-population replaces the worst solution of the main population, the quantity of pheromone on the optimal path is strengthened, and the pheromone on the optimal path of the main population is rewarded, further improving the convergence speed of the algorithm. Simulation results show that the algorithm can effectively escape local optima, and that solution quality is clearly improved on large-scale test sets.
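The exchange-and-reward loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the fixed `reward` amount, and the simple deviation test against a threshold are assumptions standing in for the paper's adaptive heuristic operator and reward formulas.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed TSP tour under distance matrix `dist`."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def exchange_solutions(main_pop, sub_pop, dist, early_stage,
                       deviation_threshold=0.1):
    """Copy the sub-population's best tour into the main population.

    Early stage: the best sub-population tour replaces a *random* main
    tour (diversity). Late stage: it replaces the *worst* main tour
    (convergence). The exchange only fires when the deviation degree
    between the two populations' best tours exceeds a threshold.
    Returns a new main population; the input list is not modified.
    """
    best_sub = min(sub_pop, key=lambda t: tour_length(t, dist))
    best_main_len = min(tour_length(t, dist) for t in main_pop)
    deviation = abs(tour_length(best_sub, dist) - best_main_len) / best_main_len
    if deviation < deviation_threshold:
        return main_pop  # populations already agree; skip the exchange
    new_pop = list(main_pop)
    if early_stage:
        idx = random.randrange(len(new_pop))  # replace a random solution
    else:
        idx = max(range(len(new_pop)),        # replace the worst solution
                  key=lambda i: tour_length(new_pop[i], dist))
    new_pop[idx] = best_sub
    return new_pop

def reward_best_path(pheromone, best_tour, reward=0.5):
    """Deposit extra pheromone on every edge of the current best tour."""
    n = len(best_tour)
    for i in range(n):
        a, b = best_tour[i], best_tour[(i + 1) % n]
        pheromone[a][b] += reward
        pheromone[b][a] += reward
    return pheromone
```

In the algorithm proper, `exchange_solutions` would be called at the adaptively controlled communication frequency, followed by `reward_best_path` on the main population's best tour after each exchange.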

Key words: travelling salesman problem (TSP), heterogeneous two-population, deviation degree coefficient, heuristic reinforcement learning