计算机科学与探索 ›› 2021, Vol. 15 ›› Issue (9): 1680-1693.DOI: 10.3778/j.issn.1673-9418.2010065

• 人工智能 • 上一篇    下一篇

基于SAC模型的改进遗传算法求解TSP问题

陈斌,刘卫国   

  1. 1. 中南大学 自动化学院,长沙 410083
    2. 中南大学 计算机学院,长沙 410083
  • 出版日期:2021-09-01 发布日期:2021-09-06

SAC Model Based Improved Genetic Algorithm for Solving TSP

CHEN Bin, LIU Weiguo   

  1. 1. School of Automation, Central South University, Changsha 410083, China
    2. School of Computer Science and Engineering, Central South University, Changsha 410083, China
  • Online:2021-09-01 Published:2021-09-06

摘要:

遗传算法(GA)的全局搜索能力强,易于操作,但其收敛速度慢,易陷入局部最优值。针对以上问题,利用深度强化学习模型SAC对遗传算法进行改进,并将其应用至旅行商问题(TSP)的求解。改进算法将种群作为与智能体(agent)交互的环境,引入贪心算法对环境进行初始化,使用改进后的交叉与变异运算作为agent的动作空间,将种群的进化过程视为一个整体,以最大化种群进化过程的累计奖励为目标,结合当前种群个体适应度情况,采用基于SAC的策略梯度算法,生成控制种群进化的动作策略,合理运用遗传算法的全局和局部搜索能力,优化种群的进化过程,平衡种群收敛速度与遗传操作次数之间的关系。对TSPLIB实例的实验结果表明,改进的遗传算法可有效地避免陷入局部最优解,在提高种群收敛速度的同时,减少寻优过程的迭代次数。

关键词: 强化学习, 遗传算法(GA), 旅行商问题(TSP), 深度策略梯度, soft actor-critic(SAC)模型

Abstract:

Genetic algorithm (GA) has strong global searching ability and is easy to operate, but its disadvantages such as poor convergence speed, unstable and easy to fall into local optimal value restrict its application. In order to overcome these disadvantages, an improved genetic algorithm based on the deep reinforcement learning model SAC (soft actor-critic) is proposed in this paper, which is applied to the resolution of traveling salesman problem (TSP). The improved algorithm regards the population as agent??s interaction environment, meanwhile greedy algorithm is used to initialize this environment for improving the quality of initial populations. For controlling the evolution of the population, the improved crossover and mutation operations are used as agent??s action space. With the goal of maximizing the cumulative rewards of population evolution, the improved algorithm treats the evolution of the population as a whole and uses a policy gradient algorithm based on SAC to generate evolution controlling action strategy combined with the current individual fitness of the population. The action strategy reasonably uses the global and local search ability of genetic algorithm by agent??s actions, optimizing the evolutionary process of the population while balancing relationship between the population convergence rate and the times of genetic operation. The experimental results of TSPLIB indicate that the improved genetic algorithm can effectively avoid falling into the local optimal solution and reduce the number of iteration in the optimization process while improving the convergence rate of the population.

Key words: reinforcement learning, genetic algorithm (GA), traveling salesman problem (TSP), deep policy gradient, soft actor-critic (SAC) model