SAC Model Based Improved Genetic Algorithm for Solving TSP

doi:10.3778/j.issn.1673-9418.2010065

Abstract

Genetic algorithms (GAs) have strong global search ability and are easy to implement, but slow convergence, unstable results, and a tendency to become trapped in local optima restrict their application. To overcome these drawbacks, this paper proposes an improved genetic algorithm based on the deep reinforcement learning model SAC (soft actor-critic) and applies it to the traveling salesman problem (TSP). The improved algorithm treats the population as the environment with which the agent interacts, and uses a greedy algorithm to initialize this environment so as to improve the quality of the initial population. To control the evolution of the population, improved crossover and mutation operations serve as the agent's action space. With the goal of maximizing the cumulative reward of population evolution, the algorithm treats the evolution of the population as a whole and uses an SAC-based policy gradient algorithm, combined with the current fitness of the individuals in the population, to generate an action policy that controls evolution. Through the agent's actions, this policy makes reasonable use of the global and local search abilities of the genetic algorithm, optimizing the evolutionary process while balancing the population's convergence speed against the number of genetic operations. Experimental results on TSPLIB instances indicate that the improved genetic algorithm effectively avoids falling into local optima, and reduces the number of iterations in the optimization process while improving the convergence speed of the population.

Key words: reinforcement learning, genetic algorithm (GA), traveling salesman problem (TSP), deep policy gradient, soft actor-critic (SAC) model
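
The abstract describes the population as the agent's environment, initialized greedily, with crossover and mutation as the action space. The Python sketch below illustrates that framing only; it is not the authors' implementation. Every name in it (`PopulationEnv`, `greedy_tour`, `order_crossover`, `inversion_mutation`) is hypothetical. It also assumes nearest-neighbour construction as the greedy step, order crossover and inversion mutation as stand-ins for the paper's improved operators, and the improvement in best tour length as the reward. A trained SAC policy would choose the actions; here a random choice stands in for it.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def greedy_tour(coords, start):
    """Nearest-neighbour construction (assumed greedy rule, not from the paper)."""
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: math.dist(coords[last], coords[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def order_crossover(p1, p2, rng):
    """Order crossover: copy a slice of p1, fill the rest in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = iter(c for c in p2 if c not in kept)
    return [c if c is not None else next(fill) for c in child]


def inversion_mutation(tour, rng):
    """Reverse a randomly chosen segment of the tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


class PopulationEnv:
    """Population-as-environment wrapper; action 0 = crossover, 1 = mutation."""

    def __init__(self, coords, pop_size, seed=0):
        self.coords = coords
        self.rng = random.Random(seed)
        starts = [self.rng.randrange(len(coords)) for _ in range(pop_size)]
        self.population = [greedy_tour(coords, s) for s in starts]

    def state(self):
        # Assumed state: the tour length (fitness proxy) of every individual.
        return [tour_length(t, self.coords) for t in self.population]

    def best_length(self):
        return min(self.state())

    def step(self, action):
        before = self.best_length()
        if action == 0:
            p1, p2 = self.rng.sample(self.population, 2)
            child = order_crossover(p1, p2, self.rng)
        else:
            child = inversion_mutation(self.rng.choice(self.population), self.rng)
        # Replace the worst individual only if the child is shorter.
        lengths = self.state()
        worst = max(range(len(lengths)), key=lengths.__getitem__)
        if tour_length(child, self.coords) < lengths[worst]:
            self.population[worst] = child
        # Assumed reward: reduction in the best tour length (never negative).
        return self.state(), before - self.best_length()


if __name__ == "__main__":
    rng = random.Random(42)
    cities = [(rng.random(), rng.random()) for _ in range(20)]
    env = PopulationEnv(cities, pop_size=10, seed=42)
    for _ in range(200):
        # A trained SAC policy would pick the action from env.state() here.
        env.step(rng.randrange(2))
    print("best tour length:", env.best_length())
```

Separating the environment from the action choice keeps the genetic operators unchanged while the policy that schedules them is swapped out, which is the division of roles the abstract describes.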

摘要：

遗传算法（GA）的全局搜索能力强，易于操作，但其收敛速度慢，易陷入局部最优值。针对以上问题，利用深度强化学习模型SAC对遗传算法进行改进，并将其应用至旅行商问题（TSP）的求解。改进算法将种群作为与智能体（agent）交互的环境，引入贪心算法对环境进行初始化，使用改进后的交叉与变异运算作为agent的动作空间，将种群的进化过程视为一个整体，以最大化种群进化过程的累计奖励为目标，结合当前种群个体适应度情况，采用基于SAC的策略梯度算法，生成控制种群进化的动作策略，合理运用遗传算法的全局和局部搜索能力，优化种群的进化过程，平衡种群收敛速度与遗传操作次数之间的关系。对TSPLIB实例的实验结果表明，改进的遗传算法可有效地避免陷入局部最优解，在提高种群收敛速度的同时，减少寻优过程的迭代次数。

关键词: 强化学习, 遗传算法（GA）, 旅行商问题（TSP）, 深度策略梯度, soft actor-critic（SAC）模型

CHEN Bin, LIU Weiguo. SAC Model Based Improved Genetic Algorithm for Solving TSP[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(9): 1680-1693.

陈斌, 刘卫国. 基于SAC模型的改进遗传算法求解TSP问题[J]. 计算机科学与探索, 2021, 15(9): 1680-1693.
