Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (9): 1703-1716. DOI: 10.3778/j.issn.1673-9418.2006084

• Artificial Intelligence •

Dynamic Hierarchical Ant Colony Optimization Based on Reward and Punishment Learning Strategy

MO Yadong, YOU Xiaoming, LIU Sheng   

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  2. School of Management, Shanghai University of Engineering Science, Shanghai 201620, China
  • Online: 2021-09-01  Published: 2021-09-06


Abstract:

To address the low solution precision and premature convergence of the traditional ant colony system on the traveling salesman problem (TSP), this paper proposes a dynamic hierarchical ant colony system that integrates a reward and punishment learning strategy (DHL-ACS). First, the colony is dynamically divided into empire ants, colony ants, and a king ant: empire ants and colony ants perform local pheromone updates, while the king ant performs the global pheromone update. In the local update, empire ants apply a larger weight coefficient and are responsible for exploiting better solutions, strengthening the guidance of the algorithm, whereas colony ants apply a smaller weight coefficient to explore suboptimal solutions and preserve diversity. In addition, empire ants and colony ants exchange high-quality solutions to improve solution accuracy. Second, an improved learning strategy is proposed: rewarding the paths shared by empire ants and colony ants assimilates the better solutions and accelerates convergence. Furthermore, when the algorithm stagnates, a feedback operator is introduced to reduce the pheromone concentration on the king ant's path, punishing paths with excessive pheromone; this improves population diversity and strengthens the algorithm's ability to escape local optima. Experiments on several groups of TSP instances show that the improved algorithm balances convergence speed and diversity and, especially on large-scale TSP instances, effectively improves solution accuracy.
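To make the hierarchical update scheme concrete, the Python sketch below illustrates the four pheromone operations described in the abstract. It is only an illustration under assumed names and values: the weight coefficients (W_EMPIRE, W_COLONY), the reward and punishment factors, and the exact update formulas follow the standard ant colony system form and are not taken from the paper.

# A minimal sketch of the hierarchical pheromone updates described in the
# abstract. All parameter names and values are illustrative assumptions; the
# paper's actual formulas and settings are not reproduced here.

TAU0 = 0.01        # initial pheromone level (assumed)
RHO = 0.1          # global evaporation rate (assumed)
W_EMPIRE = 0.9     # larger local-update weight for empire ants (exploitation)
W_COLONY = 0.3     # smaller local-update weight for colony ants (exploration)
REWARD = 1.2       # reward factor for edges shared by empire and colony tours
PUNISH = 0.8       # punishment factor for the king ant's tour on stagnation


def local_update(pheromone, edge, is_empire):
    # Local pheromone update in ant-colony-system style; empire ants apply a
    # larger weight than colony ants.
    w = W_EMPIRE if is_empire else W_COLONY
    pheromone[edge] = (1.0 - w) * pheromone[edge] + w * TAU0


def global_update(pheromone, king_tour, king_length):
    # Global update performed only by the king ant along the best tour found.
    for edge in king_tour:
        pheromone[edge] = (1.0 - RHO) * pheromone[edge] + RHO / king_length


def reward_common_edges(pheromone, empire_tour, colony_tour):
    # Reward-learning step: reinforce edges that appear in both an empire
    # ant's tour and a colony ant's tour (their common path).
    for edge in set(empire_tour) & set(colony_tour):
        pheromone[edge] *= REWARD


def punish_king_path(pheromone, king_tour):
    # Feedback operator: when the search stagnates, weaken the pheromone on
    # the king ant's tour to push the colony away from the local optimum.
    for edge in king_tour:
        pheromone[edge] *= PUNISH

In such a sketch, pheromone would be a dictionary mapping city-pair edges to pheromone levels; the reward step would run after each iteration, and the punishment step only when the best tour has not improved for some number of iterations (the stagnation criterion and parameter values used by the paper are not given in the abstract).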

Key words: ant colony optimization (ACO), learning strategy, feedback operator, dynamic hierarchical, traveling salesman problem (TSP)
