计算机科学与探索

• 学术研究 •    下一篇

基于多臂赌博机遗传算法的无人机与卡车协同配送

朱烨娜, 刘敏, 赵肄江, 陈萱霖   

  1. 1. 湖南科技大学 计算机科学与工程学院,湖南 湘潭 411201
    2. 服务计算与软件服务新技术湖南省重点实验室,湖南 湘潭 411201

A Genetic Algorithm Based on Multi-armed Bandit for Collaborative Drone-Truck Delivery

ZHU Yena,  LIU Min,  ZHAO Yijiang,  CHEN Xuanlin   

  1. 1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
    2. Hunan Key Laboratory for Service Computing and Novel Software Technology, Xiangtan, Hunan 411201, China

摘要: 无人机与卡车协同配送新模式凭借其高效、环保、不受地形限制等优势,正在改变传统的物流配送方式。带无人机的旅行商问题(Traveling Salesman Problem with Drone, TSP-D)是上述配送新模式中的一种经典问题,比纯卡车物流配送更为复杂,需要从无人机和卡车间的协同交互中寻找最优的配送组合,带来了新的挑战。提出了一种基于多臂赌博机的混合遗传算法来求解TSP-D。采用了自然数排列的染色体编码,并应用基于动态规划的精确划分方法对其解码,以生成无人机与卡车协同配送解方案。新设计了一种多臂赌博机局部搜索策略,将局部搜索算子池中的五种不同搜索算子视作赌博机的多个“臂”。先通过赌博机摇臂搜索后解方案适应值的提升程度来计算奖励,再根据ε-greedy强化学习方法计算各个“臂”被选中的概率,以便选择合适的搜索算子来增强算法的局部搜索能力。实验表明,提出的算法与其它主流的算法相比,在不同分布与不同规模的多数测试实例上均有更低的解方案成本。进一步的实验分析验证了多臂赌博机局部搜索策略比其它局部搜索策略具有更好的自适应能力,能显著提升算法的性能。最后,将提出的算法应用于长沙市一个实际的配送案例,展示了其现实应用效果。

关键词: 无人机卡车协同配送, 带无人机的旅行商问题, 混合遗传算法, 多臂赌博机

Abstract: The new mode of drone-truck collaborative delivery is changing the traditional logistics distribution mode with its advantages of high efficiency, environmental protection, and no terrain restrictions. The traveling salesman problem with drone (TSP-D) is a classic problem in the above-mentioned distribution mode. It is more complicated than truck-only logistics distribution and requires finding the optimal distribution combination from the collaborative interaction between drone and truck, which brings new challenges. A hybrid genetic algorithm based on multi-armed bandit (HGA-MAB) is proposed to solve TSP-D. Natural number permutation is used in the chromosome encoding, and an exact partitioning method based on dynamic programming is applied to decode the chromosome to generate a collaborative delivery solution of drone and truck. A new local search strategy based on multi-armed bandit is designed, which regards the five different search operators in the local search operator pool as multiple “arms” of the bandit. The reward is first calculated by the improvement of the solution fitness after the bandit arm search, and then the probability of each “arm” being selected is calculated according to the ε-greedy reinforcement learning method, so as to select the appropriate operator to improve the local search ability of the algorithm. Experimental results show that the proposed algorithm is able to find low costs compared to the state-of-the-art algorithms on most test instances with different distributions and scales. Further experimental analysis indicates that the multi-armed bandit local search strategy has better adaptability than other strategies and can significantly improve the performance of the algorithm. Finally, the proposed algorithm is applied on a real-world delivery case of Changsha, which shows its practical application effect.

Key words: Drone-truck collaborative delivery, Traveling Salesman Problem with Drone, Hybrid genetic algorithm, Multi-armed Bandit