Memetic Algorithm Based on Deep Reinforcement Learning for Vehicle Routing Problem with Pickup-Delivery

doi:10.3778/j.issn.1673-9418.2302072

Abstract

The vehicle routing problem with simultaneous pickup-delivery and time windows (VRPSPDTW) is an NP-hard vehicle routing problem with comparatively complex constraints, and it is widely applied in modern logistics. A Memetic algorithm based on deep reinforcement learning is proposed to solve it. The large neighborhood search (LNS) process of the Memetic algorithm for VRPSPDTW is modeled as a Markov decision process, and an encoder-decoder neural network is designed to perform the removal operation in LNS. The encoder takes the individual features and positional features extracted for every node in the current solution and lets them exchange information. The decoder outputs the nodes to be removed. Two decoder variants are designed, one non-autoregressive and one autoregressive, and the network is trained with reinforcement learning. A hybrid strategy is also designed that combines manually designed heuristic strategies with the strategies learned through deep reinforcement learning to improve optimization ability. Experimental results show that the proposed algorithm is better at escaping local optima and obtains better solutions than the compared algorithms within reasonable time, especially on large-scale problems. In addition, ablation experiments on the new components of the proposed algorithm demonstrate their effectiveness.
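
The abstract does not give implementation details, so the following is only a minimal sketch of the removal step under stated assumptions. It assumes the decoder emits one score per node and that removal probabilities come from a softmax over those scores. Under that assumption, the non-autoregressive variant scores all nodes once and then draws the nodes to remove, while the autoregressive variant re-scores after every pick, conditioned on the nodes already removed. The summed log-probability is what a policy-gradient method would push on. Every function name here is hypothetical, the network is replaced by plain score vectors, and several pieces are illustrative assumptions rather than the paper's method: random removal as the hand-written heuristic in `hybrid_remove`, the mixing probability `p_heuristic`, the REINFORCE-style loss, and the `relatedness` scorer.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical sketch: the real model is an encoder-decoder network; here the
# decoder is replaced by score vectors/functions to show only the sampling logic.


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _sample(available: List[int], scores: Sequence[float],
            rng: random.Random) -> Tuple[int, float]:
    """Pick a position in `available` via softmax over those nodes' scores."""
    probs = softmax([scores[i] for i in available])
    pos = rng.choices(range(len(available)), weights=probs)[0]
    return pos, math.log(probs[pos])


def non_autoregressive_remove(scores: Sequence[float], k: int,
                              rng: random.Random) -> Tuple[List[int], float]:
    """Scores are computed once; k distinct nodes are drawn from them."""
    available = list(range(len(scores)))
    removed, log_prob = [], 0.0
    for _ in range(k):
        pos, lp = _sample(available, scores, rng)
        removed.append(available.pop(pos))
        log_prob += lp
    return removed, log_prob


def autoregressive_remove(score_fn: Callable[[List[int]], Sequence[float]],
                          n: int, k: int,
                          rng: random.Random) -> Tuple[List[int], float]:
    """Scores are recomputed after every pick, conditioned on removed nodes."""
    available = list(range(n))
    removed, log_prob = [], 0.0
    for _ in range(k):
        pos, lp = _sample(available, score_fn(removed), rng)
        removed.append(available.pop(pos))
        log_prob += lp
    return removed, log_prob


def hybrid_remove(scores: Sequence[float], k: int, rng: random.Random,
                  p_heuristic: float = 0.5) -> Tuple[List[int], Optional[float]]:
    """With probability p_heuristic use a hand-written rule (assumed here to be
    random removal); otherwise use the learned non-autoregressive policy."""
    if rng.random() < p_heuristic:
        return rng.sample(range(len(scores)), k), None
    return non_autoregressive_remove(scores, k, rng)


def reinforce_loss(log_prob: float, reward: float, baseline: float) -> float:
    """REINFORCE-style surrogate (assumed): minimizing it raises the
    log-probability of removals whose reward beats the baseline."""
    return -(reward - baseline) * log_prob


# Usage: a toy autoregressive scorer that favours nodes close to the ones
# already removed (a Shaw-style relatedness stand-in, not the paper's decoder).
coords = [(0, 0), (1, 0), (5, 5), (6, 5), (0, 1)]


def relatedness(removed: List[int]) -> List[float]:
    if not removed:
        return [0.0] * len(coords)
    return [-min(math.dist(coords[i], coords[j]) for j in removed)
            for i in range(len(coords))]


demo_rng = random.Random(0)
nodes, lp = autoregressive_remove(relatedness, len(coords), 2, demo_rng)
print(nodes, lp, reinforce_loss(lp, reward=1.0, baseline=0.5))
```

The only difference between the two decoders in this sketch is whether `score_fn(removed)` is re-evaluated inside the loop. That re-evaluation is why an autoregressive decoder can capture dependencies between removed nodes, at the cost of one network call per removed node.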

Key words: simultaneous pickup-delivery vehicle routing problem, time window, deep reinforcement learning, large neighborhood search

ZHOU Yalan, LIAO Yitian, SU Xiao, WANG Jiahai. Memetic Algorithm Based on Deep Reinforcement Learning for Vehicle Routing Problem with Pickup-Delivery[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(3): 818-830.