[1] ATEZCAN E, TORUN T, KOSAR F, et al. Mixed and multi-precision SpMV for GPUs with row-wise precision selection[C]//Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing, Bordeaux, Nov 2-5, 2022. Piscataway: IEEE, 2022: 31-40.
[2] SUN H Y, GAINARU A, SHANTHARAM M, et al. Selective protection for sparse iterative solvers to reduce the resilience overhead[C]//Proceedings of the 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing, Porto, Sep 9-11, 2020. Piscataway: IEEE, 2020: 141-148.
[3] 李秉政,黄高阳,许瑾晨. 面向申威众核处理器的LZMA并行算法设计与优化[J]. 计算机科学与探索, 2020, 14(9): 1501-1509.
LI B Z, HUANG G Y, XU J C. Design and optimization of parallel LZMA for many-core sunway processor[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(9): 1501-1509.
[4] YANG M L, DU Y L, SHENG X Q. Solving electromagnetic scattering problems with over 10 billion unknowns with the parallel MLFMA[C]//Proceedings of the 2019 Photonics & Electromagnetics Research Symposium-Fall, Xiamen, Dec 17-20, 2019. Piscataway: IEEE, 2019: 355-360.
[5] LIU J. Accuracy controllable SpMV optimization on GPU[C]//Proceedings of the 2022 4th International Conference on Artificial Intelligence and Computer Science, Beijing, Jul 30-31,2022. Bristol: IOP Publishing, 2022.
[6] AHMED M, USMAN S, SHAH N A, et al. AAQAL: a machine learning-based tool for performance optimization of parallel SPMV computations using block CSR[J]. Applied Sciences, 2022, 12(14): 7073.
[7] ISOTTON G, FRIGO M, SPIEZIA N, et al. Chronos: a general purpose classical AMG solver for high performance computing[J]. SIAM Journal on Scientific Computing, 2021, 43(5): 335-357.
[8] 肖汉,孙陆鹏,李彩林,等. 面向GPU的直方图统计图像增强并行算法[J]. 计算机科学与探索,2022, 16(10): 2273-2285.
XIAO H, SUN L P, LI C L, et al. GPU-oriented parallel algorithm for histogram statistical image enhancement[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(10): 2273-2285.
[9] CHEN Y D, XIAO G Q, WU F, et al. tpSpMV: a two-phase large-scale sparse matrix-vector multiplication kernel for many-core architectures[J]. Information Sciences, 2020, 523: 279-295.
[10] NAMASHIVAVAM N, MEHTA S, YEW P C. Variable-sized blocks for locality-aware SpMV[C]//Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, Seoul, Feb 27-Mar 3, 2021. Piscataway: IEEE, 2021: 211-221.
[11] BIAN H D, HUANG J Q, LIU L B, et al. ALBUS: a method for efficiently processing SpMV using SIMD and load balancing[J]. Future Generation Computer Systems, 2021, 116: 371-392.
[12] LIU W F, VINTER B. CSR5: an efficient storage format for cross-platform sparse matrix-vector multiplication[C]//Proceedings of the 29th ACM on International Conference on Supercomputing, California, Jun 30-31, 2015. New York: ACM, 2015: 339-350.
[13] ZHANG Y F, YANG W D, LI K L, et al. Performance analysis and optimization for SpMV based on aligned storage formats on an ARM processor[J]. Journal of Parallel and Distributed Computing, 2021, 158: 126-137.
[14] YESIL S, HEIDARSHENS A, MORRISON A, et al. Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Georgia, Nov 9-19, 2020. Piscataway: IEEE, 2020: 1-15.
[15] CUI H Y, WANG N B, WANG Y H, et al. An effective SPMV based on block strategy and hybrid compression on GPU[J]. The Journal of Supercomputing, 2022, 78(5): 6318-6339.
[16] LI Y S, XIE P Z, CHEN X H, et al. VBSF: a new storage format for SIMD sparse matrix-vector multiplication on modern processors[J]. The Journal of Supercomputing, 2020, 76(3): 2063-2081.
[17] BIAN H D, HUANG J Q, DONG R T, et al. A simple and efficient storage format for SIMD-accelerated SpMV[J]. Cluster Computing, 2021, 24(4): 3431-3448.
[18] GAO J H, JI W X, LIU J, et al. AMF-CSR: adaptive multi-row folding of CSR for SpMV on GPU[C]//Proceedings of the 2021 IEEE 27th International Conference on Parallel and Distributed Systems, Beijing, Dec 14-16, 2021. Piscataway: IEEE, 2021: 418-425.
[19] YANG W D, LI K L, LI K Q. A parallel computing method using blocked format with optimal partitioning for SpMV on GPU[J]. Journal of Computer and System Sciences, 2018, 92: 152-170.
[20] BARRIENTOS E C, INDALECIO G, LOUREIRO A G. Improving performance of iterative solvers with the AXC format using the Intel Xeon Phi[J]. The Journal of Supercomputing, 2018, 74(6): 2823-2840.
[21] BELL N, GARLAND M. Implementing sparse matrix-vector multiplication on throughput-oriented processors[C]//Proceedings of the International Conference for High Performance Computing, Networking, Portland, Nov 14-20, 2009. New York: ACM, 2009: 1-11.
[22] TALATI N, MAY K, BEHROOZI A, et al. Prodigy: improving the memory latency of data-indirect irregular workloads using hardware-software co-design[C]//Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture, Seoul, Feb 27-Mar 3, 2021. Piscataway: IEEE, 2021: 654-667. |