[1] BARRETT R, BERRY M, CHAN T, et al. Templates for the solution of linear systems: building block for iterative methods[M]. Philadelphia: SIAM, 1994.
[2] BJÖRCK Å. Numerical methods in matrix computations[M]. Cham: Springer, 2015.
[3] BAI Z J, DEMMEL J, DONGARRA J, et al. Templates for the solution of algebraic eigenvalue problems: a practical guide[M]. Philadelphia: SIAM, 2000.
[4] SAAD Y. Numerical methods for large eigenvalue problems: revised edition[M]. Philadelphia: SIAM, 2011.
[5] SAAD Y. Iterative methods for sparse linear systems[M]. Philadelphia: SIAM, 2003.
[6] ANDERSON E, BAI Z, BISCHOF C, et al. LAPACK users’ guide[M]. Philadelphia: SIAM, 1992.
[7] BLACKFORD L S, CHOI J, CLEARY A, et al. ScaLAPACK users’ guide[M]. Philadelphia: SIAM, 1997.
[8] BUTTARI A, LANGOU J, KURZAK J, et al. A class of parallel tiled linear algebra algorithms for multicore architectures[J]. Parallel Computing, 2009, 35: 38-53.
[9] DONGARRA J, GATES M, HAIDAR A, et al. PLASMA: parallel linear algebra software for multicore using OpenMP[J]. ACM Transactions on Mathematical Software, 2019, 45(2): 1-35.
[10] BOSILCA G, BOUTEILLER A, DANALIS A, et al. Scalable dense linear algebra on heterogeneous hardware[J]. Advances in Parallel Computing, 2013, 28: 65-103.
[11] GATES M, KURZAK J, CHARARA A, et al. SLATE: design of a modern distributed and accelerated linear algebra library[C]//Proceedings of the 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, Nov 17-19, 2019. New York: ACM, 2019: 1-18.
[12] LIU F F, MA W J, ZHAO Y W, et al. xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor[J]. CCF Transactions on High Performance Computing, 2023, 5: 56-71.
[13] BALAY S, ABHYANKAR S, ADAMS M, et al. PETSc users manual (revision 3.15)[R]. Argonne: Argonne National Laboratory, 2021.
[14] HEROUX M A, BARTLETT R A, HOWLE V E, et al. An overview of the Trilinos project[J]. ACM Transactions on Mathematical Software, 2005, 31(3): 397-423.
[15] FALGOUT R D, JONES J E, YANG U M. The design and implementation of hypre, a library of parallel high performance preconditioners[M]//BRUASET A M, TVEITO A. Numerical Solution of Partial Differential Equations on Parallel Computers. Berlin, Heidelberg: Springer, 2006: 267-294.
[16] ANZT H, CHEN Y C, COJEAN T, et al. Towards continuous benchmarking: an automated performance evaluation framework for high performance software[C]//Proceedings of the Platform for Advanced Scientific Computing Conference, Zurich, Jun 12-14, 2019. New York: ACM, 2019: 1-11.
[17] LI X S. An overview of SuperLU: algorithms, implementation, and user interface[J]. ACM Transactions on Mathematical Software, 2005, 31(3): 302-325.
[18] GHYSELS P, SYNK R. High performance sparse multifrontal solvers on modern GPUs[J]. Parallel Computing, 2022, 110: 102897.
[19] YAMAMOTO Y. High-performance algorithms for numerical linear algebra[M]//GESHI M. The Art of High Performance Computing for Computational Science. Berlin, Heidelberg: Springer, 2019: 113-136.
[20] BOSILCA G, BOUTEILLER A, DANALIS A, et al. DAGuE: a generic distributed DAG engine for high performance computing[J]. Parallel Computing, 2012, 38(1/2): 37-51.
[21] DEMMEL J, GRIGORI L, HOEMMEN M, et al. Communication-optimal parallel and sequential QR and LU factorizations[J]. SIAM Journal on Scientific Computing, 2012, 34: A206-A239.
[22] TAN L, KOTHAPALLI S, CHEN L, et al. A survey of power and energy efficient techniques for high performance numerical linear algebra operations[J]. Parallel Computing, 2014, 40(10): 559-573.
[23] ABDELFATTAH A, ANZT H, BOMAN E G, et al. A survey of numerical linear algebra methods utilizing mixed precision arithmetic[J]. International Journal of High Performance Computing Applications, 2021, 35(4): 344-369.
[24] HIGHAM N J, MARY T. Mixed precision algorithms in numerical linear algebra[J]. Acta Numerica, 2022, 31: 347-414.
[25] ELLIOTT J, HOEMMEN M, MUELLER F. Evaluating the impact of SDC on the GMRES iterative solver[C]//Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, May 19-23, 2014. Washington: IEEE Computer Society, 2014: 1193-1202.
[26] YARKHAN A, KURZAK J, LUSZCZEK P, et al. Porting the PLASMA numerical library to the OpenMP standard[J]. International Journal of Parallel Programming, 2017, 45: 612-633.
[27] BOSILCA G, BOUTEILLER A, DANALIS A, et al. Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA[C]//Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, Anchorage, May 16-20, 2011. Piscataway: IEEE, 2011: 1432-1441.
[28] AGULLO E, AUMAGE O, FAVERGE M, et al. Achieving high performance on supercomputers with a sequential task-based programming model[J]. IEEE Transactions on Parallel and Distributed Systems, 2017. DOI: 10.1109/TPDS.2017.2766064.
[29] TOMOV S. MAGMA tutorial[R/OL]. (2020-02-03) [2023-03-21]. https://ecpannualmeeting.com/assets/overview/sessions/2020-magma-heffte-tutorial.pdf.
[30] GATES M, YARKHAN A, SUKKARI D, et al. Portable and efficient dense linear algebra in the beginning of the exascale era[C]//Proceedings of the 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC, Dallas, Nov 13-18, 2022. Piscataway: IEEE, 2022: 36-46.
[31] ANZT H, BOMAN E, FALGOUT R, et al. Preparing sparse solvers for exascale computing[J]. Philosophical Transactions of the Royal Society A, 2020, 378: 20190053.
[32] BAVIER E, HOEMMEN M, RAJAMANICKAM S, et al. Amesos2 and Belos: direct and iterative solvers for large sparse linear systems[J]. Scientific Programming, 2012, 20: 241-255.
[33] EDWARDS H C, TROTT C R, SUNDERLAND D. Kokkos: enabling manycore performance portability through polymorphic memory access patterns[J]. Journal of Parallel and Distributed Computing, 2014, 74(12): 3202-3216.
[34] BOOTH J D, ELLINGWOOD N D, THORNQUIST H K, et al. Basker: parallel sparse LU factorization utilizing hierarchical parallelism and data layouts[J]. Parallel Computing, 2017, 68: 17-31.
[35] KIM K, EDWARDS H C, RAJAMANICKAM S. Tacho: memory-scalable task parallel sparse Cholesky factorization[C]//Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, Vancouver, May 21-25, 2018. Washington: IEEE Computer Society, 2018: 550-559.
[36] HEROUX M A, MCINNES L, LI S, et al. ECP software technology capability assessment report: ORNL/TM-2022/2651[R]. US Department of Energy Office of Science, Office of Advanced Scientific Computing Research, 2022.
[37] DE STERCK H, FALGOUT R D, NOLTING J W, et al. Distance-two interpolation for parallel algebraic multigrid[J]. Numerical Linear Algebra with Applications, 2008, 15(2/3): 115-139.
[38] VASSILEVSKI P S, YANG U M. Reducing communication in algebraic multigrid using additive variants[J]. Numerical Linear Algebra with Applications, 2014, 21(2): 275-296.
[39] FALGOUT R D, SCHRODER J B. Non-Galerkin coarse grids for algebraic multigrid[J]. SIAM Journal on Scientific Computing, 2014, 36(3): C309-C334.
[40] ALIAGA J I, ANZT H, GRÜTZMACHER T, et al. Compressed basis GMRES on high-performance graphics processing units[J]. The International Journal of High Performance Computing Applications, 2022: 1-18.
[41] FLEGAR G, ANZT H, COJEAN T, et al. Adaptive precision Block-Jacobi for high performance preconditioning in the Ginkgo linear algebra software[J]. ACM Transactions on Mathematical Software, 2021, 47(2): 1-28.
[42] ANZT H, DONGARRA J, FLEGAR G, et al. Adaptive precision in Block-Jacobi preconditioning for iterative sparse linear system solvers[J]. Concurrency and Computation: Practice and Experience, 2019, 31(6): e4460.
[43] ANZT H, RIBIZEL T, FLEGAR G, et al. ParILUT—a parallel threshold ILU for GPUs[C]//Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, Rio de Janeiro, May 20-24, 2019. Piscataway: IEEE, 2019: 231-241.
[44] DONGARRA J, GRIGORI L, HIGHAM N J. Numerical algorithms for high-performance computational science[J]. Philosophical Transactions of the Royal Society A, 2020, 378: 20190066.
[45] ABDELFATTAH A, ANZT H, AYALA A, et al. Advances in mixed precision algorithms: 2021 edition: SAND2021-10227R[R]. Albuquerque: Sandia National Laboratories, 2021.
[46] CHARARA A, DONGARRA J, GATES M, et al. SLATE mixed precision performance report: ICL-UT-19-03[R]. Knoxville: University of Tennessee, 2019.
[47] BARRON D W, SWINNERTON-DYER H P F. Solution of simultaneous linear equations using a magnetic-tape store[J]. The Computer Journal, 1960, 3(1): 28-33.
[48] ALOMAIRY R, GATES M, CAYROLS S, et al. Communication avoiding LU with tournament pivoting in SLATE: SLATE working note 18, ICL-UT-22-01[R]. 2022.
[49] GRIGORI L, DEMMEL J W, XIANG H. Communication avoiding Gaussian elimination[C]//Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Austin, Nov 15-21, 2008. Piscataway: IEEE, 2008: 29.
[50] GRIGORI L, DEMMEL J W, XIANG H. CALU: a communication optimal LU factorization algorithm[J]. SIAM Journal on Matrix Analysis and Applications, 2011, 32(4): 1317-1350.
[51] SAO P, VUDUC R, LI X. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems[J]. Journal of Parallel and Distributed Computing, 2019, 131: 218-234.
[52] DING N, WILLIAMS S, LIU Y, et al. Leveraging one-sided communication for sparse triangular solvers[C]//Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, Seattle, Feb 12-15, 2020. Philadelphia: SIAM, 2020: 93-105.