[1] Tang C, Liu D, Xing Z C, et al. Memory access analysis of many-core system with abundant bandwidth[C]//Proceedings of the 9th IEEE International Symposium on Embedded Multi-core/Many-core Systems-on-Chip, Turin, Sep 23-25, 2015. Washington: IEEE Computer Society, 2015: 187-194.
[2] Wei S J, Liu L B, Yin S Y. Key techniques of reconfigurable computing processor[J]. Science in China: Information Sciences, 2012, 42(12): 1559-1576. (in Chinese)
[3] Song C, Ju L, Jia Z P. Hybrid scratchpad and cache memory management for energy-efficient parallel HEVC encoding[C]//Proceedings of the 33rd IEEE International Conference on Computer Design, New York, Oct 18-21, 2015. Washington: IEEE Computer Society, 2015: 712-719.
[4] Ullah Z, Minallah N, Marwat S N K, et al. Performance analysis of cache size and set-associativity using SimpleScalar benchmark[C]//Proceedings of the 5th International Conference on Advances in Electrical Engineering, Dhaka, Sep 26-28, 2019. Piscataway: IEEE, 2019: 440-447.
[5] Huang A W, Gao J, Zhang M X. Latency optimization techniques in non-uniform cache architecture for chip multi-processors: a survey[J]. Journal of Computer Research and Development, 2012, 49(S1): 118-124. (in Chinese)
[6] Zhao X, Adileh A, Yu Z B, et al. Adaptive memory-side last-level GPU caching[C]//Proceedings of the 46th International Symposium on Computer Architecture, Phoenix, Jun 22-26, 2019. New York: ACM, 2019: 411-423.
[7] Wang G M, Ge J C, Yan Y H, et al. A data-sharing aware and scalable cache miss rates model for multi-core processors with multi-level cache hierarchies[C]//Proceedings of the IEEE 25th International Conference on Parallel and Distributed Systems, Tianjin, Dec 4-6, 2019. Piscataway: IEEE, 2019: 267-274.
[8] Zhang D, Zhou Y Z, Zhang Y X. A multi-level cache framework for remote resource access in transparent computing[J]. IEEE Network, 2018, 32(1): 140-145.
[9] Shan R, Shen X B, Jiang L, et al. Design of distributed shared memory structure for array processor[J]. Journal of Beijing University of Posts and Telecommunications, 2017, 40(4): 9-15. (in Chinese)
[10] Kim C, Burger D, Keckler S W. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches[C]//Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, Oct 5-9, 2002. New York: ACM, 2002: 211-222.
[11] Huang A W, Gao J, Guo W, et al. PSA-NUCA: a pressure self-adapting dynamic non-uniform cache architecture[C]//Proceedings of the 7th International Conference on Networking, Architecture, and Storage, Xiamen, Jun 28-30, 2012. Washington: IEEE Computer Society, 2012: 181-188.
[12] Li J H, Li M M, Xue C J, et al. Thread criticality assisted replication and migration for chip multiprocessor caches[J]. IEEE Transactions on Computers, 2017, 66(10): 1747-1762.
[13] Chen H, Fu C H, Chan Y L, et al. Early intra block partition decision for depth maps in 3D-HEVC[C]//Proceedings of the 25th IEEE International Conference on Image Processing, Athens, Oct 7-10, 2018. Piscataway: IEEE, 2018: 1777-1781.
[14] Beckmann B M, Marty M R, Wood D A. ASR: adaptive selective replication for CMP caches[C]//Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, Orlando, Dec 9-13, 2006. Washington: IEEE Computer Society, 2006: 443-454.
[15] Jiang L, Cui P F, Shan R, et al. Design of distributed memory architecture for video array processor[J]. Computer Engineering and Applications, 2018, 54(12): 57-62. (in Chinese)
[16] Matthews E, Doyle N C, Shannon L. Design space exploration of L1 data caches for FPGA-based multiprocessor systems[C]//Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, Feb 22-24, 2015. New York: ACM, 2015: 156-159.
[17] Jiang L, Liu Y, Shan R, et al. RDMM: runtime dynamic migration mechanism of distributed cache for reconfigurable array processor[J]. Integration, 2020, 72: 82-91.