Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2014, Vol. 8, Issue (4): 385-396. DOI: 10.3778/j.issn.1673-9418.1309012

• Academic Research •

Grouping Cores for Chip Multiprocessors Optimization

LI Guohong1, WANG Dongsheng2+, LIU Zhenyu2, LI Chongmin1, LIU Genxian1, GUO Sanchuan1

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  2. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2014-04-01 Published: 2014-04-03

Abstract: As the number of cores in a chip multiprocessor (CMP) grows, the average distance between a requesting core and the home node of the requested data increases, and unbalanced accesses across the banks of the distributed shared cache create network hotspots. Both effects raise the average latency of L1 cache misses. To address this problem, this paper divides the cores into groups of four (2×2 nodes) and places a neighboring data prober (NDP) in each group. By determining whether a miss can be served from the L1 cache of a neighboring core, the NDP exploits the inter-core (node-level) locality of data accesses that parallel programs exhibit when running on a CMP. The cache coherence protocol is also optimized for the new architecture. Experimental results show that the proposed on-chip memory optimization improves system performance, reduces on-chip network traffic, and saves energy.

Key words: chip multiprocessors, cache, network on chip
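
The idea summarized in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the design evaluated in the paper: it assumes a square mesh with row-major core numbering, hop count equal to Manhattan distance, and an address-interleaved home mapping. The names `NeighborProber`, `serve_miss`, `group_of`, and `average_hops` are hypothetical and chosen for illustration.

```python
from itertools import product


def mesh_coords(core_id, width):
    """Row-major core id -> (row, col) in a width x width mesh (assumed layout)."""
    return divmod(core_id, width)


def hops(a, b, width):
    """Manhattan distance between two cores, used here as the hop count."""
    (ra, ca), (rb, cb) = mesh_coords(a, width), mesh_coords(b, width)
    return abs(ra - rb) + abs(ca - cb)


def average_hops(width):
    """Mean requester-to-home distance if home nodes are spread uniformly."""
    n = width * width
    total = sum(hops(a, b, width) for a, b in product(range(n), repeat=2))
    return total / (n * n)


def group_of(core_id, width):
    """Identify the 2x2 group containing a core (width assumed even)."""
    row, col = mesh_coords(core_id, width)
    return (row // 2, col // 2)


def group_members(core_id, width):
    """List the four core ids that share a 2x2 group with core_id."""
    row, col = mesh_coords(core_id, width)
    r0, c0 = 2 * (row // 2), 2 * (col // 2)
    return [(r0 + dr) * width + (c0 + dc) for dr in (0, 1) for dc in (0, 1)]


class NeighborProber:
    """Toy stand-in for an NDP: tracks which group members hold a block in L1."""

    def __init__(self):
        self.holders = {}  # block address -> set of core ids in this group

    def record_fill(self, core_id, block):
        self.holders.setdefault(block, set()).add(core_id)

    def record_evict(self, core_id, block):
        self.holders.get(block, set()).discard(core_id)

    def probe(self, requester, block):
        """Return a neighboring core that holds the block, or None."""
        candidates = self.holders.get(block, set()) - {requester}
        return min(candidates) if candidates else None


def serve_miss(requester, block, probers, width):
    """Resolve an L1 miss: neighbor L1 if the prober hits, else the home node."""
    prober = probers[group_of(requester, width)]
    neighbor = prober.probe(requester, block)
    if neighbor is not None:
        return ("neighbor", neighbor, hops(requester, neighbor, width))
    home = block % (width * width)  # assumed address-interleaved home mapping
    return ("home", home, hops(requester, home, width))


if __name__ == "__main__":
    width = 4
    probers = {(r, c): NeighborProber() for r in range(2) for c in range(2)}
    print(average_hops(2), average_hops(4))    # average distance grows with mesh size
    print(serve_miss(0, 15, probers, width))   # no neighbor holds the block yet
    probers[group_of(0, width)].record_fill(1, 15)
    print(serve_miss(0, 15, probers, width))   # now served inside the 2x2 group
```

In this model a miss served inside the group travels at most two hops, whereas a request to a remote home node can cross the whole mesh. The sketch leaves out the coherence protocol changes that the abstract mentions.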