Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2020, Vol. 14, Issue (12): 2028-2038. DOI: 10.3778/j.issn.1673-9418.2002044

• Academic Research •

Improved Real-Time Dynamic Migration Mechanism of Array Processor Data Cache

FENG Yani, JIANG Lin, SHAN Rui, LIU Yang, ZHANG Yuan   

  1. School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
  2. Laboratory of Integrated Circuit, Xi'an University of Science and Technology, Xi'an 710054, China
  3. School of Computer, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
  • Online: 2020-12-01    Published: 2020-12-11

Abstract:

The on-chip distributed storage structure meets the high-parallelism memory access requirements of array processors and alleviates the "memory wall" problem to some extent. However, the long latency of remote accesses in a distributed storage structure remains a prominent problem. To address it, an improved real-time dynamic migration mechanism based on a distributed data Cache is designed. It adopts four-level full interconnection and migration interconnection, and dynamically schedules remote data according to data access frequency, effectively reducing remote access latency. Based on the distributed Cache structure of the array processor, the proposed mechanism is comprehensively verified and tested through parallel implementations of typical algorithms such as motion compensation. Experimental results show that the distributed Cache with the real-time dynamic migration mechanism provides a peak memory access bandwidth of 10.68 GB/s at an operating frequency of 166.9 MHz. Compared with similar structures, remote access latency is reduced by 46.5%.

Key words: array processor, distributed Cache, dynamic migration, Cache coherence
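
The abstract describes scheduling remote data according to access frequency. The sketch below is a toy software model of that general idea only, not the hardware design from the paper: the class name `MigratingCacheModel`, the migration threshold, and the latency values (1 cycle local, 4 cycles remote) are all illustrative assumptions, and the four-level interconnect is not modeled.

```python
from collections import defaultdict


class MigratingCacheModel:
    """Toy model of frequency-based migration between cache clusters.

    All parameters are hypothetical; they are not taken from the paper.
    """

    def __init__(self, num_clusters, threshold, local_latency=1, remote_latency=4):
        self.num_clusters = num_clusters
        self.threshold = threshold            # remote accesses that trigger a migration
        self.local_latency = local_latency    # assumed cost of a local access (cycles)
        self.remote_latency = remote_latency  # assumed cost of a remote access (cycles)
        self.home = {}                        # block address -> cluster holding it
        self.remote_hits = defaultdict(int)   # (requester, block) -> remote access count

    def place(self, block, cluster):
        """Record that `block` initially resides in `cluster`."""
        if not 0 <= cluster < self.num_clusters:
            raise ValueError("cluster index out of range")
        self.home[block] = cluster

    def access(self, requester, block):
        """Return the modeled latency of one access and migrate if warranted."""
        if not 0 <= requester < self.num_clusters:
            raise ValueError("requester index out of range")
        if self.home[block] == requester:
            return self.local_latency
        # Remote access: count it, and move the block once the requester
        # has touched it often enough.
        self.remote_hits[(requester, block)] += 1
        if self.remote_hits[(requester, block)] >= self.threshold:
            self._migrate(block, requester)
        return self.remote_latency

    def _migrate(self, block, dest):
        """Move `block` to cluster `dest` and reset its access counters."""
        self.home[block] = dest
        for key in [k for k in self.remote_hits if k[1] == block]:
            del self.remote_hits[key]
```

With a threshold of 3, a block placed in cluster 0 and accessed repeatedly from cluster 2 costs the assumed remote latency for the first three accesses. After that it resides in cluster 2, and later accesses from cluster 2 cost the local latency.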