Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (8): 1842-1851. DOI: 10.3778/j.issn.1673-9418.2207069

• Theory · Algorithm •

Design and Implementation of Efficient Multi-branch Predictor

YANG Ling, ZHOU Jinwen, WANG Jing, LAN Mengqiao, DING Zijian, YANG Shi, WANG Yongwen, HUANG Libo

  1. School of Computer, National University of Defense Technology, Changsha 410073, China
  • Online: 2023-08-01 Published: 2023-08-01

Abstract: Branch prediction is a key technique for ensuring processor performance. This is especially true in today's widely used superscalar processors, where the properties of the branch predictor strongly affect the overall performance, power consumption, and area of the processor. To obtain a cost-effective branch predictor for a superscalar processor, this paper first uses a single TAGE (tagged geometric history length) predictor to predict all branches within the fetch width and evaluates its ideal performance on the Championship Branch Prediction (CBP) platform, finding that its prediction capability is sufficient for this task. In practice, however, fetching multiple branches at once causes conflicts in both the branch predictor and the branch target buffer (BTB), which severely degrades prediction performance. To solve this problem, this paper adds additional prediction paths to the single TAGE branch predictor that independently store and predict the information of the additional branch instructions. The resulting predictor is implemented in a superscalar processor using a hardware description language and compared with a single TAGE branch predictor on dhrystone, coremark, and embench, benchmark programs commonly used for embedded processors. Experimental results show that the optimized branch predictor improves performance by 14.1 percentage points while increasing storage overhead by only 9.06%. Finally, analysis of the experimental data shows that the scheme not only benefits the prediction of the additional branch instructions but also makes single-branch fetch prediction more accurate, because it obtains more accurate branch history information.

Key words: branch prediction, tagged geometric history length (TAGE), embedded, superscalar, processor
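The conflict described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' TAGE-based design: each prediction path is a table of 2-bit saturating counters standing in for TAGE, it is indexed only by a hypothetical fetch-block address (global history is omitted), and all branch slots of a block are predicted in the same cycle before any update. Only the direction predictor is modeled; a BTB indexed the same way would conflict in the same manner. The names `CounterTable`, `run`, and `block_pc` are illustrative.

```python
# Hypothetical toy model of multi-branch fetch. A table of 2-bit saturating
# counters stands in for TAGE; this is NOT TAGE and NOT the paper's design.


class CounterTable:
    """One prediction path: 2-bit counters indexed by fetch-block address."""

    def __init__(self, entries: int = 64) -> None:
        self.entries = entries
        self.ctr = [1] * entries  # 0..3; a value >= 2 means "predict taken"

    def _index(self, block_pc: int) -> int:
        # Assumption: every branch in the same fetch block maps to one entry.
        return (block_pc >> 4) % self.entries

    def predict(self, block_pc: int) -> bool:
        return self.ctr[self._index(block_pc)] >= 2

    def update(self, block_pc: int, taken: bool) -> None:
        i = self._index(block_pc)
        if taken:
            self.ctr[i] = min(3, self.ctr[i] + 1)
        else:
            self.ctr[i] = max(0, self.ctr[i] - 1)


def run(outcomes: list[bool], paths: int, rounds: int = 100) -> float:
    """Fetch the same block `rounds` times and return prediction accuracy.

    paths=1: all branch slots share one table (the conflict case).
    paths=2: slot k uses table k % 2, i.e. an extra, independent path.
    """
    tables = [CounterTable() for _ in range(paths)]
    block_pc = 0x1000  # hypothetical fetch-block address
    correct = 0
    for _ in range(rounds):
        # All slots are predicted in the same cycle, before any update.
        preds = [tables[s % paths].predict(block_pc) for s in range(len(outcomes))]
        correct += sum(p == t for p, t in zip(preds, outcomes))
        # Train each slot's path with its resolved outcome.
        for s, taken in enumerate(outcomes):
            tables[s % paths].update(block_pc, taken)
    return correct / (rounds * len(outcomes))


if __name__ == "__main__":
    # Slot 0 is never taken and slot 1 is always taken, so both execute.
    block = [False, True]
    print("shared path:", run(block, paths=1))  # 0.5
    print("extra path: ", run(block, paths=2))  # 0.995
```

With one shared path, both slots read and train the same counter, which oscillates between 0 and 1, so the always-taken branch in slot 1 is mispredicted on every fetch. With an extra path, each counter saturates in its own direction after a single misprediction. Real designs index by richer state than the block address, so the size of the effect here says nothing about the 14.1-percentage-point result reported above.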