计算机科学与探索

• Academic Research •

Design of a High-Energy-Efficiency CNN Accelerator

LA Chao, LI Miao, ZHANG Feng, ZHANG Cuiting

  1. GL-Microelectronics Technology Co., Ltd., Beijing 100190, China
    2. National ASIC Design Engineering Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

MSNAP: A High-Energy-Efficiency CNN Accelerator

LA Chao, LI Miao, ZHANG Feng, ZHANG Cuiting   

  1. GL-Microelectronics Technology Co., Ltd., Beijing 100190, China
    2. National ASIC Design Engineering Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Abstract: Convolutional Neural Networks (CNN) are now widely used in image classification, object detection and recognition, and natural language understanding. As the complexity and scale of CNNs keep growing, hardware deployment faces great challenges, especially under the low-power, low-latency requirements of embedded applications; most existing platforms suffer from high power consumption and complex control. Targeting accelerator energy efficiency, this paper analyzes the key factors that determine system energy efficiency and, starting from scaling down computation precision and lowering the system frequency, studies a unified whole-network quantization method at extremely low bit-widths and designs a high-energy-efficiency CNN accelerator. Built on lightweight computation units with 1-bit weights and 4-bit activations, the accelerator forms a 128×128 spatially parallel acceleration array; the high spatial parallelism allows the whole system to run at a low clock frequency. Meanwhile, a weight-stationary, feature-map-broadcast dataflow effectively reduces the number of weight and feature-map data movements, lowering power consumption and improving the system's energy-efficiency ratio. The design was verified through a 22 nm tape-out. Results show that at 20 MHz the peak throughput reaches 10.54 TOPS (Tera Operations Per Second) and the energy efficiency reaches 64.317 TOPS/W, a 5× improvement over accelerators of the same type. Moreover, the deployed object-detection network achieves a detection rate of 60 FPS (Frames Per Second), fully meeting the needs of embedded applications.
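With weights quantized to a single bit, the multiply in each multiply-accumulate degenerates to a select-and-negate, i.e. a multiplexer in hardware. The following sketch illustrates this idea; the paper does not give its exact number encoding, so the assumption here that weights take values in {-1, +1} and activations are unsigned 4-bit integers is ours.

```python
def mac_1b4b(weights, activations):
    """Multiply-accumulate with 1-bit weights and 4-bit activations.

    With weights restricted to {-1, +1}, each 'multiplication' reduces
    to selecting the activation or its negation (a multiplexer in
    hardware); only the accumulation still needs an adder tree.
    """
    acc = 0
    for w, a in zip(weights, activations):
        assert w in (-1, 1) and 0 <= a < 16  # 1-bit weight, 4-bit activation
        acc += a if w == 1 else -a           # mux: select +a or -a
    return acc

# Example: dot product of a 4-element vector
print(mac_1b4b([1, -1, 1, 1], [3, 7, 2, 15]))  # 3 - 7 + 2 + 15 = 13
```

Because no multiplier array is needed per cell, a compute unit of this kind is small enough to replicate into a large spatially parallel array.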

Keywords: accelerator, Convolutional Neural Network (CNN), lightweight neuron computation unit (NCU), spatially parallel acceleration array (MSNAP), branch convolution quantization

Abstract: Recently, Convolutional Neural Networks (CNN) have been widely used in image classification, object detection and recognition, and natural language processing. As the complexity and scale of CNNs increase, hardware realization faces growing challenges, particularly for embedded systems with low power and low latency requirements. Foremost among these is the high throughput required to process hundreds of filters in high-dimensional convolutions. Although highly parallel compute arrays can meet this throughput requirement, the energy consumed by the huge amount of data movement and the resources consumed by such arrays remain unacceptable for embedded application scenarios, not to mention the added control complexity they bring. This paper aims to optimize accelerator energy efficiency by analyzing the key factors that determine it. Focusing on reducing the system's energy and resource consumption, this paper proposes a high-performance CNN accelerator, called MSNAP. It is realized with a 128×128 highly parallel compute array and is based on a unified quantization method applied to the entire network at extremely low bit-widths. To ensure that the input images and the last layer use the same bit-width as the middle layers, we adopt thermometer codes and a Branch Convolution Quantization method, which allows all memory to be integrated on-chip and makes a large-scale array easier to implement. MSNAP features an efficient, lightweight computation neuron composed of 1152 multiplicative cells. In addition to compressing memory storage, the unified quantization method simplifies multiplications to multiplexers, which drastically reduces resource consumption. A weight-stationary, data-parallel dataflow and an optimized pooling layer improve data utilization, minimizing the energy MSNAP spends on data movement.
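A thermometer code represents an integer as a unary run of ones, so input pixels can be fed to the same 1-bit datapath as the internal layers. The sketch below shows the generic encoding only; the paper's exact input-encoding scheme is not spelled out in the abstract, so the code width and layout here are illustrative assumptions.

```python
def thermometer_encode(x, width):
    """Encode integer x in [0, width] as a thermometer code:
    the lowest x positions are 1, the rest 0 (e.g. 3 -> [1,1,1,0] for width=4)."""
    assert 0 <= x <= width
    return [1 if i < x else 0 for i in range(width)]

def thermometer_decode(code):
    """Decode by counting ones; a valid code is a run of 1s followed by 0s."""
    return sum(code)

print(thermometer_encode(3, 4))          # [1, 1, 1, 0]
print(thermometer_decode([1, 1, 1, 0]))  # 3
```

The appeal for a binary-weight datapath is that every bit of a thermometer-coded input is itself a 1-bit value, so the first layer needs no special higher-precision hardware.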
To evaluate the efficiency of this design and demonstrate the highly parallel compute array, a CNN chip was fabricated in 22 nm CMOS. Experiments show that at a frequency of 20 MHz, the chip offers a peak throughput of 10.54 Tera Operations Per Second (TOPS) and an energy efficiency of up to 64.317 TOPS/W; the latter amounts to a 5x improvement over the previous benchmark on CIFAR-10. Meanwhile, the design runs YOLO at 60 frames per second (FPS), fully meeting the needs of embedded applications.
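As a quick consistency check on the reported figures, the peak throughput and energy efficiency imply a per-cycle operation count and a peak power draw. These derived values are ours, computed from the numbers in the abstract, not stated in the paper:

```python
peak_tops = 10.54        # reported peak throughput, TOPS
freq_hz = 20e6           # reported system clock, 20 MHz
eff_tops_per_w = 64.317  # reported energy efficiency, TOPS/W

# Operations completed per clock cycle implied by the peak throughput
ops_per_cycle = peak_tops * 1e12 / freq_hz
print(f"{ops_per_cycle:,.0f} ops/cycle")  # ~527,000

# Power implied at peak throughput
power_w = peak_tops / eff_tops_per_w
print(f"{power_w * 1000:.0f} mW")         # ~164 mW
```

Roughly 527,000 operations per cycle is consistent with a massively parallel spatial array running at a low clock, and an implied peak power well under a watt fits the stated embedded target.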

Key words: accelerator, Convolutional Neural Network (CNN), lightweight neuron computation unit (NCU), spatially parallel acceleration array (MSNAP), branch convolution quantization