Analyzing Performance of Neural Networks in Training Phase (面向训练阶段的神经网络性能分析)

doi:10.3778/j.issn.1673-9418.1711010

Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (10): 1645-1657. DOI: 10.3778/j.issn.1673-9418.1711010

LI Jingjun+ (李景军), ZHANG Chen (张宸), CAO Qiang (曹强)

Key Laboratory of Information Storage System, Ministry of Education of China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China

Online: 2018-10-01 Published: 2018-10-08

Abstract

Neural networks have recently been deployed in many fields. As network models grow more complex, graphics processing units (GPUs) have been adopted for deep learning. Although GPUs deliver excellent performance in accelerating matrix computation, the diversity and complexity of network models leave GPU compute resources and memory underutilized during the compute-intensive training phase. This paper presents an experimental, fine-grained performance analysis of the neural network training phase. First, it divides the training process into six stages from the perspective of data flow and measures the latency of each stage. Then, it quantitatively analyzes the GPU compute efficiency and resource utilization of each layer from three points of view: GPU-accelerated libraries, neural network models, and batch size. Finally, it quantifies the GPU memory occupied by the weights and feature maps of each layer. The experiments and analysis show that: (1) the compute efficiency of cuDNN in convolution layers is 2 times that of cuBLAS; (2) the resource utilization of convolution layers is 50% higher than that of fully connected layers; (3) GPU memory utilization varies widely across layers, and the overall utilization is low, never exceeding 20% of the total memory space.

Key words: network models, graphics processing unit (GPU), resource utilization, compute efficiency, data flow, GPU-accelerated library
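The per-layer memory accounting mentioned in the abstract (weights versus feature maps) can be illustrated with a minimal sketch. This is not the authors' measurement method. It is plain arithmetic under stated assumptions: 32-bit floats, only weights and output feature maps counted (no gradients, optimizer state, or library workspace), and hypothetical helper names `conv_layer_memory` and `fc_layer_memory`. The layer shapes in the usage note are illustrative and are not taken from the paper.

```python
# Hypothetical helpers (not from the paper): estimate the bytes held by a
# layer's weights and by its output feature maps during training.

BYTES_PER_FLOAT = 4  # assumption: float32 tensors


def conv_layer_memory(batch, in_ch, in_h, in_w, out_ch, kernel, stride=1, pad=0):
    """Return (weight_bytes, feature_map_bytes) for a square-kernel conv layer."""
    # Standard output-size formula for a convolution without dilation.
    out_h = (in_h + 2 * pad - kernel) // stride + 1
    out_w = (in_w + 2 * pad - kernel) // stride + 1
    # One kernel x kernel filter per (input channel, output channel) pair,
    # plus one bias per output channel.
    weights = out_ch * in_ch * kernel * kernel + out_ch
    # One output activation per sample, output channel, and spatial position.
    feature_map = batch * out_ch * out_h * out_w
    return weights * BYTES_PER_FLOAT, feature_map * BYTES_PER_FLOAT


def fc_layer_memory(batch, in_features, out_features):
    """Return (weight_bytes, feature_map_bytes) for a fully connected layer."""
    weights = in_features * out_features + out_features  # matrix + biases
    feature_map = batch * out_features
    return weights * BYTES_PER_FLOAT, feature_map * BYTES_PER_FLOAT


if __name__ == "__main__":
    # Illustrative shapes only (assumed, not reported in the paper).
    conv_w, conv_f = conv_layer_memory(128, 3, 227, 227, 96, kernel=11, stride=4)
    fc_w, fc_f = fc_layer_memory(128, 9216, 4096)
    print(f"conv: weights {conv_w / 2**20:.2f} MiB, feature maps {conv_f / 2**20:.2f} MiB")
    print(f"fc:   weights {fc_w / 2**20:.2f} MiB, feature maps {fc_f / 2**20:.2f} MiB")
```

For these assumed shapes, the convolution layer's footprint is dominated by its feature maps, while the fully connected layer's footprint is dominated by its weights, which is one reason memory use can differ widely from layer to layer.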

LI Jingjun, ZHANG Chen, CAO Qiang. Analyzing Performance of Neural Networks in Training Phase[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(10): 1645-1657.
