计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2016, Vol. 10, Issue (9): 1221-1228. DOI: 10.3778/j.issn.1673-9418.1508026

• Academic Research •

融合有向图集与并行架构的HEVC去块滤波

揭月馨,刘  浩+   

  1. 东华大学 信息科学与技术学院,上海 201620
  • 出版日期:2016-09-01 发布日期:2016-09-05

HEVC Deblocking Filter with Directed Graphs and Parallel Architecture

JIE Yuexin, LIU Hao+   

  1. College of Information Science and Technology, Donghua University, Shanghai 201620, China
  • Online: 2016-09-01 Published: 2016-09-05

摘要: 针对高效视频编码(high efficiency video coding,HEVC)的去块滤波,现有文献并没有深入研究其算法层和平台层之间的跨层并行实现机制。基于算法层的有向无环图集(directed acyclic graph set,DAGS)和平台层的通用并行计算架构(compute unified device architecture,CUDA),针对HEVC去块滤波提出了一种跨层并行解码方案。所提方案通过分离图像帧的独立像素区域来减少对缓存的访问,并且降低了HEVC滤波过程中的时序依赖性,便于多核平台的并行处理。通过实验比较“串行”、“DAGS+多核CPU”、“DAGS+GPU”3种不同的HEVC去块滤波方案,结果表明,所提“DAGS+GPU”跨层并行滤波方案平均取得了11~24倍的解码加速比,在保证率失真性能相当的情况下显著减少了解码时间。

关键词: 去块滤波, 有向无环图集, 并行处理, 多核平台, 通用并行计算架构

Abstract: For the deblocking filter of high efficiency video coding (HEVC), the existing literature has not studied in depth the cross-layer parallel implementation between the algorithm layer and the platform layer. Based on the directed acyclic graph set (DAGS) at the algorithm layer and the compute unified device architecture (CUDA) at the platform layer, this paper proposes a cross-layer parallel decoding scheme for the HEVC deblocking filter. The proposed scheme separates the independent pixel regions of a picture frame to reduce cache accesses, and weakens the sequential dependence of the filtering process to facilitate parallel processing on multi-core platforms. Three implementations of the HEVC deblocking filter are compared experimentally: “serial”, “DAGS+multi-core CPU” and “DAGS+GPU”. The results show that the proposed “DAGS+GPU” scheme achieves an average decoding speedup of 11 to 24 times, significantly reducing the decoding time while maintaining comparable rate-distortion performance.

Key words: deblocking filter, directed acyclic graph set, parallel processing, multi-core platform, compute unified device architecture
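
The abstract does not spell out how the DAGS is constructed, so the following is only an illustrative sketch of the general idea of weakening sequential dependence: filtering tasks are placed in a dependency graph and grouped into levels whose members could be dispatched concurrently (for example to CPU threads or GPU thread blocks). It assumes the standard HEVC arrangement of an 8x8 edge grid in which vertical edges are filtered before horizontal edges. The function names `parallel_levels` and `deblocking_deps` and the task labels are hypothetical and are not taken from the paper.

```python
def parallel_levels(deps):
    """Group the tasks of a DAG into levels.

    deps maps each task to the set of tasks that must finish first.
    Tasks inside one level do not depend on each other, so they could
    run concurrently; levels themselves must run in order.
    """
    remaining = {task: set(pre) for task, pre in deps.items()}
    # Prerequisites that never appear as keys are treated as tasks too.
    for pre in list(deps.values()):
        for p in pre:
            remaining.setdefault(p, set())
    levels = []
    while remaining:
        ready = sorted(t for t, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("dependency cycle: not a DAG")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for pre in remaining.values():
            pre.difference_update(ready)
    return levels


def deblocking_deps(blocks_w, blocks_h):
    """Hypothetical task graph for a frame of blocks_w x blocks_h 8x8 blocks.

    ("V", bx, by): filter the vertical edge on the left side of block (bx, by).
    ("H", bx, by): filter the horizontal edge on the top side of block (bx, by).
    Horizontal filtering reads samples already modified by vertical filtering,
    so each H task waits for the V tasks that touch its two blocks.
    """
    deps = {}
    for by in range(blocks_h):
        for bx in range(1, blocks_w):          # no edge on the frame border
            deps[("V", bx, by)] = set()
    for by in range(1, blocks_h):
        for bx in range(blocks_w):
            pre = set()
            for y in (by - 1, by):             # blocks above and below the edge
                for x in (bx, bx + 1):         # their left and right vertical edges
                    if 1 <= x < blocks_w:
                        pre.add(("V", x, y))
            deps[("H", bx, by)] = pre
    return deps


if __name__ == "__main__":
    for i, level in enumerate(parallel_levels(deblocking_deps(3, 2))):
        print(f"level {i}: {len(level)} independent tasks")
```

In this toy graph every vertical-edge task lands in the first level and every horizontal-edge task in the second, which mirrors why the filter maps well onto many-core hardware: within a level, no task needs to wait for another.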