Journal of Frontiers of Computer Science and Technology, 2013, Vol. 7, Issue 8: 736-746. DOI: 10.3778/j.issn.1673-9418.1212008


VTD-XML Parsing Performance Optimization Based on Chip Multiprocessor

GUO Xianyong1,2, CHEN Xingyuan1, DENG Yadan2+ (+ corresponding author)

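1. Institute of Electronic Technology, the PLA University of Information Engineering, Zhengzhou 450001, China
2. Northern Institute of Information Technology, Beijing 100072, China
Publication date: 2013-08-01; Release date: 2013-08-06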


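Abstract: Targeting today's mainstream multi-core processors, this paper studies how to optimize the performance of XML (extensible markup language) document parsing within XML processing. Parsing performance is improved in two ways: by executing work concurrently in multiple threads, and by improving each thread's memory-access performance. The main contributions are as follows. First, a multithreaded XML document parsing framework is presented: it scans the XML document with multiple threads and uses a read-ahead (prefetching) thread to improve the memory-access performance of the parsing thread. Second, an XML document data partitioning algorithm and a data fusion algorithm are presented; together they guarantee that the framework's scanning results are correct, and they add very little cost themselves. Third, a cost analysis of the framework is given, and the framework's performance is then optimized on the basis of that analysis. Finally, the multithreaded execution framework is implemented on top of the open-source XML processing engine VTD-XML (virtual token descriptor XML), and XML document parsing performance is measured. The experimental results show that the multithreaded XML document parsing framework makes full use of the computing resources of multi-core processors and effectively improves both the threads' memory-access performance and XML document parsing performance.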

Keywords: VTD-XML (virtual token descriptor XML), multi-core processor, parsing performance optimization, multithreading

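The paper's framework partitions the XML document, scans the pieces in parallel threads, and fuses the per-thread results, but the algorithms themselves are not given here. As a rough illustration only, and not the authors' method, the following Python sketch shows one simple way such a pipeline can be organized. The functions `partition`, `_tag_end`, `scan`, and `parse_parallel` and the token layout `(kind, offset, length, depth)` are hypothetical. The sketch assumes well-formed XML with no XML declaration, comments, CDATA sections, processing instructions, or DOCTYPE, so every raw `<` begins a tag and a chunk boundary placed at a `<` never splits a tag or a text run. Because of CPython's global interpreter lock, these threads will not speed up pure-Python scanning; the point is to show why a fusion step is needed: an element's depth is unknown inside a chunk until the net depth change of all earlier chunks is added.

```python
"""Hypothetical sketch of partition -> parallel scan -> fusion for XML.

Assumptions (not from the paper): the input is well-formed XML with no
XML declaration, comments, CDATA sections, processing instructions, or
DOCTYPE, so every raw '<' in the buffer starts a start or end tag.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# (kind, absolute offset, length, depth); depth is chunk-relative until fused.
Token = Tuple[str, int, int, int]


def partition(buf: bytes, n: int) -> List[Tuple[int, int]]:
    """Split buf into at most n contiguous [start, end) ranges.

    Every cut is moved forward to the next '<', so no tag or text run
    is split across two chunks.
    """
    bounds = [0]
    for i in range(1, n):
        cut = buf.find(b"<", max(bounds[-1], i * len(buf) // n))
        if cut == -1:
            break
        if cut > bounds[-1]:
            bounds.append(cut)
    bounds.append(len(buf))
    return list(zip(bounds, bounds[1:]))


def _tag_end(buf: bytes, pos: int, end: int) -> int:
    """Return the index just past the '>' that closes the tag at pos.

    Quoted attribute values are skipped, since '>' may appear in them.
    """
    quote = 0
    for i in range(pos + 1, end):
        c = buf[i]
        if quote:
            if c == quote:
                quote = 0
        elif c in b"\"'":
            quote = c
        elif c == ord(">"):
            return i + 1
    raise ValueError(f"unterminated tag at offset {pos}")


def scan(buf: bytes, start: int, end: int) -> Tuple[List[Token], int]:
    """Tokenize buf[start:end] sequentially.

    Returns the tokens (depths relative to the chunk start) and the
    chunk's net depth change, which the fusion step needs.
    """
    tokens: List[Token] = []
    pos, depth = start, 0
    while pos < end:
        if buf[pos] == ord("<"):
            stop = _tag_end(buf, pos, end)
            if buf[pos + 1] == ord("/"):
                depth -= 1
                tokens.append(("END", pos, stop - pos, depth))
            elif buf[stop - 2] == ord("/"):
                tokens.append(("EMPTY", pos, stop - pos, depth))
            else:
                tokens.append(("START", pos, stop - pos, depth))
                depth += 1
        else:
            stop = buf.find(b"<", pos, end)
            stop = end if stop == -1 else stop
            tokens.append(("TEXT", pos, stop - pos, depth))
        pos = stop
    return tokens, depth


def parse_parallel(buf: bytes, n_threads: int) -> List[Token]:
    """Partition, scan chunks in a thread pool, then fuse in document order."""
    ranges = partition(buf, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns results in the order of ranges, i.e. document order.
        results = list(pool.map(lambda r: scan(buf, *r), ranges))
    fused: List[Token] = []
    base = 0  # absolute depth at the start of the current chunk
    for tokens, net in results:
        fused.extend((k, off, ln, d + base) for k, off, ln, d in tokens)
        base += net
    return fused
```

For example, with `doc = b'<a x="1>2"><b>hi</b><c/><b>yo</b></a>'`, `partition(doc, 4)` returns `[(0, 11), (11, 20), (20, 29), (29, 37)]`, and `parse_parallel(doc, 4)` returns the same token list as a single-threaded `parse_parallel(doc, 1)`, because the last chunk's locally negative depths are corrected by the fused base depth. The `>` inside the attribute value is skipped by `_tag_end`.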