Journal of Frontiers of Computer Science and Technology, 2010, Vol. 4, Issue (2): 145-152. DOI: 10.3778/j.issn.1673-9418.2010.02.006

• Academic Research •

XSLC: Layered Coding and Query-Oriented XML Data Compression Algorithm

FU Qiang1,2+, WANG Tengjiao1,2, LI Hongyan1,3, YANG Dongqing1,2, TANG Shiwei1,3   

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  2. Key Laboratory of High Confidence Software Technologies, MOE, Peking University, Beijing 100871, China
  3. Key Laboratory of Machine Perception, MOE, Peking University, Beijing 100871, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-02-15 Published: 2010-02-15
  • Contact: FU Qiang

Abstract: XML (extensible markup language) documents are widely used as a data exchange format between applications, and the compression of XML data has become a new field of research. This paper proposes XSLC (XML stream layered-coding compression), a method that compresses and decompresses XML streams in real time. When a DTD (document type definition) is available, XSLC scans it in advance to analyze the data model, encodes each element at the child-element level according to the parent-child relationships between elements, compresses the data content according to its type, and supports query operations directly on the compressed files. Because the data needs to be scanned only once, the whole process can be carried out in an XML data stream environment. Experimental results show that XSLC outperforms other methods in compression ratio and compression efficiency.
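
The abstract does not give implementation details, so the following is only a minimal Python sketch of the general idea, not the authors' XSLC implementation. It assumes a toy DTD and uses hypothetical helper names (`child_codes`, `encode_path`, `encode_value`, `decode_value`): each child element receives a small integer code that is local to its parent, and text values are packed according to a guessed type (integer or string).

```python
import re
import struct

# Hypothetical toy DTD, used only for illustration.
DTD = """
<!ELEMENT library (book+)>
<!ELEMENT book (title, author+, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
"""


def child_codes(dtd_text):
    """Map each parent element to {child name: code local to that parent}."""
    codes = {}
    for parent, model in re.findall(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)", dtd_text):
        # Pull element names out of the content model; skip the #PCDATA keyword.
        names = [n for n in re.findall(r"[A-Za-z_][\w.-]*", model) if n != "PCDATA"]
        # dict.fromkeys removes duplicates while keeping declaration order.
        codes[parent] = {name: i for i, name in enumerate(dict.fromkeys(names))}
    return codes


def encode_path(path, codes):
    """Encode a root-to-node tag path as one parent-local code per level."""
    return [codes[parent][child] for parent, child in zip(path, path[1:])]


def encode_value(text):
    """Pack canonical integers of up to 9 digits as 4 bytes, other text as UTF-8."""
    if re.fullmatch(r"-?[1-9]\d{0,8}|0", text):
        return b"I" + struct.pack(">i", int(text))
    return b"S" + text.encode("utf-8")


def decode_value(blob):
    """Invert encode_value using the one-byte type marker."""
    if blob[:1] == b"I":
        return str(struct.unpack(">i", blob[1:])[0])
    return blob[1:].decode("utf-8")
```

With this toy DTD, `encode_path(["library", "book", "year"], child_codes(DTD))` returns `[0, 2]`. A path query could be translated into the same code sequence and compared against the encoded stream without restoring the tag names, which is one plausible reading of how queries on compressed data might work.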

Key words: extensible markup language (XML), compression, document type definition (DTD), data stream
