利用关键因子过滤的正则表达式匹配算法

doi:10.3778/j.issn.1673-9418.1507070

计算机科学与探索 ›› 2016, Vol. 10 ›› Issue (3): 326-337.DOI: 10.3778/j.issn.1673-9418.1507070

利用关键因子过滤的正则表达式匹配算法

邱涛+，王斌，杨晓春

东北大学信息科学与工程学院，沈阳 110819

出版日期:2016-03-01 发布日期:2016-03-11

Regular Expression Matching Algorithm Using Pivotal Factors

QIU Tao+, WANG Bin, YANG Xiaochun

College of Information Science and Engineering, Northeastern University, Shenyang 110819, China

Online:2016-03-01 Published:2016-03-11

摘要/Abstract

摘要： 正则表达式（regular expression，RE）是一种能够提供复杂查询能力的技术，其通过特定的语法结构来描述一类文本的共同特征。正则表达式强大的表达能力和简洁的语法，使得其在各个领域都被广泛地应用。为了提高正则表达式的匹配效率，提出了一种利用关键因子进行过滤的匹配技术，关键因子指的是在文本中具有最小出现频率的有效过滤因子。由于实际文本中字符并不是均匀分布的，子串在文本中出现频率的差异将影响过滤因子的过滤能力。通过考虑有效过滤因子在文本中出现的频率，关键因子能获得更好的过滤能力。提出了利用正则表达式的划分来求取关键因子的算法，进而通过关键因子来过滤候选位置。通过在真实的蛋白序列和英文文本上进行实验，说明了基于关键因子过滤的匹配方法可以有效地提升正则表达式的匹配性能。

关键词: 正则表达式, 过滤策略, 算法性能, 关键因子

Abstract: Regular expression (RE) can describe complicated query, it depicts the common features of a set of text by the specific syntax. Since regular expression has the strong ability of expression and succinct syntax, it has been applied in many applications. In order to improve the matching performance of RE, this paper proposes a novel technique that pruning false negatives by utilizing pivotal factors, which are the valid filters with the least occurrences in text. Since the characters in text are not well-distributed, the difference of occurrences count of substrings will affect the pruning power of filters. Pivotal factors can achieve better pruning power by considering the occurrences count in text, which can also improve the matching performance of RE. This paper proposes the algorithm that computing pivotal factors by the partitions of RE, then further prunes the candidate positions by pivotal factors. Comprehensive experiments are conducted on real protein sequences and English text, the results show the significant performance improvement when utilizing the technique of pivotal factors.

Key words: regular expression, filtering strategy, performance, pivotal factors

邱涛，王斌，杨晓春. 利用关键因子过滤的正则表达式匹配算法[J]. 计算机科学与探索, 2016, 10(3): 326-337.

QIU Tao, WANG Bin, YANG Xiaochun. Regular Expression Matching Algorithm Using Pivotal Factors[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(3): 326-337.

HTML			PDF

最新录用	在线预览	正式出版	最新录用	在线预览	正式出版
0	0	0	0	0	95

来源	本网站	其他网站

次数	88	7
比例	93%	7%

摘要

205

最新录用	在线预览	正式出版

0	0	205

	来源	本网站

	次数	205
	比例	100%

利用关键因子过滤的正则表达式匹配算法

Regular Expression Matching Algorithm Using Pivotal Factors

PDF

可视化

被引次数

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 1

编辑推荐 0

Metrics