计算机科学与探索 ›› 2010, Vol. 4 ›› Issue (11): 984-995.DOI: 10.3778/j.issn.1673-9418.2010.11.003

• 学术研究 • 上一篇    下一篇

支持带有通配符的字符串匹配算法*

运正佳, 李轶男, 杨晓春+   

  1. 东北大学 信息科学与工程学院, 沈阳 110819
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2010-11-01 发布日期:2010-11-01
  • 通讯作者: 杨晓春

An Algorithm for Matching Strings with Wildcards*

YUN Zhengjia, LI Yinan, YANG Xiaochun+   

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2010-11-01 Published:2010-11-01
  • Contact: YANG Xiaochun

摘要: 研究了查询字符串中含有通配符“*”以及“?”两种情况下的字符串匹配问题, 其中,“*”代表任意长度的字符串,“?”代表字母表中任意一个字符。由于gram索引结构在空间大小以及查询效率上的优势, 将gram索引结构用于带通配符的字符串匹配问题。通过将带有通配符的查询字符串分解为若干不含通配符的查询片段, 成功地将带有通配符的复杂查询问题转化为不含通配符的简单精确子串匹配问题。同时在片段查询过程中运用长度过滤、位置过滤以及计数过滤等方法来提高查询速度。

关键词: 通配符, 字符串匹配, q-gram索引

Abstract: This paper focuses on the problem of strings matching with wildcards “*” and “?” in the query, where “*” matches any sequence and “?” matches any character. Since gram based index structure has advantages in both space and searching time, it proposes an algorithm to solve the problem of strings matching with wildcards based on gram index structure. The query string with wildcards is divided into several query segments without any wildcards. So the algorithm successfully changes this complex problem to a simple exact substring matching problem. The algo-rithm takes advantage of length filter, position filter, and count filter to speed up the query process

Key words: wildcard, matching strings, q-gram index

中图分类号: