Journal of Frontiers of Computer Science and Technology ›› 2013, Vol. 7 ›› Issue (1): 83-91.DOI: 10.3778/j.issn.1673-9418.1207018

Previous Articles     Next Articles

Heuristic Algorithm for Approximate Pattern Matching Problem

HUANG Guolin, GUO Dan, HU Xuegang   

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
  • Online:2013-01-01 Published:2012-12-29

求解近似模式匹配的启发式算法

黄国林,郭  丹+,胡学钢   

  1. 合肥工业大学 计算机与信息学院,合肥 230009

Abstract:

This paper studies APMWL (approximate pattern matching with wildcards and length constraint) problem. In order to avoid the exponential growth of matching patterns, the paper introduces the one_off condition, and proposes a heuristic algorithm BAPM (backward approximate pattern matching) which constructing the edit distance matrix backwardly. Based on one_off condition, flexible wildcards and length constraint, BAPM can simultaneously process three edit operations, namely insertion, replacement and deletion. The experimental results show that BAPM has a significant advantage on matching solutions compared with Sail-Approx, and the average improvement rate of matching is up to 18.99%.

Key words: approximate pattern matching, wildcards, length constraints, edit distance matrix, one_off condition

摘要:

研究了带有灵活通配符和长度约束的近似模式匹配问题(approximate pattern matching with wildcards and length constraint,APMWL);为避免文本字符重复使用造成解的指数级增长,引入了一次性使用原则one_off条件,提出了一种后向构造编辑距离矩阵的BAPM(backward approximate pattern matching)算法。该算法在one_off条件、灵活通配符和长度约束条件的基础上,可同时处理插入、替换和删除三种编辑操作。与同类算法Sail_Approx进行实验对比,结果表明BAPM算法获取解的平均增长率可达18.99%,具备良好的解优势。

关键词: 近似匹配, 通配符, 长度约束, 编辑距离矩阵, one_off条件