计算机科学与探索 ›› 2013, Vol. 7 ›› Issue (1): 83-91.DOI: 10.3778/j.issn.1673-9418.1207018

• 学术研究 • 上一篇    下一篇

求解近似模式匹配的启发式算法

黄国林,郭  丹+,胡学钢   

  1. 合肥工业大学 计算机与信息学院,合肥 230009
  • 出版日期:2013-01-01 发布日期:2012-12-29

Heuristic Algorithm for Approximate Pattern Matching Problem

HUANG Guolin, GUO Dan, HU Xuegang   

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
  • Online:2013-01-01 Published:2012-12-29

摘要:

研究了带有灵活通配符和长度约束的近似模式匹配问题(approximate pattern matching with wildcards and length constraint,APMWL);为避免文本字符重复使用造成解的指数级增长,引入了一次性使用原则one_off条件,提出了一种后向构造编辑距离矩阵的BAPM(backward approximate pattern matching)算法。该算法在one_off条件、灵活通配符和长度约束条件的基础上,可同时处理插入、替换和删除三种编辑操作。与同类算法Sail_Approx进行实验对比,结果表明BAPM算法获取解的平均增长率可达18.99%,具备良好的解优势。

关键词: 近似匹配, 通配符, 长度约束, 编辑距离矩阵, one_off条件

Abstract:

This paper studies APMWL (approximate pattern matching with wildcards and length constraint) problem. In order to avoid the exponential growth of matching patterns, the paper introduces the one_off condition, and proposes a heuristic algorithm BAPM (backward approximate pattern matching) which constructing the edit distance matrix backwardly. Based on one_off condition, flexible wildcards and length constraint, BAPM can simultaneously process three edit operations, namely insertion, replacement and deletion. The experimental results show that BAPM has a significant advantage on matching solutions compared with Sail-Approx, and the average improvement rate of matching is up to 18.99%.

Key words: approximate pattern matching, wildcards, length constraints, edit distance matrix, one_off condition