Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (8): 1017-1024. DOI: 10.3778/j.issn.1673-9418.1310030

• Artificial Intelligence and Pattern Recognition •

音形结合的方块苗文输入编码方案研究

莫礼平1,2+,曾水玲1,周恺卿3   

  1. 吉首大学 信息科学与工程学院,湖南 吉首 416000
    2. 中南大学 信息科学与工程学院,长沙 410083
    3. 马来西亚理工大学 计算学院,马来西亚 柔佛州 士古来 81310
  • 出版日期:2014-08-01 发布日期:2014-08-07

Research on Encoding Scheme Combined with Tone and Shape for Inputting Square Hmong Language Characters

MO Liping1,2+, ZENG Shuiling1, ZHOU Kaiqing3   

  1. College of Information Science and Engineering, Jishou University, Jishou, Hunan 416000, China
    2. Institute of Information Science and Engineering, Central South University, Changsha 410083, China
    3. Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor 81310, Malaysia
  • Online:2014-08-01 Published:2014-08-07

摘要: 根据方块苗文的造字原理和字形拓扑结构特征,提出了一种由构件汉语拼音的部分字母决定音码,由合体字结构类型决定形码,按照“先音后形”的次序生成编码序列的方块苗文字形输入编码方案,并使用上下文无关文法对方案进行了形式化描述,给出了方块苗文拆分取码的方法。测试实验表明,该方案具有码长短、重码率低的特点,基于该方案的输入法简捷快速、易学易用,能够解决从方块苗文字库中快速调出所需字形的问题。

关键词: 方块苗文, 字形, 拓扑结构, 输入法, 编码方案

Abstract: Based on the character-formation principles and the glyph topology of square Hmong characters, this paper proposes a glyph input encoding scheme that considers both the sound of a character's components and the shape of the compound character. Selected letters of each component's Chinese pinyin determine the tone (sound) code, and the structure type of the compound character determines the shape code. The input code sequence is generated in the order “tone first, shape last”. The scheme is described formally with a context-free grammar, and a method for splitting compound characters and extracting their codes is given. Test results indicate that the scheme yields short codes and a low duplicate-code rate. An input method based on the scheme is simple, fast, and easy to learn and use, and it can solve the problem of quickly retrieving a desired glyph from a square Hmong font library.

Key words: square Hmong language characters, glyph, topology structure, input method, encoding scheme
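
The “tone first, shape last” idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which pinyin letters are taken or which symbols stand for each structure type, so everything concrete here is an assumption made for illustration: the `Glyph` record, the rule of taking the first two letters of each component's pinyin, the `SHAPE_CODES` table, the sample pinyin strings, and the definition of the duplicate-code rate. It is not the authors' actual scheme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical shape codes: the abstract only says that the structure type
# of the compound character determines the shape code.
SHAPE_CODES = {"left-right": "a", "top-bottom": "b", "enclosure": "c"}


@dataclass(frozen=True)
class Glyph:
    """A compound character described by its components (illustrative only)."""
    component_pinyin: Tuple[str, ...]  # pinyin of each component, in order
    structure: str                     # key into SHAPE_CODES


def tone_code(pinyin: str, letters: int = 2) -> str:
    """Assumed rule: take the first `letters` letters of a component's pinyin."""
    return pinyin[:letters].lower()


def encode(glyph: Glyph) -> str:
    """Concatenate the tone codes first, then append the shape code."""
    sound = "".join(tone_code(p) for p in glyph.component_pinyin)
    return sound + SHAPE_CODES[glyph.structure]


def duplicate_rate(glyphs: List[Glyph]) -> float:
    """One possible definition: share of glyphs whose code is not unique."""
    codes = [encode(g) for g in glyphs]
    counts = Counter(codes)
    clashing = sum(1 for c in codes if counts[c] > 1)
    return clashing / len(codes) if codes else 0.0


if __name__ == "__main__":
    sample = [
        Glyph(("mu", "niao"), "left-right"),
        Glyph(("mu", "niu"), "left-right"),
        Glyph(("shan", "shui"), "top-bottom"),
    ]
    for g in sample:
        print(g.component_pinyin, g.structure, encode(g))
    print("duplicate rate:", duplicate_rate(sample))
```

Under these assumed rules the first two sample glyphs collide on the same code, which is the kind of clash a real scheme has to keep rare through its choice of letters and shape codes.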