Journal of Frontiers of Computer Science and Technology, 2015, Vol. 9, Issue (7): 887-896. DOI: 10.3778/j.issn.1673-9418.1410051

• Academic Research •

Entropy-Beta: A Strategy for Publishing Questions in Schema Matching via Crowdsourcing

HUANG Dongmei, XU Kun+, ZHANG Minghua

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
  • Online: 2015-07-01 Published: 2015-07-07

Abstract: Schema matching is a fundamental problem in data management. As data sets keep growing, automated schema matching tools can save a great deal of matching time. However, the results these tools produce carry uncertainty that is difficult to eliminate. This paper proposes Entropy-Beta, a strategy for publishing questions in crowdsourced schema matching. In the question-publishing phase, the strategy optimizes the publishing process in order to resolve schema matching uncertainty more efficiently. On this basis, it also provides a method for evaluating the accuracy of the answers given by crowdsourcing workers, which improves the accuracy with which the uncertainty is resolved. Finally, experimental results show that the Entropy-Beta publishing strategy is more efficient and that, under a limited cost, it improves the accuracy of the resolved matchings.

Key words: crowdsourcing, schema matching, entropy, Beta distribution
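
The abstract names two ingredients, entropy and the Beta distribution, without giving formulas. The following is only a minimal illustrative sketch of how such quantities are commonly computed, not the paper's actual algorithm. It assumes that each attribute has a list of candidate-match probabilities from an automated matcher, that the question with the highest Shannon entropy is published first, and that a worker's accuracy is estimated as the mean of a Beta posterior over correct and wrong answers. All function names, attribute names, and numbers are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a list of candidate-match weights.

    The weights are normalized first, so they need not sum to 1,
    but their sum must be positive.
    """
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


def pick_most_uncertain(candidates):
    """Return the attribute whose candidate matches have the highest entropy.

    Assumption: the most uncertain attribute is the one worth asking
    the crowd about first.
    """
    return max(candidates, key=lambda attr: shannon_entropy(candidates[attr]))


def beta_mean_accuracy(correct, wrong, alpha=1.0, beta=1.0):
    """Posterior mean of a worker's accuracy under a Beta(alpha, beta) prior.

    With the default uniform prior Beta(1, 1), this is Laplace smoothing.
    """
    return (alpha + correct) / (alpha + beta + correct + wrong)


# Hypothetical matcher output: attribute -> probabilities of its candidate matches.
candidates = {
    "cust_name": [0.9, 0.1],
    "addr": [0.5, 0.5],
    "phone": [0.7, 0.2, 0.1],
}

if __name__ == "__main__":
    print(pick_most_uncertain(candidates))
    print(beta_mean_accuracy(correct=8, wrong=2))
```

In this sketch the Beta prior keeps the accuracy estimate of a worker with few answers away from the extremes 0 and 1, which is one common reason to prefer it over a raw correct-answer ratio.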