计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2022, Vol. 16, Issue (12): 2678-2694. DOI: 10.3778/j.issn.1673-9418.2207104

• Survey · Exploration

Research Progress of RFID Data Cleaning Technology

WANG Jian1,+, LE Jiajin2

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  2. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Received: 2022-06-06  Revised: 2022-08-10  Online: 2022-12-01  Published: 2022-12-16
  • Corresponding author (+): E-mail: goodjian121@126.com
  • About the authors: WANG Jian, born in 1981, male, a native of Anyang, Henan, Ph.D., M.S. supervisor. His research interests include big data analysis, privacy preservation, intelligent information processing, etc.
    LE Jiajin, born in 1951, male, a native of Shanghai, professor, Ph.D. supervisor. His research interests include data science management, software engineering theory and practice, etc.
  • Supported by:
    National Natural Science Foundation of China (61702161); Science and Technology Research Project of Henan Provincial Science and Technology Department (222102210289); Science and Technology Research Project of Henan Provincial Science and Technology Department (212102210386)


Abstract:

Radio frequency identification (RFID) is an automatic identification technology that relies on radio transponders, called RFID tags, to store and retrieve data quickly. Because RFID tags communicate with readers without direct contact, a large amount of data can be collected in a short time. However, the collected data also suffer from problems such as false negative readings (missed reads), false positive readings (extra reads), duplicated readings, and out-of-order readings. How to clean such large-scale RFID data efficiently and within a short time has therefore become an important research topic in the database field. This paper surveys existing RFID data cleaning techniques. First, it gives the relevant definitions and descriptions of RFID systems and of the RFID data cleaning problem, and lists typical datasets and evaluation criteria. It then compares and summarizes existing RFID data cleaning work in detail in terms of classification, subcategories, basic ideas, advantages, limitations, and application scenarios, and also compares and analyzes related application systems. Next, it compares and summarizes in detail the existing studies on the key problems of false negative readings, false positive readings, duplicated readings, and out-of-order readings. Finally, it proposes five research directions in RFID data cleaning that deserve attention: the construction of raw RFID data and benchmark datasets, cleaning strategies for encrypted and privacy-protected data, the accuracy of data collection, the timeliness of cleaning results, and scene self-learning.

Key words: radio frequency identification (RFID), data cleaning, false negative reading, false positive reading, duplicated reading, out-of-order reading, system application
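The abstract names four kinds of dirty readings. Purely as an illustration of what each one means, the sketch below applies one naive fixed-window rule per problem. The `Reading` record, the discrete "epoch" time model, and the parameters `window`, `min_support`, and `max_gap` are hypothetical assumptions; this is not a specific method from the surveyed literature.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Reading(NamedTuple):
    """Hypothetical raw RFID record: tag ID, reader ID, discrete read epoch."""
    tag: str
    reader: str
    epoch: int


def reorder(readings: List[Reading]) -> List[Reading]:
    """Out-of-order readings: restore time order by sorting on the epoch."""
    return sorted(readings, key=lambda r: (r.epoch, r.reader, r.tag))


def deduplicate(readings: List[Reading]) -> List[Reading]:
    """Duplicated readings: keep the first copy of each (tag, reader, epoch)."""
    seen = set()
    out = []
    for r in readings:
        if r not in seen:  # NamedTuple instances are hashable
            seen.add(r)
            out.append(r)
    return out


def drop_false_positives(readings: List[Reading], window: int,
                         min_support: int) -> List[Reading]:
    """False positive readings: keep a reading only if the same reader saw the
    same tag at least `min_support` times (itself included) within
    +/- `window` epochs, so isolated spurious reads are discarded."""
    epochs = defaultdict(list)
    for r in readings:
        epochs[(r.tag, r.reader)].append(r.epoch)
    return [
        r for r in readings
        if sum(abs(e - r.epoch) <= window for e in epochs[(r.tag, r.reader)])
        >= min_support
    ]


def fill_false_negatives(readings: List[Reading], max_gap: int) -> List[Reading]:
    """False negative readings: if a tag vanishes from a reader for at most
    `max_gap` consecutive epochs and then reappears, assume it was present
    and insert the missing readings."""
    epochs = defaultdict(set)
    for r in readings:
        epochs[(r.tag, r.reader)].add(r.epoch)
    filled = list(readings)
    for (tag, reader), eps in epochs.items():
        ordered = sorted(eps)
        for a, b in zip(ordered, ordered[1:]):
            if 1 <= b - a - 1 <= max_gap:  # b - a - 1 = number of missing epochs
                filled.extend(Reading(tag, reader, e) for e in range(a + 1, b))
    return reorder(filled)


def clean(raw: List[Reading], window: int = 2, min_support: int = 2,
          max_gap: int = 2) -> List[Reading]:
    """Naive pipeline: reorder, deduplicate, filter false positives, fill gaps."""
    readings = deduplicate(reorder(raw))
    readings = drop_false_positives(readings, window, min_support)
    return fill_false_negatives(readings, max_gap)


# Toy input: T1 is missing at epoch 2 (a false negative).
raw = [
    Reading("T1", "R1", 3),  # arrives out of order
    Reading("T1", "R1", 0),
    Reading("T1", "R1", 1),
    Reading("T1", "R1", 1),  # duplicate
    Reading("T2", "R1", 5),  # isolated read, treated as a false positive
]
print([(r.tag, r.epoch) for r in clean(raw)])
# → [('T1', 0), ('T1', 1), ('T1', 2), ('T1', 3)]
```

In this sketch false positives are filtered before gaps are filled, so a single spurious read cannot be used to bridge a gap. A fixed window is a simplification: a larger `max_gap` recovers more missed reads but can also mask a tag that genuinely left the reader's range.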
