计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2017, Vol. 11, Issue (2): 171-184. DOI: 10.3778/j.issn.1673-9418.1606010

• Surveys and Frontiers •

Survey on Interactive Data Exploration

WANG Mengxiang (王蒙湘), LI Fangfang (李芳芳), GU Yu (谷峪), YU Ge+ (于戈)

  1. Department of Computer Science, College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Online: 2017-02-01 Published: 2017-02-10

Abstract:

Large data sets have grown beyond the terabyte and petabyte scale, and existing technology can collect and store information at that volume. Although database management systems have steadily improved their ability to offer complex and varied data management functions, their query tools cannot meet the demands of big data, so precisely understanding and exploring such large data sets remains a major challenge. Interactive data exploration (IDE) emphasizes interaction, exploration and discovery, enabling users to find the information they need in massive data more accurately and at minimal cost. This paper first introduces IDE and its application background, summarizes the general exploration model and the characteristics of IDE, and analyzes the current state of query recommendation techniques and query result optimization techniques in IDE. It then analyzes and compares IDE prototype systems. Finally, it summarizes IDE techniques and discusses their future prospects.

Key words: interactive data exploration, query recommendation, query result optimization, user feedback, machine learning