Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (1): 39-48. DOI: 10.3778/j.issn.1673-9418.1609003

• Database Technology •

Design of Data Platform and Application in Data Competition

WANG Yongkun1, JIN Yaohui1,2+

  1. Network and Information Center, Shanghai Jiao Tong University, Shanghai 200240, China
  2. State Key Lab of Advanced Optical Communication Systems and Networks, Shanghai Jiao Tong University, Shanghai 200240, China
  • Online: 2018-01-01 Published: 2018-01-09

Abstract: The rapid development of open source software for big data makes it possible for small and medium-sized businesses and organizations to build and operate their own data platforms. However, many challenges remain in practice, such as how to share open data together with the computing on it, and how to keep users' source code under control so that it cannot harm the platform or the data. This paper presents a data platform architecture, built on open source software, that lets users share data and computing in one place while their source code is hosted under version control and reviewed for safety. A production environment was built from this design on real hardware, and basic benchmarks verify that the platform is usable. The platform has been opened to the public and successfully supported the data sharing and computing needs of a large number of users in the Shanghai Open Data Apps (SODA) competition.

Key words: big data, big data analysis, big data processing, data platform, Hadoop