Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (4): 438-445. DOI: 10.3778/j.issn.1673-9418.1311014

• System Software and Software Engineering •

SFFS: Low-Latency Small-File-Oriented Distributed File System

WANG Lujun, LONG Xiang, WU Xingbo, WANG Lei+

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Online: 2014-04-01 Published: 2014-04-03

Abstract: Network services such as social networking sites and e-commerce have grown rapidly, and they need to store huge numbers of small files such as pictures, music files, and microblog texts. Traditional distributed storage systems such as HDFS (Hadoop distributed file system) are designed for large files; when they store small files they incur excessive metadata overhead and high access latency, so they are poorly suited to workloads dominated by massive numbers of small files. This paper analyzes the architecture and read/write flow of TFS (Taobao file system) and finds that TFS must establish at least three network connections for every read or write, which increases read and write latency. To address the challenges of storing massive numbers of small files and the problems found in TFS, this paper proposes a new low-latency, highly available distributed storage scheme for small files and implements it as the distributed file system SFFS (small-file file system). Performance tests show that, compared with TFS, SFFS reduces write latency by 76.6% and read latency by about 10%. An analysis of the system structure further shows that the central node in SFFS carries a lighter load and recovers from failure faster than in TFS, which gives SFFS an advantage in availability.

Key words: small file, low latency and high availability, distributed storage