计算机科学与探索 ›› 2018, Vol. 12 ›› Issue (8): 1339-1349. DOI: 10.3778/j.issn.1673-9418.1708035

• Theory and Algorithms •

间隔执行的异步副本放置策略

谢纪东+,武继刚   

  1. 广东工业大学 计算机学院,广州 510006
  • 出版日期:2018-08-01 发布日期:2018-08-09

Asynchronous Round-Based Strategy for Replica Placement

XIE Jidong+, WU Jigang   

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online:2018-08-01 Published:2018-08-09

摘要: 副本技术旨在通过预测用户获取数据行为并在适当的地点放置副本来降低网络延迟以及减少网络带宽消耗。副本技术已经广泛用在了数据网格、云计算中。副本技术主要有两大过程:第一个过程通过收集用户对文件的请求来选择最合适的文件作为候选副本,第二个过程通过计算资源节点位置、容量、带宽等因素来决定将候选副本放置到哪一个资源节点,以使整个系统所产生的延迟和带宽消耗最少。通过重新定义流行度,提高了对大文件造成延迟的敏感性。采用分而治之的思想设计全局算法和局部算法,局部算法通过异步机制将文件访问记录传递给全局算法进行全局流行度计算,然后局部算法综合全局流行度信息计算得到最合适的候选副本,最后将候选副本放置到最合适的资源节点。通过模拟实验,利用高斯分布、幂律分布来模拟用户文件请求行为偏好,验证了所提出的策略相比IPFRF(improved popular file replicate first)算法,在一定程度上降低了平均文件延迟和平均带宽消耗。

关键词: 数据网格, 副本放置, 文件流行度, 星型拓扑

Abstract: Data replication reduces network latency and bandwidth consumption by predicting user access preferences and placing file replicas in advance on the resource nodes nearest to users; it has been widely used in data grids and cloud computing. Data replication involves two important steps: file selection and replica placement. File selection predicts user preferences and selects the most popular files as replica candidates. Replica placement places the replicas on the most suitable nodes in the data grid, taking the location, capacity and bandwidth of each resource node into account so as to minimize the latency and bandwidth consumption of the entire system. By redefining file popularity, this paper improves the algorithm's sensitivity to the latency caused by large files. It then designs an asynchronous algorithm, based on a divide-and-conquer approach, that consists of a local algorithm and a global algorithm. The local algorithm sends file access records to the global algorithm through an asynchronous mechanism so that the global file popularity can be calculated. The local algorithm then combines the global popularity information with the redefined file popularity to select the replica candidates. Finally, the replica candidates are placed on the most suitable resource nodes. Simulation results show that, under access patterns following Gaussian and power-law distributions, the proposed strategy reduces the average file latency and the average bandwidth consumption to a certain extent compared with the IPFRF (improved popular file replicate first) algorithm.

Key words: data grid, replica placement, file popularity, star topology
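
The local/global split described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the redefined popularity formula, the blending rule, or the placement criterion, so the size-weighted `popularity` function, the `alpha` blending weight, the "highest bandwidth among nodes with enough free space" rule, and all class and function names below are assumptions made only for illustration.

```python
import queue
from collections import Counter


def popularity(accesses, size_mb):
    # ASSUMPTION: weight the access count by file size so that large files,
    # which cause more latency, rank higher. The paper's formula is not
    # stated in the abstract.
    return accesses * size_mb


class GlobalAggregator:
    """Hypothetical global algorithm: merges access records from local nodes."""

    def __init__(self):
        self.inbox = queue.Queue()  # asynchronous mailbox for reported batches
        self.counts = Counter()     # global access count per file

    def submit(self, records):
        # Local nodes only enqueue a batch; they never wait for aggregation.
        self.inbox.put(list(records))

    def drain(self):
        # Called once per round: fold every queued batch into the counts.
        while True:
            try:
                batch = self.inbox.get_nowait()
            except queue.Empty:
                return dict(self.counts)
            self.counts.update(batch)


class LocalNode:
    """Hypothetical local algorithm running at one resource node."""

    def __init__(self, sizes_mb):
        self.sizes_mb = sizes_mb        # file id -> size in MB
        self.pending = []               # records not yet reported
        self.local_counts = Counter()   # local access count per file

    def record(self, file_id):
        self.pending.append(file_id)
        self.local_counts[file_id] += 1

    def flush(self, aggregator):
        # Hand the pending records to the global side and start a new batch.
        aggregator.submit(self.pending)
        self.pending = []

    def candidates(self, global_counts, k, alpha=0.5):
        # ASSUMPTION: blend local and global counts linearly with weight alpha.
        files = set(self.local_counts) | set(global_counts)

        def score(f):
            blended = (alpha * self.local_counts[f]
                       + (1 - alpha) * global_counts.get(f, 0))
            return popularity(blended, self.sizes_mb[f])

        # Highest score first; ties broken by file id for determinism.
        return sorted(files, key=lambda f: (-score(f), f))[:k]


def place_replica(size_mb, nodes):
    # ASSUMPTION: among nodes with enough free space, pick the one with the
    # highest bandwidth; return None when no node can hold the file.
    eligible = [n for n in nodes if n["free_mb"] >= size_mb]
    if not eligible:
        return None
    return max(eligible, key=lambda n: n["bandwidth"])["name"]
```

In use, each node would call `record` on every file request and `flush` at the end of a round; the global side would call `drain` and hand the resulting counts back to `candidates`, whose output is then passed file by file to `place_replica`.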