Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 822-834. DOI: 10.3778/j.issn.1673-9418.2009098
• Database Technology •
CHEN Shenglei1, QIU Yitao1, JIANG Congfeng1,+, ZHANG Jilin1, YU Jun1, LIN Jiangbin2, YAN Longchuan3, REN Zujie4, WAN Jian5
Received: 2020-09-07
Revised: 2020-11-02
Online: 2022-04-01
Published: 2020-11-06
About author:
CHEN Shenglei, born in 1996 in Jinyun, Zhejiang, M.S. candidate. Her research interest is cloud computing.
Corresponding author:
+ E-mail: cjiang@hdu.edu.cn
CHEN Shenglei, QIU Yitao, JIANG Congfeng, ZHANG Jilin, YU Jun, LIN Jiangbin, YAN Longchuan, REN Zujie, WAN Jian. Workload Characterization of Online and Offline Services in Co-located Data Centers[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 822-834.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2009098
Table name | Number of records | File size |
---|---|---|
machine meta | 17 592 | 0.56 MB |
machine usage | 246 637 252 | 8.38 GB |
container meta | 370 540 | 22.26 MB |
container usage | 4 015 763 787 | 164.09 GB |
batch task | 14 295 731 | 0.98 GB |
batch instance | 1 351 255 775 | 103.69 GB |
Table 1 Number of records and file size of each table in the Alibaba dataset
State | started | stopped | allocated | unknow | Total |
---|---|---|---|---|---|
Number of containers | 70 903 | 400 | 39 | 0 | 71 342 |
Table 2 Number of containers with only one state
State | started,stopped | started,allocated | started,unknow | Total |
---|---|---|---|---|
Number of containers | 118 | 10 | 5 | 133 |
Table 3 Number of containers with two states
State | started,allocated,stopped |
---|---|
Number of containers | 1 |
Table 4 Number of containers with three states
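Tables 2 to 4 group containers by the set of distinct states each one shows in the trace. The counting can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `count_by_state_set`, the `(container_id, status)` record shape, and the sample IDs are hypothetical and are not taken from the dataset or from the authors' code.

```python
from collections import Counter, defaultdict


def count_by_state_set(records):
    """Count containers by the set of distinct states each one shows.

    `records` is an iterable of (container_id, status) pairs. This record
    shape is an assumption for illustration, not the trace's actual schema.
    """
    states = defaultdict(set)
    for container_id, status in records:
        states[container_id].add(status)
    # Sort the state names so the key does not depend on record order.
    return Counter(tuple(sorted(s)) for s in states.values())


# Made-up sample records, not values from the Alibaba dataset.
sample = [
    ("c_1", "started"),
    ("c_1", "started"),
    ("c_2", "started"),
    ("c_2", "stopped"),
    ("c_3", "allocated"),
    ("c_4", "started"),
    ("c_4", "allocated"),
    ("c_4", "stopped"),
]
counts = count_by_state_set(sample)
for state_set, n in sorted(counts.items()):
    print(",".join(state_set), n)
```

Keys with one state name correspond to the columns of Table 2, keys with two names to Table 3, and keys with three names to Table 4.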
Fitting function | Resource category | | | | R-square |
---|---|---|---|---|---|
| CPU | 5 792 | -0.086 | n/a | 0.977 |
| memory | 0.003 | 0.146 | n/a | 0.617 |
| disk_io | 14 430 | 8.203 | 2.452 | 0.992 |
Table 5 Fitting function and parameter values of container resource usage distribution
Feature vector group name | Average CPU | Average memory | Average disk |
---|---|---|---|
mGroup0 | 35.775~44.424 | 81.836~92.608 | 3.298~32.604 |
mGroup1 | 0.000 2~28.393 | 2.999~48.931 | 0~25.944 |
mGroup2 | 2.257~21.418 | 52.162~96.156 | 0.611~24.015 |
mGroup3 | 39.908~60.559 | 81.562~92.383 | 2.940~28.120 |
mGroup4 | 34.719~58.793 | 81.577~92.079 | 40.815~98.108 |
mGroup5 | 20.179~37.649 | 49.439~94.623 | 2.737~20.349 |
Table 6 Boundaries of feature vectors for servers
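Each cell of Table 6 can be read as a lower~upper range of one feature within one group. Assuming the ranges are the per-group minimum and maximum of each feature (an interpretation, not something this page states), they could be derived from labeled feature vectors roughly as follows. The function name `feature_bounds` and the sample points are hypothetical.

```python
def feature_bounds(points, labels):
    """Return {label: [(min, max), ...]} with one (min, max) pair per feature.

    points: list of equal-length tuples of feature values
    labels: list of group labels, one per point
    """
    bounds = {}
    for point, label in zip(points, labels):
        if label not in bounds:
            # First point of a group: both bounds start at its values.
            bounds[label] = [(v, v) for v in point]
        else:
            bounds[label] = [
                (min(lo, v), max(hi, v))
                for (lo, hi), v in zip(bounds[label], point)
            ]
    return bounds


# Made-up (average CPU, average memory, average disk) vectors, not trace data.
points = [
    (36.0, 82.0, 5.0),
    (44.0, 92.0, 30.0),
    (1.0, 10.0, 0.0),
    (28.0, 48.0, 25.0),
]
labels = ["mGroup0", "mGroup0", "mGroup1", "mGroup1"]
bounds = feature_bounds(points, labels)
for label, ranges in sorted(bounds.items()):
    print(label, " | ".join(f"{lo}~{hi}" for lo, hi in ranges))
```

Ranges built this way can overlap between groups, as several rows of Table 6 do, so they describe each group's extent rather than a rule for assigning a server to a group.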
Feature vector group name | Average CPU | Average memory | Average disk |
---|---|---|---|
cGroup0 | 0.000 1~99.989 | 64.208 5~100.000 | 0.628~98.897 |
cGroup1 | 0~99.999 8 | 1.008~69.847 | 0~98.910 2 |
Table 7 Boundaries of feature vectors for containers
Feature vector group name | Average CPU | Average memory | Duration/s |
---|---|---|---|
iGroup0 | 0~4 257 | 0~11.130 0 | 271~221 229 |
iGroup1 | 0~2 135 | 0~91.599 9 | 1~296 |
Table 8 Boundaries of feature vectors for instances
Feature vector group name | cGroup0 | cGroup1 | iGroup0 | iGroup1 |
---|---|---|---|---|
mGroup0 | 0.74 | 0.26 | 0.03 | 0.97 |
mGroup1 | 0.75 | 0.25 | 0.03 | 0.97 |
mGroup2 | 0.79 | 0.21 | 0.03 | 0.97 |
mGroup3 | 0.81 | 0.19 | 0.04 | 0.96 |
mGroup4 | 0.78 | 0.22 | 0.04 | 0.96 |
mGroup5 | 0.67 | 0.33 | 0.03 | 0.97 |
Table 9 Proportion of the two types of containers and instances in each type of server
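In each row of Table 9 the two container shares sum to 1, as do the two instance shares, so every cell is the fraction of one workload group among all containers (or instances) placed on servers of that server group. Here is a minimal sketch of that computation, assuming each container or instance record carries the ID of its host machine. The function name `group_shares`, the input shapes, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def group_shares(server_group, placements):
    """Share of each workload group within each server group.

    server_group: dict mapping machine_id -> server group label
    placements: iterable of (machine_id, workload group label) pairs
    """
    counts = defaultdict(Counter)
    for machine_id, label in placements:
        counts[server_group[machine_id]][label] += 1
    return {
        group: {label: n / sum(c.values()) for label, n in c.items()}
        for group, c in counts.items()
    }


# Made-up assignments, not values from the Alibaba dataset.
server_group = {"m_1": "mGroup0", "m_2": "mGroup0", "m_3": "mGroup1"}
placements = [
    ("m_1", "cGroup0"),
    ("m_1", "cGroup0"),
    ("m_2", "cGroup0"),
    ("m_2", "cGroup1"),
    ("m_3", "cGroup1"),
]
shares = group_shares(server_group, placements)
for group, row in sorted(shares.items()):
    print(group, {label: round(p, 2) for label, p in sorted(row.items())})
```

Running the function once on container placements and once on instance placements would give the two halves of a table shaped like Table 9.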