Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (7): 1543-1551. DOI: 10.3778/j.issn.1673-9418.2101028

• Service Computing •

Data Set Construction Method for Intelligent Health Care and Its Application

ZHANG Linyu, TU Zhiying+, HANG Shaoshi, ZHANG Bolin, CHU Dianhui

  1. School of Computer Science and Technology, Harbin Institute of Technology, Weihai, Shandong 264209, China
  • Received: 2021-01-07 Revised: 2021-05-17 Online: 2022-07-01 Published: 2021-06-29
  • Supported by:
    the National Key Research and Development Program of China (2018YFB1004800); the National Natural Science Foundation of China (61772159); the Natural Science Foundation of Shandong Province (ZR2017MF026)

  • About the authors:
    ZHANG Linyu, born in 1997, M.S. candidate. His research interests include service computing and knowledge graphs.
    TU Zhiying, born in 1983, Ph.D., associate professor, member of CCF. His research interests include service computing and knowledge engineering.
    HANG Shaoshi, born in 1996, M.S. candidate. His research interests include service computing and knowledge graphs.
    ZHANG Bolin, born in 1997, M.S. candidate. His research interests include machine learning and service computing.
    CHU Dianhui, born in 1970, Ph.D., professor, member of CCF. His research interests include service computing and knowledge engineering.

Abstract:

Rapid advances in Internet and computer technology make it possible to develop smart health care services for today's aging population. However, data problems seriously restrict the adoption of intelligent methods in elderly care: real data are scarce, dirty data interfere with analysis, and standard samples are too few. To address the lack of data sets, this paper proposes a three-stage, machine-learning-based data set construction method that starts from small-sample data collected from community health care services in one city. In the first stage, a tree-structure-based generation strategy produces the basic attributes of the data set according to the distribution of the original data. In the second stage, a naive Bayes algorithm generates the basic behavioral ability evaluation indices of the samples. In the third stage, multiple linear regression equations built on the results of the first two stages generate the high-order behavioral ability indices and the evaluation stage. To verify that the generated data set is effective for downstream tasks, this paper uses neural networks to design several rehabilitation training plan recommendation models on the generated data, covering 5 multi-class classification tasks and 2 multi-label classification tasks. Analysis of the experimental results, together with expert knowledge, verifies the authenticity and validity of the generated data.

Key words: smart health care service, small sample data, naive Bayes, multiple linear regression
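
As a rough illustration of how the three stages summarized in the abstract could fit together, the following Python sketch chains a tree-structured attribute sampler, a categorical naive Bayes step, and a multiple linear regression fit. It is a minimal sketch under stated assumptions, not the authors' implementation: the seed records, the field names (`gender`, `age_group`, `basic_ability`, `high_order_index`), the two-level tree, and the choice of regression features are all hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical seed records: (gender, age_group, basic_ability, high_order_index).
# Every value here is invented for illustration only.
SEED = [
    ("F", "60-69", 2, 7.0), ("F", "70-79", 1, 5.0), ("F", "70-79", 2, 6.0),
    ("M", "60-69", 2, 6.5), ("M", "80+", 0, 2.0), ("F", "80+", 1, 3.5),
    ("M", "70-79", 1, 4.5), ("M", "60-69", 1, 5.5),
]
AGE_CODE = {"60-69": 0, "70-79": 1, "80+": 2}

def sample_from(counter, rng):
    """Draw one key with probability proportional to its count."""
    keys = sorted(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys])[0]

def stage1_attributes(seed, rng):
    """Stage 1 (assumed tree): gender at the root, age group conditioned on gender."""
    root = Counter(g for g, _, _, _ in seed)
    child = defaultdict(Counter)
    for g, a, _, _ in seed:
        child[g][a] += 1
    gender = sample_from(root, rng)
    return gender, sample_from(child[gender], rng)

def stage2_basic_ability(seed, gender, age):
    """Stage 2: categorical naive Bayes with Laplace smoothing; returns the argmax class."""
    classes = sorted({c for _, _, c, _ in seed})
    n_gender = len({g for g, _, _, _ in seed})
    best, best_score = None, -1.0
    for c in classes:
        rows = [r for r in seed if r[2] == c]
        prior = len(rows) / len(seed)
        p_g = (sum(r[0] == gender for r in rows) + 1) / (len(rows) + n_gender)
        p_a = (sum(r[1] == age for r in rows) + 1) / (len(rows) + len(AGE_CODE))
        score = prior * p_g * p_a
        if score > best_score:
            best, best_score = c, score
    return best

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def stage3_fit(seed):
    """Stage 3: least-squares fit of high_order = b0 + b1*age_code + b2*basic_ability."""
    xs = [[1.0, float(AGE_CODE[a]), float(c)] for _, a, c, _ in seed]
    ys = [y for _, _, _, y in seed]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)  # normal equations (X^T X) b = X^T y

def generate(seed, n, rng_seed=0):
    """Run the three stages in order to produce n synthetic records."""
    rng = random.Random(rng_seed)
    b0, b1, b2 = stage3_fit(seed)
    records = []
    for _ in range(n):
        gender, age = stage1_attributes(seed, rng)
        ability = stage2_basic_ability(seed, gender, age)
        high = b0 + b1 * AGE_CODE[age] + b2 * ability
        records.append({"gender": gender, "age_group": age,
                        "basic_ability": ability, "high_order_index": round(high, 2)})
    return records
```

Calling `generate(SEED, 100)` would return 100 synthetic records whose basic attributes follow the seed distribution. A production version would likely sample the stage 2 label from the posterior rather than take the argmax, and would add noise to the regression output so the generated values are not deterministic given the attributes.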

