Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (12): 2061-2071.DOI: 10.3778/j.issn.1673-9418.1912024


Mining Fuzzy Relationship Between Malignant Tumors and Industrial Pollution

CHU Chuanxin, WANG Lizhen, ZHOU Lihua, LI Xuyang   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Online:2020-12-01 Published:2020-12-11

恶性肿瘤与工业污染之间的模糊关系挖掘

储传鑫, 王丽珍, 周丽华, 李旭阳

  1. 云南大学 信息学院,昆明 650500

Abstract:

Malignant tumors are among the most serious diseases endangering human health, and the use of data mining techniques to uncover relationships between malignant tumors and various pathogenic factors has attracted increasing attention. In practice, the relationship between tumor diseases and pathogenic factors is often fuzzy, and the occurrence of a tumor disease is not driven by a single factor alone. However, no existing research addresses this problem. To this end, the concept of fuzzy co-location patterns is proposed by combining spatial co-location pattern mining with fuzzy theory, and pollution sources are fuzzified using a clustering method. Rules are then extracted using a decision-table-based method, and a corresponding confidence calculation algorithm is designed. The result is a novel method that can discover fuzzy relationships between multiple tumor diseases and multiple pollution sources. The effectiveness of the proposed method is verified on a real-world case, and the effects of different parameters on the algorithm's running time are analyzed through experiments on synthetic datasets. Finally, a theoretical analysis of the algorithm's time complexity is presented.

Key words: spatial data mining, spatial co-location pattern, fuzzy theory, clustering analysis, rule extraction
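
The two steps named in the abstract, fuzzifying pollution sources by clustering and computing rule confidence over a decision table, can be illustrated with a minimal sketch. The abstract does not specify the clustering method or the confidence formula, so everything below is an assumption made for illustration: the membership function follows the standard fuzzy c-means formula, the confidence is a membership-weighted ratio, and the function names, cluster centres, and sample rows are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def fuzzy_memberships(value: float, centers: Sequence[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means style membership of `value` in each cluster centre.

    Assumption: 1-D pollution levels and fuzzifier m > 1; the paper's
    actual clustering method is not given in the abstract.
    """
    dists = [abs(value - c) for c in centers]
    # A value sitting exactly on a centre belongs fully to that cluster.
    if 0.0 in dists:
        hit = dists.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]


Row = Tuple[Dict[str, float], str]  # (membership per fuzzy label, observed decision)


def rule_confidence(rows: Sequence[Row], condition_label: str, decision: str) -> float:
    """Membership-weighted confidence of the rule `condition_label` -> `decision`.

    Assumption: confidence = (membership mass of rows matching both sides)
    / (membership mass of rows matching the condition).
    """
    mass_condition = sum(mu[condition_label] for mu, _ in rows)
    mass_both = sum(mu[condition_label] for mu, d in rows if d == decision)
    return mass_both / mass_condition if mass_condition > 0 else 0.0


# Hypothetical decision table: three locations with fuzzified pollution
# levels ("low" / "high") and an observed outcome.
labels = ["low", "high"]
centers = [10.0, 50.0]  # hypothetical cluster centres for pollution level
readings = [(10.0, "none"), (50.0, "tumor"), (30.0, "tumor")]
table: List[Row] = [
    (dict(zip(labels, fuzzy_memberships(value, centers))), outcome)
    for value, outcome in readings
]
confidence_high_tumor = rule_confidence(table, "high", "tumor")
```

Weighting each row by its membership degree lets a location that is only partly "high pollution" contribute proportionally to the rule, which is one simple way to carry fuzziness from the clustering step into the confidence value.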

摘要:

恶性肿瘤是危害人类健康的重要疾病之一,运用数据挖掘技术挖掘恶性肿瘤与各种致病因素之间的关系受到越来越多的关注。在实际中,肿瘤疾病与致病因素之间的关系往往是模糊的,肿瘤疾病的发生也不只受单一因素的影响,但目前还没有针对上述问题的研究。为此,基于空间共存模式挖掘技术,结合模糊理论,提出了模糊共存模式的概念;运用聚类方法对污染源进行了模糊化处理;在进行规则提取时采用了决策表提取规则的方法,并设计了相应的置信度计算算法;最终提出了一种能够挖掘出多种肿瘤疾病与多种污染源之间模糊关系的新方法。通过在实际案例上的应用验证了提出算法的有效性,通过在合成数据集上的实验分析了不同参数对算法运行时间的影响,并对算法的时间效率进行了理论分析。

关键词: 空间数据挖掘, 空间共存模式, 模糊理论, 聚类分析, 规则提取