Nearest Neighbor Algorithm for Positive and Unlabeled Learning with Uncertainty

doi:10.3778/j.issn.1673-9418.2010.09.001

Journal of Frontiers of Computer Science and Technology, 2010, Vol. 4, Issue 9: 769-779. DOI: 10.3778/j.issn.1673-9418.2010.09.001

• Academic Research •


PAN Shirui¹, ZHANG Yang¹,²⁺, LI Xue³, WANG Yong⁴

1. College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
3. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia
4. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China

Online: 2010-09-09 Published: 2010-09-09
Contact: ZHANG Yang


Abstract

This paper studies the classification of uncertain data in the positive and unlabeled (PU) learning scenario. It proposes a novel algorithm, NNPU (nearest neighbor algorithm for positive and unlabeled learning), which comes in two variants, NNPUa and NNPUu. Experimental results on benchmark UCI datasets show that NNPUu, which uses the full uncertainty information in the data, classifies unseen examples better than NNPUa, which uses only the average value of the uncertain data. Furthermore, when classifying precise data, NNPU outperforms existing algorithms such as NN-d, OCC (one-class classifier), and aPUNB.

Key words: uncertain data, positive and unlabeled learning, nearest neighbor algorithm
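
The abstract contrasts a variant that uses only the average value of the uncertain data with one that uses the full uncertainty information, but it does not say how either computes distances. The following is a minimal illustrative sketch of that general distinction, not the paper's NNPUa or NNPUu. It assumes each uncertain object is a list of equally weighted sample points, and the names `mean_based_distance`, `expected_distance`, and `nearest` are hypothetical. The positive and unlabeled aspect is not modeled here.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, ...]
# Assumption: an uncertain object is a list of equally weighted sample points.
UncertainObject = List[Point]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two precise points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_point(obj: UncertainObject) -> Point:
    """Collapse an uncertain object to the average of its samples."""
    n = len(obj)
    return tuple(sum(coords) / n for coords in zip(*obj))


def mean_based_distance(a: UncertainObject, b: UncertainObject) -> float:
    """Distance that looks only at the averages (hypothetical stand-in)."""
    return euclidean(mean_point(a), mean_point(b))


def expected_distance(a: UncertainObject, b: UncertainObject) -> float:
    """Average distance over all sample pairs (hypothetical stand-in)."""
    pairs = list(product(a, b))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


def nearest(
    query: UncertainObject,
    candidates: Dict[str, UncertainObject],
    distance: Callable[[UncertainObject, UncertainObject], float],
) -> str:
    """Return the name of the candidate closest to the query."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))


# Toy data: "a" is widely spread around the query, "b" is precise but offset.
query = [(2.0, 0.0)]
candidates = {
    "a": [(0.0, 0.0), (4.0, 0.0)],
    "b": [(2.0, 1.0)],
}

print(nearest(query, candidates, mean_based_distance))  # a
print(nearest(query, candidates, expected_distance))    # b
```

On this toy data the two distances pick different nearest neighbors: averaging hides the spread of "a", while the pairwise expectation accounts for it. This only shows why the two kinds of variant can disagree, and says nothing about which one the paper's experiments favor beyond what the abstract reports.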


CLC Number: TP181

PAN Shirui, ZHANG Yang, LI Xue, WANG Yong. Nearest Neighbor Algorithm for Positive and Unlabeled Learning with Uncertainty [J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(9): 769-779.


