
k-nearest Neighbor Classification Based on Influence Function

Zhi Wei-mei, Zhang Ting, Fan Ming

Citation: Zhi Wei-mei, Zhang Ting, Fan Ming. k-nearest Neighbor Classification Based on Influence Function[J]. Journal of Electronics & Information Technology, 2015, 37(7): 1626-1632. doi: 10.11999/JEIT141433


doi: 10.11999/JEIT141433
Funding:

Supported by the National Natural Science Foundation of China (61170223) and the Key Scientific and Technological Research Project of the Education Department of Henan Province (14A520016)


  • Abstract: Classification is a supervised learning method that learns a model from a training data set and uses it to assign class labels to unknown samples. Departing from the traditional view of classification, this paper approaches the task from the perspective of influence functions: the class label of an unknown sample is determined by the influence that the training set exerts on it. The paper first introduces the idea of influence-function-based classification, then defines the influence function and designs three concrete influence functions, and finally proposes an influence-function-based k-nearest neighbor (kNN) classification method built on these three functions. The method is also applied to the classification of imbalanced data sets. Experimental results on 18 UCI data sets show that the influence-function-based kNN method outperforms traditional kNN classification and is effective on imbalanced data sets.
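The abstract's core idea — letting each of the k nearest training samples exert a distance-dependent influence on the query, and assigning the class that accumulates the most influence — can be sketched as follows. The paper's three influence functions are not reproduced here, so this sketch assumes a simple inverse-distance influence; all function and variable names are illustrative, not taken from the paper.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inverse_distance_influence(d, eps=1e-9):
    """Hypothetical influence function: closer samples exert more influence."""
    return 1.0 / (d + eps)

def influence_knn_classify(train, labels, query, k=3,
                           influence=inverse_distance_influence):
    """Classify `query` by the total influence of its k nearest neighbors."""
    # Rank all training samples by distance to the query.
    ranked = sorted(((euclidean(x, query), y) for x, y in zip(train, labels)),
                    key=lambda t: t[0])
    # Accumulate per-class influence over the k nearest neighbors.
    scores = defaultdict(float)
    for d, y in ranked[:k]:
        scores[y] += influence(d)
    # The class receiving the largest total influence wins.
    return max(scores, key=scores.get)

# Toy 2-D example with two classes.
train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.2, 0.9)]
labels = ["a", "a", "b", "b"]
print(influence_knn_classify(train, labels, (0.2, 0.1), k=3))  # → a
```

Unlike plain majority-vote kNN, the per-sample influence weights make the decision sensitive to how close each neighbor is; one plausible way such a scheme can help on imbalanced data, as the abstract suggests, is that a near minority-class neighbor can outweigh several distant majority-class neighbors.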
Publication History
  • Received: 2014-11-13
  • Revised: 2015-04-03
  • Published: 2015-07-19
