An Improved Ensemble Algorithm Based on Oppositional Relabeling of Artificial Data
doi: 10.3724/SP.J.1146.2010.00954
A New Ensemble Algorithm Based on Oppositional Relabeling of Artificial Data
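Abstract: As the volume of collected data keeps growing and the types of data to be processed vary widely, an algorithm with good generalization performance and high classification accuracy is needed. This paper proposes a hybrid ensemble algorithm that combines the insensitivity to input data type of Diversity Ensemble Creation by Oppositional Relabeling of Artificial Training Examples (DECORATE) with the fast learning ability of the radial basis function network model. The asymptotic P value serves as the criterion for comparing the area under the receiver operating characteristic curve with 0.5 in order to identify redundant features. Classifiers are trained on new data synthesized by oppositional relabeling, and whether a new classifier is added is decided by comparing the change in training error. The decisions of all classifiers are finally fused by majority voting. Experiments on UCI data show that the algorithm adapts well to different data types and achieves higher classification accuracy than other ensemble algorithms.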
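Keywords:
- Ensemble algorithm
- Radial basis function neural network
- Diversity Ensemble Creation by Oppositional Relabeling of Artificial Training Examples (DECORATE)
- Voting
- ROC curve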
Abstract: As the amount of data increases rapidly and the types of data to be handled become more varied, an algorithm with better generalization performance and higher classification accuracy is indispensable. This paper proposes a new hybrid algorithm that takes advantage of the insensitivity to input data type of the Diversity Ensemble Creation by Oppositional Relabeling of Artificial Training Examples (DECORATE) algorithm and the efficiency of the radial basis function neural network model. The asymptotic P value is used to compare the area under the receiver operating characteristic curve with 0.5 and thereby identify redundant features, and the oppositionally relabeled artificial data are used to train the classifier. A new classifier is added only when it lowers the training error relative to the existing model, and majority voting is used to fuse the decisions of all classifiers. Finally, the method is applied to UCI datasets; the results show that it adapts to different kinds of data and gives higher classification accuracy than other ensemble algorithms.
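The ensemble construction loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid learner stands in for the radial basis function network, artificial samples are drawn from per-feature Gaussians, each artificial sample receives a uniformly chosen label different from the current ensemble vote, and a candidate is kept when the training error does not increase. All function names (`train_centroid`, `vote`, `error`, `build_ensemble`) are hypothetical, and the ROC-based feature screening is omitted.

```python
import random
from collections import Counter

def train_centroid(X, y):
    """Stand-in base learner (nearest centroid); the paper uses an RBF network."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    cents = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Return the class whose centroid is closest in squared Euclidean distance.
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
    return predict

def vote(ensemble, x):
    """Majority vote over all member classifiers."""
    return Counter(member(x) for member in ensemble).most_common(1)[0][0]

def error(ensemble, X, y):
    """Fraction of training samples the voted ensemble misclassifies."""
    return sum(vote(ensemble, xi) != yi for xi, yi in zip(X, y)) / len(y)

def build_ensemble(X, y, max_size=5, max_iter=20, n_art=None, seed=0):
    """DECORATE-style loop (assumes at least two classes in y)."""
    rng = random.Random(seed)
    labels = sorted(set(y))
    n_art = n_art or len(X)
    # Per-feature mean and standard deviation used to synthesize artificial data.
    dims = list(zip(*X))
    mu = [sum(d) / len(d) for d in dims]
    sd = [(sum((v - m) ** 2 for v in d) / len(d)) ** 0.5 for d, m in zip(dims, mu)]

    ensemble = [train_centroid(X, y)]
    err = error(ensemble, X, y)
    for _ in range(max_iter):
        if len(ensemble) >= max_size:
            break
        art_X = [[rng.gauss(m, s) for m, s in zip(mu, sd)] for _ in range(n_art)]
        # Oppositional relabeling: pick a label that disagrees with the current vote.
        art_y = [rng.choice([c for c in labels if c != vote(ensemble, x)]) for x in art_X]
        candidate = train_centroid(X + art_X, y + art_y)
        new_err = error(ensemble + [candidate], X, y)
        if new_err <= err:  # assumption: keep the candidate if error does not rise
            ensemble.append(candidate)
            err = new_err
    return ensemble
```

Calling `build_ensemble(X, y)` on a list of feature vectors and a list of labels returns the accepted classifiers, and `vote(ensemble, x)` gives the fused decision for a new sample. The original DECORATE procedure weights the opposite labels by the ensemble's predicted class probabilities; the uniform choice here is a simplification.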