Sparse Multinomial Logistic Regression Algorithm Based on Centered-Alignment Multiple Kernel Learning
-
Abstract: As a generalized linear model, Sparse Multinomial Logistic Regression (SMLR) is widely used in a variety of multi-class classification tasks. SMLR introduces a Laplace prior into Multinomial Logistic Regression (MLR) to make its solution sparse, which allows the classifier to embed feature selection into the classification process. To handle non-linearly separable data, this paper extends SMLR with the kernel trick to obtain Kernel Sparse Multinomial Logistic Regression (KSMLR). KSMLR maps non-linear feature data into a high-dimensional, or even infinite-dimensional, feature space through kernel functions, so that the features can be fully expressed and ultimately classified effectively. In addition, a multiple kernel learning algorithm based on centered alignment is employed: different kernel functions map the data in different dimensions, and the centered-alignment similarity is used to select the multiple kernel learning weight coefficients flexibly, giving the classifier better generalization ability. The experimental results show that the proposed sparse multinomial logistic regression algorithm based on centered-alignment multiple kernel learning outperforms conventional classification algorithms in classification accuracy.
-
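Before the algorithm listings, the sketch below illustrates the centered-alignment weight computation summarized in the abstract and used in Step 2 of Algorithm 2. It is a minimal sketch under the common formulation in which each base kernel is scored by its centered alignment with the ideal target kernel $yy^{\rm T}$ built from the labels; the function names and the simple non-negative normalization of the weights are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix: Kc = H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered alignment: <K1c, K2c>_F / (||K1c||_F ||K2c||_F)."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def alignment_weights(kernels, y):
    """Heuristic MKL weights: score each base kernel against the ideal
    target kernel y y^T built from the labels, then normalize the
    non-negative scores (an illustrative assumption)."""
    Y = np.equal.outer(y, y).astype(float)      # ideal target kernel
    a = np.maximum([centered_alignment(K, Y) for K in kernels], 0.0)
    return a / a.sum()

def combined_kernel(kernels, mu):
    """Weighted combination of the centered base kernels, i.e. K_{c,mu}."""
    return sum(m * center_kernel(K) for m, K in zip(mu, kernels))
```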
Algorithm 1: Backtracking ISTA algorithm for the KSMLR problem
Input: initial step size $\tau = 1/L$, $L > 0$; initial parameters ${\alpha} \in {R}^{n \times k}$; initial kernel parameter $\sigma = 2$; maximum number of iterations Iter = 500; backtracking parameter $\beta \in (0, 1)$
Output: final parameters ${\alpha}^{t+1}$
Step 1: Compute the kernel matrix ${K}$ from the samples ${X}^{(i)}$;
Step 2: Initialize the counter $t \leftarrow 0$;
Step 3: Initialize the parameters ${\alpha}^{t} \leftarrow {\alpha}$;
Step 4: ${\alpha}^{t+1} = p_{\tau}({\alpha}^{t})$;
Step 5: $\tau = \beta\tau$;
Step 6: If $l({\alpha}^{t+1}) \le \hat{l}({\alpha}^{t+1}, {\alpha}^{t})$ holds, or the maximum number of iterations is reached, terminate and go to Step 7; otherwise set $t \leftarrow t + 1$ and return to Step 4;
Step 7: Return the updated parameters ${\alpha}^{t+1}$.

Algorithm 2: Backtracking FISTA algorithm for the MKSMLR problem
Input: initial step size $\tau = 1/L$, $L > 0$; initial parameters ${\alpha} \in {R}^{n \times k}$; initial kernel parameter $\sigma = 2$; maximum number of iterations Iter = 500; backtracking parameter $\beta \in (0, 1)$
Output: final parameters ${\alpha}^{t+1}$
Step 1: Compute $p$ different kernel matrices from the samples ${X}^{(i)}$;
Step 2: Compute the multiple kernel learning weights ${\mu}$ with the centered-alignment (Align) method and form the combined kernel matrix ${K}_{c\mu}$;
Step 3: Initialize the counter $t \leftarrow 0$;
Step 4: Initialize the parameters ${\alpha}^{t} \leftarrow {\alpha}$, $\mu^{t} \leftarrow 1$, $v^{t} \leftarrow {\alpha}^{t}$;
Step 5: ${\alpha}^{t+1} = p_{\tau}(v^{t})$;
Step 6: $\mu^{t+1} = \dfrac{1 + \sqrt{1 + 4(\mu^{t})^{2}}}{2}$;
Step 7: $v^{t+1} = {\alpha}^{t+1} + \dfrac{\mu^{t} - 1}{\mu^{t+1}}({\alpha}^{t+1} - {\alpha}^{t})$;
Step 8: $\tau = \beta\tau$;
Step 9: If $l({\alpha}^{t+1}) \le \hat{l}({\alpha}^{t+1}, {\alpha}^{t})$ holds, or the maximum number of iterations is reached, terminate and go to Step 10; otherwise set $t \leftarrow t + 1$ and return to Step 5;
Step 10: Return the updated parameters ${\alpha}^{t+1}$.
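For concreteness, the sketch below shows how Algorithm 2's main loop (Steps 5–9) can be realized in NumPy for an $\ell_1$-regularized multinomial logistic loss, where $p_{\tau}$ becomes elementwise soft thresholding. It is a minimal illustration, not the paper's implementation: the helper names (`nll`, `grad`, `soft_threshold`, `fista_backtracking`), the one-hot label matrix `Y`, the regularization weight `lam`, and the standard Beck–Teboulle sufficient-decrease test used in place of the fixed $\tau = \beta\tau$ shrinkage of Step 8 are all assumptions.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def nll(K, Y, A):
    """Smooth part l(alpha): multinomial negative log-likelihood of scores K @ A."""
    return -np.sum(Y * np.log(softmax(K @ A) + 1e-12))

def grad(K, Y, A):
    """Gradient of l(alpha): K^T (softmax(K A) - Y)."""
    return K.T @ (softmax(K @ A) - Y)

def soft_threshold(X, t):
    """Proximal operator p_tau of the l1 penalty (elementwise soft thresholding)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def fista_backtracking(K, Y, lam=1e-3, tau=1.0, beta=0.5, iters=500):
    """Backtracking FISTA on a precomputed (combined) kernel matrix K (n x n)
    with one-hot labels Y (n x k); returns the sparse coefficients alpha."""
    A = np.zeros((K.shape[0], Y.shape[1]))
    V, mu = A.copy(), 1.0
    for _ in range(iters):
        # Shrink tau until l(A_new) <= l_hat(A_new, V), the quadratic model at V
        while True:
            G = grad(K, Y, V)
            A_new = soft_threshold(V - tau * G, tau * lam)        # Step 5
            D = A_new - V
            if nll(K, Y, A_new) <= nll(K, Y, V) + np.sum(G * D) + np.sum(D * D) / (2 * tau):
                break
            tau *= beta                                           # backtracking (Step 8)
        mu_new = (1.0 + np.sqrt(1.0 + 4.0 * mu ** 2)) / 2.0       # Step 6
        V = A_new + (mu - 1.0) / mu_new * (A_new - A)             # Step 7
        A, mu = A_new, mu_new
    return A
```

Dropping Steps 6 and 7 (i.e. setting `V = A_new` after each update) recovers the plain ISTA iteration of Algorithm 1.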
Table 1  Classification accuracy

Dataset        SVM      SLR      WDMLR    SML-ISTA  SML-FISTA  KSMLR    MKSMLR
Banana         0.9069   –        –        –         –          0.9069   0.9107
COIL20         0.8032   0.9676   0.9832   0.9895    0.9958     0.9977   1.0000
ORL            0.9507   0.9420   0.9545   0.9242    0.9545     0.9000   0.9167
GT-32          –        –        0.7823   0.7580    0.7621     0.8044   0.8044
MNIST-S        0.9113   0.9001   0.9109   0.9036    0.9048     0.9360   0.9400
Lung           0.7705   0.9344   0.9104   0.9104    0.9254     0.9180   0.9344
Indian-pines   0.7980   0.8182   0.7599   0.8120    0.8120     0.8218   0.8237
Segment        0.5989   0.9235   0.8268   0.8925    0.9253     0.9538   0.9567

Note: "–" indicates that the method failed to classify correctly or performed close to random guessing.
Table 2  Running time (s)

Dataset        SML-ISTA  SML-FISTA  KSMLR   MKSMLR
Banana         –         –          0.78    1.19
COIL20         1.71      0.39       7.61    13.46
ORL            142.05    7.50       10.43   2.73
GT-32          88.19     2.03       37.94   10.77
MNIST-S        0.12      0.14       0.14    22.98
Lung           42.71     1.40       2.12    3.08
Indian-pines   427.62    18.58      68.31   909.10
Segment        21.33     20.71      13.68   33.35

Note: "–" indicates that the method failed to classify correctly or performed close to random guessing.