Sparse Multinomial Logistic Regression Algorithm Based on Centered Alignment Multiple Kernels Learning
Abstract: As a generalized linear model, Sparse Multinomial Logistic Regression (SMLR) is widely used in multi-class classification tasks. SMLR introduces a Laplace prior into Multinomial Logistic Regression (MLR) to make its solution sparse, which lets the classifier perform feature selection as part of classification. To handle nonlinear data, SMLR is extended with the kernel trick to obtain Kernel Sparse Multinomial Logistic Regression (KSMLR). KSMLR maps nonlinear feature data through a kernel function into a high-dimensional, possibly infinite-dimensional, feature space, where the features can be fully expressed and classified effectively. In addition, a multiple kernel learning algorithm based on centered alignment is used: different kernel functions map the data into different feature spaces, and the centered-alignment similarity is used to flexibly select the multiple kernel learning weight coefficients, which gives the classifier better generalization ability. Experimental results show that the proposed sparse multinomial logistic regression algorithm based on centered-alignment multiple kernel learning outperforms conventional classification algorithms in classification accuracy.
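The abstract describes the kernel weighting only in words, so the following is a minimal sketch of one common reading, not the paper's exact procedure. It assumes the weight of each kernel is proportional to its centered alignment with the target kernel $ {Y}{Y}^{\mathrm{T}} $ (one-hot labels), clipped at zero and normalized to sum to 1. The function names, the Gaussian kernel form, the candidate widths and the toy data are all illustrative assumptions.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K_c = H K H, H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_alignment(K1, K2):
    """Cosine similarity of two centered kernels under the Frobenius inner product."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def rbf_kernel(X, sigma):
    """Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2)); this form is an assumption."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def align_weights(kernels, Y):
    """Assumed heuristic: weight each kernel by its alignment with Y Y^T."""
    K_target = Y @ Y.T
    scores = np.array([centered_alignment(K, K_target) for K in kernels])
    scores = np.maximum(scores, 0.0)   # drop negatively aligned kernels
    return scores / scores.sum()       # normalization is an assumption

# Toy two-class data (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
labels = np.repeat([0, 1], 20)
Y = np.eye(2)[labels]                  # one-hot label matrix

kernels = [rbf_kernel(X, s) for s in (0.5, 2.0, 8.0)]
mu = align_weights(kernels, Y)
# Weighted sum of the centered base kernels
K_combined = sum(m * center_kernel(K) for m, K in zip(mu, kernels))
```

Because the weights come from a closed-form similarity rather than a joint optimization, they can be computed once before the classifier is trained.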
Algorithm 1: Backtracking ISTA for the KSMLR problem
Input: initial step size $ \tau =1/L $, $ L>0 $; initial parameters $ {\alpha }\in {R}^{n\times k} $; initial kernel function parameter $ \sigma =2 $; maximum number of iterations $ \mathrm{Iter} = 500 $; backtracking parameter $ \beta \in (0, 1) $
Output: final parameters $ {\alpha }^{t+1} $
Iteration steps:
Step 1 Compute the kernel matrix $ {K} $ from the samples $ {X}^{\left(i\right)} $;
Step 2 Initialize the counter $ t\leftarrow 0 $;
Step 3 Initialize the parameters $ {\alpha }^{t}\leftarrow {\alpha } $;
Step 4 $ {\alpha }^{t+1}={p}_{\tau }\left({\alpha }^{t}\right) $;
Step 5 $ \tau =\beta \tau $;
Step 6 If $ l\left({\alpha }^{t+1}\right)\le \hat{l}\left({\alpha }^{t+1}, {\alpha }^{t}\right) $ holds or the maximum number of iterations is reached, the algorithm terminates and Step 7 is executed. Otherwise, set $ t\leftarrow t+1 $ and return to Step 4;
Step 7 Return the updated parameters $ {\alpha }^{t+1} $.

Algorithm 2: Backtracking FISTA for the MKSMLR problem
Input: initial step size $ \tau =1/L $, $ L>0 $; initial parameters $ {\alpha }\in {R}^{n\times k} $; initial kernel function parameter $ \sigma =2 $; maximum number of iterations $ \mathrm{Iter} = 500 $; backtracking parameter $ \beta \in (0, 1) $
Output: final parameters $ {\alpha }^{t+1} $
Iteration steps:
Step 1 Compute $ p $ different kernel matrices from the samples $ {X}^{\left(i\right)} $;
Step 2 Compute the multiple kernel learning parameters $ {\mu } $ with the Align method and generate the new kernel matrix $ {K}_{c\mu } $;
Step 3 Initialize the counter $ t\leftarrow 0 $;
Step 4 Initialize the parameters $ {\alpha }^{t}\leftarrow {\alpha } $, $ {\mu }^{t}\leftarrow 1 $, $ {v}^{t}\leftarrow {\alpha }^{t} $;
Step 5 $ {\alpha }^{t+1}={p}_{\tau }\left({v}^{t}\right) $;
Step 6 $ {\mu }^{t+1}=\dfrac{1+\sqrt{1+4{\left({\mu }^{t}\right)}^{2}}}{2} $;
Step 7 $ {v}^{t+1}={\alpha }^{t+1}+\dfrac{{\mu }^{t}-1}{{\mu }^{t+1}}\left({\alpha }^{t+1}-{\alpha }^{t}\right) $;
Step 8 $ \tau =\beta \tau $;
Step 9 If $ l\left({\alpha }^{t+1}\right)\le \hat{l}\left({\alpha }^{t+1}, {\alpha }^{t}\right) $ holds or the maximum number of iterations is reached, the algorithm terminates and Step 10 is executed. Otherwise, set $ t\leftarrow t+1 $ and return to Step 5;
Step 10 Return the updated parameters $ {\alpha }^{t+1} $.

Table 1 Classification accuracy
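| Dataset | SVM | SLR | WDMLR | SML-ISTA | SML-FISTA | KSMLR | MKSMLR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Banana | 0.9069 | – | – | – | – | 0.9069 | 0.9107 |
| COIL20 | 0.8032 | 0.9676 | 0.9832 | 0.9895 | 0.9958 | 0.9977 | 1 |
| ORL | 0.9507 | 0.9420 | 0.9545 | 0.9242 | 0.9545 | 0.9000 | 0.9167 |
| GT-32 | – | – | 0.7823 | 0.7580 | 0.7621 | 0.8044 | 0.8044 |
| MNIST-S | 0.9113 | 0.9001 | 0.9109 | 0.9036 | 0.9048 | 0.9360 | 0.9400 |
| Lung | 0.7705 | 0.9344 | 0.9104 | 0.9104 | 0.9254 | 0.9180 | 0.9344 |
| Indian-pines | 0.7980 | 0.8182 | 0.7599 | 0.8120 | 0.8120 | 0.8218 | 0.8237 |
| Segment | 0.5989 | 0.9235 | 0.8268 | 0.8925 | 0.9253 | 0.9538 | 0.9567 |

Note: "–" indicates that the algorithm failed to classify correctly or that its accuracy was close to random guessing.

Table 2 Algorithm running time (s)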
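| Dataset | SML-ISTA | SML-FISTA | KSMLR | MKSMLR |
| --- | --- | --- | --- | --- |
| Banana | – | – | 0.78 | 1.19 |
| COIL20 | 1.71 | 0.39 | 7.61 | 13.46 |
| ORL | 142.05 | 7.5 | 10.43 | 2.73 |
| GT-32 | 88.19 | 2.03 | 37.94 | 10.77 |
| MNIST-S | 0.12 | 0.14 | 0.14 | 22.98 |
| Lung | 42.71 | 1.4 | 2.12 | 3.08 |
| Indian-pines | 427.62 | 18.58 | 68.31 | 909.1 |
| Segment | 21.33 | 20.71 | 13.68 | 33.35 |

Note: "–" indicates that the algorithm failed to classify correctly or that its accuracy was close to random guessing.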
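The iterations in Algorithm 1 and Algorithm 2 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The objective is taken to be the mean multinomial negative log-likelihood on the logits $ {K}{\alpha } $ plus $ \lambda {\Vert {\alpha }\Vert }_{1} $, and $ {p}_{\tau } $ is taken to be a gradient step followed by soft-thresholding. The backtracking follows the standard Beck–Teboulle form, in which $ \tau $ is shrunk by $ \beta $ only until $ l\le \hat{l} $ holds. The names `prox_gradient`, `lam`, `tol`, the Gaussian kernel form and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, sigma=2.0):
    """Gaussian kernel matrix; sigma = 2 matches the listed initial value."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss(alpha, K, Y):
    """Smooth part l(alpha): mean multinomial negative log-likelihood."""
    Z = K @ alpha
    zmax = Z.max(axis=1)
    lse = zmax + np.log(np.exp(Z - zmax[:, None]).sum(axis=1))
    return float(np.mean(lse - np.sum(Y * Z, axis=1)))

def grad(alpha, K, Y):
    return K.T @ (softmax(K @ alpha) - Y) / K.shape[0]

def soft_threshold(A, thr):
    """Proximal operator of thr * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - thr, 0.0)

def prox_gradient(K, Y, lam=0.01, tau=1.0, beta=0.5, max_iter=500,
                  tol=1e-8, accelerate=True):
    """Backtracking ISTA (accelerate=False) or FISTA (accelerate=True)."""
    alpha = np.zeros((K.shape[0], Y.shape[1]))
    v, mu = alpha.copy(), 1.0
    for _ in range(max_iter):
        g, l_v = grad(v, K, Y), loss(v, K, Y)
        while True:
            alpha_new = soft_threshold(v - tau * g, tau * lam)   # p_tau(v)
            d = alpha_new - v
            # Quadratic upper model of l around v
            l_hat = l_v + np.sum(g * d) + np.sum(d * d) / (2.0 * tau)
            if loss(alpha_new, K, Y) <= l_hat + 1e-12:
                break
            tau *= beta                                          # shrink the step
        if accelerate:
            mu_new = (1.0 + np.sqrt(1.0 + 4.0 * mu**2)) / 2.0
            v = alpha_new + (mu - 1.0) / mu_new * (alpha_new - alpha)
            mu = mu_new
        else:
            v = alpha_new
        done = np.linalg.norm(alpha_new - alpha) < tol
        alpha = alpha_new
        if done:
            break
    return alpha

# Toy three-class data (illustrative only)
rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 1.0, (15, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 15)
Y = np.eye(3)[labels]
K = rbf_kernel(X, sigma=2.0)

lam = 0.01
alpha_fista = prox_gradient(K, Y, lam=lam, accelerate=True)
alpha_ista = prox_gradient(K, Y, lam=lam, accelerate=False)
objective = lambda a: loss(a, K, Y) + lam * np.abs(a).sum()
accuracy = np.mean(np.argmax(K @ alpha_fista, axis=1) == labels)
```

For MKSMLR, the same loop would be run with the combined kernel $ {K}_{c\mu } $ in place of `K`. Setting `accelerate=False` removes the momentum sequence $ {\mu }^{t} $ and $ {v}^{t} $, which reduces the update to the ISTA form of Algorithm 1.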