A Gene Feature Extraction Method Based on Across-view Similarity Order Preserving

SU Shuzhi, ZHANG Kaiyu, WANG Ziying, ZHANG Maoyan

Citation: SU Shuzhi, ZHANG Kaiyu, WANG Ziying, ZHANG Maoyan. A Gene Feature Extraction Method Based on Across-view Similarity Order Preserving[J]. Journal of Electronics & Information Technology, 2023, 45(1): 317-324. doi: 10.11999/JEIT211126

doi: 10.11999/JEIT211126
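Funds: The National Natural Science Foundation of China (61806006), China Postdoctoral Science Foundation (2019M660149), The Project of Institute of Energy, Hefei Comprehensive National Science Center (19KZS203), The International Science and Technology Cooperation Project of Key Research and Development Plan in Anhui Province (202004b11020029)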
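Details
    Author biographies:

    SU Shuzhi: male, associate professor; research interests: multimodal pattern recognition, feature learning, and gene analysis

    ZHANG Kaiyu: male, master's student; research interests: multimodal pattern recognition and gene analysis

    WANG Ziying: female, master's student; research interests: pattern recognition and image processing

    ZHANG Maoyan: male, master's student; research interest: pattern recognition

    Corresponding author:

    SU Shuzhi sushuzhi@foxmail.com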

  • Chinese Library Classification (CLC) number: TN911.73; TP391.4

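  • Abstract: Gene expression data are typically high-dimensional, have few samples, and have unevenly distributed classes, so extracting effective features from such data is a key problem in gene classification research. Drawing on correlation analysis theory, this paper constructs a discrimination-sensitive within-view similarity order preserving scatter and constrains the discrimination-sensitive between-view similarity correlation, which yields a new gene feature extraction method: Similarity Order Preserving Across-view Correlation Analysis (SOPACA). The method preserves the intra-class compactness and the similarity order of features across different views while maintaining large inter-class separation. Good experimental results on cancer gene expression datasets demonstrate the effectiveness of the method.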
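  • Algorithm 1  Steps of the SOPACA method
     Input: view datasets $\{ { {\boldsymbol{X} }^{(i)} } = ({\boldsymbol{x} }_1^{(i)},{\boldsymbol{x} }_2^{(i)}, \cdots ,{\boldsymbol{x} }_n^{(i)}) \in {{\boldsymbol{R}}^{ {d_i} \times n} }\} _{i = 1}^m$
     Output: class labels of the gene samples
     (1) Use Eq. (7) and Eq. (13) to construct the within-view similarity order preserving scatter matrix ${\boldsymbol{S}}_w^{(i)}$ and the between-view similarity correlation matrix ${\boldsymbol{S}}_b^{(ij)}$, respectively;
     (2) Solve the Lagrange function of Eq. (16) for the eigenvalues $\lambda $ and the corresponding eigenvectors;
     (3) Use Eq. (20) to obtain the correlation projection matrices $\{ {{\boldsymbol{W}}_i} = ({\boldsymbol{\alpha}}_1^{(i)},{\boldsymbol{\alpha}}_2^{(i)}, \cdots ,{\boldsymbol{\alpha}}_d^{(i)})\} _{i = 1}^m$;
     (4) Use Eq. (21) to obtain the fused discriminative vectors ${\boldsymbol{Z}}$;
     (5) Classify the discriminative vectors ${\boldsymbol{Z}}$ with a nearest neighbor classifier based on Euclidean distance to obtain the class labels of the gene samples.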
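
Step (5) of Algorithm 1 can be illustrated with a minimal sketch of a Euclidean-distance nearest neighbor classifier. This is only an illustration under stated assumptions, not the authors' implementation: the function names `nearest_neighbor_predict` and `recognition_rate`, the toy two-dimensional feature vectors, and the labels `"normal"` and `"tumor"` are all hypothetical, and the fused vectors are taken as given because Eqs. (7) to (21) are not reproduced on this page.

```python
import math
from typing import Hashable, List, Sequence


def nearest_neighbor_predict(
    train_features: Sequence[Sequence[float]],
    train_labels: Sequence[Hashable],
    query_features: Sequence[Sequence[float]],
) -> List[Hashable]:
    """Give each query vector the label of its closest training vector.

    Closeness is measured by Euclidean distance, as in step (5).
    """
    if len(train_features) != len(train_labels):
        raise ValueError("each training vector needs exactly one label")
    if not train_features:
        raise ValueError("at least one training vector is required")

    predictions = []
    for query in query_features:
        # Index of the training vector with the smallest Euclidean distance.
        best = min(
            range(len(train_features)),
            key=lambda i: math.dist(query, train_features[i]),
        )
        predictions.append(train_labels[best])
    return predictions


def recognition_rate(predicted: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Percentage of query samples whose predicted label matches the truth."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * correct / len(truth)


# Hypothetical fused feature vectors (stand-ins for the columns of Z).
train_z = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
train_y = ["normal", "normal", "tumor", "tumor"]
test_z = [[0.1, 0.2], [1.1, 1.0]]
test_y = ["normal", "tumor"]

predicted = nearest_neighbor_predict(train_z, train_y, test_z)
print(predicted, recognition_rate(predicted, test_y))
```

The classifier has no parameters to tune, so differences in recognition rate between methods come from the extracted features rather than from the classifier.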

    Table 1  Recognition rates on the lung cancer gene expression dataset for different numbers of training samples

    Method   5 training samples   10 training samples  15 training samples  20 training samples  25 training samples
    SOPACA   98.66 $\pm$ 0.85     99.08 $\pm$ 0.91     98.70 $\pm$ 1.22     98.81 $\pm$ 0.94     99.65 $\pm$ 0.74
    MCCA     96.08 $\pm$ 2.37     98.16 $\pm$ 1.11     97.92 $\pm$ 1.40     97.61 $\pm$ 1.00     99.30 $\pm$ 1.11
    LDA      96.70 $\pm$ 2.05     98.05 $\pm$ 1.22     98.31 $\pm$ 1.23     98.51 $\pm$ 1.00     99.30 $\pm$ 0.91
    GrMCCs   94.64 $\pm$ 3.10     96.55 $\pm$ 2.82     98.05 $\pm$ 2.31     97.61 $\pm$ 2.01     98.60 $\pm$ 1.38
    LMCCA    97.01 $\pm$ 1.41     98.28 $\pm$ 1.12     98.18 $\pm$ 1.40     98.36 $\pm$ 1.10     99.30 $\pm$ 0.91

    Table 2  Average recognition rates on the colorectal cancer gene expression dataset

    Method   2 training samples   3 training samples   4 training samples   5 training samples   6 training samples
    SOPACA   98.67 $\pm$ 1.72     99.29 $\pm$ 1.51     99.23 $\pm$ 1.62     99.58 $\pm$ 1.32     99.09 $\pm$ 1.92
    MCCA     95.67 $\pm$ 3.52     97.50 $\pm$ 2.41     97.31 $\pm$ 2.60     99.17 $\pm$ 1.76     98.18 $\pm$ 2.35
    LDA      95.00 $\pm$ 2.83     96.07 $\pm$ 2.03     95.77 $\pm$ 4.23     96.67 $\pm$ 2.64     97.73 $\pm$ 2.40
    GrMCCs   93.33 $\pm$ 8.75     94.29 $\pm$ 2.50     96.92 $\pm$ 3.53     97.50 $\pm$ 2.15     98.64 $\pm$ 2.20
    LMCCA    96.67 $\pm$ 2.22     96.07 $\pm$ 2.41     97.31 $\pm$ 3.17     97.50 $\pm$ 2.15     97.73 $\pm$ 2.40
Publication history
  • Received:  2021-10-14
  • Revised:  2022-01-10
  • Accepted:  2022-01-12
  • Published online:  2022-02-02
  • Issue date:  2023-01-17
