高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

两行更新策略的成对约束投影-聚类联合优化方法

朱建勇 陈坤 杨辉 聂飞平

朱建勇, 陈坤, 杨辉, 聂飞平. 两行更新策略的成对约束投影-聚类联合优化方法[J]. 电子与信息学报. doi: 10.11999/JEIT260111
引用本文: 朱建勇, 陈坤, 杨辉, 聂飞平. 两行更新策略的成对约束投影-聚类联合优化方法[J]. 电子与信息学报. doi: 10.11999/JEIT260111
ZHU Jianyong, CHEN Kun, YANG Hui, NIE Feiping. Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating a Two-row Simultaneous Update Strategy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260111
Citation: ZHU Jianyong, CHEN Kun, YANG Hui, NIE Feiping. Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating a Two-row Simultaneous Update Strategy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260111

两行更新策略的成对约束投影-聚类联合优化方法

doi: 10.11999/JEIT260111 cstr: 32379.14.JEIT260111
基金项目: 国家自然科学基金(62363010),江西省自然科学基金重点项目(20252BAC250019),江西省“双千计划”(SSQ2023018) ,国家科技重大专项项目(2026ZD16010302)
详细信息
    作者简介:

    朱建勇:男,教授,研究方向为机器学习,工业人工智能等

    陈坤:男,硕士生,研究方向为机器学习,数据挖掘等

    杨辉:男,教授,研究方向为复杂系统建模,控制与优化等

    聂飞平:男,教授,研究方向为机器学习及应用,模式识别,数据挖掘等

    通讯作者:

    聂飞平 feipingnie@gmail.com

  • 中图分类号: TN911.23; TP181

Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating a Two-row Simultaneous Update Strategy

Funds: The National Natural Science Foundation of China (62363010), The Key Project of Jiangxi Provincial Natural Science Foundation (20252BAC250019), Jiangxi Double Thousand Plan (SSQ2023018), The National Science and Technology Major Project (2026ZD16010302)
  • 摘要: 现有的约束投影方法通常采用投影-聚类两步独立策略,导致投影误差直接传递至聚类过程;此外,成对约束仅局限于投影阶段,偏离由先验知识引导聚类的核心思想。为此,该文提出两行更新策略的成对约束投影-聚类联合优化方法。首先,将约束投影目标函数作为正则化项统一到聚类框架中,二者在交替优化的过程中共同逼近全局最优解;其次,为克服传统K-means易陷入局部次优解的缺陷,并旨在提升计算效率,采用了一种融合增量计算、提前存储、变量更新3步加速策略的改进坐标下降法求解指示矩阵;最后,将成对约束嵌入到聚类指示矩阵中,使有限的监督信息贯穿于整个学习流程,并设计了一种新颖的两行同时优化策略,在每次循环中精准且直接地分配勿连约束。仿真实验验证了该文所提方法的优越性。
  • 图  1  不同方法在4个基准数据集上的成对约束违反率对比

    图  2  不同$ \gamma $值对PCITUS算法性能的影响

    图  3  PCITUS噪声敏感度及收敛曲线

    表  1  本文主要符号说明

    符号 描述 符号 描述
    $ \boldsymbol{X}\in {\mathbb{R}}^{d\times n} $ 数据集矩阵 $ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $ 合并后的指示矩阵
    $ \boldsymbol{Y}\in {\mathbb{R}}^{n\times c} $ 指示矩阵 $ {\tilde{y}}_{i}\in {\mathbb{R}}^{\tilde{n}\times 1} $ $ \tilde{\boldsymbol{Y}} $的第i
    $ \boldsymbol{W}\in {\mathbb{R}}^{d\times m} $ 投影矩阵 $ {\tilde{y}}^{i}\in {\mathbb{R}}^{1\times c} $ $ \tilde{\boldsymbol{Y}} $的第i
    M(C) 必连(勿连)约束集合 d 样本原始特征维度
    n 原始样本数 m 投影维度
    $ \tilde{n} $ 合并后的样本数 c 类别数
    下载: 导出CSV

    1  式(6)下提出的PCITUS 算法

     1:输入:数据$ \tilde{\boldsymbol{X}}\in {\mathbb{R}}^{d\times \tilde{n}} $,成对约束集合MC,类别数c,调
      节参数$ \gamma $;
     2: 初始化:通过式(5)利用必连传递闭包形成超链接点,得到合
      并后的数据集$ \tilde{\boldsymbol{X}}\in {\mathbb{R}}^{d\times \tilde{n}} $,包含超链接点的CL约束集合$ \tilde{\boldsymbol{C}} $,
      随机初始化指示矩阵$ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $;
     3: repeat
     4:  计算$ \boldsymbol{S}=\boldsymbol{P}+\gamma \boldsymbol{K} $;
     5:  通过式(7)对S执行特征分解更新W
     6:  通过算法2中坐标下降法更新$ \tilde{\boldsymbol{Y}} $;
     7: until convergence
     8: 输出:总样本的预测标签Label.
    下载: 导出CSV

    2  坐标下降法求解问题(8)

     1: 输入:合并后数据矩阵$ \tilde{\boldsymbol{X}}\in {\mathbb{R}}^{d\times \tilde{n}} $,投影矩阵$ \boldsymbol{W}\in {\mathbb{R}}^{d\times m} $,
      包含超链接点的CL约束集合$ \tilde{C} $;
     2: 初始化:随机初始化指示矩阵$ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $,计算$ \boldsymbol{A}={\tilde{\boldsymbol{X}}}^{\text{T}}\boldsymbol{W}{\boldsymbol{W}}^{\text{T}}\tilde{\boldsymbol{X}} $;
     3: 计算并提前存储$ \boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $, $ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}\boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $和$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}{\tilde{\boldsymbol{y}}}_{k}(k=1,2,\cdots ,c) $;
     4: for i = 1 to$ \tilde{n} $ do
     5: if $ {\tilde{\boldsymbol{x}}}_{i} $不包含在$ \tilde{C} $中 then
     6:  通过式(14)计算$ \varphi (k)(k=1,2,\cdots, c) $;
     7:  通过式(15)更新$ \tilde{\boldsymbol{Y}} $的第i行$ {\tilde{\boldsymbol{y}}}^{i} $;
     8:  if $ q\neq p $ then
     9:   通过式(16)更新$ \boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $, $ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}\boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $和$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}{\tilde{\boldsymbol{y}}}_{k}(k=p,q) $;
     10: end if
     11: else if $ {\tilde{\boldsymbol{x}}}_{i} $和$ {\tilde{\boldsymbol{x}}}_{j} $包含在$ \tilde{C} $中 then
     12: 通过式(14)同时计算$ {\varphi }^{i}(k) $和$ {\varphi }^{j}(k) $;
     13: 通过式(17)同时更新$ {\tilde{\boldsymbol{y}}}^{i} $和$ {\tilde{\boldsymbol{y}}}^{j} $;
     14: if $ q\neq p $ then
     15: 通过式(16)分别更新$ {\tilde{\boldsymbol{y}}}^{i} $和$ {\tilde{\boldsymbol{y}}}^{j} $所对应的$ \boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $, $ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}\boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $和
        $ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}{\tilde{\boldsymbol{y}}}_{k}(k=p,q) $;
     16: end if
     17: end if
     18: end for
     19: 输出: 指示矩阵$ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $
    下载: 导出CSV

    表  2  数据集信息

    数据集样本数量维度类别数据集样本数量维度类别
    Wine178133Binalpha140432036
    Breast69992Steel1941277
    Australian690142Satimage6435366
    Landset2000366Mushroom81241122
    下载: 导出CSV

    表  3  不同算法参数设置

    算法名称参数描述算法名称参数描述
    Cop-Kmeans无参数调节Onestep-PCP权衡系数α=1
    PC-Kmeans权重系数ω=1PCOG正则化系数$y \in\left\{10^{-3}, 10^{-2}, 10^{-1}, 1,10,10^{-1}, 10^{-1}\right\} $
    SCC高斯核带宽$\sigma \in[0,2] $PCASG正则化系数$\lambda=1 ; \beta, \theta \in\left\{10^{-3}, 10^{-2}, 10^{-1}, 1,10,10^{2}, 10^{2}\right\} $
    CNP $ \rho \in [0.1,0.2] $ ESGCPC锚点个$m=\left\{2^{\prime}, 2^{\prime}, 2^{\prime}, 2^{\prime \prime}, 2^{n}\right\} $,近邻数$k=\{6,12,18,24\} $
    SSDC临时聚类数$ D=\sqrt{n} $ or $ \sqrt{n}/2 $PCITUS权衡系数$y \in\left\{10^{-3}, 10^{-2}, 10^{-1}, 1,10,10^{-1}, 10^{-1}\right\} $
    下载: 导出CSV

    表  5  不同方法在八个基准数据集上的NMI比较(平均值±标准差)(%)

    方法WineBreastAustralianLandsetBinalphaSteelSatimageMushroom
    Cop-Kmeans86.51±1.9069.86±2.1236.57±1.2159.79±2.8456.51±1.0229.62±2.1859.77±3.2742.76±5.49
    PC-Kmeans87.10±1.8670.52±1.1537.85±0.9858.83±2.3957.83±1.0130.58±2.0358.13±4.7340.39±4.36
    SCC87.28±1.3972.68±2.8035.79±1.6351.41±2.8161.89±0.4931.41±2.3660.89±0.1550.90±2.25
    CNP82.75±4.5170.42±1.2838.81±2.0257.53±4.3158.53±1.1330.44±1.9560.95±0.2145.84±3.77
    SSDC75.83±5.3430.96±3.8211.86±1.1444.64±2.6646.51±3.4726.31±1.2443.14±1.9124.71±2.54
    Onestep-PCP84.44±0.0037.12±0.008.71±0.0062.52±0.0061.33±0.3726.92±1.6661.65±0.0213.61±0.00
    PCOG83.24±3.3266.37±1.18N/A52.55±4.8560.54±5.0126.91±0.2946.90±3.65N/A
    PCASG62.59±3.9862.41±0.6325.20±3.3455.91±3.0660.51±1.5225.21±1.7955.95±3.45N/A
    ESGCPC88.07±2.0873.69±2.3836.98±1.6261.77±2.2258.94±0.8332.27±0.9161.37±0.5145.87±5.42
    PCITUS89.24±1.9675.53±1.4441.50±1.9562.01±0.3861.46±1.2132.69±1.8361.81±0.2258.19±0.61
    下载: 导出CSV

    表  4  不同方法在8个基准数据集上的ACC比较(平均值±标准差)(%)

    方法WineBreastAustralianLandsetBinalphaSteelSatimageMushroom
    Cop-Kmeans96.15±0.8895.09±0.4684.03±0.5167.32±1.8640.25±1.7844.75±2.7767.31±2.5881.52±5.76
    PC-Kmeans96.48±0.7895.27±0.2484.58±0.6065.10±3.2441.95±1.7746.39±3.4365.70±3.7979.27±6.24
    SCC96.43±0.4295.04±0.6583.52±2.8261.44±2.0346.11±1.9441.27±3.8563.99±0.3585.44±4.28
    CNP95.06±1.6095.14±0.3184.04±0.7765.32±3.1242.88±2.1944.81±3.5167.86±0.3588.94±2.69
    SSDC90.34±4.1866.82±4.6131.48±3.5952.95±4.6129.20±2.1832.65±3.3253.27±3.6449.48±6.15
    Onestep-PCP95.51±0.0080.11±0.0066.81±0.0064.30±0.0046.65±1.4241.20±3.0663.90±0.0270.21±0.00
    PCOG94.94±2.2894.07±0.42N/A61.75±5.3542.53±2.6642.28±1.4961.49±4.24N/A
    PCASG82.70±2.8893.62±0.1478.84±4.2167.97±1.5844.07±2.7941.04±0.8354.80±5.16N/A
    ESGCPC96.91±0.6995.91±0.4884.19±0.6569.28±2.7944.04±2.0746.29±2.6068.39±0.3086.19±4.07
    PCITUS97.11±0.5696.28±0.2986.07±0.7668.62±0.6146.89±2.0847.59±2.5170.91±1.1190.13±0.15
    下载: 导出CSV

    表  6  不同约束比例对本文算法的性能(NMI)影响(平均值±标准差)(%)

    指标约束比例WineBreastAustralianLandsetBinalphaSteelSatimageMushroom
    NMI0.189.24±1.9675.53±1.4441.50±1.9562.01±0.3861.20±1.2131.07±1.8361.81±0.2258.19±0.61
    0.289.91±2.8178.51±2.7545.72±2.8462.89±0.6163.04±0.9931.97±1.3662.63±0.2361.47±0.88
    0.391.29±3.1781.88±2.2250.13±2.5963.72±0.5164.83±1.0833.27±1.3963.40±0.3865.48±1.08
    0.491.81±2.5584.42±1.9454.29±3.4164.70±0.6366.55±1.3034.35±2.0164.13±0.4469.67±1.10
    0.592.75±2.4187.05±2.3256.95±2.7265.59±0.8768.38±1.2535.34±1.5364.38±2.1572.99±1.07
    下载: 导出CSV

    表  7  不同方法在4个较大规模数据集上的运行时间对比(s)

    方法Cop-KmeansPC-KmeansSCCCNPSSDCOnestep-PCPESGCPCPCITUS
    Landset0.834.223.830.0329.8416.911.480.34
    Steel1.835.2710.490.0232.6430.151.650.44
    Satimage5.6482.35514.760.53415.92669.775.114.82
    Mushroom2.688.62320.491.56723.551477.622.792.93
    下载: 导出CSV
  • [1] 张朋飞, 程俊, 张治坤, 等. 满足本地差分隐私的混合噪音感知的模糊C均值聚类算法[J]. 电子与信息学报, 2025, 47(3): 739–757. doi: 10.11999/JEIT241067.

    ZHANG Pengfei, CHENG Jun, ZHANG Zhikun, et al. Fuzzy C-means clustering algorithm based on mixed noise-aware under local differential privacy[J]. Journal of Electronics & Information Technology, 2025, 47(3): 739–757. doi: 10.11999/JEIT241067.
    [2] 金极栋, 卢宛萱, 孙显, 等. 分布采样对齐的遥感半监督要素提取框架及轻量化方法[J]. 电子与信息学报, 2024, 46(5): 2187–2197. doi: 10.11999/JEIT240220.

    JIN Jidong, LU Wanxuan, SUN Xian, et al. Remote sensing semi-supervised feature extraction framework and lightweight method integrated with distribution-aligned sampling[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2187–2197. doi: 10.11999/JEIT240220.
    [3] 霍纬纲, 朱旭, 张盼. 基于约束传递的深度主动时序聚类方法[J]. 电子与信息学报, 2025, 47(4): 1172–1181. doi: 10.11999/JEIT240855.

    HUO Weigang, ZHU Xu, and ZHANG Pan. Deep active time-series clustering based on constraint transitivity[J]. Journal of Electronics & Information Technology, 2025, 47(4): 1172–1181. doi: 10.11999/JEIT240855.
    [4] CHEN Jingwei, XIE Shiyu, YANG Hui, et al. Effective semi-supervised graph clustering with pairwise constraints[J]. Information Sciences, 2024, 681: 121249. doi: 10.1016/j.ins.2024.121249.
    [5] WAGSTAFF K, CARDIE C, ROGERS S, et al. Constrained K-means clustering with background knowledge[C]. The 18th International Conference on Machine Learning, San Francisco, USA, 2001: 577–584.
    [6] KAMVAR S D, KAMVAR D, and MANNING C DKAMVAR S D, KAMVAR D, and MANNING C DSpectral learning[C]. The 18th International Joint Conference on Artificial Intelligence, San Francisco, USA, 2003: 561–566.
    [7] JIA Yuheng, WU Wenhui, WANG Ran, et al. Joint optimization for pairwise constraint propagation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(7): 3168–3180. doi: 10.1109/TNNLS.2020.3009953.
    [8] NIE Feiping, ZHANG Han, WANG Rong, et al. Semi-supervised clustering via pairwise constrained optimal graph[C]. The 29th International Joint Conferences on Artificial Intelligence, Yokohama, Japan, 2021: 3160–3166. doi: 10.24963/IJCAI.2020/437.
    [9] ZHANG Jing, FAN Ruidong, TAO Hong, et al. Constrained clustering with weak label prior[J]. Frontiers of Computer Science, 2024, 18(3): 183338. doi: 10.1007/s11704-023-3355-7.
    [10] ZHOU Jie, HUANG Chucheng, GAO Can, et al. Weighted subspace fuzzy clustering with adaptive projection[J]. International Journal of Intelligent Systems, 2024, 2024: 6696775. doi: 10.1155/2024/6696775.
    [11] TANG Wei, XIONG Hui, ZHONG Shi, et al. Enhancing semi-supervised clustering: A feature projection perspective[C]. The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, USA, 2007: 707–716. doi: 10.1145/1281192.1281268.
    [12] WANG Hongjun, LI Tao, LI Tianrui, et al. Constraint neighborhood projections for semi-supervised clustering[J]. IEEE Transactions on Cybernetics, 2014, 44(5): 636–643. doi: 10.1109/TCYB.2013.2263383.
    [13] YU Zhiwen, LUO Peinan, LIU Jiming, et al. Semi-supervised ensemble clustering based on selected constraint projection[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(12): 2394–2407. doi: 10.1109/TKDE.2018.2818729.
    [14] NIE Feiping, XUE Jingjing, WU Danyang, et al. Coordinate descent method for k-means[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(5): 2371–2385. doi: 10.1109/TPAMI.2021.3085739.
    [15] ZHANG Chao, XU Deng, CHEN Chunlin, et al. Semi-supervised multi-view clustering with active constraints[C]. The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, Toronto, Canada, 2025: 1903–1912. doi: 10.1145/3690624.3709204.
    [16] BASU S, BANERJEE A, and MOONEY R J. Active semi-supervision for pairwise constrained clustering[C]. The 2004 SIAM International Conference on Data Mining, Lake Buena Vista, USA, 2004: 333–344.
    [17] REN Yazhou, HU Xiaohui, SHI Ke, et al. Semi-supervised DenPeak clustering with pairwise constraints[C]. The 15th Pacific RIM International Conference on Artificial Intelligence, Nanjing, China, 2018: 837–850. doi: 10.1007/978-3-319-97304-3_64.
    [18] CHEN Long and ZHONG Zhi. Adaptive and structured graph learning for semi-supervised clustering[J]. Information Processing & Management, 2022, 59(4): 102949. doi: 10.1016/j.ipm.2022.102949.
  • 加载中
图(3) / 表(9)
计量
  • 文章访问数:  93
  • HTML全文浏览量:  33
  • PDF下载量:  14
  • 被引次数: 0
出版历程
  • 收稿日期:  2026-01-28
  • 修回日期:  2026-05-09
  • 录用日期:  2026-05-12
  • 网络出版日期:  2026-05-29

目录

    /

    返回文章
    返回