高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于约束传递的深度主动时序聚类方法

霍纬纲 朱旭 张盼

霍纬纲, 朱旭, 张盼. 基于约束传递的深度主动时序聚类方法[J]. 电子与信息学报, 2025, 47(4): 1172-1181. doi: 10.11999/JEIT240855
引用本文: 霍纬纲, 朱旭, 张盼. 基于约束传递的深度主动时序聚类方法[J]. 电子与信息学报, 2025, 47(4): 1172-1181. doi: 10.11999/JEIT240855
HUO Weigang, ZHU Xu, ZHANG Pan. Deep Active Time-series Clustering Based on Constraint Transitivity[J]. Journal of Electronics & Information Technology, 2025, 47(4): 1172-1181. doi: 10.11999/JEIT240855
Citation: HUO Weigang, ZHU Xu, ZHANG Pan. Deep Active Time-series Clustering Based on Constraint Transitivity[J]. Journal of Electronics & Information Technology, 2025, 47(4): 1172-1181. doi: 10.11999/JEIT240855

基于约束传递的深度主动时序聚类方法

doi: 10.11999/JEIT240855
基金项目: 国家自然基金民航联合基金重点项目(U2033205),国家自然科学基金(62173331),天津市自然科学基金面上项目(24JCYBJC00990)
详细信息
    作者简介:

    霍纬纲:男,教授,研究方向为时序大数据分析、数据挖掘

    朱旭:男,硕士生,研究方向为时间序列分析

    张盼:女,硕士生,研究方向为时间序列分析

    通讯作者:

    朱旭 2022051001@cauc.edu.cn

  • 中图分类号: TN911.7; TP18

Deep Active Time-series Clustering Based on Constraint Transitivity

Funds: The Civil Aviation Joint Fund of The National Natural Science Foundation of China (U2033205), The National Natural Science Foundation of China (62173331), Tianjin Natural Science Foundation General Project (24JCYBJC00990)
  • 摘要: 已有的深度主动聚类方法未能通过标注样本推理生成必须链接(ML)约束或不能链接(CL)约束,标注成本较高。为此该文提出一种基于约束传递的深度主动时序聚类方法。该方法设置了标注类簇集合(ACS)及相应的辅助标注集合(AAS)。通过预训练时序自编码器得到时序样本的表示向量。在深度聚类的每个训练轮次过程中,采样并标注表示空间中离类簇中心最近的样本存入ACS,使每个ACS内的样本属同一类别而ACS集合间的样本属于不同类别,然后从包含样本数最小的ACS集合中随机选取时序样本,采样并标注与该样本不属于同一类簇且距其所在类簇中心最近的时序样本存入AAS,使ACS与相应的AAS中的样本为不同类别,由ACS及对应的AAS中的样本推理生成ML和CL约束。由基于t-分布的类簇分布与其生成的辅助分布间的KL散度以及使满足ML及CL约束的时序样本在表示空间距离分别变小和变大的约束损失更新时序自编码器中编码网络参数和聚类中心。在18个公开数据集上的实验结果表明,该方法聚类效果在较低标注预算下平均RI值比已有的典型基线模型均提升5%以上。
  • 图  1  DATC-CT模型图

    图  2  时序编码器$ E $结构

    图  3  不同标注成本下主动聚类方法的RI值

    图  4  DATC-CT方法推理生成的ML与CL约束对的数量

    1  探索阶段采样标注过程

     输入:已标注次数$b$,类簇$\left\{ {{\pi _g}} \right\}_{g = 1}^G$,类簇中心$\left\{ {{{\boldsymbol{\mu }}_g}} \right\}_{g = 1}^G$,
     ACS集合$\left\{ {{S_g}} \right\}_{g = 1}^G$
     输出:更新后的ACS集合$\left\{ {{S_g}} \right\}_{g = 1}^G$
     (1) for $ i =1 $ to $G$ do:
     (2)  从${\pi _i}$中选择尚未被选择且表示向量与${{\boldsymbol{\mu }}_i}$距离最小的样本${{\boldsymbol{x}}_p}$
     (3)  for $ j =1 $ to $G$ do:
     (4)   if ${S_j} = \varnothing $ then$ : $
     (5)    将样本${{\boldsymbol{x}}_p}$放入集合${S_j}$
     (6)    break
     (7)   else:
     (8)    从${S_j}$中随机选择一个样本${{\boldsymbol{x}}_q}$,标注样本${{\boldsymbol{x}}_p}$与${{\boldsymbol{x}}_q}$的关
          系,$ b= b+1 $
     (9)    if 样本${{\boldsymbol{x}}_p}$与${{\boldsymbol{x}}_q}$为ML约束 then:
     (10)     将样本${{\boldsymbol{x}}_p}$放入集合${S_j}$
     (11)     break
     (12)    end if
     (13)   end if
     (14) end for
     (15) end for
    下载: 导出CSV

    2  巩固阶段采样标注过程

     输入:已标注次数$b$,类簇$\left\{ {{\pi _g}} \right\}_{g = 1}^G$,类簇中心$\left\{ {{{\boldsymbol{\mu }}_g}} \right\}_{g = 1}^G$,
     ACS集合$\left\{ {{S_g}} \right\}_{g = 1}^G$,AAS集合$\left\{ {{S_g}} \right\}_{g = 1}^G$
     输出:更新后的ACS集合$\left\{ {{S_g}} \right\}_{g = 1}^G$与AAS集合$\left\{ {{{\bar S}_g}} \right\}_{g = 1}^G$
     (1) 从样本数量最少的集合${S_g}$中随机选择样本${{\boldsymbol{x}}_p}$,从样本${{\boldsymbol{x}}_p}$所
     属类簇${\pi _g}$之外选择表示向量与${{\boldsymbol{\mu }}_g}$距离最小的样本${{\boldsymbol{x}}_q}$
     (2) 标注样本${{\boldsymbol{x}}_p}$与${{\boldsymbol{x}}_q}$的关系,$b = b + 1$
     (3) if 样本${{\boldsymbol{x}}_p}$与${{\boldsymbol{x}}_q}$为ML约束关系 then
     (4)  将样本${{\boldsymbol{x}}_q}$放入集合${S_g}$
     (5) end if
     (6) if 样本${{\boldsymbol{x}}_p}$与${{\boldsymbol{x}}_q}$为CL约束关系 then
     (7)  将样本${{\boldsymbol{x}}_q}$放入集合${\bar S_g}$
     (8) end if
    下载: 导出CSV

    3  生成ML约束集合$M$和CL约束集合$C$

     输入:ACS集合$\left\{ {{S_g}} \right\}_{g = 1}^G$与AAS集合$\left\{ {{{\bar S}_g}} \right\}_{g = 1}^G$
     输出:ML约束集合$M$,CL约束集合$C$
     (1) 初始化集合$M$=$\varnothing $, $C$=$\varnothing $
     (2) for $g{\text{ = }}1$ to $G$ do
     (3)  for ${{\boldsymbol{x}}_i},{{\boldsymbol{x}}_j} \in {S_g}$ do
     (4)   将样本对(${{\boldsymbol{x}}_i},{{\boldsymbol{x}}_j}$)添加至集合$M$
     (5)  end for
     (6)  for ${{\boldsymbol{x}}_i} \in {S_g}$, ${{\boldsymbol{x}}_j} \in {\bar S_g}$ do
     (7)   将样本对(${{\boldsymbol{x}}_i},{{\boldsymbol{x}}_j}$)添加至集合$C$
     (8)  end for
     (9)  for $k = g + 1$ to $ G $ do
     (10)   for ${{\boldsymbol{x}}_i} \in {S_g}$, $ {{\boldsymbol{x}}}_{j}\in {S}_{k} $ do
     (11)    将样本对(${{\boldsymbol{x}}_i},{{\boldsymbol{x}}_j}$)添加至集合$C$
     (12)   end for
     (13) end for
     (14) end for
    下载: 导出CSV

    4  DATC-CT算法过程

     输入:数据集$X$,总预算$B$,当前预算$b$,训练轮次$T$,ACS集
     合$\left\{ {{S_g}} \right\}_{g = 1}^G$与AAS集合$\left\{ {{{\bar S}_g}} \right\}_{g = 1}^G$,主动采样阶段标志SFlag
     输出:类簇中心$\left\{ {{{\boldsymbol{\mu }}_g}} \right\}_{g = 1}^G$
     (1) 使用K-means方法对样本${{\boldsymbol{x}}_1},{{\boldsymbol{x}}_2} \cdots ,{{\boldsymbol{x}}_N}$的表示向量
       ${{\boldsymbol{z}}_1},{{\boldsymbol{z}}_2} \cdots ,{{\boldsymbol{z}}_N}$聚类,得到初始类簇中心${{\boldsymbol{\mu }}_1},{{\boldsymbol{\mu }}_2}, \cdots ,{{\boldsymbol{\mu }}_G}$
     (2)  初始化${S_1},{S_2}, \cdots ,{S_G} = \varnothing $, ${\bar S_1},{\bar S_2}, \cdots ,{\bar S_G} = \varnothing $,
       SFlag=True, $b = 0$
     (3)  for $ t =1\dots T $ do
     (4)   使用TAE编码器获取样本${{\boldsymbol{x}}_1},{{\boldsymbol{x}}_2}, \cdots ,{{\boldsymbol{x}}_N}$嵌入
         ${{\boldsymbol{z}}_1},{{\boldsymbol{z}}_2}, \cdots ,{{\boldsymbol{z}}_N}$
     (5)   使用式(2)计算样本类簇分布形成类簇${\pi _1},{\pi _2}, \cdots ,{\pi _G}$
     (6)   if $b < B$ then
     (7)    if SFlag then
     (8)     使用算法1更新${S_1},{S_2}, \cdots ,{S_G}$
     (9)    else:
     (10)     使用算法2更新${S_1},{S_2}, \cdots ,{S_G}$与${\bar S_1},{\bar S_2}, \cdots ,{\bar S_G}$
     (11)   end if
     (12)   if ACS集合${S_1},{S_2}, \cdots ,{S_G}$中均至少包含1个样本
         then
     (13)    SFlag=False
     (14)   end if
     (15)   使用算法3生成约束集合$M$与约束集合$C$
     (16)   使用式(6)计算模型损失${L_3}$
     (17)   使用${L_3}$更新TAE编码器参数与类簇中心${{\boldsymbol{\mu }}_1},{{\boldsymbol{\mu }}_2}, \cdots ,{{\boldsymbol{\mu }}_G}$
     (18) end for
    下载: 导出CSV

    表  1  实验数据集描述

    数据集样本总数类别数序列长度
    ChlorineConcentration43073166
    Dist.Phal.Outl.AgeGroup539380
    Dist.Phal.Outl.Correct876280
    ECG200200296
    ECGFiveDays8842136
    GunPoint2002150
    Ham2142431
    Herring1282512
    MoteStrain1272284
    OSULeaf4426427
    Plane2107144
    Prox.phal.outl.AgeGroup605380
    Prox.Phal.TW605680
    SonyAIBORobotSurface1621270
    SonyAIBORobotSurface2980265
    SwedishLeaf112515128
    ToeSegmentation12682277
    TwoLeadECG1662343
    下载: 导出CSV

    表  2  与相关深度时序聚类方法对比结果

    数据集DECDTCDTCRDATC-CT
    ChlorineConcentration0.49940.53530.53570.5370
    Dist.Phal.Outl.AgeGroup0.73210.78120.78250.7918
    Dist.Phal.Outl.Correct0.49980.50100.60750.5824
    ECG2000.63080.60180.66480.7345
    ECGFiveDays0.50020.50160.96381.0000
    GunPoint0.49750.54000.63980.8219
    Ham0.49870.56480.53620.5841
    Herring0.49710.50450.57590.5153
    MoteStrain0.76210.50620.76860.9228
    OSULeaf0.72830.73290.77390.7742
    Plane0.92830.90400.95490.9921
    Prox.phal.outl.AgeGroup0.38930.74300.80910.8140
    Prox.Phal.TW0.51510.83800.90230.8568
    SonyAIBORobotSurface10.89880.55630.87690.9967
    SonyAIBORobotSurface20.65090.70120.83540.8562
    SwedishLeaf0.88370.88710.92230.9241
    ToeSegmentation10.49950.50770.56590.5158
    TwoLeadECG0.50010.51160.71141.0000
    最优个数00414
    平均RI0.61730.63430.74590.7900
    下载: 导出CSV

    表  3  与已有主动选择策略对比结果

    数据集RandADCDATC-CT
    ChlorineConcentration0.52940.53420.5370
    Dist.Phal.Outl.AgeGroup0.73130.72180.7918
    Dist.Phal.Outl.Correct0.50190.50590.5824
    ECG2000.68820.66990.7345
    ECGFiveDays0.67470.83111.0000
    GunPoint0.57540.53480.8219
    Ham0.53020.51730.5841
    Herring0.50620.50220.5153
    MoteStrain0.83000.89150.9228
    OSULeaf0.74430.72570.7742
    Plane0.95060.98190.9921
    Prox.phal.outl.AgeGroup0.76560.80470.8140
    Prox.Phal.TW0.83150.85660.8568
    SonyAIBORobotSurface10.95260.97440.9967
    SonyAIBORobotSurface20.80330.75880.8562
    SwedishLeaf0.90510.91280.9241
    ToeSegmentation10.49820.50290.5158
    TwoLeadECG0.50110.99821.0000
    最优个数0018
    平均RI0.69550.73470.7900
    下载: 导出CSV
  • [1] AIGNER W, MIKSCH S, SCHUMANN H, et al. Visualization of Time-Oriented Data[M]. London: Springer, 2011: 153–196.
    [2] LEE S, CHOI C, and SON Y. Deep time-series clustering via latent representation alignment[J]. Knowledge-Based Systems, 2024, 303: 112434. doi: 10.1016/j.knosys.2024.112434.
    [3] CAI Jinyu, FAN Jicong, GUO Wenzhong, et al. Efficient deep embedded subspace clustering[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 21–30. doi: 10.1109/cvpr52688.2022.00012.
    [4] ZHONG Ying, HUANG Dong, and WANG Changdong. Deep temporal contrastive clustering[J]. Neural Processing Letters, 2023, 55(6): 7869–7885. doi: 10.1007/s11063-023-11287-0.
    [5] ROS F, RIAD R, and GUILLAUME S. Deep clustering framework review using multicriteria evaluation[J]. Knowledge-Based Systems, 2024, 285: 111315. doi: 10.1016/j.knosys.2023.111315.
    [6] XIE Junyuan, GIRSHICK R, and FARHADI A. Unsupervised deep embedding for clustering analysis[C]. The 33rd International Conference on International Conference on Machine Learning, New York, USA, 2016: 478–487.
    [7] MADIRAJU N S. Deep temporal clustering: Fully unsupervised learning of time-domain features[D]. [Ph. D. dissertation], Arizona State University, 2018.
    [8] MA Qianli, ZHENG Jiawei, LI Sen, et al. Learning representations for time series clustering[C]. The 33rd International Conference on Neural Information Processing Systems, Red Hook, USA, 2019: 3781–3791.
    [9] PENG Furong, LUO Jiachen, LU Xuan, et al. Cross-domain contrastive learning for time series clustering[C]. The AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 8921–8929. doi: 10.1609/aaai.v38i8.28740.
    [10] LI Xiaosheng, XI Wenjie, and LIN J. Randomnet: Clustering time series using untrained deep neural networks[J]. Data Mining and Knowledge Discovery, 2024, 38(6): 3473–3502. doi: 10.1007/s10618-024-01048-5.
    [11] SUN Bicheng, ZHOU Peng, DU Liang, et al. Active deep image clustering[J]. Knowledge-Based Systems, 2022, 252: 109346. doi: 10.1016/j.knosys.2022.109346.
    [12] REN Yazhou, HU Kangrong, DAI Xinyi, et al. Semi-supervised deep embedded clustering[J]. Neurocomputing, 2019, 325: 121–130. doi: 10.1016/j.neucom.2018.10.016.
    [13] REN Pengzhen, XIAO Yun, CHANG Xiaojun, et al. A survey of deep active learning[J]. ACM Computing Surveys (CSUR), 2022, 54(9): 180. doi: 10.1145/3472291.
    [14] DENG Xun, LIU Junlong, ZHONG Han, et al. A3S: A general active clustering method with pairwise constraints[C]. The 41st International Conference on Machine Learning, Vienna, Austria, 2024: 10488–10505.
    [15] 李海林, 张丽萍. 时间序列数据挖掘中的聚类研究综述[J]. 电子科技大学学报, 2022, 51(3): 416–424. doi: 10.12178/1001-0548.2022055.

    LI Hailin and ZHANG Liping. Summary of clustering research in time series data mining[J]. Journal of University of Electronic Science and Technology of China, 2022, 51(3): 416–424. doi: 10.12178/1001-0548.2022055.
    [16] PETITJEAN F, KETTERLIN A, and GANÇARSKI P. A global averaging method for dynamic time warping, with applications to clustering[J]. Pattern Recognition, 2011, 44(3): 678–693. doi: 10.1016/j.patcog.2010.09.013.
    [17] PAPARRIZOS J and GRAVANO L. Fast and accurate time-series clustering[J]. ACM Transactions on Database Systems (TODS), 2017, 42(2): 8. doi: 10.1145/3044711.
    [18] ZAKARIA J, MUEEN A, and KEOGH E. Clustering time series using unsupervised-shapelets[C]. 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 2012: 785–794. doi: 10.1109/icdm.2012.26.
    [19] 余思琴, 闫秋艳, 闫欣鸣. 基于最佳u-shapelets的时间序列聚类算法[J]. 计算机应用, 2017, 37(8): 2349–2356. doi: 10.11772/j.issn.1001-9081.2017.08.2349.

    YU Siqin, YAN Qiuyan, and YAN Xinming. Clustering algorithm of time series with optimal u-shapelets[J]. Journal of Computer Applications, 2017, 37(8): 2349–2356. doi: 10.11772/j.issn.1001-9081.2017.08.2349.
    [20] MACQUEEN J B. Some methods for classification and analysis of multivariate observations[C]. The 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, USA, 1967: 281–297.
    [21] JOHNSON S C. Hierarchical clustering schemes[J]. Psychometrika, 1967, 32(3): 241–254. doi: 10.1007/bf02289588.
    [22] NG A Y, JORDAN M I, and WEISS Y. On spectral clustering: Analysis and an algorithm[C]. The 15th International Conference on Neural Information Processing Systems: Natural and Synthetic, Cambridge, USA, 2001: 849–856.
    [23] CHANG Shiyu, ZHANG Yang, HAN Wei, et al. Dilated recurrent neural networks[C]. The 31st International Conference on Neural Information Processing Systems, Red Hook, USA, 2017: 76–86.
    [24] BASU S, BANERJEE A, and MOONEY R J. Active semi-supervision for pairwise constrained clustering[C]. The 2004 SIAM International Conference on Data Mining, Lake Buena Vista, USA, 2004: 333–344. doi: 10.1137/1.9781611972740.31.
    [25] DAU H A, BAGNALL A, KAMGAR K, et al. The UCR time series archive[J]. IEEE/CAA Journal of Automatica Sinica, 2019, 6(6): 1293–1305. doi: 10.1109/jas.2019.1911747.
  • 加载中
图(4) / 表(7)
计量
  • 文章访问数:  124
  • HTML全文浏览量:  67
  • PDF下载量:  30
  • 被引次数: 0
出版历程
  • 收稿日期:  2024-10-14
  • 修回日期:  2025-03-10
  • 网络出版日期:  2025-03-20
  • 刊出日期:  2025-04-01

目录

    /

    返回文章
    返回