高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

满足本地差分隐私的混合噪音感知的模糊C均值聚类算法

张朋飞 程俊 张治坤 方贤进 孙笠 王杰 姜茸

陈培龙. 用相速跳变提高行波管的效率[J]. 电子与信息学报, 1979, 1(2): 64-73.
引用本文: 张朋飞, 程俊, 张治坤, 方贤进, 孙笠, 王杰, 姜茸. 满足本地差分隐私的混合噪音感知的模糊C均值聚类算法[J]. 电子与信息学报. doi: 10.11999/JEIT241067
ZHANG Pengfei, CHENG Jun, ZHANG Zhikun, FANG Xianjin, SUN Li, WANG Jie, JIANG Rong. Fuzzy C-Means Clustering Algorithm Based on Mixed Noise-aware under Local Differential Privacy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT241067
Citation: ZHANG Pengfei, CHENG Jun, ZHANG Zhikun, FANG Xianjin, SUN Li, WANG Jie, JIANG Rong. Fuzzy C-Means Clustering Algorithm Based on Mixed Noise-aware under Local Differential Privacy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT241067

满足本地差分隐私的混合噪音感知的模糊C均值聚类算法

doi: 10.11999/JEIT241067
基金项目: 安徽理工大学高层次引进人才科研启动基金(2023yjrc92),云南省服务计算重点实验室开放课题 (YNSC24116),国家自然科学基金(62202164)
详细信息
    作者简介:

    张朋飞:男,讲师,研究方向为数据安全与隐私保护、数据挖掘

    程俊:男,硕士生,研究方向为数据安全与隐私保护

    张治坤:男,副教授,研究方向为隐私计算、数据隐私保护、机器学习隐私与安全

    方贤进:男,教授,研究方向为数据安全与隐私保护,智能计算

    孙笠:男,讲师,研究方向为数据挖掘和机器学习

    王杰:男,教授,研究方向为智能检测与智能仪表、粉尘防治技术研究

    姜茸:男,教授,研究方向为数据安全与隐私保护,智能计算

    通讯作者:

    张治坤 zhikun@zju.edu.cn

  • 中图分类号: TN911; TP391

Fuzzy C-Means Clustering Algorithm Based on Mixed Noise-aware under Local Differential Privacy

Funds: The Scientific Research Foundation for High-level Talents of Anhui University of Science and Technology (2023yjrc92), The Foundation of Yunnan Key Laboratory of Service Computing (YNSC24116), The National Natural Science Foundation of China (62202164)
  • 摘要: 在大数据和物联网应用中,本地差分隐私(LDP)技术用于保护聚类分析中的用户隐私,但现有方法要么在LDP下交互式地进行聚类,需要消耗大量隐私预算,要么没有同时考虑到聚类数据中蕴含的表示数据质量的高斯噪音以及为满足LDP保护的拉普拉斯噪音,致使聚类精度低下。同时,对于衡量用户提交数据和簇心之间的距离选择较为武断,没有充分利用到用户提交的噪音数据中蕴含的噪音模式。为此,该文创新性地提出一种满足LDP的混合噪音感知的模糊C均值聚类算法(mnFCM),该算法的主要思想是同时建模用户上传数据中蕴含的表示用户质量的高斯噪音以及为保护用户数据注入的拉普拉斯噪音,进而设计出混合噪音感知的距离替代传统的欧式距离,来衡量样本数据与簇心间的相似性。特别地,在mnFCM中,该文首先设计了混合噪音感知的距离计算方法,在此基础上给出算法新的目标函数,并基于拉格朗日乘子法设计了求解方法,最后理论上分析了求解算法的收敛性。该文进一步理论分析了mnFCM的隐私、效用和复杂度,分析结果表明所提算法严格满足LDP、相对于对比算法更接近非隐私下的簇心以及和非隐私算法具有接近的复杂度。在两个真实数据集上的实验结果表明,mnFCM在满足LDP下,聚类精度提高了10%~15%。
  • 图  1  NG和UW数据集上不同隐私预算ε对各算法性能的影响

    图  2  NG和UW数据集上不同模糊度参数m对各算法性能的影响

    图  3  NG和UW数据集上目标函数值随迭代次数的变化情况

    图  4  NG和UW数据集上不同隐私预算下聚类效果对比

    图  5  Syn数据集上不同隐私预算下密度聚类算法效果对比

    图  6  Syn数据集上各算法在不同参数下的聚类效果对比

    表  1  常用符号列表

    符号 符号含义
    ε 隐私预算
    C, N, K 簇、样本数据以及样本属性的个数
    uci 样本数据的划分隶属度
    Dci 样本数据与簇心间的混合噪音感知的距离
    xi, pc 样本数据i、簇c的质心
    ˆxi 加噪后的样本数据i
    m, γ 模糊度参数、收敛终止参数
    τ 迭代阈值
    下载: 导出CSV

    1  mnFCM算法

     输入:样本数据集X={x1,x2, ···, xN},模糊度参数m,收敛终止
     参数γ,最大迭代次数τmax
     输出:样本划分隶属度uci,簇心pc
     /*本地端*/
     for i=1, 2, ··· , N do
      调用Laplace机制对样本数据进行加噪;
      用户将加噪后的样本数据ˆxi上传给服务器;
     end for
     /*服务器端*/
     设置初始迭代次数:τ =0;
     随机初始化uci使其满足0≤uci≤1;
     迭代开始:
     根据式(22)更新pc
     根据式(12)更新Dci
     根据式(18)更新uci
     更新迭代次数τ= τ+1;
     迭代终止条件 max|uci(τ)–uci(τ–1)|≤γ或者τ =τmax
     return uci, pc
    下载: 导出CSV

    表  2  算法运行时间对比

    对比算法NG (min)UW(s)
    NoPriv58.33
    PrivDis1923.67
    PrivPro2644.36
    PrivKM2434.63
    mnFCM78.42
    下载: 导出CSV
  • [1] MISTRY H K, MAVANI C, GOSWAMI A, et al. The impact of cloud computing and AI on industry dynamics and competition[J]. Educational Administration: Theory and Practice, 2024, 30(7): 797–804. doi: 10.53555/kuey.v30i7.6851.
    [2] 常璐瑶, 牛新征, 罗涛, 等. 基于子博弈完美均衡的启发式聚类算法[J]. 电子学报, 2024, 52(3): 740–750. doi: 10.12263/DZXB.20221206.

    CHANG Luyao, NIU Xinzheng, LUO Tao, et al. Heuristic clustering algorithm based on sub-game perfect equilibrium[J]. Acta Electronica Sinica, 2024, 52(3): 740–750. doi: 10.12263/DZXB.20221206.
    [3] 黄鹤, 李文龙, 杨澜, 等. 跳跃跟踪SSA交叉迭代AP聚类算法[J]. 电子学报, 2024, 52(3): 977–990. doi: 10.12263/DZXB.20220209.

    HUANG He, LI Wenlong, YANG Lan, et al. Jump tracking SSA hybrid iterative AP clustering algorithm[J]. Acta Electronica Sinica, 2024, 52(3): 977–990. doi: 10.12263/DZXB.20220209.
    [4] 张强, 叶阿勇, 叶帼华, 等. 最优聚类的k-匿名数据隐私保护机制[J]. 计算机研究与发展, 2022, 59(7): 1625–1635. doi: 10.7544/issn1000-1239.20210117.

    ZHANG Qiang, YE Ayong, YE Guohua, et al. k-Anonymous data privacy protection mechanism based on optimal clustering[J]. Journal of Computer Research and Development, 2022, 59(7): 1625–1635. doi: 10.7544/issn1000-1239.20210117.
    [5] LIN Wanyu, LI Baochun, and WANG Cong. Towards private learning on decentralized graphs with local differential privacy[J]. IEEE Transactions on Information Forensics and Security, 2022, 17: 2936–2946. doi: 10.1109/TIFS.2022.3198283.
    [6] 傅培旺, 丁红发, 刘海, 等. 基于本地差分隐私的分布式图统计采集算法[J]. 计算机研究与发展, 2024, 61(7): 1643–1669. doi: 10.7544/issn1000-1239.202330628.

    FU Peiwang, DING Hongfa, LIU Hai, et al. Statistics collecting algorithms of distributed graph via local differential privacy[J]. Journal of Computer Research and Development, 2024, 61(7): 1643–1669. doi: 10.7544/issn1000-1239.202330628.
    [7] 李宗维, 孔德潮, 牛媛争, 等. 基于人工智能和区块链融合的隐私保护技术研究综述[J]. 信息安全研究, 2023, 9(6): 557–565. doi: 10.12379/j.issn.2096-1057.2023.06.08.

    LI Zongwei, KONG Dechao, NIU Yuanzheng, et al. Towards a privacy-preserving research for AI and blockchain integration[J]. Journal of Information Security Research, 2023, 9(6): 557–565. doi: 10.12379/j.issn.2096-1057.2023.06.08.
    [8] LUO Yuling, WANG Zhangrui, ZHANG Shunsheng, et al. Efficient-secure k-means clustering guaranteeing personalized local differential privacy[C]. 22nd International Conference on Algorithms and Architectures for Parallel Processing, Copenhagen, Denmark, 2023: 660–675. doi: 10.1007/978-3-031-22677-9_35.
    [9] 张少波, 原刘杰, 毛新军, 等. 基于本地差分隐私的K-modes聚类数据隐私保护方法[J]. 电子学报, 2022, 50(9): 2181–2188. doi: 10.12263/DZXB.20201374.

    ZHANG Shaobo, YUAN Liujie, MAO Xinjun, et al. Privacy protection method for K-modes clustering data with local differential privacy[J]. Acta Electronica Sinica, 2022, 50(9): 2181–2188. doi: 10.12263/DZXB.20201374.
    [10] XIA Chang, HUA Jingyu, TONG Wei, et al. Distributed K-Means clustering guaranteeing local differential privacy[J]. Computers & Security, 2020, 90: 101699. doi: 10.1016/j.cose.2019.101699.
    [11] LIN Aixin and MA Xuebin. PU_Bpub: High-dimensional data release mechanism based on spectral clustering with local differential privacy[C]. 17th International Conference on Wireless Algorithms, Systems, and Applications, Dalian, China, 2022: 572–581. doi: 10.1007/978-3-031-19214-2_48.
    [12] 张国鹏, 陈学斌, 王豪石, 等. 面向本地差分隐私的K-Prototypes聚类方法[J]. 计算机应用, 2022, 42(12): 3813–3821. doi: 10.11772/j.issn.1001-9081.2021101724.

    ZHANG Guopeng, CHEN Xuebin, WANG Haoshi, et al. K-Prototypes clustering method for local differential privacy[J]. Journal of Computer Applications, 2022, 42(12): 3813–3821. doi: 10.11772/j.issn.1001-9081.2021101724.
    [13] LI Weiqing, CHEN Hongyu, ZHAO Ruifeng, et al. A federated recommendation system based on local differential privacy clustering[C]. 2021 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/IOP/SCI), Atlanta, USA, 2021: 364–369. doi: 10.1109/SWC50871.2021.00056.
    [14] LI Yong, SONG Xiao, TU Yuchun, et al. GAPBAS: Genetic algorithm-based privacy budget allocation strategy in differential privacy K-means clustering algorithm[J]. Computers & Security, 2024, 139: 103697. doi: 10.1016/j.cose.2023.103697.
    [15] LIU Chao, ZHI Zhaolong, ZHAO Weinan, et al. Research on local fingerprint image differential privacy protection method based on clustering algorithm and regression algorithm segmentation image[J]. IEEE Access, 2024, 12: 27127–27146. doi: 10.1109/ACCESS.2024.3363494.
    [16] 石江南, 彭长根, 谭伟杰. Spark框架下支持差分隐私保护的K-means++聚类方法[J]. 信息安全研究, 2024, 10(8): 712–718. doi: 10.12379/j.issn.2096-1057.2024.08.04.

    SHI Jiangnan, PENG Changgen, and TAN Weijie. K-means++ clustering method supporting differential privacy protection in spark framework[J]. Journal of Information Security Research, 2024, 10(8): 712–718. doi: 10.12379/j.issn.2096-1057.2024.08.04.
    [17] WU Fuyu, DU Mingjing, and ZHI Qiang. Density-based clustering with differential privacy[J]. Information Sciences, 2024, 681: 121211. doi: 10.1016/j.ins.2024.121211.
    [18] FANG Shuhui, WAN Xuejun, WANG Jun, et al. HiDS Data clustering algorithm based on differential privacy[C]. 2024 International Conference on Networking and Network Applications (NaNA), Yinchuan, China, 2024: 131–136. doi: 10.1109/NaNA63151.2024.00029.
    [19] DIAA A, HUMPHRIES T, and KERSCHBAUM F. FastLloyd: Federated, accurate, secure, and tunable k-means clustering with differential privacy[EB/OL]. https://arxiv.org/abs/2405.02437, 2024.
    [20] SONG Haina, HAN Xinyu, LV Jie, et al. MPLDS: An integration of CP-ABE and local differential privacy for achieving multiple privacy levels data sharing[J]. Peer-to-Peer Networking and Applications, 2022, 15(1): 369–385. doi: 10.1007/s12083-021-01238-8.
    [21] YANG Wenjun and AL-MASRI E. ULDP: A user-centric local differential privacy optimization method[C]. 2024 IEEE World AI IoT Congress (AIIoT), Seattle, USA, 2024: 0316–0322. doi: 10.1109/AIIoT61789.2024.10579023.
    [22] 曾卓, 汪成亮, 马飞. 基于差分隐私的活动模式保护与时空轨迹发布方法[J]. 电子学报, 2023, 51(3): 552–563. doi: 10.12263/DZXB.20210631.

    ZENG Zhuo, WANG Chengliang, MA Fei. Differentially private activity pattern and spatial-temporal trajectory publication[J]. Acta Electronica Sinica, 2023, 51(3): 552–563. doi: 10.12263/DZXB.20210631.
    [23] HERNANDEZ-MATAMOROS A and KIKUCHI H. Comparative analysis of local differential privacy schemes in healthcare datasets[J]. Applied Sciences, 2024, 14(7): 2864. doi: 10.3390/app14072864.
    [24] DU Minxin, YUE Xiang, CHOW S S M, et al. Sanitizing sentence embeddings (and labels) for local differential privacy[C]. Proceedings of the ACM Web Conference 2023, New York, USA, 2023: 2349–2359. doi: 10.1145/3543507.3583512.
    [25] YANG Mengmeng, GUO Taolin, ZHU Tianqing, et al. Local differential privacy and its applications: A comprehensive survey[J]. Computer Standards & Interfaces, 2024, 89: 103827. doi: 10.1016/j.csi.2023.103827.
    [26] KRASNOV D, DAVIS D, MALOTT K, et al. Fuzzy C-means clustering: A review of applications in breast cancer detection[J]. Entropy, 2023, 25(7): 1021. doi: 10.3390/e25071021.
    [27] ALI N A, EL ABBASSI A, and BOUATTANE O. Performance evaluation of spatial fuzzy C-means clustering algorithm on GPU for image segmentation[J]. Multimedia Tools and Applications, 2023, 82(5): 6787–6805. doi: 10.1007/s11042-022-13635-z.
    [28] 徐久成, 侯钦臣, 瞿康林, 等. 面向时间序列的鲁棒性半监督模糊C均值聚类[J]. 计算机工程与应用, 2023, 59(8): 73–80. doi: 10.3778/j.issn.1002-8331.2207-0445.

    XU Jiucheng, HOU Qinchen, QU Kanglin, et al. Robust semi-supervised fuzzy C-means clustering for time series[J]. Computer Engineering and Applications, 2023, 59(8): 73–80. doi: 10.3778/j.issn.1002-8331.2207-0445.
    [29] ARORA J, TUSHIR M, and DADHWAL S K. A new suppression-based possibilistic fuzzy c-means clustering algorithm[J]. EAI Endorsed Transactions on Scalable Information Systems, 2023, 10(3): e3. doi: 10.4108/eetsis.v10i3.2057.
    [30] FANG Siguo, HUANG Dong, CAI Xiaosha, et al. Efficient multi-view clustering via unified and discrete bipartite graph learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(8): 11436–11447. doi: 10.1109/TNNLS.2023.3261460.
    [31] JAEGER A and BANKS D. Cluster analysis: A modern statistical review[J]. WIREs Computational Statistics, 2023, 15(3): e1597. doi: 10.1002/wics.1597.
    [32] BESHARATNIA F, TALEBPOUR A, and ALIAKBARY S. An improved grey wolves optimization algorithm for dynamic community detection and data clustering[J]. Applied Artificial Intelligence, 2022, 36(1): 2012000. doi: 10.1080/08839514.2021.2012000.
    [33] ALEMAZKOOR N, TOOTKABONI M, NATEGHI R, et al. Smart-meter big data for load forecasting: An alternative approach to clustering[J]. IEEE Access, 2022, 10: 8377–8387. doi: 10.1109/ACCESS.2022.3142680.
    [34] ZHAO Wenhao, MA Jin, LIU Qiyuan, et al. Comparison and application of SOFM, fuzzy c-means and k-means clustering algorithms for natural soil environment regionalization in China[J]. Environmental Research, 2023, 216: 114519. doi: 10.1016/j.envres.2022.114519.
    [35] 葛丽娜, 陈圆圆, 王捷, 等. 改进的密度峰值聚类算法的差分隐私保护方案[J]. 郑州大学学报: 工学版, 2023, 44(6): 19–24. doi: 10.13705/j.issn.1671-6833.2023.03.010.

    GE Lina, CHEN Yuanyuan, WANG Jie, et al. Differential privacy protection scheme of adaptive clustering by fast search and find of density peaks[J]. Journal of Zhengzhou University: Engineering Science, 2023, 44(6): 19–24. doi: 10.13705/j.issn.1671-6833.2023.03.010.
    [36] CHEN Hua, ZHOU Yuan, MEI Kehui, et al. A new density peak clustering algorithm with adaptive clustering center based on differential privacy[J]. IEEE Access, 2023, 11: 1418–1431. doi: 10.1109/ACCESS.2022.3233196.
  • 加载中
图(6) / 表(3)
计量
  • 文章访问数:  86
  • HTML全文浏览量:  34
  • PDF下载量:  19
  • 被引次数: 0
出版历程
  • 收稿日期:  2024-12-04
  • 修回日期:  2025-02-25
  • 网络出版日期:  2025-03-07

目录

    /

    返回文章
    返回