Traffic Classification Method Based on Dynamic Balance Adaptive Transfer Learning

SHANG Fengjun, LI Saisai, WANG Ying, CUI Yunfan

Citation: SHANG Fengjun, LI Saisai, WANG Ying, CUI Yunfan. Traffic Classification Method Based on Dynamic Balance Adaptive Transfer Learning[J]. Journal of Electronics & Information Technology, 2022, 44(9): 3308-3319. doi: 10.11999/JEIT210623

doi: 10.11999/JEIT210623
Funds: The National Natural Science Foundation of China (61672004)
Detailed information
    Author information:

    SHANG Fengjun: male, professor, doctoral supervisor; his main research interests are intelligent networks and communication, wireless sensor networks, and the Internet of Things

    LI Saisai: male, M.S.; his research interests are machine learning and transfer learning

    WANG Ying: female, Ph.D. candidate; her research interest is deep reinforcement learning

    Corresponding author:

    SHANG Fengjun  shangfj@cqupt.edu.cn

  • CLC number: TP18

  • Abstract: To address the degraded performance and accuracy of application traffic identification, this paper proposes a traffic classification algorithm based on dynamically balanced adaptive transfer learning. First, transfer learning is introduced into application traffic identification: the sample features of the source and target domains are mapped into a high-dimensional feature space so that the distances between the marginal distributions and between the conditional distributions of the two domains become as small as possible. A probability model is proposed to judge and quantify the difference between the marginal and conditional distributions of the two domains, and the model's confidence in each predicted class is used to compute the balance factor $ \mu $ quantitatively, which addresses the problem that Dynamic Distribution Adaptation (DDA) considers only the classification error rate and ignores prediction confidence. Second, a cliff-drop strategy is introduced to determine the number of principal feature components dynamically; the transformed features are used to train a base classifier, and after iterative training the resulting classifier is applied to identifying the traffic of recent mobile applications, improving accuracy by about 7% on average over traditional machine learning methods. Finally, to cope with the high feature dimensionality, a backward feature self-deletion strategy is introduced. Combined with the Earth Mover's Distance (EMD), an information-gain-weighted EMD correlation coefficient is used to build a feature selection algorithm for application traffic identification, removing features that contribute nothing to classification but merely increase training time or even degrade the model's performance and accuracy. Using the selected feature set as the training input of the transfer learning algorithm shortens its running time by about 80%.
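The balance factor $ \mu $ described in the abstract weighs the marginal against the conditional distribution using the probability model's class confidence rather than the error rate alone. The paper's exact estimator is not reproduced here; the following Python sketch only illustrates one plausible confidence-based estimate, and the helper name estimate_mu, the use of logistic regression, and the normalisation formula are assumptions for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def estimate_mu(Xs, ys, Xt):
        """Illustrative sketch (not the paper's formula): derive mu in [0, 1]
        from two confidences: how well a probabilistic model separates the
        domains (marginal gap) and how confidently it labels the target
        (conditional structure)."""
        # Marginal term: a domain discriminator that confidently tells source
        # from target suggests a large marginal-distribution difference.
        X = np.vstack([Xs, Xt])
        d = np.hstack([np.zeros(len(Xs)), np.ones(len(Xt))])
        dom = LogisticRegression(max_iter=1000).fit(X, d)
        marginal_conf = 2.0 * np.abs(dom.predict_proba(X)[:, 1] - 0.5).mean()

        # Conditional term: train on the labelled source and measure how
        # confident the per-class predictions on the target are.
        cls = LogisticRegression(max_iter=1000).fit(Xs, ys)
        conditional_conf = cls.predict_proba(Xt).max(axis=1).mean()

        # Map the two quantities to a single factor consistent with a
        # (1 - mu) * marginal + mu * conditional weighting: a large marginal
        # gap pushes mu towards 0, confident per-class structure towards 1.
        return conditional_conf / (marginal_conf + conditional_conf + 1e-12)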
  • Figure 1  Design of the transfer learning model

    Figure 2  Joint distribution adaptation method

    Figure 3  Deleting mapped feature vectors according to their eigenvalues

    Figure 4  Training of the semi-supervised distribution adaptation model

    Figure 5  Public computer-vision datasets

    Figure 6  Effect of different values of the parameter μ on the model

    Figure 7  Running time before and after feature selection

    Table 1  Labeled semi-supervised balanced distribution adaptation algorithm

     Input: source data $ {\boldsymbol{X}}_{s} $, target data $ {\boldsymbol{X}}_{t} $, source labels $ {Y}_{s} $, target labels $ {Y}_{t} $, number of iterations T
     Output: classifier $ f $
     Begin:
      (1) initialize the balance factor with the probability model: $ {\mu }_{0}=\mathrm{startMu}({\boldsymbol{X}}_{s},{\boldsymbol{X}}_{t}) $
      (2) $ \mathrm{vstack}({\boldsymbol{X}}_{s},{\boldsymbol{X}}_{t}) $, create H
      (3) for i = 0 to T do:
      (4)  update $ {\boldsymbol{M}}_{0} $
      (5)  for c = 0 to C do:
      (6)   if c not in Y_tar_predict:
      (7)    continue
      (8)   if c in Y_tar_predict:
      (9)    compute $ {\boldsymbol{M}}_{c} $, update $ \boldsymbol{M} $
      (10) if Y_tar_predict is not None:
      (11)  update $ \mu $
      (12) else:
      (13)  update $ \mu $
      (14) values, vectors := eig(dot(K,M,K.T), dot(K,H,K.T))
      (15) use Eq. (23) to select $ {\mathrm{vector}}_{\mathrm{port}} $
      (16) classifier.fit($ {X}_{\mathrm{train}},{Y}_{\mathrm{train}} $)
      (17) $ \widehat{y}=\mathrm{predict}\left({X}_{{\mathrm{t}}_{\mathrm{new}}}\right) $
      (18) update W
      (19) return classifier: $ f $
     End
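To make the matrix steps of Table 1 easier to follow, the sketch below mirrors one iteration of a standard balanced distribution adaptation: it builds the marginal MMD matrix M0, adds the per-class matrices Mc from pseudo-labels, solves the generalized eigenproblem of step (14) on a linear kernel, projects both domains, and retrains a base classifier. It is a simplified illustration under those assumptions (linear kernel, externally supplied μ, no per-class weighting), not the authors' released implementation.

    import numpy as np
    import scipy.linalg
    from sklearn.neighbors import KNeighborsClassifier

    def bda_step(Xs, ys, Xt, yt_pseudo=None, mu=0.5, dim=20, lam=1.0):
        """One distribution-adaptation iteration in the spirit of Table 1
        (simplified: linear kernel, fixed mu, no class-imbalance weighting)."""
        X = np.vstack([Xs, Xt]).T              # samples as columns, shape (d, n)
        X = X / np.linalg.norm(X, axis=0)      # normalise each sample
        ns, nt = len(Xs), len(Xt)
        n = ns + nt

        # M0: marginal-distribution MMD matrix, weighted by (1 - mu).
        e = np.vstack([np.ones((ns, 1)) / ns, -np.ones((nt, 1)) / nt])
        M = (1.0 - mu) * (e @ e.T)

        # Mc: conditional (per-class) MMD matrices built from pseudo-labels,
        # weighted by mu; classes absent from the pseudo-labels are skipped.
        if yt_pseudo is not None:
            for c in np.unique(ys):
                e = np.zeros((n, 1))
                idx_s = np.where(ys == c)[0]
                idx_t = ns + np.where(yt_pseudo == c)[0]
                if len(idx_t) == 0:
                    continue
                e[idx_s] = 1.0 / len(idx_s)
                e[idx_t] = -1.0 / len(idx_t)
                M += mu * (e @ e.T)
        M = M / np.linalg.norm(M, 'fro')

        # Step (14): generalized eigendecomposition on the kernelized data.
        H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
        K = X.T @ X                            # linear kernel
        A = K @ M @ K.T + lam * np.eye(n)
        B = K @ H @ K.T
        w, V = scipy.linalg.eig(A, B)
        idx = np.argsort(w.real)[:dim]         # keep the smallest eigenvalues
        Z = V[:, idx].real.T @ K               # projected data, shape (dim, n)
        Z = Z / np.linalg.norm(Z, axis=0)
        Xs_new, Xt_new = Z[:, :ns].T, Z[:, ns:].T

        # Steps (16)-(17): fit a base classifier and pseudo-label the target.
        clf = KNeighborsClassifier(n_neighbors=1).fit(Xs_new, ys)
        return clf, clf.predict(Xt_new)

A full run would repeat this step for T iterations, feeding the returned pseudo-labels back in and re-estimating μ each round, as steps (10)-(13) of Table 1 indicate.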

    Table 2  Semi-supervised learning results with 20% target-domain samples added (%)

                Traditional algorithms      Transfer learning algorithms
    Task        k-NN    PCA     SVM         GFK     TCA     JDA     BDA     SMTADA
    C→A         23.7    39.5    53.1        46.0    45.6    43.1    44.9    64.9
    C→W         25.8    34.6    41.7        37.0    39.3    39.9    38.6    66.1
    C→D         25.5    44.6    47.8        40.8    45.9    49.0    47.8    59.2
    A→C         26.0    39.0    41.7        40.7    42.0    40.9    40.9    60.7
    A→W         29.8    35.9    31.9        37.0    40.0    38.0    39.3    52.9
    A→D         25.5    33.8    44.6        40.1    35.7    42.0    43.3    60.5
    W→C         19.9    28.2    28.2        24.8    31.5    33.0    28.9    58.0
    W→A         23.0    29.1    27.6        27.6    30.5    29.8    33.0    66.7
    W→D         59.2    89.2    78.3        85.4    86.0    89.2    91.7    90.4
    D→C         26.3    29.7    26.4        29.3    33.0    31.2    32.5    59.9
    D→A         28.5    33.2    26.2        28.7    32.8    33.4    33.1    71.3
    D→W         63.4    86.1    52.5        80.3    87.5    89.2    91.9    85.1
    Average     31.4    43.6    41.6        43.1    45.8    46.6    47.2    67.1

    Table 3  Comparative experimental results on application traffic (%)

                   Traditional learning algorithms        Transfer learning algorithms
    Traffic task   RandomForest   k-NN     GaussianNB     BDA      SMTADA
    1→12           90.6           90.4     84.3           87.1     97.2
    2→12           91.8           91.9     83.8           87.6     97.9
    3→12           90.8           91.4     85.8           88.2     97.5
    4→12           91.8           90.0     84.6           87.4     97.3
    5→12           92.0           90.4     84.1           91.5     98.0
    6→12           91.4           89.7     85.4           87.8     98.2
    7→12           89.7           92.0     86.5           91.3     97.1
    8→12           92.8           91.5     86.8           88.6     96.6
    9→12           91.9           89.6     84.8           88.1     96.8
    10→12          90.2           92.6     86.3           81.9     98.1
    Average        91.3           90.95    85.24          87.95    97.47

    Table 4  Meanings of some of the selected features

    No.    Abbreviation           Meaning
    109    idletime_max_b→a       Maximum idle time between consecutive packets from server to client
    119    RTT_avg_b→a            Average RTT from server to client
    123    RTT_from_3WHS_b→a      Server-to-client RTT computed from the TCP three-way handshake
    126    RTT_full_sz_min_a→b    Minimum full-size RTT sample from client to server
    223    FFT_all                Fourier transform of the IAT of all packets (frequency 6)
    110    Throughput_a→b         Average throughput from client to server
    1      Client Port            Client port number
    243    FFT_b→a                Arctangent Fourier transform of the IAT of server-to-client packets (frequency 6)
    106    data_xmit_a→b          Total data transmission time from client to server
    221    FFT_all                Fourier transform of the IAT of all packets (frequency 4)
    114    RTT_min_a→b            Minimum RTT sample from client to server
    241    FFT_b→a                Arctangent Fourier transform of the IAT of server-to-client packets (frequency 4)
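The backward feature self-deletion strategy mentioned in the abstract combines an information-gain weight with the earth mover's distance (EMD). The exact weighting formula is not reproduced here; the Python sketch below only illustrates the idea with off-the-shelf estimators (mutual_info_classif as an information-gain proxy and the 1-D Wasserstein distance per feature), and the function names score_features and backward_select are hypothetical.

    import numpy as np
    from scipy.stats import wasserstein_distance
    from sklearn.feature_selection import mutual_info_classif

    def score_features(Xs, ys, Xt):
        """Illustrative per-feature score: favour features that are informative
        about the class on the source while penalising those whose source and
        target marginals differ strongly (large EMD)."""
        info_gain = mutual_info_classif(Xs, ys)                 # information-gain proxy
        emd = np.array([wasserstein_distance(Xs[:, j], Xt[:, j])
                        for j in range(Xs.shape[1])])           # per-feature 1-D EMD
        return info_gain / (1.0 + emd)

    def backward_select(Xs, ys, Xt, keep=32):
        """Backward-style selection sketch: repeatedly drop the lowest-scoring
        feature until only `keep` features remain, then return their indices."""
        cols = list(range(Xs.shape[1]))
        while len(cols) > keep:
            scores = score_features(Xs[:, cols], ys, Xt[:, cols])
            cols.pop(int(np.argmin(scores)))
        return cols

Feeding only the surviving feature columns into the transfer learning step is what the abstract credits with cutting the transfer algorithm's running time by roughly 80%.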

    Table 5  Experimental results with features selected by the backward selection strategy (%)

                   Traditional learning algorithms        Transfer learning algorithms
    Traffic task   RandomForest   k-NN     GaussianNB     BDA      SMTADA   SMTADA(P)
    1→12           90.6           90.4     84.3           95.3     97.0     97.2
    2→12           91.8           91.9     83.8           94.7     96.1     97.9
    3→12           90.8           91.4     85.8           93.9     95.9     97.5
    4→12           91.8           90.0     84.6           95.8     96.6     97.3
    5→12           92.0           90.4     84.1           94.4     95.4     98.0
    6→12           91.4           89.7     85.4           94.1     97.2     98.2
    7→12           89.7           92.0     86.5           92.7     96.4     97.1
    8→12           92.8           91.5     86.8           93.6     96.2     96.6
    9→12           91.9           89.6     84.8           91.8     95.5     96.8
    10→12          90.2           92.6     86.3           95.1     97.2     98.1
    Average        91.3           90.95    85.24          94.14    96.35    97.47
  • [1] ZOU Tengkuan, WANG Yuying, and WU Chengrong. Review of network background traffic classification and identification[J]. Journal of Computer Applications, 2019, 39(3): 802–811. doi: 10.11772/j.issn.1001-9081.2018071552
    [2] SUN Guanglu, LIANG Lili, CHEN Teng, et al. Network traffic classification based on transfer learning[J]. Computers & Electrical Engineering, 2018, 69: 920–927. doi: 10.1016/j.compeleceng.2018.03.005
    [3] LI Meng, LI Yanling, and LIN Min. Review of transfer learning for named entity recognition[J]. Journal of Frontiers of Computer Science and Technology, 2020, 15(2): 206–218. doi: 10.3778/j.issn.1673-9418.2003049
    [4] WANG Jindong. Concise handbook of transfer learning[EB/OL]. https://www.sohu.com/a/420774574_114877.
    [5] LI Haohao. Research and application of instance-based transfer learning[D]. [Master dissertation], Wuhan University, 2018.
    [6] PAN S J, TSANG I W, KWOK J T, et al. Domain adaptation via transfer component analysis[J]. IEEE Transactions on Neural Networks, 2011, 22(2): 199–210. doi: 10.1109/TNN.2010.2091281
    [7] JI Dingcheng, JIANG Yizhang, and WANG Shitong. Multi-source transfer learning method by balancing both the domains and instances[J]. Acta Electronica Sinica, 2019, 47(3): 692–699.
    [8] YAO Yi and DORETTO G. Boosting for transfer learning with multiple sources[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 1855–1862.
    [9] TANG Shiqi, WEN Yimin, and QIN Yixiu. Online transfer learning from multiple sources based on local classification accuracy[J]. Journal of Software, 2017, 28(11): 2940–2960. doi: 10.13328/j.cnki.jos.005352
    [10] ZHANG Bo, SHI Zhongzhi, ZHAO Xiaofei, et al. A transfer learning based on canonical correlation analysis across different domains[J]. Chinese Journal of Computers, 2015, 38(7): 1326–1336. doi: 10.11897/SP.J.1016.2015.01326
    [11] ZHANG Ning. Research on transfer learning based on decision tree classifier[D]. [Master dissertation], Xidian University, 2014.
    [12] HONG Jiaming, YIN Jian, HUANG Yun, et al. TrSVM: A transfer learning algorithm using domain similarity[J]. Journal of Computer Research and Development, 2011, 48(10): 1823–1830.
    [13] FADDOUL J B and CHIDLOVSKII B. Learning multiple tasks with boosted decision trees[P]. US, 8694444, 2014.
    [14] WAN Zitong, YANG Rui, HUANG Mengjie, et al. A review on transfer learning in EEG signal analysis[J]. Neurocomputing, 2021, 421: 1–14. doi: 10.1016/j.neucom.2020.09.017
    [15] TAN Ben, ZHANG Yu, PAN S, et al. Distant domain transfer learning[C]. The Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, USA, 2017: 2604–2610.
    [16] CHIANG K J, WEI Chunshu, NAKANISHI M, et al. Boosting template-based SSVEP decoding by cross-domain transfer learning[J]. Journal of Neural Engineering, 2021, 18(1): 016002. doi: 10.1088/1741-2552/abcb6e
    [17] NIU Shuteng, HU Yihao, WANG Jian, et al. Feature-based distant domain transfer learning[C]. 2020 IEEE International Conference on Big Data (Big Data), Atlanta, USA, 2020: 5164–5171.
    [18] DUAN Lixin, XU Dong, and TSANG I W H. Domain adaptation from multiple sources: A domain-dependent regularization approach[J]. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(3): 504–518. doi: 10.1109/TNNLS.2011.2178556
    [19] WANG Jindong, CHEN Yiqiang, FENG Wenjie, et al. Transfer learning with dynamic distribution adaptation[J]. ACM Transactions on Intelligent Systems and Technology, 2020, 11(1): 6. doi: 10.1145/3360309
    [20] JIANG R, PACCHIANO A, STEPLETON T, et al. Wasserstein fair classification[C]. The Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, Tel Aviv, Israel, 2019: 862–872.
    [21] PRONZATO L. Performance analysis of greedy algorithms for minimising a maximum mean discrepancy[J]. arXiv preprint arXiv: 2101.07564, 2021.
    [22] ZHANG Dongguang. Statistics[M]. 2nd ed. Beijing: China Science Press, 2020: 161–169.
    [23] MOORE A W and ZUEV D. Internet traffic classification using Bayesian analysis techniques[J]. ACM SIGMETRICS Performance Evaluation Review, 2005, 33(1): 50–60. doi: 10.1145/1071690.1064220
Publication history
  • Received: 2021-06-22
  • Revised: 2022-01-28
  • Accepted: 2022-03-10
  • Published online: 2022-03-20
  • Issue published: 2022-09-19
