Advanced Search
Volume 39 Issue 1
Jan.  2017
Turn off MathJax
Article Contents
WANG Ju, LIU Fuxian. Motif Discovery Algorithm for Multiple Attributes Uncertain Data Stream[J]. Journal of Electronics & Information Technology, 2017, 39(1): 159-166. doi: 10.11999/JEIT160247
Citation: WANG Ju, LIU Fuxian. Motif Discovery Algorithm for Multiple Attributes Uncertain Data Stream[J]. Journal of Electronics & Information Technology, 2017, 39(1): 159-166. doi: 10.11999/JEIT160247

Motif Discovery Algorithm for Multiple Attributes Uncertain Data Stream

doi: 10.11999/JEIT160247
Funds:

The National Natural Science Foundation of China (61272011)

  • Received Date: 2016-03-17
  • Rev Recd Date: 2016-08-16
  • Publish Date: 2017-01-19
  • Algorithm of motif discovery for multiple attributes uncertain data stream is proposed on the basis of MEME (Multiple Expectation-maximization for Motif Elicitation), which consults the thought of sequential pattern discovery in bioinformatics to solve the problem of frequent pattern discovery for multiple attributes uncertain data stream. A new method for update calculation of uncertain sliding window is designed based on mixed type model, SAX (Symbolic Aggregate approXimation) symbolic strategy is improved, and similarity analysis method for multiple attributes motifs under different sliding windows is put forward. The proposed algorithm is verified to be correct functionally by a set of uncertain data stream in the wireless sensor network of air and missile defense. Its accuracy is measured through planting different number of motifs. Furthermore, comparison with previous algorithm with tuples valid probability set to 1 shows that the proposed algorithm can discover frequent pattern for multiple attributes uncertain data stream precisely.
  • loading
  • LEUNG C K S, JIANG F, and HAYDUK Y. A landmark- model based system for mining frequent patterns from uncertain data streams[C]. 2011 International Database Engineering and Applications Symposium, Lisbon, Portugal, 2011: 249-250. doi: 10.1145/2076623.2076659.
    CHUI C K and KAO B. A decremental approach for mining frequent itemsets from uncertain data[C]. 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Osaka, Japan, 2008: 64-75. doi: 10.1007/978-3-540-68125.
    LEUNG C K S, HAO B, and BRAJCZUK D A. Mining uncertain data for frequent itemsets that satisfy aggregate constraints[C]. 25th Annual ACM Symposium on Applied Computing, Sierre, Switzerland, 2010: 1034-1038. doi: 10.1145/1774088.1774305.
    LEUNG C K S and HAO B. Mining of frequent items from streams of uncertain data[C]. 25th IEEE International Conference on Data Engineering, Piscataway, NJ, USA, 2009: 1663-1670. doi: 10.1109/ICDE.2009.157.
    汤克明. 不确定数据流中频繁数据挖掘[D]. [博士论文], 南京航空航天大学, 2012.
    TANG Keming. Study on frequent data mining from uncertain data streams[D]. [Ph.D. dissertation], Nanjing University of Aeronautics and Astronautics, 2012.
    HEWANADUNGODAGE C, YUNI X, and LEE J J. Hyper-structure mining of frequent patterns in uncertain data streams[J]. Knowledge and Information Systems, 2013, 37: 219-244. doi: 10.1007/s10115-012-0581-y.
    LEUNG C K S, CUZZOCREA A, FAN J, et al. Discovering frequent patterns from uncertain data streams with time-fading and landmark models[J]. Transactions on Large-Scale Data and Knowledge-Centered Systems VIII, 2013: 174-196. doi: 10.1007/978-3-642-37574-3_8.
    朱跃龙, 彭力, 李士进, 等. 水文时间序列模体挖掘[J]. 水利学报, 2012, 43(12): 1422-1430.
    ZHU Yuelong, PENG Li, LI Shijin, et al. Research on hydrological time series mining [J]. Journal of Hydraulic Engineering, 2012, 43(12): 1422-1430.
    张懿璞. 转录因子结合位点识别问题的算法研究[D]. [博士论文], 西安电子科技大学, 2014.
    ZHANG Yipu. Algorithm research on the problem of transcription factor binging sites identification[D]. [Ph.D. dissertation], Xidian University, 2014.
    杨矫云. 大规模生物序列分析的高性能算法和模型[D]. [博士论文], 中国科学技术大学, 2014.
    YANG Jiaoyun. High performance algorithms and models for large-scale biological sequence analysis[D]. [Ph.D. dissertation], University of Science and Technology of China, 2014.
    LIN J, KEOGH E, PATEL P, et al. Finding motifs in time series[C]. Proceedings of the 2nd Workshop on Temporal Data Mining at KDD, District of Colombia, USA, 2002: 53-68.
    CHIU B, KEOGH E, and LONARDI S. Probabilistic discovery of time series motifs[C]. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, District of Colombia, USA, 2003: 493-498. doi: 10.1145/956750.956808.
    FERREIRA P G, AZEVEDO P J, SILVA C G, et al. Mining approximate motifs in time series[C]. 9th international conference on Discovery Science, Berlin, Germany, 2006: 89-101 .
    MUEEN A, KEOGH E, ZHU Q, et al. Exact discovery of time series motif[C]. 9th SIAM International Conference on Data Mining 2009, Nevada, USA, 2009: 469-480.
    ABDULLAH M and NIKAN C. Enumeration of time series motifs of all lengths[J]. Knowledge and Information Systems, 2015, 45: 105-132. doi: 10.1007/s10115-014-0793-4.
    张懿璞, 霍红卫, 于强, 等. 用于转录因子结合位点识别的定位投影求精算法[J]. 计算机学报, 2013, 36(12): 2545-2559. doi: 10.3724/SP.J.1016.2013.02545.
    ZHANG Yipu, HUO Hongwei, YU Q, et al. A novel fixed- position projection refinement algorithm for TFBS Identification[J]. Chinese Journal of Computers, 2013, 36(12): 2545-2559. doi: 10.3724/SP.J.1016.2013.02545.
    TIMOTHY L B. DREME: motif discovery in transcription factor ChIP-seq data[J]. Original Paper, 2011, 17(12): 1653-1659. doi: 10.1093/bioinformatics/btr261.
    DANIEL Q and XIE Xiaohui. EXTREME: an online EM algorithm for motif discovery[J]. Original Paper, 2014, 30(12): 1667-1673. doi: 10.1093/bioinformatics/btu093.
    THANH T L T, PENG Liping, DIAO Yanlei, et al. CLARO: modeling and processing uncertain data streams[J]. The VLDB Journal, 2012, 21: 651676. doi: 10.1007/s00778- 011-0261-7.
    ARCHAMBEAU C and VERLEYSEN M. Manifold constrained finite Gaussian mixtures [C]. 8th International Work Conference on Artificial Neural Networks, Berlin, Germany, 2005: 820828.
    MICHELE D. Modeling and querying data series and data streams with uncertainty[D]. [Ph.D. dissertation], University of Trento, 2014.
    HONG Y. On computing the distribution function for the sum of independent and non-identical random indicators [R]. Technical Report, Department of Statistics, Virginia Tech, 2011.
    曲文龙, 张克君, 杨炳儒, 等. 基于奇异事件特征聚类的时间序列符号化方法[J]. 系统工程与电子技术, 2006, 28(8): 11311134.
    QU Wenlong, ZHANG Kejun, YANG Bingru, et al. Time series symbolization based on singular event feature clustering[J]. Systems Engineering and Electronics, 2006, 28(8): 11311134.
    JESSICA L, EAMONN K, LI W, et al. Experiencing SAX: a novel symbolic representation of time series[J]. Data Minning and Knowledge Discovery, 2007, 15: 107144. doi: 10.1007/ s10618-007- 0064-z.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Article Metrics

    Article views (1301) PDF downloads(497) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return