A Frequent Pattern Based Time Series Classification Framework
doi: 10.3724/SP.J.1146.2009.00135
Abstract: Feature extraction and feature selection are two important problems in time series classification. This paper presents the MNOE (Mining Non-Overlap Episode) algorithm, which finds the non-overlapping frequent patterns (episodes) in a time series and uses them as features of that series. Based on these non-overlapping episodes, the paper presents the EGMAMC (Episode Generated Mixed memory Aggregation Markov Chain) model to describe a time series. Using the likelihood ratio test, the paper derives theoretically the relationship between the number of times an episode occurs in a time series (its support) and whether the EGMAMC model describes that series significantly. Using the definition of information gain, the frequent patterns that describe the time series significantly are then selected as input features for the classification model. Experiments on UCI (University of California Irvine) public datasets and on real smart building datasets show that a classification model trained on the selected frequent patterns outperforms one trained without this selection in precision, recall, and F-Measure. Efficiently extracting and effectively selecting non-overlapping frequent patterns as time series features thus helps improve each of these evaluation metrics.
Keywords:
- Time series classification; frequent pattern mining; smart building
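The two ideas named in the abstract, counting non-overlapping occurrences of a pattern and ranking patterns by information gain, can be illustrated with a minimal Python sketch. This is not the paper's MNOE algorithm or its EGMAMC significance test, neither of which is specified in the abstract. It assumes a series has already been symbolized into a string, treats a pattern as a serial episode whose symbols must appear in order with gaps allowed, and uses a made-up support threshold `min_count` and toy data.

```python
import math
from collections import Counter


def count_non_overlapping(sequence, episode):
    """Greedily count non-overlapping occurrences of a serial episode.

    An occurrence is the episode's symbols appearing in order (gaps allowed);
    the next occurrence may only start after the previous one has finished.
    """
    if not episode:
        return 0
    count = 0
    pos = 0  # index of the next episode symbol we are waiting for
    for symbol in sequence:
        if symbol == episode[pos]:
            pos += 1
            if pos == len(episode):  # one full occurrence completed
                count += 1
                pos = 0
    return count


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Information gain of a discrete feature with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: symbolized series and class labels (made up for illustration).
series = ["abcabcab", "ababab", "ccbbaa", "cacbcc"]
labels = ["x", "x", "y", "y"]
episode = "ab"
min_count = 2  # hypothetical support threshold, not taken from the paper

counts = [count_non_overlapping(s, episode) for s in series]
has_pattern = [c >= min_count for c in counts]
gain = information_gain(has_pattern, labels)
```

In this toy setting the binary feature "the episode occurs at least `min_count` times" would be computed for each candidate episode, and the episodes with the highest gain would be kept as classifier inputs. The paper additionally filters episodes by a likelihood ratio criterion, which this sketch does not attempt to reproduce.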