Research of Track Resident Point Identification Algorithm Based on Signaling Data
-
摘要: 针对密度聚类算法只能识别密度相近的簇类且计算复杂度高等问题,该文提出一种基于信令数据中时空轨迹信息的密度峰值快速聚类(ST-CFSFDP)算法。首先对低采样密度的信令数据进行预处理,消除轨迹震荡现象;然后基于密度峰值快速聚类(CFSFDP)算法显式地增加时间维度限制,将局部密度由2维扩展到3维,并提出高密度时间间隔以表征簇中心在时间维度上的数据特征;接着设计筛选策略以选取聚类中心;最后识别用户出行轨迹中的驻留点,完成出行链的划分。实验结果表明,所提算法适用于采样密度低且定位精度差的信令数据,相比CFSFDP算法更适用于时空数据,相比基于密度的时空聚类算法(ST-DBSCAN)召回率提升14%,准确率提升8%,同时降低计算复杂度。
-
关键词:
- 信令数据 /
- 时空聚类 /
- 密度峰值快速聚类算法 /
- 驻留点识别 /
- 出行链
Abstract: For the problem that the density-based clustering algorithm can only identify clusters with similar density and high computational complexity, a Clustering by Fast Search and Find of Density Peaks based on Spatio-Temporal trajectory information in mobile phone signaling data, namely ST-CFSFDP, is proposed. Firstly, the low sampling density signaling data are pre-processed to eliminate the trajectory oscillation phenomenon in the data. Then, based on the Clustering by Fast Search and Find of Density Peaks(CFSFDP) algorithm, the time dimension limitation is explicitly increased, and the local density is extended from two-dimension to three-dimension. Moreover, in order to characterize the cluster center point in the time dimension, the concept of high-density time interval is defined. Secondly, the suitable cluster center screening strategy is developed to select automatically the appropriate cluster center. Finally, the resident points are identified in the travel trajectory of individual users over a period of time and the division of the travel chains is completed. The experimental results show that the algorithm is suitable for signaling data with low sampling density and poor positioning accuracy. It is more suitable for spatio-temporal data than CFSFDP algorithm. Compared with Density-Based Spatial Clustering of Applications with Noise based on Spatio-Temporal data (ST-DBSCAN) algorithm, the recall rate is improved by 14%, the accuracy rate is increased by 8%, and the computational complexity is also reduced. -
表 1 信令数据主要字段
字段名称 字段解释 字段内容(示例) user ID 用户身份 0001A LAC_CID 基站位置区域码与小区识别码 13119-2056 TimeStamp 用户接入时间戳 2019-10-23 17:42:09 CoverScene 当前基站的覆盖场景 道路/学校/火车站等 Lon_Lat 当前基站经度、纬度 (106.59767, 29.40709) 表 2 震荡轨迹数据示例
轨迹 位置 时间 距离 (km) 切换速度 (km/h) ${D_0}$ ${L_0}$(106.607617, 29.530807) 08:19:35 / / ${D_1}$ ${L_1}$(106.602659, 29.545336) 08:20:14 1.6 147.6923 ${D_2}$ ${L_2}$(106.607617, 29.530807) 08:20:39 1.6 230.4000 表 3 基于时间窗的震荡轨迹检测方法
输入:原始轨迹数据${{L} } = \left\{ { {L_1}{\rm{ } }···{\rm{ } }{L_i}{\rm{ } }{L_{i + 1} }{\rm{ } }···{\rm{ } }{L_{i + {N_w} } }{\rm{ } }···} \right\}$,轨迹序列切片中基站位置个数${N_w}$,震荡数据最大时间阈值${T_{w\_\max }}$; 输出:检测到的震荡轨迹数据${L_{\rm{osc}}}$; (1) 按顺序截取原始数据${{L}}$中的前${N_w}$个位置组成序列${L_w}$; (2) 检测${L_w}$中是否出现循环模式,如果出现则执行步骤(3),否则序列点向前移1位,重新执行步骤(1),截取后续${N_w}$个位置的序列片段; (3) 对检测到的震荡部分记为(${L_{\rm{beg} } }{\rm{ } }···{\rm{ } }{L_{\rm{end} } }$),判断该部分序列的总时间是否小于${T_{w\_\max }}$,如果满足,那么将该震荡序列记为${L_{\rm{osc}}}$,同时序
列点向前移1位,返回步骤(1);如果不满足,直接返回步骤(1),直至遍历完${{L}}$内所有轨迹点。算法结束 表 4 ST-CFSFDP聚类算法
输入:原始空间数据$P\left\langle {x\;y\;t\;d} \right\rangle $;截断距离${d_{\rm c}}$;截断时间${t_{\rm c}}$;数据点覆盖场景的描述$d$ 输出:该原始数据的聚类集合${C_k}$, $k = 1,2, ··· ,n$; (1) 计算每一个数据点的局部时空密度${\rho _i}$; (2) 依照定义4与定义5计算每个数据点的高密度空间距离${\delta _i}$、高密度时间间隔${\tau _i}$; (3) 计算各个数据点的聚类中心权值,将聚类中心权值的平均值作为阈值,将大于该阈值的数据点放入聚类中心候选点集合${C_{\rm c}}$中; (4) 合并候选点中覆盖场景相同且空间距离小于${d_{\rm c}}$或时间间隔小于${t_{\rm c}}$的“近邻数据点”,保留聚类中心权重较高的点; (5) 将剩余的数据点,按照最近邻思想分配到各个聚类中心所代表的簇中。 算法结束 表 5 算法距离误差对比(m)
编号 驻留点坐标(Lon, Lat) CFSFDP算法的距离误差 ST-DBSCAN算法的距离误差 ST-CFSFDP算法的距离误差 1 106.601230, 29.5339600 44.8 42.2 34.6 2 106.602061, 29.5343564 \ 43.5 48.3 3 106.496737, 29.6166844 48.8 35.3 37.4 4 106.496729, 29.6166840 \ \ 50.6 5 106.546322, 29.6203120 52.6 \ 46.4 -
陈鸿昶, 徐乾, 黄瑞阳, 等. 一种基于用户轨迹的跨社交网络用户身份识别算法[J]. 电子与信息学报, 2018, 40(11): 2758–2764. doi: 10.11999/JEIT180130CHEN Hongchang, XU Qian, HUANG Ruiyang, et al. User identification across social networks based on user trajectory[J]. Journal of Electronics &Information Technology, 2018, 40(11): 2758–2764. doi: 10.11999/JEIT180130 彭大芹, 罗裕枫, 江德潮, 等. 基于移动信令数据的城市热点识别方法[J]. 重庆邮电大学学报: 自然科学版, 2019, 31(1): 95–102. doi: 10.3979/j.issn.1673-825X.2019.01.013PENG Daqin, LUO Yufeng, JIANG Dechao, et al. Urban hotspots identification method based on mobile signaling data[J]. Journal of Chongqing University of Posts and Telecommunications:Natural Science Edition, 2019, 31(1): 95–102. doi: 10.3979/j.issn.1673-825X.2019.01.013 罗孝羚, 蒋阳升. 基于出租车运营数据和POI数据的出行目的识别[J]. 交通运输系统工程与信息, 2018, 18(5): 60–66. doi: 10.16097/j.cnki.1009-6744.2018.05.010LUO Xiaoling and JIANG Yangsheng. Trip-purpose-identification based on taxi operating data and POI data[J]. Journal of Transportation Systems Engineering and Information Technology, 2018, 18(5): 60–66. doi: 10.16097/j.cnki.1009-6744.2018.05.010 鲍冠文, 刘小明, 蒋源, 等. 基于改进DBSCAN算法的出租车载客热点区域挖掘研究[J]. 交通工程, 2019, 19(4): 62–69. doi: 10.13986/j.cnki.jote.2019.04.010BAO Guanwen, LIU Xiaoming, JIANG Yuan, et al. Research on mining taxi pick-up hotspots area[J]. Journal of Transportation Engineering, 2019, 19(4): 62–69. doi: 10.13986/j.cnki.jote.2019.04.010 李岩, 陈红, 孙晓科, 等. 基于热点探测模型的城市居民出行特征分析[J]. 交通信息与安全, 2019, 37(1): 128–136. doi: 10.3963/j.issn.1674-4861.2019.01.017LI Yan, CHEN Hong, SUN Xiaoke, et al. An analysis of travel characteristics of urban residents based on hot spot detection model[J]. Journal of Transport Information and Safety, 2019, 37(1): 128–136. doi: 10.3963/j.issn.1674-4861.2019.01.017 张海霞, 李腆腆, 李东阳, 等. 基于车辆行为分析的智能车联网关键技术研究[J]. 电子与信息学报, 2020, 42(1): 36–49. doi: 10.11999/JEIT190820ZHANG Haixia, LI Tiantian, LI Dongyang, et al. Research on vehicle behavior analysis based technologies for intelligent vehicular networks[J]. Journal of Electronics &Information Technology, 2020, 42(1): 36–49. doi: 10.11999/JEIT190820 李浩, 王旭智, 万旺根. 基于位置数据的居民出行时空特征研究——以上海市为例[J]. 电子测量技术, 2019, 42(19): 25–30. doi: 10.19651/j.cnki.emt.1902923LI Hao, WANG Xuzhi, and WAN Wanggen. Research on temporal and spatial characteristics of residents’ travel based on location data—A case of Shanghai[J]. Electronic Measurement Technology, 2019, 42(19): 25–30. doi: 10.19651/j.cnki.emt.1902923 周洋, 杨超. 基于时空聚类算法的轨迹停驻点识别研究[J]. 交通运输系统工程与信息, 2018, 18(4): 88–95. doi: 10.16097/j.cnki.1009-6744.2018.04.014ZHOU Yang and YANG Chao. Anchors identification in trajectory based on temporospatial clustering algorithm[J]. Journal of Transportation Systems Engineering and Information Technology, 2018, 18(4): 88–95. doi: 10.16097/j.cnki.1009-6744.2018.04.014 方琪, 王山东, 于大超, 等. 基于出租车轨迹的居民出行特征分析[J]. 地理空间信息, 2019, 17(5): 128–130. doi: 10.3969/j.issn.1672-4623.2019.05.034FANG Qi, WANG Shandong, YU Dachao, et al. Analysis of resident trip characteristics based on taxi trajectory[J]. Geospatial Information, 2019, 17(5): 128–130. doi: 10.3969/j.issn.1672-4623.2019.05.034 BIRANT D and KUT A. ST-DBSCAN: An algorithm for clustering spatial–temporal data[J]. Data & Knowledge Engineering, 2007, 60(1): 208–221. doi: 10.1016/j.datak.2006.01.013 RODRIGUEZ A and LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492–1496. doi: 10.1126/science.1242072 WANG Feilong and CHEN C. On data processing required to derive mobility patterns from passively-generated mobile phone data[J]. Transportation Research Part C: Emerging Technologies, 2018, 87: 58–74. doi: 10.1016/j.trc.2017.12.003 CHEN C, BIAN Ling, and MA Jingtao. From traces to trajectories: How well can we guess activity locations from mobile phone traces?[J]. Transportation Research Part C: Emerging Technologies, 2014, 46: 326–337. doi: 10.1016/j.trc.2014.07.001 HARD E, CHIGOY B, SONGCHITRUKSA P, et al. Synopsis of new methods and technologies to collect Origin-Destination (O-D) data[R]. FHWA-HEP-16-083, 2016. LEE J K and HOU J C. Modeling steady-state and transient behaviors of user mobility: Formulation, analysis, and application[C]. The 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Florence, Italy, 2006: 85–96.