2019, 41(6): 1450-1457.
doi: 10.11999/JEIT180631
Abstract:
For problems of not meeting the demand of sampling both large flows and small flows at the same time, and not distinguishing flash crowd from traffic attacks in building network traffic anomaly detection datasets based on probabilistic sampling methods, a probabilistic flow sampling method for traffic anomaly detection is proposed. On the basis of the classification of network data flows according to their destination and source IP addresses, the sampling probability for each class of data flows is set as the maximum of its destination and source IP address’s sampling probability, and the number of sampled data flows is ceiled to ensure that each class of data flows is sampled at least once, so that the sampled dataset can reflect the distributions of large, small flows and source, destination IP addresses in original traffics. Then, the source IP address entropy is used to characterize the source IP dispersion of anomaly flows, and the attack flow sampling algorithm is designed based on the threshold of the source IP address entropy, which reduces the sampling probability of non-attack anomaly flows caused by flash crowd. The simulation results show that the proposed method can satisfy the sampling requirements of both large flows and small flows, it has a high anomaly flows sampling ability, can sample all the suspicious sources and destination IP addresses related to anomaly flows, and can effectively filter the non-attack anomaly flows.