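Approach for Dynamic Flight Pricing Based on Strategy Learning

Min LU¹, Yaoyuan ZHANG¹, Chun LU²

doi: 10.11999/JEIT200778 cstr: 32379.14.JEIT200778

1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
2. Information Center of China Southern Air Holding Company Limited, Guangzhou 510000, China

Funds: The National Natural Science Foundation of China (61502499), The Project from Key Laboratory of Artificial Intelligence for Airlines, CAAC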

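Author biographies:
Min LU: male, born in 1985, associate research fellow, Ph.D. Main research interests: machine learning and reinforcement learning.

Yaoyuan ZHANG: female, born in 1996, master's student. Main research interests: civil aviation revenue management and reinforcement learning.

Chun LU: male, born in 1974, senior engineer. Research interest: airline revenue management.

Corresponding author:
Min LU, mlu@cauc.edu.cn

CLC number: TP311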
Publication history
- Received: 2020-09-20
- Revised: 2021-02-04
- Published online: 2021-03-02
- Issue date: 2021-04-20


Abstract: Dynamic flight pricing aims to construct a ticket pricing strategy that maximizes the seat revenue of a flight. Existing flight pricing algorithms are all built on forecasting the demand for each fare class in advance, so errors in the forecast demand degrade model performance. To address this, an approach for dynamic flight pricing based on strategy learning is proposed. Its core idea is to stop forecasting the demand for each fare class and instead model dynamic flight pricing as an offline reinforcement learning problem. Through purpose-designed policy evaluation and policy improvement steps, the approach learns from historical ticket purchase data the dynamic pricing strategy with the maximum expected revenue. Baselines built on the current pricing strategy and on demand forecasting, together with evaluation metrics, are also designed for comparison. Multiple sets of pricing results on two flights show that, compared with the current ticket sales strategy, the strategy learning algorithm improves seat revenue by 30.94% and 39.96%, respectively, and improves on the demand-forecasting method by 6.04% and 3.36%.

Keywords: Revenue management; Dynamic flight pricing; Reinforcement learning; Strategy learning

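Figure 1 Performance comparison of pricing strategies on the two flights

Figure 2 Experimental comparison for flight CA1501 on June 22, 2011 at fare-class precisions of 0.0100 and 0.0001

Figure 3 Effect of the learning rate on algorithm performance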

Table 1 Strategy learning algorithm for dynamic flight pricing

Input: learning rate $\eta$, discount factor $\gamma$, maximum number of iterations ${\rm{episodes}}$, total number of seats on the flight $N$, and the flight's historical sales sequences from day 1 to day $T - 1$, $\{ s_0^{(n)},a_0^{(n)},r_0^{(n)}, \cdots ,s_v^{(n)},a_v^{(n)},r_v^{(n)}\} _{n = 1}^{T - 1}$
Initialize: for every state $s$ and action $a$, $q(s,a) = 0$; $k = 0$, $n = 1$
Repeat:
    Repeat (for each departed flight from day 1 to day $T - 1$):
        Repeat (for each step $(s_t^{(n)},a_t^{(n)},r_t^{(n)},s_{t + 1}^{(n)})$ of this flight's historical sales sequence):
            Policy evaluation: update the action-value function $q(s_t^{(n)},a_t^{(n)})$ according to Eq. (3)
            Policy improvement: adjust the policy according to Eq. (4), $\pi (s_t^{(n)}) = \arg \max _a q(s_t^{(n)},a)$
        Until the flight has no remaining seats or the ticket sales period ends
        $n \leftarrow n + 1$
    Until $n > T - 1$
    $k \leftarrow k + 1$
Until $k > {\rm{episodes}}$
Output: the dynamic flight pricing policy for day $T$, $\pi (s) = \arg \max _a q(s,a)$
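The loop in Table 1 can be sketched in Python as tabular Q-learning over logged transitions. This is a minimal sketch under stated assumptions, not the authors' implementation. Eq. (3) is not reproduced on this page, so the update below uses the standard temporal-difference rule $q(s,a) \leftarrow q(s,a) + \eta [r + \gamma \max_{a'} q(s',a') - q(s,a)]$ as a stand-in. The function name `learn_pricing_policy`, the tuple layout, and the use of `None` to mark the end of a sales sequence are hypothetical.

```python
from collections import defaultdict


def learn_pricing_policy(sequences, eta=0.1, gamma=0.9, episodes=50):
    """Tabular Q-learning over logged sales sequences (offline setting).

    sequences: list of lists of (state, action, reward, next_state) tuples,
    one inner list per departed flight. next_state is None (an assumed
    convention) once seats run out or the sales period ends.
    """
    q = defaultdict(float)       # q[(s, a)] is 0 for every unseen pair
    actions = defaultdict(set)   # fare classes observed in each state
    for seq in sequences:
        for s, a, _, _ in seq:
            actions[s].add(a)

    policy = {}
    for _ in range(episodes):
        for seq in sequences:
            for s, a, r, s_next in seq:
                # Policy evaluation: assumed TD update standing in for Eq. (3).
                next_actions = actions.get(s_next, ())
                best_next = max((q[(s_next, b)] for b in next_actions),
                                default=0.0)
                q[(s, a)] += eta * (r + gamma * best_next - q[(s, a)])
                # Policy improvement: greedy action, as in Eq. (4).
                policy[s] = max(actions[s], key=lambda b: q[(s, b)])
    return policy, q
```

Restricting the greedy step to actions actually observed in each state keeps the learned policy within fare classes that appear in the historical data, which is one simple way to respect the offline setting.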

Table 2 Example passenger booking records

ID number	Airline	Flight number	Departure airport	Arrival airport	Departure date	Order number	Fare class
52893787	CA	1501	PEK	SHA	20100308	2273651247	0.5213
55503718	CA	1501	PEK	SHA	20100308	2745812364	0.8212

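Table 3 Statistics of the experimental datasets

Flight	Total ticket sales records	Number of sales sequences	Number of (state, action, reward, next state) quadruples	Raw fare classes (4 decimal places): number of classes	Raw fare classes (4 decimal places): average tickets per class	Preprocessed fare classes (3 decimal places): number of classes	Preprocessed fare classes (3 decimal places): average tickets per class	Preprocessed fare classes (2 decimal places): number of classes	Preprocessed fare classes (2 decimal places): average tickets per class
CA1501	130118	718	102809	5737	22.68	1087	119.70	150	867.45
JR1505	22691	611	17102	2359	9.62	745	30.46	90	254.96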
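The class counts in Table 3 depend on how coarsely fare levels are discretized. The sketch below shows one way such statistics could be computed. It assumes the preprocessing is plain rounding to a fixed number of decimal places, which this page does not confirm, and the function name `fare_class_stats` is hypothetical. As a sanity check on the definition of the average, CA1501's 130118 records over 5737 raw classes give about 22.68 tickets per class.

```python
from collections import Counter


def fare_class_stats(fare_levels, decimals):
    """Round fare levels (assumed preprocessing) and summarize the classes.

    Returns (number of distinct fare classes, average tickets per class).
    """
    if not fare_levels:
        raise ValueError("fare_levels must not be empty")
    rounded = [round(level, decimals) for level in fare_levels]
    counts = Counter(rounded)            # tickets sold in each fare class
    n_classes = len(counts)
    return n_classes, len(rounded) / n_classes
```

Coarser rounding merges nearby fare levels, so each remaining class is backed by more historical tickets.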

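Table 4 Effect of fare-class precision

Fare-class precision	Number of fare classes in the training set	Number of fare classes appearing in the pricing policy	Average revenue improvement rate ${\rm{ALR@T}}$ (%)
0.0001	4590	128	13.21
0.0100	120	16	16.38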
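The definition of ${\rm{ALR@T}}$ is not given on this page. As a hedged illustration only, the sketch below assumes it is the mean, over the evaluated flights, of the percentage revenue gain of the learned policy relative to a baseline policy. The function name `average_lift_rate` and its arguments are hypothetical.

```python
def average_lift_rate(learned_revenues, baseline_revenues):
    """Mean percentage revenue gain of a learned policy over a baseline.

    Assumed definition: average of (learned - baseline) / baseline * 100
    across paired flights. Baseline revenues must be non-zero.
    """
    if not learned_revenues or len(learned_revenues) != len(baseline_revenues):
        raise ValueError("inputs must be non-empty and of equal length")
    lifts = [
        (learned - baseline) / baseline * 100.0
        for learned, baseline in zip(learned_revenues, baseline_revenues)
    ]
    return sum(lifts) / len(lifts)
```

Under this assumed definition, a value such as 16.38 would mean the learned policy earns on average about 16% more seat revenue than the baseline on the evaluated flights.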

