Recognition of Basketball Tactics Based on Vision Transformer and Track Filter

XU Guoliang, SHEN Gang, LIANG Xupeng, LUO Jiangtao

Citation: XU Guoliang, SHEN Gang, LIANG Xupeng, LUO Jiangtao. Recognition of Basketball Tactics Based on Vision Transformer and Track Filter[J]. Journal of Electronics & Information Technology, 2024, 46(2): 615-623. doi: 10.11999/JEIT230079


doi: 10.11999/JEIT230079
Funds: Chongqing Municipal Sports Bureau Research Key Projects (A2019002, A202113)
Details
    Author biographies:

    XU Guoliang: Male, Professor, Master's supervisor. His research interests include computer vision, and big data analysis and mining

    SHEN Gang: Male, Master's student. His research interests include computer vision and image processing

    LIANG Xupeng: Male, Associate Professor. His research interest is smart sports

    LUO Jiangtao: Male, Professor. His research interests include future Internet architecture and video big data analysis

    Corresponding author:

    XU Guoliang, xugl@cqupt.edu.cn

  • PlayersTrack dataset: https://github.com/iceCream-sh/PlayersTrack
  • CLC number: TP391; TN929

Abstract: Identifying offensive or defensive tactics from player trajectory data by machine learning is a key component of basketball video content understanding. Traditional machine learning methods require manually designed feature variables, which greatly limits their flexibility, so automatically extracting feature information usable for tactic recognition becomes the key problem. To this end, this paper designs a basketball tactic recognition model (TacViT) based on player trajectory data from National Basketball Association (NBA) games. The model takes a Vision Transformer (ViT) as its backbone and uses multi-head attention modules to extract rich global trajectory features, while a track filter is incorporated to strengthen the feature interaction between court lines and player trajectories and to enhance the representation of player positions; the track filter learns long-term spatial correlations in the frequency domain with log-linear complexity. Sequence data from the SportVU player-tracking system are converted into trajectory images to build a basketball tactics dataset (PlayersTrack). Experiments on this dataset show that TacViT reaches an accuracy of 82.5%, an improvement of 16.7% over the unmodified ViT-S model.
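The abstract describes the track filter only at a high level: a learnable filter applied to ViT patch tokens in the frequency domain, capturing long-range spatial correlations at O(n log n) cost, in the spirit of the global filter of GFNet [22]. The PyTorch sketch below illustrates that idea under stated assumptions; the module name TrackFilter, the token grid size, and all hyperparameters are illustrative and not the authors' released implementation (only the filtering step is shown, not its combination with multi-head attention in the TFMHA module of Figs. 7 and 8).

```python
# Minimal sketch (assumption): a GFNet-style frequency-domain filter of the kind
# the abstract describes. Not the authors' code; names and shapes are illustrative.
import torch
import torch.nn as nn


class TrackFilter(nn.Module):
    def __init__(self, dim: int, h: int = 14, w: int = 14):
        super().__init__()
        # Learnable complex-valued filter over the rFFT grid of size h x (w//2 + 1).
        self.filter = nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)
        self.h, self.w = h, w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim), with tokens arranged on an h x w patch grid.
        b, n, d = x.shape
        x = x.view(b, self.h, self.w, d)
        # 2D FFT over the spatial grid: log-linear in the number of tokens.
        x_freq = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        # Element-wise multiplication with the learnable global filter.
        x_freq = x_freq * torch.view_as_complex(self.filter)
        # Back to the spatial domain.
        x = torch.fft.irfft2(x_freq, s=(self.h, self.w), dim=(1, 2), norm="ortho")
        return x.reshape(b, n, d)


if __name__ == "__main__":
    layer = TrackFilter(dim=384, h=14, w=14)
    tokens = torch.randn(2, 14 * 14, 384)   # e.g. ViT-S patch tokens
    print(layer(tokens).shape)               # torch.Size([2, 196, 384])
```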
Fig. 1  TacViT network architecture

    Fig. 2  Trajectory image filtering process

    Fig. 3  Multi-head attention mechanism

    Fig. 4  Schematic of the "Horns" tactic

    Fig. 5  Schematic of the "Sideline out-of-bounds" tactic

    Fig. 6  Image processing for the "Sideline out-of-bounds" tactic class

    Fig. 7  Different combinations of the TFMHA module

    Fig. 8  Combinations of the TFMHA module under three numbers of heads

    Table 1  Confusion matrix accuracy

                                Horns    Pick-and-roll    2-3 zone defense    Sideline out-of-bounds
    Horns                       0.86     0.08             0                   0.06
    Pick-and-roll               0.12     0.76             0.06                0.04
    2-3 zone defense            0        0.10             0.90                0
    Sideline out-of-bounds      0.04     0.10             0.11                0.75

    Table 2  Comparison with current mainstream networks

    Model                Params (M)    FLOPs (G)    Acc. (%)
    ResNet50[12]         25.6          4.1          67.9
    ResNet101[12]        44.5          7.9          70.6
    ViT-S[14]            21.7          4.2          75.8
    ViT-B[14]            85.8          16.8         77.4
    GFNet-S[22]          24.5          4.46         77.6
    SwinT-T[23]          29.1          4.5          79.3
    SwinT-S[23]          50.2          8.7          80.1
    Deit-S[24]           21.7          4.2          79.1
    Deit-B[24]           85.6          17.5         80.7
    ResMLP-S/24[25]      29.6          5.97         72.2
    CrossViT-S[20]       26.3          5.08         78.5
    TacViT               35.7          6.6          82.5
  • [1] PERŠE M, KRISTAN M, KOVAČIČ S, et al. A trajectory-based analysis of coordinated team activity in a basketball game[J]. Computer Vision and Image Understanding, 2009, 113(5): 612–621. doi: 10.1016/j.cviu.2008.03.001.
    [2] YUE Qiang and WEI Chao. Innovation of human body positioning system and basketball training system[J]. Computational Intelligence and Neuroscience, 2022, 2022: 2369925. doi: 10.1155/2022/2369925.
    [3] MILLER A C and BORNN L. Possession sketches: Mapping NBA strategies[C]. The 2017 MIT Sloan Sports Analytics Conference, Boston, USA, 2017.
    [4] TIAN Changjia, DE SILVA V, CAINE M, et al. Use of machine learning to automate the identification of basketball strategies using whole team player tracking data[J]. Applied Sciences, 2019, 10(1): 24. doi: 10.3390/app10010024.
    [5] MCINTYRE A, BROOKS J, GUTTAG J, et al. Recognizing and analyzing ball screen defense in the NBA[C]. The MIT Sloan Sports Analytics Conference, Boston, USA, 2016: 11–12.
    [6] WANG K C and ZEMEL R. Classifying NBA offensive plays using neural networks[C]. MIT Sloan Sports Analytics Conference, Boston, USA, 2016.
    [7] CHEN H T, CHOU C L, FU T S, et al. Recognizing tactic patterns in broadcast basketball video using player trajectory[J]. Journal of Visual Communication and Image Representation, 2012, 23(6): 932–947. doi: 10.1016/j.jvcir.2012.06.003.
    [8] TSAI T Y, LIN Y Y, LIAO H Y M, et al. Recognizing offensive tactics in broadcast basketball videos via key player detection[C]. 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017: 880–884.
    [9] TSAI T Y, LIN Y Y, JENG S K, et al. End-to-end key-player-based group activity recognition network applied to basketball offensive tactic identification in limited data scenarios[J]. IEEE Access, 2021, 9: 104395–104404. doi: 10.1109/ACCESS.2021.3098840.
    [10] CHEN C H, LIU T L, WANG Y S, et al. Spatio-temporal learning of basketball offensive strategies[C]. The 23rd ACM international conference on Multimedia, Brisbane, Australia, 2015: 1123–1126.
    [11] BORHANI Y, KHORAMDEL J, and NAJAFI E. A deep learning based approach for automated plant disease classification using vision transformer[J]. Scientific Reports, 2022, 12(1): 11554. doi: 10.1038/s41598-022-15163-0.
    [12] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. The 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. The 31st International Conference on Advances in Neural Information Processing Systems, Long Beach, USA, 2017: 6000–6010.
    [14] LI Shaohua, XUE Kaiping, ZHU Bin, et al. FALCON: A Fourier transform based approach for fast and secure convolutional neural network predictions[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 8702–8711.
    [15] DING Caiwen, LIAO Siyu, WANG Yanzhi, et al. CirCNN: Accelerating and compressing deep neural networks using block-circulant weight matrices[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 395–408.
    [16] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. The 9th International Conference on Learning Representations, Vienna, Austria, 2021.
    [17] WU Kan, PENG Houwen, CHEN Minghao, et al. Rethinking and improving relative position encoding for vision transformer[C]. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 10013–10021.
    [18] HSIEH H Y, CHEN C Y, WANG Y S, et al. BasketballGAN: Generating basketball play simulation through sketching[C]. The 27th ACM International Conference on Multimedia, Chengdu, China, 2019: 720–728.
    [19] HAN Kai, XIAO An, WU Enhua, et al. Transformer in transformer[C]. Advances in Neural Information Processing Systems, 2021: 15908–15919.
    [20] CHEN C F R, FAN Quanfu, and PANDA R. CrossViT: Cross-attention multi-scale vision transformer for image classification[C]. The 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 347–356.
    [21] STATS PERFORM. Performance analysis powered by AI[OL]. https://www.statsperform.com/team-performance, 2022.7.
    [22] RAO Yongming, ZHAO Wenliang, ZHU Zheng, et al. Global filter networks for image classification[C]. Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021: 980–993.
    [23] LIU Ze, LIN Yutong, CAO Yue, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]. The 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 9992–10002.
    [24] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]. The 38th International Conference on Machine Learning, Vienna, Austria, 2021: 10347–10357.
    [25] TOUVRON H, BOJANOWSKI P, CARON M, et al. ResMLP: Feedforward networks for image classification with data-efficient training[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 5314–5321. doi: 10.1109/TPAMI.2022.3206148.
Figures (8) / Tables (2)
Metrics
  • Article views:  357
  • Full-text HTML views:  172
  • PDF downloads:  93
  • Citations:  0
Publication history
  • Received:  2023-02-21
  • Revised:  2023-05-09
  • Published online:  2023-05-17
  • Issue date:  2024-02-10
