A Test-Time Adaptive Method for Nighttime Image-Aided Beam Prediction
-
Abstract:
To address the high latency of traditional beam management methods in dynamic scenarios and the severe performance degradation of vision-aided beam prediction under adverse environmental conditions in millimeter-wave (mmWave) communication systems, this work proposes a nighttime image-aided beam prediction method based on test-time adaptation (TTA). While mmWave communications rely on massive multiple-input multiple-output (MIMO) technology to achieve high-gain narrow-beam alignment, conventional beam scanning mechanisms suffer from exponential complexity and latency bottlenecks, failing to meet the demands of high-mobility scenarios such as vehicular networks. Existing vision-aided approaches employ deep learning models to extract image features and map them to beam parameters; however, in low-light, rainy, or foggy environments, the distribution shift between the training data and real-time image features causes a drastic decline in prediction accuracy. This work introduces a TTA mechanism that overcomes the limitations of conventional static inference paradigms: by performing a single gradient backpropagation step over all model parameters during inference on real-time low-quality images, the proposed method dynamically aligns cross-domain feature distributions without requiring prior collection or annotation of adverse-scenario data. In addition, an entropy-minimization-based consistency learning strategy enforces prediction consistency between the original and augmented views, driving parameter updates toward maximizing prediction confidence and reducing uncertainty. Experimental results on real-world nighttime scenarios demonstrate that the proposed method achieves a top-3 beam prediction accuracy of 93.01%, outperforming static deployment by more than 20 percentage points and significantly surpassing traditional low-light enhancement approaches. By exploiting the cross-domain consistency of background semantics in fixed base-station deployments, this lightweight online adaptation mechanism enhances model robustness and offers a new pathway for efficient beam management in mmWave systems operating in complex open environments.

Objective
Millimeter-wave communication, a cornerstone of 5G and beyond, relies on massive MIMO architectures to mitigate severe path loss through high-gain narrow-beam alignment. However, traditional beam management schemes, which depend on exhaustive beam scanning and channel measurement, incur exponential complexity and latency on the order of hundreds of milliseconds, rendering them impractical for high-mobility scenarios such as vehicular networks. Vision-aided beam prediction has emerged as a promising alternative, leveraging deep learning to map visual features (e.g., user location and motion) to optimal beam parameters. Despite its success in daytime conditions (above 90% accuracy), this approach suffers catastrophic performance degradation under low light, rain, or fog because of the domain shift between the training data (e.g., daylight images) and real-time degraded inputs. Existing solutions rely on costly offline data augmentation and generalize poorly to unseen harsh environments. This work addresses these limitations with a lightweight online adaptation framework that dynamically aligns cross-domain features during inference, eliminating the need for pre-collected harsh-environment data.
The necessity lies in enabling robust mmWave communication in unpredictable environments, a critical step toward practical deployment in autonomous driving and the industrial Internet of Things.

Methods
The proposed TTA method operates in three stages. First, a beam prediction model (ResNet-18 backbone) is pre-trained on daylight images with labeled beam indices. During inference, each real-time low-quality nighttime image is fed into two parallel pipelines: (1) the original view and (2) a data-augmented view with added Gaussian noise. A consistency loss minimizes the distance between the predictions of the two views, enforcing robustness against local feature perturbations, while an entropy minimization loss sharpens the output probability distribution by penalizing high prediction uncertainty. The combined losses drive a single gradient backpropagation step that updates all model parameters (a minimal sketch of this update follows the Conclusions below). This process aligns the feature distributions of the training (daylight) and testing (nighttime) domains without altering the model's global semantic understanding, as illustrated in Fig. 2. The system architecture integrates a roadside base station equipped with an RGB camera and a 32-element antenna array, capturing environmental data and executing real-time beam prediction.

Results and Discussions
Experiments on a real-world dataset demonstrate the method's superiority. Under nighttime conditions, the proposed TTA framework achieves 93.01% top-3 beam prediction accuracy, outperforming static inference (71.25%) and traditional low-light enhancement (85.27%) (Table 3). Ablation studies confirm the effectiveness of both the online feature alignment method designed for small-batch data (Table 4) and the entropy minimization with multi-view consistency learning (Table 5). Figure 4 illustrates the continuous online adaptation performance during testing, revealing rapid convergence that enables base stations to quickly recover performance after new environmental disturbances occur.

Conclusions
To address the insufficient robustness of existing vision-aided beam prediction methods in dynamically changing environments, this study introduces a test-time adaptation framework for nighttime image-aided beam prediction. First, a small-batch adaptive feature alignment strategy is developed to resolve feature mismatch in unseen domains while meeting real-time communication constraints. Second, a joint optimization framework integrates classical low-light image enhancement with multi-view consistency learning, enhancing feature discrimination under complex lighting conditions. Experiments on real-scene data validate the proposed algorithm: it achieves a top-3 beam prediction accuracy more than 20 percentage points higher than direct testing, highlighting its effectiveness in dynamic environments. This approach provides a new technical pathway for optimizing vision-aided communication systems under non-ideal conditions. Future work may extend to beam prediction under rain and fog and to multi-modal perception-assisted communication systems.
-
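For concreteness, the adaptation objective described in Methods can be written out as follows. This is a reconstruction from the description above: the notation $\mathcal{L}_{\text{e}}$ matches Table 5, but the distance $D$ and the weighting coefficient $\lambda$ are assumptions rather than quantities confirmed by the paper.

$$ \mathcal{L}_{\text{e}}(x) = -\sum_{b=1}^{N_{\text{beam}}} p_\theta(b \mid x)\log p_\theta(b \mid x), \qquad \mathcal{L}_{\text{c}}(x) = D\big(p_\theta(\cdot \mid x),\, p_\theta(\cdot \mid x + \varepsilon)\big), \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) $$

$$ \theta^{\text{tta}} \leftarrow \theta^{*} - \eta\,\nabla_\theta\big[\mathcal{L}_{\text{e}}(x) + \lambda\,\mathcal{L}_{\text{c}}(x)\big] $$

where $\theta^{*}$ denotes the daylight-pretrained parameters (cf. Table 7), $\eta$ is the adaptation learning rate Lr_t $= 1.25 \times 10^{-5}$ (Table 2), and the update is a single SGD step per small batch of nighttime images.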
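The sketch below illustrates, in PyTorch, how such a single-step update could be implemented. It is a minimal reconstruction under stated assumptions (the noise level, loss weighting, codebook size, and symmetric-KL consistency distance are our choices), not the authors' released code; the optimizer, learning rate, and nighttime batch size follow Table 2.

```python
# Minimal sketch of the single-step test-time adaptation (TTA) update
# described in Methods. Illustrative reconstruction, not the authors' code.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

NUM_BEAMS = 32  # assumed codebook size, matching the 32-element antenna array

def build_model():
    model = resnet18(weights=None)  # in practice, load daylight-pretrained weights (theta*)
    model.fc = torch.nn.Linear(model.fc.in_features, NUM_BEAMS)
    return model

def entropy_loss(logits):
    # Shannon entropy of the predictive distribution, averaged over the batch;
    # minimizing it sharpens predictions and reduces uncertainty.
    p = logits.softmax(dim=1)
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()

def consistency_loss(logits_orig, logits_aug):
    # Distance between original-view and augmented-view predictions;
    # symmetric KL divergence is one reasonable choice (assumption).
    p = logits_orig.log_softmax(dim=1)
    q = logits_aug.log_softmax(dim=1)
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))

@torch.enable_grad()  # TTA needs gradients even inside an inference loop
def tta_step(model, images, optimizer, noise_std=0.05, lam=1.0):
    """One gradient step on a small batch (B = 4) of nighttime images."""
    model.train()
    augmented = images + noise_std * torch.randn_like(images)  # Gaussian-noise view
    logits_orig, logits_aug = model(images), model(augmented)
    loss = entropy_loss(logits_orig) + lam * consistency_loss(logits_orig, logits_aug)
    optimizer.zero_grad()
    loss.backward()   # single backpropagation pass over all parameters
    optimizer.step()
    return logits_orig.detach().argmax(dim=1)  # predicted beam indices

model = build_model()
optimizer = torch.optim.SGD(model.parameters(), lr=1.25e-5)  # Lr_t from Table 2
nighttime_batch = torch.rand(4, 3, 224, 224)  # stand-in for real camera frames
beams = tta_step(model, nighttime_batch, optimizer)
```

Because the update uses only the unlabeled incoming images, the base station can adapt continuously online; Table 7 indicates the adapted parameters $\theta^{\text{tta}}$ retain the validation accuracy of $\theta^{*}$, i.e., adaptation does not erase the daylight knowledge.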
Key words:
- Vision Aided Beam Prediction
- Test-Time Adaptation
- Consistency learning
-
Table 1  Training hyperparameters

Table 2  Hyperparameters of test-time adaptation

Hyperparameter   Description                                  Value
Lr_t             Learning rate for adaptation                 $1.25 \times 10^{-5}$
Optimizer_t      Optimizer type for adaptation                SGD
Bs               Batch size for traversing the training set   128
B                Batch size of nighttime images               4

Table 3  Comparison of prediction accuracy (%)

Method              Top-1   Top-2   Top-3
Direct testing      46.17   63.58   71.25
Image enhancement   55.14   77.74   85.27
ActMAD              55.14   78.61   86.68
Proposed method     60.96   86.28   93.01

Table 4  Effectiveness of the small-batch online cross-domain feature alignment method (%)

Method                         Top-1   Top-2   Top-3
Image enhancement              55.14   77.74   85.27
ActMAD                         55.14   78.61   86.68
Small-batch alignment method   57.09   81.00   88.57

Table 5  Effectiveness of multi-view online consistency learning (%)

Method                                        Top-1   Top-2   Top-3
Small-batch alignment method                  57.09   81.00   88.57
+ $ {\mathcal{L}_{\text{e}}} $ (entropy loss)   58.94   83.93   91.02
+ consistency learning                        60.96   86.28   93.01

Table 6  Prediction accuracy under different batch sizes (%)

B   Top-1   Top-2   Top-3
2   60.42   86.42   93.21
4   60.96   86.28   93.01
8   60.96   86.52   93.01

Table 7  Prediction accuracy on the validation set before and after online learning (%)

Model parameters             Top-1   Top-2   Top-3
$ {\theta ^{\text{*}}} $     73.96   95.12   98.65
$ {\theta ^{{\text{tta}}}} $ 73.34   95.02   98.55
-
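The "image enhancement" baseline in Tables 3 and 4, and the classical low-light enhancement integrated in the Conclusions, correspond to contrast-limited adaptive histogram equalization (CLAHE), the technique cited as ref. [15]. Below is a minimal OpenCV sketch of such preprocessing, assuming CLAHE is applied to the luminance channel; the clip limit, tile grid, and file name are illustrative assumptions.

```python
# Hypothetical low-light preprocessing via CLAHE (ref. [15]); parameter values
# are illustrative assumptions, not values taken from the paper.
import cv2

def enhance_low_light(bgr_frame):
    # Apply CLAHE to the L channel in LAB space so chrominance is preserved.
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    merged = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(merged, cv2.COLOR_LAB2BGR)

frame = cv2.imread("nighttime_frame.png")  # hypothetical camera capture
if frame is not None:
    frame = enhance_low_light(frame)
```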
[1] JIANG Shuaifeng and ALKHATEEB A. Computer vision aided beam tracking in a real-world millimeter wave deployment[C]. 2022 IEEE Globecom Workshops, Rio de Janeiro, Brazil, 2022: 142–147. doi: 10.1109/GCWkshps56602.2022.10008648.
[2] HUANG Wei, HUANG Xueqing, ZHANG Haiyang, et al. Vision image aided near-field beam training for internet of vehicle systems[C]. 2024 IEEE International Conference on Communications Workshops, Denver, USA, 2024: 390–395. doi: 10.1109/ICCWorkshops59551.2024.10615560.
[3] CHARAN G, OSMAN T, HREDZAK A, et al. Vision-position multi-modal beam prediction using real millimeter wave datasets[C]. 2022 IEEE Wireless Communications and Networking Conference, Austin, USA, 2022: 2727–2731. doi: 10.1109/WCNC51071.2022.9771835.
[4] LI Kehui, ZHOU Binggui, GUO Jiajia, et al. Vision-aided multi-user beam tracking for mmWave massive MIMO system: Prototyping and experimental results[C]. IEEE 99th Vehicular Technology Conference, Singapore, 2024: 1–6. doi: 10.1109/VTC2024-Spring62846.2024.10683659.
[5] OUYANG Ming, GAO Feifei, WANG Yucong, et al. Computer vision-aided reconfigurable intelligent surface-based beam tracking: Prototyping and experimental results[J]. IEEE Transactions on Wireless Communications, 2023, 22(12): 8681–8693. doi: 10.1109/TWC.2023.3264752.
[6] WEN Feiyang, XU Weihua, GAO Feifei, et al. Vision aided environment semantics extraction and its application in mmWave beam selection[J]. IEEE Communications Letters, 2023, 27(7): 1894–1898. doi: 10.1109/LCOMM.2023.3270039.
[7] DEMIRHAN U and ALKHATEEB A. Radar aided 6G beam prediction: Deep learning algorithms and real-world demonstration[C]. 2022 IEEE Wireless Communications and Networking Conference, Austin, USA, 2022: 2655–2660. doi: 10.1109/WCNC51071.2022.9771564.
[8] ZHANG Tengyu, LIU Jun, and GAO Feifei. Vision aided beam tracking and frequency handoff for mmWave communications[C]. IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), New York, USA, 2022: 1–2. doi: 10.1109/INFOCOMWKSHPS54753.2022.9798197.
[9] XU Weihua, GAO Feifei, TAO Xiaoming, et al. Computer vision aided mmWave beam alignment in V2X communications[J]. IEEE Transactions on Wireless Communications, 2023, 22(4): 2699–2714. doi: 10.1109/TWC.2022.3213541.
[10] ALRABEIAH M, HREDZAK A, and ALKHATEEB A. Millimeter wave base stations with cameras: Vision-aided beam and blockage prediction[C]. 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 2020: 1–5. doi: 10.1109/VTC2020-Spring48590.2020.9129369.
[11] WANG Heng, OU Binbao, XIE Xin, et al. Vision-aided mmWave beam and blockage prediction in low-light environment[J]. IEEE Wireless Communications Letters, 2025, 14(3): 791–795. doi: 10.1109/LWC.2024.3523400.
[12] BASAK H and YIN Zhaozheng. Forget more to learn more: Domain-specific feature unlearning for semi-supervised and unsupervised domain adaptation[C]. 18th European Conference on Computer Vision, Milan, Italy, 2024: 130–148. doi: 10.1007/978-3-031-72920-1_8.
[13] SCHNEIDER S, RUSAK E, ECK L, et al. Improving robustness against common corruptions by covariate shift adaptation[C]. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 968.
[14] MIRZA J M, SONEIRA P J, LIN Wei, et al. ActMAD: Activation matching to align distributions for test-time-training[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 24152–24161. doi: 10.1109/CVPR52729.2023.02313.
[15] ZUIDERVELD K. Contrast limited adaptive histogram equalization[M]. HECKBERT P S. Graphics Gems IV. Amsterdam: Elsevier, 1994: 474–485. doi: 10.1016/B978-0-12-336156-1.50061-6.
[16] ALKHATEEB A, CHARAN G, OSMAN T, et al. DeepSense 6G: A large-scale real-world multi-modal sensing and communication dataset[J]. IEEE Communications Magazine, 2023, 61(9): 122–128. doi: 10.1109/MCOM.006.2200730.
[17] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
[18] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255. doi: 10.1109/CVPR.2009.5206848.
[19] WANG Dequan, SHELHAMER E, LIU Shaoteng, et al. Tent: Fully test-time adaptation by entropy minimization[C]. 9th International Conference on Learning Representations, 2021: 1–15.
-