Citation: SUN Kunyang, YAO Rui, ZHU Hancheng, ZHAO Jiaqi, LI Xixi, HU Dianlin, HUANG Wei. A Test-Time Adaptive Method for Nighttime Image-Aided Beam Prediction[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250530

A Test-Time Adaptive Method for Nighttime Image-Aided Beam Prediction

doi: 10.11999/JEIT250530 cstr: 32379.14.JEIT250530
Funds:  The Fundamental Research Funds for the Central Universities (XJ2025005101)
  • Received Date: 2025-06-09
  • Accepted Date: 2025-11-03
  • Rev Recd Date: 2025-08-28
  • Available Online: 2025-11-08
Abstract
To address the high latency of traditional beam management methods in dynamic scenarios and the severe performance degradation of vision-aided beam prediction under adverse environmental conditions in millimeter-wave (mmWave) communication systems, this work proposes a nighttime image-assisted beam prediction method based on test-time adaptation (TTA). While mmWave communications rely on massive multiple-input multiple-output (MIMO) technology to achieve high-gain narrow-beam alignment, conventional beam scanning mechanisms suffer from exponential complexity and latency bottlenecks, failing to meet the demands of high-mobility scenarios such as vehicular networks. Existing vision-assisted approaches employ deep learning models to extract image features and map them to beam parameters. However, in low-light, rainy, or foggy environments, the distribution shift between training data and real-time image features leads to a drastic decline in prediction accuracy. This work introduces a TTA mechanism that overcomes the limitations of conventional static inference paradigms. By performing a single gradient backpropagation step over all model parameters during inference on real-time low-quality images, the proposed method dynamically aligns cross-domain feature distributions without requiring prior collection or annotation of adverse-scenario data. In addition, an entropy-minimization-based consistency learning strategy is designed to enforce prediction consistency between original and augmented views, driving model parameter updates toward maximizing prediction confidence and reducing uncertainty. Experimental results on real-world nighttime scenarios demonstrate that the proposed method achieves a top-3 beam prediction accuracy of 93.01%, outperforming static schemes by more than 20 percentage points and significantly surpassing traditional low-light enhancement approaches. Leveraging the cross-domain consistency of background semantics in fixed-base-station deployments, this lightweight online adaptation mechanism enhances model robustness, offering a novel pathway for efficient beam management in mmWave systems operating in complex open environments.

Objective
Millimeter-wave communication, a cornerstone of 5G and beyond, relies on massive multiple-input multiple-output (MIMO) architectures to mitigate severe path loss through high-gain narrow-beam alignment. However, traditional beam management schemes, which depend on exhaustive beam scanning and channel measurement, incur exponential complexity and latency (hundreds of milliseconds), rendering them impractical for high-mobility scenarios such as vehicular networks. Vision-aided beam prediction has emerged as a promising solution, leveraging deep learning to map visual features (e.g., user location and motion) to optimal beam parameters. Despite its daytime success (>90% accuracy), this approach suffers catastrophic performance degradation under low light, rain, or fog due to domain shifts between training data (e.g., daylight images) and real-time degraded inputs. Existing solutions rely on costly offline data augmentation with limited generalization to unseen harsh environments. This work addresses these limitations by proposing a lightweight, online adaptation framework that dynamically aligns cross-domain features during inference, eliminating the need for pre-collected harsh-environment data. The necessity lies in enabling robust mmWave communications in unpredictable environments, a critical step toward practical deployment in autonomous driving and the industrial IoT.

Methods
The proposed TTA method operates in three stages. First, a pre-trained beam prediction model (ResNet-18 backbone) is initialized using daylight images and labeled beam indices. During inference, real-time low-quality nighttime images are fed into two parallel pipelines: (1) the original view and (2) a data-augmented view incorporating Gaussian noise. A consistency loss minimizes the prediction distance between these two views, enforcing robustness against local feature perturbations. Simultaneously, an entropy minimization loss sharpens the output probability distribution by penalizing high prediction uncertainty. The combined losses drive a single-step gradient backpropagation that updates all model parameters. This process aligns feature distributions between the training (daylight) and testing (nighttime) domains without altering the model's global semantic understanding, as illustrated in Fig. 2. The system architecture integrates a roadside base station equipped with an RGB camera and a 32-element antenna array, capturing environmental data and executing real-time beam prediction. A minimal code sketch of the adaptation step is given below.
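As a concrete illustration, the sketch below implements this single-step update in PyTorch under stated assumptions: the model exposes a standard classifier interface over beam indices, and the function name tta_step together with the hyperparameters noise_std and lam are illustrative choices, not values taken from the paper.

```python
# Minimal sketch of the single-step test-time adaptation described above.
# Assumptions: `model` is a ResNet-18-style classifier over beam indices and
# `images` is a batch of nighttime RGB tensors; `noise_std` and `lam` are
# hypothetical hyperparameters, not the paper's settings.
import torch
import torch.nn.functional as F


def tta_step(model, images, optimizer, noise_std=0.05, lam=1.0):
    """One adaptation step: entropy minimization on the original view plus a
    consistency loss against a Gaussian-noise-augmented view."""
    model.train()  # let normalization layers track the test-batch statistics

    logits = model(images)                                 # original view
    noisy = images + noise_std * torch.randn_like(images)  # augmented view
    logits_aug = model(noisy)

    probs = logits.softmax(dim=1)
    # Entropy minimization (Tent-style [19]): push predictions to be confident.
    entropy_loss = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    # Multi-view consistency: the augmented view should match the original one.
    consistency_loss = F.kl_div(
        logits_aug.log_softmax(dim=1), probs.detach(), reduction="batchmean"
    )

    loss = entropy_loss + lam * consistency_loss
    optimizer.zero_grad()
    loss.backward()   # a single backpropagation over all model parameters
    optimizer.step()
    return logits.detach()  # predictions used for beam selection
```

In deployment, the optimizer would be created once over model.parameters() (e.g., SGD with a small learning rate) and tta_step called on each incoming batch, so adaptation proceeds continuously as conditions change.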
Results and Discussions
Experiments on a real-world dataset demonstrate the method's superiority. Under nighttime conditions, the proposed TTA framework achieves 93.01% top-3 beam prediction accuracy, outperforming static inference (71.25%) and traditional low-light enhancement methods (85.27%) (Table 3). Ablation studies confirm the effectiveness of both the online feature alignment method designed for small-batch data (Table 4) and the entropy minimization with multi-view consistency learning (Table 5). Figure 4 illustrates the continuous online adaptation performance during testing, revealing rapid convergence that enables base stations to recover performance swiftly after new environmental disturbances occur.

Conclusions
To address the insufficient robustness of existing vision-aided beam prediction methods in dynamically changing environments, this study introduces a test-time adaptation framework for nighttime image-aided beam prediction. First, a novel small-batch adaptive feature alignment strategy is developed to resolve feature mismatch in unseen domains while meeting real-time communication constraints. Second, a joint optimization framework integrates classical low-light image enhancement with multi-view consistency learning, enhancing feature discrimination under complex lighting conditions. Experiments on real-scene data validate the proposed algorithm: the method achieves more than 20 percentage points higher top-3 beam prediction accuracy than direct testing, highlighting its effectiveness in dynamic environments. This approach provides a new technical pathway for optimizing vision-aided communication systems under non-ideal conditions. Future work may extend to beam prediction under rain and fog and to multi-modal perception-assisted communication systems.
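The classical enhancement that the Conclusions refer to is contrast-limited adaptive histogram equalization (CLAHE, reference [15]). A minimal OpenCV sketch of such preprocessing follows; the clipLimit and tileGridSize values are illustrative, not the paper's settings.

```python
# Illustrative CLAHE [15] preprocessing for low-light frames; the parameter
# values below are examples, not settings reported in the paper.
import cv2


def enhance_low_light(bgr_image):
    """Equalize local contrast on the luminance channel of a BGR frame."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)  # work in Lab color space
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l = clahe.apply(l)                                # enhance luminance only
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
```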
References
  • [1]
    JIANG Shuaifeng and ALKHATEEB A. Computer vision aided beam tracking in a real-world millimeter wave deployment[C]. 2022 IEEE Globecom Workshops, Rio de Janeiro, Brazil, 2022: 142–147. doi: 10.1109/GCWkshps56602.2022.10008648.
    [2]
    HUANG Wei, HUANG Xueqing, ZHANG Haiyang, et al. Vision image aided near-field beam training for internet of vehicle systems[C]. 2024 IEEE International Conference on Communications Workshops, Denver, USA, 2024: 390–395. doi: 10.1109/ICCWorkshops59551.2024.10615560.
    [3]
    CHARAN G, OSMAN T, HREDZAK A, et al. Vision-position multi-modal beam prediction using real millimeter wave datasets[C]. 2022 IEEE Wireless Communications and Networking Conference, Austin, USA, 2022: 2727–2731. doi: 10.1109/WCNC51071.2022.9771835.
    [4]
    LI Kehui, ZHOU Binggui, GUO Jiajia, et al. Vision-aided multi-user beam tracking for mmWave massive MIMO system: Prototyping and experimental results[C]. IEEE 99th Vehicular Technology Conference, Singapore, Singapore, 2024: 1–6. doi: 10.1109/VTC2024-Spring62846.2024.10683659.
    [5]
    OUYANG Ming, GAO Feifei, WANG Yucong, et al. Computer vision-aided reconfigurable intelligent surface-based beam tracking: Prototyping and experimental results[J]. IEEE Transactions on Wireless Communications, 2023, 22(12): 8681–8693. doi: 10.1109/TWC.2023.3264752.
    [6]
    WEN Feiyang, XU Weihua, GAO Feifei, et al. Vision aided environment semantics extraction and its application in mmWave beam selection[J]. IEEE Communications Letters, 2023, 27(7): 1894–1898. doi: 10.1109/LCOMM.2023.3270039.
    [7]
DEMIRHAN U and ALKHATEEB A. Radar aided 6G beam prediction: Deep learning algorithms and real-world demonstration[C]. 2022 IEEE Wireless Communications and Networking Conference, Austin, USA, 2022: 2655–2660. doi: 10.1109/WCNC51071.2022.9771564.
    [8]
ZHANG Tengyu, LIU Jun, and GAO Feifei. Vision aided beam tracking and frequency handoff for mmWave communications[C]. IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), New York, USA, 2022: 1–2. doi: 10.1109/INFOCOMWKSHPS54753.2022.9798197.
    [9]
XU Weihua, GAO Feifei, TAO Xiaoming, et al. Computer vision aided mmWave beam alignment in V2X communications[J]. IEEE Transactions on Wireless Communications, 2023, 22(4): 2699–2714. doi: 10.1109/TWC.2022.3213541.
    [10]
    ALRABEIAH M, HREDZAK A, and ALKHATEEB A. Millimeter wave base stations with cameras: Vision-aided beam and blockage prediction[C]. 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 2020: 1–5. doi: 10.1109/VTC2020-Spring48590.2020.9129369.
    [11]
    WANG Heng, OU Binbao, XIE Xin, et al. Vision-aided mmWave beam and blockage prediction in low-light environment[J]. IEEE Wireless Communications Letters, 2025, 14(3): 791–795. doi: 10.1109/LWC.2024.3523400.
    [12]
    BASAK H and YIN Zhaozheng. Forget more to learn more: Domain-specific feature unlearning for semi-supervised and unsupervised domain adaptation[C]. 18th European Conference on Computer Vision, Milan, Italy, 2024: 130–148. doi: 10.1007/978-3-031-72920-1_8.
    [13]
    SCHNEIDER S, RUSAK E, ECK L, et al. Improving robustness against common corruptions by covariate shift adaptation[C]. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 968.
    [14]
    MIRZA J M, SONEIRA P J, LIN Wei, et al. ActMAD: Activation matching to align distributions for test-time-training[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 24152–24161. doi: 10.1109/CVPR52729.2023.02313.
    [15]
ZUIDERVELD K. Contrast limited adaptive histogram equalization[M]//HECKBERT P S. Graphics Gems IV. Amsterdam: Elsevier, 1994: 474–485. doi: 10.1016/B978-0-12-336156-1.50061-6.
    [16]
    ALKHATEEB A, CHARAN G, OSMAN T, et al. DeepSense 6G: A large-scale real-world multi-modal sensing and communication dataset[J]. IEEE Communications Magazine, 2023, 61(9): 122–128. doi: 10.1109/MCOM.006.2200730.
    [17]
    HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
    [18]
    DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255. doi: 10.1109/CVPR.2009.5206848.
    [19]
WANG Dequan, SHELHAMER E, LIU Shaoteng, et al. Tent: Fully test-time adaptation by entropy minimization[C]. 9th International Conference on Learning Representations (ICLR), Virtual Event, 2021: 1–15.