Latest Articles

Articles in press have been peer-reviewed and accepted. They are not yet assigned to volumes/issues, but are citable by Digital Object Identifier (DOI).
A Low-latency Synchronization Header Detection Algorithm and Circuit for the JESD204C Interface
YIN Peng, ZHANG Chao, LEI Changan, HOU Weizhou, SHU Zhou, LIU Shubin, ZHU Zhangming
Available online, doi: 10.11999/JEIT260163
Abstract:
  Objective  With rapid advances in high-speed electronics, front-end Analog-to-Digital Converters and Digital-to-Analog Converters (ADCs/DACs) continue to increase in sampling rate and resolution. Back-end Field-Programmable Gate Arrays and Application-Specific Integrated Circuits (FPGAs/ASICs) also provide stronger computing capability. These trends impose strict requirements on high-speed data interfaces, including high bandwidth, low latency, low power consumption, and reliable synchronization. As a mainstream high-speed Serializer/Deserializer (SerDes) interface, the JESD204C interface still suffers from long link initialization latency and high synchronization power consumption. These limitations restrict system real-time performance and energy efficiency. To address these issues, this study optimizes the link-layer design of the JESD204C receiver and proposes an efficient Synchronization Header (SH) detection method. The method implements exponential compression of the search set through global observation and iterative convergence. Detection efficiency is improved, fast and accurate SH positioning is achieved, link synchronization latency is reduced, and synchronization stability and energy efficiency are enhanced.  Methods  A typical JESD204C interface uses serial sliding detection for SH detection, which causes high link initialization latency and large delay jitter. To solve these problems, an Iterative Set Screening (ISS)-based SH detection algorithm is proposed. The SH detection task is modeled as the rapid localization of a deterministic pattern in a binary random sequence. A theoretical model based on information theory and stochastic processes is constructed. Expected space utilization and Bit Error Rate (BER) are introduced to support quantitative performance evaluation. In this model, SH candidate positions are defined as a dynamic set. 
Based on the inherent polarity inversion characteristic of the SH and global observations in each clock cycle, multilevel XOR logic is used to verify all candidate hypotheses in parallel. Non-inverting candidate positions are eliminated, and the search space is dynamically compressed. This design improves synchronization speed and position robustness, providing a low-latency and reliable initialization solution for high-speed SerDes links.  Results and Discussions  The proposed ISS-based SH detection algorithm is validated under harsh conditions, including SH crossing block boundaries and loss of lock caused by burst errors. The results demonstrate strong robustness, with rapid SH locking and link resynchronization under all test conditions (Figures 11-16). To evaluate performance, four representative schemes are reproduced: a single-bit serial locking circuit, a 66-bit serial locking architecture, a register-intensive block synchronization method, and a parallel search circuit. A systematic comparison is then conducted between these schemes and the proposed design. The results show that the normalized locking time of the single-bit serial locking circuit, 66-bit serial locking architecture, and register-intensive block synchronization method varies substantially with SH position (Figure 17(a)), especially at block boundaries (Figure 17(b)). When the SH is located at the Most Significant Bit (MSB), typical sliding detection requires about 1.8 times the time needed at the Least Significant Bit (LSB), indicating strong sensitivity to the starting position and search path. In contrast, the proposed ISS scheme maintains a stable normalized locking time within 1.0 ± 0.05 across all positions, with the standard deviation reduced by more than 70%. By evaluating all candidate positions equally through parallel filtering, the scheme eliminates position dependence. 
Synchronization can be completed within tens of clock cycles whether the SH is located at the LSB, the MSB, or any other position in the block. The experimental results verify that the ISS algorithm improves synchronization robustness and predictability while accelerating link initialization. Table 3 summarizes the performance metrics. The average locking time is only 24.4 clock cycles, representing an overall improvement of more than 70% compared with the single-bit serial locking circuit, 66-bit serial locking architecture, and register-intensive block synchronization method. The standard deviation of locking time is only 4.7, indicating a more stable synchronization process. In terms of resource utilization, the design consumes 509 Look-Up Tables (LUTs) and only 2.0 mW, much lower than the 3,503 LUTs and 94.1 mW required by the register-intensive scheme. Its energy efficiency reaches 0.03 mW/bit, which is better than that of each of the three conventional methods. Compared with the parallel search circuit, the average locking time is reduced by 6.11%, power consumption is reduced by 50.3%, and energy efficiency is improved by 53.8%. Therefore, the proposed JESD204C receiver link shows advantages in SH detection speed, stability, power consumption, and energy efficiency.  Conclusions  An ISS-based SH detection algorithm is proposed for the JESD204C receiver. By screening the data stream in parallel through multilevel XOR logic, dynamically compressing the search space, and efficiently eliminating non-inverting candidate positions, the algorithm converges to the true SH position. This approach improves the conventional serial detection mechanism. The design is verified on the Xilinx KC705 FPGA platform. A Pseudorandom Binary Sequence 31 (PRBS31) is used to emulate the random distribution of polarity transitions, and a high-speed SubMiniature version A (SMA) cable is used for data loopback transmission. 
The results show that the algorithm achieves an average locking time of only 24.4 clock cycles, with a standard deviation as low as 4.7. High robustness is maintained for the SH at any position within the 66-bit block, and the energy efficiency reaches 0.03 mW/bit. The algorithm is superior to existing typical schemes in locking speed, delay stability, and energy efficiency. It provides a low-latency, reliable, and energy-efficient synchronization initialization approach for high-speed SerDes links.
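Illustrative sketch (not taken from the paper): the screening idea described in this abstract can be modeled in software as follows. The sketch assumes a 66-bit block whose 2-bit SH is always 01 or 10 and a payload that behaves like random bits. The names make_stream, screen, and locate_sync_header are hypothetical, and the loop processes one block per iteration rather than reproducing the authors' circuit.

```python
import random

BLOCK_BITS = 66  # one block: 2-bit SH followed by 64 payload bits


def make_stream(offset, n_blocks, rng):
    """Hypothetical test helper: random bits with a valid SH every 66 bits."""
    bits = [rng.randint(0, 1) for _ in range(offset)]
    for _ in range(n_blocks):
        first = rng.randint(0, 1)
        bits += [first, 1 - first]  # the SH always inverts: 01 or 10
        bits += [rng.randint(0, 1) for _ in range(BLOCK_BITS - 2)]
    return bits


def screen(candidates, window):
    """Keep only offsets whose adjacent bit pair inverts (XOR equals 1)."""
    return {p for p in candidates if window[p] ^ window[p + 1]}


def locate_sync_header(stream, max_blocks=64):
    """Return (offset, blocks_used), or (None, blocks_used) if not converged."""
    candidates = set(range(BLOCK_BITS))  # every offset starts as a hypothesis
    used = 0
    for n in range(max_blocks):
        # Take 67 bits so a pair straddling the block boundary is still checked.
        start = n * BLOCK_BITS
        window = stream[start:start + BLOCK_BITS + 1]
        if len(window) < BLOCK_BITS + 1:
            break
        used = n + 1
        candidates = screen(candidates, window)
        if len(candidates) == 1:
            return next(iter(candidates)), used
    return None, used


offset, blocks = locate_sync_header(make_stream(17, 80, random.Random(0)))
```

Under these assumptions, each false candidate survives a block with probability about 1/2, so the candidate set shrinks roughly geometrically while the true offset is never eliminated.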
Physiological Signal-driven QoE Optimization for Wireless Virtual Reality Transmission
WU Chang, PENG Mingyu, CHEN Yuang, CHEN Yiyuan, GUO Fengqian, QIN Xiaowei, LU Hancheng
Available online, doi: 10.11999/JEIT260067
Abstract:
  Objective  Virtual Reality (VR) has become a transformative medium for immersive digital experiences because it can deliver high-resolution 360° video with ultra-low Motion-To-Photon (MTP) latency. However, its dependence on wireless transmission creates major challenges. Uncompressed data rates above 1 Gbit/(s·Hz) and latency thresholds below 20 ms place stringent demands on network infrastructure. In mobile scenarios, channel fluctuation and user mobility often compromise service continuity and cause abrupt resolution changes. Traditional Quality of Service (QoS) metrics, such as bandwidth, jitter, and packet loss, provide useful network-level information but cannot adequately reflect subjective user satisfaction. Existing Quality of Experience (QoE) models and Adaptive BitRate (ABR) algorithms often use symmetric metrics, such as Mean Opinion Score (MOS), and overlook the fact that users perceive quality deterioration and quality improvement differently. Sudden resolution downgrading has a stronger negative effect on immersion than the positive effect caused by resolution upgrading. This perceptual asymmetry is consistent with behavioral psychology but remains insufficiently addressed in current transmission schemes. In addition, the separation between Radio Access Network (RAN) resource provisioning and application-layer bitrate adaptation often causes mismatched optimization, video-quality oscillation, and resource underuse. To address these issues, this study establishes a quantitative link between physiological responses and resolution changes. It further develops a physiological signal-driven QoE framework integrated with Deep Reinforcement Learning (DRL) to support adaptive transmission, maximize immersion, and reduce the adverse effects of resolution fluctuation in resource-constrained wireless networks.  Methods  A two-stage method is adopted, including physiological signal analysis and joint optimization framework design. 
A controlled VR experiment is conducted to quantify the perceptual effect of resolution changes. Nineteen healthy subjects participate in a viewing task using an eye-tracking VR headset, a 32-channel wireless ElectroEncephaloGraphy (EEG) system, ElectroCardioGraphy (ECG) recording, and Galvanic Skin Response (GSR) sensors. The subjects view natural-scene videos in which the resolution levels, including 8K, 4K, 1080P, 720P, and 480P, switch randomly every 8 s. The collected EEG signals are preprocessed by independent component analysis and band-pass filtering. Event-Related Potential (ERP) components are analyzed, with emphasis on the N200 component in the temporal and occipital regions, which reflects visual processing and attention allocation. A Linear Discriminant Analysis (LDA) classifier is used to distinguish different response types. The analysis focuses on the asymmetry between resolution upgrading and downgrading, and on sensitivity to the magnitude of resolution jumps. Based on these physiological findings, a QoE model is formulated by adding penalty terms for resolution degradation and large-amplitude resolution switching. These penalties are weighted more strongly than upgrade rewards to represent user aversion to quality drops. The model is then integrated into an edge-computing environment through a dual-timescale DRL framework. The framework separates control into two cooperative agents: the Scheduling and Utility (SU) agent and the Resolution Scaling (RS) agent. The SU agent operates at the millisecond timescale and performs real-time wireless resource allocation. It uses a Gated Recurrent Unit (GRU) to extract temporal features from Channel State Information (CSI) and transmission history. It then dynamically allocates bandwidth to improve frame delivery success and maintain fairness under VR frame-deadline constraints. The RS agent operates at the frame timescale and determines the resolution of subsequent video frames. 
Its decision-making is guided by the physiological signal-driven reward function, which penalizes actions that may trigger negative physiological responses, such as sharp resolution drops, unless channel deterioration makes them necessary. Proximal Policy Optimization (PPO) is selected for both agents because of its stable learning behavior in continuous and discrete action spaces. Simulations are conducted using a 3GPP-based wireless channel module with user mobility, shadow fading, and path loss to create a dynamic network environment.  Results and Discussions  The physiological experiment and network simulations validate the proposed framework. In the physiological analysis, a clear N200 response is observed approximately 200 ms after resolution changes. The N200 amplitude is significantly larger during resolution downgrading than during resolution upgrading (p < 0.001), indicating that users are more sensitive to quality deterioration. Large resolution jumps, such as changes from 8K to 1080P, also induce stronger neural responses and more concentrated occipital energy than minor adjustments. The LDA classifier achieves an average Area Under the Curve (AUC) of 74.12% across 19 subjects, confirming that neural responses contain discriminative information about the direction of resolution change. The GSR results support these findings. A dual-branch GSR feature extraction and classification model reaches an average AUC of 78.10% in distinguishing upward and downward switching events. By contrast, ECG signals do not show a stable effect under the current experimental setting and analysis granularity. Therefore, the subsequent QoE model is mainly constructed from EEG and GSR findings. In the network performance evaluation, the proposed physiological signal-driven DRL framework is compared with several baselines, including Proportional-Fair (PF) scheduling, equal resource allocation, and traditional congestion control represented by SCReAM. 
The training curves show that the dual-agent system converges and learns to coordinate capacity provisioning with resolution decisions. The SU agent smooths short-term channel fluctuation and provides a stable capacity basis, which enables the RS agent to make more reliable resolution decisions. Quantitative results show that the proposed scheme improves the average video resolution by up to 88.7% compared with the equal-resource baseline. More critically, the resolution switching frequency is reduced by up to 81.0%. This reduction is essential because frequent switching, especially downward switching, causes user discomfort, as demonstrated by the physiological analysis. By prioritizing long-term resolution stability and penalizing abrupt drops through the physiological signal-driven reward function, the proposed system reduces the “ping-pong” effect commonly observed in traditional ABR algorithms. Compared with schemes using different penalty weights, the proposed method achieves a better balance. It avoids overly conservative behavior under large penalties, which lowers the average resolution, and unstable visual quality under small penalties, which increases resolution fluctuation. The joint optimization also allocates resources preferentially to users with urgent frame deadlines or higher risks of perceptible quality degradation, while maintaining a frame delivery success rate above 99%.  Conclusions  This paper addresses the conflict between wireless-channel instability and the human need for visually consistent VR streaming. By adopting a physiological signal-driven approach, the asymmetric effect of resolution changes on user experience is quantified, which challenges the symmetric assumptions used in traditional QoE models. Integrating this physiological evidence into a dual-timescale DRL framework enables the RAN to go beyond throughput-oriented optimization. 
Wireless resource allocation supports stable application-layer adaptation, while application-layer demands guide resource scheduling. The proposed solution improves immersive experience by increasing average resolution and reducing the physiologically disruptive effects of sudden quality degradation. The reduction in resolution switching frequency by more than 80% shows that the system can shield users from network variability. This study also indicates the value of edge intelligence in making resource-allocation decisions based on human perception rather than network statistics alone. Future work should extend the QoE model by considering multisensory factors, such as MTP latency, cybersickness, spatial distortion, stalling, and audiovisual synchronization. Individual differences in physiological sensitivity should also be addressed through personalized modeling. For real-world deployment, privacy protection is essential. Federated learning and local edge updates may allow biometric data to be processed locally while supporting global policy optimization. This work provides a human-centric basis for immersive networking and shifts the focus from QoS to physiologically validated QoE.
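Illustrative sketch (not taken from the paper): one way to express an asymmetric switching term in which resolution drops and large jumps are weighted more strongly than upgrades. The weights, the level indexing (0 for 480P up to 4 for 8K), and the names switch_term and episode_qoe are assumptions chosen for illustration. The abstract does not state the authors' actual reward function.

```python
def switch_term(delta, w_up=0.5, w_down=2.0, w_jump=1.0):
    """Reward contribution of a resolution change of `delta` levels.

    Assumed shape: a small reward for upgrades, a larger penalty for
    downgrades, and an extra penalty once the jump exceeds one level.
    """
    jump_penalty = w_jump * max(0, abs(delta) - 1)
    if delta > 0:
        return w_up * delta - jump_penalty
    if delta < 0:
        return -w_down * (-delta) - jump_penalty
    return 0.0


def episode_qoe(levels, w_quality=1.0):
    """Sum of per-frame quality terms plus switching terms over a sequence."""
    quality = sum(w_quality * level for level in levels)
    switching = sum(switch_term(b - a) for a, b in zip(levels, levels[1:]))
    return quality + switching


stable = episode_qoe([2, 2, 2, 2])      # constant mid-level resolution
ping_pong = episode_qoe([3, 1, 3, 1])   # same average level, frequent switching
```

With these assumed weights, the two sequences have the same average level, but the oscillating one scores lower because each drop costs more than the following upgrade returns.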
A General Evaluation Framework for Mission Planning Algorithms for Remote Sensing Satellite Constellations
LI Jinfei, YU Xiaogang, TIAN Jing, HE Haochen, XING Xiangwei, ZHANG Xiaohan
Available online, doi: 10.11999/JEIT260335
Abstract:
  Objective  The rapid growth in remote sensing satellite constellations has shifted mission planning from single-satellite static scheduling to large-scale dynamic coordination across heterogeneous constellations. However, evaluation methods have not kept pace with algorithm development. Existing studies often rely on private datasets, simplified metrics centered on Completion Rate, and idealized simulations that ignore realistic constraints, such as attitude maneuvers, illumination conditions, and dynamic task insertion. These limitations prevent fair cross-paper comparison and slow engineering application. To address this gap, this paper proposes the Remote Sensing Constellation Mission Planning Benchmark (RSCMP-Bench), a general, open, and reproducible evaluation framework. It is designed as a unified benchmark for the community, similar to ImageNet in computer vision and General Language Understanding Evaluation (GLUE) in Natural Language Processing (NLP).  Methods  RSCMP-Bench consists of three components. First, the multi-scenario standard task library contains 300 standardized scenarios at three difficulty levels: Low, Medium, and High, with 100 scenarios per level. Satellite numbers range from 30 to 200, and task demands range from 56 to 560. All scenarios are generated from public Two-Line Element (TLE) data and explicitly model realistic constraints. Optical satellites require a minimum solar elevation angle, and Synthetic Aperture Radar (SAR) satellites require incidence angles within specified ranges. General constraints, such as per-orbit maximum on-time, minimum single-operation on-time, attitude maneuver time, and valid execution windows, are also modeled. The scenarios include point tasks, area tasks, static tasks, and dynamically inserted tasks. Second, the multi-dimensional effectiveness evaluation system includes a Basic Performance layer and a Dynamic Adaptability layer. 
The Basic Performance layer uses Completion Rate, Weighted Completion Rate, Average Response Delay, and Time Utilization. The Dynamic Adaptability layer uses multi-stage rolling evaluation with random dynamic task insertion. The Dynamic Adaptability Score measures the post-insertion Completion Rate relative to the baseline, and Dynamic Response Efficiency measures the performance gain per unit replanning time. A composite RSCMP-Bench Score is also provided. Third, the simulation and evaluation platform uses a client-server architecture. It integrates a Simplified General Perturbations 4 (SGP4) propagator, algorithm adapters, two-stage constraint verification, an intelligent scenario generator, and visualization tools. The platform has been deployed at https://www.tianzhibei.com and has supported a national competition with more than 80 research teams.  Results and Discussions  Baseline experiments comparing Random Scheduler and Priority Greedy validate the feasibility, reproducibility, and discriminative capacity of RSCMP-Bench. Random Scheduler yields very low Completion Rates of 7.3%, 3.8%, and 1.9% on the Low, Medium, and High levels, respectively. These results confirm the extreme sparsity of the feasible solution space. Priority Greedy achieves higher Completion Rates but still degrades as scenario difficulty increases, decreasing from 76.1% at the Low level to 63.7% at the Medium level and 49.2% at the High level. These findings indicate that high-difficulty scenarios remain challenging even for reasonable heuristic methods. They also show considerable room for more advanced algorithms. The dynamic adaptability protocol quantifies algorithm robustness under unexpected dynamic task insertion, which is not captured by static evaluations. The two-stage constraint verification module rejects infeasible plans and generates detailed error reports to support debugging.  
Conclusions   RSCMP-Bench provides a unified, fair, and reproducible benchmark for remote sensing constellation mission planning. By combining a public library of 300 standardized scenarios, a multi-dimensional effectiveness evaluation system based on Basic Performance and Dynamic Adaptability, and a simulation and evaluation platform with realistic constraints and automated scenario generation, the framework addresses the long-standing lack of standardized evaluation in this field. Baseline results confirm its discriminative capacity and reveal clear performance bottlenecks in large-scale dynamic scenarios. Inspired by ImageNet and GLUE, RSCMP-Bench can support systematic community evaluation and fair competition. The framework has been deployed at https://www.tianzhibei.com, and its adoption can accelerate progress in intelligent mission planning for next-generation remote sensing constellations.
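Illustrative sketch (not taken from the paper): a minimal reading of three of the metrics named in this abstract. Completion Rate and Weighted Completion Rate are computed in the usual way. The Dynamic Adaptability Score is shown as a plain ratio of post-insertion to baseline Completion Rate, which is an assumption because the abstract does not give the formula. The Task type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    priority: float   # assumed task weight
    completed: bool   # whether the plan scheduled the task feasibly


def completion_rate(tasks):
    """Fraction of tasks completed."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    return sum(t.completed for t in tasks) / len(tasks)


def weighted_completion_rate(tasks):
    """Priority-weighted fraction of tasks completed."""
    total = sum(t.priority for t in tasks)
    if total <= 0:
        raise ValueError("total priority must be positive")
    return sum(t.priority for t in tasks if t.completed) / total


def dynamic_adaptability_score(cr_after, cr_baseline):
    """Assumed form: post-insertion Completion Rate relative to the baseline."""
    if cr_baseline <= 0:
        raise ValueError("baseline completion rate must be positive")
    return cr_after / cr_baseline


plan = [Task(3, True), Task(1, False), Task(1, True), Task(5, False)]
cr = completion_rate(plan)             # 2 of 4 tasks completed
wcr = weighted_completion_rate(plan)   # weight 4 completed out of 10
```

The gap between the two rates in this toy plan shows why the weighted metric is reported separately: completing half of the tasks does not imply completing half of the priority.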
Survey on Intelligent Semantic Covert Communication
FENG Zhaoxin, XU Yifan, XING Chengwen, XU Yuhua, ZHAO Nan, WANG Jinlong
Available online, doi: 10.11999/JEIT260184
Abstract:
  Significance   As the Sixth-Generation mobile communication network (6G) evolves from the Internet of Everything to the Intelligent Internet of Everything, the communication paradigm is shifting from reliable bit transmission to effective semantic transmission. Semantic communication extracts and compresses task-related semantics to reduce redundancy and resource use. However, because semantic information is highly structured and task-specific, it is vulnerable to eavesdropping, inference, and attacks. Covert communication addresses this risk by hiding transmission behavior from unauthorized monitoring. With support from Artificial Intelligence (AI), covert communication can use reinforcement learning to adjust power and resource allocation in dynamic environments. Generative models can also conceal transmitted signals by learning and reproducing environmental patterns. However, strict covertness constraints limit the achievable transmission rate and make large-scale information transmission difficult. Intelligent semantic covert communication integrates semantic extraction with covert transmission, providing a reliable approach to secure and efficient 6G communications.  Progress   With the development of AI, especially deep learning for complex feature modeling, semantic communication can support efficient semantic extraction and nonlinear compression of multimodal data. Research on semantic communication has also shifted from Separate Source-Channel Coding (SSCC) to Joint Source-Channel Coding (JSCC), which supports end-to-end training and improved transmission performance. For image transmission, Convolutional Neural Networks (CNNs) use local receptive fields to capture spatial correlations. For sequential data transmission, Long Short-Term Memory (LSTM) networks use gating mechanisms to maintain temporal coherence. 
In covert communication, Generative Adversarial Networks (GANs) and diffusion models can learn the statistical patterns of environmental noise in the time, frequency, and spatial domains, thereby concealing transmitted signals. These methods reduce the effectiveness of unauthorized monitoring and detection, and improve system adaptability in dynamic environments. AI also improves autonomous decision-making in dynamic covert communication. By modeling covert transmission as a Markov Decision Process (MDP), Deep Reinforcement Learning (DRL) can learn resource allocation strategies through interaction with the environment. This approach reduces computational complexity compared with traditional convex optimization methods. By integrating semantic extraction and covert transmission, intelligent semantic covert communication further supports semantic-driven covert transmission. Large Language Models (LLMs) can evaluate semantic sensitivity and contextual risks, enabling selective covert transmission of sensitive semantic information.  Conclusions  Research on intelligent semantic covert communication shows the advantages of coordinated semantic perception and physical-layer covert mechanisms. AI improves semantic extraction efficiency and strengthens adaptation to dynamic and complex environments. By integrating semantic understanding with covert transmission strategies, intelligent semantic covert communication supports both efficiency and security for ubiquitous 6G services.  Prospects   Future research on intelligent semantic covert communication should address several key challenges, including AI-enabled detection, unified semantic metrics, lightweight model design, multimodal semantic alignment, system interpretability, and semantic hallucination. Active threat detection and adaptive defense strategies are needed to counter AI-driven surveillance. 
Causal reasoning in Large Multimodal Models (LMMs) can help mitigate semantic hallucination and improve data transmission reliability. Advances in model compression and cloud-edge collaboration are also needed to deploy high-complexity AI models on resource-limited terminals. With the rapid development of AI, intelligent semantic covert communication is expected to provide core support for intelligent connectivity of everything and help build more secure, efficient, and reliable 6G networks.
DeepION Model Evaluation for SPP Navigation Performance During Solar-active Periods
WANG Zitong, FU Haiyang, JIANG Zhuojun, CAI Dijia
Available online, doi: 10.11999/JEIT250662
Abstract:
  Objective  Accurate characterization of ionospheric variability is essential for reliable Global Navigation Satellite System (GNSS) positioning, especially during geomagnetic storms, when rapid and highly structured disturbances occur. Existing empirical and physics-based ionospheric models often have limited ability to represent storm-time ionospheric dynamics and small-scale irregularities in real time. This study develops a unified data-driven ionospheric modeling framework that uses GNSS-derived Slant Total Electron Content (STEC) time series as input and learns spatiotemporal mappings to key ionospheric parameters, including Vertical Total Electron Content (VTEC) and the Rate Of TEC Index (ROTI). By using deep operator learning, the proposed framework improves short-term ionospheric modeling and forecasting under disturbed conditions and provides more reliable ionospheric corrections for single-frequency positioning.  Methods  A unified data-driven ionospheric modeling framework, named DeepION, is proposed based on the Deep Operator Network (DeepONet) architecture. The framework uses STEC time series as the primary input and learns nonlinear spatiotemporal mappings to key ionospheric parameters. DeepION models and predicts STEC and VTEC, whereas ROTI is derived from the predicted STEC series. In the network design, a Convolutional Neural Network (CNN) is used as the branch network to extract spatiotemporal features from historical STEC time series. The trunk network uses a multilayer fully connected structure with periodic time encoding. Its inputs include GNSS observation geometry and temporal information, which allows the model to capture the continuous temporal dynamics of ionospheric behavior. During data preprocessing, a VTEC-based modeling strategy is first used to estimate and remove receiver Differential Code Bias (DCB), thereby providing high-quality STEC observations. 
The model is then trained and validated using GNSS observations collected during the May 2024 geomagnetic storm. Its outputs include ray-path STEC values, gridded VTEC fields, and derived ROTI time series. The proposed framework is further evaluated by incorporating model-derived VTEC corrections into Single Point Positioning (SPP) experiments. Modeled and observed ionospheric parameters are compared under both geomagnetically quiet and disturbed conditions to assess the modeling accuracy and practical performance of DeepION.  Results and Discussions  The experimental results show that DeepION robustly characterizes ionospheric spatiotemporal variability under different space weather conditions. It captures both large-scale structures and small-scale disturbances during geomagnetic storms. For STEC forecasting, the model achieves a Root Mean Square Error (RMSE) of 12.82 TECU over a 3-day prediction horizon and maintains high consistency with GNSS observations (Fig. 4). The model also predicts ionospheric irregularities accurately, as indicated by the close agreement between predicted and observed ROTI time series at the mid-latitude NVSK station (Fig. 5). For VTEC modeling, DeepION-generated global VTEC maps reproduce equatorial anomalies and storm-enhanced density regions. These maps closely match the Center for Orbit Determination in Europe Spherical Harmonic (CODE-SH) model and outperform the Klobuchar and NeQuick empirical models in spatial resolution and structural fidelity (Fig. 6). Further ray-path-level analysis shows that STEC derived from DeepION-based VTEC mapping yields the lowest residual error at the mid-to-high-latitude NLIB station. It achieves an RMSE of 6.80 TECU, outperforming Klobuchar and NeQuick and slightly improving on CODE-SH (Fig. 7). In GNSS positioning applications, the SPP results show that DeepION-derived ionospheric corrections consistently reduce positioning errors at both the CUSV and NLIB stations. 
The improvement is especially clear in the vertical and geometric components during storm-time conditions, indicating stronger robustness under intensified geomagnetic disturbances (Fig. 8, Fig. 9).  Conclusions  This study presents DeepION, a data-driven ionospheric modeling framework based on the DeepONet architecture. The framework learns spatiotemporal relationships between GNSS-derived STEC observations and key ionospheric parameters. With a CNN-based branch network and a periodically encoded trunk network, DeepION models and predicts STEC and VTEC, and then derives ROTI from the predicted STEC series. Experiments using global GNSS data during the May 2024 geomagnetic storm show that DeepION captures storm-time ionospheric variability and achieves stable performance in STEC forecasting and global VTEC reconstruction. Compared with conventional empirical and physics-based models, DeepION improves modeling accuracy and spatial representation. SPP experiments further show that ionospheric corrections derived from DeepION reduce positioning errors at both mid- and high-latitude stations, especially in the vertical and geometric components under disturbed geomagnetic conditions. These results indicate the practical value of DeepION for GNSS ionospheric correction during space weather events. Overall, DeepION provides a scalable framework for data-driven ionospheric modeling. Future work will extend it to multi-GNSS constellations, longer prediction lead times, and additional ionospheric observations.
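Illustrative sketch (not taken from the paper): ROTI is commonly defined as the standard deviation of the rate of TEC change (ROT, in TECU/min) over a short window. The code below applies that common definition to a STEC series from a single continuous arc. The 30 s sampling interval, the 5 min window, and the function names are assumptions, and cycle-slip handling is omitted.

```python
from statistics import pstdev


def rot(stec, dt_seconds):
    """Rate of TEC change in TECU/min between consecutive samples."""
    minutes = dt_seconds / 60.0
    return [(b - a) / minutes for a, b in zip(stec, stec[1:])]


def roti(stec, dt_seconds=30.0, window_seconds=300.0):
    """ROTI per non-overlapping window: sqrt(<ROT^2> - <ROT>^2)."""
    rates = rot(stec, dt_seconds)
    n = int(window_seconds // dt_seconds)  # ROT samples per window
    return [pstdev(rates[i:i + n]) for i in range(0, len(rates) - n + 1, n)]


smooth_arc = [10.0 + 0.5 * i for i in range(21)]   # steady TEC ramp
disturbed_arc = [float(i % 2) for i in range(11)]  # TEC alternating by 1 TECU
quiet = roti(smooth_arc)
active = roti(disturbed_arc)
```

A steady ramp gives a constant ROT and therefore a ROTI of zero, whereas rapid fluctuation raises ROTI. This is why the index is used as an indicator of ionospheric irregularities.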
A Point Cloud Slice-based UAV SLAM Method for 3D Reconstruction of Large Container Port Areas
HU Zhaozheng, ZUO Zhihang, XU Cong, TAO Qianwen, LIU Chao, MENG Jie
Available online, doi: 10.11999/JEIT251112
Abstract:
  Objective  With the continuous development of port intelligence, the demand for digital management in container port areas has increased. In large container yards, Three-Dimensional (3D) reconstruction of the yard environment can be achieved using Unmanned Aerial Vehicle (UAV)-based Simultaneous Localization And Mapping (SLAM). However, container port areas contain many repetitive semantic structures. Traditional semantic matching methods therefore show low efficiency and limited accuracy. In addition, lanes between container yards form large feature-sparse regions during UAV-based 3D reconstruction, which can cause odometry degradation. Repetitive scene features also interfere with loop closure detection. To address these problems, this paper proposes a rapid feature extraction method based on point cloud slicing and further optimizes it according to the structural characteristics of container yards. A UAV point cloud slice-based SLAM method, termed Slice-SLAM, is proposed for high-precision 3D reconstruction of large container port areas.  Methods  To improve point cloud semantic extraction, a rapid point cloud slicing method is proposed. The principal direction is extracted rapidly, and the point cloud is divided into multiple layers to obtain multi-layer semantic point clouds efficiently. The slicing strategy is further optimized for container yard scenarios. Principal plane extraction is simplified using the gravity direction, and the elevation range of each container layer is obtained adaptively from point cloud density gradient changes. Multi-layer slice point clouds are then constructed. A progressive adaptive Light Detection And Ranging (LiDAR) odometry method based on slice point clouds is developed. Elevation slices are used to identify degenerate scenarios adaptively, and a layer-wise incremental slice matching and fusion strategy is used. This improves the accuracy, efficiency, and stability of LiDAR odometry. 
In addition, a factor graph optimization method that integrates slice point cloud information is designed. Fusion voting is performed on the matching results of multi-layer slice point clouds to remove erroneous matches and reduce the effect of repetitive structures on loop closure detection. Slice factors are then used to construct factor graph edges, which improves global optimization and supports efficient and stable 3D reconstruction.  Results and Discussions  The feasibility and effectiveness of the proposed method are verified in CARLA simulation scenarios and real-world tests at a large container port in Wuhan. First, comparisons with three semantic extraction algorithms, namely RANSAC, Region Growth, and 3DG_SEG, demonstrate the efficiency and accuracy of the proposed semantic extraction method. Second, estimated trajectories are compared with those obtained by two open-source LiDAR algorithms, FAST-LIO2 and Faster-LIO, confirming the advantages of the proposed odometry method. Finally, speed and confidence score are compared with those of six algorithms: ICP, NDT, GICP, Fast-GICP, Scan Context+ICP, and Quatro. The loop closure detection module of LIO-SAM is also integrated into FAST-LIO2, and the Scan Context module is integrated into Faster-LIO. The resulting estimated trajectories are compared with those of the proposed method, verifying the effectiveness of the proposed loop closure detection algorithm. The proposed method achieves high 3D reconstruction accuracy and is suitable for practical port operations.  Conclusions  The proposed method uses an efficient point cloud slicing technique and a multi-layer slice matching mechanism. Points within the same elevation range are defined as a slice point cloud, and the segmentation process is defined as point cloud slicing. This design enables efficient and robust 3D reconstruction in large-scale scenes with repetitive features. 
First, the LiDAR point cloud is aligned with the positive Z-axis using the gravity direction derived from the Inertial Measurement Unit (IMU). A sliding window records density gradient changes to determine the elevation range of each layer adaptively. This simplifies point cloud slicing and reduces the effects of non-standard containers and ground height variations on semantic extraction. Multi-layer slice information is then integrated into the odometry module to detect degenerate scenarios. Under normal conditions, progressive slice matching is used to initialize pose estimation. In degenerate scenarios, iterative Kalman filtering with increased IMU weighting is used. Finally, the fusion voting mechanism removes outliers from multi-layer slice matching results. The optimal match is used to initialize loop closure for global registration of container-region point clouds, enabling dual-stage loop closure detection and slice factor construction. By integrating slice point cloud information into factor graph optimization, the proposed method unifies point clouds in a common coordinate system and achieves efficient and robust 3D reconstruction.
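The adaptive elevation slicing described in this abstract can be illustrated with a minimal sketch. It assumes the cloud is already gravity-aligned, that horizontal surfaces (ground, container tops) appear as density peaks along Z, and that a peak is a point where the density gradient turns from positive to non-positive. The function names, the 0.5 m bin size, and the minimum count are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def layer_peaks(z_values, bin_size=0.5, min_count=5):
    """Return elevations (lower bin edges) where point density peaks along Z."""
    bins = Counter(math.floor(z / bin_size) for z in z_values)
    lo, hi = min(bins), max(bins)
    # Pad one empty bin on each side so peaks at the extremes are detected.
    density = [bins.get(b, 0) for b in range(lo - 1, hi + 2)]
    # Density gradient between neighbouring elevation bins.
    grad = [density[i + 1] - density[i] for i in range(len(density) - 1)]
    peaks = []
    for i in range(1, len(density) - 1):
        rising_then_falling = grad[i - 1] > 0 and grad[i] <= 0
        if rising_then_falling and density[i] >= min_count:
            peaks.append((lo - 1 + i) * bin_size)
    return peaks


def layer_ranges(peaks):
    """Pair consecutive peaks into (bottom, top) elevation ranges, one per layer."""
    return list(zip(peaks[:-1], peaks[1:]))
```

Points whose elevation falls inside one returned range would form one slice point cloud. A real pipeline would also need smoothing and handling of non-standard container heights, which this sketch omits.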
Power Side-channel Leakage Assessment and Chosen-ciphertext Attack on the Decoding Function of Kyber
QIU Yubo, LI Ziqi, YUAN Chaoxuan, ZHOU Zijian, HU Wandi, HU Wei
Available online  , doi: 10.11999/JEIT251243
Abstract:
  Objective   The standardization of Post-Quantum Cryptography (PQC) has made the implementation security of Kyber a practical concern. Kyber, standardized as Module-Lattice-based Key-Encapsulation Mechanism (ML-KEM), is a lattice-based scheme with favorable efficiency and security based on the hardness of the Module Learning With Errors (MLWE) problem. However, its deployment on embedded devices can still produce measurable physical leakage. Existing studies have shown that side-channel attacks can target several Kyber modules, but two issues remain insufficiently studied. First, the leakage strengths of different auxiliary functions on the decapsulation and re-encryption path have not been compared under a unified assessment framework. This limits the identification of the most vulnerable implementation-level weak point. Second, although chosen-ciphertext attacks and power analysis have been studied, the decoding function poly_frommsg() has not been fully examined from the perspective of periodic leakage modeling and low-query key recovery. To address these issues, this work evaluates function-level leakage in the key operations of Kyber decapsulation and develops a chosen-ciphertext Simple Power Analysis (SPA) attack against the most vulnerable decoding function. The study provides a practical attack method and implementation-oriented security insights for protecting post-quantum cryptographic software on embedded platforms.  Methods   A function-oriented evaluation-and-attack framework is established for the execution path of Kyber.CCAKEM.Dec(). Four representative target functions are selected: the Barrett reduction function poly_reduce(), the encoding function poly_tomsg(), the decoding function poly_frommsg(), and the hash function G(). For each function, the intermediate variable with the largest data-dependent bit transition under crafted ciphertext inputs is first analyzed from the perspective of Hamming-distance leakage. 
Two ciphertext sets are then constructed so that the selected intermediate variable takes two maximally distinguishable values. For each set, 50 power traces are collected. The experiments are performed on an STM32F407IG embedded platform, and power signals are captured using a PicoScope 6406E oscilloscope at a sampling rate of 5 GS/s. Welch’s t-test-based Test Vector Leakage Assessment (TVLA) is used to quantify leakage significance, with ±4.5 used as the decision threshold for leakage detection. After poly_frommsg() is identified as the most vulnerable point, a chosen-ciphertext SPA attack is designed. The attack first constructs ciphertexts according to the coefficient range of the secret polynomial. It then extracts 256 Points of Interest (PoIs) from reference traces through local-maximum search. Finally, a grouped threshold model is built according to the periodic energy structure of the PoIs. The recovered message bits are mapped back to the coefficients of the secret polynomial, enabling full private-key reconstruction for Kyber512 and Kyber768.  Results and Discussions   The leakage assessment shows clear differences among the four target functions. For poly_reduce(), the intermediate variable t directly depends on the coefficients of the intermediate polynomial mp, and the maximum Hamming distance reaches 13. The measured TVLA peaks are therefore concentrated around 50 for both Kyber512 and Kyber768 (Fig. 5). For poly_tomsg(), the relevant binary transition corresponds to a Hamming distance of only 1, and the observed TVLA values are much smaller, at approximately 6 (Fig. 6). For poly_frommsg(), the message-dependent mask flips between 0 and 0xffff, producing a Hamming distance of 16 and the strongest leakage among all tested functions. The TVLA peaks reach about 60, identifying this module as the primary attack target (Fig. 7). For the hash function G(), the leakage is weaker and less regular, but several sampling points still exceed the TVLA threshold. 
This result indicates that theoretical indistinguishability under chosen-ciphertext attack (IND-CCA) reinforcement through the Fujisaki-Okamoto (FO) transform does not automatically remove physical leakage (Fig. 8). These results show that implementation-level vulnerability is strongly associated with data-dependent bit transitions. They also show that linear message-expansion functions may expose more stable power signatures than some arithmetic modules. Based on this observation, the proposed attack focuses on poly_frommsg(). Local-extrema analysis shows that the 256 message-bit operations generate 256 stable PoIs. Their energy values show a periodic pattern with an approximate period length of 8 (Fig. 10, Fig. 11). Instead of applying a single global threshold to all PoIs, the proposed grouped threshold model divides the PoIs according to their positions within the period and computes location-aware thresholds. This design suppresses position-dependent drift and improves the consistency of bit decisions. The resulting message-recovery procedure reliably reconstructs the bit sequence from one attack trace under each chosen ciphertext. Combined with the precomputed ciphertext table, only 6 chosen ciphertexts are required to recover the private key of Kyber512, and only 9 chosen ciphertexts are required for Kyber768. Compared with the prior poly_frommsg()-based method, which requires 8 and 12 ciphertexts, respectively, the proposed method reduces the ciphertext requirement by 25.0% while maintaining a 100% success rate (Table 4). Compared with the attack on poly_tomsg(), the proposed method exploits a function with stronger leakage observability and therefore achieves higher decision stability and equal or better overall efficiency. The periodic PoI model is thus not only an empirical observation, but also a direct basis for the attack design and a key reason for the practical gain in low-query key recovery.  
Conclusions  This work shows that Kyber contains different implementation-level vulnerabilities along its decapsulation path and that poly_frommsg() is the most critical leakage point in the tested software implementation. By combining function-level TVLA assessment with a chosen-ciphertext SPA attack, the study identifies leakage sources in poly_reduce(), poly_tomsg(), poly_frommsg(), and G(). It also converts the observed periodic leakage structure of poly_frommsg() into an effective grouped threshold model for key recovery. The resulting attack reduces the number of required ciphertexts for Kyber512 and Kyber768 to 6 and 9, respectively, while preserving a 100% success rate. These findings indicate that practical protection of post-quantum software should go beyond algorithm-level security claims. Masking, execution randomization, balanced implementations, and function-level leakage testing should be considered explicitly during deployment and validation.
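Two steps of this abstract lend themselves to a short sketch: the Welch's t-test behind TVLA with the ±4.5 threshold, and a position-aware threshold for PoIs with a period of 8. The sketch assumes that reference PoI energies for all-zero and all-one messages are available and that a 1 bit yields the higher energy. Both are illustrative assumptions, and all function names are hypothetical rather than the authors' implementation.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    diff = mean(a) - mean(b)
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom


def tvla_leaky_points(traces_a, traces_b, threshold=4.5):
    """Indices of sample points where |t| exceeds the TVLA threshold."""
    leaky = []
    for i in range(len(traces_a[0])):
        col_a = [trace[i] for trace in traces_a]
        col_b = [trace[i] for trace in traces_b]
        if abs(welch_t(col_a, col_b)) > threshold:
            leaky.append(i)
    return leaky


def grouped_thresholds(ref_zero, ref_one, period=8):
    """One threshold per position within the period (midpoint of reference means)."""
    return [
        (mean(ref_zero[p::period]) + mean(ref_one[p::period])) / 2
        for p in range(period)
    ]


def decode_bits(poi_energies, thresholds):
    """Decide each bit against the threshold of its position within the period."""
    period = len(thresholds)
    return [1 if e > thresholds[i % period] else 0
            for i, e in enumerate(poi_energies)]
```

Grouping by position matters when the drift across a period is larger than the gap between a 0 and a 1. A single global threshold would then misclassify bits that the per-position thresholds still separate.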
Recent Advances in Remote Sensing Image-Text Retrieval Driven by Vision-Language Foundation Models
WU Hui, ZHAO Yan, ZHANG Peirong, HOU Yingyan, QI Xiyu, WANG Lei
Available online  , doi: 10.11999/JEIT260189
Abstract:
  Significance  Remote Sensing Image-Text Retrieval (RS-ITR) connects large-scale Earth observation imagery with natural-language queries and has become an important interface for geospatial intelligence systems. Compared with conventional content-based retrieval, RS-ITR allows users to search for scenes, objects, spatial layouts, and functional regions through semantic descriptions rather than handcrafted visual cues. This capability is increasingly needed in natural resource monitoring, urban governance, disaster response, environmental assessment, and on-demand retrieval from rapidly growing satellite archives. However, RS-ITR remains challenging. Remote sensing imagery is captured from nadir or near-nadir perspectives, shows strong rotation invariance, and contains extreme scale variation, ranging from tiny vehicles to large airports. It also requires domain-specific semantic descriptions, such as land-use attributes, spatial distributions, and geoscientific relations. Meanwhile, high-quality image-text annotations remain limited relative to the scale of remote sensing data. These properties widen the cross-modal semantic gap between images and language and limit the generalization ability of traditional cross-modal retrieval methods. Against this background, this review examines how Vision-Language Foundation Models (VLMs) reshape RS-ITR through large-scale contrastive pre-training, stronger transferable representations, and more flexible multimodal interaction mechanisms. It also explains why remote sensing adaptation is needed and why a focused synthesis of architectures, datasets, alignment mechanisms, and future directions is timely for this field.  Progress   The technical development of RS-ITR is reviewed from three complementary perspectives. 
First, this review summarizes the domain-specific challenges that shape the task, including visually isotropic topology with extreme scale variation, professional and fine-grained textual semantics, and the compounded cross-modal semantic gap between overhead imagery and natural-language descriptions (Fig. 3). The overall survey structure is then presented to show the logical progression from task formulation to future challenges (Fig. 1). From a methodological perspective, RS-ITR has evolved from handcrafted visual descriptors and shallow semantic mapping to deep representation learning, and then to VLM-driven paradigms with stronger generalization and zero-shot transfer capability (Fig. 4, Table 2). Early methods rely on color, texture, shape, and hash-based retrieval. However, they struggle to model high-level geospatial semantics and complex scene composition. Deep learning methods improve retrieval by learning joint embedding spaces, adopting dual-encoder or interaction-based architectures, and using multi-scale feature fusion and region-aware matching. These methods improve semantic consistency, but they still depend heavily on labeled data and often show limited robustness in open or cross-sensor scenarios. Second, this review summarizes the benchmark ecosystem used to evaluate these methods. Representative datasets range from small-scale test sets, such as Sydney-Caption and UCM-Caption, to mainstream benchmarks, such as RSICD and RSITMD, and recent large-scale training resources, such as RS5M and SkyScript (Table 1). These datasets show a clear transition from small manually annotated corpora to web-scale or automatically generated image-text pairs. This transition supports domain pre-training and large model adaptation. Third, this review analyzes the core VLM techniques that now drive progress in RS-ITR. 
The model spectrum and representative architecture families are systematically summarized, including contrastive dual-encoder models, multimodal interaction models, and remote sensing foundation models integrated with large language models (Fig. 5, Fig. 6, Table 3). Domain adaptation routes are further grouped into continued remote sensing pre-training, parameter-efficient transfer learning, adapter-based tuning, prompt learning, and instruction tuning. At the semantic alignment level, this review focuses on contrastive joint embedding, fine-grained multi-scale alignment, and the use of remote sensing priors, such as spatial topology and geolocation. Performance comparisons on RSICD and RSITMD show that remote sensing VLMs, especially RemoteCLIP, GeoRSCLIP, iEBAKER, and LRSCLIP, yield consistent gains in mean Recall (mR) and overall retrieval robustness (Table 4). In parallel, this review tracks the extension of retrieval capability into unified multi-task remote sensing models, in which retrieval, grounding, segmentation, and reasoning begin to share a common multimodal representation space.  Conclusions  Several conclusions are drawn from the comparative analysis. First, VLMs establish a dominant paradigm for RS-ITR because they narrow the cross-modal semantic gap and improve transferability across datasets and scenes. Second, no single architecture is universally optimal. Dual-encoder models remain attractive for large-scale retrieval because of their efficiency, whereas interaction-based or instruction-enhanced models provide finer semantic alignment at a higher computational cost. Third, domain adaptation is indispensable. Continued pre-training on remote sensing image-text corpora, parameter-efficient tuning, and prompt-based adaptation consistently outperform direct reuse of internet-trained VLMs. This finding indicates that remote sensing imagery differs too strongly from natural-image distributions for generic pre-training alone to be sufficient. 
Fourth, the most effective recent methods do not improve performance through scale alone. They also exploit remote sensing-specific information, including multi-scale structures, foreground objects, explicit keyword reasoning, and spatial priors. Finally, this review shows that the field is shifting from isolated retrieval models toward more general geospatial multimodal systems. Retrieval is no longer treated only as a matching task. It is also becoming a key capability that supports question answering, instruction following, knowledge augmentation, and coordinated reasoning in remote sensing applications.  Prospects   Future research is expected to advance in four closely related directions. The first direction is the unified representation of multi-source heterogeneous data, especially the integration of optical imagery with Synthetic Aperture Radar (SAR), hyperspectral data, thermal infrared observations, and multi-temporal acquisitions. The second direction is knowledge-enhanced retrieval, in which geospatial priors, land-use rules, remote sensing terminology, and external knowledge bases are incorporated into multimodal alignment and retrieval-augmented reasoning. The third direction is lifelong and open-world learning. Real deployment requires models to remain reliable under seasonal variation, sensor updates, regional domain shifts, cloud contamination, and newly emerging categories, while avoiding catastrophic forgetting. The fourth direction is efficiency and deployability. Practical remote sensing systems often operate under tight computational budgets. Therefore, lightweight tuning, sparse computation, token reduction, model compression, and on-orbit and edge inference will become increasingly important. Interactive and explainable retrieval is also likely to gain importance. It allows analysts to refine queries through dialogue and inspect the image regions or semantic cues that support retrieval decisions. 
Overall, continued progress in data construction, domain adaptation, semantic alignment, and efficient multimodal modeling is expected to make RS-ITR a more robust infrastructure capability for Earth observation applications.
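The mean Recall (mR) metric mentioned in this abstract can be sketched from a similarity matrix between text and image embeddings. The sketch uses the common convention of averaging Recall@K over both retrieval directions and assumes exactly one matching caption per image, with matches on the diagonal. Benchmarks such as RSICD pair several captions with each image, so the sketch is a simplification, and the function names are hypothetical.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose true match (index i for query i) is in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def mean_recall(sim, ks=(1, 5, 10)):
    """Average Recall@K over text-to-image and image-to-text retrieval."""
    transposed = [list(col) for col in zip(*sim)]
    scores = [recall_at_k(m, k) for m in (sim, transposed) for k in ks]
    return sum(scores) / len(scores)
```

In a dual-encoder model, `sim[i][j]` would typically be the cosine similarity between the embedding of text i and the embedding of image j.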
Towards Privacy-Preserving and Lightweight Modulation Recognition for Short-Wave Signals under Channel Shifts
YAO Yizhou, DENG Wen, LI Baoguo
Available online  , doi: 10.11999/JEIT251017
Abstract:
  Objective  Supervised short-wave signal modulation recognition methods generally assume identical distributions between source-domain training data and target-domain test data. Short-wave channels are affected by ionospheric variation, which creates substantial distribution discrepancies across domains and reduces model performance. Deployment on unmanned edge platforms is further restricted by limited computational resources, scarce labeled samples, and data-privacy requirements. This study proposes a lightweight recognition method based on source-model transfer that enables privacy-preserving model adaptation without access to source-domain data.  Methods  A Multi-Modal Source-Model Transfer Framework (M-SMOT) is developed. It applies information-maximization loss and a self-supervised pseudo-labeling strategy to support model adaptation without revisiting source-domain data. The method achieves cross-channel recognition of short-wave modulation signals with reduced computational cost while maintaining data privacy. Multi-modal information—including in-phase/quadrature (I/Q) components, amplitude-phase (AP) characteristics, and spectral features—is fused to exploit complementary representations and improve robustness under complex channel variation.  Results and Discussions  Experiments show that the proposed method consistently outperforms the Source-Only baseline across six cross-channel scenarios, with accuracy gains from 0.31% to 10.81% (Table 1). In few-shot adaptation, average recognition accuracies reach 98.3% and 96% of the full-sample baseline when target-domain samples are reduced to 10% and 1%, respectively (Fig. 12). Ablation studies confirm the effectiveness of the self-supervised pseudo-labeling module (Fig. 16) and the multi-modal fusion strategy (Fig. 17). The lightweight design is verified by zero source-data storage, a peak memory footprint of 6.00 MB, and convergence within one fine-tuning epoch (Table 2). 
These findings show that the method mitigates domain discrepancies and protects privacy under resource-limited conditions.  Conclusions  The M-SMOT method integrates data-privacy protection, source-model adaptation, few-shot generalization, and low resource consumption. It provides a practical solution for cross-channel modulation recognition in short-wave communication and is suited for deployment on resource-constrained edge devices.
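The information-maximization loss named in this abstract is commonly written as the mean per-sample prediction entropy minus the entropy of the batch-averaged prediction. The sketch below follows that common form, which may differ from the exact loss used in M-SMOT, and its function names are hypothetical.

```python
import math


def entropy(p, eps=1e-12):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x + eps) for x in p)


def information_maximization_loss(probs):
    """Low when each prediction is confident and the batch covers classes evenly."""
    n = len(probs)
    mean_entropy = sum(entropy(p) for p in probs) / n
    marginal = [sum(col) / n for col in zip(*probs)]
    return mean_entropy - entropy(marginal)
```

Minimizing the first term sharpens individual predictions on unlabeled target-domain samples. Subtracting the second term discourages the collapse of all samples into one modulation class.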
Construction of DNA Strand Displacement Memristor and Research on Its Filter Circuit Characteristics
WANG Yanfeng, CHEN Guanzhou, SUN Ce, SUN Junwei
Available online  , doi: 10.11999/JEIT260283
Abstract:
  Objective  In modern control and signal processing systems, filter circuits are essential for noise suppression and signal integrity enhancement. Conventional RC filters, while widely used, lack adaptability and miniaturization capabilities required for emerging molecular and nano-scale computing platforms. This study introduces a novel integration of DNA Strand Displacement (DSD) technology with memristor-based circuits to develop tunable, multi-stable molecular filters. The objective is to design and validate first- and second-order low-pass filter circuits that leverage the dynamic response and state-dependent behavior of DSD-based memristors. These filters aim to achieve improved frequency selectivity, parameter adaptability, and system stability compared to traditional filter architectures. The proposed approach targets applications in molecular signal processing, integrated bio-circuits, and adaptive filtering systems where compact size and reconfigurability are critical.  Methods  The methodology follows a four-stage process. First, core DSD reaction modules (sine, cosine, integration, addition, multiplication) are designed to construct a programmable multi-state memristor model. Second, DSD-based square and sinusoidal inputs are synthesized to evaluate memristor response under varying frequencies and amplitudes. Third, these memristors are integrated into RC filter topologies to build first-order and second-order low-pass filters, replacing fixed resistors with tunable DSD-based memristive elements. Fourth, comprehensive simulations are performed using Visual DSD for molecular dynamics and MATLAB for circuit-level analysis. Performance is assessed via transfer functions, Nyquist plots, Bode diagrams, and time-domain comparisons with classical RC filters. This multi-tool approach rigorously validates both molecular feasibility and electronic functionality.  
Results and Discussions  The DSD-based memristor exhibits multi-stable behavior with six equilibrium states under controlled initial conditions (Fig. 7). The first-order filter provides stable attenuation for square and sinusoidal inputs, with output amplitudes consistently exceeding those of traditional RC filters across tested frequencies (Table 3). The second-order filter further reduces signal delay and improves stability, especially under high-frequency inputs (Table 4). Frequency response analyses confirm that cutoff frequencies can be dynamically tuned by adjusting DSD reaction rates and initial concentrations (Figs. 8, 10). The system maintains robust performance under varying signal types and environmental simulations, demonstrating adaptability. These results validate the feasibility of DSD-memristor integration for adaptive filtering, offering a promising alternative to conventional rigid circuits in molecular-scale applications.  Conclusions  This study successfully designs and validates a DSD-based memristor with multi-stable characteristics and its corresponding first- and second-order low-pass filter circuits. The proposed filters demonstrate superior performance in terms of output stability, parameter tunability, and frequency adaptability compared to traditional RC architectures. By integrating DSD technology with memristor theory, we enable a new class of reconfigurable, molecular-scale filtering systems suitable for advanced signal processing applications. The work provides a foundation for future research in adaptive molecular circuits, intelligent filtering, and nano-electronic system design. Further developments could include hardware implementation, real-time tuning algorithms, and integration with machine learning for autonomous signal optimization in IoT and biomedical devices.
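The effect of replacing a fixed resistor with a tunable element can be seen from the textbook first-order RC relations, f_c = 1/(2πRC) and |H(f)| = 1/sqrt(1 + (f/f_c)^2). The sketch below treats the memristor as a plain resistance at a given state and models the second-order filter as two identical buffered stages. Both are simplifying assumptions, and the sketch is not the DSD reaction model of the paper.

```python
import math


def cutoff_hz(resistance_ohm, capacitance_f):
    """Cutoff frequency of a first-order RC low-pass stage."""
    return 1.0 / (2 * math.pi * resistance_ohm * capacitance_f)


def lowpass_gain(freq_hz, resistance_ohm, capacitance_f, order=1):
    """Magnitude response of `order` identical, buffered RC stages in cascade."""
    ratio = freq_hz / cutoff_hz(resistance_ohm, capacitance_f)
    return (1.0 / math.sqrt(1.0 + ratio * ratio)) ** order
```

With 1 kΩ and 1 µF the cutoff is about 159 Hz. Doubling the resistance halves it, which is the sense in which a state-dependent resistance makes the filter tunable.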
Off-Grid Blind Near-Field Integrated Sensing and Communication: Algorithm Design and Lower Bound
YUAN Zhengdao, GUO Qinghua, HUANG Chongwen, GAO Dawei, MEI Fengtong, LIAO Guisheng
Available online  , doi: 10.11999/JEIT260404
Abstract:
  Objective  With the widespread deployment of extra-large scale antenna arrays in 6G networks, user terminals are mostly located in the near-field region. Existing near-field integrated sensing and communication (NF-ISAC) algorithms face critical challenges, including off-grid power leakage, severe model mismatch, and strong dependence on pilot signals, and therefore cannot meet the requirements of 6G low-overhead and high-performance transmission. This paper aims to design a novel off-grid blind NF-ISAC algorithm and to derive the theoretical performance bound for near-field sensing.  Methods  The proposed design aims to overcome the inherent limitations of analytical geometric steering vectors and to accommodate more accurate electromagnetic propagation characteristics that lack closed-form expressions. First, an amplitude-phase separation method is proposed to decompose the nonlinear near-field steering vector into amplitude and phase terms, which enables high-precision characterization of the steering vector with a single-hidden-layer neural network. Second, the NF-ISAC problem is formulated as a constrained matrix factorization problem, and the corresponding factor graph model is constructed. The trained neural network is embedded into the factor graph as a function node, and message computation through the neural network is realized by a message passing algorithm. This completes joint blind coordinate sensing, channel estimation, and signal detection without pilot assistance. Finally, the Cramér-Rao Lower Bound (CRLB) for multi-user near-field joint distance and angle sensing in polar coordinates is derived based on the neural network-fitted steering vector.  Results and Discussions  Extensive Monte Carlo simulations are conducted to evaluate the performance of the proposed algorithm. 
Simulation results show that the proposed algorithm achieves millimeter-level position sensing and obtains significant improvements in both communication Bit Error Rate (BER) and sensing accuracy compared with existing mainstream algorithms. It achieves a gain of 2 to 3 dB in sensing accuracy over the state-of-the-art near-field off-grid method, and its performance is the closest to the derived CRLB, indicating that off-grid power leakage and model mismatch are effectively mitigated.  Conclusions  The proposed off-grid blind NF-ISAC algorithm overcomes the pilot dependency and model mismatch limitations of existing NF-ISAC schemes and realizes integrated high-precision sensing and reliable communication for near-field users in a pilot-free manner. The derived CRLB provides a theoretical benchmark for performance evaluation of near-field ISAC systems. This work can offer key technical support for the design of 6G near-field ISAC systems.
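For orientation, the conventional analytic spherical-wave steering vector, which the abstract contrasts with its neural-network-fitted version, can be written with separate amplitude and phase terms. The sketch assumes a uniform linear array centred at the origin, a source at distance r and angle theta from broadside, and free-space amplitude decay relative to the array centre. The names and geometry are illustrative assumptions, not the model of the paper.

```python
import cmath
import math


def near_field_steering(n_elems, spacing, wavelength, r, theta):
    """Return (amplitude, phase) lists of the spherical-wave steering vector."""
    amplitude, phase = [], []
    for n in range(n_elems):
        x = (n - (n_elems - 1) / 2) * spacing  # element position on the array axis
        # Exact source-to-element distance (law of cosines), no Fresnel approximation.
        r_n = math.sqrt(r * r + x * x - 2 * r * x * math.sin(theta))
        amplitude.append(r / r_n)
        phase.append(-2 * math.pi * (r_n - r) / wavelength)
    return amplitude, phase


def combine(amplitude, phase):
    """Rebuild the complex steering vector from its amplitude and phase terms."""
    return [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]
```

Keeping the two terms apart means each can be fitted or replaced independently, for example by a learned function when no closed-form expression is available.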
Intelligent Privacy-Aware Computation Offloading Method Against Multi-Server Joint Inference Attacks
MIN Minghui, LIU Mingcheng, ZHANG Peng, DUAN Jincheng, LI Shiyin, ZHANG Hongliang
Available online  , doi: 10.11999/JEIT260249
Abstract:
  Objective  With the rapid advancement of the low-altitude economy, services such as intelligent transportation, smart healthcare, and low-altitude logistics have become increasingly pervasive. Their efficient operation relies heavily on the real-time processing of massive sensor data. Mobile edge computing (MEC) enhances task execution efficiency and alleviates device computational burdens by offloading tasks to nearby MEC servers. However, user privacy and security threats have become progressively severe. In dynamic scenarios where multiple MEC servers collaboratively process tasks, joint inference attacks via information sharing drastically escalate the risk of user location privacy leakage. Although existing studies have adopted differential privacy (DP) to safeguard user location privacy, DP-based solutions remain insufficient. Existing methods inject noise into offloading decisions to protect privacy, yet unconstrained noise can degrade the accuracy of task allocation. Furthermore, in dynamic computation offloading scenarios, the real-time mobility of users causes channel states to vary continuously. Both the privacy leakage risks inherent in task offloading and the behaviors of attackers exhibit significant uncertainties. Traditional optimization theories, relying on static system models, fail to cope with the optimization challenges in such dynamic environments. To overcome these challenges, this paper proposes an Asynchronous Advantage Actor-Critic (A3C)-based intelligent privacy-aware computation offloading (AIPCO) scheme capable of resisting multi-server joint inference attacks. While effectively safeguarding user location privacy, the proposed scheme maximizes the overall utility of the MEC system.  Methods  This paper proposes a DP-based task offloading rate perturbation mechanism. By introducing controlled noise, the mechanism enhances the randomness of user task offloading toward multiple MEC servers. 
Concurrently, a truncated Laplace mechanism is utilized to constrain the boundaries of the perturbed offloading rates, thereby strictly satisfying the mathematical guarantees of DP and effectively degrading the accuracy of multi-server cooperative inference attacks in identifying sensitive user locations. On this basis, privacy entropy is introduced to dynamically evaluate the real-time efficacy of privacy protection. Finally, the AIPCO scheme is constructed. Leveraging its multi-threaded asynchronous training mechanism, the scheme interacts with the environment through iterative trial-and-error to efficiently learn the optimal real-time offloading policy online. While dynamically safeguarding user privacy, the proposed scheme minimizes computational overhead, ultimately achieving the maximization of the comprehensive system utility.  Results and Discussions  The AIPCO scheme achieves simultaneous optimization of user privacy and task offloading costs by strategically incorporating multidimensional performance variables into the reward function of reinforcement learning. A comprehensive multi-dimensional performance analysis of this scheme (Figure 4) indicates that, in terms of dynamic convergence performance, when the continuous learning iterations reach 200, its privacy protection level improves significantly by 2.52%, 3.56%, and 22.90% compared to baseline schemes RCLM, JODRL, and DODA-DT, respectively. This distinct advantage stems directly from AIPCO’s adoption of a DP-based method for perturbing the offloading rate, which successfully utilizes a truncated Laplace mechanism to enhance data randomness while strictly limiting the perturbation range. 
In sharp contrast, RCLM only perturbs the rate via range-limited DP without implementing a truncated Laplace mechanism; JODRL merely increases randomness through network policy optimization, resulting in lower protection levels; and DODA-DT focuses exclusively on balancing energy consumption and system latency without optimizing user privacy. Regarding the critical privacy weight parameter (Figure 5), systematically increasing $w$ enhances privacy protection. For instance, the privacy protection level improves by 5.64% as the weight rises from 0.2 to 0.7, with the performance gain being particularly significant at a weight of 0.7. As the agent reduces its primary focus on computational costs, user benefits remain optimal despite rising expenses. Furthermore, when adjusting the physical distance between users and the MEC server (Figure 6), AIPCO demonstrates superior privacy protection capabilities in long-distance scenarios. A greater distance reduces the number of tasks offloaded to the server, so the attacker obtains less information and privacy protection improves. Although computational costs inevitably rise with distance, AIPCO consistently outperforms competing schemes, confirming that it achieves optimal benefits for the MEC system while safeguarding user privacy.  Conclusions  To mitigate joint inference attacks from information sharing among collaborative MEC servers, this paper proposes an intelligent privacy-aware computation offloading method. A DP-based task offloading rate perturbation scheme enhances randomness, using a truncated Laplace mechanism to constrain perturbed rates within reasonable boundaries. While proving the scheme strictly satisfies DP mathematical guarantees, privacy entropy is introduced to quantitatively evaluate privacy protection efficacy. 
Furthermore, the designed AIPCO scheme leverages a multi-threaded asynchronous training mode, enabling the agent to efficiently learn the optimal perturbed offloading policy within a continuous space to maximize overall system utility. Simulation results demonstrate that the proposed scheme significantly outperforms baselines in both dynamic and average performance, achieving optimal system utility while safeguarding user privacy.
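The rate perturbation described in the abstract above can be illustrated with a minimal sketch. The abstract does not specify how the truncation is carried out, so the rejection-sampling approach, the function names, and the parameters `epsilon`, `sensitivity`, `low`, and `high` below are all assumptions made for illustration. The sketch makes no claim about the resulting privacy budget, which the paper proves for its own construction.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_offloading_rate(rate, epsilon, sensitivity=1.0, low=0.0, high=1.0, rng=None):
    """Perturb an offloading rate and keep the result inside [low, high].

    Hypothetical helper: truncation is done by redrawing the noise until the
    perturbed rate falls inside the allowed range (an assumed design choice).
    """
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon  # standard Laplace scale b = sensitivity / epsilon
    while True:
        candidate = rate + laplace_noise(scale, rng)
        if low <= candidate <= high:
            return candidate


if __name__ == "__main__":
    rng = random.Random(42)
    print([round(perturb_offloading_rate(0.6, epsilon=1.0, rng=rng), 3) for _ in range(5)])
```

A smaller `epsilon` widens the noise distribution, so the perturbed rates spread more evenly across the allowed range while still never leaving it.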
Bearing Fault Diagnosis of Roadheader via Cross-modal Kernel Fusion-sphere Space Learning
SU Shuzhi, GUI Yang, MA Tianbing, ZHU Yanmin, WU Kanghui
Available online  , doi: 10.11999/JEIT260494
Abstract:
  Objective  Traditional roadheader bearing fault diagnosis methods often struggle with high-dimensional and non-linear multi-sensor data, failing to effectively perceive cross-modal, multi-scale fault information or integrate local and global structural features. To address these limitations, this paper proposes a novel Cross-modal Kernel Fusion-sphere Space Learning (CKFSL) method. By perceiving cross-modal multi-scale fault information, the proposed method extracts highly discriminative features from roadheader bearing cross-modal fault samples, improving the accuracy of roadheader bearing fault diagnosis.  Methods  The CKFSL method first maps roadheader bearing cross-modal fault samples into a high-dimensional kernel space via implicit transformation. It employs dual extremal point anchoring and polar neighbor allocation mechanisms to capture clusters of fault samples with similar isomorphic information, forming kernel fusion-spheres. Subsequently, an adaptive binary partitioning strategy is designed based on the geometric span of internal fault samples to tighten isomorphic boundaries, constructing micro-neighbor kernel fusion-spheres and achieving highly isomorphic manifold aggregation at a microscopic scale. A micro-neighbor kernel fusion-sphere space is further formed to re-evaluate local isomorphism (Fig. 1). To characterize wide-area topological correlations, a wide-area topological isomorphism constraint is proposed, which constructs a wide-area dynamic isomorphism graph among micro-neighbor kernel fusion-spheres (Fig. 1). Finally, an objective optimization function is formulated within the space learning framework, which integrates local manifold isomorphism and wide-area topological correlations of roadheader bearing cross-modal fault samples, as illustrated in the CKFSL diagnostic flowchart (Fig. 2). 
The analytical solution for spatial projection is theoretically derived to obtain discriminative cross-modal kernel fusion-sphere space isomorphic features from roadheader bearing cross-modal fault samples.  Results and Discussions  The proposed CKFSL is first validated on the self-built AUST roadheader bearing cross-modal fault dataset, with the experimental platform shown in Fig. 3. The average recognition rates with increasing training fault samples are illustrated in Fig. 4. On the AUST dataset, CKFSL achieves a 99.49% recognition rate with only 70 training fault samples, reaching 100% as the number of training fault samples increases. Table 1 summarizes the standard deviations under different training fault sample sizes, demonstrating that CKFSL maintains the lowest standard deviation and superior robustness compared to the other seven algorithms. Furthermore, three-dimensional fault feature distributions are presented in Fig. 5, confirming that CKFSL effectively partitions highly overlapping fault samples into distinct clusters, overcoming the boundary confusion of compared algorithms. To verify the generalization capability, the proposed CKFSL method is further evaluated on the public Paderborn dataset, with the experimental setup shown in Fig. 6. As depicted in Fig. 7 and Fig. 8, CKFSL achieves 100% average recognition accuracy across four complex fault categories, significantly outperforming comparative methods that struggle to surpass an 85% recognition rate for the F4 fault category.  Conclusions  The proposed CKFSL effectively overcomes the inability of traditional roadheader bearing fault diagnosis methods to perceive complex multi-scale fault information. By utilizing the wide-area dynamic isomorphism graph learned within the micro-neighbor kernel fusion-sphere space, CKFSL integrates local manifold isomorphism and wide-area topological correlations of roadheader bearing cross-modal fault samples. 
This process enables the CKFSL to extract highly discriminative cross-modal kernel fusion-sphere space isomorphic features, thereby improving the accuracy of roadheader bearing fault diagnosis and ensuring the reliability and continuous operability of the roadheader.
An Inverse-Hybrid-Modeling Digital Twin System for Natural Gas Energy Metrology
LIU Bin, ZHONG Lu, FENG Quanyuan, CHEN Yihong
Available online  , doi: 10.11999/JEIT260289
Abstract:
  Objective  Global natural gas consumption continues to increase at an average annual rate of 3.2%. A 0.1% reduction in energy measurement error can reduce trade disputes by approximately $750 million per year. Traditional studies mainly use indirect methods for energy measurement. Among these methods, chromatographic analysis and acoustic velocity correlation are the most widely used, but both have clear application limits. Chromatographic analysis has a low interference error, but it shows delayed dynamic response at high flow rates and limited dynamic calibration capability. It also has poor adaptability to multi-gas-source switching, requires manual calibration, and has high operation and maintenance costs. The lack of interoperability standards for energy networks further increases the difficulty of system integration. Acoustic velocity correlation provides a low-latency dynamic response for flow measurement, but it has a high interference error. This error may increase when the content of a single component changes, such as when the hydrogen content increases from 5% to 10%. The method may even fail under complex operating conditions, such as multi-gas-source mixing and dynamic pressure fluctuations. To address these issues, new mechanism-modeling-oriented methods have been developed. The two most representative directions are mechanism-modeling-driven methods and hybrid-modeling methods. Both methods combine multi-source data fusion with virtual-physical interaction to establish mechanism models that link flow rate, other parameters, and energy. These methods provide a new approach for accurate energy measurement, but new challenges remain. Mechanism-modeling-driven methods are usually based on static flow modeling using Computational Fluid Dynamics (CFD). However, their dynamic parameter updates are slow, with delays of more than 30 s. 
They also have difficulty adapting to real-time operating-condition changes, rely on large labeled datasets, and have limited interpretability. Hybrid-modeling methods still face unresolved problems in collaborative optimization across multiple modules. In addition, existing studies lack support from industrial-grade verification platforms. These limits restrict their ability to solve the dynamic response delay, parameter identification difficulty, excessive physical simplification, and weak interference resistance of traditional natural gas energy metrology methods under complex conditions. Based on recent progress in mechanism-modeling-driven and hybrid-modeling methods, this study proposes an inverse-hybrid-modeling-driven digital twin system. The system introduces a Variational AutoEncoder (VAE)-based operating-condition feature extraction algorithm and a Dynamic Bayesian Network (DBN)-based parameter calibration mechanism. It also uses a Variational Expectation-Maximization (VEM) algorithm for offline calibration. The proposed system aims to improve the accuracy, adaptability, and interference resistance of natural gas energy metrology under complex operating conditions.  Methods   A natural gas energy metrology digital twin system based on inverse hybrid modeling is proposed. The system is built on a three-tier “algorithm-system-scenario” architecture. It integrates calorific value, flow, and energy mechanism models with multi-source real-time data streams. The VAE is used for unsupervised mining of operating-condition features. A parameter self-correction loop is then constructed by combining the DBN with VEM-based system calibration. Industrial-grade devices, including ultrasonic flowmeters and gas chromatographs, are integrated to ensure real-time data transmission and closed-loop control. The system covers key operating conditions, including dynamic pressure fluctuations, hydrogen-blended gas mixtures, and multi-gas-source switching. 
This design ensures strong adaptability between the model and practical applications. The system was continuously verified for 25 weeks on a full-scale industrial-grade experimental platform. The results show an operational delay of ≤3.8 s, data transmission jitter of ≤0.5 s, average daily energy consumption per device of ≤1.2 kW·h, Mean Time Between Failures (MTBF) of ≥4 100 h, energy measurement error of ≤0.25%, calorific value error of ≤0.12%, and flow indication error of ≤0.2%. The system also meets security requirements through industrial Ethernet encryption and hierarchical access control. It provides engineering support for intelligent pipeline-network optimization and standardized integration.  Results and Discussions  First, a multi-level hybrid modeling framework is established. Modular hybrid modeling is achieved through the algorithm-system-scenario three-tier architecture. Numerical methods combined with data are more flexible than purely analytical models and can represent complex multiphysics systems with fewer lumped physical parameters. These parameters may change during energy measurement under mechanical, energy, and hydrodynamic effects. The VAE and DBN are used to deeply integrate mechanism models with real-time data. This reduces the parameter synchronization delay to 3.8 s and supports fluid-acoustic co-simulation and rapid response under complex operating conditions, such as hydrogen-blended natural gas. Second, an integrated algorithm for inverse hybrid modeling and system calibration is proposed. By incorporating the VAE, DBN, and VEM algorithm, the inverse hybrid modeling algorithm forms a self-supervised, adaptive intelligent system with an internal closed-loop operation. The VAE encoder compresses high-dimensional operating-condition data into low-dimensional feature vectors. This enables unsupervised feature extraction without large labeled datasets. 
Based on the learned internal data distribution, the VAE can also generate perturbed data similar to the input data. These data are used to simulate abnormal operating conditions and verify interference resistance. The DBN constructs a continuous “prior-evidence-posterior” iterative cycle to support system self-correction and adaptive response to operating-condition changes. The VEM algorithm compensates for systematic errors that are difficult for the DBN to capture, thereby overcoming the limits of traditional static models.  Conclusions  This study describes and validates a hybrid digital twin system that combines experimental data-driven methods with physical models. The system successfully simulates the physical characteristics of natural gas energy metrology. A full-scale test platform was constructed, and the main system parameters were validated using experimental measurement data and compared with industry benchmarks. Each independent module in the algorithm-system-scenario three-tier hybrid modeling architecture, including calorific value measurement, flow calculation, and energy conversion, was continuously verified for 25 weeks. The results confirm strong consistency between model predictions and actual measurements. On the natural gas energy metrology digital twin experimental platform, systematic validation was performed for three core functions: flow measurement under dynamic conditions, multi-component calorific value determination, and energy accumulation. The results show that the output of the digital twin model matches the physical device measurement data with an accuracy of more than 99.5%. Under complex operating conditions, such as pressure pulsations and hydrogen-blended gas mixtures, the system maintains the measurement error within 0.5%. This performance is better than that of traditional methods and meets the Class A accuracy requirements for natural gas measurement. 
By introducing a multi-tier hybrid modeling framework, this study addresses the parameter identification difficulty and excessive physical simplification of traditional natural gas energy metrology methods. The integration of the VAE, DBN, and VEM algorithm enables unsupervised feature extraction under complex operating conditions and adaptive calibration of model parameters. This reduces dependence on prior physical knowledge and large labeled datasets. The experimental results show that the proposed method maintains high precision and strong stability under complex scenarios, including pressure pulsations and hydrogen-blended gas mixtures, where traditional models have difficulty providing accurate descriptions.
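As a rough plausibility check on the error figures quoted in the abstract above, the sketch below combines the calorific value error (≤0.12%) and the flow indication error (≤0.2%) under two assumptions that the abstract does not state: energy is the product of volume and calorific value, and the two relative errors are independent, so they add in quadrature. The function names and the sample volume and calorific value are hypothetical.

```python
import math


def energy_mj(volume_m3, calorific_mj_per_m3):
    """Energy as volume times volumetric calorific value (assumed model)."""
    return volume_m3 * calorific_mj_per_m3


def combined_relative_error(*relative_errors_percent):
    """Root-sum-square combination, valid only for independent error sources."""
    return math.sqrt(sum(e * e for e in relative_errors_percent))


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the paper.
    print(energy_mj(1000.0, 38.0), "MJ")
    # Error bounds quoted in the abstract, in percent.
    print(round(combined_relative_error(0.12, 0.20), 3), "%")
```

Under these assumptions the combined figure comes out at roughly 0.23%, which sits inside the reported energy measurement error bound of 0.25%.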
A Hierarchical Cross-layer Closed-loop Learning Framework and Collaborative Mechanism for Complex Multi-agent Systems
ZHANG Long, HUANG Wenbo, LEI Zhen, FENG Xuanming, WANG Ying
Available online  , doi: 10.11999/JEIT260143
Abstract:
  Objective  Complex multi-agent systems (MAS) in dynamic and uncertain environments face challenges in unified modeling, adaptive coordination, and interpretable effectiveness evaluation. Existing methods usually focus on individual decision-making, inter-agent cooperation, or high-level policy evolution separately, resulting in fragmented decision chains and weak cross-layer coupling. Consequently, it is difficult to explain how local learning improvements are transformed into global effectiveness gains under mission variation, environmental disturbance, and partial structural damage. To address this issue, this paper proposes a Hierarchical Cross-layer Closed-loop Learning (HCCL) framework, which couples individual autonomy, system-level collaboration, and system-of-systems learning to build a computable path from local policy optimization to overall effectiveness enhancement.  Methods  HCCL adopts a unified three-layer architecture. At the individual autonomy layer, each agent is modeled by a Partially Observable Markov Decision Process (POMDP) to describe decision-making under partial observability. At the system-level collaboration layer, multi-agent cooperation is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and represented by a dynamic directed weighted collaboration graph. A graph neural network is used to encode interaction dependencies, structural couplings, and joint value information. At the system-of-systems learning layer, a Meta-Decentralized Partially Observable Markov Decision Process (Meta-Dec-POMDP) is established to describe task-context adaptation and rule evolution. A cross-layer closed-loop mechanism is further designed. In the bottom-up behavior induction pathway, local state and capability features are aggregated into graph-level structural representations and supplied to the upper rule-learning process. 
In the top-down rule-shaping pathway, learned high-level rules are transformed into control parameters and fed back to lower layers to regulate local policies and collaboration relationships. Simulations are conducted under baseline, mission-variation, observation-disturbance, and structural-damage scenarios. The full HCCL model is compared with a non-closed-loop model and an upward-induction-only model, and interface ablation studies are performed to analyze the contributions of cross-layer feature reporting, structural induction, and rule shaping.  Results and Discussions  The full HCCL model consistently outperforms the comparison models and ablated variants. In the baseline scenario, it achieves a task success rate of 88.6% and a comprehensive system effectiveness of 0.842. Under mission variation, it reduces the adaptation process to 16±2 rounds. Under structural damage, it achieves a recovery rate of 81.4% and restores collaboration-structure stability to 0.742 within 20 steps. These results indicate that HCCL improves task performance, adaptation speed, and structural recovery. Ablation results show that removing any cross-layer interface causes performance degradation, while removing the top-down rule-shaping pathway leads to the largest loss. This demonstrates that upward structural perception alone is insufficient for sustained system-level improvement. The effectiveness gain mainly comes from the closed-loop coupling between upward behavior induction and downward rule shaping, rather than from simple hierarchical stacking.  Conclusions  This paper proposes the HCCL framework for complex MAS by integrating POMDP-based individual autonomy modeling, Dec-POMDP and graph-based collaboration modeling, and Meta-Dec-POMDP-based rule evolution. Through bottom-up behavior induction and top-down rule shaping, HCCL provides a computable and interpretable path from local learning to overall effectiveness enhancement. 
Experimental results verify its advantages in task completion, adaptation, recovery, and collaboration stability under multiple disturbances. Future work will focus on larger-scale heterogeneous systems, communication-constrained networking, online continual adaptation, and data-driven evaluation in realistic environments.
A Task Prediction-Augmented Hierarchical Offloading Method for Space-Air-Ground Integrated Networks
ZHANG Linghao, XU Bo, SUN Jinlong, LAI Haiguang, ZHAO Haitao
Available online  , doi: 10.11999/JEIT260217
Abstract:
  Objective  Space-Air-Ground Integrated Networks (SAGIN) have emerged as a critical infrastructure for future 6G communications, enabling wide-area coverage and flexible deployment through the collaborative operation of Low Earth Orbit (LEO) satellites, Unmanned Aerial Vehicles (UAVs), and ground users (GUs). With the rapid proliferation of Internet of Things (IoT), Internet of Vehicles (IoV), and smart city applications, the volume and diversity of computation-intensive tasks generated by terminal devices have grown substantially, placing stringent demands on real-time computing and resource scheduling. The integration of Mobile Edge Computing (MEC) into SAGIN architectures has enabled near-user computation services by deploying UAVs and satellites as edge computing nodes, effectively reducing task latency. However, achieving efficient task offloading that simultaneously minimizes task completion latency and UAV energy consumption remains a significant challenge. This difficulty arises from the strong coupling among UAV trajectory planning, task offloading decisions, and resource allocation, compounded by the highly dynamic and partially observable nature of SAGIN environments. In particular, existing multi-agent reinforcement learning (MARL) approaches predominantly rely on reactive, instantaneous decision-making without proactive awareness of future task workload variations, leading to decision lag and insufficient adaptability under bursty traffic conditions. To address these challenges, this paper proposes a task prediction-augmented MARL framework that endows agents with forward-looking decision capabilities in dynamic SAGIN environments.  Methods  The system considers a three-layer SAGIN-MEC architecture comprising one LEO satellite, multiple UAVs, and ground users. Tasks can be processed locally, offloaded to UAVs via Ground-to-Air (G2A) links, or further relayed to the LEO satellite via Air-to-Satellite (A2S) links under a partial offloading mechanism. 
The joint optimization of UAV trajectory, user association, offloading ratios, and computational resource allocation is formulated as a Mixed Integer Nonlinear Programming (MINLP) problem minimizing the weighted sum of average task latency and UAV flight energy consumption. Given its non-convexity and high dimensionality, the problem is reformulated as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP), upon which a Prediction-Augmented Multi-Agent Proximal Policy Optimization (PA-MAPPO) algorithm is proposed. A lightweight Exponential Smoothing–Autoregressive (ES-AR) prediction module generates multi-step workload forecasts that are incorporated into each agent’s state space. The algorithm adopts a bilevel structure: the outer layer employs Centralized Training and Decentralized Execution (CTDE)-based PA-MAPPO to generate UAV trajectory actions, while the inner layer applies Block Coordinate Descent (BCD) convex optimization to solve resource allocation and offloading subproblems, with closed-form solutions derived via Lagrangian analysis. Generalized Advantage Estimation (GAE) and PPO-Clip mechanisms ensure training stability and convergence.  Results and Discussions  Simulations involve 1 LEO satellite, 5 UAVs, and 50 ground users in a 1×1 km² area. PA-MAPPO is compared against MAPPO (without prediction) and PA-MADDPG. Training curves show that PA-MAPPO converges within 500–700 episodes with the highest average reward and smallest variance, demonstrating superior stability (Fig. 3). As the user count increases from 20 to 80, PA-MAPPO consistently maintains the lowest system cost, achieving average reductions of 12.4% and 18.7% relative to MAPPO and PA-MADDPG, respectively (Fig. 4). Experiments varying UAV quantity reveal a U-shaped cost curve for all algorithms, with the optimal configuration at U=5. PA-MAPPO achieves the minimum cost at this point (Fig. 5). 
Sensitivity analysis over the energy-latency tradeoff weight ω confirms PA-MAPPO’s robustness across different optimization preferences (Fig. 6). The prediction horizon H exhibits a non-monotonic effect on performance, and H=5 yields the optimal result with approximately 14.9% cost reduction over the no-prediction case, while longer horizons degrade performance due to accumulated prediction error (Fig. 7).  Conclusions  This paper proposes the PA-MAPPO algorithm to address the joint optimization of UAV trajectory planning, user association, task offloading, and computational resource allocation in dynamic SAGIN environments. By introducing a lightweight ES-AR task workload prediction module into the MARL framework, the proposed method equips UAV agents with proactive decision-making capabilities that account for future task dynamics, effectively alleviating the decision lag inherent in purely reactive approaches. The inner BCD-based convex optimization guarantees convergence to a KKT-stationary point, while the outer CTDE-based PPO mechanism ensures training stability and scalability. Simulation results demonstrate that PA-MAPPO achieves significant improvements over baseline methods in terms of average task latency, UAV flight energy consumption, and overall system cost, while exhibiting strong scalability and robustness across varying system configurations. Future work will explore online prediction and decision co-optimization mechanisms in multi-satellite cooperative scenarios, as well as the impact of dynamic network topology changes on algorithm performance.
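The abstract above names an Exponential Smoothing–Autoregressive (ES-AR) module that produces multi-step workload forecasts but does not give its equations. The sketch below is one plausible reading, offered purely as an illustration: the series is exponentially smoothed, a first-order autoregressive coefficient is fitted to the smoothed series by least squares, and the forecast is rolled forward `horizon` steps. The AR order, the smoothing factor `alpha`, and all function names are assumptions.

```python
def exp_smooth(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def fit_ar1(series):
    """Least-squares AR(1) fit around the sample mean; returns (mean, phi)."""
    mean = sum(series) / len(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series[:-1], series[1:]))
    den = sum((a - mean) ** 2 for a in series[:-1])
    phi = num / den if den > 0 else 0.0
    return mean, phi


def forecast_workload(series, alpha=0.5, horizon=5):
    """Roll the fitted AR(1) model forward `horizon` steps from the last smoothed value."""
    smoothed = exp_smooth(series, alpha)
    mean, phi = fit_ar1(smoothed)
    last, out = smoothed[-1], []
    for _ in range(horizon):
        last = mean + phi * (last - mean)
        out.append(last)
    return out


if __name__ == "__main__":
    history = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 8.0, 10.0]  # hypothetical task arrivals
    print([round(v, 2) for v in forecast_workload(history, horizon=5)])
```

In a scheme of this kind, the `horizon` forecasts would be appended to each agent's observation vector. Because every step reuses the previous forecast, errors compound with the horizon, which is consistent with the degradation the abstract reports for longer horizons.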
Multi-Task Lightning Nowcasting with Spatio-Temporal Focal Perception and Synergistic Weighted Loss
TANG Zhihao, HAN Yuanpeng, ZHANG Hui, SONG Lin, ZHANG Qilin, LIU Yi
Available online  , doi: 10.11999/JEIT260234
Abstract:
  Objective  Lightning nowcasting is vital for early warning systems and the protection of critical infrastructure such as aviation, power grids, and transportation. Traditional numerical weather prediction models suffer from parameterization dependencies and high computational costs, making them inefficient for rapid-update nowcasting. Existing deep learning methods, despite progress, inadequately handle extreme data sparsity, suffer from serial computation bottlenecks in recurrent architectures, and primarily focus on binary occurrence prediction rather than the synergistic optimization of frequency estimation and regional localization. Moreover, conventional loss functions are easily dominated by vast non-lightning areas, causing biased predictions toward zero or generating excessive false alarms. To address these limitations, this paper proposes STF-Net, a multi-task lightning nowcasting model that jointly predicts lightning frequency and occurrence regions through three key innovations: a Lightning Adaptive Attention Module (LAAM) for explicit spatio-temporal dependency modeling, a Spatio-Temporally Weighted Hybrid Loss function to tackle data sparsity and imbalance, and a spatio-temporal dual-branch Generative Adversarial Network (GAN) to enhance prediction fidelity and temporal coherence.  Methods  STF-Net is built upon the SimVP video prediction architecture, adopting an encoder-translator-decoder paradigm. The Lightning Adaptive Attention Module (LAAM) employs a three-dimensional decoupled attention mechanism along height, width, and channel dimensions, enabling adaptive focus on convectively sensitive regions while maintaining computational efficiency. 
The Spatio-Temporally Weighted Hybrid Loss combines Temporally-Weighted Mean Squared Error (TW-MSE) for frequency regression accuracy and Dual-Weighted Cross-Entropy Loss (DWCE) for precise regional identification, incorporating time-increasing weights to enhance medium-to-long-term forecast robustness (Fig. 5). The DWCE loss innovatively integrates static class weights with dynamic grid weights, effectively balancing global class proportions and local lightning frequency heterogeneity. A spatio-temporal dual-branch GAN, comprising a spatial PatchGAN discriminator and a temporal 3D convolutional discriminator, is introduced to improve the textural fidelity and temporal coherence of the predicted lightning frequency fields. The model processes 6 consecutive historical lightning frequency frames (256×256 resolution, 10-minute intervals) to predict the next 6 frames, corresponding to a 1-hour forecast window. Experiments are conducted on a high-resolution Very Low Frequency Lightning Location Network (VLF-LLN) dataset containing 11,748 images covering diverse seasonal and weather conditions, split 7:3 for training and testing.  Results and Discussions  Comprehensive evaluation metrics are employed, including frequency regression accuracy (MSE, MAE computed only on lightning-occurring pixels), image quality fidelity (PSNR, SSIM), and regional detection skills (POD, FAR, CSI). STF-Net achieves a Critical Success Index (CSI) of 0.663 within the 1-hour forecast window, a 14.5% improvement over the SimVP baseline (0.579), and reduces the False Alarm Rate (FAR) from 0.351 to 0.216, a relative reduction of 38.5% (Table 1). Ablation studies systematically validate each component: adding GAN improves CSI to 0.624 and reduces MSE to 0.109; further incorporating LAAM increases CSI to 0.629 with the highest POD of 0.894; the complete STF-Net with Hybrid Loss achieves optimal performance with CSI of 0.663 and MSE of 0.105 (Table 1, Fig. 5). 
Critically, the synergistic prediction of frequency and region is evidenced by the simultaneous improvement in both regression (MSE/MAE) and detection (CSI/FAR) metrics. Time-step analysis reveals that LAAM significantly mitigates long-term performance degradation, with STF-Net maintaining the highest CSI compared to SimVP+GAN and SimVP (Fig. 6). Comparative experiments against ConvLSTM and PredRNN demonstrate STF-Net's superiority across all lead times: it consistently achieves higher CSI and lower FAR, with the advantage becoming more pronounced as the forecast horizon extends (Fig. 7). The consistent advantage in PSNR and SSIM further underscores the spatio-temporal GAN's role in producing coherent, detail-rich predictions. Visualization results show that STF-Net generates structurally clear, continuous lightning activity regions centered on high-frequency areas, accurately tracking dynamic evolution patterns including movement, merging, and splitting while producing minimal noise in non-lightning regions, demonstrating effective collaborative prediction of both frequency magnitude and spatial distribution.  Conclusions  This paper presents STF-Net, a novel deep learning model that achieves synergistic lightning frequency and regional prediction through three core contributions: (1) the Lightning Adaptive Attention Module (LAAM) explicitly models long-range spatio-temporal dependencies and focuses on critical convective zones; (2) the Spatio-Temporally Weighted Hybrid Loss effectively addresses extreme data sparsity and class imbalance, simultaneously optimizing frequency regression accuracy and regional localization precision while suppressing false alarms; and (3) the spatio-temporal dual-branch GAN enhances the spatial structural consistency and temporal coherence of predictions. 
Experimental results demonstrate that STF-Net significantly outperforms baseline and state-of-the-art models, achieving superior CSI of 0.663, reduced FAR of 0.216, and optimal MSE/MAE values within a 1-hour forecast window. The model effectively mitigates long-term performance degradation, accurately captures lightning region evolution trends, and generates physically plausible predictions with minimal background noise. This research provides an efficient, end-to-end solution for operational lightning nowcasting systems and offers new insights for model design in sparse meteorological spatio-temporal sequence prediction.
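The regional detection scores cited in the abstract above (POD, FAR, CSI) follow the standard contingency-table definitions used in forecast verification. The sketch below computes them for binary grids and reproduces the relative changes quoted in the abstract. The function names and the toy grids are illustrative assumptions, and FAR is computed as the false alarm ratio, false alarms / (hits + false alarms), which is assumed to match the paper's usage.

```python
def detection_scores(predicted, observed):
    """Return (POD, FAR, CSI) for two equally shaped binary grids."""
    hits = misses = false_alarms = 0
    for p_row, o_row in zip(predicted, observed):
        for p, o in zip(p_row, o_row):
            if p and o:
                hits += 1
            elif o:
                misses += 1
            elif p:
                false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else 0.0
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else 0.0
    total = hits + misses + false_alarms
    csi = hits / total if total else 0.0
    return pod, far, csi


def relative_change(old, new):
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old


if __name__ == "__main__":
    observed = [[1, 1, 0], [0, 1, 0]]   # toy lightning mask
    predicted = [[1, 0, 0], [1, 1, 0]]  # toy forecast mask
    print(detection_scores(predicted, observed))
    # Figures quoted in the abstract: CSI 0.579 -> 0.663, FAR 0.351 -> 0.216.
    print(round(100 * relative_change(0.579, 0.663), 1))
    print(round(100 * relative_change(0.351, 0.216), 1))
```

Because correct negatives do not enter any of the three scores, the very large non-lightning background does not inflate them, which is why they suit sparse events such as lightning.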
A Dual-polarized Magnetoelectric Dipole Antenna Array with Differential Feeding
TANG Li, WANG Zhihui, ZHAO Luyu
Available online  , doi: 10.11999/JEIT260505
Abstract:
  Objective  This work aims to address key challenges in 5G millimeter-wave terminal antennas by designing a compact, high-performance dual-polarized array. While existing designs often face trade-offs among bandwidth, beam-scanning range, and integration complexity, this study proposes a novel differentially-fed magnetoelectric dipole array. The core innovation involves a stacked stripline-slot-stripline balun to enable efficient single-ended-to-differential conversion and optimized array design. The objective is to realize an integrated solution that simultaneously achieves wideband operation, low cross-polarization, wide-angle scanning, and high density, advancing practical antenna technology for 5G millimeter-wave applications.  Methods  The research employs a structured design methodology, beginning with the development of a novel stacked differential balun based on a stripline-slot-stripline configuration to achieve efficient single-ended-to-differential conversion. Subsequently, a single-polarized magnetoelectric dipole antenna element is designed and integrated with this balun, with its performance thoroughly characterized. Finally, the design is extended by orthogonally integrating two such elements to form a dual-polarized unit, which is then used to construct a 1×4 linear array. The entire process involves iterative full-wave electromagnetic simulation and optimization to balance key performance metrics, including wideband impedance matching, high port isolation, wide beam-scanning capability with stable gain, and effective suppression of grating lobes and mutual coupling.  Results and Discussions  The optimized 1×4 dual-polarized differentially-fed magnetoelectric dipole antenna array with an element spacing of 4.6 mm (0.4λ@26 GHz) achieves an excellent trade-off between grating lobe suppression and inter-element coupling reduction. 
The measured –10 dB reflection coefficient bandwidths reach 25–29.4 GHz for the +45° polarization port and 25–27.7 GHz for the –45° polarization port (Fig. 18), with slight matching differences arising from the incomplete structural symmetry of the baluns under two polarization modes (Fig. 13). At the 26 GHz operating frequency, both polarization modes of the array deliver a peak gain of 10.7–11 dBi, supporting effective wide-angle beam scanning of ±60° with a gain attenuation of no more than 3 dB in the main lobe (Fig. 21). The measured radiation performance is highly consistent with the simulated results, with minor errors caused by the extreme dimensional sensitivity of the millimeter-wave band and slight deviations in high-precision processing and test assembly. Moreover, the array maintains stable low cross-polarization characteristics and high port isolation across the entire operating band due to comprehensive optimization measures including equal-length feed lines, symmetric layout, ground pad shielding and metalized via electromagnetic isolation (Fig. 16), which effectively suppress inter-element mutual coupling and parasitic radiation, and ensure the consistency of radiation performance for dual polarization modes, thus meeting the stringent performance requirements of 5G millimeter-wave terminal antenna modules.  Conclusions  This paper presents a dual-polarized magnetoelectric dipole antenna array with differential feeding for 5G millimeter-wave applications. Through the design of a novel stacked stripline-slot-stripline balun and the optimization of the radiating structure and array layout, a balanced performance integrating wide bandwidth, high gain, low cross-polarization, and wide-angle scanning is achieved. The differential balun enables efficient single-ended-to-differential conversion with excellent amplitude and phase balance across the target band. 
The implemented 1×4 array, with an optimized element spacing of 4.6 mm (0.4λ), demonstrates a simulated peak gain of 11 dBi at 26 GHz and supports effective beam scanning over ±60° with a gain variation of less than 3 dB. The overall design validates the feasibility of utilizing a differentially-fed magnetoelectric dipole architecture to meet the stringent requirements of 5G millimeter-wave terminals for compact, high-performance antenna modules. Future work may focus on scaling the array to larger configurations and further integration with beamforming integrated circuits (BFICs).
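As a quick sanity check on the spacing quoted in this abstract, the sketch below converts 4.6 mm to wavelengths at 26 GHz and compares it with the textbook grating-lobe-free spacing limit for a uniform linear array, d/λ < 1/(1 + |sin θ_max|). That criterion and the function names are illustrative assumptions, not taken from the paper, which does not state its own design rule here.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def spacing_in_wavelengths(spacing_m, freq_hz):
    """Element spacing expressed as a fraction of the free-space wavelength."""
    return spacing_m * freq_hz / C

def max_grating_free_spacing(scan_deg):
    """Textbook upper bound on d/lambda that keeps grating lobes out of
    visible space when a uniform linear array scans to +/- scan_deg."""
    return 1.0 / (1.0 + abs(math.sin(math.radians(scan_deg))))

d = spacing_in_wavelengths(4.6e-3, 26e9)   # values quoted in the abstract
limit = max_grating_free_spacing(60.0)      # scan range quoted in the abstract
print(f"d = {d:.3f} wavelengths, limit for 60 deg scan = {limit:.3f}")
```

Under these assumptions the quoted spacing sits at about 0.40λ, below the roughly 0.54λ limit for a ±60° scan, which is consistent with the grating-lobe suppression described in the abstract.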
Construction of MDS Entanglement-Assisted Quantum Error-Correcting Codes
QU Yuanyue, GAO Jian
Available online  , doi: 10.11999/JEIT251251
Abstract:
  Objective  Entanglement-Assisted Quantum Error-Correcting Codes (EAQECCs) provide an effective way to protect quantum information by using pre-shared entanglement between the sender and receiver. Existing constructions of EAQECCs mainly rely on classical cyclic or constacyclic codes and often require strong algebraic constraints, which limit the range of achievable parameters. This paper develops a general and systematic framework for constructing new families of EAQECCs from Twisted Reed-Solomon (TRS) codes over finite fields. The study has two aims. The first is to extend classical Reed-Solomon-based code design to the twisted setting so that richer algebraic structures can be used. The second is to determine the exact number of maximally entangled pairs required to attain the quantum Singleton bound. The final objective is to construct Maximum-Distance Separable (MDS) EAQECCs with greater flexibility and broader parameter ranges than existing methods.  Methods  The proposed method starts from the definition of TRS codes over finite fields. A twist parameter is introduced into the generator matrix, which changes the structure of the corresponding parity-check matrices. By systematically analyzing the associated coset-sum matrices in the twisted and untwisted cases, the rank of the relevant matrix product is determined. This rank equals the number of required entangled pairs and therefore provides the theoretical basis for the construction of EAQECCs. A detailed algebraic analysis shows that the matrix contains a submatrix with entries \begin{document}$ {M}_{l,j}=\displaystyle\sum\nolimits_{y\in W}{\left({\xi }^{j}y\right)}^{tl} $\end{document}, which simplifies to \begin{document}$ t\zeta^{jl} $\end{document} under suitable group-theoretic conditions. The resulting matrix is a Vandermonde matrix, and its full rank gives an explicit characterization of the entanglement structure. This property is then used to construct MDS EAQECCs. 
Based on these results, two families of EAQECCs are derived according to the number of entangled pairs. The corresponding parameters are tabulated and are shown to satisfy the quantum Singleton bound with equality, which confirms that the constructed codes are MDS.  Results and Discussions  Comprehensive parameter analysis and explicit examples verify the theoretical results. Comparative analysis further shows the flexibility of the proposed framework. Unlike previous constructions that require divisibility conditions such as \begin{document}$ a\mid (q+1) $\end{document} and \begin{document}$ a\mid (q-1) $\end{document}, the present approach remains applicable under broader algebraic settings and thus extends the feasible range of code parameters. This difference is summarized in the remark section and verified numerically. A systematic comparison with existing MDS EAQECCs (Table 4) reveals several new parameter regimes that are not accessible with classical or cyclic-code-based constructions. In particular, the proposed method yields larger code lengths and more flexible entanglement consumption rates \begin{document}$ \dfrac{c}{n} $\end{document}, which improves both the efficiency and the generality of EAQECCs. The algebraic consistency observed across all tested cases supports the correctness and general applicability of the TRS-based framework.  Conclusions  This study establishes an algebraic framework for constructing MDS EAQECCs from TRS codes. By rigorously analyzing the rank properties of coset-sum matrices, the required entanglement is determined precisely, and the conditions under which the constructed codes attain the quantum Singleton bound are identified. Two broad classes of MDS EAQECCs are obtained, corresponding to \begin{document}$ a\mid \left(q+1\right) $\end{document} and \begin{document}$ a\mid \left(q-1\right) $\end{document}, respectively, and both are verified by explicit examples and tabulated results. 
Compared with existing studies, the proposed approach not only generalizes earlier constructions but also extends the achievable parameter space to cases not covered by Reed-Solomon-code- or cyclic-code-based frameworks. The derived codes show improved structural flexibility, clearer algebraic characterization, and potential value for high-performance quantum information systems. This work therefore provides a unified perspective for the development of algebraically optimized EAQECCs and offers a basis for future studies of TRS-based quantum code families and their efficient encoding implementations.
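The MDS property referred to in this abstract can be checked mechanically. The sketch below assumes the commonly used form of the entanglement-assisted quantum Singleton bound for an [[n, k, d; c]] code with d ≤ (n + 2)/2, namely n + c − k ≥ 2(d − 1), and calls a code MDS when equality holds. The function names and the example parameter tuples are hypothetical and are not taken from the paper's tables.

```python
def ea_singleton_slack(n, k, d, c):
    """Return (n + c - k) - 2(d - 1) for an [[n, k, d; c]] EAQECC.

    Zero means the assumed EA-quantum Singleton bound holds with equality.
    The bound in this form is only applied for d <= (n + 2) / 2.
    """
    if 2 * d > n + 2:
        raise ValueError("bound in this form assumes d <= (n + 2) / 2")
    return (n + c - k) - 2 * (d - 1)

def is_mds_eaqecc(n, k, d, c):
    """True when the parameters meet the bound with equality."""
    return ea_singleton_slack(n, k, d, c) == 0

def entanglement_rate(n, c):
    """Entanglement consumption rate c / n."""
    return c / n

# Hypothetical parameter tuples, chosen only to exercise the check.
print(is_mds_eaqecc(10, 4, 5, 2), entanglement_rate(10, 2))
print(ea_singleton_slack(10, 4, 4, 2))
```

A helper of this kind is useful for screening tabulated parameters, but it only tests the arithmetic of the bound; it says nothing about whether a code with those parameters actually exists.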
A Study of the Effects of Amplitude and Phase Errors on Angle-Measurement Accuracy in Phased Array Radar under Interference Cancellation Conditions
ZHAN Siheng, ZHOU Liang, SHEN Ruobin, ZHANG Jiahao, WANG Bin, MENG Jin
Available online  , doi: 10.11999/JEIT251195
Abstract:
  Objective  The electromagnetic environment is becoming increasingly complex, and mainlobe suppression jamming degrades the detection performance of phased array radars. Adaptive Interference Cancellation (AIC) can suppress such jamming. However, it may distort the mainlobe pattern and introduce azimuth angle-measurement errors. Most existing studies focus on interference cancellation mechanisms, whereas the angle-measurement errors caused by cancellation have received limited attention. Receive-channel amplitude and phase errors can further reduce angle-measurement accuracy. This paper investigates the effect of receive-channel amplitude and phase errors on the angle-measurement errors of monopulse phased array radar without a difference-difference channel.  Methods  A monopulse phased array radar without a difference-difference channel is analyzed. Receive-channel amplitude and phase errors are modeled by normal distributions. The mean represents the systematic offset, and the standard deviation represents random fluctuation. The operating principles of phased array radar receivers, monopulse radar systems, sum-difference angle measurement, and mainlobe suppression jamming cancellation are first described. Two angle-measurement models are then derived: an ideal reference model and an amplitude and phase error model. Under ideal interference-free and error-free conditions, the effective angle-measurement range of the radar is ±2.5°. The jamming source is set at –1.2°, and the corresponding angle-measurement results are used as the reference for subsequent experiments. Monte Carlo simulations, with 100 independent tests for each parameter set, are performed to analyze the statistical characteristics of the angle-measurement errors. Heatmaps are used to present the absolute errors and their variation trends.  Results and Discussions  (1) Without receive-channel amplitude and phase errors, the jamming angle remains fixed at –1.2°. 
Before interference cancellation, the target indication angle is consistent with the true value. After cancellation, the absolute error between the target indication angle and the true value near the beam normal is no more than 0.1°. However, a cancellation null near the jamming angle causes abrupt changes in the azimuth indication, and the error increases as the target moves away from the beam normal. (2) Before cancellation, the azimuth angle-measurement error increases with the absolute amplitude-error mean and the incident angle. The error reaches more than 0.06° when the amplitude-error mean is ±0.9 dB and the incident angle is ±2.5°. Within an incident-angle range of ±2°, the error is generally below 0.02°. When the amplitude-error mean is fixed, the error increases with the amplitude-error standard deviation. When the phase-error standard deviation is fixed, the error increases with the absolute phase-error mean. The error exceeds 0.15° at a phase-error mean of ±0.9° and reaches approximately 0.6° at a phase-error standard deviation of 6° and an incident angle of ±2.5°. (3) After cancellation, the effect of phase error is strongest at an incident angle of 0.5°, where the azimuth angle-measurement error reaches approximately 0.4°. Outside this region, the error is generally controlled within 0.2° and decreases rapidly as the target moves away from the beam normal.  Conclusions  This paper quantifies the effect of receive-channel amplitude and phase errors on azimuth angle-measurement errors before and after interference cancellation. The main conclusions are as follows. First, amplitude and phase errors both cause random fluctuations in azimuth angle measurement, and phase errors have a stronger effect than amplitude errors. Second, in the absence of jamming, azimuth angle-measurement errors are smallest near the beam normal and increase as the target approaches the boundary of the effective angle-measurement range. 
Third, under jamming and cancellation conditions, the azimuth angle-measurement error reaches its peak near the beam normal and then decreases rapidly. This study provides guidance for azimuth angle-measurement error assessment, error budgeting, and mainlobe suppression jamming cancellation in engineering applications. Future work will focus on non-normal amplitude and phase errors, calibration dynamics, multiple-jamming-source scenarios, and experimental validation.
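The Monte Carlo procedure described in this abstract (normally distributed channel errors, 100 trials per parameter set) can be sketched for the interference-free case as follows. This is a minimal illustration under stated assumptions: a 16-element uniform linear array with half-wavelength spacing, a difference beam formed by subtracting the two array halves, and closed-form inversion of the ideal monopulse ratio. None of these settings come from the paper, and the interference cancellation stage is not modeled.

```python
import cmath
import math
import random

def monopulse_estimate(theta_deg, n=16, amp_db=(0.0, 0.0),
                       phase_deg=(0.0, 0.0), rng=None):
    """Sum-difference angle estimate for one trial.

    amp_db and phase_deg are (mean, std) of the per-channel errors.
    """
    rng = rng or random.Random(0)
    psi = math.pi * math.sin(math.radians(theta_deg))  # phase step, d = lambda/2
    total, diff = 0j, 0j
    for i in range(n):
        k = i - (n - 1) / 2                      # element index centred on zero
        gain = 10 ** (rng.gauss(*amp_db) / 20)   # amplitude error in dB
        err = math.radians(rng.gauss(*phase_deg))
        x = gain * cmath.exp(1j * (k * psi + err))
        total += x
        diff += x if k > 0 else -x               # right half minus left half
    ratio = (diff / total).imag                  # ideal value: tan(n * psi / 4)
    return math.degrees(math.asin(4 * math.atan(ratio) / (n * math.pi)))

def mean_abs_error(theta_deg, trials=100, seed=1, **errors):
    """Average absolute angle error over independent trials."""
    rng = random.Random(seed)
    return sum(abs(monopulse_estimate(theta_deg, rng=rng, **errors) - theta_deg)
               for _ in range(trials)) / trials

print(mean_abs_error(2.5, phase_deg=(0.0, 6.0)))
```

Sweeping `theta_deg` and the error means or standard deviations with `mean_abs_error` gives the kind of grid that a heatmap of absolute errors would be drawn from.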
Phase Shift-Based Covert Backdoor Attack Strategy in Deep Neural Networks
ZHANG Heng, XIA Yu, REN Yan, DU Linkang, ZHANG Zhikun
Available online  , doi: 10.11999/JEIT251145
Abstract:
  Objective  The proliferation of Deep Neural Networks (DNNs) in safety-critical domains such as autonomous driving and biomedical diagnostics has raised serious concerns about their vulnerability to adversarial threats, particularly backdoor attacks. In these attacks, hidden triggers are embedded during training, causing models to behave normally on clean inputs while producing malicious outputs when specific triggers are present. Existing backdoor methods mainly operate in either the spatial domain or the frequency domain, but they face a fundamental tradeoff between Attack Success Rate (ASR) and stealth. Spatial triggers often introduce visible artifacts, whereas frequency-domain amplitude perturbations disrupt spectral energy distributions and can therefore be detected by advanced defenses such as spectral anomaly detection. This study addresses the need for a backdoor paradigm that simultaneously achieves high attack performance, minimal perceptual distortion, and robustness against state-of-the-art defense methods. The objective is to develop a frequency-domain backdoor attack based on phase manipulation, which is better aligned with human visual perception and structural consistency, thereby overcoming the limitations of existing methods.  Methods  FDPS integrates frequency-domain phase manipulation, perceptual similarity screening, and standard data poisoning. The method first converts input images from RGB to Y'CbCr color space. This conversion isolates the chrominance channels while preserving the luminance component. Discrete Fourier Transform (DFT) is then applied to the chrominance components to obtain complex frequency spectra. Phase information is computed with the atan2 function, and selected high-frequency components are shifted to embed the trigger. Image reconstruction is performed through Inverse Discrete Fourier Transform (IDFT). The framework further incorporates Learned Perceptual Image Patch Similarity (LPIPS) filtering. 
This filter removes generated samples that do not satisfy the similarity threshold. The screening process ensures that all retained triggers remain visually imperceptible. The accepted poisoned samples are assigned the target class labels and then combined with the clean training data according to standard protocols.  Results and Discussions  FDPS achieves near-perfect ASR, reaching 99%, while maintaining Benign Accuracy (BA) across three datasets and two network architectures (Table 1). The method embeds triggers by manipulating phase information in the Cb and Cr chrominance channels through Fourier transforms, and LPIPS filtering helps preserve visual stealth. Experimental results show that poisoned images retain semantic focus, as confirmed by Grad-CAM visualizations that remain aligned with the clean-image patterns (Fig. 4). The method also shows strong resistance to defense mechanisms. Under Neural Cleanse, FDPS yields an anomaly index of 1.73, which is below the detection threshold of 2 (Figs. 3-5). Under STRIP, the entropy distribution of poisoned samples substantially overlaps with that of clean samples. Additional analysis shows that high-frequency phase perturbation achieves strong attack performance with limited poisoning. In particular, on the GTSRB dataset, FDPS achieves 99% ASR with only 2% poisoned training samples, while minimizing the effect on model utility (Fig. 6; Table 3).  Conclusions  An end-to-end frequency-domain strategy is proposed to embed covert triggers into image classification models while preserving fidelity on clean samples. By shifting selected high-frequency phase components in the chrominance channels and applying LPIPS-based filtering, FDPS achieves 99% ASR with negligible BA loss and minimal visible artifacts. It also evades representative detection methods, including Grad-CAM, Neural Cleanse, Adversarial Neuron Pruning (ANP), and STRIP. 
These findings indicate that high-frequency phase perturbation constitutes an effective and stealthy backdoor mechanism. Future work should extend this strategy to broader modalities and develop dedicated frequency-domain anomaly detectors as principled countermeasures.
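The phase-shifting step described in this abstract can be illustrated on a single chrominance channel. The sketch below is not the authors' implementation: the 8×8 channel, the shift `phi`, the `min_freq` threshold that defines "high frequency", and all function names are assumptions, and the colour-space conversion and LPIPS screening are omitted. It shifts the phase of high-frequency DFT bins while keeping conjugate symmetry, so the reconstructed channel stays real and the magnitude spectrum is unchanged.

```python
import cmath
import math
import random

def dft2(grid, inverse=False):
    """Naive 2-D DFT of a square grid (adequate for tiny illustrative sizes)."""
    n = len(grid)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for x in range(n):
                for y in range(n):
                    acc += grid[x][y] * cmath.exp(
                        sign * 2j * math.pi * (u * x + v * y) / n)
            out[u][v] = acc / (n * n) if inverse else acc
    return out

def embed_phase_trigger(chroma, phi=0.3, min_freq=6):
    """Shift the phase of high-frequency bins by +/- phi (radians)."""
    n = len(chroma)
    spec = dft2(chroma)
    for u in range(n):
        for v in range(n):
            if min(u, n - u) + min(v, n - v) < min_freq:
                continue                      # leave low frequencies untouched
            mirror = ((-u) % n, (-v) % n)
            if mirror == (u, v):
                continue                      # self-conjugate bin must stay real
            sign = 1 if (u, v) < mirror else -1   # opposite shift on the mirror bin
            mag, ang = abs(spec[u][v]), cmath.phase(spec[u][v])  # phase via atan2
            spec[u][v] = cmath.rect(mag, ang + sign * phi)
    return [[z.real for z in row] for row in dft2(spec, inverse=True)]

rng = random.Random(1)
chroma = [[rng.uniform(0, 255) for _ in range(8)] for _ in range(8)]
poisoned = embed_phase_trigger(chroma)
```

Because only phases change, a detector that inspects spectral energy alone sees the same magnitudes before and after embedding, which is the property the abstract contrasts with amplitude-based triggers.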
Research on Inverse QR Decomposition Optimization for Sparse Adaptive System Identification Algorithms
PENG Yi, ZHANG Pengfei, WANG Xiaoyong, GAO Junqi, LI Changlong, ZHANG Zhiyuan, SUN Tianxiang
Available online  , doi: 10.11999/JEIT250562
Abstract:
  Objective  Traditional sparse-regularized Recursive Least Squares (RLS) algorithms, namely L1/L0-norm Recursive Least Squares (L1/L0-RLS), have theoretical advantages in sparse parameter-space estimation and are widely used in system identification and channel equalization. However, under limited numerical precision, iterative covariance matrix computation may cause rounding errors to accumulate. This can lead to divergence and instability in the least-squares solution.  Methods  To address this problem, an improved algorithm based on the Inverse QR Decomposition (IQRD) framework is proposed. The framework suppresses rounding-error accumulation in traditional regularized RLS algorithms. It also removes the back-substitution step for weight coefficients required in conventional QR decomposition. These features improve numerical robustness and system identification efficiency in finite-precision environments. Specifically, L1-IQRD-RLS and L0-IQRD-RLS algorithms are constructed under an L1/L0-constrained IQRD architecture. A general recursive expression for the weight coefficients is derived. An automatic parameter selection mechanism is also incorporated into the algorithm framework to solve the dynamic optimization problem of the sparse regularization parameter.  Results and Discussions  Monte Carlo simulations are conducted to evaluate the sparse constraints and robustness of the proposed algorithms. The results show that L1-IQRD-RLS and L0-IQRD-RLS maintain long-term numerical stability in an 11-decimal-place fixed-point computing environment. Compared with traditional algorithms, the proposed algorithms show clear advantages in system sparsity representation, parameter estimation variance, and covariance matrix condition number. Measured-data verification further confirms that the improved algorithms maintain numerical stability under limited-precision conditions and are more robust than traditional methods. 
The measured-data results also show that the regularized RLS algorithms optimized by the IQRD framework have advantages in system sparsity representation, parameter estimation, and numerical stability. Their iterative convergence success rate is higher than that of traditional methods.  Conclusions  This paper addresses sparse system identification in adaptive filtering. Traditional sparse-regularized RLS algorithms still face numerical stability problems under limited numerical precision. To solve this problem, an IQRD framework is constructed to reduce the numerical ill-conditioning caused by accumulated rounding errors in sparse-regularized RLS algorithms. The proposed method improves numerical robustness in low-precision environments. In addition, an automatic parameter selection mechanism is incorporated into the algorithm framework. This reduces repeated parameter tuning and supports stable performance optimization under sparse constraints. In practical electromagnetic signal processing, system identification and beamforming are limited by the finite precision of hardware implementation and often exhibit inherent system sparsity. The proposed algorithm provides a targeted solution. Its finite-word-length robustness suppresses numerical divergence during adaptive weight updates and supports stable implementation on fixed-point processors. The sparse constraints also match the physical characteristics of sparse systems and improve estimation accuracy. This study provides a practical algorithm for high-performance and high-stability sparse-constrained systems on precision-limited hardware platforms.
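For orientation, the sketch below shows the conventional covariance-form exponentially weighted RLS recursion, that is, the baseline whose iterative covariance update this abstract identifies as the source of rounding-error accumulation. It is not the IQRD variant and carries no L1/L0 regularization; the forgetting factor, initialization constant, and the sparse test system are assumptions chosen only for illustration.

```python
import random

def rls_identify(inputs, desired, taps, lam=0.99, delta=0.01):
    """Conventional RLS system identification (covariance form).

    lam is the forgetting factor; P is initialised to I / delta.
    """
    w = [0.0] * taps
    P = [[(1.0 / delta if i == j else 0.0) for j in range(taps)]
         for i in range(taps)]
    x = [0.0] * taps                                  # tapped delay line
    for u, d in zip(inputs, desired):
        x = [u] + x[:-1]
        Px = [sum(P[i][j] * x[j] for j in range(taps)) for i in range(taps)]
        denom = lam + sum(x[i] * Px[i] for i in range(taps))
        k = [v / denom for v in Px]                   # gain vector
        e = d - sum(w[i] * x[i] for i in range(taps)) # a priori error
        w = [w[i] + k[i] * e for i in range(taps)]
        # Covariance update; this recursion is where finite-precision
        # rounding errors can build up over many iterations.
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(taps)]
             for i in range(taps)]
    return w

rng = random.Random(0)
h = [0.0, 0.0, 1.0, 0.0, -0.5, 0.0, 0.0, 0.0]         # sparse system (assumed)
u = [rng.gauss(0.0, 1.0) for _ in range(300)]
d = [sum(h[i] * u[n - i] for i in range(len(h)) if n - i >= 0)
     for n in range(len(u))]
w_hat = rls_identify(u, d, taps=len(h))
print([round(v, 4) for v in w_hat])
```

In double precision this noise-free example converges without difficulty; the instability discussed in the abstract concerns the same recursion under limited word length.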
A Multi-view Feature Extraction and Dual-edge Contrastive Learning Approach for Image Forgery Detection
XU Zhuang, YE Ziyi, PAN Enkang, LIU Chunxiao
Available online  , doi: 10.11999/JEIT251271
Abstract:
  Objective  With the rapid development and wide use of image editing tools, such as Adobe Photoshop and Meitu, realistic forged images can now be created and disseminated with increasing ease. This trend poses challenges to visual content authentication in journalism, forensic analysis, and social security. Existing image forgery detection methods usually define the task as pixel-wise binary classification. This formulation may cause label conflicts, especially when the same object has different labels in different images. In addition, most methods mainly focus on spatial-domain features and make limited use of complementary information from other views, such as noise-domain clues.  Methods  To address these limitations, this paper proposes an image forgery detection algorithm based on multi-view feature extraction and dual-edge contrastive learning. The detection task is reformulated as intra-image inconsistency detection, which avoids label conflicts caused by conventional pixel-wise classification. To reduce semantic ambiguity near tampered boundaries, a dual-edge contrastive learning strategy is designed. Inner-edge and outer-edge features are extracted and contrasted separately, and non-edge tampered and non-tampered features are also contrasted. This strategy guides the model to focus on difficult edge samples and improves boundary detection accuracy. A dual-branch multi-view feature encoder is further developed to extract complementary forgery clues. The spatial-domain branch uses a High-Resolution Network (HRNet) backbone to extract multi-scale spatial features. A mixture-of-experts gating mechanism dynamically weights features across scales and fuses residuals between adjacent scales, which helps capture subtle forgery traces. 
The noise-domain branch extracts multiple noise-related features, including noise fingerprint features, Spatial Rich Model (SRM) filter responses, Bayar convolution features, max-pooling features, average-pooling residuals, and learnable Fourier-domain features with adaptive masking. A mixture-of-experts strategy is also used to dynamically assign weights to these heterogeneous features according to the characteristics of each input image. During training, the fused multi-view features are optimized using the dual-edge contrastive learning framework, which strengthens discrimination between tampered and non-tampered regions, particularly near their boundaries. During inference, K-means clustering is applied to the learned feature representations to locate tampered regions without explicit pixel labels.  Results and Discussions  Extensive experiments are conducted on widely used benchmark datasets, including NIST, Columbia, COVERAGE, DSO, and CASIA-v1. These datasets cover different forgery types, including splicing, copy-move, object removal, and post-processing. The proposed method consistently outperforms state-of-the-art methods. Compared with the best existing methods, it improves the average permuted F1 (pF1) and permuted Intersection over Union (pIoU) by 26.0% and 10.1%, respectively (Table 3). Visualization results show more accurate localization of tampered regions, especially along tampered boundaries, with fewer false positives and clearer edge delineation (Fig. 5). Ablation studies further verify the effectiveness of each key component, including multi-view feature extraction, the mixture-of-experts fusion mechanism for noise features, and the dual-edge contrastive learning strategy (Tables 4–6).  Conclusions  This paper presents an image forgery detection framework that addresses the limitations of conventional classification-based methods by modeling the task as intra-image inconsistency detection. 
Dual-edge contrastive learning reduces semantic ambiguity at tampered boundaries, and the multi-view feature encoder extracts complementary spatial-domain and noise-domain clues. Experimental results on different datasets show improved detection accuracy and boundary precision. Future work will explore the extension of the inconsistency detection paradigm to additional modalities, such as text, for multimodal forgery detection.
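Clustering at inference produces two groups without saying which one is "tampered", which is why a permuted metric is used. The sketch below assumes that pF1 is the larger of the F1 scores of the predicted mask and its inverse, and it uses one-dimensional toy features with a two-cluster K-means; real feature maps are high-dimensional, and the function names and data are hypothetical.

```python
def two_means(values, iters=20):
    """1-D K-means with k = 2; returns a 0/1 cluster label per value."""
    lo, hi = min(values), max(values)           # initial centroids
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [int(abs(v - hi) < abs(v - lo)) for v in values]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        if g0:
            lo = sum(g0) / len(g0)
        if g1:
            hi = sum(g1) / len(g1)
    return labels

def f1(pred, truth):
    """F1 score of a binary mask against the ground truth."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def permuted_f1(pred, truth):
    """Best F1 over the two possible cluster-to-class assignments."""
    inverted = [1 - p for p in pred]
    return max(f1(pred, truth), f1(inverted, truth))

features = [0.10, 0.20, 0.15, 0.90, 0.95, 0.85]  # toy per-pixel features
truth = [1, 1, 1, 0, 0, 0]                       # 1 = tampered
mask = two_means(features)
print(f1(mask, truth), permuted_f1(mask, truth))
```

In this toy case the clusters are separated correctly but labelled the opposite way round, so the plain F1 is zero while the permuted score is one.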
A Tensor Framework for ISAC: Information Fusion-Enhanced Channel Estimation and Target Localization
YU Weijia, DU Jianhe, CHEN Yuanzhi, HE Jing, ZHANG Peng, GUAN Yalin
Available online  , doi: 10.11999/JEIT251371
Abstract:
  Objective  Communication and sensing systems are moving toward higher frequency bands, larger antenna arrays, and smaller hardware. Their hardware architectures, channel characteristics, and signal processing methods are therefore becoming increasingly similar. This trend supports Integrated Sensing And Communication (ISAC), in which joint estimation of channel and sensing target parameters has become a key research topic. Existing studies have achieved joint estimation of these two parameter categories within a unified tensor framework, but two limitations remain. First, most studies focus on parameter estimation and do not further convert multidimensional estimates into accurate localization of Scatterer Points (SPs), the Mobile Transmitter (MT), and sensing targets. This limitation prevents a complete spatial characterization of the wireless propagation environment. Second, the fusion of channel and sensing target parameter information has received limited attention, which restricts further improvement in parameter estimation and localization accuracy.  Methods  To address channel/sensing target parameter estimation and localization in millimeter-Wave (mmWave) Multiple-Input Multiple-Output (MIMO) ISAC systems, this paper proposes a tensor decomposition algorithm based on information fusion. First, a unified fourth-order PARAllel FACtor (PARAFAC) model is constructed at the Base Station (BS) for uplink channel and sensing target parameter estimation. To reduce computational complexity, the fourth-order tensor model is transformed into a third-order form, and the Trilinear Alternating Least Squares (TALS) method is used to estimate three factor matrices. The special structure of one factor matrix is then exploited. A closed-form decomposition is used to decouple the coupled factor matrix, and Angle of Departure (AoD), Angle of Arrival (AoA), time delay, Doppler shift, and coefficients are extracted from the four estimated factor matrices. 
Based on these estimates, the MT, SPs, and sensing targets are localized separately using geometric relationships. The estimation accuracy of SPs is further improved by fusing Doppler shift and position information from SPs and sensing targets. The Cramér-Rao Bound (CRB) is derived as a theoretical performance benchmark for the five types of parameters.  Results and Discussions  The first simulation experiment shows that the proposed algorithm and the Optimized Quadrilinear Alternating Least Squares (Op-QALS) algorithm outperform the Co-SVD-BALS algorithm in terms of Root Mean Square Error (RMSE) for channel/sensing target parameter estimation and localization (Fig. 2, Fig. 3, Fig. 4). With information fusion, the proposed algorithm achieves the best Doppler shift and position estimation performance for SPs (Fig. 2(d), Fig. 4(a)). This advantage occurs because the proposed algorithm and Op-QALS fully exploit the multidimensional structure of the received signal. The fusion operation further improves the estimation capability of the proposed algorithm, whereas Co-SVD-BALS accumulates errors during stepwise factor matrix estimation. In terms of Average Processing Time (APT), the proposed algorithm requires slightly more time for localization than Co-SVD-BALS, but far less time than Op-QALS (Table 1 and Table 2). The proposed algorithm therefore achieves accurate parameter estimation and localization at a reasonable computational cost. The second simulation experiment shows that, under two Signal-to-Noise Ratio (SNR) levels, the localization accuracy of all algorithms improves as \begin{document}$ K $\end{document} increases. The proposed algorithm maintains SP and MT localization accuracy comparable to that of Op-QALS, while requiring much lower APT (Fig. 5). The fusion operation does not substantially increase the APT of the proposed algorithm (Fig. 5(d)). 
The third simulation experiment indicates that increasing \begin{document}$ {M}_{\mathrm{RE}}\left(M_{\mathrm{RE}}^{\mathrm{s}}\right) $\end{document} and \begin{document}$ N $\end{document} improves the ability of the proposed algorithm to resolve multipath signals, thereby yielding more accurate localization (Fig. 6).  Conclusions  This paper proposes an information fusion algorithm for channel/sensing target parameter estimation and localization within a unified tensor framework. By exploiting the Vandermonde structure of a factor matrix, the proposed algorithm preserves estimation accuracy while reducing computational complexity. The fusion operation further improves SP parameter estimation and localization without a substantial increase in computational overhead. Future work will extend the algorithm to more general array configurations and examine higher-order tensor processing for multi-BS cooperation and multi-user access scenarios.
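The Vandermonde structure mentioned in this abstract is what lets an angle be read from an estimated factor-matrix column even though tensor decompositions recover columns only up to a complex scale. The sketch below is a generic shift-invariance estimate, not the paper's closed-form decomposition; it assumes a uniform linear array with half-wavelength spacing, and the function names are hypothetical.

```python
import cmath
import math

def steering_column(theta_deg, m):
    """Vandermonde column [1, z, z^2, ...] for a half-wavelength ULA."""
    z = cmath.exp(1j * math.pi * math.sin(math.radians(theta_deg)))
    return [z ** i for i in range(m)]

def angle_from_vandermonde(col):
    """Least-squares estimate of the generator z from the shift relation
    col[1:] = z * col[:-1], then conversion of its phase to an angle."""
    num = sum(a.conjugate() * b for a, b in zip(col[:-1], col[1:]))
    den = sum(abs(a) ** 2 for a in col[:-1])
    z = num / den
    return math.degrees(math.asin(cmath.phase(z) / math.pi))

# An unknown complex scaling, as left by a PARAFAC fit, does not affect z.
scaled = [(0.3 - 0.7j) * a for a in steering_column(20.0, 8)]
print(angle_from_vandermonde(scaled))
```

The ratio form cancels any common scale factor on the column, so no separate normalization step is needed before extracting the angle.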
A Social-Aware Ant Colony Optimization Algorithm with Reproductive Division of Labor for MCS Task Allocation
SHEN Xiaoning, SHE Juan, WANG Zhilong, LI Jiayuan
Available online  , doi: 10.11999/JEIT260018
Abstract:
  Objective  With the rapid development of handheld and wearable smart devices, Mobile Crowd Sensing (MCS) has become an efficient data collection paradigm. Effective task allocation can improve system efficiency, requester and participant satisfaction, and platform sustainability. Existing models often neglect task skill requirements, do not use participants’ social networks as auxiliary execution resources in emergencies, and overlook the effect of collaboration efficiency on team-task quality. To address these issues, this paper proposes a Social-Aware MCS Task Allocation model (SAMCSTA) with two objectives: maximizing total platform revenue and total task sensing quality. Social networks are used to build a two-layer collaboration framework of platform participants and social-network friends, which expands available execution resources and improves allocation flexibility. For complex tasks, participant sensing capability is quantified, and collaboration efficiency is introduced to optimize team composition.  Methods  This paper proposes a Multi-objective Ant Colony Optimization based on Reproductive Division of Labor (MACORDL) algorithm. The main innovations are as follows. First, the ant colony is divided into four collaborative subpopulations: queen ants, male ants, scout ants, and worker ants. Local enhancement, memetic crossover, knowledge transfer, and other search strategies are designed for these subpopulations to form a hierarchical collaborative search framework. Second, a statistical-learning-based mating selection strategy is designed to support intelligent transfer of elite genes. Third, the short-term contribution of each subpopulation is predicted from historical performance, which enables dynamic and adaptive allocation of computational resources. Fourth, a cooperative update mechanism for node pheromones and participant pheromones is designed to establish a dual-layer search guidance system.  
Results and Discussions  The evaluation uses 8 synthetic instances and 4 real-world instances. Performance is measured by HyperVolume Ratio (HVR) and Inverted Generational Distance (IGD). The Wilcoxon rank-sum test at a significance level of 0.05 is used for statistical comparison. The results show that MACORDL achieves the best HVR and IGD on most instances (Table 2, Table 3). On average, MACORDL improves HVR and IGD by 16.41% and 18.04%, respectively, compared with the second-best algorithm. Visual comparisons further show that the Pareto front obtained by MACORDL has better convergence, distribution uniformity, and breadth (Fig. 4). Although its fine-grained local search can still be improved for a few large-scale instances, MACORDL shows stable performance and good scalability across different problem scales. It helps the platform obtain task allocation schemes with higher revenue and better sensing quality.  Conclusions  This paper studies the task allocation problem in MCS systems by considering interactions among platform participants and between participants and their social-network friends. A social-aware MCS task allocation model is established, and MACORDL is proposed to solve it. Comparative experiments on 8 synthetic instances and 4 real-world instances with different scales show that MACORDL outperforms six representative algorithms on most instances. It obtains allocation schemes and paths that yield higher total platform revenue and better task sensing quality, indicating good scalability. MACORDL uses multiple strategies to balance local exploitation and global exploration. However, the current model assumes that all tasks are released at the initial stage and that complete information is available. Participant privacy protection is also not considered. Future work will focus on MCS task allocation models in dynamic and uncertain environments and on privacy-preserving distributed optimization.
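The IGD indicator used in this evaluation has a simple standard definition: the mean distance from each reference Pareto point to its nearest obtained solution. The sketch below pairs it with a non-dominated filter for two maximized objectives (revenue and sensing quality). The objective vectors are made-up numbers for illustration, and normalization of the objectives, which a full evaluation would normally include, is omitted.

```python
import math

def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere
    (both objectives are maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the non-dominated points."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def igd(reference, obtained):
    """Inverted Generational Distance: smaller is better."""
    return sum(min(math.dist(r, o) for o in obtained)
               for r in reference) / len(reference)

# Hypothetical (revenue, sensing quality) pairs.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_front(candidates)
print(front, igd(front, front), igd(front, [(3, 3)]))
```

A set that covers the whole reference front scores zero, while a single extreme solution is penalized for the reference points it leaves uncovered, which is why IGD reflects both convergence and spread.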
Non-Terrestrial Network Architecture and Key Technologies for Civil Aviation
LIU Xiangnan, QIU Yu, HUANG Zhipeng, ZHANG Haijun
Available online  , doi: 10.11999/JEIT260348
Abstract:
  Significance   Civil aviation communication systems are entering a new stage of development driven by the rapid growth of global air transportation, the increasing demand for intelligent air traffic management, and the continuous expansion of in-flight connectivity services. Traditional civil aviation communication systems mainly rely on high frequency radio, very high frequency radio, terrestrial air-to-ground links, and conventional satellite communication systems. These technologies have supported aircraft operation, air traffic control, airline operational communication, and low-rate data transmission for a long time. However, they still face limitations when applied to future civil aviation scenarios characterized by global coverage, high-speed mobility, low latency, high reliability, and service diversification. In particular, terrestrial networks are difficult to deploy along transoceanic routes and in polar regions, deserts, mountains, and remote airspace, while traditional geostationary satellite systems suffer from large propagation delay and limited capacity. Current systems cannot fully meet the requirements of continuous aircraft access, real-time flight monitoring, engine health data transmission, aviation safety communication, and passenger broadband services. Non-Terrestrial Networks (NTNs) provide a promising technical path for overcoming these limitations. By integrating GEOstationary satellites (GEO), Medium Earth Orbit satellites (MEO), Low Earth Orbit satellites (LEO), Very Low Earth Orbit satellites (VLEO), High-Altitude Platform Stations (HAPS), Unmanned Aerial Vehicles (UAV), electric Vertical Take-Off and Landing (eVTOL) aircraft, and terrestrial infrastructures, NTNs can construct a multi-layer air-space-ground integrated communication system. Such a system is able to provide continuous coverage, flexible deployment, resilient connectivity, and differentiated service support for civil aviation. 
NTN is becoming an important enabling technology for future civil aviation communication systems and for the digital and intelligent transformation of the aviation industry.  Progress   This paper reviews the development of NTN technologies for civil aviation and summarizes key research progress from three aspects: network architecture, access and mobility management, and resource management and scheduling. (1) We propose an aviation-oriented NTN networking framework composed of three layers: the satellite edge layer, the airborne core layer, and the terrestrial assistance layer. The satellite edge layer includes GEO, MEO, LEO, and VLEO satellites connected through inter-satellite links. GEO satellites are suitable for wide-area broadcasting and non-real-time services, MEO satellites can support navigation and intermediate-delay services, LEO satellites are suitable for low-latency and high-capacity broadband access, and VLEO satellites can further reduce propagation delay for future near-real-time aviation applications. The airborne core layer includes civil aircraft, HAPS, UAVs, and eVTOL platforms. HAPS can act as a regional relay, edge computing node, or software-defined control carrier, while UAVs and eVTOL platforms can provide flexible low-altitude coverage, emergency communication, and local access support. The terrestrial assistance layer consists of terrestrial base stations and gateway stations, which support air-to-ground communication and satellite-terrestrial interconnection. Civil aviation services can be divided into air traffic control and air traffic management services, airline operational control services, and airline passenger communication or in-flight entertainment services. Through network slicing, these heterogeneous services can be logically isolated and managed over a shared air-space-ground infrastructure. 
In congestion, rain attenuation, or shortened visibility-window scenarios, safety slices should be protected with the highest priority, while passenger service slices can be rate-limited, buffered, or degraded. (2) We analyze the characteristics of NR-NTN access and air-to-ground direct access in civil aviation. NR-NTN can provide continuous coverage for oceanic, polar, desert, and remote flight routes through satellites or HAPS, while air-to-ground direct access can provide low-latency and high-rate links in areas where terrestrial base stations can be deployed. However, aircraft differ significantly from ordinary terrestrial terminals because their flight trajectory, altitude, speed, and route are highly predictable. Therefore, the key issue in aviation NTN access is not only how to execute random access, but how to predict the access window, timing compensation, frequency offset, and target access node before the aircraft enters the coverage area. By using satellite ephemeris, Global Navigation Satellite System information, aircraft trajectory, and velocity parameters, civil aircraft can predict satellite visibility and pre-compute timing advance, scheduling offset, and Doppler compensation before initiating access. This transforms random access from a passive response process into a proactive and predictive access process, thereby improving access certainty and synchronization stability in highly dynamic aviation scenarios. For mobility management, a signaling interaction process for aircraft handover is designed. Based on trajectory prediction and satellite visibility prediction, the network can select a target satellite or gateway with longer residence time and better service capability. Before the aircraft reaches the handover boundary, the source and target network sides can complete context preparation, user-plane path preparation, radio resource reservation, and protocol data unit session update. 
When the handover condition is triggered, the aircraft performs random access to the target satellite or beam and then switches the user-plane path. This “prediction–preparation–fast handover” mechanism can reduce service interruption and maintain session continuity. For safety-critical traffic, priority and isolation policies should remain consistent and auditable throughout session preparation, handover execution, and path switching. (3) We discuss computing and caching resource management in civil aviation NTN. As onboard computing capability is limited and aviation applications generate increasing computing demands, NTN can provide mobile edge computing and caching services through LEO satellites, HAPS, UAVs, and inter-satellite cooperation. The paper introduces several computing offloading modes, including on-orbit satellite collaborative offloading, network-level integrated offloading, and cloud-edge-terminal hybrid offloading. These mechanisms can support tasks such as aviation monitoring, trajectory analysis, intelligent inference, and in-flight service optimization. In addition, caching mechanisms such as onboard satellite caching, inter-satellite cooperative caching, and named-data-networking-based content caching can improve content delivery efficiency and service continuity. Cache placement should consider content popularity, regional demand prediction, visibility windows, cache prefetching, and cooperative cache sharing among different satellite layers.  Conclusions   NTN can effectively complement traditional civil aviation communication systems by filling coverage gaps in remote and oceanic airspace, enhancing service continuity, and supporting differentiated aviation services. The proposed aviation-oriented NTN architecture integrates multi-orbit satellites, HAPS, UAVs, civil aircraft, and terrestrial infrastructures into a unified framework. 
The on-demand isolated slicing mechanism can provide differentiated protection for ATC/ATM, AOC, and APC/IFE services. Ephemeris-map-assisted access and predictive mobility management can improve access reliability and reduce handover interruption in high-speed aviation scenarios. Computing offloading and cooperative caching further enhance the ability of NTN to support intelligent and data-intensive aviation applications.  Prospects   Future civil aviation NTN should evolve toward deeper integration of low-altitude networks, space networks, and terrestrial networks. Cross-domain topology visualization, link-state sharing, policy distribution, and programmable logical networks are essential for improving controllability and scalability. In mobility management, integrated cross-domain handover mechanisms should be developed to cope with satellite beam switching, terrestrial cell handover, and air-to-air relay reconstruction. In resource management, communication, navigation, computing, and caching resources should be jointly scheduled and transformed according to aviation service requirements. With continuous advances in NTN architecture, network slicing, predictive access, mobility management, computing offloading, and caching, NTN is expected to provide more efficient, stable, and intelligent communication support for civil aviation and to promote the digital transformation of future air transportation systems.
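The predictive access idea summarized above (pre-computing timing advance and Doppler compensation from ephemeris and trajectory data) can be illustrated with a short geometric sketch. This is only a first-order, non-relativistic illustration and not the procedure of the reviewed paper: the function name `predict_access_parameters`, the Cartesian coordinate inputs, and the example numbers are all assumptions made for this example.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def predict_access_parameters(sat_pos, sat_vel, ac_pos, ac_vel, carrier_hz):
    """Hypothetical helper: first-order delay, timing advance, and Doppler shift.

    Positions are in metres and velocities in m/s, all in one shared
    Cartesian frame (an assumption of this sketch).
    Returns (one_way_delay_s, timing_advance_s, doppler_hz).
    """
    # Line-of-sight vector from the aircraft to the satellite.
    los = [s - a for s, a in zip(sat_pos, ac_pos)]
    dist = math.sqrt(sum(x * x for x in los))
    unit = [x / dist for x in los]

    # Range rate: positive when the satellite and aircraft move apart.
    rel_vel = [sv - av for sv, av in zip(sat_vel, ac_vel)]
    range_rate = sum(u * v for u, v in zip(unit, rel_vel))

    delay = dist / C
    timing_advance = 2.0 * delay  # round trip on the service link only
    doppler = -range_rate / C * carrier_hz  # positive when approaching
    return delay, timing_advance, doppler


# Example: satellite 600 km away, closing at 7.5 km/s, 2 GHz carrier.
delay, ta, doppler = predict_access_parameters(
    sat_pos=(600e3, 0.0, 0.0), sat_vel=(-7500.0, 0.0, 0.0),
    ac_pos=(0.0, 0.0, 0.0), ac_vel=(0.0, 0.0, 0.0), carrier_hz=2e9)
```

In practice the positions and velocities would come from satellite ephemeris and the aircraft navigation solution, and feeder-link delay would have to be handled separately; both are outside this sketch.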
Data-driven Sliding-mode Disturbance-rejection Formation Control for Quadrotor UAV Swarms Under Uncertain Disturbances
LI Qianxiong, LU Xiaoqing
Available online  , doi: 10.11999/JEIT260050
Abstract:
  Objective  Quadrotor Unmanned Aerial Vehicle (UAV) cooperative formation can increase payload capacity and extend the operational range. However, quadrotor UAVs are highly nonlinear and underactuated systems. Differences in size and actuator hardware further weaken the effectiveness of model-based formation-control methods. Therefore, disturbance-rejection formation control is needed for quadrotor UAV swarms with unknown internal models and uncertain external disturbances.  Methods  To address the difficulty of precise modeling for quadrotor UAV swarm formation under uncertain disturbances, this paper proposes a data-driven sliding-mode disturbance-rejection formation control method. First, a data-driven formation-control model is established using the input and output states of each UAV and its neighboring UAVs. Then, an extended state observer and an integral sliding-mode formation controller are designed to estimate uncertain disturbances online and achieve robust formation control. Finally, stability analysis is conducted to derive sufficient conditions under which all UAVs achieve sliding-mode disturbance-rejection formation. The proposed method is verified through simulations and experiments under an unknown system model and uncertain disturbances.  Results and Discussions  The simulation results show that multiple quadrotor UAVs can maintain the desired formation geometry in a wind-disturbed environment (Fig. 4). The formation position error converges to within 0.1 m in 15 s and reconverges rapidly after a 7 m/s gust is applied (Fig. 6). The velocity curves also show rapid convergence among the UAVs (Fig. 5). The experimental results indicate that three UAVs can follow the trajectory of the virtual leader while maintaining the desired triangular formation (Fig. 17). The formation error is mostly kept within 0.1 m (Fig. 18). When the observation matrix fluctuates strongly between 10 s and 20 s, the corresponding formation error is relatively large. 
When the observation matrix curve becomes smoother between 20 s and 30 s, the formation error also decreases (Fig. 20). Compared with traditional model-based formation-control methods and existing data-driven methods, the proposed method reduces the formation error by 41% and shortens the formation response time by 40%.  Conclusions  This paper proposes a data-driven sliding-mode disturbance-rejection formation control method for quadrotor UAV swarms with unknown internal models and uncertain external disturbances. Under an unknown quadrotor UAV model and a 7 m/s wind disturbance, the proposed method keeps the formation error below 0.1 m. It also reduces the formation error by 41% and shortens the formation response time by 40% compared with traditional model-based formation-control methods and existing data-driven methods. Future work will study multilayer data-driven formation control for heterogeneous UAV-UGV swarm systems. It will also optimize computational cost and scalability in large-scale and complex application scenarios.
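As a rough illustration of how an extended state observer estimates a lumped disturbance online, the sketch below runs a linear observer on a scalar integrator plant. It is not the paper's data-driven model or its integral sliding-mode controller: the plant, the observer gains, the plain proportional feedback that stands in for the sliding-mode law, and the function name `simulate_eso` are all assumptions chosen for this example.

```python
def simulate_eso(disturbance=1.5, reference=1.0, bandwidth=20.0,
                 gain=5.0, dt=0.001, steps=2000):
    """Toy plant x' = u + d with a linear extended state observer.

    z1 estimates the state x and z2 estimates the unknown constant
    disturbance d. Returns the final (x, z2).
    """
    x = 0.0
    z1, z2 = 0.0, 0.0
    # Gains that place both observer poles at -bandwidth.
    b1, b2 = 2.0 * bandwidth, bandwidth ** 2
    for _ in range(steps):
        # Feedback plus cancellation of the estimated disturbance.
        u = -gain * (x - reference) - z2
        err = x - z1
        # Forward-Euler integration of the plant and the observer.
        x_next = x + dt * (u + disturbance)
        z1 += dt * (z2 + u + b1 * err)
        z2 += dt * (b2 * err)
        x = x_next
    return x, z2


final_x, disturbance_estimate = simulate_eso()
```

The controller never reads `disturbance` directly; it only uses the estimate `z2`, which is the role the observer plays in disturbance-rejection schemes of this kind.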
Research on Secure and Covert Transmission for UAV-assisted Visible Light Communication Systems
WU Mengru, LIN Jiale, LU Weidang, LI Bo, GUO Lei
Available online  , doi: 10.11999/JEIT260239
Abstract:
  Objective  Unmanned Aerial Vehicles (UAVs) can serve as aerial base stations for Visible Light Communication (VLC) because of their mobility and on-demand coverage capabilities. However, air-ground communication links are exposed to open environments, which makes VLC vulnerable to data eavesdropping and malicious detection. To address this issue, this paper proposes a secure and covert transmission strategy for a UAV-assisted VLC system from the perspectives of Physical Layer Security (PLS) and Covert Communication. The proposed strategy jointly optimizes UAV transmit power and hovering altitude to maximize the system secrecy capacity. The optimization is subject to covert communication requirements, illumination requirements, and operational constraints on UAV transmit power and hovering altitude.  Methods  This paper investigates secure and covert communication in a UAV-assisted VLC system. A UAV-assisted VLC system model is first established. In this model, a mobile UAV equipped with a Light-Emitting Diode (LED) is used to establish a VLC link with a legitimate ground user in the presence of an eavesdropper (Eve) and a warden (Willie). An optimization problem is then formulated to maximize the system secrecy capacity by jointly optimizing UAV transmit power and hovering altitude. To solve this problem, a Two-Layer OPtimization (TLOP) algorithm is proposed. The transformed problem is decomposed into two subproblems: an inner-layer transmit power optimization problem and an outer-layer UAV hovering altitude design problem. A closed-form expression for the optimal transmit power is derived for the inner-layer problem. A Particle Swarm Optimization (PSO) algorithm is then developed to solve the outer-layer problem.  Results and Discussions  In the simulations, the proposed optimization scheme is compared with two baseline schemes. First, the convergence of the proposed TLOP algorithm is verified (Fig. 3). 
The results show that the algorithm converges rapidly within a limited number of iterations. Second, the optimal UAV hovering altitude with respect to the UAV horizontal coordinates is illustrated under the spatial distribution (Fig. 4). The results indicate that the optimal hovering altitude decreases as the UAV approaches the legitimate ground user. The secrecy capacity with respect to the UAV horizontal coordinates is then presented (Fig. 5). The secrecy capacity increases as the UAV approaches the legitimate ground user. This is because the legitimate VLC channel gain increases when the UAV is closer to the user. In contrast, when the UAV approaches Eve and Willie, the security and covertness constraints become stricter. The UAV is then forced to reduce its transmit power or increase its hovering altitude, which decreases the system secrecy capacity. Furthermore, the secrecy capacity of all schemes increases as ϵ increases (Fig. 6). This is because a larger ϵ relaxes the covertness requirement. The UAV can therefore adjust its hovering altitude and transmit power more flexibly to increase the system secrecy capacity. In addition, the secrecy capacity decreases as the number of symbols increases (Fig. 7). This occurs because more symbols provide Willie with more signal samples for detection, thereby improving Willie’s detection capability. Finally, the secrecy capacity of all schemes decreases as the uncertainty-region radius of illegal nodes increases (Fig. 8). This trend occurs because greater location uncertainty forces the UAV to address potential threats over a wider area. The UAV must therefore adopt a more conservative strategy under worst-case eavesdropping and detection conditions. Overall, the simulation results confirm that the proposed scheme improves the secrecy capacity of the UAV-assisted VLC system.  Conclusions  This paper investigates secure and covert communication in a UAV-assisted VLC system. 
The objective is to maximize the system secrecy capacity by jointly optimizing UAV transmit power and hovering altitude under covert communication, illumination, transmit power, and hovering altitude constraints. Because the formulated problem is highly non-convex, a PSO-based TLOP algorithm is designed to solve it. The proposed algorithm decomposes the problem into an inner-layer transmit power optimization problem and an outer-layer UAV hovering altitude optimization problem. Simulation results show that the proposed algorithm converges rapidly and improves the system secrecy capacity compared with the baseline schemes.
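The two-layer structure described above (a closed-form inner power solution and a PSO search over altitude) can be sketched on a toy problem. Everything numeric here is an assumption for illustration: the inverse-square channel gains, the covertness cap on Willie's received power, the log-based secrecy proxy, and the names `optimal_power`, `secrecy_proxy`, and `pso_maximize` are not taken from the paper.

```python
import math
import random


def optimal_power(h, p_max=20.0, eps=0.05, d_w=2.0):
    # Inner layer (toy closed form): the proxy below grows with power,
    # so the best power is the largest one that keeps Willie's received
    # power p / (h^2 + d_w^2) at or below eps, capped at p_max.
    return min(p_max, eps * (h * h + d_w * d_w))


def secrecy_proxy(h, d_b=10.0, d_e=30.0):
    # Toy secrecy measure: user rate minus eavesdropper rate.
    p = optimal_power(h)
    gain_user = 1.0 / (h * h + d_b * d_b)
    gain_eve = 1.0 / (h * h + d_e * d_e)
    return math.log2(1.0 + p * gain_user) - math.log2(1.0 + p * gain_eve)


def pso_maximize(objective, low, high, particles=20, iterations=80, seed=0):
    # Outer layer: basic one-dimensional particle swarm optimization.
    rng = random.Random(seed)
    pos = [rng.uniform(low, high) for _ in range(particles)]
    vel = [0.0] * particles
    best_pos = pos[:]
    best_val = [objective(x) for x in pos]
    g = max(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]
    for _ in range(iterations):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            # Clamp the altitude to the allowed range.
            pos[i] = min(high, max(low, pos[i] + vel[i]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


best_altitude, best_secrecy = pso_maximize(secrecy_proxy, 5.0, 50.0)
```

Solving the inner variable in closed form keeps the swarm search one-dimensional, which is the main practical benefit of this kind of decomposition.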
KE-HNS: Knowledge-Enhanced Personalized Recommendation Model with Hierarchical Noise Suppression
XIE Jun, WANG Dantong, ZHANG Bo, CHEN Guijun, LV Jiaqi, LUO Xiongyan
Available online  , doi: 10.11999/JEIT260051
Abstract:
  Objective  In the era of Big Data and Artificial Intelligence (AI), rapid information growth has increased the difficulty of filtering valuable content from redundant data. Personalized recommender systems are key tools for accurate information matching and resource allocation. Knowledge Graphs (KGs) can enrich user-item representations. However, current KG-based recommendation models still face weak noise suppression, coarse-grained user-interest modeling, and imbalanced use of heterogeneous information, which reduce recommendation accuracy. This paper proposes a Knowledge-Enhanced personalized recommendation model with Hierarchical Noise Suppression (KE-HNS), which integrates knowledge enhancement with hierarchical noise suppression. By combining graph representation learning and contrastive learning, KE-HNS addresses noise interference, fine-grained preference modeling, and multi-source information balance, thereby improving recommendation performance.  Methods  KE-HNS adopts a hierarchical noise-suppression paradigm. At the input stage, Input Noise Reduction (INR) is used to reduce noise from two sources. For user-item interactions, a learnable binary mask matrix is used to remove noisy edges. For KG denoising enhancement, triples are scored by importance, low-score triples are identified with a Bottom-K strategy, and noisy triples are masked. At the feature-fusion stage, Isolated Noise Suppression (INS) is used to preserve spatial independence by partitioning entity-attribute spaces according to relation type. This design limits high-order noise propagation and semantic contamination. At the representation-optimization stage, Comparative Noise Suppression (CNS) is implemented through contrastive learning to suppress irrelevant entity noise and strengthen robust semantic signals. To capture fine-grained user interests, Graph Convolutional Networks (GCNs) are used to enhance user representations from historical interactions and related entities. 
Adaptive weight layers further refine item representations by using entity attributes and relations. To balance heterogeneous information, a dual-view contrastive learning mechanism is constructed between the user-item view and the item-entity view. Positive and negative sample pairs are used to adaptively adjust the weights of different information sources. Finally, user and item representations are matched by inner product to generate the Top-K recommendation list.  Results and Discussions  KE-HNS is evaluated on three public datasets, Book-Crossing, MovieLens-1M, and Last.FM, through performance comparison, ablation experiments, denoising evaluation, case analysis, and complexity assessment. For Click-Through Rate (CTR) prediction, KE-HNS outperforms the best baseline models by 0.94%~1.01% in Area Under the Curve (AUC) and 0.43%~0.90% in F1-score (Table 3). For Top-K recommendation, its Recall@K is higher than those of most advanced methods across nearly all K values, with only a slight gap behind CG-KGR on Last.FM (Fig. 7). The ablation results show that all three denoising components contribute to the performance gains (Table 4). The denoising evaluation shows that KE-HNS effectively suppresses noise and maintains high prediction accuracy under noisy conditions (Fig. 8). The complexity analysis further indicates that the model remains feasible for practical deployment (Table 5).  Conclusions  This paper presents KE-HNS, a personalized recommendation model that combines knowledge enhancement with hierarchical noise suppression. By reducing noise interference and balancing collaborative filtering signals with knowledge-aware semantics, KE-HNS improves recommendation accuracy across multiple benchmark datasets. The model still has limitations in computational efficiency and depends on the coverage and completeness of the KG. Future work may focus on computational optimization and dynamic knowledge integration.
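Two of the steps named in the abstract, Bottom-K masking of low-score triples and inner-product matching for the Top-K list, are simple enough to sketch directly. The sketch assumes the importance scores and embeddings are already available; how KE-HNS learns them is not reproduced here, and the function names and sample data are hypothetical.

```python
def mask_bottom_k(triples, scores, k):
    """Drop the k lowest-scoring triples and keep the rest in order."""
    if k <= 0:
        return list(triples)
    order = sorted(range(len(triples)), key=lambda i: scores[i])
    dropped = set(order[:k])
    return [t for i, t in enumerate(triples) if i not in dropped]


def top_k_items(user_vec, item_vecs, k):
    """Rank items by inner product with the user vector, best first."""
    scored = {item: sum(u * v for u, v in zip(user_vec, vec))
              for item, vec in item_vecs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical knowledge-graph triples with importance scores.
triples = [("book_a", "author", "p1"),
           ("book_a", "tag", "misc"),
           ("book_b", "genre", "scifi")]
kept = mask_bottom_k(triples, scores=[0.9, 0.1, 0.5], k=1)

# Hypothetical two-dimensional embeddings.
ranking = top_k_items([1.0, 0.0],
                      {"x": [0.2, 1.0], "y": [0.9, 0.0], "z": [0.5, 0.5]},
                      k=2)
```

Here `kept` retains the first and third triples, and `ranking` lists the two items whose embeddings align best with the user vector.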
Slice Pricing and Access Control with QoS Guarantee for Vehicular Networks
CUI Yaping, ZHANG Feng, WU Dapeng, HE Peng, WANG Ruyan, WANG Pan
Available online  , doi: 10.11999/JEIT251219
Abstract:
  Objective  Vehicular applications have diverse Quality of Service (QoS) requirements that traditional spectrum-focused networks cannot adequately meet. Although network slicing based on Mobile Edge Computing (MEC) provides customized service provisioning, existing methods often fail to jointly consider slice generation and adaptive access control. To address these limitations, this paper proposes a two-stage vehicular network slicing framework that integrates resource-aware slice generation with dynamic pricing and access control. The framework supports efficient resource allocation and slice access management. It also improves service quality, resource utilization, and system adaptability for both the MEC Network Service Provider (MEC-NSP) and vehicles through a Stackelberg game-based interaction mechanism.  Methods  The proposed solution uses a two-layer coupled mechanism consisting of resource pre-allocation and Stackelberg game-based pricing and access control. In the first stage, a three-dimensional resource pre-allocation mechanism jointly optimizes communication, computation, and caching resources to satisfy vehicular latency and bandwidth requirements. The resource allocation problem is formulated as a Mixed-Integer Nonlinear Programming (MINLP) problem. It is then decoupled into uplink and downlink subproblems, which are solved using branch-and-bound and interior-point methods, respectively. In the second stage, a Stackelberg game is developed to balance MEC-NSP profit and vehicle QoS. The MEC-NSP acts as the leader and sets dynamic slice prices. The network controller acts as the follower and determines the optimal slice selection probabilities. This interaction is solved using the Iterative Slices Pricing Algorithm (ISPA), which is proven to converge to a Nash equilibrium.  
Results and Discussions  Simulation results show that the proposed framework consistently outperforms baseline algorithms, including Fixed Slice Pricing, Average Resource Allocation, Random Selection, and Dynamic Combinatorial Double Auction (DCDA), under different network conditions. In bandwidth-constrained scenarios, the proposed framework increases MEC-NSP profit by up to 20.77% compared with the Random Selection approach. When resources are abundant, with 150% capacity, it maintains profit gains of 3–9% over other baselines. The ISPA converges to equilibrium after approximately 175 iterations. The flexible pricing mechanism balances network loads, improves cache hit rates, and reduces resource bottlenecks, thereby supporting high QoS satisfaction.  Conclusions  The proposed dual-layer framework integrates slice generation and pricing for resource-aware network slicing in vehicular MEC environments. By coupling three-dimensional resource pre-allocation with a Stackelberg game-based pricing strategy, the framework improves MEC-NSP profit, resource utilization, and vehicle QoS. Future work will study blockchain-based mechanisms for trusted negotiation and decentralized resource orchestration in cross-domain cooperation under multi-operator and multi-vendor environments.
A Spatiotemporal Coupling Traffic Flow Prediction Model with Dynamic Graph Recursion and State Space
ZHANG Hong, QI Fangzheng, LUO Shengjun, ZHANG Xijun, HOU Liang, HUANG Hairong
Available online  , doi: 10.11999/JEIT251198
Abstract:
  Objective  Accurate traffic flow prediction is a key task in intelligent transportation systems. However, it remains challenging to capture dynamically evolving spatial structures and complex spatiotemporal dependencies in urban road networks. To address these issues, this paper proposes DGGRU-Mamba, a spatiotemporal traffic flow prediction framework that integrates dynamic graph recurrent modeling with a structured state space mechanism. The model jointly captures dynamic spatial dependencies and long-range temporal dependencies.  Methods  DGGRU-Mamba contains two core modules: Dynamic Graph Recurrent Modeling (DGRM) and Spatiotemporal Mamba (ST-Mamba). A spatiotemporal embedding generator is first designed to jointly encode periodic temporal information and node-specific spatial features, thereby supporting adaptive graph construction. DGRM dynamically updates time-varying adjacency structures through Dynamic Graph Gated Recurrent Units (DGGRUs), which enables adaptive modeling of evolving spatial dependencies. ST-Mamba uses structured state transitions to efficiently capture long-range temporal dependencies. In addition, a dual-branch prediction scheme with Forecast and Backcast branches is used to improve multi-step prediction accuracy and reduce cumulative errors.  Results and Discussions  DGGRU-Mamba is evaluated on four benchmark datasets, namely PEMS03, PEMS04, PEMS07, and PEMS08. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) are used as evaluation metrics. Experimental results show that DGGRU-Mamba achieves strong performance on all datasets. On PEMS04, compared with the mainstream attention-based model STAEformer, DGGRU-Mamba reduces MAE, RMSE, and MAPE by approximately 4.2%, 3.8%, and 2.9%, respectively. Its inference time is also shortened by 4.82 s. These results indicate that the proposed framework improves prediction accuracy while maintaining high computational efficiency. 
The performance gains mainly arise from the complementary effects of DGRM and ST-Mamba, which strengthen dynamic spatial dependency modeling and long-range temporal dependency learning with lower computational cost.  Conclusions  This paper proposes DGGRU-Mamba, a spatiotemporal traffic flow prediction framework for modeling dynamic spatial structures and long-range temporal dependencies in complex traffic networks. By integrating dynamic graph recurrent modeling with a structured state space mechanism, the framework achieves a favorable balance between prediction accuracy and computational efficiency. Experiments on multiple benchmark datasets verify its effectiveness and scalability in multi-step traffic flow prediction. Future work will consider external factors, such as weather and traffic events, to further improve its applicability in real traffic scenarios.
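For reference, the three evaluation metrics named above follow standard definitions, sketched here in plain Python. Skipping zero-valued ground-truth entries in MAPE is an assumption of this sketch, and the sample values are made up for illustration; neither is taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent, skipping zero targets."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error metric."""
    return 100.0 * (baseline - proposed) / baseline


# Made-up flow values for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

A statement such as "reduces MAE by approximately 4.2%" corresponds to `relative_reduction` applied to the baseline and proposed MAE values.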
Energy-Efficient Trajectory Planning and Resource Optimization for UAV Relay Communications over Hybrid RF/FSO Links
LI Baolong, PAN Wenwei, JIANG Hao, FENG Simeng, WU Qihui
Available online  , doi: 10.11999/JEIT260139
Abstract:
  Objective  In low-altitude communication networks, hybrid Radio Frequency/Free-Space Optical (RF/FSO) Unmanned Aerial Vehicle (UAV) relaying can ease RF spectrum congestion and improve uplink data aggregation. However, in obstacle-rich urban environments, FSO backhaul links are vulnerable to blockage and intermittent outages. This creates a severe mismatch between the RF access-link rate and the FSO backhaul-link rate. UAV trajectory planning is also constrained by obstacle avoidance and flight dynamics. To address these coupled issues, this paper investigates an energy-efficiency maximization problem. Multiuser Non-Orthogonal Multiple Access (NOMA)-based RF access and the Three-Dimensional (3D) obstacle-avoiding UAV trajectory are jointly optimized, and buffer-assisted RF/FSO rate decoupling is incorporated.  Methods  A time-slotted UAV relaying model is considered, in which multiple ground users upload data to the UAV through an RF link using NOMA. The UAV decodes superposed signals by Successive Interference Cancellation (SIC), and the decoding order in each slot is determined according to the received-power ranking. The successfully received data are then forwarded to a Base Station (BS) through an FSO backhaul link. Urban blockage is modeled using 3D geometric obstacles. A visibility test is used to determine whether each relevant link is in Line-Of-Sight (LOS) or Non-Line-Of-Sight (NLOS), which captures the spatially correlated and time-varying RF access-link rate and intermittent FSO backhaul capacity. To suppress blockage-induced rate mismatch between the RF access link and the FSO backhaul link, an onboard finite-capacity buffer is deployed at the UAV. In each slot, the forwardable data amount is jointly limited by the instantaneous FSO backhaul capacity and the data available in the buffer, and buffer-capacity constraints are imposed to prevent overflow. 
System energy efficiency is defined as the ratio of cumulative data successfully delivered to the BS over the mission horizon to UAV propulsion energy consumption. Propulsion power is modeled as a function of UAV velocity and acceleration to reflect the effect of flight dynamics. Under 3D flight-region boundaries, prescribed start and end locations, discrete-time kinematic equations, maximum velocity and acceleration limits, and obstacle collision-avoidance constraints, a non-convex optimization problem is formulated. The decision variables are cross-slot multiuser transmit powers and the 3D UAV trajectory. An alternating optimization framework is then developed. For a fixed trajectory, propulsion energy is fixed, so maximizing energy efficiency is equivalent to increasing end-to-end successfully forwarded data. This yields a power-optimization subproblem. Because of NOMA coupling and logarithmic rate expressions, this subproblem remains non-convex and is solved by Successive Convex Approximation (SCA). For fixed transmit powers, Particle Swarm Optimization (PSO) is used to search candidate 3D trajectories in continuous space. To ensure feasibility under strict dynamics and safety constraints, Quadratic Programming (QP) projection is used to enforce velocity and acceleration constraints. Collision checks are performed for trajectory waypoints and inter-slot line segments to ensure obstacle-free flight. These two optimization procedures are performed alternately. The resulting joint design satisfies flight-dynamics feasibility and collision-avoidance requirements and improves energy efficiency.  Results and Discussion   Simulations are conducted in an urban airspace with multiple users, a BS, and dense 3D obstacles. Blockage causes frequent LOS/NLOS switching as the UAV moves. Figs. 2 and 3 show the 3D trajectory and its planar projection, respectively. 
Compared with the initial trajectory, the optimized trajectory shows clear detours and necessary altitude adjustments. It achieves collision-free flight while satisfying velocity and acceleration constraints, thereby verifying the feasibility and safety of the proposed trajectory planning method. Fig. 4 shows the convergence of energy efficiency under different user transmit-power budgets. The proposed alternating optimization generally stabilizes within a small number of outer iterations. The converged energy efficiency increases with the power budget, indicating synergy between power control and trajectory adaptation. Fig. 5 shows buffer evolution over time. The buffer gradually accumulates data when the backhaul is blocked or experiences strong fading. It is quickly drained when the UAV enters regions with LOS backhaul and improved FSO capacity. To quantify buffering gain, Fig. 6 compares system energy efficiency between the proposed buffering mechanism and the no-buffer scheme. The proposed mechanism enables store-and-forward temporal smoothing during backhaul interruptions and improves system energy efficiency. Fig. 7 shows energy-efficiency convergence under different buffer capacities. As buffer capacity increases, the converged energy-efficiency level improves. A larger buffer enhances the UAV’s ability to temporarily store incoming data and reduces data accumulation and transmission blockage when RF access-link and FSO backhaul-link rates are mismatched or the backhaul link is constrained. Fig. 8 compares four schemes, namely a non-optimized baseline, a power-optimization scheme, a trajectory-optimization scheme, and the proposed joint power-and-trajectory optimization scheme. The coordinated design of power allocation and obstacle-avoiding trajectory improves end-to-end energy efficiency. Trajectory optimization also plays a more dominant role under blockage-limited conditions.  
Conclusion  This paper investigates a hybrid RF/FSO UAV relaying scheme with NOMA and an onboard buffering mechanism for low-altitude urban communication. Given dense obstacles, frequent blockage, FSO-link susceptibility, and strict flight-dynamics constraints, an energy-efficiency maximization problem is formulated for the joint optimization of multiuser NOMA power allocation and UAV trajectory. An SCA-based power-allocation method and an obstacle-avoiding trajectory design that combines PSO with QP projection are developed. The obtained trajectory satisfies flight-dynamics feasibility and collision-avoidance requirements and improves throughput per unit propulsion energy. Simulation results show that the planned trajectory can avoid obstacles, and that the onboard buffer provides an effective cushion between RF access and FSO backhaul to mitigate rate mismatch. The proposed method consistently outperforms benchmark schemes in energy efficiency. Trajectory optimization is also shown to be generally more effective than power allocation in improving overall system performance.
Secure and Covert MIMO Short-Packet Communication with Location-Uncertain Malicious Nodes
TIAN Bo, YANG Weiwei, YANG Xiaoqin, BAI Mengmeng
Available online  , doi: 10.11999/JEIT260059
Abstract:
  Objective  This paper investigates secure and covert short-packet communication in Multiple-Input Multiple-Output (MIMO) wireless systems with location-uncertain malicious nodes over quasi-static Rician fading channels. In the considered scenario, a legitimate transmitter sends confidential short packets to a legitimate receiver. Meanwhile, multiple monitoring nodes (Willie nodes) attempt to detect whether transmission occurs, and multiple eavesdropping nodes (Eve nodes) attempt to intercept the confidential information. Because malicious nodes may remain silent and their exact locations are unavailable to the legitimate system, their spatial uncertainty poses major challenges to joint covertness and secrecy analysis. To address this problem, a unified analytical and optimization framework is established for secure and covert short-packet transmission. The framework is used to characterize the coupling among covertness, secrecy, and reliability and to improve the Average Effective Secrecy and Covert Rate (AESCR).  Methods  The transmitter adopts Singular Value Decomposition (SVD)-based precoding, and the legitimate receiver applies Maximum Ratio Combining (MRC) to enhance the legitimate link. Monitoring nodes and eavesdropping nodes are modeled as two independent Poisson Point Processes (PPPs) outside a circular protection zone centered at the transmitter. This model captures the spatial randomness of malicious nodes. For covertness analysis, each monitoring node is assumed to perform optimal Likelihood Ratio Test (LRT)-based detection with full knowledge of the system model, noise power, channel state, and codebook information. Using the Chernoff bound and the Bhattacharyya coefficient, a theoretical lower bound on the minimum detection error probability of a single monitoring node is first derived. 
Stochastic geometry is then combined with the distribution of the strongest monitoring node to obtain a tractable lower bound on the average minimum detection error probability. For secrecy analysis, the finite blocklength normal approximation is used to account for decoding error and information leakage penalties. The legitimate channel is statistically characterized under Rician fading conditions, and the strongest eavesdropping node is analyzed through stochastic geometry. Based on these results, an approximate analytical expression for the average secrecy rate is derived. AESCR is proposed as a comprehensive performance metric that jointly reflects reliability, secrecy, and covertness. Under the average covertness constraint and the short-packet length constraint, a joint optimization problem for transmit power and packet length is formulated. By using the monotonic properties of the objective function and the covertness constraint, the original coupled optimization problem is transformed into a one-dimensional search problem.  Results and Discussions  Simulation results verify the accuracy of the theoretical derivations and reveal the effects of key system parameters. Both the simulated average minimum detection error probability and its theoretical lower bound decrease as the packet length increases. Higher transmit power further reduces the detection error probability, indicating that excessive power makes transmission more exposed to monitoring nodes (Fig. 2). Increasing the number of monitoring-node antennas strengthens spatial reception capability and further degrades covertness (Fig. 2). Enlarging the protection zone improves covertness because malicious nodes are forced to remain farther away from the transmitter. However, increasing the monitoring-node density weakens this benefit by raising the probability that a strong monitoring node appears near the protection-zone boundary (Fig. 3). 
The average secrecy rate increases with packet length and gradually approaches the asymptotic secrecy-capacity upper bound because the finite blocklength rate penalty decreases as the packet length grows (Fig. 4). AESCR first increases and then decreases with packet length, confirming the existence of an optimal packet length. This behavior results from the tradeoff between the reduced finite blocklength penalty and increased detection exposure (Fig. 5). Higher malicious-node density and more malicious-node antennas degrade system performance because they enhance both monitoring and eavesdropping capabilities (Fig. 5). Relaxing the covertness constraint improves the achievable AESCR because the system can select a higher transmit power or a more favorable packet length (Fig. 6). Results under different Rician factors show that the proposed analytical framework is applicable to both Rician and Rayleigh fading conditions (Fig. 6). Increasing the number of legitimate receive antennas improves AESCR, and a larger transmit antenna array provides additional SVD precoding gain (Fig. 7). Compared with benchmark schemes, the proposed joint optimization of transmit power and packet length consistently outperforms the scheme with fixed packet length and power-only optimization. This result demonstrates the need to jointly balance reliability, secrecy, and covertness in MIMO short-packet transmission (Fig. 8).  Conclusions  This paper develops a stochastic-geometry-based analytical framework for secure and covert MIMO short-packet communication with location-uncertain multi-antenna malicious nodes. By deriving a lower bound on the average minimum detection error probability, obtaining an approximate analytical expression for the average secrecy rate, and proposing AESCR, the framework reveals the fundamental tradeoff among covertness, secrecy, and reliability under finite blocklength transmission. 
The results show that increasing the number of legitimate transmit and receive antennas improves secure and covert performance, whereas higher malicious-node density and more malicious-node antennas degrade system performance. The existence of an optimal packet length further shows that packet length and transmit power should be jointly designed. The proposed joint optimization method therefore provides an effective solution for secure and covert short-packet transmission in mission-critical and low-latency wireless systems.
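The shrinking finite-blocklength penalty mentioned above can be illustrated with the standard normal approximation for a single complex AWGN link, R ≈ C − sqrt(V/n)·Q⁻¹(ε). The sketch below is a simplification and not the paper's expression: it ignores fading, secrecy leakage, and covertness, and the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def normal_approx_rate(snr, blocklength, error_prob):
    """Normal approximation to the maximal coding rate (bits per channel use)
    of a complex AWGN channel at finite blocklength.

    snr is linear (not dB); the higher-order log(n)/n term is omitted.
    """
    capacity = math.log2(1.0 + snr)
    dispersion = (1.0 - (1.0 + snr) ** -2) * math.log2(math.e) ** 2
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)  # Q^{-1}(error_prob)
    return capacity - math.sqrt(dispersion / blocklength) * q_inv


# The rate penalty shrinks as the packet gets longer.
rates = [normal_approx_rate(10.0, n, 1e-3) for n in (100, 400, 1600)]
```

Because the penalty term scales as 1/sqrt(n), the rate rises toward capacity as the packet length grows, while longer packets also give a detector more observations. That opposing pull is the kind of tradeoff behind the optimal packet length reported for AESCR.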
Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating a Two-row Simultaneous Update Strategy
ZHU Jianyong, CHEN Kun, YANG Hui, NIE Feiping
Available online  , doi: 10.11999/JEIT260111
Abstract:
  Objective  As data structures become increasingly complex, conventional unsupervised clustering methods often fail to achieve satisfactory performance. Semi-supervised clustering has therefore attracted growing attention because it uses limited prior information to improve clustering quality. However, existing methods have two major limitations. First, traditional constrained projection clustering algorithms usually use a two-step independent strategy, in which the projection matrix is learned before k-means clustering is performed. This separation allows projection errors to be propagated directly to the clustering stage, causing accumulated learning errors. In addition, applying pairwise constraints only during projection deviates from the goal of using prior information to guide clustering. Second, many existing methods, including spectral clustering-based approaches, handle pairwise constraints implicitly, for example through eigen-decomposition of a modified similarity matrix. Such implicit processing may not strictly satisfy the constraints, especially Cannot-Link (CL) constraints, which are non-transitive, resulting in high constraint violation rates. To address these issues, this paper proposes a joint optimization method for pairwise constrained Projection Clustering Integrating a Two-row simultaneous Update Strategy (PCITUS). The objective is to unify dimensionality reduction and clustering within a single framework to reduce information loss, while designing an explicit optimization strategy that lowers constraint violations and improves computational efficiency.  Methods  The proposed PCITUS model integrates constrained projection and clustering into a unified objective function for collaborative optimization, with pairwise constraints optimized directly. First, the algorithm uses the transitive property of Must-Link (ML) constraints. Samples belonging to the same ML connected component are merged into a single hyper-point in the feature space. 
This preprocessing step ensures that all ML constraints are naturally satisfied. A trade-off parameter is then introduced to incorporate projection learning into the clustering framework as a regularization term, allowing both components to be jointly optimized under one objective. Prior information is further embedded into the clustering process by transforming pairwise constraints into row-wise constraints on the indicator matrix. An improved coordinate descent method is then used to optimize the discrete indicator matrix directly, which improves computational efficiency and produces better clustering results. A key feature of PCITUS is the two-row simultaneous update strategy for CL constraints. PCITUS explicitly checks CL conflicts by simultaneously evaluating objective function values obtained by moving conflicting rows to suboptimal classes and then selects the case with the higher value.  Results and Discussions  Extensive experiments are conducted on eight benchmark datasets and compared with nine state-of-the-art semi-supervised clustering algorithms. Quantitative results based on ACCuracy (ACC) and Normalized Mutual Information (NMI) demonstrate the superiority of PCITUS (Table 4 and Table 5). PCITUS achieves the best performance on most datasets. In particular, on the Mushroom dataset, NMI is improved by 7.29% compared with the second-best algorithm. The comparison with CNP, a two-step projection method, confirms that the unified framework effectively reduces error propagation and information loss. This effect is also supported by the mutual reinforcement between projection and clustering: a better projection space produces a clearer clustering structure, while a more reasonable clustering structure guides the formation of a more discriminative projection space. The effectiveness of explicit constraint handling is further illustrated (Fig. 1). PCITUS produces no ML constraint violations because of the hyper-point merging strategy. 
For CL constraints, the two-row simultaneous update strategy enables PCITUS to maintain an extremely low violation rate, such as 0.57% on Mushroom and 0.41% on Satimage, greatly outperforming methods that handle constraints implicitly. Additionally, the parameter sensitivity analysis (Fig. 2) shows that PCITUS remains stable across a wide range of trade-off parameter values. The noise sensitivity experiments (Fig. 3a and Fig. 3b) confirm its robustness. The convergence curves (Fig. 3c and Fig. 3d) and runtime comparisons (Table 7) further verify its computational efficiency, showing rapid convergence and a stable objective function value within approximately 10 iterations in most cases.  Conclusions  This paper presents PCITUS, a semi-supervised clustering framework that jointly optimizes pairwise constrained projection and clustering structures. The method addresses the difficulty of optimizing CL constraints and overcomes the limitations of traditional constrained projection clustering frameworks based on a two-step separation scheme. By integrating the projection objective into the clustering framework as a regularizer, the proposed method enables subspace learning and data partitioning to reinforce each other and jointly approach the global optimum. Pairwise constraints are used throughout the learning process, allowing prior knowledge to guide optimization more fully. The coordinate descent method with the two-row simultaneous update strategy directly and accurately allocates samples under CL constraints, significantly reducing constraint violations. Experimental results show that PCITUS outperforms existing algorithms in clustering performance.
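The Must-Link preprocessing step can be sketched with a union-find structure that groups samples by ML connected component. The abstract does not state how a hyper-point is formed from its members; taking the mean of the group is an assumption made here for illustration, and `merge_must_link` is a hypothetical name.

```python
def merge_must_link(points, must_link):
    """Merge samples connected by Must-Link pairs into hyper-points.

    points    : list of feature vectors (lists of floats)
    must_link : list of (i, j) index pairs
    Returns (hyper_points, groups). Each hyper-point is the mean of its
    group, which is an assumption of this sketch.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in must_link:
        parent[find(i)] = find(j)  # ML is transitive, so union the two sets

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    ordered = sorted(groups.values())
    dim = len(points[0])
    hyper_points = [
        [sum(points[i][d] for i in g) / len(g) for d in range(dim)]
        for g in ordered
    ]
    return hyper_points, ordered


hyper, groups = merge_must_link(
    [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [10.0, 10.0]],
    [(0, 1), (1, 2)],
)
```

Since every member of a component maps to one hyper-point, any cluster assignment of the hyper-points satisfies all ML constraints by construction.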
MG-MoE: Routed Multi-Granularity Expert Ensemble
XIAN Fengyu, JIAN Haifang, XIE Zihui, DU Jun, ZHANG Yuanyuan, NING Xin, DONG Miaomiao, WANG Hongchang
Available online  , doi: 10.11999/JEIT260219
Abstract:
  Objective  Fine-Grained Image Recognition (FGIR) aims to distinguish visually similar subcategories that differ only in subtle local patterns. It must also remain robust to large intra-class variations caused by pose changes, occlusion, illumination shifts, and complex backgrounds. In real-world scenarios, these challenges are further intensified by long-tailed category distributions. Rare or difficult classes are more likely to overfit spurious contextual cues and suffer from unstable decision boundaries. Therefore, a conditional computation paradigm is needed, in which complementary inductive biases are separated into specialized expert branches and adaptively combined for each sample. This work aims to develop a routed multi-granularity mixture-of-experts framework that improves discriminative performance under controllable inference cost. It also enhances robustness for difficult samples and long-tailed categories through adaptive sparse expert activation.  Methods  A Multi-Granularity Mixture-of-Experts (MG-MoE) model is proposed. It is a routed ensemble architecture composed of a shared backbone, four heterogeneous experts, and a learnable router that predicts input-conditioned expert weights (Fig. 2). The experts are designed with complementary inductive biases to address key factors in FGIR. MPSA emphasizes global structure and contour-level semantics. PMG captures fine local details through multi-granularity part modeling. TransFG focuses on pose and deformation modeling. PIM improves robustness in cluttered backgrounds through background suppression. To limit interference and reduce unnecessary computation, MG-MoE adopts sparse fusion. Only the Top-K experts, with K=2 by default, contribute to the final prediction during inference. To improve routing stability and generalization, a two-stage optimization strategy is designed. In the first stage, dynamic cluster-level training is performed. 
A cluster-level soft teacher distribution is constructed from validation-set statistics and imposed through Kullback-Leibler (KL) divergence regularization. This process stabilizes routing behavior and promotes effective expert specialization. In the second stage, residual fine-tuning is conducted. The feature-driven routing mechanism is kept unchanged, while the classification heads of the Top-2 experts associated with each cluster are selectively unfrozen. The router and expert heads are then jointly optimized with grouped learning rates. This design reduces fusion bias and strengthens discrimination for difficult samples and long-tailed categories.  Results and Discussions  MG-MoE achieves strong performance on standard FGIR benchmarks. On CUB-200-2011, it obtains 92.89% Top-1 accuracy. This result is higher than those of representative expert backbones used individually, including MPSA (91.23%), PIM (91.17%), and TransFG (90.49%). It also outperforms the multi-granularity baseline PMG (88.32%) (Table 1). On the Bird-1445 sampled set, MG-MoE achieves 96.80% Top-1 accuracy and consistently improves over strong baselines (Table 2). These results indicate that routed multi-expert specialization remains effective in data-limited and highly similar fine-grained scenarios. The efficiency-accuracy trade-off is summarized in Table 3. With Top-2 sparse routing, MG-MoE reaches 92.89% accuracy with a compute budget of 143.9 GFLOPs. It avoids dense expert activation during inference by selecting only the Top-2 experts for each sample, thereby achieving a favorable balance between accuracy and efficiency. Ablation experiments show that increasing K beyond 2 does not yield consistent gains, which suggests that indiscriminate fusion can dilute discriminative evidence. Top-2 fusion produces the best performance, whereas Top-1 fusion is more sensitive to routing errors and larger K values may introduce noise and reduce accuracy (Table 4). 
The role of expert diversity and composition is also analyzed. Two- and three-expert variants generally underperform the full four-expert configuration, indicating that each inductive bias contributes to different fine-grained difficulty factors. In contrast, adding homogeneous experts without new functional diversity brings diminishing or negative gains, which is consistent with increased routing ambiguity and limited expert complementarity (Table 5). These results support the use of a compact set of heterogeneous experts combined with sparse routing. To interpret the learned specialization, category-wise routing statistics are visualized. The expert-category heatmap shows that MPSA receives dominant routing weights across many categories, reflecting the central role of global structure in fine-grained discrimination. PIM and TransFG show higher activation for specific difficult categories, which is consistent with their roles in background suppression and pose and deformation modeling (Fig. 3). Finally, t-SNE visualizations illustrate the qualitative effect of expert fusion on class separability. Shared backbone features show stronger inter-class entanglement among visually similar subcategories. In contrast, fused outputs form clearer clusters with better between-class separation and within-class compactness, indicating a more reliable decision space shaped by routed expert aggregation (Fig. 4).  Conclusions  MG-MoE is a multi-granularity routed mixture-of-experts framework for fine-grained recognition. By combining four complementary experts, Top-2 sparse fusion, and a two-stage optimization strategy for stable routing and calibrated fusion, MG-MoE improves recognition accuracy on CUB-200-2011 and the Bird-1445 sampled set. It also provides interpretable evidence of expert specialization (Table 1, Table 2, Fig. 3, Fig. 4). Ablation results confirm that controlled Top-2 fusion and heterogeneous expert design are key to the observed performance gains. 
Overly dense fusion or homogeneous expert expansion provides limited benefit (Table 4, Table 5).
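Top-K sparse fusion can be sketched as follows: a softmax over router logits gives expert weights, only the K largest are kept, and the kept experts' outputs are combined. Renormalizing the kept weights so that they sum to one is an assumption of this sketch rather than a detail stated in the abstract, and `topk_fuse` is a hypothetical name.

```python
import math


def topk_fuse(router_logits, expert_outputs, k=2):
    """Fuse expert outputs using only the k experts with the largest router weight.

    router_logits  : one score per expert
    expert_outputs : one class-score vector per expert
    Renormalizing the kept weights to sum to 1 is an assumption of this sketch.
    """
    peak = max(router_logits)
    exps = [math.exp(z - peak) for z in router_logits]  # numerically stable softmax
    norm = sum(exps)
    weights = [e / norm for e in exps]
    keep = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    total = sum(weights[i] for i in keep)
    num_classes = len(expert_outputs[0])
    fused = [
        sum(weights[i] / total * expert_outputs[i][c] for i in keep)
        for c in range(num_classes)
    ]
    return fused, sorted(keep)


fused, active = topk_fuse(
    [2.0, 0.0, 2.0, -1.0],
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
)
```

Only the selected experts contribute to the fused scores; in a real system the unselected experts would not be evaluated at all, which is where the compute saving comes from.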
Rotatable-Antenna-Aided Near-Field Wideband Integrated Sensing and Communication System: Hybrid Beamforming Design
XU Hongbo, MO Minghui, XIN Wei, WANG Shuli, WANG Ji, LI Xingwang, ZHENG Le
Available online  , doi: 10.11999/JEIT260023
Abstract:
  Objective  Near-field wideband Integrated Sensing and Communication (ISAC) systems face two main challenges: pronounced near-field effects and wideband beam splitting. These effects reduce communication throughput and sensing reliability, particularly when fixed-orientation antenna arrays and phase-shifter-based beamforming architectures are used. Because such architectures provide limited spatial adaptability and frequency-independent phase control, the spatial-frequency degrees of freedom available in near-field wideband channels cannot be fully used. To address this issue, a Rotatable-Antenna-assisted near-field wideband ISAC architecture is investigated to improve the system sum rate under sensing constraints.  Methods  A near-field wideband ISAC architecture assisted by Rotatable Antennas (RAs) is proposed. By allowing the antenna boresight direction to be adjusted mechanically or electronically, additional angular degrees of freedom are provided at the element level, which enables more flexible spatial coverage and more accurate energy focusing. A True Time Delay (TTD)-based hybrid beamforming architecture is further adopted to provide frequency-dependent phase shifts and compensate for the frequency-independent property of conventional phase shifters. Consistent beam focusing across subcarriers is thus maintained, and wideband beam splitting is effectively suppressed. Based on a spherical-wave near-field channel model that incorporates propagation distance, angular information, and the orientation gain of RAs, a joint optimization problem is formulated to maximize the system sum rate under transmit power constraints, sensing power thresholds, and antenna rotation constraints. Because the resulting problem is highly non-convex, a Penalty-Based Fully Digital Approximation (PBFDA) algorithm is developed. In each iteration, the RA orientations are first optimized by Particle Swarm Optimization (PSO) to improve the weighted channel gain. 
Then, with the antenna orientations fixed, a reduced-dimensional formulation with Successive Convex Approximation (SCA) is used to solve the fully digital beamforming problem. Finally, a manifold-based Block Coordinate Descent (BCD) algorithm is used to jointly optimize the analog beamformer, digital beamformer, and TTD units, so that the hybrid beamforming solution gradually approaches the fully digital solution (Algorithm 1–Algorithm 4).  Results and Discussions  Simulation results verify the effectiveness of the proposed RA-assisted near-field wideband ISAC framework. The proposed PBFDA algorithm converges monotonically within a limited number of iterations, which confirms its numerical stability and efficiency (Fig. 2). Compared with fixed-antenna architectures, the proposed RA-assisted scheme achieves a clear improvement in system sum rate under the same transmit power constraint (Fig. 3). When the system bandwidth increases, the spectral efficiency of TTD-based hybrid beamforming decreases because the limited number of TTD units and the restricted maximum delay weaken frequency-dependent compensation and aggravate beam splitting. By contrast, the optimal fully digital beamforming scheme maintains nearly unchanged spectral efficiency because each subcarrier can be controlled accurately (Fig. 4). When the sensing power threshold increases, the achievable sum rate decreases for all schemes, which reflects the trade-off between communication and sensing. The proposed method, however, consistently outperforms the benchmark schemes (Fig. 5). The effects of antenna number, antenna directivity factor, and maximum rotation angle are also evaluated. Spectral efficiency increases with the number of antennas because of the higher array gain (Fig. 6). As the antenna directivity factor increases, the RA-assisted system attains further gains through adaptive orientation, whereas fixed-orientation and isotropic schemes degrade (Fig. 7). 
A larger allowable rotation range also provides greater spatial alignment flexibility and further improves system performance (Fig. 8). Overall, the proposed architecture improves near-field energy focusing and achieves performance close to that of fully digital beamforming with lower hardware complexity.  Conclusions  A Rotatable-Antenna-assisted near-field wideband ISAC system with a TTD-based fully connected hybrid beamforming architecture is investigated. By jointly using antenna rotation and true time delay, the proposed framework effectively mitigates near-field effects and wideband beam splitting. The developed PBFDA algorithm solves the resulting highly non-convex optimization problem efficiently. Numerical results show that the proposed scheme significantly improves the system sum rate under sensing constraints and approaches the performance of fully digital beamforming, which supports its use in near-field wideband ISAC systems.
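Wideband beam splitting and its compensation by true time delay can be illustrated with a much simpler setting than the one studied in the paper. The sketch below assumes a far-field, half-wavelength uniform linear array with one delay unit per antenna, which is not the paper's near-field RA model; the function name and parameters are hypothetical.

```python
import cmath
import math


def array_gain(num_antennas, carrier_hz, freq_hz, steer_deg, true_time_delay):
    """Normalized gain toward steer_deg at frequency freq_hz for a
    half-wavelength uniform linear array (far-field toy model)."""
    s = math.sin(math.radians(steer_deg))
    total = 0j
    for n in range(num_antennas):
        channel_phase = -math.pi * n * s * freq_hz / carrier_hz
        # A phase shifter applies the phase designed at the carrier frequency,
        # whereas a true-time-delay unit applies a phase that scales with frequency.
        design_freq = freq_hz if true_time_delay else carrier_hz
        weight_phase = math.pi * n * s * design_freq / carrier_hz
        total += cmath.exp(1j * (channel_phase + weight_phase))
    return abs(total) / num_antennas


# 64 antennas, 28 GHz carrier, evaluated at a 30 GHz band-edge subcarrier.
gain_ps = array_gain(64, 28e9, 30e9, 60.0, true_time_delay=False)
gain_ttd = array_gain(64, 28e9, 30e9, 60.0, true_time_delay=True)
```

With phase shifters, the beam designed at the carrier points elsewhere at the band edge, so the gain toward the intended direction collapses. With ideal delays the gain stays at one for every subcarrier, which is the effect the TTD units in the hybrid architecture aim to approximate with limited hardware.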
A Method for Lightning Electromagnetic Signal Identification Using Cross-Layer Deep Feature Fusion
SONG Lin, YANG Jun, CAO Wei, ZHAO Ziqi, NING Yuan, WANG Wenjing, ZHANG Qilin
Available online  , doi: 10.11999/JEIT251134
Abstract:
  Objective   Lightning identification is essential for lightning observation, location, warning, and disaster prevention. Large volumes of Low-Frequency/Very-Low-Frequency (LF/VLF) Lightning Electromagnetic Pulse (LEMP) waveform data require automatic and accurate classification methods. Deep learning has been widely used for feature extraction and classification, providing a feasible approach for LEMP waveform identification. However, anthropogenic electromagnetic interference and natural LEMP signals often overlap in the time and frequency domains. Their waveform features are also complex and diverse, which limits the accuracy and generalization ability of existing identification algorithms. Therefore, a more efficient deep learning model is required to distinguish LEMP signals from non-lightning electromagnetic signals.  Methods   This paper proposes a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) deep neural network model that integrates multi-scale residual convolution and cross-layer feature fusion. The model is designed for binary classification of LEMP and non-lightning electromagnetic signals and enables accurate diagnostic identification of LEMP signals. Using observational data from an LF/VLF lightning magnetic-field detection system, a multi-scale residual network is first used to extract multidimensional features from electromagnetic waveforms layer by layer. The time-domain features output by each convolutional layer are then organized into a cross-layer time-domain feature sequence according to network depth. This sequence is input into the LSTM module for adaptive weighted fusion. This mechanism uses the sequence modeling ability of LSTM to learn the relative importance of features at different hierarchical levels, rather than to model the temporal dynamics of the original waveform.  
Results and Discussions   The proposed CNN-LSTM model achieves a precision of 100%, a recall of 99.82%, an F1-score of 99.91%, and an accuracy of 99.89%. It obtains the best performance across all evaluation metrics. The model effectively identifies LEMP samples and reduces the misclassification of non-lightning samples. The Bayes classifier achieves high precision (93.14%), but its recall is relatively low (80.14%). The Support Vector Machine (SVM) model improves on the Bayes classifier across all metrics, but it remains inferior to the proposed CNN-LSTM model. The Multilayer Perceptron (MLP) and K-Nearest Neighbor (KNN) models also show limitations in precision, recall, and accuracy compared with CNN-LSTM. The Decision Tree (DT) model obtains reasonable results, but its precision and recall are lower than those of MLP and KNN, with a recall of only 88.01%. These results indicate that CNN-LSTM has clear advantages in LEMP waveform identification. This improvement is mainly attributed to the multi-scale residual CNN module, which automatically extracts low-level local features from raw waveform data. Additionally, the LSTM-based adaptive weighted fusion mechanism is applied to feature sequences from different network layers. As a feature integration tool across network depths, its input is an inter-layer feature sequence rather than an original waveform time series. This design improves the flexibility and discriminative ability of feature fusion, enables the model to learn the relative importance of features at different network depths, and supports effective aggregation of discriminative features. A confusion matrix is also generated to evaluate classification performance on the test set. Overall, comparison with baseline models confirms the superiority of the proposed model for LEMP waveform identification.  Conclusions   The CNN-LSTM model effectively identifies LEMP samples and reduces the misclassification of non-lightning samples. 
Compared with baseline models, it shows excellent identification performance in the binary classification of LEMP and non-lightning electromagnetic signals. The results also verify the effectiveness of convolutional feature extraction and LSTM-based cross-layer feature fusion for LEMP waveform identification.
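As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall quoted in the abstract; the helper name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2.0 * precision * recall / (precision + recall)


# Precision 100% and recall 99.82%, as reported for the CNN-LSTM model.
reported_f1 = round(100.0 * f1_score(1.0, 0.9982), 2)
```

The recomputed value rounds to 99.91, which agrees with the F1-score stated in the abstract.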
S4-UNET: A Long-Sequence Modeling Blind Source Separation Method for Single-Channel Co-Channel Overlapped Communication Signals
GAO Shaoyuan, GUO Wenpu, SHI Hao, PENG Ruiyan
Available online  , doi: 10.11999/JEIT251144
Abstract:
  Objective  Blind Source Separation (BSS) of single-channel co-channel overlapped communication signals remains challenging in non-cooperative reception. Conventional multi-channel methods are not applicable because of antenna limitations. Existing deep learning methods also show limited long-sequence modeling ability, high computational cost, and reduced performance for signals with small carrier frequency offsets. These limitations restrict the practical use of BSS techniques in dense electromagnetic environments. An efficient and robust framework is therefore needed to capture long-range temporal dependencies while maintaining computational feasibility.  Methods  S4-UNET integrates the U-NET encoder-decoder framework with the Structured State Space sequence model (S4). A Temporal State Enhancement Module (TSEM) is designed as the backbone block of both the encoder and decoder. It extracts local temporal features through residual learning. To model long-range dependencies, S4 is embedded in the odd-numbered stages of the encoder. This design captures global temporal correlations with near-linear computational complexity. S4 converts sequence modeling into a state-space evolution process and uses the Fast Fourier Transform (FFT) for efficient convolution. Skip connections and the Gated Linear Unit (GLU) are used to preserve fine-grained local details. Multi-scale feature fusion is achieved through skip connections between corresponding encoder and decoder stages. Signal resolution is then progressively restored by interpolation-based upsampling. The model also adaptively tokenizes feature maps in the temporal or channel dimension according to feature scale, which improves sequence representation.  Results and Discussions  Experiments are conducted on simulated datasets with small carrier frequency offsets, including same-modulation mixtures, mixed-modulation mixtures, and different-bandwidth mixtures. 
Public benchmark datasets and a measured dataset collected using hardware are also used. Quantitative results and visualizations (Fig. 3, Fig. 5, Table 5) show that S4-UNET consistently outperforms representative deep learning baselines, including ConvTasNet and CTDCRN, and the classical Time-Delay Embedding Independent Component Analysis (TDE-ICA) algorithm across different signal lengths and modulation schemes. The model maintains robust separation fidelity under randomly distributed carrier frequency offsets and initial phase differences (Table 3), confirming its strong generalization ability. Ablation and sensitivity analyses (Table 6, Table 7, Table 8) show that placing S4 in the odd-numbered encoder stages, using suitable convolutional stride settings, and adopting GLU jointly support a favorable balance between separation accuracy and computational efficiency. The model also maintains competitive inference latency while processing both long and short sequences, indicating its practical value.  Conclusions  S4-UNET addresses the main challenges of single-channel co-channel BSS by combining multi-scale convolutional feature extraction with efficient state-space long-sequence modeling. It achieves superior separation performance, strong robustness to small carrier frequency offsets, and good generalization across different data domains. The present work focuses on dual-source mixtures. Its modular architecture provides a basis for future extensions to mixtures with an unknown number of sources by integrating source number estimation and iterative cancellation strategies.
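The idea that a state-space evolution can be computed as a convolution can be shown with a scalar toy model. This is only an illustrative sketch: S4 uses structured state matrices and evaluates the convolution with the FFT, whereas the code below uses a single scalar state and a direct sum, and all names are hypothetical.

```python
def ssm_recurrent(a, b, c, inputs):
    """Run the scalar recurrence x_k = a*x_{k-1} + b*u_k, y_k = c*x_k."""
    state, outputs = 0.0, []
    for u in inputs:
        state = a * state + b * u
        outputs.append(c * state)
    return outputs


def ssm_convolution(a, b, c, inputs):
    """Compute the same outputs as a causal convolution with kernel K_j = c * a**j * b."""
    kernel = [c * a ** j * b for j in range(len(inputs))]
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]


signal = [1.0, 0.0, -2.0, 0.5, 3.0]
y_rec = ssm_recurrent(0.9, 0.5, 2.0, signal)
y_conv = ssm_convolution(0.9, 0.5, 2.0, signal)
```

The two outputs agree up to rounding. The direct sum above costs quadratic time in the sequence length; replacing it with an FFT-based convolution is what brings the cost down to near-linear for long sequences.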
Box Particle Filter δ-GLMB Algorithm for Multiple Maneuvering Group Targets Tracking
GAN Linhai, WANG Gang, LI Zhihui, SUN Wen, WANG Baotang
Available online  , doi: 10.11999/JEIT251273
Abstract:
  Objective  Targets that move in a coordinated manner or follow similar motion patterns and exhibit collective motion characteristics are often referred to as group targets. Dense group targets, characterized by a large number of closely spaced individuals, suffer from poor measurement resolvability, severe measurement overlap, and frequent target disappearance and reappearance, which makes it difficult to establish stable tracks for individual targets within the group. Such groups are therefore typically treated as a whole, and the kinematic state of the centroid and the extended shape are jointly estimated. To improve tracking accuracy and computational efficiency for multiple maneuvering group targets under nonlinear measurements, an interacting multiple model group box-particle δ-generalized labeled multi-Bernoulli (IMM-GBP-δ-GLMB) algorithm is proposed. Tracking efficiency under nonlinear measurements is improved through the box particle filter (BPF) method. Improving the likelihood function of the GBP algorithm enhances tracking of the extended shape, and introducing the IMM algorithm enhances tracking of the centroid kinematic state, so the overall tracking accuracy is improved. Finally, integration with the GLMB filter enables tracking of an unknown number of maneuvering group targets.  Methods  Existing algorithms primarily capture the area-based overlap between the predicted extended state of group targets and the measurement distribution while neglecting shape similarity. To address this limitation, the likelihood function of the BPF is modified. Geometric parameters, such as the semi-major axis, semi-minor axis, and inclination angle, are incorporated into the likelihood function, which improves the modeling of similarity between the predicted extended state and the measurement distribution and yields higher prediction accuracy. 
This is particularly beneficial in scenarios involving maneuvering group targets, where the inclination angle of the extended shape changes frequently as the group maneuvers. With group motion modeled by the IMM, a model index is appended to the kinematic state of each box particle’s centroid. The model index and the centroid kinematic state are jointly estimated in each iteration, so that the mode transitions of individual box particles are tracked, which further improves tracking accuracy. Finally, the improved IMM-GBP filter is embedded into the labeled random finite set framework to derive the IMM-GBP-δ-GLMB algorithm, which enables effective tracking of multiple maneuvering group targets.  Results and Discussions  Simulation experiments compare the proposed IMM-GBP-δ-GLMB algorithm with the IMM sequential Monte Carlo δ-GLMB (IMM-SMC-δ-GLMB) filter. Estimation accuracy for centroid state, extended state, measurement rate, and target number remains comparable for multiple group targets, and the emphasis is placed on computational efficiency. In the given simulation scenario, the proposed algorithm achieves a 3.8-fold improvement in timeliness at the cost of a loss of about 8.5% in tracking accuracy. For the scenarios with two and three group targets, the average tracking time growth rate of the proposed algorithm is 96% of that of the IMM-SMC-δ-GLMB filter, showing good temporal robustness to increasing group target numbers. Hence, the proposed algorithm has strong practical value.  Conclusions  This paper addresses the tracking problem of multiple maneuvering group targets under nonlinear measurement conditions by proposing the IMM-GBP-δ-GLMB algorithm. 
The main contributions are as follows: (1) By improving the likelihood function of the BPF, we enhance the algorithm's ability to measure the similarity between the target's extended shape and the measurement distribution, which in turn improves the tracking accuracy of the group target state. (2) By labeling the motion model for each box particle, we track the transition of the target's motion state during the filtering process. This allows the filter to achieve higher tracking accuracy with fewer box particles, thereby improving computational efficiency. (3) Integrating the IMM-GBP method into the δ-GLMB framework yields the final IMM-GBP-δ-GLMB filter and realizes effective tracking of multiple maneuvering group targets.
GNN-driven Beamforming and Resource Allocation for RIS-assisted MISO-OFDMA Multi-group Multicast System
MA Yu, DING Chunxia, JIN Weijie, LI Xiao, JIN Shi
Available online  , doi: 10.11999/JEIT251381
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RISs) have strong potential to improve coverage and Spectral Efficiency (SE) in future wireless networks. However, when RISs are applied to wideband Multiple-Input Single-Output Orthogonal Frequency Division Multiple Access (MISO-OFDMA) systems, their practical benefits are limited by two key challenges. First, RIS reflection coefficients may not match the frequency-selective channel conditions across all subcarriers. Second, subcarrier allocation, Base Station (BS) active beamforming, and RIS passive beamforming are strongly coupled. These challenges become more serious in multi-group multicast scenarios, where shared data streams increase inter-group interference. Therefore, this article proposes a Graph Neural Network (GNN)-driven optimization framework to maximize the system SE through joint active beamforming, passive beamforming, and subcarrier allocation.  Methods  To address the optimization difficulty caused by the strong coupling among subcarrier allocation, BS active beamforming, and RIS passive beamforming, this work develops a model-driven GNN optimization framework. The objective is to maximize the system SE. First, a complete system model containing the BS, RIS, and multi-group multicast users is established (Fig. 1). The formulation includes practical constraints, such as the BS transmit power limit, the unit-modulus constraint of RIS elements, and the binary constraint on subcarrier allocation. To satisfy the multicast requirement, the SE of each group is defined as the minimum SE among all users in that group. This definition further increases the non-convexity of the optimization problem. The first component of the proposed network, GNN1 (Fig. 3), contains an initialization layer and a message-update layer. 
For each subcarrier $n\in \mathcal{N}$, every user is modeled as a node, and the input to GNN1 is the set of channel matrices $\left\{{\mathbf{H}}_{k,n},k\in \mathcal{K}\right\}$. Because standard GNNs process real-valued features, each complex channel vector is decomposed into its real and imaginary parts and used as the node feature representation. Group-level aggregation (Fig. 4) and RIS-level aggregation (Fig. 5) are then performed. GNN2 (Fig. 6) takes the subcarrier-wise embeddings generated by GNN1 as input and constructs an expanded graph with group nodes (Fig. 7) and an RIS node (Fig. 8). By aggregating messages among subcarrier nodes, group nodes, and the RIS node, GNN2 fuses cross-subcarrier information and captures the global coupling among system components. Based on the integrated representation, GNN2 outputs the BS active beamforming matrix and RIS passive beamforming vector. Output-layer normalization is used to satisfy the physical constraints. Finally, given the beamforming parameters, subcarrier allocation is performed using the maximum-SE criterion. The learning objective is defined as maximizing the total SE.  Results and Discussions  The proposed GNN algorithm consistently outperforms all random benchmark schemes, including APG-randAllocate, APG-randActive, and APG-randPassive, across the full transmit power range from 0 to 20 dBm. This advantage indicates that the proposed method can dynamically handle subcarrier allocation and joint active and passive beamforming optimization. It also maintains stable and superior performance under large transmit-power variations. Overall, the system SE of all schemes increases monotonically with BS transmit power because higher transmit power improves the received signal-to-noise ratio and increases the achievable rate. 
Compared with the benchmark methods, the GNN adaptively coordinates BS active beamforming and RIS passive beamforming at different power levels and better uses the reflection gain provided by the RIS. Therefore, the GNN maintains a consistent performance advantage across the full power range. Even in the high-power region, it outperforms APG and LAO, which further verifies its robustness (Fig. 10). When the number of RIS elements varies, the GNN maintains a clear performance advantage over both APG and LAO. In general, the system SE increases with the number of RIS elements because a larger RIS provides higher array gain and improves the equivalent channel conditions. According to the numerical results, the proposed GNN achieves a spectral efficiency of 2.0665 bit/(s·Hz), which is approximately 7.46% and 3.79% higher than those of LAO and APG, respectively. Meanwhile, the average computational time of the GNN is only about 0.0075 s, which is approximately 4% of that required by the benchmark methods. These results demonstrate that the proposed GNN effectively uses the performance gain provided by RIS scaling and achieves a good balance between system performance and computational complexity (Fig. 11 and Table 2). The relationship between system SE and the number of user groups is then examined under fixed settings for the number of transmit antennas and users. The overall SE decreases as the number of user groups increases. This decrease occurs because more multicast groups lead to stronger inter-group interference and because limited subcarrier resources must be shared among more groups. In all considered scenarios, the proposed GNN consistently outperforms LAO. Although its SE is slightly lower than that of APG, the GNN still achieves about 98% of APG performance while requiring only about 4% of the computational time. 
This result indicates that the proposed method can reduce computational overhead while maintaining near-optimal system performance, which is useful for real-time or large-scale deployment (Fig. 12). The generalization ability of the proposed GNN is further evaluated by training the model at a fixed transmit power and testing it over a wide transmit power range from 0 to 20 dBm. The training and testing curves almost overlap, indicating that the proposed GNN generalizes well to unseen transmit power levels. Across the full power range, the GNN consistently outperforms the LAO and APG benchmarks, further confirming its robustness and adaptability under different transmission conditions (Fig. 13).  Conclusions  For the RIS-assisted MISO-OFDMA system, this paper formulates a joint optimization problem for subcarrier allocation, BS active beamforming, and RIS passive beamforming to maximize the system SE. A model-driven GNN method is proposed to solve this problem. Comparative experiments with benchmark algorithms are conducted to validate the proposed method. The results demonstrate that the proposed GNN algorithm consistently outperforms LAO and APG in overall performance. It also exhibits strong robustness under different numbers of user groups and transmit power settings, which supports its potential for practical deployment in complex engineering scenarios.
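The abstract above names three mechanics that can be stated concretely: output-layer normalization for the BS power limit and the RIS unit-modulus constraint, the group SE defined as the minimum user SE, and subcarrier allocation by the maximum-SE criterion. The following is a minimal sketch in plain Python, not the authors' implementation. All function names, the flattened-list representation of the beamformer, and the assumption that each subcarrier is assigned to exactly one group are illustrative choices.

```python
import math


def normalize_unit_modulus(theta):
    """Project raw complex outputs onto the unit circle so that |theta_i| = 1.

    A zero entry has no phase, so it is mapped to 1+0j (an arbitrary choice).
    """
    return [t / abs(t) if abs(t) > 0 else 1 + 0j for t in theta]


def scale_to_power(w, p_max):
    """Scale a flattened beamformer (list of complex) so total power <= p_max."""
    power = sum(abs(x) ** 2 for x in w)
    if power <= p_max or power == 0:
        return list(w)
    scale = math.sqrt(p_max / power)
    return [x * scale for x in w]


def group_se(user_se):
    """Multicast group SE: the minimum SE among the users of the group."""
    return min(user_se)


def allocate_subcarriers(se_table):
    """se_table[n][g] is the SE of group g on subcarrier n.

    Assumption: each subcarrier goes to the single group with the largest SE.
    """
    return [max(range(len(row)), key=lambda g: row[g]) for row in se_table]


if __name__ == "__main__":
    print(normalize_unit_modulus([3 + 4j, -2j]))
    print(scale_to_power([1 + 1j, 2 + 0j], 1.0))
    print(group_se([1.2, 0.7, 2.0]))
    print(allocate_subcarriers([[0.5, 1.0], [2.0, 0.1]]))
```

In a trained network these projections would sit after the final layer, so every output is feasible by construction and the loss can be the negative total SE.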
A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception
PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai
Available online  , doi: 10.11999/JEIT260063
Abstract:
  Objective  Multimodal sentiment analysis is often affected by visual noise from complex environments, image-text sentiment inconsistency, and imbalanced modality contributions. When all modalities are treated without distinction, visual noise can degrade model performance. A robust mechanism is therefore needed to evaluate visual confidence and filter redundant visual information.  Methods  A Multimodal Sentiment Analysis Model with Multi-source Knowledge-guided Visual confidence Perception (MKVP) is proposed (Fig. 1). A multi-source knowledge guidance matrix is constructed using syntactic-dependency, sentiment-intensity, and aspect-focused operators (Fig. 2). Guided by this matrix, the Visual Confidence Perception (VCP) module measures semantic affinity and dynamically suppresses irrelevant visual noise (Fig. 3). A dual-stream parallel interaction module is then used to support deep cross-modal alignment, and a global gated fusion mechanism further adjusts the fusion weights of different modalities.  Results and Discussions  Extensive experiments are conducted on the MVSA-Single, MVSA-Multiple, and HFM datasets. The proposed MKVP model achieves accuracy and F1 scores of 77.56% and 76.70%, 72.72% and 70.66%, and 87.26% and 86.78%, respectively. Compared with the baseline models, the accuracy and F1 score are improved by 2.45% and 3.68%, 2.19% and 2.21%, and 1.83% and 1.91%, respectively (Table 3). Ablation studies show that each component contributes to performance, especially the VCP module, which filters visual noise and improves feature quality (Table 5). Feature-space visualization further confirms that the VCP module refines semantic representations by promoting clearer clustering of samples with the same sentiment polarity (Fig. 4). Case studies on mismatched image-text samples also verify the ability of the model to resolve cross-modal semantic conflicts (Table 6). 
Model-complexity analysis shows that MKVP maintains high computational efficiency and low inference latency (Table 8).  Conclusions  The proposed MKVP framework reduces the effects of visual noise and image-text sentiment inconsistency in multimodal sentiment analysis. By using multi-source knowledge to guide visual confidence perception and combining dual-stream interaction with dynamic gated fusion, the model learns robust sentiment representations from noisy multimodal data. This method provides an efficient and reliable solution for complex social media scenarios.
Research Status and Prospects of Mid-Wavelength Infrared Superlattice Detector Technology
LIU Ming, ZHAO Yaqi, GUAN Xiaoning, ZHANG Fan, LU Pengfei
Available online  , doi: 10.11999/JEIT260083
Abstract:
  Significance   Mid-Wavelength Infrared (MWIR) detectors are widely used in civilian and military applications because of their high sensitivity and excellent temperature discrimination. Type-II SuperLattice (T2SL) materials, especially the InAs/GaSb and InAs/InAsSb systems, have become promising candidates for third-generation infrared photodetectors. This review systematically analyzes the research status and future trends of MWIR T2SL detector technology. It focuses on key photoelectric parameters, including Quantum Efficiency (QE), dark current density, and Specific Detectivity (D*). This work provides a reference for material selection and performance optimization in this rapidly developing field.  Progress   Considerable progress has been made in dark current suppression and photoresponse enhancement for MWIR T2SL detectors. For dark current suppression, advanced barrier structures, such as nBn, XBn, and M-structures, are designed through band-structure engineering. These structures effectively block majority-carrier transport while allowing efficient collection of photogenerated carriers. For instance, an nBn device with an AlAsSb/InAsSb superlattice barrier shows a dark current density of 2.01×10⁻⁵ A/cm² at 150 K (Fig. 1(a,b)). Strain compensation and optimized epitaxial growth further reduce bulk dark current. One device achieves a dark current density of 4.5×10⁻⁷ A/cm² at 140 K (Fig. 2(c,d)). Device process optimization, including two-step etching and Zn-diffusion-based planar junction formation, also reduces surface leakage current (Fig. 3). For photoresponse enhancement, the main strategies include micro/nano-optical structure integration, epitaxial growth optimization, and device process improvement. Monolithically integrated metalenses increase the peak responsivity to 9.01 A/W at 300 K (Fig. 7(d)). Guided-mode resonance architectures enable a room-temperature External Quantum Efficiency (EQE) of approximately 60% (Fig. 8(c)). 
Epitaxial optimization, including stepped absorption layers and interfacial graded doping, increases the QE to 59.4% at 150 K (Fig. 10(c,d)). Device process optimization, such as substrate removal and Anti-Reflection (AR) coating deposition, also improves QE. An average QE of 63.7% is reported in the 3.7–4.8 μm range (Fig. 13(c)). Comparative analysis shows that InAs/GaSb detectors are mainly reported at 77–150 K, whereas InAs/InAsSb detectors show stronger potential for higher-temperature operation, especially near 150 K (Fig. 7, Fig. 8). Overall, dark current densities are generally suppressed below 10⁻⁴ A/cm², and peak QEs approach 70%.  Conclusions  T2SL materials, with tunable band structures and low Auger recombination rates, have become a core material platform for high-performance MWIR detection. Current studies have addressed key challenges in dark current suppression and photoresponse enhancement. Through advanced barrier design and device process optimization, dark current densities have been suppressed to the 10⁻⁶ A/cm² level at approximately 150 K. Through optical and epitaxial engineering, QEs have been increased to approximately 60% or higher. The InAs/InAsSb material system is particularly promising for High-Operating-Temperature (HOT) applications.  Prospects  Future development will focus on four main directions. First, the HOT limit should be further increased, with the goal of maintaining diffusion-limited performance at 180 K or higher. Second, large-format Focal Plane Arrays (FPAs) should be developed based on highly uniform material growth through mature Molecular Beam Epitaxy (MBE), aiming for pixel operability higher than 99%. Third, multicolor and multispectral detection should be expanded by precisely tuning superlattice periods, enabling integrated dual-band or multiband MWIR detection with reduced crosstalk. 
Fourth, new device architectures and coupled physical mechanisms should be explored to extend detector performance and application boundaries.
A Multi-layer Resilient Control Framework for Networked Microgrids against False Data Injection Attacks
HUANG Yu, CAO Zhengyang, HU Songlin, YUE Dong, CHEN Yonghua, YAN Yunsong
Available online  , doi: 10.11999/JEIT250850
Abstract:
  Objective  With the increasing penetration of distributed renewable energy and the growing dependence on cyber-physical infrastructure, Networked MicroGrids (NMGs) are increasingly vulnerable to False Data Injection Attacks (FDIAs). These attacks threaten frequency stability and system security. Traditional secondary control methods are limited by constrained communication resources and fixed sampling mechanisms. They often fail to maintain resilient operation under stealthy FDIAs and dynamic disturbances. To address these challenges, this study develops a multi-layer resilient control strategy that integrates event-triggered communication/control, data-driven attack observation, and double-replay Q-learning. The objective is to improve communication efficiency, attack detection, and stability recovery in NMGs under complex cyber threats.  Methods  The proposed Event-Triggered Control-Radial Basis Function-Double-Replay Q-Learning (ETC-RBF-DRQL) framework integrates an Event-Triggered Control (ETC) mechanism, a Radial Basis Function Unknown Input Observer (RBF-UIO), and a Double-Replay Q-Learning (DRQL) compensator to achieve resilient frequency control in NMGs under FDIAs. The ETC mechanism reduces redundant data transmission while maintaining system stability. The RBF-UIO estimates system states and detects anomalous deviations. After an attack is detected, the DRQL module adaptively generates compensation signals to suppress the attack effect and restore system stability. The framework is formulated using a modular dynamic model of NMGs, which supports stability analysis under communication and attack constraints. Simulation experiments are conducted on a 4-node distributed microgrid testbed in MATLAB/Simulink. The testbed includes different renewable energy sources and realistic communication links to verify the effectiveness and scalability of the proposed approach.  
Results and Discussions  The proposed ETC-RBF-DRQL framework is validated on a 4-node NMG under FDIA scenarios. Simulation results show that the method achieves better overall performance in frequency regulation, communication efficiency, and attack resilience. Specifically, the frequency deviation peak is reduced from 0.0218 Hz under periodic Proportional-Integral (PI) control to 0.0121 Hz. The steady-state average deviation and fluctuation standard deviation are reduced to 0.0097 Hz and 0.0074 Hz, respectively (Fig. 4, Table 2). The average communication event rate decreases to 11.9 pkt/s, corresponding to a 76.2% reduction compared with periodic sampling (Table 2). The proposed framework also maintains reliable attack detection performance, with a detection rate of 91.5%, a false alarm rate of 4.8%, and an Area Under the Curve (AUC) of 0.968 (Table 2). These results indicate that the proposed method can coordinate frequency recovery, communication overhead reduction, and FDIA mitigation in NMGs.  Conclusions  This paper investigates a multi-layer resilient control framework for NMGs under FDIAs and communication constraints. The proposed ETC-RBF-DRQL method integrates event-triggered communication/control, RBF-UIO-based attack detection, and DRQL-based adaptive compensation. It therefore enables closed-loop coordination among anomaly detection, attack suppression, and frequency stability recovery. Simulation results on a 4-node NMG show that, compared with conventional PI-based schemes, the proposed approach reduces frequency deviation peaks and shortens recovery time while lowering communication overhead. Theoretical analysis further confirms its feasibility and stability under bounded estimation errors. This study focuses on sensor-side FDIAs and simplified communication conditions. Future work will consider more complex multi-type attacks and hardware-in-the-loop validation to support engineering applications.
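The abstract above does not state the triggering rule of the ETC mechanism, so the following is only a sketch of how event-triggered communication can cut traffic relative to periodic sampling. It assumes a common relative-plus-absolute threshold rule: a sample is sent only when it differs enough from the last transmitted value. The rule, the parameters `sigma` and `eps`, and the toy frequency-deviation samples are all hypothetical.

```python
def event_triggered_transmissions(samples, sigma=0.1, eps=1e-3):
    """Return the indices of the samples that would be transmitted.

    Assumed rule (not taken from the paper): transmit x_k when
    |x_k - x_last_sent| > sigma * |x_k| + eps. The first sample is always sent.
    """
    sent = []
    last = None
    for k, x in enumerate(samples):
        if last is None or abs(x - last) > sigma * abs(x) + eps:
            sent.append(k)
            last = x
    return sent


def traffic_reduction(samples, sent):
    """Fraction of transmissions saved relative to sending every sample."""
    return 1.0 - len(sent) / len(samples)


if __name__ == "__main__":
    # Toy frequency-deviation samples in Hz (illustrative values only).
    deviation = [0.0, 0.0005, 0.0008, 0.02, 0.0201, 0.0202, 0.0]
    sent = event_triggered_transmissions(deviation)
    print(sent, traffic_reduction(deviation, sent))
```

With a rule of this kind, small fluctuations around the last transmitted value generate no packets, while a step change is reported immediately.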
Graph Representation Learning Driven Adaptive Streaming for Point Cloud Video
LIU Wei, CHEN Ruiyang, WANG Xi, ZHANG Jiawei, XU Jing
Available online  , doi: 10.11999/JEIT251084
Abstract:
  Objective   The increasing demand for immersive media propels point cloud video into the spotlight for applications such as virtual and augmented reality. However, the massive data volume of point cloud streams poses a significant challenge to current network infrastructures, jeopardizing the user’s Quality of Experience (QoE) under limited bandwidth. Existing Adaptive BitRate (ABR) streaming solutions are hindered by two primary limitations. Viewport prediction models often focus solely on temporal features, leading to insufficient accuracy for long-term predictions in complex Six-Degrees-of-Freedom (6DoF) movement. Concurrently, dynamic quality allocation strategies struggle to make optimal online decisions under the uncertainties of prediction errors and network fluctuations, failing to effectively balance conflicting QoE metrics. This research addresses these challenges by proposing an integrated framework that combines high-precision viewport prediction with intelligent, context-aware quality allocation to enhance QoE for point cloud video streaming.  Methods   The proposed method integrates a graph-based viewport prediction scheme with a context-aware quality allocation mechanism. For viewport prediction, an “anchor point graph” is constructed to explicitly model the user’s spatial movement patterns. This graph is processed using representation learning to generate low-dimensional embeddings for each anchor point, which encapsulate rich spatial context. These learned spatial features are concatenated with real-time 6DoF viewport data to form a fused feature sequence. A stacked Long Short-Term Memory (LSTM) network processes this sequence to accurately predict the user’s future viewport trajectory. For quality allocation, the sequential decision-making process is modeled as a contextual bandit problem, adopting the LinUCB algorithm as the decision engine. 
At each decision epoch, a context vector is constructed for each spatial tile, incorporating critical information such as its predicted utility, historical quality level, and location relative to the predicted viewport. The LinUCB algorithm utilizes this context to select an optimal action for each tile, thereby maximizing cumulative QoE under the bandwidth budget, as detailed in Algorithm 1.  Results and Discussions   Extensive simulations validate the framework’s performance using the public 8i Voxelized Full Bodies dataset, real-world user viewport traces, and 5G network bandwidth profiles. In the viewport prediction task, the proposed model significantly outperforms baselines, achieving a stable average F1-score of 0.984 (Fig. 4) and maintaining a consistently low Root-Mean-Square Error (RMSE) over long prediction horizons (Fig. 3). In the end-to-end streaming evaluation, the integrated framework demonstrates remarkable improvements in overall QoE. Cumulative Distribution Function (CDF) plots reveal that the proposed scheme consistently delivers higher QoE, user-perceived utility, and video quality, while incurring the lowest quality fluctuation (Fig. 5). Notably, under fluctuating network conditions, the solution improves the mean QoE by 54.82% compared to the next-best baseline at an average bandwidth of 100 Mbps (Fig. 6), highlighting its efficiency in resource-constrained environments.  Conclusions   This paper presents a complete adaptive streaming framework to address the QoE optimization challenge for point cloud video. By developing a novel 6DoF viewport prediction model that leverages graph representation learning, long-term prediction accuracy is significantly enhanced. Furthermore, by framing dynamic quality allocation as a contextual bandit problem, the system makes intelligent, online decisions that adapt to both prediction outcomes and dynamic network conditions. 
Comprehensive experimental results validate the effectiveness of this integrated approach, which consistently outperforms existing solutions in both prediction accuracy and overall user QoE.
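For readers unfamiliar with LinUCB, the decision engine named in the abstract above, the following is a minimal sketch of the standard disjoint LinUCB rule in plain Python. It is not the paper's Algorithm 1. Treating each arm as a quality level, the feature layout of the context vector, and the value of `alpha` are assumptions, and the bandwidth budget is not modeled here.

```python
import math


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (e.g. quality level)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.n_arms = n_arms
        self.dim = dim
        self.alpha = alpha  # exploration weight (assumed value)
        # A_inv[a] is the inverse of (I + sum x x^T) for arm a; b[a] = sum r * x.
        self.A_inv = [
            [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def _matvec(self, m, x):
        return [sum(m[i][j] * x[j] for j in range(self.dim)) for i in range(self.dim)]

    def score(self, arm, x):
        """Upper confidence bound: theta^T x + alpha * sqrt(x^T A^-1 x)."""
        theta = self._matvec(self.A_inv[arm], self.b[arm])
        a_inv_x = self._matvec(self.A_inv[arm], x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + self.alpha * width

    def select(self, x):
        scores = [self.score(a, x) for a in range(self.n_arms)]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Rank-one update of A^-1 via the Sherman-Morrison formula."""
        a_inv = self.A_inv[arm]
        v = self._matvec(a_inv, x)  # A^-1 x (A^-1 is symmetric)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(self.dim):
            for j in range(self.dim):
                a_inv[i][j] -= v[i] * v[j] / denom
        for i in range(self.dim):
            self.b[arm][i] += reward * x[i]


if __name__ == "__main__":
    bandit = LinUCB(n_arms=2, dim=2, alpha=0.1)
    context = [1.0, 0.0]  # hypothetical tile features
    for _ in range(20):
        arm = bandit.select(context)
        bandit.update(arm, context, 1.0 if arm == 1 else 0.0)
    print(bandit.select(context))
```

In a streaming setting, `select` would be called once per tile with that tile's context vector, and `update` would be called once the resulting QoE contribution is observed.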
Robust Optimization of Low-altitude Communication and Computation Resources in Uncertain Environments
GONG Yucheng, LI Bin, WANG Xinyi, FEI Zesong
Available online  , doi: 10.11999/JEIT260090
Abstract:
  Objective  Low-altitude edge computing networks provide flexible computing services and extended coverage for user equipment. However, quality of service is often degraded by uncertainty in task data size and by Unmanned Aerial Vehicle (UAV) position jitter caused by environmental disturbances. Existing robust methods commonly rely on deterministic uncertainty sets, which tend to be conservative and cannot accurately describe the stochastic distribution of task demands. To address these challenges, a robust energy minimization framework is proposed for multi-UAV-assisted Mobile Edge Computing (MEC) networks. The objective is to minimize the weighted sum of system energy consumption. This is achieved by developing a joint optimization model that coordinates UAV flight trajectories, task splitting decisions, and computation and communication resource allocation. The model explicitly accounts for the dual uncertainties of task data size and UAV trajectory.  Methods  To handle the nonconvexity and strong coupling among optimization variables, the problem is first modeled as a Markov Decision Process (MDP). A comprehensive state space is defined to characterize real-time system dynamics, and a continuous action space is designed for trajectory control and resource management. A Distributionally Robust Optimization Soft Actor-Critic (DRO-SAC) algorithm is then developed to solve the MDP. In this framework, an ambiguity set based on the L1-norm distance is constructed to characterize the distributional uncertainty of the task demand distribution. A maximum-entropy reinforcement learning mechanism is used to learn an optimal policy under the worst-case distribution within the ambiguity set. In this way, UAV trajectories, task splitting, and computation and communication resource allocation are jointly optimized to improve system robustness under dynamic environmental fluctuations.  
Results and Discussions  The performance of the proposed DRO-SAC algorithm is evaluated through simulations. DRO-SAC achieves faster convergence and higher rewards than Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms (Fig. 3). For energy consumption, the proposed method consistently achieves higher efficiency under different user densities (Fig. 4). The robustness of the system against position errors is also verified, with energy fluctuations kept at a low level (Fig. 5). Dynamic trajectory adjustment further confirms that the proposed method can provide effective user coverage while reducing system energy consumption (Fig. 6).  Conclusions  A DRO-SAC-based joint optimization framework is proposed to address uncertainty in task data size and UAV position jitter in multi-UAV-assisted MEC networks. By constructing an ambiguity set for the task demand distribution and optimizing the worst-case expected objective, the proposed method mitigates the limitations of traditional deterministic models in dynamic environments. Weighted system energy consumption is minimized while latency and safety constraints are satisfied. Simulation results demonstrate that the proposed scheme achieves stable convergence and high energy efficiency, even when communication and computation resources are limited and environmental parameters fluctuate strongly.
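The worst-case step inside an L1-norm ambiguity set, as described in the abstract above, has a simple closed form when the task-demand distribution is discrete. The sketch below assumes a finite support, a nominal distribution `p`, a per-outcome cost, and a radius `rho`, all of which are illustrative and not taken from the paper. It returns the distribution within L1 distance `rho` of `p` that maximizes the expected cost, by moving up to `rho / 2` of probability mass from the cheapest outcomes to the most expensive one.

```python
def worst_case_distribution(p, cost, rho):
    """Maximize sum(q[i] * cost[i]) over distributions q with ||q - p||_1 <= rho.

    Greedy solution of the linear program: add mass to the highest-cost
    outcome and remove the same amount from the lowest-cost outcomes.
    """
    n = len(p)
    q = list(p)
    worst = max(range(n), key=lambda i: cost[i])
    budget = min(rho / 2.0, 1.0 - q[worst])  # mass that can be moved
    q[worst] += budget
    for i in sorted(range(n), key=lambda i: cost[i]):
        if budget <= 0:
            break
        if i == worst:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        budget -= moved
    return q


def expected_cost(q, cost):
    return sum(qi * ci for qi, ci in zip(q, cost))


if __name__ == "__main__":
    nominal = [0.5, 0.3, 0.2]   # hypothetical task-size probabilities
    energy = [1.0, 2.0, 4.0]    # hypothetical energy cost per task size
    q = worst_case_distribution(nominal, energy, rho=0.2)
    print(q, expected_cost(nominal, energy), expected_cost(q, energy))
```

A distributionally robust learner would evaluate its objective under `q` rather than under the nominal distribution, which is less conservative than guarding against the single worst outcome.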
MGM-3DUNet: A Multi-scale Edge Semantic Guided Graph Convolutional Sequence Method for Brain Tumor Segmentation
ZHUANG Jianjun, LI Xiang, JING Shenghua, LV Zhenglong
Available online  , doi: 10.11999/JEIT260128
Abstract:
  Objective  Models represented by U-Net and its three 3D variants fuse features in a simplistic way, and their segmentation of the tumor core and enhancing tumor regions is insufficiently fine-grained. Recent approaches such as VM-UNet have improved sequence modeling efficiency, but they focus on global information modeling and remain weak in local detail preservation and edge enhancement. Current methods are therefore still limited in segmentation accuracy and clinical utility.  Methods  MEGM is designed to improve segmentation accuracy at the tumor boundary through learnable edge detection. GCSM combines the local aggregation ability of graph convolution with the efficient long-range modeling of a Mamba-like structure; it enhances semantic consistency while reducing parameters and retains the structural details of small tumors. MCPM is introduced to improve the complementarity of tumor features at different scales through dual-scale fusion.  Results and Discussions   Experiments show that the proposed method achieves better average Dice and HD95 values than the comparison methods. The visualization results (Figure 9, Figure 10) qualitatively confirm that the segmentation results are more accurate after incorporating MEGM. In summary, the proposed method is more sensitive to edge details and contextual correlation while maintaining a low parameter count, and its segmentation performance is robust and accurate.  Conclusions   The method improves the accuracy of tumor boundary prediction by introducing edge enhancement in the shallow layers to emphasize tumor contours. In the bottleneck layer, multimodal local and global semantic information is fused, and multi-scale context features are integrated during the decoding stage. This design achieves high segmentation accuracy at low computational cost and is suitable for deployment on platforms with limited computing power.
A Joint Source-Channel Coding Modulation Scheme for the Transmission of Gaussian Sources
LV Yaping, MA Xiao
Available online  , doi: 10.11999/JEIT251224
Abstract:
  Objective  The Separated Source-Channel Coding (SSCC) scheme has been proven to incur no performance loss as the source block length goes to infinity. However, the SSCC scheme usually requires a large buffer and a long delay, and a single symbol error in the communication channel may cause error propagation. To alleviate these issues, Joint Source-Channel Coding (JSCC) schemes have been investigated for transmitting Gaussian sources. In this paper, a JSCC modulation scheme for the transmission of Gaussian sources is proposed, and a Gaussian source reconstruction scheme and its reconstruction expression are provided.  Methods  The Gaussian source sequence is quantized into a sequence of M-ary symbols by a Lloyd-Max quantizer. For the M-ary quantization symbol sequence, a matching M-ary Fourier Transform Pair (FTP) code is constructed, and the corresponding M-ary Pulse Amplitude Modulation (M-PAM) is adopted. In particular, the modulated M-ary symbol sequences are transmitted in a block Markov superposition manner, which yields the Block Markov Superposition Transmission FTP (BMST-FTP) code. In addition, a constellation Geometry Shaping (GS) scheme is proposed to obtain shaping gain. In the proposed source reconstruction scheme, the system output is a weighted average of the representative elements of the Lloyd-Max quantizer, which replaces the single representative element as the reconstruction value.  Results and Discussions  The simulations are conducted over M-PAM modulated AWGN channels using GF(3) and GF(5) BMST-FTP codes. For FTP codes employing random mapping, the WER approaches the Union Bound (UB) at high SNR. Similarly, the FTP codes with m repeated transmissions exhibit WER performances that approach the corresponding UBs. Furthermore, the WER performance of BMST-FTP codes with memory m matches the UBs in the high SNR region (Fig. 6). 
For Symbol Error Rate (SER), the GF(3) BMST-FTP code outperforms the GF(5) BMST-FTP code (Fig. 7(a)). For the GF(5) BMST-FTP code, the GS can achieve an SER performance gain of approximately 0.3 dB (Fig. 8(a)). In terms of distortion performance, the GF(3) BMST-FTP code outperforms the BMST-FTP code in the low SNR region, whereas in the high SNR region, the GF(5) BMST-FTP code performs better (Fig. 7(b)). Furthermore, compared with other work, the GF(3) BMST-FTP code with m=1 has a similar performance, and the GF(5) BMST-FTP code with m=1 performs better (Fig. 7(b)).  Conclusions  This work has proposed a joint source-channel coding modulation scheme for the transmission of Gaussian sources. In the proposed scheme, two types of BMST-FTP codes were constructed, each matched with a corresponding Lloyd-Max quantizer and M-PAM modulator. Additionally, a Gaussian source reconstruction scheme and its reconstruction expression were provided. Simulation results demonstrate that the appropriate transmission scheme can be selected according to the aim performance. The proposed GS scheme can obtain a gain of SER of about 0.3dB, which can improve the distortion performance of the waterfall area.
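The quantization and reconstruction steps described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a zero-mean, unit-variance Gaussian source, runs the textbook Lloyd-Max iteration in closed form, and assumes that the weights of the averaged reconstruction are per-symbol probabilities supplied by some decoder. The function names (`lloyd_max_gaussian`, `soft_reconstruct`) are hypothetical.

```python
import math


def gaussian_pdf(x):
    """Standard normal density; evaluates to 0.0 at +/- infinity."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(m, iterations=200):
    """Return (levels, thresholds) of an m-level quantizer for N(0, 1)."""
    # Assumed starting point: levels evenly spread over [-2, 2].
    levels = [-2.0 + 4.0 * (i + 0.5) / m for i in range(m)]
    for _ in range(iterations):
        # Nearest-neighbour condition: cell edges are midpoints of the levels.
        edges = [-math.inf]
        edges += [(levels[i] + levels[i + 1]) / 2.0 for i in range(m - 1)]
        edges.append(math.inf)
        # Centroid condition: each level is the conditional mean of its cell.
        levels = [
            (gaussian_pdf(a) - gaussian_pdf(b)) / (gaussian_cdf(b) - gaussian_cdf(a))
            for a, b in zip(edges[:-1], edges[1:])
        ]
    thresholds = [(levels[i] + levels[i + 1]) / 2.0 for i in range(m - 1)]
    return levels, thresholds


def soft_reconstruct(symbol_probs, levels):
    """Weighted average of the representative levels (weights are assumed
    to be decoder-side symbol probabilities that sum to 1)."""
    return sum(p * v for p, v in zip(symbol_probs, levels))


LEVELS_3, THRESHOLDS_3 = lloyd_max_gaussian(3)
```

With hard decisions the output would be a single entry of `LEVELS_3`; the weighted average instead lets uncertain symbols pull the estimate toward neighbouring levels.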
The Method of Hierarchical Attention Mechanism-based Path Planning for Multi-UAV Inspection
FEI Bowen, XING Wenjie, LIU Daqian
Available online  , doi: 10.11999/JEIT260192
Abstract:
  Objective  In the domain of modern power grid maintenance, the deployment of multiple Unmanned Aerial Vehicles (UAVs) for efficient and cooperative inspection has become a pivotal yet challenging task. Existing multi-UAV path planning methods often struggle with inadequate cooperative scheduling and fail to accurately capture the complex topological relationships among heterogeneous nodes—specifically between inspection equipment points and charging stations under strict energy constraints. To address these limitations, this paper proposes a novel Hierarchical Attention mechanism-based Path Planning method for multi-UAV power Inspection (HAPPI). The primary objective is to minimize the total flight distance of a UAV fleet while ensuring all devices are inspected and all UAVs safely return to the base, despite dynamic energy consumption and partial environmental observability.  Methods  The multi-UAV power inspection problem is first formulated as a combinatorial optimization problem with energy constraints and modeled within a Markov Decision Process (MDP) framework. To solve this problem, HAPPI adopts an encoder-decoder architecture powered by a customized hierarchical attention mechanism. The encoder employs a multi-level attention design comprising three dedicated layers: the first layer uses self-attention among all equipment nodes to learn their spatial proximity and visitation preferences; the second layer applies cross-attention between equipment nodes and charging stations to model their energy supply-demand relationships; and the third layer utilizes self-attention among charging stations to explicitly capture the topological structure of the charging network. This hierarchical design enables the model to discern functional differences and dependencies among heterogeneous nodes effectively. The decoder integrates the global graph embedding, the embedding of the last visited node, and the UAV's current remaining energy to generate a context vector. 
Through a single-head attention mechanism, it computes compatibility scores for all candidate nodes, followed by a masking strategy that invalidates infeasible nodes (e.g., visited nodes, unreachable nodes, or nodes that would strand the UAV). The final node selection follows a probability distribution derived via softmax, supporting both greedy and sampling-based decoding strategies. The policy network is trained using a reinforcement learning framework with a baseline network to stabilize training, optimizing the parameters via policy gradient to minimize the expected total path length (Fig. 2).  Results and Discussions  Extensive simulations are conducted across three problem scales: T20C2 (20 devices, 2 stations), T60C6 (60 devices, 6 stations), and T100C10 (100 devices, 10 stations). The training process shows that HAPPI achieves faster convergence and lower final cost compared to baseline Attention Model (AM) and Heterogeneous Attention-based Deep Reinforcement Learning (HADRL) methods (Fig. 4). In comprehensive performance comparisons, HAPPI (sampling) obtains the shortest total path lengths on T60C6 (6.21) and T100C10 (8.41), outperforming five classical meta-heuristic algorithms and the two deep reinforcement learning baselines (Table 3). Notably, HAPPI reduces total path length by approximately 12% on average compared to the baselines. Visualization of planned paths demonstrates that HAPPI generates routes with less crossover and more balanced workload distribution among UAVs, enhancing both safety and efficiency (Fig. 6, Fig. 7). The single-UAV path length distribution further confirms HAPPI's superior load-balancing capability across all scales (Fig. 8).  Conclusions  This paper presents HAPPI, a novel hierarchical attention mechanism-based deep reinforcement learning method for cooperative path planning in multi-UAV power inspection scenarios with multiple charging stations. 
By explicitly modeling the spatial relationships among equipment points, the energy dependencies between equipment and charging stations, and the internal topology of the charging network, HAPPI effectively addresses the shortcomings of existing methods in information aggregation and constraint satisfaction. Experimental results across various scales show that HAPPI achieves superior planning quality, higher computational efficiency, and stronger generalization compared to state-of-the-art heuristic and learning-based algorithms. Future work will extend the framework to multi-objective optimization incorporating time, risk, and energy trade-offs, and validate the method with real-world inspection data.
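The decoding step described in the abstract above (compatibility scores, masking of infeasible nodes, softmax, then greedy or sampling-based selection) can be sketched as follows. This is only an illustration under assumptions: the scores and the feasibility mask are taken as given inputs, and the names `masked_softmax` and `select_node` are hypothetical rather than part of HAPPI.

```python
import math
import random


def masked_softmax(scores, feasible):
    """Softmax over feasible entries only; infeasible entries get probability 0."""
    if not any(feasible):
        raise ValueError("no feasible node to select")
    # Subtract the largest feasible score for numerical stability.
    peak = max(s for s, ok in zip(scores, feasible) if ok)
    weights = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, feasible)]
    total = sum(weights)
    return [w / total for w in weights]


def select_node(scores, feasible, greedy=True, rng=None):
    """Pick the next node index by greedy or sampling-based decoding."""
    probs = masked_softmax(scores, feasible)
    if greedy:
        # Greedy decoding: take the most probable feasible node.
        return max(range(len(probs)), key=probs.__getitem__)
    # Sampling decoding: draw one node according to the masked distribution.
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch a node that is already visited, unreachable, or would strand the UAV is simply marked `False` in `feasible`, so it can never be chosen by either decoding strategy.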
Defending Deepfakes by Attribute-Aware Attack
GAO Fan, YAN Weidan, SHAO Wenze, ZHANG Dengyin
Available online  , doi: 10.11999/JEIT260043
Abstract:
  Objective  The malicious misuse of deepfakes can cause serious damage to personal property. To prevent the spread of forged images, existing methods often employ adversarial examples to protect facial images from manipulation by deepfakes. However, traditional gradient-based attacks suffer from low generalization and poor generation efficiency in black-box attack scenarios, lagging behind current methods that use Generative Adversarial Networks (GANs) to train cross-model defensive examples. Although GAN-based methods enable fast inference, the lack of perceptual constraints often makes the generated adversarial perturbations visually noticeable. Moreover, the rapid evolution of deepfake models imposes higher requirements on the generalization ability of adversarial examples. Therefore, developing imperceptible and generalizable adversarial attack methods is crucial for proactive defense against deepfakes.  Methods  To further improve the transferability and imperceptibility of adversarial examples generated by existing methods, this paper proposes an attribute-aware adversarial example generation method for deepfake defense. The proposed method aims to generate imperceptible perturbations while enhancing cross-model generalization through a random identity fusion mechanism. Specifically, by focusing on the foreground regions of facial images, we introduce attribute-aware salient segmentation of facial and hairstyle regions and combine it with adaptive spatial-frequency attention to generate region-specific adversarial perturbations. This strategy not only effectively improves the imperceptibility of adversarial examples but also reduces the additional computational overhead caused by global processing. Furthermore, from the perspective of data augmentation, this paper uses phase swapping in the frequency domain to fuse identity-related features from a reference face image, preventing overfitting of perturbations while improving generalization performance.  
Results and Discussions  The method is trained and tested on the CelebA-HQ dataset for proxy models. Compared with existing proactive defense methods, experimental results show that the proposed method generates adversarial examples with strong imperceptibility and cross-model defense capabilities, achieving a high defense success rate against various proxy models. The average PSNR of the forged outputs produced from the perturbed images is reduced to 16.79 dB, an improvement of about 1.87% over the second-best method. The defense performance against HiSD is significantly improved, by about 7.5% compared to the second-best method. The defense performance against AttGAN is about 12.7% higher than that of the second-best GAN-based defense method. Moreover, the LPIPS index of the method indicates that the generated perturbations are highly imperceptible.  Conclusions  In this study, a facial attribute-aware attack method is proposed for deepfake defense. It incorporates a frequency-domain fusion mechanism to enhance the diversity of adversarial feature inputs. In addition, adaptive perturbation generators are designed to extract local facial information and dynamically adjust the adversarial features. This enables the method to preserve imperceptible yet attack-effective perturbation components, thereby achieving strong cross-model generalization performance. Future work will focus on developing more proactive defense methods against deepfakes with improved imperceptibility and generalization, especially for cross-model transfer attack scenarios.
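The frequency-domain phase swapping mentioned in the abstract above can be illustrated on a one-dimensional signal. This is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say which component is kept, so the sketch assumes the amplitude spectrum of the source is combined with the phase spectrum of the reference. A plain discrete Fourier transform is used for brevity (an image would need a 2-D transform), and `phase_swap` is a hypothetical name.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2), fine for short signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def phase_swap(source, reference):
    """Assumed variant: keep the amplitude of `source`, take the phase of `reference`."""
    src_spec, ref_spec = dft(source), dft(reference)
    mixed = [cmath.rect(abs(s), cmath.phase(r)) for s, r in zip(src_spec, ref_spec)]
    # For real-valued inputs the result is real up to rounding error.
    return [value.real for value in inverse_dft(mixed)]
```

Swapping a signal's phase with its own returns the signal unchanged, which is a convenient sanity check before mixing two different inputs.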
HWT-SRNet: Heterogeneous Windows Transformer Network for Image Super-Resolution
LU Di, DANG Anyuan
Available online  , doi: 10.11999/JEIT250868
Abstract:
  Objective  In the era of big data, image quality varies greatly, making the reconstruction of high-resolution images from low-quality inputs a critical task in computer vision. Existing super-resolution methods based on window self-attention, such as SwinIR, have a limited receptive field and insufficient ability to capture high-frequency details. These shortcomings reduce their effectiveness in reconstructing fine image structures, thereby necessitating further improvements. To overcome these challenges, this study proposes the Heterogeneous Windows Transformer Network for Image Super-Resolution (HWT-SRNet), a novel architecture built upon SwinIR. By integrating new module designs, HWT-SRNet enhances the extraction of high-frequency details while simultaneously expanding the receptive field, offering a more advanced solution for super-resolution tasks.  Methods  Building upon the SwinIR framework, this study incorporates two key modules to optimize super-resolution reconstruction performance: (1) Heterogeneous Windows Transformer Block (HWTB): Traditional window-based self-attention mechanisms suffer from a constrained receptive field, limiting their ability to capture long-range dependencies. To overcome this limitation, HWTB alternates between square windows and pale windows, preserving local feature extraction while significantly expanding the receptive field. This alternating mechanism enables the network to better model both fine-grained details and global structural information, improving the overall image reconstruction quality. The window size and alternation frequency are chosen to balance computational efficiency against feature extraction. (2) High-Frequency Prior Extraction Network (HFPEN): Transformer-based super-resolution models often struggle to capture high-frequency details because of their inherent bias towards low-frequency components. 
To mitigate this issue, the HFPEN module is introduced to explicitly extract high-frequency prior information from images using a Difference of Gaussians (DoG) filter. The DoG filter emphasizes high-frequency details, including edges and textures, by computing the difference between a lightly blurred image (containing mid-frequency information) and a more heavily blurred one (capturing low-frequency information). This high-frequency information is then fused with the heterogeneous window attention mechanism, allowing HWT-SRNet to enhance fine details while maintaining structural coherence. The DoG filter is applied in the spatial domain, enabling the model to effectively capture and reconstruct sharp edges and textures without the need for frequency-domain transformations. This approach ensures that the network can focus on high-frequency features while preserving the overall image structure.  Results and Discussions  To thoroughly assess the effectiveness of HWT-SRNet, we performed experiments on several widely used benchmark datasets, namely Set5, Set14, BSD100, Urban100, and Manga109. Our method was compared with representative state-of-the-art approaches, including ACT, ART, and CAT. The results demonstrate its superior performance across key evaluation metrics (see Table 1 for detailed comparisons). Specifically, HWT-SRNet achieves improvements in PSNR ranging from 0.10 dB to 0.37 dB compared to baseline models, demonstrating its effectiveness in enhancing image quality. Additionally, structural similarity (SSIM) scores also exhibit consistent improvement, indicating better perceptual quality and more visually pleasing reconstructions. Qualitative results further confirm that HWT-SRNet is capable of restoring sharper edges, preserving textures, and reducing blurring artifacts compared to existing methods. 
To further validate the contribution of each component in HWT-SRNet, we conducted ablation studies to analyze the impact of the Heterogeneous Windows Transformer Block (HWTB) and the High-Frequency Prior Extraction Network (HFPEN) (see Table 2 for ablation results). These advantages stem from the synergistic effect of heterogeneous window attention mechanisms and high-frequency prior extraction, which enable the network to effectively balance local feature refinement and global contextual understanding. By leveraging the alternating self-attention mechanisms and high-frequency prior extraction, HWT-SRNet provides a highly efficient solution for expanding the receptive field and improving high-frequency detail reconstruction.  Conclusions  Considering the limitations of existing super-resolution algorithms, this paper proposes a novel Heterogeneous Windows Transformer Network (HWT-SRNet), designed to improve image reconstruction quality by addressing challenges in receptive field expansion and high-frequency detail capture. The integration of heterogeneous window attention mechanisms and high-frequency prior feature extraction allows the model to achieve a more effective fusion of local and global features, leading to superior performance in both PSNR and SSIM. Experimental results confirm that HWT-SRNet surpasses existing state-of-the-art methods, providing a more efficient and accurate solution for super-resolution tasks. However, this study does not specifically explore the model's adaptability to noise interference in real-world scenarios. Future research can focus on further optimizing HWT-SRNet's robustness to noisy and degraded inputs, improving its applicability to practical image restoration tasks in diverse environments. Additionally, the model's performance on specialized datasets, such as medical or satellite images, remains to be explored, which could further validate its generalization capabilities.
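The spatial-domain Difference of Gaussians filter described in the abstract above (a lightly blurred image minus a more heavily blurred one) can be sketched in a few lines. This is a generic illustration, not the HFPEN module itself: the two standard deviations, the 3-sigma kernel radius, and the replicated borders are all assumptions, and images are plain lists of rows so that the sketch needs only the standard library.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, math.ceil(3.0 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, kernel):
    """Convolve one row with the kernel, replicating the border pixels."""
    radius = len(kernel) // 2
    last = len(row) - 1
    return [
        sum(w * row[min(max(x + j - radius, 0), last)] for j, w in enumerate(kernel))
        for x in range(len(row))
    ]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_row(r, kernel) for r in image]
    cols = [blur_row(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def difference_of_gaussians(image, sigma_small=1.0, sigma_large=2.0):
    """Lightly blurred image minus heavily blurred image (band-pass response)."""
    fine = gaussian_blur(image, sigma_small)
    coarse = gaussian_blur(image, sigma_large)
    return [[f - c for f, c in zip(fr, cr)] for fr, cr in zip(fine, coarse)]
```

On a flat image the response is zero everywhere, while a vertical step edge produces a negative lobe on the dark side and a positive lobe on the bright side, which is the edge emphasis the abstract refers to.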
A Lightweight True Random Number Generator Based on Chain-Coupled Oscillation Rings
ZHANG Yuan, YING Haixuan, GAO Kai, YE Jin, WANG Shuang, ZHANG Jiliang
Available online  , doi: 10.11999/JEIT260377
Abstract:
  Objective  With the rapid growth of the Internet of Things, 5G/6G, and satellite Internet, resource-constrained devices increasingly require high-quality random numbers for key generation, authentication, masking, and other security functions. Although pseudo-random number generators are efficient, their outputs may be predictable once the seed or internal state is compromised. True random number generators (TRNGs) offer a hardware root of trust by extracting entropy from physical randomness, but many existing designs rely on multiple entropy sources or complex post-processing, leading to increased area and power consumption. To address this issue, this paper proposes a lightweight TRNG based on chain-coupled oscillation rings for high-quality randomness with very low FPGA overhead.  Methods  Starting from the state evolution of a Galois oscillation ring (GARO), this work demonstrates that ideal matched-delay conditions can result in periodic and predictable oscillation. However, in practical circuits, delay mismatch, jitter, and process variation disturb the ideal evolution and can be exploited as entropy sources. On this basis, a compact delay-feedback XOR ring is proposed to enhance state uncertainty, introduce feedback competition, and improve randomness through inter-stage delay differences. In addition, a second-order oscillation ring is incorporated to eliminate the all-zero stop state and provide continuous excitation. Multiple rings are then chain-coupled, enabling adjacent rings to mutually interfere with one another and thereby generate stronger irregular oscillations. The proposed design is modeled in MATLAB and implemented on a Xilinx Artix-7 FPGA. Finally, we evaluate its performance by NIST SP 800-22, NIST SP 800-90B, bias, autocorrelation, and voltage-temperature robustness tests.  Results and Discussions  Simulation confirms that the proposed structure avoids stable periodic locking and produces sustained irregular oscillation. 
Experimental results show that the TRNG passes all NIST SP 800-22 tests and achieves an average min-entropy of 0.9936 in the NIST SP 800-90B test, outperforming conventional RO and GARO-based TRNGs under similar conditions. The measured bias is only 0.0228%, and the autocorrelation remains well below the threshold, indicating excellent statistical independence. The design also maintains high entropy over temperatures from 0 °C to 80 °C and supply voltages from 0.9 V to 1.1 V. Implemented on Artix-7, the proposed TRNG achieves 200 Mbps throughput using only 11 LUTs and 4 DFFs, with 0.108 W power consumption.  Conclusions  This paper presents a lightweight chain-coupled oscillation-ring TRNG that exploits delay mismatch, phase disturbance, and feedback competition to generate high-quality physical randomness. The theoretical analysis clarifies how practical nonidealities transform ideal periodic oscillation into irregular oscillation, providing a design basis for compact oscillator-based entropy sources. By combining delay-feedback XOR rings with chain-coupled mutual disturbance and continuous excitation, the proposed design enhances entropy while avoiding excessive hardware overhead and complex post-processing. FPGA implementation and statistical evaluations verify high entropy, low bias, and high randomness under voltage and temperature variations. Therefore, the proposed TRNG achieves high randomness quality and high throughput while effectively reducing hardware overhead, making it suitable for resource-constrained security applications such as IoT terminals, lightweight cryptographic modules, and embedded authentication systems.
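The bias and autocorrelation figures quoted in the abstract above are simple statistics of the output bit stream, and a rough sketch shows how such numbers can be computed. The definitions here are assumptions: bias is taken as the absolute deviation of the proportion of ones from 0.5 (in percent), autocorrelation is computed on bits mapped to +1/-1, and the entropy function is a plain plug-in estimate, far simpler than the NIST SP 800-90B estimators used in the paper.

```python
import math


def bias_percent(bits):
    """Absolute deviation of the proportion of ones from 0.5, in percent."""
    ones = sum(bits) / len(bits)
    return abs(ones - 0.5) * 100.0


def autocorrelation(bits, lag):
    """Average product of the +/-1 mapped stream with itself shifted by `lag`."""
    signed = [2 * b - 1 for b in bits]
    pairs = len(signed) - lag
    return sum(signed[i] * signed[i + lag] for i in range(pairs)) / pairs


def plug_in_min_entropy(bits):
    """Naive per-bit min-entropy, -log2 of the most likely value's frequency."""
    p_one = sum(bits) / len(bits)
    return -math.log2(max(p_one, 1.0 - p_one))
```

A perfectly alternating stream shows why several tests are needed: its bias is zero and its plug-in min-entropy is 1 bit, yet its lag-1 autocorrelation of -1 exposes it as fully predictable.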
SHAP-based Reliable Threshold Decision-driven Remaining Useful Life Prediction for MOSFETs
LIU Jinfeng, WU Qiuxue, HERBERT Ho-Ching Iu
Available online  , doi: 10.11999/JEIT251379
Abstract:
To address the disconnect between conventional fixed-threshold early warning methods for power MOSFETs and their physical failure mechanisms, this paper proposes a lifetime prediction framework that integrates Explainable Artificial Intelligence (XAI). First, an adaptive dual-threshold partitioning strategy is designed by combining K-means clustering with the Proximal Policy Optimization (PPO) algorithm. The initial solution obtained by K-means is used as the search starting point. A multi-objective reward function is then constructed to balance interval proportion, state-transition sensitivity, and threshold-spacing penalties. This function guides the agent in threshold optimization and enables accurate partitioning of degradation stages. Second, SHapley Additive exPlanations (SHAP) analysis is introduced to improve the interpretability of the black-box decision-making process. It verifies the rationality of threshold decisions from the perspective of feature-mechanism correlations. The results show that the low threshold is mainly governed by steady-state features in the healthy stage and meets the safety baseline requirement. The high threshold is dominated by dynamic features of late-stage accelerated degradation and accurately identifies the critical point. These findings confirm the reliability and transparency of the threshold decisions. Based on this framework, an early warning mechanism is triggered when degradation data exceed the reliable low threshold. A Residual-connected Stacked Gated Recurrent Unit (R-SGRU) is then used for Remaining Useful Life (RUL) prediction. Experiments on the NASA dataset show that the proposed model outperforms several baseline models, including Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN). The test-set Mean Squared Error (MSE) is below 0.0015, and R² is above 0.98. This study provides accurate and reliable decision support for early warning in MOSFETs. 
It also links data features with physical mechanisms through explainable techniques, supporting the development of trustworthy artificial intelligence for device prognostics.  Objective  This study addresses two key issues in power MOSFET lifetime prognostics: the disconnect between conventional fixed-threshold early warning methods and physical mechanisms, and the limited interpretability of existing approaches. A framework integrating adaptive dual-threshold partitioning with XAI is proposed to support predictive maintenance with both physical credibility and high prediction accuracy.  Methods  An adaptive dual-threshold partitioning strategy is proposed by integrating K-means clustering with PPO reinforcement learning. Threshold positions are optimized using a multi-objective reward function to accurately identify degradation stages. SHAP analysis is used to quantify the contributions of 13-dimensional morphological features based on Shapley values. This validates the physical rationality of threshold decisions from a mechanistic perspective. When degradation data exceed the low threshold, an early warning is triggered. The R-SGRU network is then used for RUL prediction by capturing long-term dependencies through its gating mechanism. The proposed method is validated using the NASA dataset, forming a complete technical route from intelligent early warning to accurate prediction.  Results and Discussions  The thresholds optimized by PPO achieve the best performance across all metrics (Table 1). SHAP analysis reveals the physical rationale for the threshold decisions. In the healthy stage, the low threshold is mainly governed by steady-state features. By contrast, the high threshold is determined by accelerated degradation dynamics. This result establishes a quantitative correlation between data-driven results and physical failure mechanisms. SHAP interaction heatmaps (Figs. 6 and 7) further show the synergistic effects among features. 
Device failure is a complex process driven by the coordinated evolution of multiple features. The R-SGRU prediction model based on the optimized thresholds shows excellent performance on the NASA dataset (Table 5). Across the four device groups, the model achieves an MSE below 0.0015 and an R² above 0.98, outperforming the baseline models.  Conclusions  This study proposes an XAI-based framework for predicting the RUL of power MOSFETs. For threshold partitioning, an adaptive dual-threshold strategy combining K-means clustering and PPO reinforcement learning is adopted. A multi-objective reward function enables accurate identification of nonlinear degradation stages, and its performance is validated across four test devices. For interpretability, SHAP analysis provides mechanistic support for threshold decisions. The results show that low thresholds depend on steady-state features in the healthy period, whereas high thresholds are dominated by late-stage accelerated degradation features. This pattern is consistent with actual failure mechanisms. Feature interaction heatmaps reveal complex cooperative effects among multiple features and improve the understanding of the decision-making process. The R-SGRU prediction model shows strong time-series modeling capability and ensures high stability and accuracy. This work establishes a complete technical route from intelligent early warning to accurate prediction. It achieves adaptive threshold optimization and links data-driven results with physical mechanisms through interpretability analysis. The findings provide reliable support for the intelligent operation and maintenance of power MOSFETs.
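The abstract above states that K-means provides the starting point for the dual-threshold search. One plausible way to obtain such a starting point is sketched below, purely as an assumption: degradation values are clustered into three groups (healthy, degrading, near failure), and the two initial thresholds are placed midway between neighbouring centroids. The PPO refinement and the multi-objective reward are not reproduced, and the function names are hypothetical.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Plain 1-D k-means with a deterministic start; returns sorted centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


def initial_dual_thresholds(values):
    """Assumed rule: thresholds are midpoints between neighbouring centroids."""
    low_c, mid_c, high_c = kmeans_1d(values, k=3)
    return (low_c + mid_c) / 2.0, (mid_c + high_c) / 2.0
```

In the framework described above, a reinforcement-learning agent would then adjust this pair of thresholds; the sketch stops at the initial solution.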
From Touch to Semantics: A Cross-Modal Framework for Zero-Shot Spiking Tactile Object Recognition
CHI Wei, XU Jin
Available online  , doi: 10.11999/JEIT260158
Abstract:
  Objective  Tactile perception enables robots to understand object properties and perform dexterous interactions. However, tactile data are costly to collect and difficult to scale, which limits conventional supervised learning in open-world scenarios. Zero-Shot Learning (ZSL) provides a promising solution by transferring knowledge from seen to unseen categories through semantic representations. Existing tactile ZSL methods either rely on auxiliary visual information or use manually designed attributes, which are often subjective and limited in generalization. Event-based spiking tactile signals are sparse and asynchronous, with rich spatiotemporal dynamics. These properties make semantic modeling more challenging. Systematic studies on zero-shot recognition for such data remain limited. To address these issues, this paper proposes a zero-shot object recognition framework for spiking tactile perception. The framework aims to bridge low-level tactile dynamics and high-level semantics in a scalable manner.  Methods  The proposed framework consists of three components (Fig. 1): spiking tactile feature extraction, semantic prototype construction, and cross-modal tactile-semantic alignment. First, a biomimetic Spiking Graph Neural Network (SGNN) is used to model raw event-based spiking tactile signals. By integrating Leaky Integrate-and-Fire (LIF) neurons with graph-based message passing, the SGNN captures temporal firing dynamics and spatial relationships among tactile sensing units. It then generates discriminative and biologically interpretable high-level tactile embeddings. Second, instead of using manually annotated attributes, a Large Language Model (LLM) is used to generate structured, fine-grained, and extensible tactile attribute descriptions for each object category. These textual descriptions are encoded as continuous semantic vectors to form class-level semantic prototypes with consistent dimensionality across categories. 
This strategy supports flexible semantic expansion and avoids labor-intensive attribute engineering. Third, a bidirectional tactile-semantic alignment mechanism is designed to improve generalization to unseen categories. A forward mapping projects tactile embeddings into the semantic space for classification, whereas a reverse mapping reconstructs tactile features from semantic representations. A cycle-consistency constraint is imposed between the two mappings to preserve structural coherence and semantic stability across modalities. The overall framework is trained only on seen categories. During zero-shot inference, tactile embeddings of unseen samples are matched with their corresponding semantic prototypes in the shared embedding space.  Results and Discussions  The proposed method is evaluated on the Ev-Object event-based tactile dataset under a strict zero-shot setting, with disjoint seen and unseen category sets. Performance is assessed using Mean Class Accuracy (MCA), Top-k accuracy, and the Semantic Alignment Score (SAS). The proposed framework consistently outperforms representative tactile ZSL baselines across all metrics. It achieves an MCA of 73.48%, a Top-1 accuracy of 62.68%, and a Top-2 accuracy of 88.75%. Ablation studies show that removing the LLM semantic module, bidirectional mapping, or cycle-consistency constraint reduces recognition performance and semantic alignment quality. Removing the LLM semantic module causes a substantial decrease in MCA, which confirms the role of structured LLM-generated tactile semantics in knowledge transfer. Removing the bidirectional mapping or the cycle-consistency constraint also reduces performance, indicating that both components help maintain stable cross-modal alignment. The t-SNE visualization further shows that cycle-consistent alignment yields more compact intra-class clusters and clearer inter-class separation for unseen categories. 
Semantic prototypes are also better located near the centers of tactile feature clusters. These results indicate that combining biologically inspired spiking models with LLM-generated tactile semantics provides an effective solution for open-world tactile perception.  Conclusions  This paper presents a zero-shot object recognition framework for spiking tactile perception by integrating SGNN-based tactile representation with semantic prototypes. The proposed method addresses key limitations of existing tactile ZSL approaches by avoiding visual data and manual attribute design while effectively modeling the spatiotemporal dynamics of event-based spiking tactile signals. Experimental results under strict zero-shot settings confirm the effectiveness and robustness of the proposed framework. This work provides a strong baseline for zero-shot spiking tactile recognition and offers a principled path toward open-world tactile cognition in robotic systems. Future work will explore generalized zero-shot tactile perception, multimodal extensions, and real-world robotic deployment under noisy and dynamic sensing conditions.
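The zero-shot inference step in the abstract above, where a tactile embedding is matched against class-level semantic prototypes in a shared space, can be sketched as a nearest-prototype search. The similarity measure is an assumption (cosine similarity is used here), the class names and vectors in any usage are invented placeholders, and the SGNN encoder and LLM-derived prototypes are treated as given inputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_classes(embedding, prototypes):
    """Class names ordered from most to least similar prototype."""
    return sorted(
        prototypes,
        key=lambda name: cosine(embedding, prototypes[name]),
        reverse=True,
    )


def top_k_accuracy(embeddings, labels, prototypes, k=1):
    """Fraction of samples whose true class is among the k best-matching prototypes."""
    hits = sum(
        label in rank_classes(e, prototypes)[:k]
        for e, label in zip(embeddings, labels)
    )
    return hits / len(labels)
```

Because only the prototype dictionary changes between seen and unseen categories, adding a new class in this sketch requires a new semantic vector but no retraining of the matching step.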
Cross-Domain Deepfake Detection with Dynamic Artifacts Tracking and Spatial-Frequency Interaction Analysis
LI Zilong, YANG Gaoming, HAN Dongyu, FANG Xianjin
Available online  , doi: 10.11999/JEIT251290
Abstract:
  Objective  The rapid development of generative adversarial networks and diffusion models has led to a sharp increase in the number of fake images. The widespread dissemination of fake images poses a potential and unpredictable threat to individuals, societies, and nations. Efficient and highly generalizable deepfake detection methods are therefore needed. In current forgery detection research, cross-domain detection has become a core task. However, existing detection methods still have three problems: feature extraction relies on specific artifacts or fixed parameters, spatial and frequency modalities are often learned in isolation without dynamic interaction mechanisms, and global feature association capabilities are insufficient. To address these limitations, a Pyramidal Interactive Dual-Stream Network (PIDSNet) integrating dynamic artifact tracking and spatial-frequency interaction analysis is proposed.  Methods  The PIDSNet is centered on two branches in the spatial and frequency domains (Fig. 1), with four modules working collaboratively: the Multi-Branch Feature Extraction (MBFE) module, the Frequency Domain Feature Extraction (FDFE) module, the Pyramid Spatial-Frequency Interaction (PSFI) module, and the Multi-Head Pyramid Squeezing Attention (MHPSA) module. The MBFE module (Fig. 2), the basic unit of the spatial branch, avoids information loss as the receptive field increases by constructing multi-level, multi-branch dilated convolutions, achieving collaborative extraction of global and local features. The FDFE module, the core module of the frequency branch, fuses the MBFE module with spectral convolutions to dynamically mine frequency-domain artifact features. This reduces the dependence of traditional frequency-domain methods on fixed parameters and frequency bands and significantly improves the model's ability to adaptively capture artifact features from different generative models. 
The PSFI module is key to the interaction between the spatial and frequency branches (Fig. 3). It captures low-frequency global information and high-frequency detailed features by constructing a spatial Gaussian pyramid and a frequency Laplacian pyramid. Dynamic weight enhancement at each level of the pyramid achieves adaptive fusion of spatial-frequency features, forming a dynamic spatial-frequency feature interaction mechanism. The MHPSA module combines Multi-Head Self-Attention (MHSA) with dilated convolution (Fig. 4). While inheriting the local detail capture capability of the Pyramid Squeeze Attention (PSA) module, it also enhances the global feature modeling capability, thereby improving the model's adaptability and robustness.  Results and Discussions  To comprehensively verify the cross-domain detection capabilities of PIDSNet across different generative paradigms, this paper trains it on the ProGAN dataset and tests it on multiple GAN and diffusion model datasets. First, for GAN generative models, on the ForenSynths test set containing four GANs (Table 3), the average Acc. reaches 95.2%, improvements of 5.3% and 5.2% over LGrad and FreqNet, respectively. On the GANGen dataset containing nine GANs (Tables 4 and 5), the average Acc. reaches 95.5%, an improvement of 20.1% over F3Net, and the average Acc. and average A.P. improve by 4.1% and 1.3%, respectively, over FreqNet. Second, for diffusion models, tests are conducted on the DiffusionForensics and Ojha datasets. On the DiffusionForensics dataset (Table 6), the average Acc. reaches 95.4%, improvements of 4.8% and 13.2% over LGrad and FreqNet, respectively. On the Ojha dataset (Table 7), the average Acc. and average A.P. reach 96.1% and 99.4%, showing a significant improvement. More importantly, PIDSNet has only 2.4M parameters (Table 8) and achieves an average Acc. and average A.P. of 95.7% and 98.7% across 25 datasets, surpassing other methods. 
The above experiments show that PIDSNet, trained only on the ProGAN dataset, can adapt to multiple types of GAN models and effectively detect diffusion model images with significant differences in artifact features between the spatial and frequency domains, demonstrating excellent generalization across models and generative paradigms. Moreover, Grad-CAM visualizations (Fig. 5) reveal that, despite not being trained on face images, PIDSNet demonstrates strong detection performance on face images.  Conclusions  This paper addresses the problems of current GAN and diffusion model detection methods, such as feature extraction relying on domain-specific artifacts or fixed parameters and weak modal interaction, which lead to weak domain adaptability and poor generalization performance. To solve these problems, a spatial-frequency collaborative learning framework and a dynamic artifact mining mechanism are constructed. They reduce the reliance of traditional methods on domain-specific artifacts and fixed parameters and enhance the extraction of general forgery features. The model's effectiveness is validated on image datasets generated by 25 different GAN and diffusion models. Compared with current state-of-the-art models, the average Acc. and A.P. are significantly improved, confirming good performance in cross-domain forgery detection tasks. However, experiments reveal that PIDSNet still has certain limitations. When dealing with specific models whose high-frequency energy distribution is very close to that of real images (such as S3GAN), there is still room for performance improvement, and the frequency domain feature mining mechanism needs optimization. 
Therefore, future work will focus on two main aspects: firstly, continuing to optimize the frequency domain feature extraction mechanism to enhance the ability to identify forged samples with high-frequency energy features close to real images; secondly, focusing on improving the detection capability of low-quality forged images with compression distortion and noise interference, while studying artifact separation and detection methods for forged images generated by multiple models to enhance the adaptability of the model in real complex environments.
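The pyramid decomposition that the PSFI module builds on can be illustrated with a minimal sketch. It assumes a plain 2-D image, a separable [1, 2, 1]/4 binomial kernel as the Gaussian approximation, and nearest-neighbour upsampling. The function names are hypothetical and are not taken from the paper. PIDSNet's learned per-level dynamic weights and its frequency-branch pyramid are not modelled here.

```python
def blur(img):
    """Separable [1, 2, 1]/4 binomial blur with edge replication."""
    h, w = len(img), len(img[0])
    k = (0.25, 0.5, 0.25)
    # Horizontal pass.
    tmp = [[sum(k[d + 1] * row[min(max(x + d, 0), w - 1)] for d in (-1, 0, 1))
            for x in range(w)] for row in img]
    # Vertical pass.
    return [[sum(k[d + 1] * tmp[min(max(y + d, 0), h - 1)][x] for d in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def down(img):
    """Keep every second row and column."""
    return [row[::2] for row in img[::2]]


def up(img, h, w):
    """Nearest-neighbour upsampling to size h x w."""
    return [[img[y // 2][x // 2] for x in range(w)] for y in range(h)]


def build_pyramids(img, levels):
    """Return (Gaussian pyramid, Laplacian pyramid) with `levels` levels."""
    gauss = [img]
    for _ in range(levels - 1):
        gauss.append(down(blur(gauss[-1])))
    lap = []
    for fine, coarse in zip(gauss[:-1], gauss[1:]):
        u = up(coarse, len(fine), len(fine[0]))
        # Each Laplacian level holds the detail lost by going one level coarser.
        lap.append([[f - c for f, c in zip(fr, ur)] for fr, ur in zip(fine, u)])
    lap.append(gauss[-1])  # The coarsest level keeps the low-frequency residual.
    return gauss, lap


def reconstruct(lap):
    """Invert the Laplacian pyramid by upsampling and adding the detail back."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        u = up(img, len(band), len(band[0]))
        img = [[b + c for b, c in zip(br, ur)] for br, ur in zip(band, u)]
    return img
```

In this sketch the coarse Gaussian levels carry the low-frequency global content and the Laplacian levels carry the high-frequency detail, which is the split the abstract attributes to the two pyramids.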
Design of a Timing-Controlled Non-Volatile Flip-Flop with Low-Switching-Ratio FeFET
DU Shimin, YANG Chang, WANG Lunyao, ZHANG Zhe
Available online  , doi: 10.11999/JEIT251059
Abstract:
  Objective  Nonvolatile processors (NVPs) have become a key technology for Internet-of-Things (IoT) and energy-harvesting systems, where maintaining computational states during unexpected power interruptions is essential. Conventional volatile processors rely on external nonvolatile memory (NVM) for state retention; however, this approach incurs significant latency and energy overheads. Integrated nonvolatile flip-flops using ferroelectric field-effect transistors (FeFETs) offer a promising alternative by enabling on-chip state backup and recovery. Nevertheless, existing single-ended FeFET-based flip-flops are prone to contention-induced failures during power-up recovery, especially when the FeFET on/off ratio degrades. This issue originates from competing discharge paths that lead to uncertainty in internal node voltage settling, thereby resulting in unreliable state restoration. To address this challenge, this work proposes a novel flip-flop architecture that replaces contention-based recovery with a timing-controlled two-phase mechanism. The primary objectives of this design are to achieve high-reliability recovery even under degraded FeFET on/off ratios as low as 10², optimize timing parameters such as Hold-Time and Clock-to-Q delay, and maintain low energy consumption suitable for IoT applications.  Methods  The proposed design is an extension of the Static Contention-Free Single-Phase-Clocked Flip-Flop (SSCFF), which inherently eliminates internal node contention through its fully static structure. Based on this foundation, one FeFET device and five additional MOSFETs are integrated to construct a single-ended nonvolatile flip-flop (NVFF). Two control signals, RES and MOD, are introduced to manage the recovery process. In the normal operation mode where MOD=0, the circuit functions as a conventional SSCFF and supports state backup during runtime. 
In the recovery mode where MOD=1, the recovery operation is divided into two distinct phases. In the pre-charge phase, when RES=0, the internal nodes are pre-charged to VDD. In the selective discharge phase, as RES transitions from low to high, the resistance state of the FeFET determines whether discharge occurs. If the FeFET is in the low-resistance state (LRS), a discharge path is formed, pulling the node voltage down to ground. If the FeFET remains in the high-resistance state (HRS), the node retains its charge until the next clock edge. This sequence of pre-charging followed by selective discharge eliminates contention during recovery and ensures that the internal node voltages settle deterministically and reliably. The design is implemented in a 130 nm CMOS process with integrated FeFET models. Simulations, including Monte Carlo analysis, were performed in Cadence Virtuoso across a supply voltage range of 0.6–0.9 V and FeFET on/off ratios ranging from 10² to 10⁴. Key performance metrics, such as Setup-Time, Hold-Time, Clock-to-Q delay, restore energy, and recovery success rate, were evaluated and compared against a traditional Transmission-Gate Flip-Flop (TGFF).  Results and Discussions  Simulation results show that the timing-controlled recovery improves reliability even under severe FeFET degradation. The proposed flip-flop achieves 100% restore yield when the FeFET on/off ratio drops to 10². This is because the proposed structure eliminates the competing discharge paths. Timing metrics are also improved: the 3σ worst-case Hold-Time is reduced by 64.6%, and the Clock-to-Q delay is shortened by 33.9%. Although Setup-Time increases slightly, it can be compensated by device sizing. Restore energy remains in the low-fJ (10⁻¹⁵ J) range across all supply voltages, rising only modestly compared with the TGFF because of the added pre-charge phase.  
Conclusions  A FeFET-based nonvolatile flip-flop with timing-controlled two-phase recovery has been presented, addressing the contention-induced failure modes that limit low-voltage NVFF reliability. By integrating a single FeFET with an enhanced SSCFF structure and using the RES signal to manage the pre-charge and discharge steps, high restore yield is maintained even under severely degraded FeFET on/off ratios, while Hold-Time and Clock-to-Q delay are significantly improved relative to traditional transmission-gate NVFFs. The proposed architecture offers a compelling solution for energy-constrained IoT processors requiring fast, reliable state preservation under unpredictable power conditions.
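A first-order behavioural sketch can show why pre-charging followed by selective discharge tolerates a low on/off ratio. It is not the authors' circuit or simulation setup. It assumes the pre-charged node behaves as a single capacitor discharging through the FeFET path resistance. The values of `r_on`, `c_node`, and `window_taus` are illustrative assumptions, not figures from the paper.

```python
import math


def recover(fefet_state, on_off_ratio, vdd=0.8, r_on=10e3,
            c_node=1e-15, window_taus=5.0):
    """Return (node voltage after the discharge window, node_is_high).

    Phase 1 (MOD=1, RES=0): the node is pre-charged to vdd.
    Phase 2 (RES rises): the node discharges through the FeFET path for a
    fixed window, modelled here as a simple RC decay (assumption).
    """
    if fefet_state not in ("LRS", "HRS"):
        raise ValueError("fefet_state must be 'LRS' or 'HRS'")
    # The HRS path is assumed to be on_off_ratio times more resistive than LRS.
    r_path = r_on if fefet_state == "LRS" else r_on * on_off_ratio
    # The window length is expressed in multiples of the LRS time constant.
    t_window = window_taus * r_on * c_node
    v_node = vdd * math.exp(-t_window / (r_path * c_node))
    return v_node, v_node > vdd / 2


# With an on/off ratio of 100, the LRS node collapses toward ground while
# the HRS node keeps most of its pre-charged voltage.
v_lrs, high_lrs = recover("LRS", on_off_ratio=100)
v_hrs, high_hrs = recover("HRS", on_off_ratio=100)
```

Under these assumptions the two cases differ only in how far a pre-charged node decays within a fixed window, so nothing competes to pull the node in the opposite direction.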
A Noise Reduction Strategy via Coprime-Spacing Subarrays for Biodiversity Acoustic Indices
CHEN Lei, XU Zhiyong, ZHAO Zhao
Available online  , doi: 10.11999/JEIT260237
Abstract:
  Objective  As a popular tool for rapid biodiversity assessment, acoustic indices have attracted increasing attention in the field of soundscape ecology in recent years. Nevertheless, most commonly used acoustic indices are susceptible to background noise. Traditional single-channel noise reduction strategies, including spectral subtraction, high-pass filtering, and threshold detection, have been widely adopted as preprocessing approaches to optimize the calculation of acoustic indices. However, when dealing with anthropogenic interference that overlaps with biotic signals in both time and frequency domains, the denoising capability of single-channel methods degrades severely. Although spatio-temporal adaptive whitening filtering based on microphone arrays provides a feasible approach for suppressing directional interference, it suffers from a non-uniform two-dimensional spatio-temporal amplitude response and the self-cancellation of target signal in the unconstrained interference cancellation. These disadvantages lead to distortion in the time-frequency distribution of target signals, causing acoustic index calculations to deviate from the ground truth. Therefore, this study aims to propose a noise reduction strategy via coprime-spacing subarrays for biodiversity acoustic indices. This method effectively suppresses directional interference while maximally preserving the time-frequency distribution structure of biotic signals.  Methods  The noise reduction strategy based on microphone array spatio-temporal adaptive whitening filtering is proposed, incorporating the Frequency-dependent Acoustic Diversity Index (FADI), which is insensitive to fluctuations in the array's two-dimensional spatio-temporal amplitude response. A noise-robust acoustic index method, termed Adaptive Interference Cancellation–Frequency-dependent Acoustic Diversity Index (AIC-FADI), is subsequently developed. 
Specifically, a non-uniform linear array is first constructed using three microphones to form two dual-element subarrays with coprime spacing. This design fully exploits the high spatial resolution of wide-spacing arrays to narrow the null width in the direction of interference. Meanwhile, it avoids the physical implementation difficulties and mutual coupling effects associated with small-spacing array designs caused by the ultra-wideband characteristics of target signals. The spatio-temporal adaptive whitening filtering is then performed on each coprime-spacing subarray separately, adaptively forming two-dimensional nulls within the interference support region, thereby suppressing directional anthropogenic interference in analytical data before index calculation. Next, a frequency-dependent threshold scheme is utilized to obtain the binary spectrogram for each coprime-spacing subarray output, abating the influence from gain differences along the frequency axis for a certain direction. Afterwards, by leveraging the high spatial resolution of wide-spacing arrays and the interleaved characteristics of spatial aliasing null positions between the spatio-temporal frequency responses of the two subarrays with coprime spacing, a pointwise maximum fusion is applied to the above two binary spectrograms. This process reconstructs the binary time-frequency distribution structure of target signals outside the interference support region, leading to a single binary spectrogram where biological sound components are preserved to a great extent and anthropogenic interference is considerably suppressed. Ultimately, from this single binary spectrogram, the proportions of non-zero time-frequency bins within each frequency band are calculated and forwarded to the entropy function, resulting in the final AIC-FADI result.  
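The last three steps of the pipeline (frequency-dependent thresholding, pointwise maximum fusion, and the band-wise entropy) can be sketched as follows. The inputs are assumed to be two already-filtered magnitude spectrograms, one per subarray, stored as lists of frequency rows. The threshold rule (a multiple of each row's median) and the helper names are hypothetical stand-ins, since the abstract does not specify them. The entropy is the Shannon form commonly used for acoustic diversity indices.

```python
import math
from statistics import median


def binarize(spec, factor=2.0):
    """Binarize each frequency row against its own threshold.

    The rule `factor * row median` is an assumption for illustration.
    """
    out = []
    for row in spec:
        thr = factor * median(row)
        out.append([1 if v > thr else 0 for v in row])
    return out


def fuse_max(b1, b2):
    """Pointwise maximum of two binary spectrograms (a logical OR)."""
    return [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(b1, b2)]


def band_proportions(binary, rows_per_band):
    """Fraction of non-zero time-frequency bins inside each frequency band."""
    props = []
    for start in range(0, len(binary), rows_per_band):
        band = binary[start:start + rows_per_band]
        total = sum(len(r) for r in band)
        props.append(sum(sum(r) for r in band) / total)
    return props


def shannon_entropy(props):
    """Shannon entropy of the band proportions after normalisation."""
    s = sum(props)
    if s == 0:
        return 0.0
    return -sum((p / s) * math.log(p / s) for p in props if p > 0)


def fused_index(spec1, spec2, rows_per_band=2, factor=2.0):
    """Fuse the two subarray outputs and compute the entropy-based index."""
    fused = fuse_max(binarize(spec1, factor), binarize(spec2, factor))
    return shannon_entropy(band_proportions(fused, rows_per_band))
```

Because the fusion is a pointwise maximum, a bin that falls in a null of one subarray's response is still retained whenever the other subarray detects it.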
Results and Discussions  The simulation result indicates that the proposed AIC-FADI maintains numerical robustness across an SINR range down to –15 dB (the yellow line in Fig. 5), substantially outperforming the classical ADI version based on single-channel noise reduction algorithm (FADI) and other ADI versions based on single-array interference suppression processing mentioned in this paper (AIC-FADI-s, AIC-FADI1, and AIC-FADI2). The real-world experiment confirms that the proposed spatio-temporal adaptive whitening filtering effectively suppresses wideband interference signals in complex scenarios, thereby improving the SINR of the analyzed recording. This enables some weaker biotic signals to exceed their corresponding frequency-dependent adaptive thresholds, greatly reducing missed detection of the target signal. In addition, by performing pointwise maximum fusion of the binary spectrograms from the two coprime-spacing subarray outputs, AIC-FADI further alleviates the extent of target signal missed detection (Fig. 8). Nevertheless, the real-world experiments also reveal that the interference suppression performance of AIC-FADI degrades for highly time-varying interference components.  Conclusions  This paper addresses the challenge of calculating acoustic indices reliably in complex soundscapes where directional anthropogenic interference overlaps with biotic signals in both time and frequency domains. A noise reduction strategy using coprime-spacing subarrays is proposed, and a new noise-robust acoustic index (AIC-FADI) is then developed. The method is evaluated through simulations and real-world recordings, and the results show that: (1) By applying spatio-temporal adaptive whitening filtering on each coprime-spacing subarray followed by pointwise maximum fusion, the proposed method achieves both wideband interference suppression capability and target information fidelity in complex soundscapes containing strong interference. 
(2) As a result, the proposed AIC-FADI maintains numerical robustness down to –15 dB SINR, substantially outperforming the classical FADI algorithm and other ADI versions based on single-array interference suppression methods. (3) The proposed method provides a feasible technical solution for extending the practical application scenarios and spatio-temporal coverage of biodiversity acoustic indices in human-dominated areas. However, this study only considers directional interference that is relatively stable or slowly time-varying. Hence, the interference suppression performance degrades for highly time-varying or uncorrelated noise components. These challenges should be addressed in future work through more advanced signal processing techniques to further improve the robustness of acoustic indices in highly complex acoustic environments.
A Survey of Quantum Covert Communication Integration Schemes and Application Scenarios
SUN Yiheng, XU Yongjun, ZHANG Haibo, HUANG Zishan
Available online  , doi: 10.11999/JEIT260282
Abstract:
  Significance   With the growing demand for network communication security, research and development in covert communication and quantum communication have continued to evolve. However, current covert communication suffers from inherent security vulnerabilities; the transmission reliability of quantum communication has been limited by information eavesdropping and harmful interference. Therefore, quantum covert communication has become a research hotspot, integrating the advantages of both covert and quantum communication while addressing their respective security limitations. To this end, this paper provides a comprehensive survey of quantum covert communication integration schemes and application scenarios, including the principles of covert communication and typical enabling techniques; protocols for quantum communication and important quantum techniques; and three types of quantum covert communication integration schemes summarized by different application scenarios. This paper contributes to the design of advanced secure communication networks while offering guidance for the development of future quantum covert communication systems.  Progress   This paper presents a comprehensive survey of recent advances in quantum covert communication integration schemes and application scenarios, with an in-depth discussion of the principles of covert communication and key enabling techniques, such as Fluid Antenna (FA), Reconfigurable Intelligent Surface (RIS), and Unmanned Aerial Vehicle (UAV). FA actively reshapes wireless channel characteristics, particularly the spatial correlation of multipath components, by dynamically adjusting the transmitter physical configuration, thereby reducing information leakage. In Non-Line-of-Sight (NLoS) scenarios, RIS can dynamically alter the direction of reflected transmission of the incident signal, not only enhancing the Channel State Information (CSI) quality of the covert signal but also reducing signal leakage. 
In flexible or temporary communication networks, UAVs can increase CSI uncertainty, preventing unauthorized users from establishing a stable monitoring model and thereby complicating eavesdropping. Then, key protocols and significant techniques of quantum communication are introduced, including BB84, B92, and E91 for Quantum Key Distribution (QKD), and BF02, Two-Step for Quantum Secure Direct Communication (QSDC). Additionally, the quantum repeaters and Quantum Random Number Generator (QRNG) are reviewed. Based on different application scenarios, quantum covert communication integration schemes can be categorized into enabling, covert, and symbiotic integration schemes, depending on the integration mechanisms. To be specific, the enabling integration scheme leverages the unconditional security of quantum communication to address the security vulnerabilities in covert communication, the covert integration scheme utilizes enabling techniques in covert communication to reduce the detection probability of quantum communication, and the symbiotic integration scheme combines both advantages of covert communication and quantum communication to achieve mutual empowerment and deep symbiosis. Finally, critical challenges are highlighted, including stringent hardware precision requirements, low resource allocation efficiency, and obstacles in large-scale applications. Promising directions for future research are also identified, including R&D on precision communication equipment, dynamic resource management, cost control during deployment, and the promotion of standardized development.  Prospects   Despite remarkable progress in preliminary applications and specific scenarios, research on quantum covert communication remains in its infancy. As quantum covert communication scenarios become increasingly diverse and complex, future studies should prioritize challenges that restrict further development and large-scale application of quantum covert communication. 
The stringent hardware precision requirements are the primary challenge, limiting reliable transmission distance and stability. Low resource allocation efficiency is another challenge, as the quantum covert communication system that generates quantum entanglement over lossy channels remains subject to the Square Root Law (SRL) constraints, while signal transmission exhibits burstiness and dynamics. Additionally, high deployment costs and the lack of standardization present significant hurdles. To address the challenges mentioned, future directions should include R&D on precision communication equipment, dynamic resource management, cost control during deployment, and the promotion of standardized development to facilitate the development of high-performance, large-scale, and multi-scenario quantum covert communication.  Conclusions  This paper provides a comprehensive survey of quantum covert communication with particular emphasis on integration schemes and application scenarios. The fundamentals and typical enabling techniques of covert communication are first reviewed, highlighting its Low Probability of Detection (LPD) secure paradigm and unique channel characteristics. The typical protocols and important techniques of quantum communication are then examined, including QKD, QSDC, quantum repeaters, and QRNG. Three types of quantum covert communication integration schemes have been further classified by different integration mechanisms and corresponding application scenarios. Finally, several existing challenges are identified, including stringent hardware precision requirements, low resource allocation efficiency, and obstacles to large-scale applications. Relevant research directions are also outlined, including R&D on precision communication equipment, dynamic resource management, cost control during deployment, and the promotion of standardized development. 
These directions are expected to serve as a valuable reference for advancing and standardizing quantum covert communication in future secure networks.
A Frequency Domain Self-Attention Guided Multi-Scale Inverse Lithography Technology
LUO Binling, WANG Ying, CAI Shuting
Available online  , doi: 10.11999/JEIT251382
Abstract:
  Objective  Optical Proximity Effects (OPE) in lithographic processes cause printed patterns on wafers to deviate from target layouts, necessitating Optical Proximity Correction (OPC) through mask optimization prior to exposure. Traditional rule-based OPC methods suffer from significant accuracy degradation when handling complex layouts, while model-based OPC approaches incur high computational cost. In recent years, deep learning-based methods have been introduced to accelerate mask generation; however, their limited receptive fields hinder effective modeling of long-range optical interference effects, thereby constraining optimization accuracy. To address these challenges, this work proposes a Frequency Domain Self-Attention Guided Multi-Scale Inverse Lithography Technology (FMS-ILT), which jointly models local geometric details and global optical interactions, leading to improved printed image fidelity, edge placement accuracy, and process robustness.  Methods  FMS-ILT adopts a residual convolution-based multi-scale encoder–decoder architecture, where shallow layers extract fine-grained geometric features such as edges and corners, while deeper layers capture large-scale layout context. Residual blocks and multi-level skip connections are employed to preserve high-frequency information and stabilize training. To overcome the limited receptive field of spatial convolutions, a Frequency Domain Self-Attention Mechanism (FSAM) is introduced at the encoder output. Global feature interactions are enabled via the Fourier transform, and the resulting attention responses are mapped back to the spatial domain through the inverse Fourier transform to adaptively reweight feature representations. A two-stage training strategy is adopted. During pretraining, a dual-branch structure is used to jointly learn mask geometry and imaging consistency, providing physically meaningful initialization. 
During main training, lithography simulation is applied under nominal, maximum, and minimum process conditions to further refine mask optimization under physical constraints.  Results and Discussions  The comparison results with baseline models are summarized in Tables 2 and 3. Our method is set as the reference (Ratio = 1), and all experiments are conducted on the LithoBench dataset. In terms of overall imaging $\mathcal{L}_2$ error, our method achieves the lowest value of 19,998, outperforming baseline models by 2%–107%. For the process robustness metric Process Variation Band (PVB), GAN-OPC obtains the best result of 19,156, which is 31% lower than ours; however, its $\mathcal{L}_2$ error and EPE are 107% and 1115% higher, respectively, indicating an imbalance between imaging fidelity and edge accuracy. The remaining baseline models exhibit PVB performance comparable to ours. Regarding Edge Placement Error (EPE), our method also demonstrates a significant advantage, achieving an average EPE of 1.95, which is 47%–1115% lower than the baselines. These improvements can be attributed to three key factors: (1) a multi-scale encoder–decoder fusion mechanism that effectively integrates local and global features, (2) the combination of attention mechanisms and frequency-domain operations to guide the model toward critical regions, and (3) a dual-branch pretraining strategy that injects physical priors into the network. With these modules jointly contributing, FMS-ILT achieves more balanced and superior performance in imaging fidelity, process stability, and edge accuracy.  Conclusions  This work proposes a Frequency Domain Self-Attention Guided Multi-Scale Inverse Lithography Technology (FMS-ILT). 
The model adopts a residual convolution-based multi-scale encoder–decoder architecture to extract rich spatial features and incorporates a frequency-domain self-attention mechanism to jointly model local geometric details and global optical interference characteristics. A two-stage training strategy is employed. In the pretraining stage, a dual-branch task of mask generation and target image reconstruction is used to enhance the physical consistency between the mask and the printed image. In the main training stage, lithography simulation is introduced to further improve imaging accuracy and process robustness. Experimental results on the public LithoBench dataset demonstrate that FMS-ILT achieves superior performance in terms of $\mathcal{L}_2$, PVB, and EPE metrics, effectively improving printed image quality and providing a feasible and efficient solution for computational lithography.
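The two image-level metrics reported above can be sketched from binary printed images. The sketch assumes the definitions commonly used in inverse-lithography benchmarks: the squared L2 error compares the nominal printed image with the target, and the PVB counts the pixels that differ between the images printed under the maximum and minimum process conditions. The abstract does not state its exact definitions, so these definitions and the function names are assumptions.

```python
def squared_l2(printed_nominal, target):
    """Sum of squared pixel differences between printed and target images."""
    return sum((p - t) ** 2
               for pr, tr in zip(printed_nominal, target)
               for p, t in zip(pr, tr))


def process_variation_band(printed_max, printed_min):
    """Number of pixels that differ between the two process corners (XOR area)."""
    return sum(1
               for ra, rb in zip(printed_max, printed_min)
               for a, b in zip(ra, rb)
               if a != b)


# Toy 3x3 binary images: the printed pattern misses one target pixel, and
# the two process corners disagree on two pixels.
target = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
nominal = [[0, 1, 0], [1, 1, 0], [0, 1, 0]]
corner_max = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
corner_min = [[0, 0, 0], [1, 1, 0], [0, 1, 0]]
l2_error = squared_l2(nominal, target)
pvb_area = process_variation_band(corner_max, corner_min)
```

Both quantities are pixel counts for binary images, so they scale with image resolution and are comparable only on a common grid.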
Full-Space Covert Integrated Sensing and Communications Assisted by Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface
XIE Wenwu, ZHANG Qinke, YANG Liang, WANG Ji, YU Chao, LIU Xinzhong, CUI Yaru
Available online  , doi: 10.11999/JEIT260145
Abstract:
  Objective  The evolution of Sixth Generation (6G) mobile communications toward higher frequencies and larger antenna arrays has made Integrated Sensing And Communication (ISAC) a key enabling technology. However, ISAC systems still face limited communication covertness and resource competition between sensing and communication. Covert communication and Reconfigurable Intelligent Surface (RIS) techniques provide promising solutions. However, most existing studies use reflective RISs with half-space coverage and assume far-field propagation. These assumptions limit deployment flexibility and fail to capture near-field spherical-wave characteristics. To address these issues, this paper proposes a near-field full-space ISAC framework assisted by an Extremely Large-Scale Simultaneously Transmitting And Reflecting Reconfigurable Intelligent Surface (XL-STAR-RIS). The objective is to jointly optimize active transmit beamforming and passive XL-STAR-RIS coefficient design to improve the covert communication rate while satisfying sensing performance and covertness requirements.  Methods  The detection capability of warden Willie is first analyzed, and a closed-form lower-bound expression for the minimum Detection Error Probability (DEP) is derived. A non-convex optimization problem is then formulated to maximize the covert communication rate under sensing Signal-to-Noise Ratio (SNR), covertness, and total transmit power constraints. Direct solution is difficult because the active transmit beamforming vectors and passive XL-STAR-RIS coefficients are strongly coupled. An Alternating Optimization (AO) framework is therefore adopted to decompose the original problem into two tractable subproblems. The active transmit beamforming subproblem is solved using SemiDefinite Relaxation (SDR) combined with a penalty-based successive convex approximation method. The passive XL-STAR-RIS coefficient design subproblem is solved using the Dinkelbach algorithm and a rank-one penalty method. 
The two subproblems are solved alternately until convergence.  Results and Discussions  Simulation results verify the effectiveness of the proposed framework. The algorithm converges within approximately 10 iterations and achieves a covert communication rate of about 11.5 bit/(s·Hz). This rate is higher than those of the passive-RIS scheme (9.8 bit/(s·Hz)) and the non-RIS scheme (8.0 bit/(s·Hz)). The performance gain becomes more evident as the transmit power increases, which indicates strong power adaptability. The proposed framework also maintains robust performance under strict operational constraints. When the sensing SNR threshold increases, it achieves a higher covert communication rate than the benchmark schemes. Under a stricter covertness requirement, it also preserves a higher communication rate. These results show that joint active transmit beamforming and passive XL-STAR-RIS coefficient design can effectively balance communication, sensing, and covertness in near-field ISAC systems.  Conclusions  This paper presents an XL-STAR-RIS-assisted covert communication framework for near-field ISAC systems. By jointly designing active transmit beamforming and passive XL-STAR-RIS coefficients through an efficient AO algorithm, the proposed framework balances communication rate, sensing performance, and communication covertness. Simulation results confirm its advantages over conventional passive-RIS and non-RIS schemes, especially under strict sensing and covertness constraints. The results also indicate the potential of XL-STAR-RIS for secure full-space 6G applications. Future work will consider imperfect Channel State Information (CSI), dynamic propagation environments, and multi-RIS collaboration to improve practical robustness.
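The Dinkelbach step used in the passive-coefficient subproblem can be illustrated on a toy scalar fractional program. This is a generic sketch of the Dinkelbach iteration, not the paper's subproblem: the candidate grid, the numerator `log2(1 + p)`, and the denominator `1 + p` are hypothetical choices made only so that the example runs. It assumes a positive denominator and a non-negative numerator.

```python
import math


def dinkelbach(candidates, num, den, tol=1e-9, max_iter=100):
    """Maximise num(x) / den(x) over a finite candidate set.

    Each iteration solves the parametric problem max num(x) - lam * den(x)
    and then updates lam to the ratio achieved at that maximiser.
    """
    lam = 0.0
    best = candidates[0]
    for _ in range(max_iter):
        best = max(candidates, key=lambda x: num(x) - lam * den(x))
        gap = num(best) - lam * den(best)
        if gap < tol:  # lam has reached the optimal ratio.
            break
        lam = num(best) / den(best)
    return best, num(best) / den(best)


# Toy problem: maximise log2(1 + p) / (1 + p) over a grid of p values.
grid = [i * 0.01 for i in range(1001)]
p_star, ratio_star = dinkelbach(grid,
                                num=lambda p: math.log2(1 + p),
                                den=lambda p: 1 + p)
```

In an alternating-optimization loop, a routine of this kind would be called once per outer iteration with the other block of variables held fixed.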
Millimeter-Wave Air-to-Ground Channel Prediction Assisted by Visual Information of the Propagation Environment
CHENG Yuanxun, HU Qingsong, ZHANG Xiaomin, WANG Xuesong
Available online  , doi: 10.11999/JEIT260274
Abstract:
  Objective  Accurate prediction of air-to-ground (A2G) channel states is essential for adaptive transmission and resource optimization in unmanned aerial vehicle (UAV) communications. In urban millimeter-wave scenarios, however, A2G links are highly sensitive to blockage, reflection, scattering, and the rapidly changing geometric relationship among the transmitter, the receiver, and surrounding buildings. As a result, the channel exhibits strong spatial and temporal nonstationarity, and conventional pilot- or feedback-based acquisition methods may become ineffective because the obtained channel state information is easily outdated. Recent data-driven approaches have shown potential, but many of them rely heavily on historical channel observations or directly use raw images as network inputs, which may introduce redundant visual information and weaken physical interpretability. To address these limitations, this paper proposes a vision-assisted millimeter-wave A2G channel prediction method that extracts low-dimensional geometric features from the propagation environment instead of using raw visual data directly. The objective is to preserve the key structural information governing channel evolution while reducing irrelevant redundancy, thereby improving the prediction of channel.  Methods  A communication-and-sensing integrated dataset with strict spatial and temporal alignment is established for millimeter-wave UAV A2G channel prediction. On the sensing side, a high-fidelity three-dimensional urban scenario containing 23 buildings, roads, and intersections is constructed in Unreal Engine 4.27, where synchronized RGB and depth images are collected through AirSim using a multirotor UAV equipped with RGB and depth cameras. The UAV flies along 10 preset trajectories at a height of 55 m with a spatial sampling interval of 1 m, yielding 2160 valid visual samples (Fig. 1, Fig. 2). 
On the communication side, the same scene is reconstructed in Wireless InSite, and the transmitter-receiver positions are synchronously updated along the same trajectories to ensure frame-level alignment between visual and channel data (Fig. 3). To obtain compact and physically meaningful environmental representations, a cross-modal spatial feature extraction scheme is developed. Buildings are first detected from RGB images using YOLO-V8 (Fig. 4), and the detected regions are then registered with depth images to reconstruct three-dimensional point clouds. After Euclidean clustering and axis-aligned bounding-box fitting, key geometric attributes, including planar position, height, and volume, are extracted. These features are combined with the transmitter-receiver distance to form the spatial feature vector of each frame, and their relevance to path loss, received power, and RMS delay spread is evaluated through cosine-similarity-based correlation analysis (Fig. 6). Based on the extracted features, a hybrid Transformer-MLP network is designed for channel prediction (Fig. 5). Building features are first projected into a latent space, and a stacked Transformer encoder is employed to capture global interactions among buildings through masked multi-head self-attention. Masked average pooling is then used to aggregate building-level representations into a scene-level environmental descriptor, which is concatenated with the link distance feature and fed into a multilayer perceptron regressor to predict the three target channel parameters.  Results and Discussions  The results confirm the effectiveness of the proposed spatial feature representation. Correlation analysis shows that the extracted geometric features are consistently related to path loss, received power, and RMS delay spread under different aggregation strategies (Fig. 6), indicating that compact building descriptors can effectively characterize the propagation environment. 
Among them, building height exhibits the strongest correlation with all three channel parameters, highlighting its important role in blockage, attenuation, and multipath propagation in urban millimeter-wave A2G channels. In prediction experiments, the proposed method accurately tracks the variation trends of all three targets. It remains effective in deep-fading and sharp-fluctuation regions for path loss prediction (Fig. 7), achieves high consistency with the ground truth for RMS delay spread (Fig. 8), and follows rapid local fluctuations of received power with good fidelity (Fig. 9). In contrast, the benchmark model only captures the general trend and shows larger deviations in peaks, valleys, and abrupt-changing intervals. Residual analysis further demonstrates the superiority of the proposed method. Its errors are more concentrated around zero and fluctuate within narrower ranges than those of the benchmark model across all three tasks (Fig. 10). Quantitatively, both the mean absolute error and the root mean squared error are reduced (Fig. 11). In addition, the model maintains acceptable complexity, with about 5.5 M parameters and a single-frame inference delay of about 3.4 ms, indicating good potential for real-time deployment.  Conclusions  A vision-assisted millimeter-wave A2G channel prediction method for UAV communications is proposed. By constructing a strictly aligned communication-and-sensing dataset and extracting low-dimensional spatial features with clear physical meaning, the method establishes an effective mapping from environmental geometry to channel parameters. The proposed Transformer-MLP framework achieves accurate prediction of path loss, received power, and RMS delay spread, while offering better interpretability, robustness, and efficiency than the benchmark model.
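The masked average pooling step mentioned in the Methods above can be illustrated with a minimal sketch. It assumes that each frame holds a padded list of per-building feature vectors plus a boolean mask marking which entries are real buildings; the function name and the data layout are hypothetical and are not taken from the paper.

```python
def masked_average_pool(features, mask):
    """Average only the feature vectors whose mask entry is True.

    features: list of equal-length vectors, one per (possibly padded) building slot.
    mask:     list of booleans, True where the slot holds a real building.
    Returns a single scene-level vector; all zeros if no slot is valid.
    """
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    dim = len(features[0]) if features else 0
    kept = [vec for vec, valid in zip(features, mask) if valid]
    if not kept:
        # No real buildings in view: fall back to a zero descriptor.
        return [0.0] * dim
    # Column-wise mean over the valid building vectors only.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real buildings and one padding slot that must not affect the mean.
scene = masked_average_pool(
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
    [True, True, False],
)
```

Excluding padded slots keeps the scene descriptor independent of how many empty slots a frame happens to contain, which is presumably why a masked rather than a plain average is used.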
Transfer Learning Aided CNN for Efficient Data Detection in ReRAM with Sneak-Path Interference
DAI Bin, WU Anni
Available online  , doi: 10.11999/JEIT260354
Abstract:
  Objective  Sneak path interference (SPI) in resistive random-access memory (ReRAM) introduces unpredictable inter-cell correlations, significantly increasing the complexity of signal detection. Traditional detection methods typically rely on assumptions about known channel noise states, resulting in limited generalization capability in practical applications. To address this issue, three data detection methods based on convolutional neural networks (CNNs) are proposed, which can effectively model and mitigate interference without relying on prior channel information: first, a method combining constrained coding with a multi-layer CNN, which uses constrained coding to determine the sneak path interference state and recover data; second, a dual-CNN framework that first employs a lightweight CNN for sneak path interference identification, followed by a multi-layer CNN for refined detection; third, an approach incorporating transfer learning, which maintains detection accuracy while reducing the required training sample size to one-thousandth of that of traditional methods. Simulation results demonstrate that the proposed methods achieve superior bit error rate (BER) performance under unknown channel conditions, with a BER reduction of at least half relative to existing algorithms, approaching the theoretical performance limit. Moreover, the integration of transfer learning reduces the required training samples from $10^{6}$ to $1000$, corresponding to a reduction of three orders of magnitude.  Methods  To address distinct challenges in sneak path interference detection, this paper proposes three methods sequentially: 1. The integrated constrained coding aided convolutional neural network (CC-CNN) detection framework effectively addresses the complex inter-cell correlations introduced by sneak path interference. 
This approach first employs constrained coding to detect the presence of interference and subsequently utilizes a CNN to learn and capture the random correlations under the influence of interference, thereby achieving accurate signal recovery. 2. The dual-CNN-based detection method resolves the code rate loss associated with traditional constrained coding. By directly leveraging a CNN to learn and identify sneak path interference patterns from raw data, this method eliminates the need for redundant coding or additional overhead. It ensures high-precision interference detection while preserving the overall code rate performance of the system. 3. The transfer learning-based CNN (TL-CNN) detection method overcomes the dependence of high-performance CNNs on large-scale training datasets. By reusing knowledge from pre-trained models, this method enables rapid adaptation to ReRAM signal detection tasks. It significantly reduces the required number of training samples while maintaining high detection accuracy and resource efficiency, thereby enhancing the feasibility of the solution in practical scenarios.  Results and Discussions  Simulation results demonstrate that the performance of the three proposed methods consistently approaches the theoretical lower bound (Fig. 6), outperforming baseline methods such as the Belief Propagation (BP) detector, Deep Neural Network (DNN) detector, and Elementary Signal Estimator (ESE) detector. The two-step network achieves performance comparable to that of the single-step network while successfully avoiding code rate loss. Notably, the transfer learning-aided CNN attains near-optimal BER with only 1000 target domain samples, and its performance stabilizes when the sample size exceeds 1000 (Fig. 7), fully validating its data efficiency. 
The integration of SK modules enables the models to effectively capture SPI-induced spatial correlations, while the transfer learning strategy ensures the models’ robust performance under different noise conditions.  Conclusions  The crossbar array architecture of ReRAM is susceptible to sneak-path interference during storage operations, leading to reduced data reliability. To address this issue, this paper proposes three deep learning-based detection methods. Type-I integrates constrained coding with a CNN to achieve efficient and fast interference detection. Type-II adopts a two-stage processing approach: it first classifies interference patterns in the memory array and then performs detection specifically on affected units, thereby ensuring high detection accuracy while minimizing coding rate loss. Type-III introduces a transfer learning framework that leverages a pre-trained model from the source domain, significantly reducing the number of training samples required in the target domain and effectively lowering training overhead. Experimental results show that under different noise conditions, all three proposed methods achieve performance close to the theoretical lower bound, providing an effective solution for enhancing the reliability of ReRAM storage systems.
An Overview of Key Technologies on 6G-Enabled Communication and Computing Integration for Energy-Efficiency Optimization
LIU Guangyi, CAI Qing, WANG Xinyao, CHEN Tianjiao, JIN Jing, XUE Yahui, WANG Ailing, WANG Hanning
Available online  , doi: 10.11999/JEIT260399
Abstract:
  Significance  Constrained by physical conditions such as size, power consumption, and cost, new intelligent terminals face high energy consumption, which has become a key bottleneck for their large-scale application. In contrast to Fifth-Generation (5G) networks, Sixth-Generation (6G) networks will achieve profound architectural enhancement of the Radio Access Network (RAN), sink computing capabilities toward the RAN side, and enable the RAN to perform part of the tasks originally executed by end devices. With end-edge collaboration, new intelligent terminals are expected to evolve toward lightweight, low-cost, and long-endurance designs, which is of great significance for supporting the large-scale deployment of ubiquitous intelligence in 6G networks.  Progress  Current advancements in terminal energy consumption optimization with 6G end-edge collaboration are discussed, focusing on three primary offloading modes: local execution, full offloading, and partial offloading. Local execution requires the terminal to process all tasks, leading to high computational energy consumption, while full offloading shifts all tasks to the RAN, reducing terminal computing energy but increasing transmission energy costs, particularly in poor channel conditions. Partial offloading combines the advantages of both modes, optimizing energy consumption based on real-time network conditions. For partial offloading, existing research has introduced several optimization techniques to enhance energy efficiency. (1) Feature extraction and filtering: Through semantic encoding and information extraction approaches, feature extraction is performed at the User Equipment (UE) to transmit only task-relevant data to the RAN. This reduces the amount of redundant or unnecessary data sent, minimizing transmission energy consumption. (2) Model partitioning for offloading: This technique divides a large deep learning model into different layers based on its network structure, with simpler layers processed at the UE and more complex ones offloaded to the RAN. 
By leveraging end-edge collaborative reasoning, this method optimizes energy consumption by balancing the computational load between the terminal and RAN. (3) Model lightweighting: By reducing model complexity through techniques like pruning, quantization, and knowledge distillation, this method lowers computational overhead while maintaining performance. (4) Incremental reasoning: This method focuses on the changes in data or features, performing localized reasoning only on updated portions and reusing historical computations, significantly reducing redundant calculations. The above optimization techniques collectively enhance the performance and energy efficiency of terminal devices within the 6G end-edge collaboration framework.  Conclusions  This paper provides a comprehensive discussion of terminal energy consumption optimization with 6G end-edge collaboration. It summarizes the functional evolution of enhanced RAN, constructs an end-edge collaborative service framework for communication-computation integration, and establishes a theoretical model including terminal computing energy consumption and transmission energy consumption. The composition and influencing factors of energy consumption under different offloading modes are clarified. Key technologies for energy optimization based on end-edge collaboration are further discussed, including feature extraction and filtering, model partitioning for offloading, model lightweighting, and incremental reasoning. Given the energy consumption fluctuations caused by the dynamic nature of wireless channels, this paper introduces energy optimization mechanisms such as semantic compression, dynamic partitioned offloading, adaptive model pruning, and incremental reasoning to strike a dynamic balance between optimizing energy consumption and maintaining task performance. 
Taking intelligent robot video understanding as a typical application scenario, a test platform is developed to validate the effectiveness of the proposed optimization mechanisms. This paper also analyzes the challenges currently faced in the research and discusses future research directions.  Prospects   Although the end-edge collaborative energy-saving technologies have achieved initial progress, they still face many challenges in practical deployment, especially under real network environments, dynamic wireless channels, and large-scale user access. Future research should focus on the trade-off between optimization overhead and system robustness, and further investigate dynamic communication–computation resource substitution modeling in stochastic resource environments, as well as multi-user collaborative strategies and global energy efficiency optimization. Meanwhile, as the technology matures, the standardization and engineering implementation of end-edge collaborative energy-saving frameworks will become crucial for the large-scale adoption of 6G applications. Future studies should therefore promote deeper integration between algorithm design and network architecture, enabling the practical deployment of low-power, high-efficiency intelligent communication systems.
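The trade-off among local execution, full offloading, and partial offloading described above can be illustrated with a minimal sketch. It assumes a simple linear model in which terminal energy is the sum of local computing energy and uplink transmission energy; the function name, parameters, and numeric values are hypothetical and do not reproduce the theoretical model established in the paper.

```python
def terminal_energy(offload_fraction, task_bits, cycles_per_bit,
                    energy_per_cycle, tx_power_w, uplink_rate_bps):
    """Terminal-side energy (joules) for one task under an assumed linear model.

    offload_fraction: share of the task sent to the RAN (0 = local, 1 = full offload).
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must lie in [0, 1]")
    if uplink_rate_bps <= 0:
        raise ValueError("uplink_rate_bps must be positive")
    local_bits = (1.0 - offload_fraction) * task_bits
    offloaded_bits = offload_fraction * task_bits
    # Computing energy: bits processed locally times the energy cost per bit.
    compute_energy = local_bits * cycles_per_bit * energy_per_cycle
    # Transmission energy: transmit power times the time needed to upload.
    transmit_energy = tx_power_w * offloaded_bits / uplink_rate_bps
    return compute_energy + transmit_energy


# Hypothetical task: 1 Mbit, 1000 cycles/bit, 1 nJ/cycle, 0.2 W transmit power.
local = terminal_energy(0.0, 1e6, 1000, 1e-9, 0.2, 1e7)
full_good_channel = terminal_energy(1.0, 1e6, 1000, 1e-9, 0.2, 1e7)
full_poor_channel = terminal_energy(1.0, 1e6, 1000, 1e-9, 0.2, 1e5)
half_poor_channel = terminal_energy(0.5, 1e6, 1000, 1e-9, 0.2, 1e5)
```

Under these assumed numbers, full offloading is cheaper than local execution on the fast uplink and more expensive on the slow one, which is the behavior that motivates choosing the offloading share according to real-time channel conditions.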
Lightweight Semantic Communication System Driven by User Personalization in UAV Networks
WEI Yuxuan, CHEN Xiao, CHEN Qiuyu, JIANG Hao, YANG Zhaohui
Available online  , doi: 10.11999/JEIT260370
Abstract:
  Objective  With the rapid development of the low-altitude economy and 6G intelligent networks, Unmanned Aerial Vehicle (UAV) image communication shows great promise in scenarios such as target reconnaissance, emergency communications, and intelligent inspection. However, constrained by the limited bandwidth, payload capacity, and onboard computational resources of UAVs, conventional pixel-level transmission fails to meet the demands of efficient, low-latency, and intelligent communications. Semantic Communication (SC), which transmits only task-relevant information, offers an effective solution to enhance communication efficiency in such resource-constrained scenarios. However, existing research on UAV image SC faces several challenges. First, fixed network architectures apply unified semantic encoding and transmission strategies for all users, failing to accommodate diverse personalized requirements. Second, new user onboarding typically requires interest pre-training or model fine-tuning, leading to high deployment overhead. Third, the computational complexity of models is generally high. To address these issues, this paper proposes a lightweight personalized UAV SC system, LPUSC, aimed at achieving a balanced trade-off among computation, bandwidth, and personalized demands. The system enables personalized transmission via low-cost semantic index interaction and a lightweight semantic extraction module, without pre-training for new users. Additionally, a dual-branch end-to-end network is designed, where the semantic index transmission network collaborates with the semantic image transmission network trained with a weighted hybrid loss strategy to ensure high-precision and high-quality transmission of personalized semantic images.  Methods  The proposed LPUSC system adopts a dual-branch architecture to enable accurate task-driven semantic content transmission. 
First, in the semantic index interaction branch, the lightweight object detection model YOLO11s is employed to perform semantic perception on UAV-captured visual scenes, compressing complex image information into low-dimensional semantic index vectors to reduce transmission redundancy and communication overhead. On this basis, an end-to-end semantic index transmission network is designed to enhance the robustness of semantic index transmission under complex wireless channel conditions. Through the semantic index interaction mechanism, the system is capable of accurately identifying targets of user interest, providing prior guidance for subsequent semantic content extraction. Second, in the semantic image transmission branch, the lightweight yet high-precision MobileSAM model is adopted for semantic region extraction. This branch receives the interest target bounding boxes returned by the semantic index interaction branch as heuristic prompt inputs, enabling pixel-level accurate segmentation and extraction of specific semantic targets. Third, to further enhance the reconstruction quality of semantic images, a weighted hybrid loss function is designed. This loss function integrates Mean Squared Error (MSE), L1 norm, Structural Similarity Index Measure (SSIM), gradient, perceptual, and background suppression losses to jointly optimize the network across pixel-level accuracy, structural preservation, and fine detail restoration. Through the joint constraint of multiple loss terms, the proposed system effectively enhances the reconstruction capability of semantic regions, thereby achieving high-quality semantic image transmission.  Results and Discussions  Simulation results validate the proposed LPUSC system in terms of semantic extraction and end-to-end transmission. In terms of semantic extraction, three schemes are compared, including YOLO11s-seg, “YOLO11s + SAM”, and “YOLO11s + MobileSAM” (Fig. 4). 
The results show that the detection-segmentation decoupled architecture achieves superior semantic boundary localization accuracy. Combined with the quantitative analysis in Table 1, the “YOLO11s + MobileSAM” scheme significantly reduces resource consumption while maintaining high extraction accuracy, confirming its suitability for resource-constrained UAV platforms. In terms of end-to-end transmission, the semantic index vector transmission results (Fig. 5) show that the Bit Error Rate (BER) decreases monotonically with increasing Signal-to-Noise Ratio (SNR) across all three channel environments, with rural environments achieving the best performance, followed by suburban and urban environments. The performance differences are primarily attributed to variations in scatterer density and link blockages across environments. The proposed transmission network maintains stable BER under different Doppler frequencies, demonstrating its robustness in dynamic channel conditions. In terms of semantic image transmission, the proposed weighted hybrid loss function demonstrates good training stability (Fig. 6), and LPUSC consistently outperforms the DeepJSCC and “JPEG + LDPC” baselines across the full SNR range (Fig. 7). Specifically, LPUSC achieves SSIM and Peak Signal-to-Noise Ratio (PSNR) gains of 1.3% and 4.8% over DeepJSCC, and 43% and 79.5% over JPEG, respectively. The results indicate that the proposed personalized semantic image transmission network achieves high-quality reconstruction with robustness to channel variations.  Conclusions  To improve the efficiency and flexibility of UAV image communication, this paper proposes a lightweight personalized SC system called LPUSC. The system employs a dual-branch transmission architecture that integrates a lightweight, high-precision object detection model and a semantic segmentation model, enabling personalized content transmission without interest pre-training. 
This design meets personalized user requirements while maintaining low computational and communication overhead. Simulation results demonstrate that the LPUSC system achieves stable and reliable semantic index interaction, and significantly outperforms DeepJSCC and JPEG baselines in semantic region reconstruction. The proposed system offers a valuable reference for efficient UAV image SC in 6G low-altitude intelligent networking.
Load Optimization of Inverter Air Conditioning Cluster Driven by Constraint Surface Projection and Spatial-Fitness Synergy
ZHENG Bowen, PAN Mingming, WANG Lei, LIU Chang, ZHENG Qingrong, TANG Zhuofan, ZHAO Jianli
Available online  , doi: 10.11999/JEIT260149
Abstract:
  Objective  Supply-demand imbalances in modern power distribution networks are exacerbated by the increasing penetration of distributed renewable energy and frequent extreme weather events. Consequently, large-scale inverter air conditioning (IAC) clusters are utilized for Demand Response (DR) as a viable strategy to enhance grid flexibility. However, existing dispatch strategies are often limited by the curse of dimensionality, and aggregate power equality constraints are not strictly met without compromising user comfort. In this study, an optimization framework is developed to achieve precise grid power control while thermal discomfort is minimized and fairness among heterogeneous users is maintained.  Methods  A multi-objective optimization framework based on an Equivalent Thermal Parameter (ETP) model is established to evaluate the thermodynamic states of heterogeneous buildings. To balance collective comfort and individual fairness, a composite fitness function is designed, in which a weighted mean square error term, a temperature variance penalty, and a violation suppression term are integrated. To address the steady-state errors inherent in traditional penalty-based methods, a Spatial-Fitness Adaptive Particle Swarm Optimization (SFA-PSO) algorithm is proposed. Particles are mapped strictly onto the power conservation hyperplane by a geometric constraint surface projection mechanism to ensure power balance. Furthermore, learning factors are dynamically adjusted by a spatial-fitness synergistic strategy based on the cognitive dissonance between a particle's fitness rank and spatial distance rank, whereby premature convergence in high-dimensional spaces is prevented.  
Results and Discussions  Extensive continuous scheduling simulations were conducted under a complex dynamic environment, which comprehensively incorporated multi-source thermal disturbances, a 1% bidirectional communication packet loss rate, and varying part load ratios of 20%, 50%, and 80%. First, regarding the effectiveness of the proposed mechanisms, ablation experiments confirmed that the constraint surface projection guarantees power tracking accuracy. While traditional penalty-based methods (e.g., Penalty-PSO) exhibited steady-state power deviations of approximately $10^{-1}$ kW, SFA-PSO successfully restricted the aggregate power tracking errors within $10^{-9}$ kW (Fig. 3). Furthermore, the introduction of the Spatial-Fitness Adaptive (SFA) strategy effectively prevented the premature convergence observed in Phy-PSO, enabling continuous fitness descent particularly in low-load scenarios with narrow feasible regions (Fig. 4). This is directly attributed to the dynamic evolution of the learning factors, where the cognitive factor remains high initially to encourage global exploration, and subsequently decreases while the social factor rises to enhance precise local exploitation (Fig. 5). Second, in terms of continuous dynamic scheduling performance, a 6-hour simulation during the peak load period (12:00 to 18:00) with 5-minute dispatch intervals, totaling 72 decision steps, was executed. Under extreme power limitations, standard algorithms like GA and WOA suffered from severe power limit violations due to poor synergy with the projection mechanism, whereas SFA-PSO maintained perfect constraint satisfaction (Fig. 7). SFA-PSO consistently positioned itself at the lowest fitness level throughout the real-time evolution curves, demonstrating superior robustness against environmental thermal noise and network transmission delays (Fig. 8). 
Quantitatively, compared to eight baseline algorithms including SLPSO, CSO, and DSCPSO, the proposed SFA-PSO achieved the most outstanding comprehensive performance with an average fitness of 904, a minimum fitness of 243, and the lowest standard deviation of 551 (Table 2). Finally, comprehensive scalability analyses across diverse cluster sizes ranging from 100 to 1,000 nodes further validated the algorithm's high-dimensional solving capability. Across all scale scenarios, SFA-PSO exhibited the strongest optimization capacity, characterized by a rapid initial descent within the first 20 iterations and sustained exploration in later stages (Fig. 9). Although the integration of the projection and SFA mechanisms increased the computational time by 30% to 50% compared to the basic PSO algorithm (Fig. 6), the absolute optimization solving time remained highly stable at approximately 1.5 seconds even for a massive 1,000-node cluster (Fig. 9). This minor computational overhead is entirely negligible for minute-level control cycles, fully satisfying the stringent real-time dispatch requirements of modern smart grids.  Conclusions  The steady-state error limitations of traditional soft-constraint methods in aggregate power control are effectively addressed by the proposed SFA-PSO algorithm. By ensuring precise tracking of dispatch commands and mitigating high-dimensional traps, a robust and scalable solution is provided for the flexible scheduling of large-scale IAC loads in smart grids, and a practical balance between grid-side regulation and user-side comfort is maintained. Objectively, cross-algorithm generalization is restricted by the inherent algorithm dependency of the constraint projection mechanism, and additional computational overhead is introduced to guarantee high-precision tracking. Consequently, adaptive constraint processing and algorithm lightweighting technologies are primary focuses for future research.
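The idea of mapping particles onto the power conservation hyperplane, as described in the Methods above, can be illustrated with a minimal sketch. It assumes the constraint is simply that the per-unit powers sum to a target value and uses the standard Euclidean projection onto that hyperplane; per-unit power limits are ignored, and the function name is hypothetical, so this should not be read as the paper's exact projection mechanism.

```python
def project_onto_power_plane(powers, target_total):
    """Euclidean projection of a power vector onto {p : sum(p) = target_total}.

    Every component is shifted by the same amount, which is the closest
    point on the hyperplane in the least-squares sense.
    """
    n = len(powers)
    if n == 0:
        raise ValueError("powers must not be empty")
    # Equal share of the aggregate mismatch removed from each unit.
    shift = (sum(powers) - target_total) / n
    return [p - shift for p in powers]


# A candidate particle whose aggregate power (6.5 kW) misses the 6.0 kW target.
candidate = [1.0, 2.0, 3.5]
feasible = project_onto_power_plane(candidate, 6.0)
tracking_error = abs(sum(feasible) - 6.0)
```

Because the equality constraint is enforced geometrically rather than through a penalty term, the remaining tracking error in this sketch is limited to floating-point rounding, which is consistent with the very small deviations reported for the projection-based approach.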
Resource Allocation in Dual-RIS Cooperative Rate-Splitting Multiple Access Networks
CHEN Yuang, WU Chang, PENG Mingyu, LU Hancheng
Available online  , doi: 10.11999/JEIT260171
Abstract:
  Objective  In Rate-Splitting Multiple Access (RSMA) systems, the achievable common-stream rate is fundamentally constrained by the user with the weakest channel quality, which limits scalability, robustness, and user fairness in dense 6G networks. Existing cooperative RSMA architectures only partially alleviate this bottleneck and still suffer from rigid channel dependencies and limited interference management capability. To address these issues, this paper proposes a dual-RIS cooperative RSMA architecture, where two collaboratively deployed Reconfigurable Intelligent Surfaces (RISs) jointly create additional controllable propagation paths through cooperative double reflection. The objective is to maximize the system sum rate through the joint optimization of Base Station (BS) beamforming, Rate-Splitting (RS) strategies, and dual-RIS phase configurations, thereby improving spectral efficiency, robustness, and user fairness under users’ Quality of Service (QoS) constraints.  Methods  A tractable system model is developed for the dual-RIS cooperative RSMA architecture, accurately capturing cascaded multi-link channels and interference coupling. Based on this model, a joint optimization problem is formulated to maximize the system sum rate by optimizing BS beamforming, RS strategies, and discrete phase shifts of both RISs. Due to strong variable coupling and non-convexity, a low-complexity and efficient Alternating Optimization (AO) algorithm is designed, which decomposes the original problem into manageable subproblems and solves them iteratively with fast convergence.  Results and Discussions  Extensive simulation results demonstrate the effectiveness of the proposed dual-RIS cooperative RSMA system. The proposed AO algorithm converges rapidly within 6–7 iterations and achieves over 97% of the steady-state sum rate within three iterations for large-scale RIS deployments (Fig. 3). Compared with the classic phase configuration scheme, the proposed phase configuration yields up to at least 10.6% sum-rate gains (Fig. 4). Moreover, the proposed RSMA system outperforms NOMA and SDMA by 10.0% and 14.6%, respectively (Fig. 5). 
Dual-RIS cooperation provides an 11.9% gain over single-RIS, with performance approaching the continuous-phase upper bound (Fig. 6). Balanced RIS element allocation maximizes performance (Fig. 7). Furthermore, the proposed beamforming significantly surpasses traditional methods, delivering up to at least 33.2% gains at 30 dBm transmit power (Fig. 8). These results highlight the superiority of the proposed dual-RIS cooperative RSMA system in enhancing common-stream decoding and interference suppression, leading to improved robustness and fairness.  Conclusions  This paper investigates a dual-RIS cooperative RSMA system that effectively improves common-stream decoding performance while mitigating complex interference. To maximize the system’s sum rate, this paper jointly optimizes BS beamforming, RS decisions, and discrete phase shifts of both RISs. A low-complexity AO algorithm is developed to address the strongly coupled non-convex problem. Extensive results demonstrate that the proposed dual-RIS cooperative RSMA scheme achieves significant sum-rate gains over state-of-the-art schemes while exhibiting superior robustness and user fairness.
One-step Reconstruction Diffusion Model for Poisoning Attack on QoS-aware cloud API Recommender System
TAN Zeyu, WANG Haoyuan, QI Mingyang, SUN Mengmeng, SHEN Limin, CHEN Zhen
Available online  , doi: 10.11999/JEIT260115
Abstract:
  Objective  In the cloud era, cloud Application Programming Interface (cloud API), as the best carrier for data output, capability replication and service delivery, has become an indispensable core element for service-oriented software development and operation. With the rapid increase in the number of cloud APIs, it is difficult for users to choose from a large number of cloud APIs with the same functions. For this purpose, researchers introduced Quality of Service (QoS) to effectively differentiate cloud APIs based on their non-functional attributes. Therefore, QoS-aware cloud API recommender systems (QARS) are gradually playing an increasingly important role in guiding users to choose the most suitable cloud API. However, existing research mainly focuses on improving the accuracy of QARS, ignoring the security risks brought about by the economic benefits of cloud APIs and the openness of the network environment. These risks are especially evident in the threats posed by poisoning attacks. Attackers manipulate the recommendations by injecting fake users, causing serious damage to the fairness and credibility of the QoS-aware cloud API recommender system. To counter the threat of poisoning attacks, this paper reveals the attack mechanisms of diffusion model-based attack methods from the perspective of learning defense through attacking, inspiring the design of corresponding defense methods.  Methods  This paper systematically defines the attack process of poisoning attacks and fake user profiles, and proposes attack scales to flexibly simulate poisoning attacks. Then, to reveal the attack principle of the diffusion model-based attack method, this paper further proposes a Preference guided one-step reconstruction Diffusion model-based Poisoning Attack framework (PDPA) to simulate poisoning attacks. 
Following the collaborative principle that similar users may have similar preferences toward cloud APIs, the fake users generated by the attack method need to ensure that both their QoS values and the distribution of cloud API invocations remain similar to those of real users, thereby exploiting the collaborative influence of fake users to interfere with the QARS's modeling of user preferences. Therefore, to effectively carry out poisoning attacks, PDPA aims to generate fake users that are similar to real users. Firstly, PDPA uses the One-step reconstruction Diffusion Model (ODM) to model the QoS data and the invocation distribution of real users, respectively. ODM avoids the error accumulation that occurs during the iterative denoising process caused by the noise dependence of standard diffusion models, enabling ODM to generate fake user cloud API invocation behaviors similar to those of real users, thereby ensuring that fake users can effectively have a collaborative influence. Subsequently, in order to improve the attack performance, PDPA systematically selects fake users with a preference for invoking the target cloud API to fill the maximum QoS value. This not only enhances the aggressiveness of fake users, but also alleviates the interference of the target cloud API's addition on the invocation behavior of fake users, ensuring the concealment of fake users.  Results and Discussions  The experiment was conducted in the real-world QoS dataset WS-DREAM. Firstly, this paper uses six recommendation methods as target recommender systems, and six baseline attack methods to simulate poisoning attacks. The experimental results (Table 3) reveal the vulnerability of the recommender system to poisoning attacks. Each attack method can cause damage to the accuracy of the recommender system. 
PDPA achieves the best attack performance in most experimental settings, which is attributed to its sufficient modeling of user invocation preferences, enabling fake users to exert collaborative influence on the QARS effectively. Second, the fake users generated by ODM and by the standard diffusion model are compared in terms of F1 and their distribution in latent space. The experimental results (Figure 2) verify that ODM is superior to the standard diffusion model in stealth, which is also reflected in the low-dimensional visualization. Third, an ablation study is conducted on each module of PDPA. The experimental results (Tables 4 and 5) verify that each module of PDPA is necessary for the attack performance and concealment of fake users. Finally, MAE and F1 are compared across attack scales to examine the impact of attack scale on the attack effect and the concealment of fake users. The experimental results (Figure 3 and Table 6) indicate that increasing the attack scale effectively enhances attack performance but also increases the number of detected fake users.  Conclusions  To counter the threat of poisoning attacks, this paper explores the attack process and key attack parameters of poisoning attacks, and reveals the vulnerability of QARS by simulating poisoning attacks. The attacks are simulated by constructing PDPA, which demonstrates the significant potential of diffusion models in poisoning attacks, and ablation studies validate the necessity of modeling QoS data and cloud API invocations separately. Furthermore, PDPA reveals the underlying mechanism of generating fake users via diffusion models, providing insights for designing targeted countermeasures.
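The two metrics named in this abstract, MAE and F1, can be illustrated with a minimal sketch. It assumes that MAE measures the error of predicted QoS values (so a larger MAE after poisoning means a more damaging attack) and that F1 is the score of a fake-user detector (so a lower F1 means stealthier fake users); the abstract implies but does not define either. All function names and numbers below are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted QoS values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def detection_f1(is_fake, flagged):
    """F1 score of a fake-user detector; both inputs are lists of booleans."""
    tp = sum(1 for y, f in zip(is_fake, flagged) if y and f)
    fp = sum(1 for y, f in zip(is_fake, flagged) if not y and f)
    fn = sum(1 for y, f in zip(is_fake, flagged) if y and not f)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical response times (seconds) for four user-API pairs.
observed = [0.30, 1.20, 0.75, 2.00]
clean_prediction = [0.35, 1.10, 0.80, 1.90]
poisoned_prediction = [0.60, 0.90, 1.20, 1.50]

mae_clean = mean_absolute_error(observed, clean_prediction)
mae_poisoned = mean_absolute_error(observed, poisoned_prediction)

# Hypothetical detector output for six profiles, three of which are fake.
truth = [True, True, True, False, False, False]
flags = [True, True, False, True, False, False]
f1 = detection_f1(truth, flags)

print(f"MAE clean={mae_clean:.4f} poisoned={mae_poisoned:.4f} detector F1={f1:.4f}")
```

Under these assumptions, an attacker prefers a high poisoned MAE together with a low detector F1, which matches the trade-off the abstract reports when the attack scale grows.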
Physical-layer Security in Visible Light Communications: Fundamental Theories, Key Techniques, and Future Challenges
WANG Jinyuan, YAN Xinrun, LIN Zihan, LI Yuanyuan, LI Zheng, ZHANG Xin
Available online  , doi: 10.11999/JEIT260338
Abstract:
  Significance   Due to the broadcast nature of optical signals, information security represents a critical research direction in visible light communication (VLC). Conventional encryption techniques address network security issues at the upper layers of the protocol stack through access control, cryptographic protection, and end-to-end encryption. However, their security relies on the assumption that eavesdroppers possess limited computational capabilities, an assumption that currently faces significant challenges. In recent years, physical layer security (PLS) has emerged as a novel information security paradigm and has attracted considerable attention from researchers worldwide. PLS exploits the randomness, heterogeneity, and distinctiveness between the main channel and the eavesdropping channel to achieve secure information transmission at the physical layer. To date, extensive research achievements have been made regarding PLS techniques in conventional radio frequency wireless communications (RFWC). Nevertheless, due to substantial differences in frequency bands, transmitted signals, power representations, and channel characteristics, PLS research results from RFWC systems cannot be directly applied to VLC. Although scholars worldwide have conducted research on VLC PLS technology, the foundational theories, key techniques, and future challenges involved in VLC PLS still lack a systematic review. To bridge this gap, this paper presents a comprehensive survey of VLC PLS technology.  Progress   To evaluate and enhance system performance, a classic VLC PLS system model—comprising the received signal model, the input constraint model, and the channel gain model—is initially established. A comprehensive theoretical framework for performance evaluation is then developed, encompassing instantaneous performance metrics, statistical performance metrics, and asymptotic performance metrics. 
Specifically, to characterize instantaneous performance, existing works on instantaneous secrecy capacity and instantaneous secrecy rate across different scenarios are summarized. As statistical performance metrics, average secrecy capacity, average secrecy rate, secrecy outage probability, probability of strictly positive secrecy capacity, and interception probability are analyzed. To demonstrate asymptotic performance, secrecy diversity order and secrecy degrees of freedom are derived. Furthermore, to enhance the PLS performance, advanced technologies, including secure beamforming, artificial noise, physical region protection, secure coding, and secure diversity, are summarized.  Prospects   Despite existing research achievements, numerous challenges remain in VLC PLS. This paper identifies four critical challenges: (i) Accurate PLS performance limit: Deriving an exact expression for the secrecy capacity under VLC's unique physical constraints remains challenging. (ii) Incomplete evaluation framework: Some key metrics widely used in RFWC have not been investigated in VLC, and the construction of a comprehensive VLC PLS performance evaluation framework remains unresolved. (iii) Limitations of existing methods: Conventional PLS performance enhancement methods typically adopt a “modeling-optimization-verification” separated research paradigm, often falling into a vicious cycle of “inaccurate modeling-suboptimal solutions-limited performance gains”. Therefore, it is imperative to integrate novel technologies (such as deep learning, reinforcement learning, and digital twins) to construct a data-model dual-driven framework for VLC PLS performance enhancement. (iv) Hardware platform gap: The absence of dedicated hardware platforms featuring adversarial topologies and real-time processing capabilities significantly impedes the practical deployment of VLC PLS technologies. 
Therefore, addressing these challenges is essential for transitioning VLC PLS from theoretical advances to commercial applications.  Conclusions  The broadcast nature of optical signals renders VLC systems vulnerable to eavesdropping attacks. This paper presents a comprehensive survey of PLS in VLC, covering system models, performance metrics (instantaneous, statistical, and asymptotic), and key performance enhancement technologies including secure beamforming, artificial noise, physical region protection, secure coding, and secure diversity. Despite significant progress, challenges remain in establishing accurate performance bounds, complete evaluation frameworks, novel enhancement techniques, and practical hardware implementations. By exploiting channel disparities at the physical layer without relying on complex encryption, PLS represents a paradigm shift in security assurance, paving the way for next-generation secure and reliable VLC networks.
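To make the instantaneous and statistical metrics listed in this abstract concrete, the sketch below evaluates the classic Gaussian wiretap secrecy rate and three empirical statistics built on it. This is a stand-in under a stated assumption: the formula is the Gaussian-input expression familiar from RF systems, not an exact VLC secrecy capacity, which the abstract itself describes as an open problem under VLC's input constraints. The SNR values and function names are hypothetical.

```python
import math


def secrecy_rate(snr_main, snr_eve):
    """Gaussian wiretap secrecy rate in bit/s/Hz: max(0, log2(1+snr_main) - log2(1+snr_eve))."""
    if snr_main < 0 or snr_eve < 0:
        raise ValueError("SNRs are linear-scale and must be non-negative")
    return max(0.0, math.log2(1 + snr_main) - math.log2(1 + snr_eve))


def secrecy_outage_probability(rates, target_rate):
    """Empirical fraction of channel realizations whose secrecy rate is below the target."""
    return sum(1 for r in rates if r < target_rate) / len(rates)


def strictly_positive_probability(rates):
    """Empirical probability that the secrecy rate is strictly positive."""
    return sum(1 for r in rates if r > 0) / len(rates)


# Hypothetical (main-channel SNR, eavesdropper SNR) pairs in linear scale.
realizations = [(15, 3), (7, 7), (3, 15), (31, 1)]
rates = [secrecy_rate(m, e) for m, e in realizations]

average_rate = sum(rates) / len(rates)
outage = secrecy_outage_probability(rates, target_rate=1.0)
positive = strictly_positive_probability(rates)

print(rates, average_rate, outage, positive)
```

The rate is clipped at zero because no positive secrecy rate is achievable in this model when the eavesdropper's channel is at least as good as the main channel.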
Aerial Spatio-Temporal Image Generation via Latent Diffusion Models
SHANG Yuying, HOU Yingyan, LIU Zinan, LU Wanxuan, HUANG Yuhong, WANG Yixiao, YU Hongfeng, FU Kun
Available online  , doi: 10.11999/JEIT260165
Abstract:
  Objective  Aerial Earth observation plays a pivotal role in environmental monitoring, disaster warning, and urban planning. However, constraints such as flight-platform endurance and mission-window timeliness often prevent acquired aerial imagery from fully characterizing the long-term evolution of the Earth's surface. Although pre-trained latent diffusion models have shown strong potential for image generation, their application in aerial scenarios remains challenging because of the scarcity of high-quality temporal annotation data and semantic-visual misalignment caused by variable observation scales. To address these challenges, this paper proposes ASTIG, a training-free framework for Aerial Spatio-Temporal Image Generation. By leveraging the generative priors of pre-trained latent diffusion models and Large Language Models (LLMs), ASTIG provides a new paradigm for semantically controllable aerial spatio-temporal image generation.  Methods  ASTIG consists of three coordinated components. First, a dynamic semantic decomposition process is proposed to parse complex descriptions of aerial scene evolution into frame-level visual prompts, thereby compensating for the lack of temporal semantic annotations in existing aerial image-text datasets. Second, a Linguistic Binding (LB) strategy is proposed to establish explicit associations between key ground objects and their corresponding visual attributes within the cross-attention mechanism of the diffusion model, thereby improving the semantic response precision of the generated images. Third, a Temporal Anchor Attention (TAA) mechanism is incorporated. It uses dual reference frames to maintain subject stability and background consistency across the generated spatio-temporal image sequence, thus suppressing inter-frame temporal drift under training-free conditions.  
Results and Discussions  ASTIG and the baseline methods are evaluated on 7,236 high-quality aerial spatio-temporal descriptions using six automated metrics, including subject consistency, background consistency, temporal flickering, motion smoothness, aesthetic quality, and imaging quality. Quantitative results (Tables 1 and 2) show that ASTIG outperforms the baseline methods in spatio-temporal image generation, with improvements of 3.91% in subject consistency and 4.57% in motion smoothness over the frame-prompt baseline. Qualitative comparisons (Fig. 4) further show its strong ability to model long-term surface evolution in aerial imagery. Ablation studies validate the individual effectiveness of the LB strategy and the TAA mechanism (Table 3 and Fig. 5). Sensitivity analyses of the intervention steps (Table 4 and Fig. 6) and binding strength (Table 5 and Fig. 7) further identify suitable parameter settings. Extension experiments from satellite perspectives (Figs. 8 and 9) also show that ASTIG has the potential to generalize beyond aerial platforms to broader Earth observation scenarios.  Conclusions  This paper proposes ASTIG, a training-free framework for aerial spatio-temporal image generation that addresses the scarcity of high-quality long-term temporal data and semantic-visual misalignment. By leveraging the generative priors of pre-trained latent diffusion models and LLMs, ASTIG integrates a dynamic semantic decomposition process, an LB strategy, and a TAA mechanism to improve temporal semantic construction, semantic response precision, and inter-frame consistency. Experimental results show that ASTIG outperforms existing baseline methods across multiple automated evaluation metrics, providing a new paradigm for aerial spatio-temporal image generation. As a training-free method, ASTIG is still limited by the prior knowledge of the backbone model. 
Future work will examine geometric correction and nadir-view prior constraints to better align the generated results with the physical properties of satellite imagery.
Joint Channel Estimation and Diagnosis for Blocked RIS-Assisted Multi-User Multipath Millimeter-Wave Systems
LI Shuangzhi, LIU Cong, WANG Ning, HAN Gangtao, GUO Xin
Available online  , doi: 10.11999/JEIT260093
Abstract:
  Objective  Reconfigurable Intelligent Surface (RIS) can effectively modulate Millimeter-Wave (mmWave) signals and reshape the wireless propagation environment. In practical deployments, however, RIS elements are vulnerable to adverse weather and physical obstructions, which cause unpredictable distortion and motivate joint channel estimation and blockage diagnosis. Most existing studies focus on single-user systems, whereas multi-user scenarios remain insufficiently studied. This gap creates an opportunity to exploit the common RIS blockage vector and the shared RIS-Base Station (BS) channel across users. This paper therefore proposes a low-complexity framework for joint channel estimation and blockage diagnosis by exploiting the sparsity and correlation of multi-user cascaded channels.  Methods  Under the assumption that all User Equipment (UE) shares the same RIS-BS channel and is affected by a common RIS blockage vector, the problem is divided into two stages. First, a target UE is selected. The sparsity of the mmWave channel and blockage vector, together with the linear dependence among RIS-BS paths, is used to formulate a sparse recovery problem. A hierarchical Bayesian model is then adopted, and an efficient Sparse Bayesian Learning (SBL) algorithm is used for joint recovery. Second, partial Channel State Information (CSI) obtained from the target UE is used to construct a common channel matrix that combines the RIS-BS channel and blockage information. Channel estimation for the remaining UEs is then reformulated as another sparse recovery problem.  Results and Discussions  A low-complexity strategy for cascaded channel estimation and blockage diagnosis is developed by exploiting the sparsity and correlation of multi-user cascaded channels and the commonality of the RIS blockage vector. Ideal estimation results are used as a theoretical lower bound, and the proposed algorithm is compared with two benchmark schemes. 
Simulation results show that the proposed algorithm consistently outperforms the benchmark schemes (Fig. 1). Specifically, a higher target-user Signal-to-Noise Ratio (SNR) improves the Normalized Mean Square Error (NMSE), which confirms the importance of target-user selection (Fig. 2). The algorithm also shows good convergence as the number of iterations increases (Fig. 3), and its performance approaches the ideal case more closely as the number of time frames increases (Fig. 4). In addition, the method remains robust as the number of blocked elements increases (Fig. 5). More BS antennas further improve performance by enhancing array orthogonality (Fig. 6). By exploiting path correlation, the proposed method achieves better estimation accuracy with slightly lower runtime (Table 1). However, estimation accuracy decreases as the number of paths increases because the model becomes more complex (Figs. 7 and 8).  Conclusions  This paper proposes a joint channel estimation and blockage diagnosis framework for blocked RIS-assisted multi-user multipath mmWave systems. Simulation results show that the method approaches the theoretical performance bound in complex multipath environments. It also maintains clear performance advantages under high blockage rates while reducing computational complexity through the use of common channel structures. This study provides a practical solution to performance degradation in RIS deployment, clarifies the effects of key parameters, and offers guidance for system design. Because practical blockages often exhibit block-sparse or structured-sparse characteristics, future work may incorporate structured priors, such as group sparsity and Markov random fields, into the SBL framework to capture spatial correlation and improve diagnostic accuracy and robustness.
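The Normalized Mean Square Error used to report these results can be sketched as follows, assuming the standard definition NMSE = ||H_est - H||^2 / ||H||^2 over the flattened channel entries; the abstract does not state the exact normalization used in the paper. The channel values and function names are hypothetical.

```python
import math


def nmse(h_true, h_est):
    """Squared estimation error divided by the true channel energy."""
    if len(h_true) != len(h_est) or not h_true:
        raise ValueError("inputs must be non-empty and equally long")
    error = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    energy = sum(abs(t) ** 2 for t in h_true)
    if energy == 0:
        raise ValueError("true channel has zero energy")
    return error / energy


def nmse_db(h_true, h_est):
    """NMSE on a decibel scale; undefined for a perfect estimate."""
    return 10 * math.log10(nmse(h_true, h_est))


# Hypothetical flattened cascaded-channel coefficients and a noisy estimate.
channel = [1 + 1j, 2 - 1j, -1 + 0j, 0.5j]
estimate = [1.1 + 1j, 2 - 0.9j, -1 + 0j, 0.5j]

value = nmse(channel, estimate)
value_db = nmse_db(channel, estimate)
print(f"NMSE = {value:.6f} ({value_db:.2f} dB)")
```

A lower NMSE means a better estimate, so "improves the NMSE" in the abstract corresponds to a smaller value here.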
PLS-YOLO: A Lightweight Model for Signal Modulation Recognition
ZHOU Xiaobo, ZHANG Fan, SHE Chao, ZHOU Guofei, MENG Jianping
Available online  , doi: 10.11999/JEIT251377
Abstract:
  Objective  As wireless communication evolves toward high efficiency, low latency, and ubiquitous connectivity, higher requirements are placed on Automatic Modulation Recognition (AMR) to ensure link reliability in complex electromagnetic environments. Deep learning has improved recognition performance compared with traditional methods, which often rely on subjective feature design and have limited robustness. However, existing YOLO-based AMR models are not fully optimized for specific signal characteristics or practical deployment. These models often have excessive parameters and high computational complexity, which makes them unsuitable for resource-constrained hardware, such as edge nodes and Field-Programmable Gate Arrays (FPGAs), and limits their ability to meet real-time communication requirements. To address these bottlenecks, this paper proposes Precision and Lightweight Structure-YOLO (PLS-YOLO), a lightweight AMR model based on YOLOv10n. By optimizing network channels, replacing core modules, and improving the downsampling mechanism, the proposed model enables efficient integration of modulation signal classification and localization. It also reduces the parameter count and computational complexity, thereby supporting AMR deployment in resource-constrained scenarios.  Methods  The method includes two main stages: dataset preprocessing and PLS-YOLO model construction. In the preprocessing stage, the public RadioML2016.10a and RadioML2016.10b benchmark datasets for signal modulation recognition are used. For In-phase and Quadrature (IQ) signals in these datasets, the Short-Time Fourier Transform (STFT) is used to map one-dimensional temporal signals into two-dimensional time-frequency spectrograms containing phase and amplitude information. This process provides richer feature representations for the model. A random sampling strategy without replacement is then used to stitch individual time-frequency samples into 3×3 composite images (Fig. 4). 
Target labels matching the input format of YOLO-series models are generated at the same time. The dataset is divided into training, validation, and test sets at a ratio of 7:1.5:1.5 by stratified sampling to ensure consistent signal-type distributions across all subsets. The model is built on YOLOv10n, with targeted improvements designed to balance the parameter count and recognition performance. The C2f module in the original backbone network is replaced with the CSPPC module, which is based on the CSP architecture and consists of feature splitting, Partial Convolution (PConv) processing, and feature fusion. This design reduces parameters while improving recognition performance. The feature dimensionality reduction process in the backbone network is also reconstructed to reduce the increase in computational complexity caused by parameter redundancy. The traditional downsampling module is replaced with CGBlock, which improves the capture of complex modulation signal features by fusing context-aware information. Finally, standard convolutions in the PSA and v10Detect modules are replaced with PConv to further reduce computational complexity and jointly optimize lightweight design and recognition performance.  Results and Discussions  Experimental results on RadioML2016.10a show that PLS-YOLO achieves a mean Average Precision (mAP) of 68.4% within the Signal-to-Noise Ratio (SNR) range of –20 to 18 dB. The mAP increases to 94.3% when SNR ≥ 0 dB. Compared with the baseline YOLOv10n model, PLS-YOLO improves mAP by 0.6%, reduces the parameter count by 47.33%, and decreases computational complexity by 34.15%. Its inference speed also increases by 5 frames per second (fps) (Table 2). These results show that the model effectively balances recognition performance and lightweight deployment by reducing computational cost while improving precision. To verify robustness, additional experiments are conducted on RadioML2016.10b. 
As shown in Table 4, PLS-YOLO achieves an mAP of 73.30% over the –20 to 18 dB range and 95.4% at SNR ≥ 0 dB. It outperforms mainstream models such as MCNet and LSTM2, confirming its strong recognition performance. Furthermore, Fig. 5 shows that converting IQ data into spectrograms is more suitable for PLS-YOLO recognition of digital modulation signals. By contrast, the recognition performance for analog modulation signals remains limited. Future work should therefore improve feature modeling and recognition capability for analog signals.  Conclusions  This study proposes PLS-YOLO, a lightweight AMR model based on YOLOv10n. To jointly improve modulation recognition performance and model compactness, the network structure is optimized through channel dimensionality reduction, core module replacement, downsampling mechanism improvement, and PConv substitution. These strategies reduce key limitations of existing YOLO-based AMR models, including parameter redundancy, high computational complexity, and limited adaptability to resource-constrained scenarios such as edge nodes and FPGAs. Experiments on the RadioML2016.10a and RadioML2016.10b benchmark datasets show that PLS-YOLO achieves strong overall performance. While integrated signal classification and localization are maintained, both parameter count and computational complexity are substantially reduced compared with the baseline YOLOv10n model, with a clear improvement in recognition performance. The results verify the effectiveness and feasibility of the proposed optimization strategies and provide a practical technical path for AMR implementation. The remaining limitations in analog modulation signal recognition also indicate a clear direction for future research.
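The preprocessing steps described in this abstract (a stratified 7:1.5:1.5 split, sampling without replacement, and 3×3 stitching with YOLO-format labels) can be sketched as below. Two things are assumptions not stated in the abstract: each tile's bounding box is taken to fill its whole grid cell, and labels use the common YOLO row layout (class, x_center, y_center, width, height) normalized to [0, 1]. The STFT step is omitted, and all names and sample counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test so every class keeps the given ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def mosaic_labels(class_ids, grid=3):
    """YOLO-style rows for a grid x grid mosaic, one tile per cell in row-major order."""
    if len(class_ids) != grid * grid:
        raise ValueError("need exactly grid * grid tiles")
    size = 1.0 / grid
    rows = []
    for k, class_id in enumerate(class_ids):
        row, col = divmod(k, grid)
        rows.append((class_id, (col + 0.5) * size, (row + 0.5) * size, size, size))
    return rows


# Hypothetical dataset: 20 spectrograms for each of three modulation types.
labels = ["BPSK"] * 20 + ["QPSK"] * 20 + ["WBFM"] * 20
train, val, test = stratified_split(labels)

# Draw nine training samples without replacement and build one mosaic's labels.
class_to_id = {"BPSK": 0, "QPSK": 1, "WBFM": 2}
picked = random.Random(1).sample(train, 9)
rows = mosaic_labels([class_to_id[labels[i]] for i in picked])
print(len(train), len(val), len(test), rows[0])
```

Splitting per class before concatenating is what keeps the signal-type distribution consistent across the three subsets.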
Dynamic Focus and Semantic Prompt Network for Fine-Grained Pest Classification
LIU Changyuan, ZHAO Haijian, WU Haibin
Available online  , doi: 10.11999/JEIT260044
Abstract:
  Objective  Agricultural pest images are often affected by complex background interference, large appearance differences across morphological stages, diverse shooting angles, and substantial scale variation. These factors limit feature extraction and morphological adaptability in existing fine-grained classification models. To address these challenges, an Agricultural Pest Multi-Dimensional dataset (APMD) is constructed to cover multiple morphological stages, viewing angles, and object scales. In addition, a Dynamic Focus and Semantic Prompt Network for fine-grained pest classification (DFS-PestNet) is proposed. The network adopts a decoupled parallel architecture that combines a main feature stream and a prompt enhancement stream. A Spatial Dependency Perception (SDP) module is designed to dynamically focus on key discriminative regions, such as pest spots and wing veins, thereby improving local subtle feature extraction under complex backgrounds. An Advanced Haptic-Visual Prompting (AHVP) module is introduced to integrate category semantics and spatial position information into shallow and middle-level features, which improves adaptability to morphological variations across developmental stages. Dual-branch Saliency Sampling (DSS) is further adopted to adaptively aggregate key features from essential pest body parts through learnable prototype components and dual-branch saliency fusion. This strategy improves the recognition of small targets, including tiny pests and early-stage larvae. Experimental results show that the proposed model achieves better classification performance than baseline and mainstream methods on both public and self-constructed datasets. These results verify the effectiveness and application potential of the model in complex agricultural scenarios and provide a technical reference for intelligent pest monitoring and precise control in smart agriculture.  
Methods  To improve classification accuracy under complex background interference and multi-morphological conditions, APMD is first constructed. This dataset contains image data covering different pest morphological stages, viewing angles, and scales. Specifically, it includes 15,680 images from 58 species, which are divided into training, validation, and testing sets at a standard ratio of 7:2:1 (Fig. 1 and Table 1). The dataset provides high-quality data support for research on fine-grained pest classification. DFS-PestNet is then proposed. In this network, the SDP module is designed to adaptively locate and enhance key discriminative pest regions. By reducing the effects of pose variation and complex background interference, this module enables more accurate fine-grained feature extraction. The AHVP module is also incorporated into the network to embed category semantics and spatial position information. This module guides the network to focus on key discriminative features across different morphological periods, thereby improving recognition robustness under large morphological changes during the pest life cycle. Furthermore, DSS is proposed to adaptively aggregate features from essential pest body parts. This strategy strengthens the recognition of challenging small targets and reduces the difficulty of small-target recognition in fine-grained pest classification.  Results and Discussions  The performance of DFS-PestNet in fine-grained pest classification is evaluated through multidimensional experiments. First, qualitative visualization is conducted. Grad-CAM heatmaps show that, compared with the baseline model, which is easily affected by complex farmland backgrounds and plant stems, DFS-PestNet effectively suppresses background noise and focuses on fine-grained discriminative parts, such as pest heads and antennae (Fig. 6). 
The model also shows clear advantages in capturing features of tiny targets, such as leafhopper nymphs, and pests at different life stages, such as Chilo suppressalis hidden within stems. The t-SNE feature reduction results further confirm that the proposed model reduces feature confusion in multi-morphological scenarios. High-dimensional features show clearer inter-class separation and tighter intra-class clustering in a two-dimensional visual space (Fig. 7). Second, quantitative ablation and parameter optimization experiments are performed. The ablation studies validate the synergistic effect of the three improved modules, namely SDP, AHVP, and DSS (Table 2). Their combination increases the classification accuracy of the baseline model by 2.21%, reaching 77.24%, with all core evaluation metrics achieving the best values. Hyperparameter optimization further identifies 6 as the optimal number of prompt position tokens and 0.2 as the optimal feature dropout rate (Fig. 8). This configuration ensures sufficient semantic representation while achieving a good balance between simulating natural occlusion and improving model robustness. Finally, comparative experiments with mainstream state-of-the-art models are conducted. Compared with existing advanced Convolutional Neural Network (CNN) and Transformer architectures, such as Gate-ViT and EST, DFS-PestNet achieves the highest accuracies of 77.24% and 98.01% on the large-scale public dataset IP102 and the challenging self-constructed APMD dataset, respectively (Tables 3 and 4). These results show consistent improvements across fine-grained classification metrics. Moreover, while maintaining high classification accuracy, the proposed model achieves inference speeds of 158 frames/s and 164 frames/s on the two datasets, respectively. In summary, DFS-PestNet achieves strong classification accuracy and high inference efficiency for complex pest feature extraction across large scale variation and multiple morphological stages. 
This provides a practical basis for efficient deployment in smart agriculture.  Conclusions  To address multi-morphological variation and small-target recognition in fine-grained pest classification, the APMD dataset is constructed, and DFS-PestNet is proposed based on the MPSA baseline. Specifically, the SDP module is introduced to adaptively focus on pose- and morphology-invariant discriminative features. The AHVP module embeds category semantics and spatial position information into shallow and middle-level networks. The DSS module adaptively aggregates key body-part features to improve small-target recognition. Experimental results show that DFS-PestNet outperforms mainstream models on both the IP102 and APMD datasets across different developmental stages, angles, and scales. Future work will focus on lightweight model design for efficient edge deployment and open-set recognition for early warning of unknown pest categories in complex real-world environments.
Remote Sensing Land-cover Classification Combining Multi-modal and Multi-scale Fusion with Mamba
XIE Wen, ZHU Chaotao, WANG Jin, MA Xiaomeng
Available online  , doi: 10.11999/JEIT251303
Abstract:
  Objective   The rapid development of remote sensing imaging has generated large-scale and diverse data for remote sensing land-cover classification. In recent years, Mamba-based models have been successfully applied to image processing because of their distinctive architectures and strong global modeling capability. Among them, multi-scale vision Mamba models are well suited to complex spatial distributions. This property matches remote sensing scenes, in which ground objects often have large scale variations and complex orientations. To fully use the advantages of Mamba in feature extraction and fusion for remote sensing data, a Mamba-based Multi-modal and Multi-scale fusion model for Remote Sensing land-cover classification (M3RS) is proposed.  Methods   M3RS mainly contains three stages for feature extraction and fusion. First, a Multi-Scale Spatial Encoder based on Spatial Mamba is used to extract features from Light Detection And Ranging (LiDAR) images and Synthetic Aperture Radar (SAR) images. Considering the unique data structure of HyperSpectral Image (HSI), a Multi-Scale Spatio-Spectral Encoder is proposed to extract complex spatio-spectral features by using Spatial Mamba and Spectral Mamba. Next, a Multi-Modal Feature Fusion Module, consisting of the proposed Cross-Mamba and Channel-Concatenated Mamba, is designed to fuse multi-modal features. Cross-Mamba efficiently fuses multi-modal spatial features through the interaction of State Space Model (SSM) parameters from different modalities. Channel-Concatenated Mamba further fuses multi-modal features by constructing four channel scanning strategies. Finally, an improved Multi-Scale Feature Fusion Module is adopted to fuse multi-scale features layer by layer. This design provides highly discriminative features for classification and improves the accuracy of remote sensing land-cover classification.  
Results and Discussions   Comparative experiments are conducted on three publicly available multi-modal remote sensing land-cover classification datasets. The proposed model is compared with seven mainstream models. The results show that M3RS achieves the best Overall Accuracy (OA), Average Accuracy (AA), and Kappa coefficient among all compared methods. On the Muufl dataset, the OA of M3RS is 3.49%, 3.80%, and 4.02% higher than those of representative Convolutional Neural Network (CNN)-, Transformer-, and Mamba-based models, respectively (Table 1, Fig. 8). On the Houston2013 and Augsburg datasets, the OA of M3RS exceeds those of all compared algorithms by an average of 3.37% and 3.11%, respectively (Tables 2 and 3). These results indicate that integrating a multi-modal and multi-scale architecture with Mamba improves the accuracy of remote sensing land-cover classification. In addition, the ablation experiment verifies the contribution of each proposed module to classification performance (Table 4). Spectral Mamba provides a clear accuracy gain, and the fusion modules further improve the overall performance to different degrees. The hyperparameter experiment also provides a useful configuration for multi-scale remote sensing image fusion (Table 5). Compared with a Transformer model using the same multi-scale architecture, M3RS achieves higher classification accuracy, reduces the parameter count by 37.4%, and shortens the training time by 10.7%. These results show that Mamba improves both accuracy and efficiency in this framework (Fig. 9).  Conclusions   M3RS uses Mamba to fuse multi-modal and multi-scale features, thereby improving remote sensing land-cover classification. The heterogeneous encoders in M3RS address differences among multi-modal data and provide richer complementary information for fusion and classification. Cross-Mamba and Channel-Concatenated Mamba account for both the similarities and differences between Mamba and Transformer. 
They achieve efficient multi-modal spatial feature interaction and comprehensive multi-modal feature fusion, respectively, forming a hierarchical fusion strategy. The multi-scale architecture also alleviates the difficulty caused by complex spatial distributions of remote sensing land covers. The proposed Multi-Scale Feature Fusion Module, composed of Spatial Mamba and channel attention, integrates multi-scale features and provides a reliable basis for subsequent classification. Future work will further optimize the model by exploring the principles of Mamba and refining feature alignment in cross-attention-based multi-modal interaction, thereby improving the reliability of feature fusion.
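The three scores reported for M3RS (Overall Accuracy, Average Accuracy, and the Kappa coefficient) can all be derived from a confusion matrix. The sketch below uses their standard definitions, with Kappa taken as Cohen's kappa. The confusion matrix is hypothetical and is not data from the paper.

```python
def classification_scores(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix whose rows are true classes."""
    n = len(confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    if total == 0 or any(s == 0 for s in row_sums):
        raise ValueError("every class needs at least one true sample")

    # Overall Accuracy: share of all samples that are classified correctly.
    oa = sum(confusion[i][i] for i in range(n)) / total

    # Average Accuracy: unweighted mean of the per-class recalls.
    aa = sum(confusion[i][i] / row_sums[i] for i in range(n)) / n

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Hypothetical three-class land-cover confusion matrix (rows: truth, columns: prediction).
confusion = [
    [50, 5, 5],
    [4, 30, 6],
    [2, 3, 45],
]
oa, aa, kappa = classification_scores(confusion)
print(f"OA={oa:.4f} AA={aa:.4f} Kappa={kappa:.4f}")
```

OA is dominated by large classes, AA weights every class equally, and Kappa discounts chance agreement, which is why land-cover studies commonly report all three.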
Optimal Weighted Subspace Fitting-based Direct Position Determination with HF/VHF Collaboration
YANG Gao-yuan, YIN Jie-xin, WANG Ding, YANG Bin
Available online  , doi: 10.11999/JEIT260001
Abstract:
  Objective   Passive localization is essential for target detection, navigation, and trajectory tracking, particularly in military applications involving maritime and aerial targets. These targets often transmit across multiple frequency bands, including shortwave High Frequency (HF) and Very High Frequency (VHF). Existing localization methods largely rely on single-band approaches or two-step positioning techniques. Single-band methods underutilize the positional information available across different bands, while two-step methods lose information during intermediate parameter estimation (e.g., Direction-Of-Arrival (DOA) and Time-Difference-Of-Arrival (TDOA)), reducing localization accuracy. Collaborative fusion of HF signals (via ionospheric reflection) and VHF signals (via Doppler effects from moving arrays) has rarely been addressed. To overcome low positioning accuracy and limited spatial resolution in over-the-horizon multi-target scenarios, this study proposes a novel collaborative Direct Position Determination (DPD) method designed to integrate the complementary strengths of HF and VHF signals, enhancing localization precision and robustness in complex electromagnetic environments.  Methods  An Optimal Weighted Subspace Fitting (OWSF) DPD algorithm is proposed. Comprehensive signal propagation models are established for heterogeneous observation platforms (Fig. 1). HF signal propagation is modeled using a two-dimensional DOA framework based on ionospheric reflection, incorporating azimuth and elevation angles to handle nonlinear over-the-horizon propagation. VHF signals are modeled using a space-time extended signal framework for a moving Unmanned Aerial Vehicle (UAV), exploiting Doppler effects to create a virtual large-aperture array that captures both one-dimensional angle and Frequency-Of-Arrival (FOA) information. 
Unlike traditional methods that process each band separately, the OWSF algorithm constructs a unified cost function that fuses the signal and noise subspaces of both HF and VHF data using optimal weighting matrices, balancing the contributions of different signal qualities. Target positions are then estimated by minimizing this cost function via grid search or Newton iteration. The Cramér-Rao Bound (CRB) under Earth-ellipsoid constraints is derived to provide the theoretical performance limit.  Results and Discussions   Simulations are conducted in a centralized processing scenario, where HF stations and UAV VHF signals are transmitted to a central station for joint processing (Fig. 2). The simulation involves three stationary targets and a collaborative system comprising HF stations and a UAV (Fig. 3, Table 2, Table 3). Performance comparisons demonstrate that the OWSF method consistently outperforms traditional two-step positioning methods and single-system DPD methods (DOA-only or FOA-only) in Root Mean Square Error (RMSE) (Fig. 4). When HF SNR is 5 dB lower than VHF SNR, OWSF exhibits superior robustness compared to Subspace Data Fusion (SDF) and Minimum Variance Distortionless Response (MVDR) methods, approaching the CRB at high SNR (Fig. 5). The impact of system parameters is further analyzed, showing that increasing the number of sampling points (Fig. 6) and array elements (Fig. 7) improves accuracy, particularly in low SNR regimes. Regarding spatial resolution, the OWSF algorithm generates sharper spectral peaks for distant targets and successfully resolves closely spaced targets that the SDF-DPD algorithm fails to distinguish (Fig. 8, Fig. 9).  Conclusions   The HF/VHF collaborative DPD method effectively integrates multidimensional observational information from ionospheric reflection and Doppler-based propagation. 
Simulation results demonstrate substantial improvements in localization accuracy, spatial resolution, and robustness, especially under low-SNR conditions or heterogeneous signal quality between bands. The derived CRB provides a solid theoretical benchmark, confirming that the method overcomes the limitations of single-band and two-step approaches. This approach offers a highly effective solution for over-the-horizon passive localization of multiple stationary targets.
Household Appliance Plastics Identification by Fusing Multi-Level Feature Enhancement and Hierarchical Classification
CHONG Penghao, ZHENG Yunlong, YANG Aosong, GUO Mengci, LI Shifeng
Available online  , doi: 10.11999/JEIT260084
Abstract:
  Objective  Accurate plastic identification remains challenging in waste household appliance recycling under low-resolution spectral conditions. In practical recycling environments, plastics often have complex compositions, surface contamination, and aging effects, which increase classification difficulty. Black plastics are especially difficult to identify because their strong light absorption and spectral overlap in the Visible-Near Infrared (Vis-NIR) range reduce feature separability and degrade classification performance. Under these conditions, conventional single-stage classification models often fail to maintain stable accuracy. To address this problem, an automated identification method is proposed for low-dimensional multispectral feature spaces. The method aims to improve the discriminative capability of limited spectral information and enhance classification accuracy for complex plastic categories.  Methods  A compact Vis-NIR multispectral acquisition system based on the AS7265x sensor is used to collect 18-channel reflectance data in the 410~940 nm range. A handheld acquisition device with a controlled optical structure is designed to reduce environmental interference and ensure measurement consistency (Fig. 3). A total of 576 samples are collected from five typical household appliance plastics, including Acrylonitrile Butadiene Styrene (ABS), High-Impact PolyStyrene (HIPS), PolyPropylene (PP), Acrylonitrile Styrene copolymer (AS), and Polycarbonate/Acrylonitrile Butadiene Styrene (PC+ABS) blends. These samples are obtained from waste household appliances and are subjected to preliminary surface cleaning before spectral acquisition. To improve feature representation, a multi-level feature engineering strategy is adopted. This strategy integrates original spectral intensity features, nonlinear polynomial expansion features, and adjacent-channel ratio features to characterize both global and local spectral information. 
The nonlinear expansion enhances the representation of reflectance variations, whereas the ratio features capture local spectral-shape changes and reduce external disturbances. These features are combined into a 53-dimensional feature vector. Linear Discriminant Analysis (LDA) is then applied to enhance interclass separability. To address spectral overlap and class imbalance, a Hierarchical Joint Classifier (HJC) is constructed. HJC uses a two-stage classification framework. In the first stage, an XGBoost-based primary classifier performs coarse classification to separate easily distinguishable samples and group spectrally similar black plastics. In the second stage, a TabTransformer-based secondary classifier performs fine-grained classification of difficult samples (Fig. 6). This hierarchical design reduces classification complexity and improves discrimination for challenging categories. Model performance is evaluated using five-fold cross-validation and an independent test set. Accuracy, precision, recall, and F1-score are calculated from confusion matrices (Fig. 7). Comparative experiments are conducted with traditional machine learning methods, ensemble learning models, and deep learning approaches under different feature-processing strategies (Fig. 8, Fig. 9).  Results and Discussions  The proposed HJC achieves a classification accuracy of 97.4% in five-fold cross-validation and 93.1% on the independent test set (Table 4). Compared with single-stage classifiers and methods without feature enhancement, the proposed method provides higher performance and greater stability under low-resolution spectral conditions. Comparative results show that the proposed method outperforms baseline approaches, such as PCA combined with CNN, which achieves an accuracy of approximately 71.3% on the same dataset (Fig. 8). 
This improvement indicates that the proposed feature engineering strategy effectively strengthens the discriminative capability of low-dimensional spectral data. Combining LDA with feature engineering further improves class separability compared with conventional PCA-based methods. Confusion matrix analysis shows that misclassifications mainly occur between spectrally similar black ABS and black HIPS samples, whereas most other categories achieve high classification accuracy (Fig. 9). These results indicate that spectral overlap remains the main challenge under low-resolution conditions. The hierarchical classification strategy reduces this problem by focusing classification resources on difficult samples, thereby improving the overall generalization ability of the model. Overall, the proposed method shows robustness under practical conditions, including spectral noise, limited channel resolution, and material heterogeneity. These results indicate its suitability for real-world recycling applications.  Conclusions  A hierarchical classification method with multi-level spectral feature engineering is developed for plastic identification under low-resolution Vis-NIR conditions. Nonlinear and spectral-shape features are incorporated into a two-stage framework to improve the identification of spectrally similar materials. The results show stable accuracy across different plastic types. The method is suitable for automated sorting in waste household appliance recycling and can be extended to other material identification tasks with limited spectral information.
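To make the multi-level feature strategy in the abstract above more concrete, the sketch below assembles a feature vector from one 18-channel reflectance reading. The composition used here (18 raw values, 18 squared values, and 17 adjacent-channel ratios, giving 53 features) is an assumption chosen only because it matches the stated dimensionality. The abstract does not specify which polynomial terms are used, and `build_features` and `eps` are hypothetical names.

```python
def build_features(reflectance, eps=1e-9):
    """Build a 53-dimensional feature vector from 18 channel reflectances.

    Assumed layout (not confirmed by the source):
      18 raw intensities + 18 squared intensities + 17 adjacent-channel ratios.
    """
    if len(reflectance) != 18:
        raise ValueError("expected 18 channel values")
    raw = [float(r) for r in reflectance]
    # Nonlinear polynomial expansion: second-order terms only (assumption).
    squared = [r * r for r in raw]
    # Adjacent-channel ratios describe local spectral shape and cancel
    # a common multiplicative scale factor such as illumination intensity.
    ratios = [raw[i + 1] / (raw[i] + eps) for i in range(len(raw) - 1)]
    return raw + squared + ratios


if __name__ == "__main__":
    sample = [0.5] * 18  # flat toy spectrum, for illustration only
    print(len(build_features(sample)))
```

Because each ratio divides one channel by its neighbor, scaling the whole spectrum by a constant leaves the ratio features nearly unchanged, which is one plausible reading of "reduce external disturbances".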
Spatial Information-guided Diffusion for Domain Adaptation Semantic Segmentation of Remote Sensing Images
LIANG Yan, LI Jun-Fan, SHAO Kai, HU Lin
Available online  , doi: 10.11999/JEIT260031
Abstract:
  Objective  Domain Adaptation Semantic Segmentation (DASS) is critical for remote sensing applications, including land-cover mapping, urban planning, and environmental monitoring. However, deep learning models often show severe performance degradation under domain shifts caused by imaging variation, geographic differences, and label-semantic heterogeneity. Conventional feature-alignment and generative adversarial network-based methods often fail to preserve semantic consistency. They are also sensitive to noisy supervision, especially when cross-domain gaps are large. This work aims to construct a robust DASS framework for semantically consistent image translation and reliable knowledge transfer.  Methods  A two-stage framework, termed Co-training Spatial-Guided DASS (CoSG-DASS), is proposed by integrating image translation and co-training. In the image-translation stage, a spatial information-guided latent diffusion model enhanced by ControlNet is designed. Semantic pseudo-labels and depth estimates are used as horizontal semantic and vertical spatial conditions to guide target-style image generation. To reduce the effect of noisy pseudo-labels, an Entropy-based Adaptive Guidance Intensity Module (EAGIM) is introduced. EAGIM estimates pixel-level confidence using information entropy and suppresses unreliable features. In the co-training stage, translated target-style images and unlabeled real target-domain images are used to train a segmentation model with a depth-guided segmentation head. Cross-entropy loss and adversarial loss are jointly used for optimization.  Results and Discussions  Extensive experiments are conducted on three cross-domain tasks. CoSG-DASS generates images that better match target-domain distributions. Quantitative results based on Fréchet Inception Distance (FID) show that the proposed method outperforms CycleGAN, UNI-Diff, and CRS-Diff in most settings (Table 1). 
Visual comparisons (Fig. 6) show that the method reduces edge blurring and category confusion. It also improves the separation of roads and vegetation and preserves small objects, such as vehicles. In the semantic segmentation stage, CoSG-DASS outperforms state-of-the-art domain adaptation methods. It improves mean Intersection over Union (mIoU) by 1.14%, 3.78%, and 2.49% on the cross-geographic task (Vaihingen IRRG→Potsdam IRRG), cross-imaging-mode task (Vaihingen IRRG→Potsdam RGB), and bidirectional label-semantic-heterogeneity tasks between DFC25 and LoveDA, respectively (Tables 2-4). Visual segmentation results (Fig. 7) confirm its strong boundary preservation and high accuracy in complex scenes. Ablation studies (Table 5) verify the contribution of the core components, including depth control, pseudo-label guidance, EAGIM, and the co-training strategy. Feature-distribution visualization based on Uniform Manifold Approximation and Projection (UMAP) further shows that CoSG-DASS reduces intra-class variation and increases inter-class separation after adaptation (Fig. 8).  Conclusions  CoSG-DASS alleviates domain shifts in remote sensing images through semantic-preserving diffusion-based translation and depth-guided co-training. It improves both image-translation quality and segmentation accuracy over existing methods. The proposed framework provides an effective solution for multi-source remote sensing interpretation. Future work will focus on extreme label-semantic heterogeneity and lightweight diffusion architectures.
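The abstract above states that EAGIM estimates pixel-level confidence from information entropy. One common way to do this, shown below as a sketch only, is to normalize the Shannon entropy of a pixel's class-probability vector by its maximum value and subtract it from one. The mapping `1 - H / log(C)` and the name `entropy_confidence` are assumptions. The abstract does not give the formula EAGIM actually uses.

```python
import math


def entropy_confidence(probs):
    """Map one pixel's class probabilities to a confidence in [0, 1].

    Assumption: confidence = 1 - H(p) / log(C), where H is Shannon entropy
    and C is the number of classes. This is illustrative, not the paper's rule.
    """
    num_classes = len(probs)
    if num_classes < 2:
        raise ValueError("need at least two classes")
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(num_classes)


if __name__ == "__main__":
    print(entropy_confidence([0.97, 0.01, 0.01, 0.01]))  # peaked prediction
    print(entropy_confidence([0.25, 0.25, 0.25, 0.25]))  # uniform prediction
```

A guidance module could then scale each pixel's conditioning features by this confidence, so that pixels with near-uniform pseudo-label distributions contribute little to generation.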
SG-DDPG-based Low-intercept Point Beam Design for FDA-MIMO Short-range Detectors
JIA Jinwei, GAO Min, HAN Zhuangzhi, LIU Limin, YIN Yuanwei
Available online  , doi: 10.11999/JEIT260010
Abstract:
  Objective  Radio short-range detectors are widely used in many detection systems. However, in modern battlefields, the electromagnetic environment is increasingly complex, and radio short-range detectors must withstand various forms of electromagnetic interference. In particular, fourth-generation jammers based on Digital Radio Frequency Memory (DRFM) can implement repeater deception jamming. Such jamming may cause failures such as premature detonation in radio short-range detectors and reduce their damage effectiveness. Anti-repeater deception jamming has therefore become a key issue for short-range detectors. Improving the Low Probability of Intercept (LPI) performance of radio short-range detectors is an effective means of resisting repeater deception jamming. This study focuses on the effect of FDA-MIMO array-element frequency-offset settings on beam synthesis and proposes an SG-DDPG-based method for LPI point beam design.  Methods  Frequency Diverse Array-Multiple-Input Multiple-Output (FDA-MIMO) technology is used in this study, and the key factors affecting beam convergence are analyzed. For the spatial LPI beam design of radio short-range detectors, a performance evaluation model for spatial LPI beams is constructed. An FDA-MIMO LPI point beam design method based on the Stage Guidance-Deep Deterministic Policy Gradient (SG-DDPG) algorithm is then proposed. In the SG-DDPG algorithm, a multidimensional staged guidance reward function is designed. An Actor-Critic model is used to maximize the reward value through gradient ascent. The array-element frequency offsets that provide better beam convergence in the current environment are then obtained. The SG-DDPG algorithm is suitable for LPI point beam design under different fall angles of radio short-range detectors. It overcomes the technical limitation of formula-based frequency-offset calculation, which is only applicable when the detector fall angle is close to vertical. 
 Results and Discussions  The simulations show that, after the array-element frequency offsets are optimized by the SG-DDPG algorithm, the FDA-MIMO beam achieves a half-power beam width of 1 m in the range dimension and 9.9° in the angular dimension. The proposed method provides better beam convergence and LPI performance than classical frequency-offset design methods. These results indicate that the proposed algorithm offers an effective approach for array-element frequency-offset optimization and LPI point beam design, thereby improving the LPI performance of radio short-range detectors.  Conclusions  This paper presents an FDA-MIMO LPI point beam design method based on the SG-DDPG algorithm, with the array-element frequency offset used as the optimization objective. The simulation results support two main conclusions. First, the proposed method removes the restriction that the fall angle of the radio short-range detector must be close to vertical when the array-element frequency offset is calculated by a formula-based method. The algorithm can be applied to LPI beam design under different fall angles and improves the LPI performance of radio short-range detectors. Second, the proposed method achieves a half-power beam width of only 1 m in the range dimension and 9.9° in the angular dimension, which is better than that of traditional methods. Under different fall angles, the beam formed by the proposed method has the smallest intercept area, indicating the best LPI performance.
Performance Optimization and Gate Oxide Electric Field Analysis of 1200V Trench SiC MOSFET Based on PCL-CSL Collaborative Design
FANG Shaoming, LI Hongda, GAO Yuan
Available online  , doi: 10.11999/JEIT260164
Abstract:
  Objective  1 200 V Silicon Carbide (SiC) trench Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs) are key devices in medium- and high-voltage power conversion systems. They feature high switching performance, low conduction loss, and high-temperature stability. However, conventional trench structures suffer from electric-field concentration at the trench corner and bottom gate oxide. This effect can cause the peak gate oxide electric field to exceed the industrial reliability criterion of 3 MV/cm, reducing long-term reliability. In addition, strong trade-offs exist among breakdown voltage, specific on-resistance, threshold voltage, and peak gate oxide electric field. These trade-offs make it difficult to achieve high efficiency and high reliability at the same time. To address these issues, this work studies a synergistic structure that combines deep P-type Column (PCL), Carrier Storage Layer (CSL), and locally thickened gate oxide. The aim is to regulate the electric-field distribution, suppress electric-field concentration, improve carrier transport, and achieve balanced device performance. This study provides a systematic design method for high-reliability and high-performance 1 200 V Trench SiC MOSFETs for industrial applications.  Methods  Numerical device simulations were performed using a Technology Computer-Aided Design (TCAD) platform to analyze and optimize the electrical performance of 1 200 V Trench SiC MOSFETs. To ensure reliable simulations, physical models were used for bandgap narrowing, Shockley-Read-Hall (SRH) recombination, Auger recombination, avalanche breakdown, incomplete dopant ionization, doping- and temperature-dependent mobility, and high-field mobility saturation. A device structure with deep PCL, CSL, and locally thickened bottom gate oxide is constructed to reduce the peak gate oxide electric field and improve device reliability. Key structural and process parameters were swept and quantitatively analyzed. 
These parameters included epitaxial layer thickness (TEpi), epitaxial layer doping concentration (NEpi), trench width, trench depth, P-Well (PW) implantation dose, PCL spacing, and CSL implantation dose. Static electrical characteristics, including threshold voltage (Vth), specific on-resistance (Ron,sp), Breakdown Voltage (BV), and peak gate oxide electric field (Eox,max), are extracted and evaluated. The final parameter combination is determined through a trade-off analysis between conduction performance and long-term device reliability.  Results and Discussions  The simulation results show that the deep PCL structure redirects electric-field lines away from the trench bottom gate oxide and reduces electric-field concentration. When this structure is combined with the locally thickened bottom gate oxide, Eox,max is reduced to below 3 MV/cm, meeting the industrial reliability criterion. The CSL broadens the vertical conduction path, reduces current crowding, and decreases Ron,sp. Parameter optimization shows that TEpi, NEpi, trench dimensions, PW implantation dose, and CSL implantation dose determine the trade-off between BV and conduction performance (Fig. 5, Fig. 6, Fig. 9, Fig. 10, and Fig. 19). PCL spacing has a strong effect on electric-field shielding and gate oxide protection (Fig. 16 and Fig. 17). After multi-parameter optimization, the device achieves Vth=4.7 V, BV=1 708 V, Ron,sp=1.57 mΩ·cm², and Eox,max=2.5 MV/cm (Table 2). These results indicate balanced performance for high-voltage power applications.  Conclusions  A synergistic PCL-CSL structural design for 1 200 V Trench SiC MOSFETs is studied and validated through TCAD simulation. The design addresses key limitations of conventional Trench SiC MOSFETs, including high peak gate oxide electric field, limited breakdown capability, and the trade-off between conduction performance and reliability. 
The effects of TEpi, NEpi, trench dimensions, PW implantation dose, PCL spacing, and CSL implantation dose on device performance and gate oxide reliability are clarified through parameter sweeping and comparative analysis. With coordinated structural optimization, the optimized device achieves low Ron,sp, high BV, suitable Vth, and suppressed electric-field concentration near the trench bottom oxide. Eox,max is controlled below the 3 MV/cm industrial reliability criterion, which reduces the risk of oxide degradation under high-bias operation. The proposed structural strategy and optimization method provide guidance for the design, simulation, and process development of high-voltage, high-reliability SiC power devices.
Multipath Scheduling Algorithm for UAV Video Streaming
CAO Changlong, LI Lingzhi, SHI Lianmin, ZHAO Qingyue
Available online  , doi: 10.11999/JEIT260002
Abstract:
  Objective   With the rapid growth of the low-altitude economy, Unmanned Aerial Vehicle (UAV) technology has been widely used in emergency rescue, disaster monitoring, urban security, and other applications. In these scenarios, stable, low-latency, and high-fidelity video backhaul is critical for task execution. Multipath transport protocols can improve Quality of Experience (QoE) through bandwidth aggregation, providing an effective basis for UAV video streaming. However, under dynamic and heterogeneous network conditions, the performance of multipath transport protocols depends strongly on the design of multipath scheduling algorithms. Existing heuristic schedulers use predefined rules to reduce head-of-line blocking and inter-path load imbalance, but their adaptability remains limited in highly dynamic environments. Learning-based schedulers can learn the mapping between network states and scheduling rewards from real-time feedback, enabling adaptive performance optimization. However, most existing learning-based schedulers are designed for general network scenarios. They are not optimized for UAV networks, and their ability to guarantee QoE has not been fully validated. A multipath scheduling algorithm tailored to UAV video streaming is therefore needed to better exploit the performance potential of multipath transport protocols.  Methods   To address the dynamic and heterogeneous challenges of UAV video streaming, this paper proposes NeuroFly, a multipath scheduling framework based on the NeuralUCB algorithm. In NeuroFly, multipath traffic scheduling is formulated as a Contextual Multi-Armed Bandit (CMAB) problem. The context space is constructed by integrating path state information, video encoding features, and UAV mobility parameters, which jointly characterize the current transmission environment. In the action space, a frame-priority-driven redundant transmission mechanism is proposed. 
Video frames are assigned different frame priorities according to decoding dependencies, and differentiated redundancy strategies are used to improve the probability of successful video-frame delivery. A multi-objective reward function is further designed to guide policy learning and support adaptive optimization under dynamic and heterogeneous network conditions. In addition, a context monitoring mechanism is integrated into NeuroFly to handle abrupt environmental changes caused by high UAV mobility. This mechanism detects context distribution shifts and triggers a two-stage restart strategy. A soft restart is activated when gradual context drift is detected, removing outdated historical experience. A hard restart is performed under abrupt context changes by clearing the experience replay buffer and reinitializing model parameters, allowing learning to restart under a new distribution.  Results and Discussions   The proposed NeuroFly framework is evaluated in both simulation and field environments. First, Mininet-WiFi is used to simulate realistic UAV network environments and evaluate overall QoE performance. The results (Fig. 4) show that, compared with state-of-the-art heuristic and learning-based schedulers, NeuroFly achieves broad performance gains by fully using aggregated multipath bandwidth. Specifically, the 99th-percentile latency is reduced by 19.9%~51.0%, the average video frame rate is increased by up to 24.6%, image structural similarity is improved by up to 49.2%, and the buffering time ratio is reduced by 13.4%~77.6%. These results demonstrate the strong ability of NeuroFly to guarantee QoE. Field experiments (Fig. 6) further confirm that NeuroFly provides favorable optimization in real UAV operation scenarios. Compared with mainstream transport solutions widely deployed in production environments, NeuroFly achieves better real-time transmission performance and shows strong practical applicability for future large-scale UAV deployment.  
Conclusions   This paper addresses network dynamics, path heterogeneity, and time-varying transmission conditions in UAV video streaming over multipath transport protocols. An intelligent multipath scheduling framework, NeuroFly, is proposed based on the NeuralUCB algorithm. In this framework, multipath traffic scheduling is modeled as a CMAB problem. Through the design of the context space, action space, and multi-objective reward function, online learning and adaptive optimization of traffic allocation policies are achieved. To further improve robustness under severe environmental changes, a lightweight context monitoring mechanism is introduced to detect context distribution drift and restart the learning process when needed. Systematic evaluations are conducted on both simulation platforms and real UAV operation environments. The simulation results show that NeuroFly achieves consistent improvements across QoE metrics compared with state-of-the-art heuristic and learning-based schedulers. The field results further indicate that NeuroFly provides reliable guarantees in actual UAV operation scenarios when compared with mature solutions that have been widely deployed in production environments. These results validate the practicality, robustness, and engineering feasibility of NeuroFly, and suggest its potential for large-scale deployment in UAV applications that are sensitive to real-time video quality, including emergency response, power inspection, agricultural monitoring, and logistics delivery.
Research on Energy Efficiency Optimization of Rotatable Hybrid Intelligent Reflecting Surface Communication
ZHANG Guangchi, GUO Xuan, WANG Luyao, CUI Miao, FU Hao
Available online  , doi: 10.11999/JEIT260119
Abstract:
  Objective  With the evolution of 6G communication networks, reconfigurable intelligent surfaces (RIS) have emerged as a pivotal technology for reshaping wireless environments and enhancing spectral efficiency. However, conventional fixed RIS architectures face two critical challenges in practical deployment: the “angle mismatch” loss, where the effective aperture significantly diminishes when users are located at large angles from the RIS normal, and the “energy consumption bottleneck,” caused by the high cumulative power consumption of radio frequency (RF) circuits and static control elements in large-scale arrays. Existing research often treats mechanical rotation and element switching in isolation, lacking a unified framework to balance the trade-off between mechanical/circuit energy consumption and communication gain. To address these limitations, this paper investigates a rotatable and switchable hybrid RIS (H-RIS) assisted downlink communication system. The primary objective is to maximize the system’s energy efficiency (EE) by jointly optimizing the base station transmit power, subarray activation states, physical rotation angles, and electronic phase shifts. This approach aims to introduce mechanical rotation degrees of freedom to compensate for path loss and employ dynamic switching mechanisms to reduce redundant power consumption, thereby achieving sustainable green communication.  Methods  A joint optimization framework is established for the H-RIS aided single-user multiple-input single-output (MISO) system. The system model explicitly accounts for the dynamic power consumption induced by mechanical rotation and the static power consumption of active subarrays. The resulting optimization problem is formulated as a non-convex mixed-integer nonlinear programming (MINLP) problem, involving coupled binary variables (activation status) and continuous variables (power, angles, phases). 
To solve this challenging problem, a block coordinate descent (BCD)-based alternating optimization (AO) algorithm is proposed to decouple the variables into three sub-problems. Firstly, to tackle the exponential complexity caused by binary switching variables, a channel contribution-based ranking strategy is developed. By performing eigenvalue decomposition on the cascaded channel correlation matrix, the priority of each subarray is quantified, reducing the search space from exponential to linear. Secondly, for the power allocation sub-problem, the non-convex fractional objective function is transformed into a parametric subtractive form using the Dinkelbach algorithm, which is then solved via the interior-point method. Thirdly, for the physical rotation and electronic phase optimization, the problem is decomposed into single-variable sub-problems. A Golden Section Search algorithm is employed to iteratively find the optimal rotation angle and phase shift for each subarray within bounded constraints, ensuring the monotonic convergence of the objective function.  Results and Discussions  Extensive simulations are conducted to evaluate the performance of the proposed H-RIS scheme compared with benchmark schemes, including “Only-Rotation” (always on), “Only-Switching” (fixed angle), and “Conventional” (fixed and always on). The simulation results regarding the maximum transmit power Pmax (Fig. 2 and Fig. 3) demonstrate that the proposed method achieves the highest energy efficiency across the entire power range. Specifically, in the low power regime, the proposed algorithm intelligently turns off redundant subarrays where the rate gain cannot offset the circuit power cost, thereby significantly outperforming the “Only-Rotation” scheme which suffers from high static power consumption. The impact of user distance is also analyzed (Fig. 4 and Fig. 5). 
Results indicate that the proposed scheme maintains high spectral efficiency comparable to the “Only-Rotation” scheme by dynamically adjusting the rotation angles to align with the Line-of-Sight (LoS) path, effectively compensating for the angle mismatch loss observed in the “Only-Switching” and “Conventional” schemes. Furthermore, the activation pattern of the subarray varies in a “U” shape with distance (Table 1), which allows for flexible adjustment of array size and orientation according to user-RIS geometry.  Conclusions  This paper proposes an energy-efficient transmission scheme for H-RIS aided communication systems by integrating mechanical rotation and dynamic switching capabilities. A low-complexity BCD-based algorithm is developed to jointly optimize the transceiver design. The results confirm that introducing mechanical rotation significantly mitigates the angle mismatch loss, while the proposed channel contribution-based switching strategy effectively eliminates redundant energy consumption. The proposed H-RIS architecture offers a superior trade-off between spectral efficiency and energy efficiency compared to traditional fixed RIS architectures, providing a viable solution for future green 6G networks.
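The abstract above uses Golden Section Search for the single-variable rotation-angle and phase sub-problems. The sketch below shows the generic technique for maximizing a unimodal function on a bounded interval. It is not the authors' implementation, and the objective `toy_gain` is a hypothetical stand-in for the true energy-efficiency objective, which the abstract does not give.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-6):
    """Return the approximate maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def toy_gain(theta):
    """Hypothetical single-peaked gain versus rotation angle (radians)."""
    return math.cos(theta - 0.3)


if __name__ == "__main__":
    best = golden_section_max(toy_gain, -math.pi / 2, math.pi / 2)
    print(round(best, 4))
```

Each iteration shrinks the interval by a constant factor of about 0.618 and needs only one new function evaluation, which suits repeated per-subarray updates inside an alternating-optimization loop.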
CRLB Optimization for O-RIS-Assisted VLP Systems
ZHANG Zengjie, WU Qi, ZHANG Jian, DUAN Ruijie, FENG Yunhan
Available online  , doi: 10.11999/JEIT260120
Abstract:
  Objective  With the rapid development of indoor location-based services, Visible Light Positioning (VLP) has emerged as a promising high-accuracy positioning technology. The integration of Optical Reconfigurable Intelligent Surfaces (O-RIS) into VLP systems can effectively enhance signal coverage and improve positioning performance. However, optimizing the positioning accuracy and fairness across different user areas in RIS-assisted VLP systems remains a challenging issue. This study focuses on optimizing the Cramer-Rao Lower Bound (CRLB) of the system under both near-field and far-field channel models, aiming to enhance overall positioning precision and fairness through RIS configuration.  Methods  Under the far-field channel model assumption, the RIS orientation optimization problem is formulated as a received power maximization problem. A positioning algorithm combining Particle Swarm Optimization (PSO) and N-step iteration is proposed to dynamically adjust the RIS orientation optimally without prior knowledge of the receiver’s position. Under the near-field channel model assumption, the allocation problem between RIS elements and LEDs is constructed as a Markov Decision Process (MDP). A reinforcement learning method based on experience replay and knowledge utilization is designed to solve this problem, aiming to minimize the CRLB while ensuring positioning fairness for users in different regions.  Results and Discussions  Simulation results demonstrate that the proposed algorithms effectively enhance system positioning performance under both models. In the far-field model, the PSO-based iterative algorithm achieves dynamic optimization of RIS orientation, significantly improving positioning accuracy (Fig. 3). 
Under the near-field model, the reinforcement learning approach not only minimizes the CRLB but also considerably improves positioning fairness across the entire area, with a noticeable reduction in performance disparity among users in different zones (Fig. 5, Fig. 6). Comparative experiments show that the proposed methods outperform conventional RIS configuration strategies in terms of both average positioning error and fairness index (Table 1).  Conclusions  This paper investigates CRLB optimization methods for O-RIS-assisted VLP systems under near-field and far-field channel models. In the far-field scenario, a PSO-based iterative algorithm is proposed to optimize RIS orientation, enhancing positioning accuracy without requiring prior receiver location information. In the near-field scenario, a reinforcement learning-based approach is designed to optimize RIS element–LED allocation, which effectively minimizes the CRLB and improves positioning fairness across the whole area. Simulation results validate the effectiveness of the proposed algorithms in both models. Future work may consider more practical channel impairments and multi-user scenarios to further improve the robustness and scalability of the system.
Intelligent Resource Allocation Algorithm Based on Outdated CSI for Multi-Node URLLC
ZHAO Yizhen, GAO Wei, HU Yulin, ZHU Yao
Available online  , doi: 10.11999/JEIT260216
Abstract:
  Objective  Ultra-Reliable and Low-Latency Communications (URLLC) have found widespread applications in Industrial Internet-of-Things (IIoT) systems. However, in mobile operation scenarios such as transportation and inspection, the acquisition of instantaneous Channel State Information (CSI) is often impractical due to feedback overhead, forcing resource allocation decisions to be made based on outdated CSI. This mismatch significantly limits the achievable energy efficiency of the system. Traditional convex optimization methods have difficulty addressing such challenges, while classical Deep Reinforcement Learning (DRL) algorithms also exhibit inherent limitations in terms of convergence stability and policy performance when confronted with the stringent Quality-of-Service (QoS) constraints in URLLC. Motivated by these challenges, this paper considers a multi-user URLLC system operating under outdated CSI in dynamic scenarios, formulates an energy efficiency maximization problem that guarantees the communication latency and reliability requirements, and aims to design an efficient and stable algorithm for joint power and blocklength allocation.  Methods  To achieve the above objective, this paper proposes a Successive Convex Approximation (SCA)-assisted DRL framework for energy efficiency maximization under outdated CSI. Specifically, an SCA-based algorithm is first developed to derive a pre-allocation of transmit power and blocklength, yielding a feasible and physically interpretable yet relatively conservative baseline solution. Building upon this baseline, a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is employed to perform incremental refinement through interaction with the dynamic environment, thereby alleviating the conservative nature of SCA. 
Meanwhile, the SCA solution is incorporated as prior knowledge together with user location information into the state representation, which effectively narrows the policy search space and enables the DRL agent to better capture large-scale channel characteristics and system dynamics under outdated CSI, thereby enhancing the learning efficiency and stability.  Results and Discussions  The proposed algorithm is evaluated in simulation against SCA, TD3 without SCA guidance, and TD3 without user location information. Simulation results demonstrate that the proposed method significantly outperforms all benchmark schemes in terms of convergence stability and system energy efficiency. During the training phase (Fig. 3), the average reward of the proposed algorithm increases steadily and converges stably, whereas removing location information leads to low and highly fluctuating rewards, and removing SCA guidance results in convergence to a much lower reward level, highlighting the importance of both prior guidance and location-aware state representation. In addition, during the actual operation stage of the system, the proposed algorithm achieves high and stable energy efficiency (Fig. 4), significantly outperforming the comparative algorithms. Under outdated CSI, DRL-based methods outperform conservative optimization when transmission is successful, while the absence of location information or SCA guidance significantly degrades energy efficiency or increases transmission failures, verifying the effectiveness of both factors in improving energy efficiency and ensuring strategy validity. The simulation also examines the impact of key system parameters on energy efficiency. For basic resource parameters such as blocklength (Fig. 5) or power (Fig. 6), appropriately increasing their budget can help improve system energy efficiency. 
Reliability-related parameters (Fig. 7) should be set according to business requirements in order to avoid wasting resources. Finally, the simulation of average energy efficiency versus the number of nodes and the number of network neurons provides a reference for configuring the algorithm structure and designing the network scale (Fig. 8).  Conclusions  In conclusion, this paper addresses the challenge of energy-efficient resource allocation for multi-user URLLC systems operating under outdated CSI by integrating SCA with DRL. Specifically, a TD3-based DRL approach is enhanced by introducing an SCA reference solution as prior guidance and incorporating user location information into the state representation. Such an optimization–learning dual-driven solution framework combines the interpretability and feasibility of model-based optimization with the adaptivity and expressive power of data-driven learning. The effectiveness of the proposed method is evaluated through simulations: (1) The proposed method achieves higher energy efficiency than pure optimization and conventional TD3 while satisfying URLLC latency and reliability constraints; (2) The SCA reference improves the stability and effectiveness of the strategy under outdated CSI; (3) Incorporating user location information enables more efficient decision-making. However, this work focuses on a single-cell multi-user scenario, and practical issues such as multi-cell interference, cooperative multi-base-station scheduling, and more complex mobility patterns are not considered. Future work will extend the proposed framework to more realistic multi-cell and multi-agent scenarios and investigate its applicability under more severe CSI imperfections.
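To make the coupling between power, blocklength, and reliability concrete, the sketch below uses the widely used normal approximation from finite-blocklength theory for an AWGN channel. Whether the paper adopts exactly this error model is an assumption, and the function name and numerical values are illustrative only.

```python
import math

def fbl_error_probability(snr, blocklength, info_bits):
    """Normal-approximation decoding error probability (AWGN channel).

    Assumed model: eps = Q((C - k/n) / sqrt(V/n)), with capacity
    C = log2(1 + snr) and dispersion V = (1 - (1 + snr)^-2) * (log2 e)^2.
    """
    capacity = math.log2(1 + snr)
    dispersion = (1 - (1 + snr) ** -2) * math.log2(math.e) ** 2
    rate = info_bits / blocklength
    x = (capacity - rate) / math.sqrt(dispersion / blocklength)
    # Q(x) = 0.5 * erfc(x / sqrt(2)); erfc stays accurate in the far tail.
    return 0.5 * math.erfc(x / math.sqrt(2))

# Illustrative values: 100 information bits at 0 dB SNR.
e_short = fbl_error_probability(snr=1.0, blocklength=150, info_bits=100)
e_long = fbl_error_probability(snr=1.0, blocklength=300, info_bits=100)
```

Under this model, a longer blocklength or a higher SNR lowers the error probability at the cost of more channel uses or more transmit power, which is the trade-off a joint power and blocklength allocation has to balance.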
Optimizing SATisfiability-Based Automatic Test Pattern Generation Systems: Unified Fault Set Construction, Modeling, and Solving
YAN Dapeng, HE Qirun, GUO Jing, WANG Boning, CAI Zhikuang
Available online  , doi: 10.11999/JEIT260025
Abstract:
  Objective  Boolean SATisfiability-Based Automatic Test Pattern Generation (SAT-Based ATPG) is widely used to generate tests for hard-to-detect single stuck-at faults and to prove fault untestability in combinational logic. When SAT-Based ATPG is applied to large netlists with dense fanout and reconvergence, its runtime and memory consumption are often dominated by three interacting issues. Representative fault lists produced by conventional dominance- or equivalence-based fault collapsing can remain large, increasing the number of SAT calls and enlarging the incremental context that must be maintained across faults. Meanwhile, SAT modeling may introduce redundant Conjunctive Normal Form (CNF) overhead, especially when an explicit faulty-circuit copy is constructed or when propagation constraints are encoded globally without locality control. In addition, fanout-reconvergence structures amplify assignment correlations along sensitized paths, and such correlations are often exposed only after repeated decisions and backtracking when only standard unit propagation is used. The unified optimization objective is therefore to reduce overall CNF size and solving cost while preserving completeness, so that a practical SAT-Based ATPG system remains efficient and stable across circuits of different scales.  Methods  A three-part framework is developed and implemented in an incremental SAT-Based ATPG flow, and the overall workflow is illustrated (Fig. 1). First, a checkpoint-driven dynamic fault-set construction method is proposed. Checkpoints are collected during netlist-to-directed-acyclic-graph conversion, including all primary inputs and all fanout branches, and XOR/XNOR outputs are additionally recorded as supplementary checkpoints to avoid over-collapsing XOR-related fault behavior. 
Representative faults are initialized on checkpoints by compact rules that combine dominance-oriented fault collapsing with equivalence-aware refinement, and solver-guided repair is performed when an untestable representative fault indicates potential masking under structural constraints. The procedure is summarized in Algorithm 1. Second, a SAT modeling method based on fault sensitization constraints is adopted to avoid explicit faulty-circuit duplication. Fault activation, propagation, and observability are represented by additional fault sensitization constraints over the original circuit variables, and auxiliary variables are introduced only when local bookkeeping is required. Constraint localization is restricted to the fault fanout cone, and cone-boundary and internal vertices are identified through a graph-traversal procedure (Fig. 2). Third, a dynamic implication learning mechanism oriented to fanout-reconvergence pairs is integrated into the incremental solving loop. Reconvergence pairs within the fault fanout cone are monitored under partial assignments, and structure-induced implications are injected either as implied assignments when a reconvergent output becomes functionally determined or as short conflict clauses when a branch-value combination becomes inconsistent with the fault sensitization constraints. The dynamic implication learning procedure is summarized in Algorithm 2.  Results and Discussions  The unified system is evaluated on ISCAS’85 and ISCAS’89 benchmark circuits, with TG-PRO used as the baseline implementation under the same SAT solver and termination settings. The checkpoint-driven dynamic fault-set construction method substantially reduces the representative fault space entering ATPG. Relative to the uncollapsed fault space, the average representative-fault ratio decreases from 51.38% to 42.01%, corresponding to an average fault-space reduction of 57.99%. 
The best-case ratio reaches 33.19% on large circuits with heavy reconvergence, which indicates that checkpoint-centered representative-fault allocation effectively suppresses redundancy without enlarging the untestable fault set (Table 1). The reduced fault-set size is reflected in preprocessing efficiency, and the total runtime for fault-set construction is consistently reduced, with an average reduction of 8.37% across the evaluated circuits (Fig. 3). For SAT model construction, the fault-sensitization-constraint encoding reduces CNF overhead relative to the baseline model construction. Across the benchmark set, the numbers of CNF clauses and CNF variables are reduced by 11.44% and 3.50%, respectively, which shows that avoiding explicit faulty-circuit duplication and localizing auxiliary constraints to the fault fanout cone effectively lowers memory demand (Table 2). The reduced CNF size and strengthened locality of constraints are further reflected in end-to-end runtime, and the total runtime of SAT modeling and solving is reduced across the evaluated benchmarks (Fig. 4). Dynamic implication learning further improves solving efficiency in reconvergence-heavy structures. Compared with static implication learning, CNF construction time increases by 3.0% on average because of the additional monitoring and injection operations, yet the overall runtime decreases by 4.42% on average, which indicates a favorable cost-benefit trade-off. The overhead attributed to dynamic implication learning accounts for 2.51% of the total runtime aggregated across circuits, which confirms that the injected implications and pruning clauses provide measurable solving benefits at limited extra cost (Table 3).  Conclusions  A unified optimization framework for SAT-Based ATPG is developed by combining checkpoint-driven dynamic fault-set construction, localized fault sensitization constraints for CNF modeling, and fanout-reconvergence-oriented dynamic implication learning. 
Representative faults are compressed through solver-guided repair of dominance and equivalence relations to avoid masking, CNF growth is controlled through duplication-free modeling localized to the fault fanout cone, and reconvergence correlations are exploited through incremental implication injection to strengthen propagation and enable early conflict pruning. Experimental results on standard benchmark circuits show consistent reductions in representative fault scale, CNF size, and total runtime, providing a practical approach for scaling SAT-Based ATPG to larger designs with complex fanout and reconvergence.
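The checkpoint collection step described in the abstract above (all primary inputs, all fanout branches, plus XOR/XNOR outputs) can be sketched on a toy netlist. The dictionary-based netlist format, the function name `collect_checkpoints`, and the example circuit are assumptions made for illustration; the paper's data structures and its later collapsing and repair rules are not reproduced here.

```python
from collections import defaultdict

def collect_checkpoints(gates):
    """Collect checkpoint sites from a combinational netlist.

    gates maps each gate output net to (gate_type, [input nets]).
    Nets that are never a gate output are treated as primary inputs.
    """
    fanout = defaultdict(list)
    for out, (_, ins) in gates.items():
        for net in ins:
            fanout[net].append(out)
    primary_inputs = sorted(n for n in fanout if n not in gates)
    # A fanout branch is one (stem, sink) edge of a net driving 2+ gates.
    branches = sorted(
        (stem, sink)
        for stem, sinks in fanout.items()
        if len(sinks) > 1
        for sink in sinks
    )
    # Supplementary checkpoints: outputs of XOR/XNOR gates.
    xor_outputs = sorted(
        out for out, (typ, _) in gates.items() if typ in ("XOR", "XNOR")
    )
    return primary_inputs, branches, xor_outputs

# Hypothetical toy circuit: y = (a AND b) XOR (b OR c).
toy = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR", ["b", "c"]),
    "y": ("XOR", ["n1", "n2"]),
}
pis, branches, xors = collect_checkpoints(toy)
```

In the toy circuit, net `b` drives two gates, so each of its two branches becomes a checkpoint in addition to the three primary inputs and the XOR output.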
Research on UAV-assisted Dynamic-weight Edge Computing Offloading Strategy
WANG Yijun, WANG Yachu, SHAHD Batool, MIAO Ruixin
Available online  , doi: 10.11999/JEIT260054
Abstract:
  Objective  The increasing demands of the Internet of Things (IoT) for computational resources and real-time processing have highlighted the significance of Mobile Edge Computing (MEC). Traditional MEC relies on terrestrial base stations, resulting in coverage blind spots in remote or specialized environments. Unmanned Aerial Vehicle (UAV)-assisted MEC architectures exploit UAVs’ flexible deployment to expand service coverage. However, existing approaches for multi-terminal, multi-UAV scenarios often fail to optimize task offloading latency, system energy consumption, and adaptability to dynamic environments simultaneously. They also overlook optimal UAV selection when terminal devices are covered by multiple UAVs and lack adaptive mechanisms to adjust optimization objectives during task execution. This study addresses these challenges by integrating cooperative caching, offloading decision-making, and resource allocation strategies.  Methods  A three-tier microcloud-edge-terminal architecture is constructed, comprising a central cloud, multiple UAV edge servers with caching capabilities, and numerous mobile terminal devices. A cooperative caching mechanism reduces transmission delay during task execution. Task offloading adopts a fine-grained partial offloading mode, dividing complex tasks into dependent subtasks modeled through a Directed Acyclic Graph (DAG). The Cooperative Caching-Adaptive Hierarchical MultiVerse Optimizer (CCAH-MVO) algorithm is proposed. A hybrid coding scheme encodes offloading decisions, caching decisions, and resource allocation uniformly. A dynamic weight mechanism adaptively balances delay and energy consumption according to the system’s real-time energy state. Additionally, a UAV selection strategy is implemented for scenarios where terminals are covered by multiple UAVs. By simulating inter-universe material exchange and local refined search, the algorithm efficiently determines the optimal offloading strategy. 
MATLAB simulations validate the method under various experimental settings.  Results and Discussions  The simulation scenario involves 50 randomly distributed terminal devices and 5 UAVs in a 400 m × 400 m area. UAVs are deployed above terminal cluster centers, while terminals at cluster edges are simultaneously within the coverage of multiple UAVs (Fig. 5). The optimal UAV for each terminal is selected using the UAV selection function (Fig. 6), preventing resource bottlenecks and achieving balanced load distribution. In terms of delay performance, the CCAH-MVO algorithm maintains the lowest task delay across all task volumes, with a gradual increase as the number of tasks grows (Fig. 7). Delay under CCAH-MVO is consistently lower than that under fixed-weight strategies across the full task range, demonstrating the effectiveness of the dynamic adaptive mechanism in preserving low latency (Fig. 10). For energy consumption, differences among the algorithms are minor when task quantities are low. Under high task loads, the activation of the dynamic weight mechanism flattens the energy consumption curve (Fig. 8). When the number of tasks reaches 100, total energy consumption under CCAH-MVO is the lowest among all strategies and remains lower than the fixed-weight approach, reflecting effective control under critical energy conditions (Fig. 9). Regarding total system overhead, the CCAH-MVO algorithm consistently achieves the best performance. The gap with fixed-weight strategies widens when task numbers exceed 80, illustrating the dynamic weight mechanism’s collaborative optimization of delay and energy consumption (Fig. 11). Overall, by integrating the dynamic weight mechanism and balancing load through UAV selection, the CCAH-MVO algorithm effectively mitigates resource constraints and high task processing overhead in complex, dynamic UAV-assisted MEC environments. It ensures precise coordination between task delay and energy consumption across different load stages.  
Conclusions  The proposed CCAH-MVO framework, incorporating a microcloud-edge-terminal architecture, cooperative caching mechanism, fine-grained partial offloading, dynamic weight adjustment, and UAV selection strategy, effectively addresses resource scheduling in complex multi-UAV MEC environments. Simulations show adaptive optimization of objectives, intelligent energy management, low latency, and reduced total system overhead, improving service stability and user experience. This research provides a practical solution for efficient UAV edge computing in dynamic environments. Future work will explore dynamic energy efficiency optimization and multi-node collaboration while maintaining low-latency performance.
A Radio Frequency Fingerprint Open-set Identification Method Combining Multi-scale Wavelet Front-end and Hyperspherical Metric Learning
TIAN Xinyu, LI Zirui, ZHENG Qinghe, ZHOU Fuhui, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT260214
Abstract:
  Objective  Open-set Radio Frequency Fingerprint (RFF) identification under low Signal-to-Noise Ratio (SNR) conditions is challenging because fingerprint features are easily masked by noise, multipath effects induce nonlinear distortions, and existing methods struggle with feature extraction and unknown device detection. This study proposes a deep learning framework that integrates a multi-scale wavelet front-end with hyperspherical metric learning to achieve robust open-set RFF identification.  Methods  The proposed method, MS-RANet, comprises three key components. First, a multi-scale wavelet front-end based on one-dimensional stationary wavelet transform performs full-resolution, multi-scale decomposition of I/Q signals, preserving discriminative fingerprint information while suppressing noise. Second, a multi-scale residual attention network incorporates deep residual learning, global self-attention, and Bidirectional LSTM (BiLSTM) to enhance sensitivity to subtle fingerprint features and capture long-range temporal dependencies. Third, hyperspherical metric learning constrains the feature space onto a unit hypersphere, optimizing angular margins to produce compact intra-class and separable inter-class feature distributions. Unknown devices are subsequently detected using cosine similarity.  Results and Discussions  Experiments on a high-fidelity IEEE 802.11 simulation dataset demonstrate the effectiveness of MS-RANet. The method achieves an average classification accuracy of 65.34% across SNR levels from –5 dB to 20 dB, and an Area Under the Curve (AUC) of 0.81 at –5 dB SNR, outperforming DNN, GRU, CNN-LSTM, ResNet50, and DRSN-CA. Confusion matrices and Receiver Operating Characteristic (ROC) curves confirm robustness under extreme channel conditions. t-SNE visualization shows well-separated, compact clusters for known devices, while unknown samples are effectively isolated from known class regions. 
Ablation studies verify the contributions of the multi-scale wavelet front-end, global attention, BiLSTM, and hyperspherical metric learning modules.  Conclusions  This study presents a robust open-set RFF identification method combining a multi-scale wavelet front-end with hyperspherical metric learning. The framework exhibits strong noise resilience, enhanced feature discrimination, and reliable detection of unknown devices under low-SNR and multipath fading conditions. Future work will focus on reducing computational complexity, improving inference speed, evaluating generalization across diverse scenarios and protocols, and integrating the method with complementary physical-layer security mechanisms for collaborative authentication.
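The final detection step in the abstract above (features constrained to a unit hypersphere, unknown devices detected using cosine similarity) can be illustrated as follows. The per-class prototype vectors, the 0.8 threshold, and the device labels are hypothetical; the abstract does not specify how the reference vectors or the decision threshold are obtained.

```python
import math

def l2_normalize(v):
    """Project a nonzero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def classify_open_set(embedding, prototypes, threshold=0.8):
    """Return (label, similarity); label is None for an unknown device.

    prototypes maps each known device label to a reference embedding.
    A sample is rejected as unknown when its highest cosine similarity
    to any known prototype falls below the threshold.
    """
    e = l2_normalize(embedding)
    best_label, best_sim = None, -1.0
    for label, proto in prototypes.items():
        p = l2_normalize(proto)
        sim = sum(a * b for a, b in zip(e, p))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return (best_label if best_sim >= threshold else None), best_sim

# Hypothetical two-dimensional embeddings for two known devices.
known = {"dev_A": [1.0, 0.0], "dev_B": [0.0, 1.0]}
label_known, sim_known = classify_open_set([0.9, 0.1], known)
label_unknown, sim_unknown = classify_open_set([1.0, 1.0], known)
```

Because all vectors are normalized, the decision depends only on angles, which matches the angular-margin training objective described in the abstract.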
A Survey of Processor Security
CHEN Congcong, GU Zhiyang, ZHANG Jiliang
Available online  , doi: 10.11999/JEIT260026
Abstract:
  Significance   Processor security is a cornerstone of modern information security. Cryptographic algorithms, operating systems, and applications have long relied on processors as trusted computing bases. However, as Moore’s Law slows, modern processors increasingly adopt aggressive microarchitectural optimization techniques to improve performance and energy efficiency, often without sufficient security consideration. This trend has led to frequent security vulnerabilities in recent years. In particular, microarchitectural timing channels, exemplified by Meltdown and Spectre, exploit timing differences caused by microarchitectural state changes to break fundamental hardware and software isolation, affecting billions of devices worldwide. At the same time, the boundary between architectural and microarchitectural behavior has become less clear, giving rise to new attack paradigms and turning timing channels from isolated hardware flaws into cross-layer system security problems.  Progress   Although substantial progress has been made in the study of timing channels, existing surveys still have several limitations. First, the mechanisms of timing channels are highly diverse, and the set of exploitable components continues to grow. Hardware-centric classification schemes are therefore insufficient to capture emerging and previously unknown attacks, and they often obscure the common features shared across different techniques. Second, as traditional microarchitectural channels become better understood and partially mitigated, leakage increasingly shifts to higher-level shared resources, including operating system policies and software-managed shared resources. However, previous studies have often treated software mainly as an execution context rather than a direct source of timing leakage. In addition, current discussions of defenses tend to emphasize individual techniques, with limited analysis of their scope and failure modes.  
Contributions   This survey systematically reviews timing channels from a cross-layer perspective and unifies hardware- and software-based timing channels under a common abstraction. Four necessary conditions for timing channel exploitation are identified, and a unified classification framework is established based on the nature of shared mutable state and the mechanisms that make timing differences observable. Within this framework, representative attacks from the past decade are comprehensively reviewed, their attack procedures are systematically analyzed, and their common features are clarified. In addition, existing defense mechanisms are classified according to the leakage conditions they are intended to disrupt, and their scope and possible failure modes are examined. This survey also reviews current automated vulnerability detection methods.  Prospects   Future research on timing channels faces several emerging challenges. New microarchitectural optimization techniques continue to create new attack surfaces, while resource sharing at the software level may produce additional forms of timing leakage. Moreover, emerging platforms, including chiplet-based architectures, cloud computing environments, hardware accelerators, and heterogeneous systems, are likely to expose new types of timing channels that require systematic study.
Shallow-Water Geoacoustic Parameter Inversion Using Stokes Parameters and an Attention-Enhanced Multi-Task U-Net
HUANG Qianzhuo, LI Xiaoman, BI Xuejie, ZHANG Zishi, TONG Han, LI Fei
Available online  , doi: 10.11999/JEIT251085
Abstract:
  Objective  Geoacoustic parameters in shallow water are critical for characterizing underwater acoustic propagation. Traditional inversion methods, however, are limited by high computational complexity, high cost, and strong dependence on the accuracy of environmental models. To address these issues, an efficient and robust inversion method is proposed to improve the reliability and stability of shallow-water geoacoustic parameter estimation while preserving computational efficiency.  Methods  This method is developed from the Stokes parameters of the vector acoustic field. Signals received by a single vector hydrophone are processed with a warping transform to separate and extract the normal modes propagating in a shallow-water waveguide. The extracted signals are then used to calculate the Stokes parameters, which are normalized and used as input features for the inversion model. An attention-enhanced multi-task U-Net is constructed with a shared encoder and multiple prediction branches to estimate key geoacoustic parameters, including compressional wave velocity, shear wave velocity, density, compressional wave attenuation, and shear wave attenuation. In addition, channel attention and spatial attention, together with a multi-task loss function with uncertainty weighting, are used to improve feature extraction and adaptively balance the different parameter inversion tasks.  Results and Discussions  The attention mechanism is shown to suppress fluctuations in model predictions and to improve the accuracy and stability of geoacoustic parameter inversion. When 200 test samples are evaluated, the mean absolute percentage errors of both compressional wave velocity and seabed density remain below 5% (Table 3). After the attention mechanism is introduced, the errors in compressional wave velocity and seabed density are further reduced to below 3% (Table 5), which indicates improved prediction accuracy for these key parameters. 
The proposed method is also shown to be insensitive to parameter mismatch and to have strong robustness to environmental variation. Furthermore, the method is validated with measured data from a shallow-water region in the northern South China Sea, and its effectiveness and reliability in practical applications are confirmed (Table 6 and Fig. 9). These results show that the attention-enhanced multi-task U-Net effectively captures critical features from the Stokes parameters and yields more stable and accurate geoacoustic parameter estimation in shallow-water environments.  Conclusions  The inversion method based on the Stokes parameters and an attention-enhanced multi-task U-Net effectively improves the accuracy and stability of shallow-water geoacoustic parameter estimation and shows strong performance in the prediction of compressional wave velocity, shear wave velocity, and density. However, limitations remain in the inversion of seabed attenuation. Future work should focus on improving feature extraction methods and network architecture and on testing the applicability of the method under more complex marine conditions.
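The abstract above mentions a multi-task loss with uncertainty weighting. One common form of such a loss uses a learnable log-variance per task, as sketched below. The exact weighting used in the paper is not stated in the abstract, so this form, the function name, and the example numbers are assumptions.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with learnable log-variance weights.

    Assumed form: sum_i exp(-s_i) * L_i + s_i, where s_i is the
    log-variance of task i. A large s_i down-weights a noisy task,
    and the additive s_i term penalizes inflating every variance.
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))

# Illustrative values for two tasks, e.g. a velocity and an attenuation branch.
equal_weight = uncertainty_weighted_loss([1.0, 4.0], [0.0, 0.0])
down_weighted = uncertainty_weighted_loss([1.0, 4.0], [0.0, math.log(4.0)])
```

With all log-variances at zero the result is the plain sum of task losses. In training, the `log_vars` values would be optimized together with the network weights so that the balance between the parameter inversion tasks adapts automatically.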
Index Modulation Design with Sparse Spatial Constellation and Dynamic Multi-RIS Block Selection for RIS-MIMO Systems
HUANG Fuchun, ZHU Han, TANG Xiaoqing, YANG Fan, HUANG Jie
Available online  , doi: 10.11999/JEIT251289
Abstract:
  Objective  This paper aims to address two main challenges in RIS-assisted MIMO index modulation (IM) systems: (1) the practical deployment difficulty of using a single large-scale RIS panel, and (2) the high complexity of designing efficient transmit spatial signal vectors. To overcome these issues, this paper proposes a joint design of sparse spatial constellation and dynamic multi-RIS block selection to enhance spectral efficiency, bit error rate (BER) performance, and deployment flexibility.  Methods  Inspired by the extended space index modulation (ESIM) paradigm, a new design of sparse spatial constellation with two active antennas (SCTA) is proposed, which leads to the SCTA-RIS-SM system. The idea is to mix primary and secondary PAM constellations to form a spatial constellation vector $[x_1, x_2]^{T}$, which is modulated onto two active antennas. This design not only maximizes the minimum Euclidean distance between transmit vectors but also significantly enhances the anti-interference capability. To circumvent the deployment difficulties of a single large RIS panel, an enhanced scheme, SCTA-MBRIS-SM, is further proposed. This system employs a distributed array of multiple small RIS blocks and dynamically selects a subset of blocks for cooperative reflection, treating different “RIS block selection combinations” as a new index modulation dimension. Finally, theoretical analysis of spectral efficiency and average bit error rate is carried out, and Monte Carlo simulations are conducted to compare the proposed systems with several existing schemes.  Results and Discussions  Simulation results demonstrate that the proposed SCTA-RIS-SM system achieves notable signal-to-noise ratio (SNR) gains over RIS-SIM, RIS-SM, and DH RIS-SM systems under the same spectral efficiency (e.g., 10–12 bits/s/Hz) in near-field wideband scenarios. For instance, at BER = $10^{-3}$, SCTA-RIS-SM outperforms RIS-SIM by about 1.5–2.5 dB and DH RIS-SM by more than 6 dB. 
Furthermore, the SCTA-MBRIS-SM system, by exploiting additional index modulation from RIS block selection, further improves the BER performance and spectral efficiency compared to SCTA-RIS-SM without increasing the number of radio frequency chains. With the total number of reflecting elements kept identical, the proposed multi-block scheme achieves up to 5 dB gain over RIS-SIM at BER = 10^-3. Theoretical BER curves match well with simulation results in the high SNR region, validating the analytical derivations. The results also show that the performance advantage is maintained as the number of transmit antennas increases, and the system exhibits good compatibility with channel coding.  Conclusions  This paper addresses the challenges of large-scale RIS deployment and high-complexity spatial signal design in RIS-assisted MIMO systems. The proposed sparse spatial constellation with two active antennas optimizes the Euclidean distance distribution in the signal space, effectively improving system reliability. The introduction of dynamic multi-RIS block selection transforms hardware deployment constraints into a new dimension for spectral efficiency enhancement, offering a feasible path for practical large-scale RIS applications. Simulation results confirm that jointly optimizing the transmit spatial vector and the degrees of freedom of RIS reflections is an effective strategy for performance improvement. Future work will focus on robustness under imperfect channel state information, construction of higher-dimensional sparse constellations, extension to extremely large-scale MIMO scenarios, and multi-user communications.
Semantic-guided Unified Multi-scale Deep Unrolling Network for Pansharpening
CHEN Junjie, WANG Tingting, FANG Faming, ZHANG Guixu
Available online  , doi: 10.11999/JEIT251252
Abstract:
  Objective  With the rapid advancement of satellite imaging technologies, the demand for high-resolution multispectral (HRMS) remote sensing imagery has grown substantially across a wide range of applications. Because satellite platforms vary widely, there is a significant domain shift across datasets collected from different satellites. As a result, most existing deep learning (DL)-based pansharpening methods are trained individually for each satellite dataset and consequently exhibit limited generalization capability across different satellites. To address these limitations, this study proposes a Semantic-guided Unified Multi-scale Deep Unrolling Network (SUM-DUN), which is designed based on classical optimization theory and adopts a 3D multi-scale deep unfolding architecture for integrated feature extraction and fusion. Leveraging multimodal large language models (MLLMs), the proposed method derives semantic textual prompts from the input images, which direct the model to adaptively adjust its feature representations and thereby enhance fusion quality. The proposed method aims to achieve unified remote sensing image fusion through a tailored network architecture and prompt-guided mechanisms, thereby providing reliable support for high-level image interpretation tasks.  Methods  Following the Maximum A Posteriori (MAP) estimation principle, the optimization process for HRMS recovery is unfolded into the proposed SUM-DUN (Fig. 1). Each iteration stage of SUM-DUN consists of two main modules: a Gradient Descent Module (GDM) and a Semantic-guided Proximal Mapping Network (SPMN), which are used to approximate the operations in Eq. (5) and Eq. (6), respectively. The GDM performs a gradient descent update based on the current feature estimate and the degradation model. The SPMN, implemented with a Transformer-based architecture as illustrated in Fig. 2(b), incorporates semantic textual prompts generated from the input image pair by MLLMs. 
These prompts guide the network to adaptively select appropriate feature propagation strategies for the current pair, helping suppress noise and mitigate discrepancies across different satellite sensors. Moreover, leveraging upsampling and downsampling operations, the network transmits MS and PAN features between iterative stages, thereby progressively preserving and enhancing multi-scale spatial and spectral information throughout the unfolding process.  Results and Discussions  To demonstrate its effectiveness, the proposed method is compared with seven representative baselines, including two traditional methods (BDSD and PRACS) and five DL-based methods (AWFLN, FusionMamba, PanMamba, WFANet and TMDiff). For the reduced-resolution evaluation, where ground-truth HRMS images are available, several widely used reference-based metrics are adopted, including Spectral Angle Mapper (SAM), Spatial Correlation Coefficient (SCC), Peak Signal-to-Noise Ratio (PSNR), Erreur Relative Global Adimensionnelle de Synthèse (ERGAS), Averaged Universal Image Quality Index (QAVE) and the Universal Image Quality Index for 4-band and 8-band images. These metrics jointly evaluate spectral fidelity, spatial consistency, and overall image quality. For the full-resolution evaluation, where ground-truth HRMS images are unavailable, no-reference quality indices are used. Specifically, the Hybrid Quality with No Reference (HQNR) metric, along with its spectral distortion component and spatial distortion component, is employed to assess the fusion quality in real-world scenarios. Quantitative evaluations on the GF-1, QB, WV-2, and WV-4 test datasets demonstrate that the proposed method consistently achieves either the best or second-best performance across all metrics, under both reduced-resolution and full-resolution settings (Tables 2-3). 
These results clearly indicate that the proposed method is capable of simultaneously preserving spectral fidelity and spatial consistency, while maintaining robust performance across different satellites and remaining effective in more challenging scenarios. The ablation studies validate the effectiveness of the 3D architecture, the multi-scale network design, and the spatial–channel prompt guidance mechanism, as removing or altering any of these components leads to varying degrees of performance degradation (Tables 4-5).  Conclusions  This study proposes a semantic-guided unified multi-scale deep unfolding method for pansharpening, which leverages semantic prompts generated by an MLLM to facilitate efficient and unified fusion of images from different satellites. The proposed approach is built upon a deep unfolding framework and employs a 3D convolutional architecture to accommodate varying numbers of spectral bands across satellite datasets. The multi-scale network design is further incorporated to extract spatial and spectral features at different levels, thereby enhancing the fusion capability. In addition, the semantic prompt integration module is introduced to adaptively route spatial and channel features based on the extracted semantic information, enabling more effective feature propagation and improving both spatial detail reconstruction and spectral consistency. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance in terms of both visual quality and quantitative evaluation metrics.
Secure Multi-Task Federated Panoptic Perception Algorithm for Connected Autonomous Vehicles
HUANG Xiaoge, CHEN Ming, TANG Yi, LIANG Chengchao, CHEN Qianbin
Available online  , doi: 10.11999/JEIT250749
Abstract:
With the rapid development of vehicular networks and deep learning, connected autonomous vehicles (CAVs) are now capable of collecting image data from driving scenarios and leveraging Convolutional Neural Networks for feature extraction and processing, thereby enabling efficient perception of their surroundings. However, due to the inherent complexity of driving scenarios, single-task models struggle to address the variety of perception demands. Moreover, the performance of deep learning models relies heavily on large-scale data, while the data collected by individual vehicles is insufficient for training models with good generalization capability. Federated learning overcomes data silos by enabling CAVs to upload local model gradients instead of raw data to a central server for aggregation, which preserves data privacy. Therefore, a Secure Multi-Task Federated Panoptic Perception algorithm is presented for vehicular network scenarios. First, a panoptic perception model is constructed to allow CAVs to execute multiple perception tasks simultaneously. Second, a CAV selection strategy based on hybrid scoring is designed to select high-quality local models from vehicles. Finally, a global model aggregation scheme based on Shamir secret sharing is introduced, which applies secret sharing during the aggregation process to prevent data leakage in the event of server attacks or outages. Simulation results validate the effectiveness of the proposed algorithm.
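To make the idea of aggregation under Shamir secret sharing more concrete, the following is a minimal sketch and not the authors' scheme. It assumes each vehicle's update has already been quantized to an integer, and the field prime, the threshold, and the function names (make_shares, aggregate_shares, reconstruct) are all hypothetical choices for illustration. Because Shamir shares are additive, share holders can sum the shares they hold, and only the sum of the updates is ever reconstructed.

```python
import random

PRIME = 2**61 - 1  # field modulus; the choice of prime is an assumption of this sketch


def make_shares(secret, threshold, num_shares, rng):
    """Split `secret` into points of a random polynomial of degree threshold - 1."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def aggregate_shares(all_shares):
    """Add the shares held at the same x; the result shares the sum of the secrets."""
    summed = []
    for per_x in zip(*all_shares):
        x = per_x[0][0]
        summed.append((x, sum(y for _, y in per_x) % PRIME))
    return summed


# random.Random is used only for reproducibility; a real system would need a
# cryptographically secure source such as the `secrets` module.
rng = random.Random(0)
updates = [12, 7, 30]  # hypothetical quantized model updates from three vehicles
shared = [make_shares(u, threshold=3, num_shares=5, rng=rng) for u in updates]
agg = aggregate_shares(shared)
recovered_sum = reconstruct(agg[:3])  # any 3 of the 5 aggregated shares suffice
```

With a threshold of 3 out of 5, the sum can still be recovered if two share holders are unavailable, which is the property that makes the approach attractive when a server is attacked or goes offline.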
Near-field tomographic imaging for uplink communication and coordinate reconstruction algorithm
YIN Lannuo, WANG Yong
Available online  , doi: 10.11999/JEIT250715
Abstract:
  Objective  With the rapid evolution of 6G network technology, communication systems are evolving toward high bandwidth, low latency, and massive connectivity. Against this backdrop, integrated sensing and communications (ISAC), as a novel system architecture, enables wireless signals to perform dual functions—transmitting information while simultaneously sensing the environment—thereby providing more intelligent and efficient services for 6G networks. Environmental reconstruction, a core component of ISAC systems, aims to restore the true spatial structure of targets and scenes using echo signals. However, current environmental reconstruction techniques in practical applications still face the following three major challenges: First, in 6G communication systems, the dense deployment of base stations (BS) causes building targets to reside in the near-field region of the imaging system, leading to severe coupling among the range, azimuth, and elevation dimensions in tomographic imaging and resulting in significant discrepancies between the reconstructed target geometry and the actual shape. Second, because the positioning error of user equipment (UE) far exceeds the wavelength used by existing communication systems, traditional SAR imaging autofocus algorithms become ineffective, necessitating the development of new methods to circumvent the issues posed by positioning errors. Finally, conventional TomoSAR algorithms adopt a per-channel processing framework by independently generating SLC images for each channel; however, when each channel employs ISAR techniques to generate SLC images, inherent data discrepancies among the channels result in inconsistent translational compensation, which introduces phase errors during the elevation focusing process and ultimately leads to the occurrence of spurious targets in the imaging outcomes.  
Methods  In this paper, we first propose applying the nonparametric translational compensation method originally developed for ISAR imaging to the generation of single-look complex (SLC) images, thereby effectively circumventing the adverse effects introduced by positioning errors. Existing ISAR-related literature typically assumes that the target adheres to a turntable model, yet the actual SAR imaging geometry diverges significantly from this idealized assumption. Based on the SAR imaging scenario, we have rederived the mathematical mapping that links the ISAR tomographic imaging results to the target’s true spatial coordinates. Leveraging this mapping, we formulate the coordinate reconstruction challenge as a system of nonlinear equations and subsequently propose a novel coordinate reconstruction method that integrates a particle swarm optimization (PSO) algorithm, ultimately achieving an accurate restoration of the target's genuine geometric shape. Furthermore, in order to address the inherent issue of inconsistent translational compensation among channels within traditional per-channel processing frameworks, we have designed a joint phase calibration tomographic imaging algorithm that employs a unified phase calibration strategy to eliminate inter-channel phase discrepancies, thereby markedly improving both the elevation focusing performance and the overall imaging quality.  Results and Discussions  We validate the proposed methods through simulation experiments on complex building targets under both ideal and non-ideal trajectory conditions, using the CD distance as the evaluation metric for coordinate reconstruction accuracy. The experimental results demonstrate that the CD distances under ideal and non-ideal trajectories are 1.34 and 1.54, respectively, indicating only a slight performance degradation under non-ideal conditions. Notably, imaging point clouds obtained under non-ideal trajectories exhibit evident point dropout. 
A comparative analysis of the cumulative probability distribution curves of distance errors under the two trajectory conditions reveals that the overall distribution trends are very similar; significant differences in the probability distributions emerge only when the distance error exceeds 2 m. This observation indicates that, in terms of the CD distance evaluation metric, the primary discrepancies between imaging results obtained under ideal and non-ideal trajectories are concentrated in regions exhibiting point cloud dropout and in areas outside the main target. Hence, the influence of non-ideal trajectories is mainly manifested in the variation of scattering intensity distribution. Moreover, comparative experiments between the joint phase calibration framework and traditional algorithm frameworks show that conventional tomographic imaging methods exhibit marked stacking effects at different elevations, with false targets appearing at incorrect elevation levels. This behavior suggests that independently compensating for translational motion in each channel is prone to inducing inter-channel phase discrepancies, thereby severely impairing elevation focusing performance. In contrast, the incorporation of joint phase calibration yields a substantial improvement in imaging quality.  Conclusions  The experimental results validate the effectiveness of the proposed methods: by adopting the ISAR nonparametric translational compensation and the PSO-based coordinate reconstruction techniques, the true geometric shape of the target is successfully recovered. Moreover, the joint phase calibration strategy effectively eliminates the issue of false targets in elevation focusing that arises from conventional per-channel processing, thereby significantly enhancing both the elevation focusing capability and the overall image quality.
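The abstract does not spell out "CD"; assuming it denotes the Chamfer distance commonly used to compare point clouds, the metric could be sketched as follows. The symmetric mean-of-nearest-neighbor form used here is one of several conventions and is an assumption, as are the function name and the toy point sets.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (n, 3) and b (m, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (n, m)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbor distance in both directions
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


# Toy example: the corners of a unit cube as the "true" geometry
truth = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)
shifted = truth + np.array([0.1, 0.0, 0.0])  # reconstruction with a small offset
dropped = truth[:7]                          # reconstruction with one point missing

cd_shifted = chamfer_distance(truth, shifted)
cd_dropped = chamfer_distance(truth, dropped)
```

In this toy case the missing point raises the distance only through the truth-to-reconstruction term, which mirrors the observation that point dropout accounts for much of the gap between ideal and non-ideal trajectories.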
A Physics-Constrained Deep Learning Framework for High-Fidelity Sea Clutter Generation under Small-Sample Conditions
SUN Dianxing, LIU Xinliang, LIU Ningbo, DING Hao, YU Hengli, SONG Guanglei
Available online  , doi: 10.11999/JEIT250697
Abstract:
  Objective  The verification and validation of radar target detection algorithms, particularly in maritime surveillance, rely heavily on the availability of high-fidelity synthetic sea clutter data. However, generating realistic sea clutter under high sea-state conditions (e.g., Sea State 4 and above) is a significant challenge due to the non-stationary and non-Gaussian nature of the signal. Traditional statistical models often fail to capture the complex time-frequency characteristics of such data, especially when direct measurement is difficult or unavailable. A novel framework is proposed that combines a complex-valued generative adversarial network with physics-constrained learning and an adaptive transfer learning mechanism to address the issue of small-sample sea clutter generation. The primary goal is to develop a robust and efficient method for generating high-quality synthetic sea clutter data that closely mimics real-world conditions, thereby providing a reliable data foundation for the development and testing of advanced radar systems.  Methods  The proposed framework integrates a Complex Variational Autoencoder Wasserstein Generative Adversarial Network (CVAE-WGAN) with a transfer learning strategy to address the challenge of generating high-fidelity sea clutter data under small-sample conditions. The model operates in the complex domain to jointly process in-phase and quadrature components, preserving the orthogonality and phase relationships of the signal. A Magnitude-Phase Attention (APA) module is introduced to enhance the joint modeling of amplitude and phase, while complex residual blocks are designed to improve gradient propagation and training stability. A physics-constrained loss function system, comprising a time-frequency ridge loss and a Doppler band loss, is implemented to guide the generation process to align with the physical characteristics of sea clutter. 
To handle data scarcity, an adaptive transfer learning mechanism based on Kullback-Leibler Divergence (KLD) is employed to dynamically adjust the model during fine-tuning in target domains, enabling efficient knowledge transfer across different sea-state scenarios.  Results and Discussions  The performance of the proposed CVAE-WGAN framework is evaluated using real-world sea clutter datasets, demonstrating its effectiveness in generating high-fidelity synthetic data. In the source domain (Sea State 4), the generated data closely matches real measurements in terms of amplitude statistics (PDF-CS = 0.872) (Fig. 5), temporal correlation (ACF-CS = 0.9382) (Fig. 7), and time-frequency characteristics (SPEC-RMSE = 4.5379 dB) (Fig. 6). The time-frequency ridge accuracy reaches 95.2% (|z|≤1) (Fig. 10). The adaptive transfer learning mechanism is validated by applying the pre-trained model to a more challenging scenario (Sea State 5) with only 20% of the target domain samples. The generated clutter maintains a strong fit to the empirical amplitude distribution (PDF-CS = 0.8448) (Fig. 11, Table 2) and exhibits good autocorrelation properties (ACF-CS = 0.9557) (Fig. 12, Table 2), with time-frequency ridge accuracy at 95.24% (∣z∣≤1) (Fig. 14, Table 2). Ablation studies reveal that the Magnitude-Phase Attention (APA) module is critical for joint amplitude and phase modeling, as its removal significantly degrades performance (e.g., PDF-CS drops 17.3%, SPEC-RMSE increases 35.0%) (Table 1). The method proves stable even with as little as 15% of the target data (PDF-CS > 0.6, Z=1 > 82%) (Table 3), underscoring its suitability for data-scarce environments.  Conclusions  This study presents a novel framework for generating high-fidelity sea clutter data under small-sample conditions, combining a complex-valued generative adversarial network with physics-constrained learning and an adaptive transfer learning mechanism. 
The proposed CVAE-WGAN model, guided by a sophisticated loss function system, demonstrates a strong capability to capture both the statistical and physical properties of high sea-state environments. The integration of the KLD-based transfer learning mechanism significantly enhances the model's adaptability, enabling high-quality data generation even with limited target domain samples. By addressing the challenge of small-sample sea clutter generation, this framework provides a reliable and robust data foundation for the development and testing of advanced radar anti-clutter and anti-jamming algorithms. Future work focuses on further optimizing the framework for extreme data scarcity and exploring its application in other non-stationary radar signal scenarios.
Pearson Correlation Fusion Sensing Method for Noncircular Signals
LAI Huadong, LIN Cong, LUO Peng, XU Jinqiang, LIU Mingxin, XU Weichao
Available online  , doi: 10.11999/JEIT251247
Abstract:
  Objective  With the rapid growth of wireless devices and communication services, spectrum resources have become increasingly scarce. Spectrum sensing, as a fundamental function of cognitive radio, enables dynamic spectrum access and improves spectrum utilization efficiency. However, conventional spectrum sensing methods based on circular signal assumptions cannot effectively detect noncircular signals. In addition, some detectors designed for noncircular signals show degraded performance under low signal-to-noise ratio (SNR) or limited sample conditions. To address these limitations, a nonparametric spectrum sensing scheme based on the Weighted Pearson Correlation Coefficient (WPCC) is proposed. The scheme applies a linear fusion strategy to the real-valued composite coherence matrix, which captures the second-order statistical characteristics of noncircular signals.  Methods  The WPCC detector constructs a real-valued composite observation vector and computes the corresponding composite coherence matrix. Pearson Correlation Coefficients (PCCs) are extracted from this matrix to characterize the statistical properties of noncircular signals. The first two product moments of squared sample PCCs are derived, and optimal fusion weights are obtained based on the deflection coefficient. The true PCCs are approximated by their sample estimates to obtain data-driven fusion weights that do not require prior knowledge of sensing channels. These weights are then linearly combined with the squared sample PCCs to construct the WPCC test statistic, thereby exploiting the spatial diversity of sensing antennas. The final decision is made by comparing the WPCC statistic with a sensing threshold determined by the specified false alarm probability. Specifically, a WPCC value below the threshold indicates the null hypothesis of an idle frequency band, whereas a value above the threshold indicates the alternative hypothesis that the frequency band is occupied by primary users.  
Results and Discussions  Simulation experiments evaluate the sensing performance of the proposed nonparametric WPCC-based method (Algorithm 1) in terms of sensing probability, deflection coefficient, Receiver Operating Characteristic (ROC) curve, and Area Under the Curve (AUC), with comparisons to NCLMPIT, NCAGM, NCHDM, and NCJT. The numerical results show that the proposed method outperforms the compared detectors under various simulation conditions. In particular, the WPCC detector achieves the highest sensing probability and exhibits superior performance at low false alarm probabilities of 0.05 (Fig. 2), 0.01 (Fig. 3(a)), and 0.005 (Fig. 3(b)), with sample sizes not exceeding 100. In addition, the proposed method shows clear advantages under different numbers of antennas (Fig. 4), different noise variance conditions (Fig. 5), and different levels of correlation strength (Fig. 6). The applicability of the WPCC method to circular signals is also demonstrated by its high sensing probability for QPSK and 16PSK signals (Fig. 7). The superior overall performance of the proposed detector is further confirmed by higher deflection coefficient curves and ROC curves (Figs. 8, 9). The largest AUC values quantitatively demonstrate its overall optimality among all considered methods (Table 1). These results indicate strong robustness under low SNR and small-sample conditions.  Conclusions  A Pearson correlation fusion sensing method for noncircular signals is proposed based on the real-valued composite covariance representation and the Locally Most Powerful Invariant Test (LMPIT) framework. By combining optimal fusion weights derived from sample PCCs with a linear weighting scheme, the method fully exploits second-order statistical information. It enhances strongly correlated components while suppressing weak correlations and noise interference. Analytical expressions for the false alarm probability and sensing threshold are derived. 
Both theoretical analysis and simulation results show that the proposed method achieves superior performance compared with existing noncircular signal sensing methods in terms of sensing probability, deflection coefficient, ROC curve, and AUC.
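The processing chain of the WPCC detector (composite real-valued observations, coherence matrix, squared sample PCCs, weighted linear fusion, threshold test) can be sketched roughly as below. This is an illustration under stated assumptions and not the authors' algorithm: the weights here are simply the normalized squared sample PCCs, standing in for the deflection-optimal weights whose expression the abstract does not give, and the threshold is calibrated by Monte Carlo instead of the analytical expression mentioned in the text. The function name wpcc_statistic and all simulation parameters are hypothetical.

```python
import numpy as np


def wpcc_statistic(x):
    """Weighted fusion of squared sample PCCs.

    x: complex array of shape (num_antennas, num_samples).
    """
    # Real-valued composite observations: stack real and imaginary parts, shape (2M, N)
    z = np.vstack([x.real, x.imag])
    # Sample coherence (correlation) matrix of the composite vector
    r = np.corrcoef(z)
    # Squared sample PCCs from the strict upper triangle
    sq = r[np.triu_indices(r.shape[0], k=1)] ** 2
    # Placeholder data-driven weights (assumption): emphasize strongly correlated pairs
    w = sq / sq.sum()
    return float(np.sum(w * sq))


rng = np.random.default_rng(0)
M, N = 4, 100  # antennas and samples; illustrative values only


def noise():
    """Unit-variance circular complex Gaussian noise, shape (M, N)."""
    return (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)


# Empirical threshold for a false alarm probability of 0.05 (Monte Carlo stand-in)
h0_stats = [wpcc_statistic(noise()) for _ in range(500)]
threshold = float(np.quantile(h0_stats, 0.95))

# Noncircular (BPSK) primary signal received on all antennas at 0 dB SNR
bpsk = rng.choice([-1.0, 1.0], size=N)
t_h1 = wpcc_statistic(bpsk + noise())
occupied = t_h1 > threshold  # above threshold: band declared occupied
```

Stacking real and imaginary parts is what lets ordinary real-valued correlations pick up the noncircular structure of a signal such as BPSK, which a circular-signal detector would miss.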
Real-Time Sub-bottom Horizon Picking Based on Maximum Correlated Kurtosis Deconvolution Combined with Continuity Constraint
MENG Xinbao, ZHOU Tian, ZHU Jianjun, LI Tie, WANG Peihong, ZHAO Guoqing
Available online  , doi: 10.11999/JEIT250727
Abstract:
  Objective  Sub-bottom profiling is widely employed in seabed geological and resource exploration, pipeline route inspection, and port and channel safety assurance, and is regarded as a frontier in underwater acoustic detection research. Accurate extraction of sub-bottom horizons plays a critical role in the interpretation of sedimentary structures, analysis of seabed substrate characteristics, and identification of buried objects. However, existing horizon picking techniques often face difficulty in balancing picking quality, false-alarm control, and online real-time performance. To address this issue, a real-time sub-bottom horizon picking method integrating maximum correlated kurtosis deconvolution and continuity constraint is proposed.  Methods  The proposed method consists of three stages: preprocessing, coarse horizon extraction, and fine horizon extraction. In preprocessing, the raw echoes are enhanced via cascaded band-pass filtering and matched filtering, followed by a fixed delay correction to align picked positions with the pulse leading-edge arrivals. In coarse extraction, synthesized periodic signals are constructed under multiple slicing step lengths, and maximum correlated kurtosis deconvolution is applied to enhance impulsive horizon responses, yielding potential horizon sequences. These candidates are then screened and fused using a cross-step-length consistency criterion to suppress false alarms. In fine extraction, a continuity constraint is introduced within an online sliding window to filter isolated points, segment horizons, and perform curve fitting and correction, further reducing residual false alarms and improving continuity.  Results and Discussions  Simulation and field-data experiments were conducted to evaluate detection probability, false alarm probability, horizon positioning error, processing time, and extracted horizon profiles. 
Monte Carlo results show that the fine extraction stage further reduces false alarms and positioning errors while maintaining detection performance close to that of the coarse extraction stage (Fig. 5, Fig. 6). When the echo signal-to-noise ratio is higher than –15 decibels, the detection probability exceeds 70.000% and the false alarm probability remains below 0.200%; when it is higher than –10 decibels, the detection probability exceeds 99.000%, the false alarm probability falls below 0.100%, and the positioning error approaches one sample interval (Fig. 6). In sub-bottom survey simulation, the proposed method successfully extracts both the seabed surface and the buried sedimentary horizon under different noise conditions, with results more refined than those of the comparative algorithm based on fractional Fourier transform and overall comparable to manual interpretation (Fig. 7, Fig. 8). Field-data results further confirm its effectiveness: in the comparison with signal-based algorithms, the proposed method achieves an average detection probability of 91.833%, an average false alarm probability of 0.004%, and an average positioning error of 10.15 samples, while the comparative algorithm based on fractional Fourier transform shows a much higher false alarm probability of 3.987% (Table 1). In the comparison with image-based algorithms, although their detection probabilities are above 95%, their false alarm probabilities and processing times remain markedly higher than those of the proposed method (Table 2). Qualitative results also show that the extracted horizons agree well with manual interpretation trends, with lower background noise, no obvious large-scale false layers, and good preservation of local fluctuations and interruptions (Figs. 9-12). 
Overall, the proposed method achieves a more favorable balance for online horizon extraction by combining acceptable detection probability and positioning accuracy with extremely low false alarm probability and real-time processing capability (Table 1, Table 2).  Conclusions  This study presents a real-time sub-bottom horizon picking method based on maximum correlated kurtosis deconvolution combined with continuity constraint, structured into three stages: preprocessing, coarse extraction, and fine extraction. The method effectively extracts the seabed surface and sedimentary horizons while meeting real-time processing requirements. Simulation results show that when the signal-to-noise ratio exceeds –10 dB, the method achieves a detection probability greater than 99.000%, a false alarm probability below 0.100%, and a positioning error near one sample. Field data processing results indicate an average detection probability of 91.833%, an average false alarm probability of 0.004%, and an average positioning error of 10.15 samples. These findings validate the effectiveness and practical value of the proposed approach for real-time extraction of shallow sub-bottom horizons. The method maintains high detection accuracy while minimizing false alarms and ensuring millisecond-level processing times, making it well suited to online sub-bottom horizon extraction tasks in practical applications.
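Maximum correlated kurtosis deconvolution searches for a filter whose output maximizes the correlated kurtosis at a given period. The sketch below evaluates only that criterion, using the standard definition CK_M(T) = sum_n (prod_{m=0..M} y[n - mT])^2 / (sum_n y[n]^2)^(M+1), and does not implement the filter optimization, the slicing into synthesized periodic signals, or any other step of the proposed method. The function name and the synthetic test trace are assumptions for illustration.

```python
import numpy as np


def correlated_kurtosis(y, period, shifts=1):
    """Correlated kurtosis of y for an integer period T and M = shifts."""
    y = np.asarray(y, dtype=float)
    prod = y.copy()
    for m in range(1, shifts + 1):
        # y delayed by m * period samples, zero-padded at the start
        delayed = np.zeros_like(y)
        delayed[m * period:] = y[:len(y) - m * period]
        prod *= delayed
    return float(np.sum(prod ** 2) / np.sum(y ** 2) ** (shifts + 1))


# Synthetic trace: weak Gaussian noise plus unit impulses every 50 samples
rng = np.random.default_rng(1)
trace = 0.1 * rng.standard_normal(1000)
trace[::50] += 1.0

ck_matched = correlated_kurtosis(trace, period=50)     # period matches the impulses
ck_mismatched = correlated_kurtosis(trace, period=37)  # period does not match
```

The criterion is large only when impulses repeat at the assumed period, which is why impulsive horizon responses stand out after deconvolution while isolated noise spikes do not.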
A Cross-Precision Motion Compensation Technique for Security Surveillance Video Coding
JIANG Wei, MA Wei, LU Jinghui, ZHANG Yue, ZHANG Yundong
Available online  , doi: 10.11999/JEIT251301
Abstract:
  Objective  In the field of modern security surveillance, high-altitude dome cameras are often deployed at critical locations such as bridges and tower tops that are susceptible to external interference, resulting in problems such as jitter and blurring in captured videos, which pose great challenges to video coding. In video compression coding, high-precision motion compensation is the key to improving coding efficiency. The existing Ultimate Motion Vector Expression (UMVE) technique suffers from insufficient precision and lack of flexibility in adaptive adjustment. Although high-precision coding tools such as Registration-Based Coding Mode (RCM) and Affine Motion Compensation Prediction (AFFINE) can improve compensation accuracy, they have disadvantages of high computational complexity and hardware cost, making it difficult to meet the multiple requirements of coding efficiency, power consumption and real-time performance in high-altitude surveillance scenarios. Therefore, aiming at the core pain points of video coding for high-altitude dome cameras, it is of important academic value and practical application significance to design an optimized UMVE scheme that combines high-precision motion compensation, low computational complexity and scene adaptability, so as to improve coding efficiency and balance resource consumption.  Methods  This study proposes an Ultimate Motion Vector Expression technique supporting Cross-Precision Motion Compensation (UMVE_CPMC). Its core is to improve motion compensation accuracy by constructing an extended Up-Precision Motion Vector (UPMV), whose mathematical expression is UPMV = BaseMV + MMV(p, angle), where BaseMV is the basic motion vector obtained by the existing UMVE method, and MMV is the refined fine-tuning motion vector based on specific precision p and angle, with incremental candidates only provided at the 1/8 precision level to balance computational complexity and compression efficiency. 
For step-size adaptive adjustment, an improved scheme with six modes is proposed, covering enhanced UMVE, conventional UMVE and four precision-improved modes, allowing the encoder to switch flexibly according to scene characteristics. The average image gradient is adopted as an objective evaluation index; test scenes are divided into Class A (high-definition motion scenes) and Class B (low-definition scenes), and different coding configurations, sequences and parameters are set to compare coding gains and computational efficiency under different modes.  Results and Discussions  Experiments show that UMVE_CPMC achieves effective performance improvement in various scenes and modes. In Class A high-definition motion scenes, with the adaptive strategy disabled and RCM disabled, the average gains of Y, U and V components in Fusion Mode 1 reach -2.912%, -1.656% and -1.654% respectively, and the average coding time is reduced to 94.55% of the baseline; the average gain of the Y component in Independent Mode 1 reaches -2.925%, with coding time reduced to 91.91% of the baseline. Compared with traditional UMVE, when CPMC Independent Mode 1 is enabled under the scenario where RCM is enabled and other tools work collaboratively, the gain is improved from -0.276% to -1.310%, showing significantly higher cost performance. In Class B low-definition scenes, after enabling adaptive adjustment, the gain losses of Fusion Mode 1 and Mode 0 are significantly reduced, with average gain losses controlled at 0.071% and 0.108% respectively, successfully maintaining the original coding gain. In multi-scene comprehensive tests, when RCM and AFFINE are disabled, 9 out of 10 test sequences in adaptive Fusion Mode 1 show positive gains, including a Y-component gain of -10.691% for the yuxuedaolu sequence and -11.400% for the BQTerrace sequence. 
When all existing coding tools are enabled, the Y-component gains of dianjing, yuxuedaolu and BQTerrace sequences reach -1.29%, -2.05% and -1.21% respectively, with coding time reduced to 94%–96% of the baseline. In addition, correlation analysis between average image gradient and gain reveals a significant positive correlation: images with high average gradient (high definition) achieve greater gains from UMVE_CPMC, while those with low average gradient (low definition) hardly benefit. Principle analysis indicates that pixel changes in low-definition images are gentle, and high-precision interpolation fails to generate effective pixel values, resulting in insignificant compensation effects. Performance differences among modes match computational complexity: the fusion mode balances gain and stability, while the independent mode further reduces computation. The six step-size adaptive modes can meet real-time and precision requirements of different scenes.  Conclusions  The proposed UMVE_CPMC technique, by integrating cross-precision motion compensation with the UMVE algorithm, effectively solves the core problems of insufficient precision in traditional UMVE and high computational complexity of high-precision coding tools, achieving a favorable balance among coding efficiency, computational complexity and scene adaptability. This technique delivers remarkable coding gains in Class A high-definition motion scenes, with gains exceeding 10% for some sequences without other high-precision compensation tools and 1%–2% when cooperating with other tools. In Class B low-definition scenes, the original coding gain can be maintained through frame-level adaptive adjustment interfaces. Meanwhile, the fusion mode does not increase hardware complexity, and the independent mode significantly reduces coding time, suitable for encoder designs with limited resources or simplified requirements. 
UMVE_CPMC provides a new effective approach to solving the low coding efficiency caused by jitter and blurring in high-altitude dome camera video coding, enriches the video coding toolset, and offers important practical guidance for the optimization of video coding technologies in the security surveillance field. Future work can further optimize the adaptive strategy, explore integration with other advanced coding technologies, develop personalized coding schemes, and improve performance in complex scenarios.
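The UPMV construction stated in the abstract above, UPMV = BaseMV + MMV(p, angle), can be illustrated with a minimal sketch. The direction table, the function names, and the use of exact fractions of a pixel are assumptions made for illustration only; the abstract does not specify the candidate angles or how motion vectors are stored in the encoder.

```python
from fractions import Fraction

# Hypothetical direction table: the abstract does not list the candidate
# angles, so four axis-aligned unit directions are assumed here.
ASSUMED_DIRECTIONS = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}


def mmv(precision, angle_index):
    """Fine-tuning vector: one step of size `precision` along the chosen direction."""
    dx, dy = ASSUMED_DIRECTIONS[angle_index]
    return (dx * precision, dy * precision)


def upmv(base_mv, precision, angle_index):
    """UPMV = BaseMV + MMV(p, angle), computed component-wise in pixel units."""
    off_x, off_y = mmv(precision, angle_index)
    return (base_mv[0] + off_x, base_mv[1] + off_y)


# Example: a quarter-pel base vector refined by a 1/8-pel step to the right.
base = (Fraction(5, 4), Fraction(-1, 2))
refined = upmv(base, Fraction(1, 8), 0)
print(refined)
```

Exact fractions keep the 1/8-pel step free of rounding error in this sketch; a real encoder would more likely store motion vectors as integers in units of the finest precision.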
Modeling and Characterization of Broadband Earth-Moon-Earth Communication Channels
LI Chengqian, QIAN Xiaowei, HU Xiaoling
Available online  , doi: 10.11999/JEIT251028
Abstract:
  Objective  This paper presents a comprehensive channel model for wideband Earth-Moon-Earth (EME) communication, tackling the shortcomings of traditional simplified models that cannot accurately represent the Moon’s complex scattering behavior and terrain-induced effects. Existing approaches, which treat the Moon as a point reflector or depend on empirical scattering laws, are inadequate for broadband, high-capacity systems. To address this, a unified large-scale link model is proposed to statistically capture terrain-driven reflection characteristics, while a small-scale model systematically analyzes multipath and Doppler effects, decomposing the channel and quantifying dynamic impairments. Link-level simulations validate the model’s accuracy. This work fills a critical gap in broadband EME channel modeling, providing a necessary foundation for the design and optimization of future deep space communication systems.  Methods  A dual-scale modeling approach is proposed for wideband Earth-Moon-Earth (EME) channels. At the large scale, a unified integral path loss model is developed for both wide- and narrow-beam scenarios, with lunar terrain statistically represented by a Gaussian height distribution to capture shadowing and roughness effects. A distributed integration method is used to compute effective RCS under narrow-beam conditions. At the small scale, the channel is decomposed into quasi-specular and diffuse components, with delay-power profiles derived from surface roughness and scattering mechanisms. Doppler shift and spread are analytically modeled based on Earth-Moon orbital dynamics. Monte Carlo simulations and numerical integration verify the models, and system-level performance is evaluated in terms of BER under various channel conditions with different equalization and frequency offset correction schemes.  
Results and Discussions  A comprehensive channel model is developed to capture both large- and small-scale fading in wideband Earth-Moon-Earth (EME) communication. The large-scale model, validated by simulations, accurately represents the non-uniform power distribution across the lunar disk through an integrated RCS approach. At the small scale, quasi-specular and diffuse components characterize multipath delay spread, while the Doppler model quantifies effects from Earth’s rotation and lunar orbital motion, with a two-way shift of ~4.5 kHz and a spread of ±39.88 Hz at 1.296 GHz. Low-SNR simulations show that conventional equalizers (LMS, RLS, RAKE) stagnate near BER = 0.1, and frequency correction methods (FFT-based, MLE) degrade under large frequency offsets, highlighting the challenges of accurate compensation.  Conclusions  This paper develops and validates a comprehensive channel model for broadband Earth-Moon-Earth (EME) communication. The model more accurately predicts path loss, shadowing, multipath delay, and Doppler effects than conventional point-target or empirical methods. Results show that lunar terrain and surface properties cause severe signal degradation, which traditional equalization and frequency correction cannot effectively mitigate. Future work should integrate high-resolution lunar DEMs and measured RCS data to improve accuracy and explore adaptive methods, such as machine learning, to handle severe delay spread. This model offers a foundation for reliable EME links and future deep-space communication networks.
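As a rough check on the orders of magnitude quoted above, the first-order two-way Doppler relation for a reflected path, 2·v·f/c, and the round-trip propagation delay can be sketched as follows. The radial velocity of 520 m/s is an assumed illustrative value chosen to land near the reported shift of about 4.5 kHz; it is not taken from the paper. The mean Earth-Moon distance of 384,400 km is a standard figure, and the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def two_way_doppler_hz(radial_velocity_mps, carrier_hz):
    """First-order two-way Doppler shift for a monostatic reflected path: 2*v*f/c."""
    return 2.0 * radial_velocity_mps * carrier_hz / SPEED_OF_LIGHT


def round_trip_delay_s(one_way_distance_m):
    """Propagation delay for the path to the reflector and back."""
    return 2.0 * one_way_distance_m / SPEED_OF_LIGHT


# Assumed illustrative radial velocity (not from the paper).
ASSUMED_RADIAL_VELOCITY_MPS = 520.0
CARRIER_HZ = 1.296e9  # carrier frequency quoted in the abstract

shift_hz = two_way_doppler_hz(ASSUMED_RADIAL_VELOCITY_MPS, CARRIER_HZ)
delay_s = round_trip_delay_s(384_400e3)  # mean Earth-Moon distance
print(f"two-way Doppler shift: {shift_hz:.1f} Hz")
print(f"round-trip delay: {delay_s:.3f} s")
```

This covers only the bulk shift under a single radial velocity. The Doppler spread reported in the abstract comes from the variation of radial velocity across the lunar disk, which this sketch does not model.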
An Ultra-Wideband Low-Profile Dipole Patch Antenna for VHF-Band Probing Radars
TIAN Yuxiao, ZHANG Feng, MA Zhangjun, WANG Jiacheng, JI Yicai
Available online  , doi: 10.11999/JEIT260105
Abstract:
  Objective  In radar systems, the limitations of traditional narrowband antennas in data transmission rate and resolution have become increasingly evident. Ultra-WideBand (UWB) antennas therefore receive broad attention because they provide high range resolution and strong interference suppression capability. However, at low frequencies, existing UWB antennas usually suffer from excessively large physical size, which makes installation on airborne or vehicle-mounted platforms difficult. By contrast, compact antennas that are easier to deploy often exhibit insufficient gain and cannot satisfy the penetration-depth requirement of deep subsurface detection. Thus, achieving a proper balance among antenna size, bandwidth, and gain over an ultra-wideband range remains a major challenge for VHF-band probing radars. To address this issue, a planar dipole antenna loaded with an Artificial Magnetic Conductor (AMC) structure and metallic shorting walls is proposed. The antenna maintains stable radiation performance over a wide frequency range while preserving a low-profile and structurally simple configuration.  Methods  The reflection-phase characteristics of AMC unit cells with different geometries are compared, and square unit cells are selected to construct a 9 × 7 AMC reflective layer. Owing to its in-phase reflection property, the AMC structure removes the conventional requirement for a quarter-wavelength spacing between the antenna and a metallic ground plane, thereby reducing the profile height. The dipole patch adopts an optimized meandered current-bending structure to reduce the lateral size. Metallic shorting walls are further loaded at both ends of the antenna. According to image theory, equivalent currents are generated on the outer surfaces of these metal walls during operation, which effectively extends the electrical length and improves low-frequency performance without increasing the physical size. 
In addition, two vertical metallic walls are connected to the ground plane on both sides of the antenna to form a reflective back cavity, which strengthens unidirectional radiation and improves antenna gain. As part of the overall co-design, four 125 Ω resistors are inserted between the feed region and the metallic sidewalls. This resistive loading suppresses strong low-frequency resonances and broadens the impedance bandwidth at the cost of acceptable Ohmic loss.  Results and Discussions  A prototype with favorable simulated performance is fabricated and measured in a microwave anechoic chamber. The measured impedance bandwidth for VSWR<2 is 50~400 MHz, which agrees well with the simulated range of 84~366 MHz. The measured impedance matching is slightly better than the simulated result, mainly because cable loss and power-divider loss in the feeding network reduce the reflected power. The measured gain follows the same trend as the simulated gain, with deviations within 1 dBi. Radiation-pattern measurements show that at 100, 200, and 300 MHz, the measured co-polarization patterns agree well with the simulated results, and the maximum radiation direction remains normal to the antenna plane, which confirms the effectiveness of the proposed design. As shown in Fig. 5, the current on the radiating patch layer mainly flows along the +x direction and generates a radiated electric field along the +z direction. The current on the AMC unit can be represented by an equivalent current loop oriented along the +z direction. At this frequency, the x-direction current and the parasitic current loop on the AMC jointly enhance the antenna gain. This result explains the gain-improvement mechanism of the AMC structure. When the operating frequency increases to 400 MHz, the electrical size of the antenna reaches approximately $1.6\lambda$, which causes main-lobe splitting and shifts the maximum radiation direction toward 90°. 
Although this high-frequency beam splitting introduces spatial clutter, it is an acceptable physical trade-off for achieving the ultra-low profile of 0.07 λL, while the overall UWB characteristic still supports high time-domain resolution in probing radar systems. At 400 MHz, the measured H-plane co-polarization level is slightly higher than the simulated value, possibly because of coupling between the feeding cable and the vertically mounted antenna.  Conclusions  A low-profile UWB planar dipole antenna is proposed for VHF-band probing radar applications. By combining the AMC layer, metallic shorting walls, and resistive loading, the proposed design improves impedance matching while preserving a compact size. The reflective back cavity further improves the realized gain. The fabricated prototype shows good agreement between measurement and simulation. The antenna operates over 100–366 MHz and exhibits a measured VSWR<2 bandwidth of 50~400 MHz. It maintains a compact electrical size of 0.38λL × 0.18λL × 0.07λL, and the maximum measured gain within the operating band reaches 6 dBi. The proposed co-design provides a practical solution for low-frequency probing radar antennas that require wide bandwidth, low profile, and relatively high gain.
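The VSWR<2 criterion used for the impedance bandwidth above corresponds to a reflection-coefficient magnitude below 1/3, or a return loss above about 9.5 dB. The short sketch below shows these standard conversions; the function names are hypothetical and the code is not taken from the paper.

```python
import math


def vswr_from_gamma(gamma_mag):
    """VSWR = (1 + |Gamma|) / (1 - |Gamma|) for 0 <= |Gamma| < 1."""
    if not 0.0 <= gamma_mag < 1.0:
        raise ValueError("reflection coefficient magnitude must be in [0, 1)")
    return (1.0 + gamma_mag) / (1.0 - gamma_mag)


def gamma_from_vswr(vswr):
    """Inverse relation: |Gamma| = (VSWR - 1) / (VSWR + 1) for VSWR >= 1."""
    if vswr < 1.0:
        raise ValueError("VSWR must be at least 1")
    return (vswr - 1.0) / (vswr + 1.0)


def return_loss_db(gamma_mag):
    """Return loss in dB, defined as -20 * log10(|Gamma|) for 0 < |Gamma| <= 1."""
    if not 0.0 < gamma_mag <= 1.0:
        raise ValueError("reflection coefficient magnitude must be in (0, 1]")
    return -20.0 * math.log10(gamma_mag)


threshold_gamma = gamma_from_vswr(2.0)
print(f"|Gamma| at VSWR = 2: {threshold_gamma:.4f}")
print(f"return loss at VSWR = 2: {return_loss_db(threshold_gamma):.2f} dB")
```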
Semantic Relation-enhanced Adaptive Graph Representation Learning for Next POI Recommendation
WANG Zhuolu, XU Shenghua, WANG Yong, JIANG Shunshun
Available online  , doi: 10.11999/JEIT251357
Abstract:
  Objective  In recent years, next Point Of Interest (POI) recommendation has played an increasingly important role in Location-Based Social Networks (LBSNs). However, existing Graph Representation Learning (GRL)-based recommendation methods have struggled to balance node distributions across different domains (i.e., node types) effectively and have often overlooked feature differences among heterogeneous relations. Thus, complex semantic dependencies in contextual information cannot be fully captured when users’ temporal preference patterns are modeled.  Methods  To address these issues, a next POI recommendation method based on Semantic Relation-enhanced adaptive Graph Representation Learning (SR-GRL) is proposed. A heterogeneous transition graph is constructed to integrate three entity types, namely POIs, POI categories, and regions, and their complex interrelationships. An adaptive balanced random walk sampling strategy is designed to balance node distributions across different domains dynamically and to reduce information redundancy. A type-aware attention mechanism is then used to learn semantic associations among nodes through relation-specific transformation matrices, so that feature differences across node types can be identified effectively. The obtained disentangled POI representations are then used for spatiotemporal encoding of user check-in sequences, and a self-attention mechanism is applied to aggregate users’ temporal preference features. Finally, the next POI recommendation is generated through a Softmax function.  
Results and Discussions  Experiments on the Foursquare datasets from Tokyo and New York and the Sina Weibo dataset from Shanghai show that, compared with state-of-the-art baselines, the SR-GRL method achieves Recall@10 improvements of 2.22%~24.16%, F1@10 improvements of 1.16%~10.48%, and NDCG@10 improvements of 3.01%~17.37%, indicating better recommendation performance.  Conclusions  Overall, the SR-GRL approach can balance the distributions of different node types dynamically and strengthen the modeling of complex semantic dependencies in heterogeneous contextual information.
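For readers unfamiliar with the ranking metrics reported above, the following sketch shows how Recall@K and NDCG@K are commonly computed in next-POI recommendation, where each test instance has a single ground-truth POI. The abstract does not give the paper's exact metric definitions, so this single-target form and all function names are assumptions.

```python
import math


def recall_at_k(ranked_pois, target_poi, k):
    """1.0 if the ground-truth POI appears in the top-k list, else 0.0."""
    return 1.0 if target_poi in ranked_pois[:k] else 0.0


def ndcg_at_k(ranked_pois, target_poi, k):
    """Single-target NDCG: 1 / log2(rank + 1) if the target is ranked within k, else 0."""
    top_k = ranked_pois[:k]
    if target_poi not in top_k:
        return 0.0
    rank = top_k.index(target_poi) + 1  # 1-based rank of the hit
    return 1.0 / math.log2(rank + 1)


def mean_metric(metric, predictions, targets, k):
    """Average a per-instance metric over all test instances."""
    scores = [metric(ranked, target, k) for ranked, target in zip(predictions, targets)]
    return sum(scores) / len(scores)


# Two toy test instances with hypothetical POI identifiers.
predictions = [[7, 3, 9], [4, 5, 6]]
targets = [3, 6]
mean_recall = mean_metric(recall_at_k, predictions, targets, k=2)
mean_ndcg = mean_metric(ndcg_at_k, predictions, targets, k=2)
print(mean_recall, mean_ndcg)
```

In the toy data, the first target is ranked second and counts as a hit, while the second target falls at rank three and is missed at K = 2.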
Multi-Agent Deep Reinforcement Learning Strategy for Multi-Spacecraft Long-Distance Orbital Game
DI Peng, YIN Zengshan, LIN Zheng, YAO Ye
Available online  , doi: 10.11999/JEIT251384
Abstract:
This paper introduces a novel research scenario for multi-spacecraft Orbital Pursuit-Evasion Game (OPEG), which has not yet been systematically studied. To enhance the decision-making capabilities of spacecraft and enable them to formulate more robust policies in complex multi-agent games, this paper proposes a multi-agent deep reinforcement learning algorithm based on a progressive adversarial training framework to solve the game policies of each spacecraft. Two sets of examples with different orbital characteristics and various simulation conditions were set up for simulation verification, and behavioral deviation analysis is conducted to verify the robustness of the policy. The impact of different orbital characteristics, simulation conditions, and behavioral deviations on the game policy was analyzed. Simulation results show that the proposed method enables each spacecraft to formulate an effective game policy that satisfies all set constraints and has good robustness.  Objective  As the space environment becomes increasingly complex, space security has become a hot research area. The existence of a large amount of space debris and failed spacecraft poses a serious threat to high-value spacecraft in orbit. Therefore, the study of Orbital Pursuit-Evasion Game (OPEG) for non-cooperative target spacecraft has attracted widespread attention. Existing research focuses on OPEG for two spacecraft, but less on OPEG for multiple spacecraft. When there are more than two players in the game, zero-sum game design is not feasible, and it is difficult to solve using traditional methods. Furthermore, existing research ignores engineering dynamic constraints and simplifies or defines the dynamics as a two-dimensional scene when modeling the problem, which can cause considerable errors. To overcome the limitations of existing spacecraft game scenarios, this paper proposes a novel multi-spacecraft OPEG research scenario. 
The aim is to investigate the application of the Multi-Agent Deep Reinforcement Learning (MADRL) algorithm in solving the approximate steady-state policies of each spacecraft in long-distance multi-spacecraft OPEG, to highlight the advantages of the MADRL algorithm for this problem, and to provide a feasible solution for realizing autonomous multi-spacecraft game play in the future.  Methods  The Multi-Agent Proximal Policy Optimization (MAPPO) algorithm based on the Progressive Adversarial Training Framework (PATF) is used to solve the optimal game policy for each spacecraft in the multi-spacecraft OPEG. First, a multi-constrained multi-spacecraft OPEG model is established based on actual engineering constraints, and the problem is transformed into a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Second, to improve the decision-making ability of agents in complex multi-agent game environments and to formulate more robust game policies, a novel PATF is introduced, with different reward functions designed for the specific missions of each spacecraft. Finally, two sets of simulation examples with different orbital characteristics are set up, each under four different simulation conditions, and behavioral deviation analysis is performed.  Results and Discussions  The MAPPO algorithm based on the PATF proposed in this paper is compared with the original MAPPO (Fig. 3). The results show that the proposed method can learn effective policies more quickly, reduce ineffective exploration, and achieve a higher final convergence reward value with less fluctuation in the reward curve. This also demonstrates that the PATF can significantly enhance the decision-making ability of agents, enabling them to formulate robust policies more effectively. Simulation verification is performed using two sets of examples in four different settings (Figs. 4, 5, 6, and 7). 
Simulation results (Tables 3 and 4) show that the proposed method performs well in both sets of examples. Furthermore, it is verified that when the pursuer and the interceptor are on the same orbital plane, the pursuer is more likely to be intercepted. When the interceptor and the target are not on the same orbital plane, the interceptor can carry out the interception mission more easily. This paper also analyzes the situation where both sides of the game exhibit behavioral deviations, which are modeled by adding control noise. Simulation results (Tables 5 and 6) show that both sides adopt relatively conservative policies to counter the control noise. The game policy formulated by the proposed method is an approximate steady-state policy: a behavioral deviation by one side decreases its own payoff and increases the opponent’s payoff, and the game policy shows good robustness.  Conclusions  The proposed method can be applied to the long-distance OPEG problem involving multiple spacecraft in non-coplanar elliptical orbits, enabling each spacecraft to formulate effective game policies. The PATF facilitates better decision-making by the spacecraft in complex multi-spacecraft dynamic systems, with robust control policies developed by the pursuer and interceptors. The results also demonstrate the accuracy and effectiveness of the reward function design. Through two sets of examples and simulation results with different settings, the impact on the policies of both parties when the pursuer and interceptor have different orbital characteristics is analyzed. When interceptors have different maximum thrusts, the decision-making of each spacecraft changes accordingly. The behavioral deviation analysis indicates that the game policies of each spacecraft have good robustness. 
When one party’s behavior deviates, the approximate steady-state policy balance will change, resulting in a decrease in its own benefits and an increase in the other party’s benefits. The research scenario formulated in this paper expands the scope of existing research on multi-spacecraft game problems.
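MAPPO builds on Proximal Policy Optimization, whose per-sample clipped surrogate term is min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the probability ratio between the new and old policies and A is the advantage estimate. The sketch below shows only this standard term. The PATF, the reward functions, and the orbital dynamics are not described in enough detail in the abstract to reproduce, and the clipping value ε = 0.2 is a common default assumed here, not a value from the paper.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample.

    ratio     : pi_new(a|o) / pi_old(a|o) for the sampled action
    advantage : advantage estimate for that action
    clip_eps  : clipping range (assumed default, not from the paper)
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_surrogate(ratios, advantages, clip_eps=0.2):
    """Batch average of the clipped surrogate; the policy update maximizes this value."""
    terms = [clipped_surrogate(r, a, clip_eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


print(clipped_surrogate(1.5, 1.0))   # positive advantage: gain capped at 1 + eps
print(clipped_surrogate(0.5, -1.0))  # negative advantage: the clipped (lower) value is kept
```

In a multi-agent setting each agent applies this term to its own local observations, while the advantage typically comes from a centralized value function.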
Joint Power Allocation and AP On-Off Control for Long-Term Energy Efficient Cell-Free Massive MIMO Systems
WEI Siqi, GUO Fengqian, CHONG Baolin, CHENG Guo, LU Hancheng
Available online  , doi: 10.11999/JEIT260014
Abstract:
  Objective   With the rapid development of wireless communication technologies, Cell-Free Massive Multiple-Input Multiple-Output (CF-mMIMO) has emerged as an effective paradigm to overcome the limitations of traditional cell-centric networks, such as limited performance for edge users. By deploying a large number of distributed Access Points (APs) connected to a Central Processing Unit (CPU) to cooperatively serve users, CF-mMIMO improves spectral efficiency and macro-diversity gain. However, dense AP deployment also introduces a critical challenge: high energy consumption. In practical systems, if all APs remain continuously active, especially during periods of low traffic load, substantial and unnecessary energy consumption occurs. This behavior reduces network sustainability and conflicts with global “dual-carbon” goals. Existing studies on energy efficiency in CF-mMIMO systems mainly focus on short-term performance optimization. These short-term approaches often ignore long-term traffic dynamics and the requirement of queue stability. Therefore, they lack robustness under time-varying traffic conditions and may cause queue congestion and significant performance fluctuations, which are unacceptable for next-generation wireless networks with strict reliability requirements. Although several recent studies examine long-term energy efficiency optimization, most assume that all APs remain active at all times. Therefore, the energy-saving potential of adaptive AP on-off control is not fully utilized.  Methods   To address these issues, a joint power allocation and AP on-off control strategy is proposed for downlink CF-mMIMO systems. The optimization problem aims to maximize long-term energy efficiency subject to user queue stability and AP power constraints. 
Because the problem has stochastic and long-term characteristics, the Lyapunov optimization framework is applied to transform the original long-term fractional programming problem into a sequence of deterministic drift-plus-penalty minimization problems solved in each time slot. The resulting per-slot problems remain nonconvex. Therefore, each problem is decomposed into two subproblems: power allocation and AP on-off control. The Successive Convex Approximation (SCA) method is used to convert the nonconvex formulations into solvable convex problems. An alternating optimization algorithm is then developed to jointly solve the two subproblems, which enables adaptive resource configuration under dynamic network conditions and stochastic traffic arrivals.  Results and Discussions   The proposed algorithm is evaluated through extensive simulations. First, the convergence behavior is examined. Numerical results (Fig. 2) show that per-slot energy efficiency increases rapidly and stabilizes after several iterations, which verifies the convergence of the alternating optimization procedure. Second, the effect of the control parameter is analyzed. As the parameter increases, the algorithm places greater emphasis on energy efficiency. Average power consumption decreases and then stabilizes (Fig. 3), whereas long-term energy efficiency increases and eventually stabilizes (Fig. 4). These results confirm the trade-off between energy efficiency and queue stability. Third, the proposed scheme is compared with three baseline methods. The results (Fig. 5) show that the proposed joint optimization approach consistently achieves higher long-term energy efficiency than the baseline methods. Fourth, the necessity of long-term optimization is demonstrated by comparing queue lengths with a short-term baseline (Fig. 6). 
Under the same traffic arrival rate, the short-term method shows cumulative queue growth, whereas the Lyapunov-based approach maintains queue lengths within a stable range and ensures network stability. Finally, robustness under imperfect Channel State Information (CSI) is evaluated (Fig. 7). Although energy efficiency decreases as channel uncertainty increases, the proposed method consistently outperforms the baseline approaches, which demonstrates strong robustness to channel estimation errors.  Conclusions   A long-term energy efficiency optimization framework is proposed for CF-mMIMO systems with stochastic traffic arrivals. By applying Lyapunov optimization theory, the stochastic long-term problem is transformed into slot-level drift-plus-penalty problems based on queue states. This transformation enables per-slot resource scheduling decisions while maintaining queue stability. On this basis, an efficient joint resource scheduling algorithm that integrates power allocation and AP on-off control is developed. The original problem is decomposed into power allocation and AP on-off control subproblems and solved through alternating optimization. Simulation results show that the proposed method adapts to dynamic traffic conditions. By placing underutilized APs into sleep mode, the algorithm improves long-term system energy efficiency and maintains queue stability. These results provide guidance for the design of green and sustainable wireless networks.
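The Lyapunov machinery summarized above rests on two standard ingredients: a per-user queue update and a per-slot drift-plus-penalty expression weighted by a control parameter V. The sketch below shows both in simplified form. It uses total power as the penalty rather than the paper's fractional energy-efficiency objective, and all names and numbers are hypothetical.

```python
def update_queue(backlog, served, arrived):
    """Standard queue dynamics: Q(t+1) = max(Q(t) - served, 0) + arrived."""
    return max(backlog - served, 0.0) + arrived


def drift_plus_penalty(backlogs, service_rates, arrivals, penalty, v):
    """Simplified per-slot value: V * penalty + sum_k Q_k * (A_k - R_k).

    Minimizing this in each slot trades the penalty (here, power) against
    queue growth; a larger V puts more weight on the penalty term.
    """
    drift_bound = sum(q * (a - r) for q, r, a in zip(backlogs, service_rates, arrivals))
    return v * penalty + drift_bound


# Toy example with two users (all values are illustrative).
backlogs = [4.0, 2.0]
service_rates = [3.0, 1.0]
arrivals = [1.0, 1.0]
value = drift_plus_penalty(backlogs, service_rates, arrivals, penalty=2.0, v=5.0)
next_backlogs = [update_queue(q, r, a) for q, r, a in zip(backlogs, service_rates, arrivals)]
print(value, next_backlogs)
```

Users with long backlogs contribute large negative terms when served, so the per-slot minimization naturally prioritizes them, which is how queue stability enters the resource allocation.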
Communication, Computation, and Caching Resource Collaboration for Heterogeneous Artificial Intelligence Generated Content Service Provisioning
WU Mengru, GAO Yu, ZHAO Bo, XU Bo, SUN Hao, GUO Lei
Available online  , doi: 10.11999/JEIT251300
Abstract:
  Objective  In the Artificial Intelligence of Things (AIoT), Edge Servers (ESs) provide intelligent content generation services to AIoT devices by utilizing cached Artificial Intelligence Generated Content (AIGC) models. However, the limited computing resources and caching capacity of ESs make it difficult to support the large-scale caching demands of heterogeneous AIGC services. To address this issue, a communication, computation, and caching resource collaboration scheme is proposed based on a combined cloud-edge and edge-edge collaborative framework. The scheme considers three representative AIGC services: lightweight AIGC services, computation-intensive AIGC services, and preprocessing-based AIGC services. The objective is to minimize the total AIGC service latency through joint optimization of transmit power, computing resource allocation, model caching strategies, and offloading decisions.  Methods  Communication, computation, and caching resource collaboration for heterogeneous AIGC services is investigated. First, an AIGC service-oriented AIoT system model is established to incorporate both cloud-edge and edge-edge collaboration. An optimization problem is then formulated to minimize the total latency of AIGC services through joint optimization of transmit power, computing resource allocation, model caching strategies, and offloading decisions. Because the formulated problem is non-convex, an Alternating Optimization (AO) algorithm is proposed. The original problem is decomposed into three subproblems. These subproblems are solved using the Successive Convex Approximation (SCA) method, Karush-Kuhn-Tucker (KKT) conditions, and an improved Harris Hawks Optimization (HHO) algorithm.  Results and Discussions  Simulation experiments compare the proposed joint optimization scheme with three baseline methods: Particle Swarm Optimization (PSO), fixed resource allocation, and random offloading and caching. 
First, the convergence of the proposed AO algorithm is verified (Fig. 2). The results show that the algorithm converges rapidly within a limited number of iterations across different subproblems. Second, increasing transmission bandwidth significantly reduces the total AIGC service latency (Fig. 3). This occurs because each device obtains more bandwidth resources for task transmission, and the ES can allocate more bandwidth to deliver generated content in the downlink. Furthermore, the total AIGC service latency decreases as the ES storage capacity increases for all schemes (Fig. 4). Greater storage capacity enables the ES to store more AIGC models, which reduces the transmission delay between the ES and the cloud server. Moreover, when the required floating-point operations per bit increase, the total AIGC service latency rises significantly across all schemes (Fig. 5). Finally, the total AIGC service latency decreases as the maximum transmit power of the Base Station (BS) increases (Fig. 6). This occurs because higher BS transmit power improves the downlink signal-to-noise ratio, which increases the downlink transmission rate and reduces overall service latency. The proposed scheme demonstrates better performance than the baseline schemes, particularly under high computational demand.  Conclusions  Communication, computation, and caching resource collaboration for heterogeneous AIGC services is investigated. The objective is to minimize total AIGC service latency through joint optimization of the transmit power of AIoT devices and BSs, computing resource allocation, AIGC model deployment, and service offloading decisions under computation and caching resource constraints. Because the formulated problem is a mixed-integer nonlinear programming problem, an efficient AO algorithm is developed. The original optimization problem is decomposed into three subproblems, which are solved using the SCA algorithm, KKT conditions, and the HHO algorithm, respectively. 
Simulation results show that the proposed algorithm reduces the total AIGC service latency compared with the baseline schemes.
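The latency trends discussed above (bandwidth, transmit power, and floating-point operations per bit) follow from a simple additive model: upload time plus computing time plus download time, with link rates given by the Shannon formula. The sketch below is a minimal illustration under that assumption. The function names and all numerical values are hypothetical, and the paper's actual model, including caching and cloud-edge transfer delays, is not reproduced.

```python
import math


def shannon_rate_bps(bandwidth_hz, snr_linear):
    """Achievable rate B * log2(1 + SNR) in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def service_latency_s(task_bits, uplink_rate_bps, flops_per_bit,
                      allocated_flops_per_s, result_bits, downlink_rate_bps):
    """Upload + compute + download latency for one service request."""
    upload = task_bits / uplink_rate_bps
    compute = task_bits * flops_per_bit / allocated_flops_per_s
    download = result_bits / downlink_rate_bps
    return upload + compute + download


# Illustrative values only.
uplink = shannon_rate_bps(1e6, 3.0)     # 1 MHz, SNR = 3  -> 2 Mbit/s
downlink = shannon_rate_bps(1e6, 15.0)  # 1 MHz, SNR = 15 -> 4 Mbit/s
latency = service_latency_s(
    task_bits=2e6, uplink_rate_bps=uplink, flops_per_bit=2500.0,
    allocated_flops_per_s=1e10, result_bits=4e6, downlink_rate_bps=downlink,
)
print(f"total latency: {latency:.2f} s")
```

Raising the downlink SNR (for example through higher base-station transmit power) increases the downlink rate and shortens the download term, which matches the qualitative trend reported for Fig. 6.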
Research on Monophonic Speech Separation Method Using Time-Frequency Domain Multi-scale Information Interaction Strategy
LAN Chaofeng, YANG Guotao, CHEN Yingqi, GUO Xiaoxia
Available online  , doi: 10.11999/JEIT251340
Abstract:
  Objective  Monaural speech separation aims to extract individual speaker signals from a single-channel mixture. It is a core technology for addressing the “cocktail party problem” and has substantial application value in low-resource, low-latency scenarios such as mobile voice assistants, teleconferencing, and hearing aids. However, the lack of spatial cues in single-channel signals, together with the substantial overlap of multiple speakers in both time-domain waveforms and frequency-domain spectra, makes accurate separation highly challenging, especially when the integrity and clarity of the target speech must be preserved. Current deep learning-based models often show limitations in three closely related aspects: effective coordination of multi-scale dependencies, efficient fusion of time-frequency information, and control of computational complexity. To address these challenges, a novel Multi-Scale Attention model integrating Time-Frequency domain information (MSA-TF) is proposed to improve separation performance, computational efficiency, and generalization capability.  Methods  The MSA-TF model contains three key components. First, a lightweight Time-Frequency fusion module is designed. The module first divides the frequency band into four subbands on the basis of speech priors, such as low-frequency energy concentration and high-frequency detail sensitivity, to extract spectral features efficiently. A dynamic gating mechanism with decomposed convolutions and SiLU activation is then applied to adaptively enhance speaker-discriminative features and suppress redundant channels associated with noise. Finally, a cross-attention mechanism is used to promote deep interaction between time-domain and frequency-domain features during the encoding stage. Global semantic information from the time domain guides the selection and weighting of useful frequency-domain features, allowing mutual correction and complementarity. This module adds only 0.8 M parameters. 
Second, a Multi-scale Interaction Separator is proposed to address the limitations of sequential or loosely coupled multi-scale processing in models such as SepFormer. Multi-granularity features, ranging from frame-level $\boldsymbol{F}_1$ to syllable-level semantic $\boldsymbol{F}_4$, are extracted through cascaded dilated convolutions. Its core is the “GF-LF Iterative Feedback” mechanism. The Global Flash module, based on efficient FLASH attention, captures long-range dependencies and syllable-level context. This global information is upsampled and injected into local features ($\boldsymbol{F}_k$) through residual connections. Local Flash modules, also based on FLASH attention, then process the enhanced local features ($\boldsymbol{F}_k^{\prime}$) to model fine-grained structures and suppress frame-level noise. The updated local features are subsequently fed back through adaptive pooling to refine the global representation in the next iteration. This closed-loop bidirectional flow enables deep synergy between global semantics and local details. A gated fusion mechanism at the end dynamically balances the contributions of different scales. Third, to control computational complexity, an efficient hierarchical grouped attention mechanism is adopted, reducing the complexity from quadratic to nearly linear with sequence length. The overall MSA-TF architecture is end-to-end and consists of a 1D convolutional encoder, the integrated time-frequency and multi-scale modules, a mask network, and a symmetric decoder.  Results and Discussions  Extensive experiments are conducted on the standard WSJ0-2mix and Libri-2mix datasets, with Scale-Invariant Signal-to-Noise Ratio (SI-SNR) and Signal-to-Distortion Ratio (SDR) used as evaluation metrics. Ablation studies (Table 1) confirm the individual and joint contributions of the proposed modules. 
When only the time-frequency module is added to the TDAnet baseline, SI-SNR increases by 0.3 dB and SDR by 0.4 dB with only a small increase in parameters, confirming its contribution to signal structure modeling, particularly for high-frequency details. When only the multi-scale interaction module is incorporated, SI-SNR increases by 2.5 dB and SDR by 2.7 dB, highlighting its central role in modeling long-term dependencies. When the time-frequency and multi-scale modules are combined in the complete MSA-TF core, a synergistic effect is obtained, reaching 17.6 dB SI-SNR, which exceeds the sum of the individual gains. This result indicates that the dual-dimensional features provided by time-frequency fusion and the deep dependency modeling enabled by multi-scale interaction strengthen each other. Spectrogram analysis (Fig. 3) further shows that the time-frequency module effectively suppresses residual high-frequency noise and produces clearer spectral contours for the target speech. On the WSJ0-2mix test set (Table 2), MSA-TF achieves state-of-the-art performance, with 17.6 dB SI-SNR and 17.8 dB SDR. It matches the performance of SuperFormer and substantially outperforms strong baselines such as Conv-Tasnet by 2.3 dB SI-SNR, while maintaining a reasonable parameter count of 15.6 M. Compared with models with larger parameter sizes, such as SignPredictionNet at 55.2 M, MSA-TF shows more efficient modeling. For generalization evaluation on the completely unseen Libri-2mix dataset (Table 4), MSA-TF, trained only on WSJ0-2mix, achieves 14.2 dB SI-SNR and 14.7 dB SDR. Its performance is comparable to that of Conv-Tasnet models trained specifically on Libri-2mix, which achieve 14.4 dB SI-SNR, and it outperforms BLSTM-Tasnet trained on Libri-2mix. This strong cross-dataset adaptability indicates that the model captures universal time-frequency characteristics and multi-scale dependency structures in speech signals rather than overfitting to a specific dataset distribution.  
Conclusions  An MSA-TF model is presented to address key challenges in monaural speech separation through deep integration of multi-scale time-frequency information interaction. The proposed lightweight Time-Frequency fusion module efficiently supplements time-domain features with discriminative frequency-domain information. The Multi-scale Interaction Separator, with its iterative feedback mechanism, enables dynamic bidirectional information flow across scales and substantially improves the joint modeling of short-term details and long-term dependencies. Combined with an efficient attention design, the model achieves superior performance without excessive computational cost. Experimental results show that MSA-TF achieves leading separation performance on standard benchmarks and shows strong generalization ability on unseen data distributions, confirming the effectiveness of this comprehensive design. The model provides an efficient, robust, and generalizable solution for practical low-resource application scenarios. Future work may examine advanced cross-modal fusion techniques and dynamic scale adjustment strategies to further improve robustness and performance in more complex and variable acoustic environments.
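The SI-SNR metric used in the evaluation above can be sketched as follows. This is a minimal illustration of the commonly used definition (zero-mean signals, projection of the estimate onto the target), not the authors' evaluation code; the function name `si_snr` and the `eps` stabilizer are assumptions introduced here.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length waveforms.

    Hypothetical helper: follows the usual SI-SNR definition, which may
    differ in small details from the implementation used in the paper.
    """
    n = len(target)
    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to obtain the scaled reference.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    s_target = [(dot / tgt_energy) * t for t in tgt]
    # Everything not explained by the scaled target counts as error.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(v * v for v in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the separated signal leaves the score essentially unchanged, which is why the metric is called scale-invariant.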
Intelligent Sorting Algorithm for Multi-station Radar Signals Based on Federated Learning
YE Chengji, XIE Jian, ZHANG Zhaolin, WANG Ling
Available online  , doi: 10.11999/JEIT251355
Abstract:
  Objective  Radar signal sorting is a critical step in electronic reconnaissance and battlefield situational awareness. It is used to accurately separate interleaved pulse streams in complex electromagnetic environments. Although multi-station cooperative reconnaissance systems provide spatial diversity gains that can mitigate the parameter ambiguity and aliasing problems of single-station systems, their practical deployment faces major challenges. Traditional centralized processing architectures require massive volumes of raw Pulse Description Word (PDW) data to be transmitted to a central server. This requirement leads to prohibitive communication bandwidth costs and increases the risk of leakage of sensitive electromagnetic spectrum intelligence. In addition, because stations are geographically distributed and differ in antenna scanning patterns, the data collected at different stations often show significant Non-Independent and Identically Distributed (Non-IID) characteristics. Such heterogeneity reduces the generalization ability of local models trained on isolated data islands. To resolve the conflict between data isolation and the need for collaborative intelligence, a multi-station collaborative radar signal sorting method is proposed based on a Federated Learning (FL) framework. Collaborative model training is enabled without exchange of raw data, so that data privacy is preserved, communication overhead is reduced, and sorting robustness is improved in heterogeneous and noisy battlefield environments.  Methods  A centralized federated sorting framework is constructed to coordinate multiple reconnaissance stations. The method contains three main components: feature preprocessing, a lightweight local temporal model, and a heterogeneity-aware aggregation strategy. First, in data preprocessing, the raw PDW parameters, including TOA, CF, and PW, are normalized to address substantial differences in scale. 
Specifically, TOA is transformed into first-order differential values to extract Pulse Repetition Interval (PRI) information, which prevents numerical saturation and captures periodic patterns effectively (Fig. 3). Second, a local time-series sorting model is designed for the resource constraints of edge devices. A bidirectional Long Short-Term Memory (LSTM) network is used as the backbone to capture long-range dependencies and dynamic patterns in pulse sequences from both forward and backward directions. To accelerate convergence and prevent gradient vanishing, residual connections are added to fuse static and dynamic features. The extracted features are then mapped to the radiation source category space through a cascaded linear classification layer. Third, to address model drift caused by Non-IID data, including feature distribution shift and label distribution shift, a new aggregation strategy is proposed based on parameter decomposition and proximal regularization. Model parameters are decoupled into a feature extractor and a classifier. During federated aggregation, only the parameters of the generic feature extractor are uploaded and globally averaged, whereas the personalized classifier parameters are retained locally to adapt to the class distribution of each station. Furthermore, a proximal regularization term is added to the local loss function (Eq. 20). This constraint limits the deviation of local updates from the global model and ensures that the optimization direction does not diverge substantially because of local data heterogeneity, thereby improving the stability and convergence speed of the global model.  Results and Discussions  Extensive simulation experiments are conducted on core datasets with 3 stations and 5 radars, and on extended datasets with 9 stations and 12 radars, including complex modulation patterns such as jitter, sliding, and staggering. 
Quantitative analysis shows that the proposed method achieves sorting performance comparable to that of Centralized Learning (CL). On the core dataset, the Precision, Recall, and F1-score of the proposed method reach 96.51%, 96.35%, and 96.42%, respectively, exceeding those of FedAvg by approximately 0.67% in F1-score. On the more challenging extended dataset, the performance advantage becomes more significant, with an F1-score improvement of 3.86% over FedAvg (Table 4). These results indicate that the parameter decomposition strategy effectively balances common feature learning with personalized decision-making. Analysis by class further shows that, for categories that are difficult to distinguish, such as Radar 7 and Radar 10, the proposed method improves recognition accuracy by up to 15% and 6%, respectively, compared with FedAvg (Fig. 7 and Fig. 8). Robustness tests further demonstrate the adaptability of the method. When the number of participating stations increases from 3 to 9 (Fig. 9), the F1-score rises steadily from 73.53% to 83.75%. This result confirms that enlarging node scale in the FL framework produces collaborative gains through more diverse samples and reduced geographic statistical heterogeneity, which substantially improve model generalization and robustness. Under severe class skew conditions, the method maintains an F1-score above 80% on the core dataset (Fig. 10 and Fig. 11). Furthermore, under extreme electromagnetic conditions characterized by high pulse loss rates of 70% and spurious pulse rates of 70%, the model maintains sorting performance above 75%, which demonstrates strong robustness against noise and interference (Fig. 12).  Conclusions  An FL-based framework is proposed for multi-station collaborative radar signal sorting to address data privacy and transmission constraints in distributed reconnaissance. 
By integrating a lightweight LSTM with a heterogeneity-aware aggregation mechanism, the method effectively captures temporal pulse features and mitigates model drift caused by Non-IID data. Experimental results verify that the approach achieves accuracy comparable to that of centralized methods and shows superior robustness under label skew and severe data degradation, including high pulse loss and spurious pulse rates. This study provides a privacy-preserving and efficient solution for intelligent signal processing in distributed electronic warfare systems.
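The heterogeneity-aware aggregation described in this abstract can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the paper's implementation: the proximal term is written in the common FedProx-style form (mu/2)·||w − w_global||², which may differ from the paper's Eq. 20; it is assumed to act on the shared extractor weights; and the names `proximal_loss`, `aggregate_extractors`, `broadcast`, and the dictionary keys are hypothetical.

```python
def proximal_loss(task_loss, local_w, global_w, mu=0.1):
    """Task loss plus an assumed FedProx-style penalty on drift from the global model."""
    drift = sum((l - g) ** 2 for l, g in zip(local_w, global_w))
    return task_loss + 0.5 * mu * drift


def aggregate_extractors(station_models):
    """Average only the shared feature-extractor weights across stations."""
    n = len(station_models)
    dim = len(station_models[0]["extractor"])
    return [sum(m["extractor"][i] for m in station_models) / n for i in range(dim)]


def broadcast(station_models, global_extractor):
    """Overwrite each station's extractor; classifiers stay local and personalized."""
    for model in station_models:
        model["extractor"] = list(global_extractor)
    return station_models


# Two hypothetical stations with flattened weight vectors.
stations = [
    {"extractor": [1.0, 2.0], "classifier": [0.5]},
    {"extractor": [3.0, 4.0], "classifier": [-0.5]},
]
global_extractor = aggregate_extractors(stations)
broadcast(stations, global_extractor)
```

Only the extractor weights leave a station in this sketch, so the raw PDW data and the station-specific classifier are never transmitted.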
Construction Methods of Two-Dimensional Golay-Zero Correlation Zone Array Sets with Flexible Parameters
WANG Meiyue, LIU Tao, CHEN Xiaoyu, LI Yubo
Available online  , doi: 10.11999/JEIT251360
Abstract:
  Objective  Sequences with good correlation properties are widely used in wireless communications, cryptography, and radar systems. However, a sequence set cannot simultaneously achieve ideal autocorrelation and ideal cross-correlation. This limitation has led to the study of two signal classes with ideal correlation properties: Zero Correlation Zone (ZCZ) sequences and Golay Complementary Sets (GCS). A Golay-ZCZ sequence set combines the advantages of both. Its constituent sequences exhibit ideal periodic autocorrelation and cross-correlation within the ZCZ, and the sums of their aperiodic autocorrelations are zero at all nonzero shifts. Therefore, a Golay-ZCZ set is both a ZCZ set and a GCS. It can thus be used in the applications of both sequence classes. An array set is a two-dimensional extension of a sequence set. Although Golay-ZCZ sequence sets have been widely studied and constructed, research on Two-Dimensional (2D) Golay-ZCZ array sets remains limited. This study proposes three constructions of 2D Golay-ZCZ array sets based on 2D multivariable functions and the concatenation operator. These array sets can be used as precoding matrices for massive Multiple Input Multiple Output (MIMO) omnidirectional transmission.  Methods  Three construction methods for 2D Golay-ZCZ array sets are proposed, including one direct construction and two indirect constructions. The resulting parameters have not been reported in existing studies. In the first construction, a 2D Golay-ZCZ array set is generated using 2D multivariable functions, with parameters expressed as prime powers. This direct function-based approach enables efficient synthesis of the target arrays. The second and third constructions generate 2D Golay-ZCZ array sets through horizontal and vertical concatenation of Two-Dimensional Complete Complementary Codes (2D CCC), respectively. In these indirect constructions, the parameters are not restricted to prime powers. 
This property broadens the applicability of the methods and increases parameter flexibility.  Results and Discussions  The first construction generates a 2D Golay-ZCZ array set with array size \begin{document}$ p_{1}^{{m}_{1}}\times p_{2}^{{m}_{2}} $\end{document} and ZCZ size \begin{document}$ ({p}_{1}-1)p_{1}^{{\pi }_{1}(2)-1}\times ({p}_{2}-1)p_{2}^{{\sigma }_{1}(2)-1} $\end{document} through a direct function-based method, where \begin{document}$ {p}_{1} $\end{document} and \begin{document}$ {p}_{2} $\end{document} are prime numbers. For clarity, the magnitudes of the 2D periodic cross-correlation function of the constructed array set are illustrated in Example 1 (Fig. 1). The second construction generates a ZCZ array set with array size \begin{document}$ {L}_{1}\times {N}^{2}{L}_{2} $\end{document} and ZCZ size \begin{document}$ ({L}_{1}-1)\times (N-1){L}_{2} $\end{document} based on the horizontal concatenation of \begin{document}$ (N,N,{L}_{1},{L}_{2}) $\end{document} 2D CCC. The third construction generates a ZCZ array set with array size \begin{document}$ {N}^{2}{L}_{1}\times {L}_{2} $\end{document} and ZCZ size \begin{document}$ (N-1){L}_{1}\times ({L}_{2}-1) $\end{document} based on the vertical concatenation of \begin{document}$ (N,N,{L}_{1},{L}_{2}) $\end{document} 2D CCC. An illustrative example of Construction 2 is provided, and the corresponding correlation magnitudes are shown in Figs. 2 and 3. As summarized in Table 1, the construction methods proposed in this paper generate parameter sets that have not been reported in the existing literature. The constructed array sets provide considerable flexibility in array dimensions and ZCZ sizes. This flexibility is valuable for the design of precoding matrices in MIMO omnidirectional transmission systems. 
In practical implementations, the dimension of a precoding matrix is typically determined by the number of transmit antennas, whereas the ZCZ size must match the maximum multipath delay spread of the channel. Owing to this parameter flexibility, the proposed 2D Golay-ZCZ array sets support adaptive selection under different antenna configurations and channel conditions.  Conclusions  Three construction methods for 2D Golay-ZCZ array sets are proposed. These methods generate array sets with flexible array sizes and large ZCZ widths. The first construction is based on a 2D multivariable function and can include previous results as special cases without using kernels. The second and third constructions rely on the concatenation operator and provide greater parameter flexibility. The proposed 2D Golay-ZCZ arrays have potential applications in MIMO omnidirectional transmission. The parameter-flexible array sets can be selected according to different antenna configurations and channel conditions. This property suppresses multi-antenna interference within the zero-correlation zone and maintains uniform transmitted energy.
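The complementary property mentioned in this abstract (aperiodic autocorrelation sums that vanish at every nonzero shift) can be checked numerically. The sketch below covers only the one-dimensional binary case with a well-known length-4 Golay pair; it does not reproduce the paper's 2D constructions or the ZCZ property, and the function names are hypothetical.

```python
def aperiodic_autocorr(seq, shift):
    """Aperiodic autocorrelation of a real-valued sequence at a non-negative shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_golay_pair(a, b):
    """True if the autocorrelation sums of a and b vanish at all nonzero shifts."""
    return all(
        aperiodic_autocorr(a, s) + aperiodic_autocorr(b, s) == 0
        for s in range(1, len(a))
    )


# A classical binary Golay complementary pair of length 4.
a = [1, 1, 1, -1]
b = [1, 1, -1, 1]
pair_ok = is_golay_pair(a, b)
zero_shift_sum = aperiodic_autocorr(a, 0) + aperiodic_autocorr(b, 0)
```

Neither sequence has ideal autocorrelation on its own; the sidelobes cancel only in the sum, which is the property the 2D array sets extend to two shift dimensions.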
A Lightweight and High-Reliability Challenge Generation Strategy for APUF
LAN Guohao, ZHANG Hui, DUO Bin, WANG Zibin, ZHOU Rang, LI Dongfen
Available online  , doi: 10.11999/JEIT251073
Abstract:
  Objective  The Arbiter Physical Unclonable Function (APUF) is a lightweight security primitive that has been widely adopted in identity authentication and key generation for resource-constrained devices. However, its response consistency is highly sensitive to environmental perturbations, leading to inconsistent responses for the same challenge under different conditions, severely undermining the reliability of APUF-based security systems. Existing reliability improvement schemes for APUF, which mainly rely on hardware modification or challenge screening, generally suffer from high resource overhead and low efficiency. To address the limitations of these existing solutions, a Delay-Constrained Challenge Generation Strategy (DCGS) is proposed to enhance APUF reliability without extra hardware overhead or screening-related inefficiencies.  Methods  The core of DCGS lies in modeling APUF path delay properties and constructing challenges with constrained delay differences to ensure response stability. First, a logistic regression (LR) model is established to characterize the relationship between APUF challenge bits and path delays. From the trained LR model, a delay weight vector is derived to quantify the contribution of each challenge bit to the overall path delay. Second, a two-stage challenge generation mechanism is designed to integrate delay constraint control: The first stage is prefix bit initialization, which generates distinct prefix sequences to establish a stable delay baseline for subsequent bit extension. The second stage is bit-wise extension, where each remaining challenge bit is dynamically determined based on the delay weight vector. During this extension process, the cumulative delay difference of the challenge is monitored in real time, ensuring it stays within a preset threshold range. 
Unlike traditional screening methods that post-process candidate challenges, DCGS directly generates stable challenges by design, eliminating the need for candidate pools and improving generation efficiency.  Results and Discussions  Performance evaluations of DCGS are conducted under varying noise intensities. At a noise intensity of 0.3 (maximum practical level), the reliability of DCGS-generated challenges remains at 100% (Fig. 2). In terms of generation efficiency, DCGS consumes only 0.017 seconds to generate 10,000 challenges (Table 4). For response uniformity, DCGS achieves a value of 50.02% (Table 4). For uniqueness, it reaches 50.46% (Table 4). These two key metrics are both close to the ideal theoretical value of 50%. Security analysis shows that the average bit entropy of DCGS-generated challenges is 0.9807 (Fig. 3), and the conditional entropy is 0.9878, only 0.0023 lower than that of random challenges (0.9901).  Conclusions  This paper proposes a delay-constrained challenge generation strategy for APUF, aiming to address the problems of inconsistent responses, low generation efficiency, and high hardware resource consumption of traditional schemes in high-noise environments. By modeling the path delay characteristics of APUF using LR and integrating a prefix initialization mechanism with a bit-wise extension mechanism, the strategy ensures that the generated challenges meet the preset delay difference threshold range. Through this method, the DCGS achieves high reliability, high efficiency, and good response uniformity without increasing hardware overhead. Experimental results show that DCGS can effectively enhance the reliability of APUF in complex environments, providing strong technical support for secure applications in resource-constrained devices.
Review of Non-invasive Brain–Computer Interfaces for Continuous Motor Control
XU Minpeng, JIA Leyi, ZHOU Xiaoyu, CHEN Enze, WANG Junyang, XIAO Xiaolin, MING Dong
Available online  , doi: 10.11999/JEIT260011
Abstract:
  Significance   Continuous motor control is a fundamental capability for brain–computer interface (BCI) systems aiming at natural and efficient interaction with external devices. Compared with discrete command-based control, continuous control enables real-time and smooth regulation of motion parameters such as position, velocity, and trajectory, which is essential for applications including assistive mobility, neurorehabilitation, robotic manipulation, and immersive human–machine interaction. Although invasive BCIs have demonstrated high-performance continuous control benefiting from high-quality neural recordings, their reliance on surgical implantation restricts long-term use and large-scale deployment. Therefore, a systematic review of non-invasive continuous motor control BCI technologies is necessary to clarify research progress, methodological characteristics, and remaining challenges.  Progress   This review summarizes advances in non-invasive continuous motor control BCIs from four closely related aspects: control paradigms, decoding algorithms, application scenarios, and performance evaluation. At the paradigm level, motor imagery, steady-state visual evoked potentials, event-related potentials, and hybrid paradigms have been investigated to support continuous control through sustained intention modulation, dynamic stimulus encoding, and hierarchical or shared-control strategies. Regarding decoding algorithms, two major frameworks are identified: motion parameter mapping methods and motion parameter regression methods. Motion parameter mapping methods achieve continuous output by temporally integrating discrete classification results or mapping them to velocity or state variables, whereas motion parameter regression methods directly establish relationships between EEG features and continuous kinematic parameters. 
In recent studies, nonlinear models and deep learning approaches have been increasingly incorporated to improve robustness under non-stationary EEG conditions. At the application level, non-invasive continuous control has evolved from two-dimensional cursor tasks to more practical scenarios such as wheelchair navigation, robotic arm manipulation, unmanned systems, and virtual or augmented reality environments. In addition, existing studies evaluate continuous control performance using both objective metrics (e.g., trajectory error, task success rate, and information transfer rate) and subjective measures (e.g., workload and user experience), reflecting diverse experimental designs and control objectives.  Conclusions  Overall, existing studies demonstrate that non-invasive BCIs are capable of supporting continuous motor control; however, current research remains at a stage where diverse methods coexist without a unified framework. At the paradigm level, different approaches vary in their ability to reliably elicit and sustain continuous motor intentions. In terms of decoding algorithms, both motion parameter mapping and regression methods face limitations in robustness, generalization, and long-term stability due to the non-stationary nature of EEG signals. At the application level, many studies are still constrained to specific tasks and controlled environments, and the transferability of continuous control strategies to complex real-world scenarios requires further validation. Moreover, the lack of standardized evaluation protocols hinders direct comparison and systematic optimization across studies.  Prospects   Future research should focus on improving the stability and reliability of continuous control paradigms, enhancing decoding robustness under realistic EEG conditions, and strengthening the alignment between control strategies and application requirements. 
Establishing unified evaluation frameworks that integrate both objective and subjective indicators will be critical for methodological convergence and fair comparison. With continued advances, non-invasive continuous motor control BCIs are expected to play an increasingly important role in assistive technologies, rehabilitation systems, and advanced human–machine interaction.
A Neural Network-Based Robust Direction Finding Algorithm for Mixed Circular and Non-Circular Signals Under Array Imperfections
YU Qi, YIN Jiexin, LIU Zhengwu, WANG Ding
Available online  , doi: 10.11999/JEIT250884
Abstract:
  Objective   Direction Of Arrival (DOA) estimation is affected by low Signal-to-Noise Ratios (SNR), the coexistence of Circular Signals (CSs) and Non-Circular Signals (NCSs), and multiple forms of array imperfections. Conventional subspace-based estimators exhibit model mismatch in such environments and show reduced accuracy. Although neural-network methods provide data-driven alternatives, the effective use of the distinctive statistical properties of NCSs and the maintenance of robustness against diverse array errors remain insufficiently addressed. The objective is to design a DOA estimation algorithm that operates reliably for mixed CSs and NCSs in the presence of array imperfections and provides improved estimation accuracy in challenging operating conditions.  Methods   A robust DOA estimation algorithm is proposed based on an improved Vision Transformer (ViT) model. A six-channel image-like input is first constructed by fusing features derived from the covariance matrix and pseudo-covariance matrix of the received signal. These channels include the real component, imaginary component, magnitude, phase, magnitude ratio reflecting the NCS characteristic, and the phase of the pseudo-covariance matrix. A gradient-masking mechanism is introduced to adaptively fuse core and auxiliary features. The ViT architecture is then modified: the standard patch-embedding module is replaced with a convolutional layer to extract local information, and a dual-class-token attention mechanism, placed at the sequence head and tail, is designed to enhance feature representation. A standard Transformer encoder is used for deep feature learning, and DOA estimation is performed through a multi-label classification head.  Results and Discussions   Extensive simulations are carried out to assess the proposed algorithm (6C-ViT) against MUSIC, NC-MUSIC, a Convolutional Neural Network (6C-CNN), a Residual Network (6C-ResNet), and a MultiLayer Perceptron (6C-MLP). 
Performance is evaluated using Root Mean Square Error (RMSE) and angular estimation error under different operating conditions. Under single-source scenarios with low SNR and no array errors, 6C-ViT achieves near-zero RMSE across most angles and shows minor edge deviations (Fig. 2). It maintains the lowest RMSE across the SNR range from –20 dB to 15 dB (Fig. 3), indicating good generalization to unseen SNR levels. In dual-source scenarios containing mixed CSs and NCSs under array errors, 6C-ViT shows clear advantages. Its estimation errors fluctuate slightly around zero, whereas competing techniques present larger errors and pronounced instabilities, especially near array edges (Fig. 4). Its RMSE decreases steadily as SNR increases and falls below 0.1° at high SNR, while traditional approaches saturate around 0.4° (Fig. 5). Robust behavior is further observed across different numbers of signal sources (K = 1, 2, 3) and snapshot counts (100 to 2,000). 6C-ViT preserves high accuracy and stability under these variations, whereas other methods show marked degradation or instability, most evident at low snapshot counts or with multiple sources (Fig. 6). When evaluated using unknown modulation types, including UQPSK with a non-circularity rate of 0.6 and 64QAM, under array errors, 6C-ViT continues to produce the lowest RMSE across most angles (Fig. 7), demonstrating strong generalization capability. Ablation studies (Fig. 8) confirm the contributions of the six-channel input, the gradient masking module, the convolutional embedding, and the dual class token mechanism. The complete configuration yields the highest accuracy and the most stable performance.  Conclusions   Strong robustness is demonstrated in complex scenarios that contain mixed CSs and NCSs, multiple array imperfections, low SNR, and closely spaced sources. 
By fusing multi-dimensional features of the received signal and using an enhanced Transformer architecture, the algorithm attains higher estimation accuracy and improved generalization across different signal types, error conditions, snapshot counts, and noise levels compared with subspace- and neural-network-based baselines. The method provides a reliable DOA estimation solution for demanding practical environments.
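The six-channel input described in this abstract can be sketched as below. This is one possible reading, not the authors' code: the first four channels are assumed to come from the sample covariance matrix, the magnitude ratio is assumed to be |pseudo-covariance| / |covariance| element by element, and the function name, channel keys, and `eps` stabilizer are hypothetical.

```python
import cmath


def six_channel_features(snapshots, eps=1e-12):
    """Build six M x M feature maps from T array snapshots of M complex samples each."""
    t_count = len(snapshots)
    m = len(snapshots[0])
    idx = range(m)
    # Sample covariance R = E[x x^H] and pseudo-covariance C = E[x x^T].
    cov = [[sum(x[i] * x[j].conjugate() for x in snapshots) / t_count for j in idx] for i in idx]
    pcov = [[sum(x[i] * x[j] for x in snapshots) / t_count for j in idx] for i in idx]
    return {
        "real": [[cov[i][j].real for j in idx] for i in idx],
        "imag": [[cov[i][j].imag for j in idx] for i in idx],
        "magnitude": [[abs(cov[i][j]) for j in idx] for i in idx],
        "phase": [[cmath.phase(cov[i][j]) for j in idx] for i in idx],
        # Assumed definition: near 1 for strictly non-circular signals, near 0 for circular ones.
        "nc_ratio": [[abs(pcov[i][j]) / (abs(cov[i][j]) + eps) for j in idx] for i in idx],
        "pseudo_phase": [[cmath.phase(pcov[i][j]) for j in idx] for i in idx],
    }


# Single-sensor toy data: QPSK-like symbols (circular) versus BPSK-like symbols (non-circular).
circular = six_channel_features([[1 + 0j], [1j], [-1 + 0j], [-1j]])
non_circular = six_channel_features([[1 + 0j], [-1 + 0j], [1 + 0j], [-1 + 0j]])
```

The pseudo-covariance vanishes for circular signals but not for non-circular ones, so the ratio channel is what lets a network tell the two signal classes apart.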
Adversarial Attacks on 3D Target Recognition Driven by Gradient Adaptive Adjustment
LIU Weiquan, SHEN Xiaoying, LIU Dunqiang, SUN Yanwen, CAI Guorong, ZANG Yu, SHEN Siqi, WANG Cheng
Available online  , doi: 10.11999/JEIT251264
Abstract:
  Objective   Robust environmental perception is essential for intelligent driving systems. Light Detection And Ranging (LiDAR) provides high-resolution 3D point cloud data and serves as a core information source for object detection and recognition. However, deep learning models for 3D point cloud recognition show notable vulnerability to adversarial attacks. Small, imperceptible perturbations can cause severe classification errors and threaten system safety. Existing attack methods have improved the Attack Success Rate (ASR), but the perturbations they generate often lack concealment, create outliers, and show poor imperceptibility because they do not adequately preserve the geometric structure of point clouds. This reduces their suitability for realistic security evaluation of optoelectronic perception systems. Developing an attack method that maintains a high success rate while preserving geometric consistency and imperceptibility is therefore critical. This study addresses this need by proposing a framework that incorporates point cloud geometry into perturbation generation.  Methods   A Gradient Adaptive Adjustment (GAA) adversarial attack method for 3D point cloud recognition is proposed. The framework (Fig. 2) includes three coordinated modules. The 3D Point Cloud Salient Region Extraction module evaluates decision-level vulnerability using Shapley value analysis to identify and rank point subsets with the strongest influence on classifier output. Perturbations are then concentrated in these sensitive regions. A curvature-weighted gradient mechanism integrates local geometric priors. For each point in the salient region, a local covariance matrix is computed from its k-nearest neighbors. Principal component analysis generates eigenvalues and eigenvectors, which are used to compute a curvature measure. A Gaussian kernel function produces curvature-dependent weights that are applied to backpropagated gradients. 
This suppresses perturbations in high-curvature areas and encourages them in low-curvature regions to preserve local shape morphology. A principal curvature direction constrained optimization module further refines the perturbation direction. The weighted gradient is projected onto the principal curvature directions, and the projection components are fused using coefficients derived from the corresponding eigenvalues. This aligns the perturbation with natural geometric trends and avoids unnatural deformation. An adaptive optimization algorithm then minimizes a multi-objective loss balancing attack success, geometric similarity (via Chamfer distance and Hausdorff distance), and perturbation sparsity. The adversarial point cloud is iteratively updated based on the saliency map, curvature-weighted gradients, and principal direction constraints.  Results and Discussions   Experiments on ModelNet40, ShapeNetPart, and KITTI were conducted using PointNet, DGCNN, and PointConv. The GAA method showed strong performance. On ModelNet40 with PointNet, it achieved a 97.69% ASR with an average of 28 perturbed points, outperforming ten baselines such as AL-Adv (92.92% ASR, 40 points) and Kim et al. (89.38% ASR, 36 points) (Table 1). It also produced lower geometric distortion, as indicated by smaller Chamfer Distance and Hausdorff Distance values. Visual results (Fig. 4) show that GAA produces fewer outliers and more natural adversarial point clouds compared with methods such as AL-Adv. The method generalized well across architectures, reaching 99.78% ASR on DGCNN and 96.91% on PointConv (Table 2), with similar performance on ShapeNetPart (Table 3). Ablation experiments on the number of salient regions (K) showed consistent improvements in ASR and reduced geometric distortion as K increased from 1 to 6 (Table 4, Fig. 5), confirming the advantage of targeting multiple critical regions. Tests on the KITTI dataset demonstrated strong performance in real-world, noisy environments. 
The method maintained high ASRs, such as 99.33% on PointNet, with limited perturbations (Table 5). An ablation study on K indicated that K=4 offers an effective balance between success rate and perturbation cost for PointNet (Table 6).  Conclusions   This study presents a GAA method for adversarial attacks on 3D point cloud recognition. By combining a Shapley value-based saliency analyzer, a curvature-weighted gradient mechanism, and a principal curvature direction constraint, the method generates adversarial examples that achieve high attack success while preserving geometric consistency. Experiments show that GAA minimizes perceptual distortion and perturbs fewer points across datasets and models. The method provides a practical tool for vulnerability analysis and supports the development of more robust and secure optoelectronic perception systems for intelligent driving. Future work will examine robustness under adverse conditions and assess physical-world implications.
A Channel Phase Self-compensation Method for Active-Integrated Arrays
SUN Liying, LU Yunlong, XU Jun, HU Yang
Available online  , doi: 10.11999/JEIT251325
Abstract:
The seamless integration of active circuitry and antennas can effectively improve link performance and system integration. At present, active-integrated antennas are mainly designed by adjusting the antenna impedance while maintaining the desired radiation characteristics to achieve direct matching with active transistors. However, the effect of the antenna’s complex impedance on the phase response of the active channel, as well as its potential application in active-integrated phased arrays, has not been thoroughly studied. This paper proposes a channel phase self-compensation method for active-integrated arrays. For each active channel, the active transistor is directly integrated with the radiating element, where the load impedance at the transistor drain is matched to the input impedance of the antenna element. Under a constant active gain, the required complex load impedance is solved to establish an explicit mapping between the phase response of each active channel and its corresponding load impedance. According to the phase-shift requirements among array channels, appropriate load impedances are selected as the input impedances of the corresponding radiating elements. This approach applies a predefined phase distribution to each channel without using external phase-shifting structures. It can control the initial beam direction or compensate for the path difference between elements in conformal arrays. An active-integrated phased-array antenna with a preset beam direction is designed as a demonstration example to verify the effectiveness of the proposed method. The method provides an efficient design approach for next-generation active-integrated arrays.  Objective  In the traditional design approach, active circuit channels and antenna arrays are matched to 50 Ω before interconnection. This configuration occupies considerable physical space and limits system-level integration. 
In addition, insertion loss in passive matching networks and mismatch loss at the interconnections reduce overall link performance. Direct co-integration of active circuitry and antenna elements can address these limitations. However, multi-channel active-integrated antenna arrays often require one or multiple superimposed phase distributions across the channels to satisfy different application requirements, such as initial beam offset in fuze systems, wavefront compensation in conformal active phased arrays, and wide-angle beam scanning. These phase gradients are typically realized through backend phase-shifting networks. In this work, the complex impedance characteristics of the antenna are adjusted when it is directly integrated with the active circuitry. The phase response of the active-integrated channels can therefore be tuned within a certain range without using complex matching networks or additional phase shifters. This strategy reduces the complexity and performance requirements of the backend phase-shifting network. The advantages are more evident in millimeter-wave, high-frequency, and terahertz systems, where the available phase-shift range of phase shifters is limited.  Methods  Phase self-compensation of the active channels is achieved through the direct integration of the active transistor and the radiating element. In this configuration, the drain output of the transistor is directly connected to the input of the radiating element, and impedance transformation is realized within the antenna element. The proposed method includes three main steps. (1) The active transistor is first modeled as a two-port network. By evaluating the antenna element’s complex impedance as the load on different constant-gain circles, the mapping between the phase response of the active channel and the load impedance is established. The achievable phase-shift range of the active channel is then determined. 
(2) According to the required phase-shift distribution among the array channels, suitable combinations of active gain and corresponding complex load impedances are selected. These combinations are not unique. (3) The realizability of the selected impedances is examined according to the characteristics of the radiating element. The impedance values with the highest feasibility are implemented by optimizing the radiating element, which includes fine adjustment of its geometry and feed position to meet the target impedance. When the radiating element is modified, particularly for circularly polarized elements, desirable radiation characteristics must also be preserved, including good axial ratio and beam-scanning performance.  Results and Discussions  The proposed phase self-compensation mechanism enables the array to achieve initial beam pointing and compensate for path-length differences caused by special array geometries, such as conformal or curved surfaces, without using additional phase-shifting structures. Therefore, the performance requirements of the backend phase-shifting network in active phased arrays can be reduced. To verify the effectiveness of the proposed method, a 1×4 circularly polarized active-integrated linear array (Fig. 9) is designed and demonstrated. Based on channel-level impedance calculations (Fig. 6) and an analysis of the antenna-element impedance characteristics (Fig. 8), a phase gradient of 38° between adjacent channels is synthesized and applied to the circularly polarized active-integrated array. Without degrading the circular polarization performance and without external phase-shifting circuitry, the initial beam direction of the active-integrated phased array is shifted to the desired angle of θ0 = 12° (Fig. 13). The phase self-compensation design does not degrade the beam-scanning capability of the array. After an additional phase gradient is applied for beam steering, the array achieves a scanning range of up to 50°. 
The gain reduction remains within 2 dB relative to the initial pointing direction, and the axial ratio remains below 4 dB throughout the scanning range.  Conclusions  Within the framework of active-integrated arrays, this work uses the phase-tuning effect produced by the complex impedance at the antenna port when the radiating element is directly matched to the active transistor. A desired phase-gradient distribution can therefore be synthesized among the channels of an active-integrated phased array within an achievable range. This capability enables compensation for required phase distributions, such as preset beam direction and path-length equalization in conformal-array applications, without relying on additional phase shifters. Therefore, the complexity and performance requirements of the backend phase-shifting circuitry are reduced. The effectiveness of the proposed method is validated through a multi-channel circularly polarized active-integrated phased-array prototype with a preset beam direction. Both full-wave simulations and experimental measurements confirm that the phase self-compensation mechanism provides the required initial beam pointing while preserving beam-scanning capability and polarization performance. This study provides a new approach for the design of high-efficiency next-generation active-integrated phased arrays.
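The reported 38° phase gradient and 12° beam direction can be related through the textbook array-factor relation for a uniform linear array. This is a worked sketch under stated assumptions: ideal isotropic elements, uniform spacing d, and no mutual coupling. The element spacing is not given in the abstract, so the value derived below is only what these assumptions would imply.

```latex
% Progressive phase between adjacent channels that steers the main beam to \theta_0
\Delta\varphi = \frac{2\pi d}{\lambda}\,\sin\theta_0
% Solving for the spacing implied by \Delta\varphi = 38^\circ and \theta_0 = 12^\circ
\frac{d}{\lambda} = \frac{\Delta\varphi}{360^\circ \sin\theta_0}
                  = \frac{38^\circ}{360^\circ \times 0.2079} \approx 0.51
```

Under these assumptions the two reported figures are mutually consistent with a spacing close to half a wavelength, a common choice for scanning arrays.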
TTSPD: A Multimodal Traffic Scene Perception Dataset Integrating Tire Data
YING Zongchen, GUI Lin, YANG Jiahan, ZHANG Fangwei, WANG Junfan, DONG Zhekang
Available online  , doi: 10.11999/JEIT260022
Abstract:
  Objective  With the rapid development of Intelligent Transportation Systems (ITS) and autonomous driving technologies, accurate traffic environment perception is a fundamental prerequisite for vehicle safety and decision making. Current perception frameworks primarily rely on high-resolution cameras and LiDAR sensors. Although these sensors provide rich information, they create severe challenges across the Perception-Storage-Calculation pipeline. High acquisition costs limit large-scale deployment. In addition, the massive data volume produced by high-dimensional sensors places heavy pressure on onboard storage and computational resources, often exceeding the power and thermal budgets of vehicle-grade edge platforms. These constraints motivate the exploration of alternative sensing paradigms that are cost-effective, compact, and computationally efficient while maintaining reliable perception accuracy. In response, the present study shifts the perception perspective from conventional external sensors to the tire-road contact interface, where abundant physical interaction information naturally exists. The objective is to construct a novel multimodal dataset, termed the Tire-integrated Traffic Scene Perception Dataset (TTSPD), which combines internal tire dynamics with external visual observations. This dataset is used to examine whether low-dimensional tire sensing data can complement or partially substitute high-dimensional visual data for accurate road surface classification. The study also aims to establish a new data morphology that balances perception performance and system efficiency for future intelligent vehicles.  Methods  To construct a high-quality and practically usable multimodal dataset, an integrated hardware-software acquisition framework is developed. From a hardware perspective, a specialized sensing system is designed by coupling tire-mounted multi-parameter sensors with a vehicle-mounted camera. 
To ensure reliable operation under the harsh mechanical conditions of a rotating tire, sensing nodes are encapsulated using a rubber-based composite material that provides mechanical protection and long-term stability. Wireless transmission is implemented using Bluetooth Low Energy (BLE) 5.0 with an adaptive frequency-hopping mechanism, enabling low-power and reliable communication during high-speed rotation. During data acquisition, the system synchronously collects six types of internal tire signals, including radial acceleration, tire temperature, and tire pressure, producing approximately 1.8 million sampling points. In parallel, a dashboard-mounted camera records high-resolution traffic scene images totaling 309 GB across four representative road surface conditions. To address the heterogeneity between high-frequency one-dimensional tire signals and two-dimensional visual data, a timestamp-based association strategy is adopted to achieve scene-level temporal alignment rather than strict frame-by-frame correspondence. Sensor sequences and image segments are grouped according to shared temporal windows and driving scenarios. This approach ensures semantic and temporal consistency at the scene level. The alignment strategy reflects practical deployment conditions and forms the basis of the final TTSPD dataset for multimodal fusion research.  Results and Discussions  The effectiveness of the proposed TTSPD is evaluated through comprehensive road surface classification experiments using mainstream deep learning models. Initial experiments based solely on visual data demonstrate strong baseline performance, with classification accuracies ranging from 87.25% to 93.75% (Table 7). These results confirm the quality and diversity of the visual modality in the dataset. The primary contribution of this study is the quantification of efficiency gains enabled by tire-based sensing. 
Comparative experiments progressively reduce the amount of visual data while integrating low-dimensional tire signals, particularly radial acceleration (Table 9). The results show that the multimodal model achieves approximately 95% of the full-data baseline accuracy while using only about 38.75% of the original data volume. This reduction in data dependency produces significant system-level benefits. Storage requirements decrease by approximately 61.25%, and overall model training time decreases by about 54.10% (Fig. 8). These findings indicate that tire dynamics encode high-value physical features related to road texture and surface conditions that complement visual cues. The proposed dataset therefore supports the development of lighter perception pipelines without reducing recognition performance.  Conclusions  This study addresses the long-standing Perception-Storage-Calculation bottleneck in vision-dominated autonomous driving systems by proposing the TTSPD. Multi-parameter sensors are embedded within tires using rubber-based encapsulation, and stable wireless communication is achieved through BLE 5.0. A robust tire-camera data acquisition system is therefore established. The resulting dataset covers four common and safety-critical road surface types: cement, asphalt, damaged, and water-covered roads. It provides a comprehensive foundation for multimodal perception research. Experimental results show that combining low-dimensional tire sensing data with visual information significantly improves perception efficiency. Approximately 95% of peak classification accuracy is achieved using only about 38.75% of the original data volume. This result effectively reduces storage pressure and computational cost, reflected in a 61.25% reduction in data storage and a 54.10% reduction in training time. The TTSPD dataset therefore proposes a practical data morphology that supports efficient and high-performance perception under vehicle-grade computational constraints. 
It also provides valuable resources for the future development of ITS.
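The scene-level, timestamp-based association described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the fixed window length, and the sample values are all hypothetical, and the abstract does not specify how the shared temporal windows are defined.

```python
from collections import defaultdict


def align_by_window(tire_samples, image_frames, window_s=1.0):
    """Group (timestamp, value) tire samples and (timestamp, name) frames
    into shared temporal windows; keep only windows holding both modalities."""
    scenes = defaultdict(lambda: {"tire": [], "images": []})
    for t, value in tire_samples:
        scenes[int(t // window_s)]["tire"].append(value)
    for t, name in image_frames:
        scenes[int(t // window_s)]["images"].append(name)
    return {k: v for k, v in sorted(scenes.items())
            if v["tire"] and v["images"]}


# Hypothetical radial-acceleration samples (seconds, value) and camera frames.
tire = [(0.1, 9.8), (0.6, 9.9), (1.2, 10.4), (2.5, 9.7)]
frames = [(0.5, "a.jpg"), (1.9, "b.jpg")]

scenes = align_by_window(tire, frames)
```

Grouping by window rather than matching individual timestamps tolerates the rate mismatch between high-frequency tire signals and camera frames, which is the stated reason for scene-level rather than frame-by-frame alignment.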
Blind Parameter Estimation Method for PSK Modulated Frequency-Hopping Signals Based on Improved Maximum Likelihood
ZHANG Tianhao, ZHANG Yushu, XU Zhongqiu, TANG Xinyi, DANG Wenhua, LI Guangzuo
Available online  , doi: 10.11999/JEIT260005
Abstract:
  Objective  Blind parameter estimation of non-cooperative Frequency-Hopping (FH) signals is a critical task in electronic reconnaissance and countermeasures. Estimation methods based on time-frequency analysis typically suffer from limited resolution or high computational complexity. Furthermore, methods based on compressive sensing rely heavily on the consistency between the predefined dictionary and the actual signal characteristics, and the estimation precision will be significantly compromised by grid mismatch or modulation-induced energy dispersion. Maximum Likelihood (ML)-based methods offer the advantage of high theoretical estimation accuracy with relatively low computational complexity. However, existing studies typically assume an ideal unmodulated signal model with a single frequency transition. Consequently, these ML-based methods suffer from severe model mismatch when processing FH signals with digital modulation, such as Phase Shift Keying (PSK), or multi-hop signals. Moreover, the conventional iterative solution of ML-based methods is prone to divergence or trapping in local optima. To address these limitations, this paper proposes an improved ML-based method for the blind parameter estimation of PSK-modulated FH signals.  Methods  To handle received multi-hop signals, a signal slicing technique based on the Short-Time Fourier Transform (STFT) is proposed to extract slices containing individual frequency transitions. Subsequently, to mitigate the model mismatch caused by digital modulation in conventional ML-based methods, a model-matching signal extraction approach based on the ML objective function is developed for PSK-modulated FH signals. Furthermore, a weighted iterative solving algorithm for ML estimation is designed to enhance convergence, thereby achieving robust and accurate estimation of frequency-hopping parameters.  
Results and Discussions  To validate the effectiveness of the model-matching signal extraction approach, ablation experiments were carried out under various modulation schemes, including binary PSK (BPSK), quadrature PSK (QPSK), and 8-ary PSK (8PSK). The results indicate that the proposed approach (Group D) significantly reduces the Mean Square Error (MSE) of hopping frequency estimation compared to that without the proposed extraction (Group ND). These results demonstrate that the proposed method effectively mitigates the model mismatch (Fig. 5). Simulation results also illustrate that the designed weighted iterative algorithm achieves superior convergence performance compared with linear weighting and non-weighting schemes (Fig. 6). Moreover, the experiments verify the algorithm's insensitivity to initial frequency offsets, showing that it tolerates offsets of up to 2 MHz at SNR of -10 dB with little performance degradation (Fig. 7). Finally, comparative analysis with representative existing methods indicates that the proposed method outperforms the others in terms of estimation accuracy (Fig. 8).  Conclusions  To achieve blind parameter estimation for PSK-modulated FH signals, this paper proposes an improved ML-based method. By utilizing a signal slicing technique based on the STFT, the proposed method successfully extends the applicability of the ML-based estimator to continuous multi-hop signals. To mitigate the model mismatch induced by PSK modulation, a model-matching signal extraction approach is developed to isolate valid signal segments that conform to the ML model. Furthermore, a weighted iterative algorithm incorporating a dynamic weighting function is introduced to address the instability of the conventional iterative ML solver. Simulation results confirm that the proposed method effectively eliminates model mismatch and ensures superior convergence performance with insensitivity to initial frequency offsets. 
Moreover, it is shown to achieve high estimation precision for both hopping frequencies and hopping times.
A Semantic-Enhanced Cybersecurity Named Entity Recognition Approach Oriented to Lightweight Adaptation of Large Language Models
HU Ze, XU Tongwu, YANG Hongyu
Available online  , doi: 10.11999/JEIT251260
Abstract:
  Objective  Named Entity Recognition (NER) in the field of cybersecurity is a fundamental technology supporting threat intelligence analysis, vulnerability management, and security incident response. However, this field generally faces challenges such as dense technical terms, scarce labeled data, dynamic changes in entity categories, and highly complex semantic features, which make traditional deep learning models and existing Large Language Models (LLMs) significantly inadequate in terms of domain adaptability and semantic fusion capability. To address the aforementioned key issues while also considering the need for lightweight model deployment, this paper aims to construct a cybersecurity NER approach that can enhance domain semantic representation, improve the ability to identify rare entities, and apply to low-resource environments, providing a reliable technical path for intelligent threat analysis in cybersecurity scenarios.  Methods  To address the complex semantic features of cybersecurity texts, this paper proposes a semantically enhanced, lightweight, and LLMs-adaptable cybersecurity NER approach. The proposed approach uses LLM2Vec to achieve bidirectional semantic reconstruction of large model decoders and combines Low-Rank Adaptation (LoRA) for low-rank fine-tuning, so as to maintain deep semantic encoding capability while significantly reducing the amount of parameter updates. To address the challenges of sparse keywords and severe noise interference in cybersecurity texts, a sparse gated attention mechanism is introduced to strengthen keyword-focused feature extraction by dynamically selecting high-contribution cybersecurity terms through global gating and sparse inference. 
A SecRoBERTa-based semantic enhancement component is introduced, which utilizes a domain-pre-trained model to generate similar word embeddings, optimizes feature robustness in small-sample scenarios, and alleviates the challenges of identifying out-of-vocabulary words and low-frequency terms. Finally, a masked conditional random field is employed to constrain label transitions and guarantee BIO-compliant output sequences, achieving robust and consistent entity boundary prediction.  Results and Discussions  Extensive experiments were conducted on two public cybersecurity datasets, DNRTI and APTNER. The proposed approach achieved an F1 score of 91.91% on DNRTI, surpassing the previous state-of-the-art model by 2.14%. On APTNER, it reached an F1 score of 80.37%, outperforming the best baseline by 2.97%. Ablation studies confirmed the contribution of each key component: the Sparse Gated Attention mechanism improved F1 by 3.57% over standard Multi-Head Attention on DNRTI; the semantic enhancement module contributed a 2.32% F1 gain; and the MCRF (Masked Conditional Random Field) layer provided a 10.63% F1 improvement over traditional CRF (Conditional Random Field). The model also demonstrated efficient training and inference characteristics, aligning with its lightweight design goals.  Conclusions  This paper proposes a lightweight adaptation approach based on LLMs for NER in the cybersecurity domain, which effectively addresses the limitations of existing LLMs-based NER methods in domain adaptation and rare entity recognition. By integrating LLM2Vec and LoRA for lightweight fine-tuning, a sparse gated attention mechanism for domain feature fusion, and a SecRoBERTa-based semantic enhancement component for similar word precomputation, the proposed approach achieves high performance on DNRTI and APTNER datasets. 
The research provides an efficient technical path for NER tasks in low-resource cybersecurity scenarios and offers strong support for downstream tasks such as automated threat intelligence analysis.
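The role of the masked conditional random field can be illustrated with a minimal sketch of a BIO transition mask. The rule encoded here (an I- tag may only follow a B- or I- tag of the same entity type) is the standard BIO constraint; the tag names are hypothetical and the code is not the authors' implementation. In a masked CRF, transitions marked False would typically receive a very large negative score so that decoding never selects them.

```python
def allowed(prev, curr):
    """Return True if the BIO transition prev -> curr is legal."""
    if curr.startswith("I-"):
        entity = curr[2:]
        # I-X may only continue an entity of the same type X.
        return prev in ("B-" + entity, "I-" + entity)
    # O and B-* may follow any tag, including the START state.
    return True


def build_mask(tags):
    """Map every (previous tag, current tag) pair to a legality flag."""
    mask = {}
    for prev in ["START"] + tags:
        for curr in tags:
            mask[(prev, curr)] = allowed(prev, curr)
    return mask


# Hypothetical label set with two entity types.
tags = ["O", "B-MAL", "I-MAL", "B-TOOL", "I-TOOL"]
mask = build_mask(tags)
```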
A High-Performance Eye Tracking Method Based on Event Camera and Dual-Channel Differential Illumination
SONG Sishun, FENG Junchi, PU Chengyu, GUO Yu, LIU Shijie, HE Xin, CHENG Yuwei
Available online  , doi: 10.11999/JEIT251162
Abstract:
  Objective  Eye tracking has become an essential technology in human–computer interaction, medical diagnostics, cognitive neuroscience, and augmented/virtual reality applications. However, traditional eye tracking systems often suffer from two major limitations: low spatial accuracy and restricted temporal resolution, particularly in high-speed eye movement scenarios. These limitations hinder precise gaze estimation and reduce the reliability of real-time interactive systems. To address these challenges, this research integrates an event camera with the dual-channel differential illumination strategy to enhance the signal-to-noise ratio of corneal reflection events. By introducing the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, accurate localization of corneal reflection points is achieved. On this basis, the corneal reflection point coordinates are utilized in combination with Singular Value Decomposition (SVD) and the least-squares method to determine the corneal curvature center, thereby significantly improving the accuracy of gaze direction estimation. This research provides an efficient technical pathway for next-generation eye tracking systems and offers theoretical support for their deployment in complex interactive environments.  Methods  The proposed event-camera-based gaze tracking method integrates asynchronous eye movement event data through a dual-channel differential illumination framework, thereby enhancing gaze direction estimation accuracy under high-speed and dynamic conditions. Firstly, the event camera asynchronously captures brightness-change events with microsecond-level temporal resolution, enabling precise tracking of rapid eye movements, while the dual-channel differential illumination mechanism suppresses redundant reflections and enhances the contrast of corneal reflection points. 
Secondly, the DBSCAN algorithm is employed to process event data, effectively removing noise and optimizing the spatial localization accuracy of corneal reflection features. Finally, a ray-tracing model is reconstructed using SVD and least-squares fitting to determine the corneal curvature center, thereby achieving robust and high-precision gaze direction estimation. Experimental results on a biomimetic eye movement dataset demonstrate that the proposed method achieves high temporal resolution, localization accuracy, and robustness in dynamic tracking scenarios.  Results and Discussions  Experiments demonstrate that the proposed method achieves a temporal resolution of 25 kHz (Fig. 6), far exceeding conventional cameras. Differential illumination significantly improves the signal-to-noise ratio of corneal reflection events. The DBSCAN algorithm localizes corneal reflection points more efficiently than K-Means, Agglomerative Clustering, Mean Shift, and OPTICS, achieving accurate results within 10 ms without requiring a predefined number of clusters (Fig. 8, Table 3). For gaze estimation, the proposed method maintains stable accuracy across sampling frequencies from 2 kHz to 25 kHz. At a 15° cone angle, the mean error (ME) and root mean square error (RMSE) are approximately 0.66° and 0.67°, respectively, while at 25° they increase slightly to 0.87° and 0.90° (Table 4). Compared with existing state-of-the-art (SOTA) gaze tracking methods, the proposed approach demonstrates superior overall performance in terms of both temporal resolution and accuracy (Table 5). Trajectory results (Fig. 9) show close alignment between estimated and ground truth gaze paths, and distribution analyses (Fig. 10) confirm concentrated error ranges below 1°.  Conclusions  This paper presents a novel eye tracking method integrating an event camera with dual-channel differential illumination. 
The method achieves high temporal resolution (25 kHz), enhances event signal quality, and reduces localization errors, yielding gaze estimation errors of less than 1°. The proposed approach provides a reliable technical pathway for next-generation high-performance eye tracking systems. Future work should consider sensor noise modeling and computational optimization to further improve real-world applicability.
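The clustering step can be illustrated with a minimal, dependency-free DBSCAN sketch that groups event pixel coordinates and takes each cluster centroid as a corneal reflection location. This is an illustration under assumptions, not the authors' code: the event coordinates, `eps`, and `min_pts` values are hypothetical, and using the centroid as the glint position is an assumed post-processing choice.

```python
def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or -1 for noise."""
    eps2 = eps * eps

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points)
                if (x - xi) ** 2 + (y - yi) ** 2 <= eps2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


def cluster_centroids(points, labels):
    """Mean position of each non-noise cluster."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == -1:
            continue
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


# Hypothetical event coordinates: two glints plus one isolated noise event.
events = [(10, 10), (11, 10), (10, 11), (11, 11),
          (40, 40), (41, 40), (40, 41), (41, 41),
          (90, 5)]
labels = dbscan(events, eps=1.5, min_pts=3)
glints = cluster_centroids(events, labels)
```

The number of clusters is never supplied, which matches the property the abstract highlights: the number of reflection points emerges from the event density.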
Multi-projection Plane InISAR 3D Reconstruction Method for Complex Moving Ship Targets
LI Ning, NIU Jinfa, WANG Weibin, HU Xingwang, WU Lin
Available online  , doi: 10.11999/JEIT251268
Abstract:
  Objective  Interferometric Inverse Synthetic Aperture Radar (InISAR) is a Three-Dimensional (3D) reconstruction technique for non-cooperative targets. However, the complex 3D rotational motion of a ship target causes unstable Doppler frequency changes, and Inverse Synthetic Aperture Radar (ISAR) imaging inevitably suffers from target overlap and occlusion, making high-precision, complete 3D reconstruction difficult under a single projection plane. Therefore, a multi-projection-plane InISAR 3D reconstruction method for complex moving ship targets based on point cloud fusion is proposed. Efficient, high-precision point cloud registration and fusion supplement the target’s 3D information and significantly improve 3D reconstruction quality.  Methods  The method exploits the multi-plane observation opportunities created by the severe motion of ship targets. First, the ship’s centerline is extracted and the vertical rotation vector is estimated via Principal Component Analysis (PCA) to select the optimal imaging times corresponding to different Imaging Projection Planes (IPPs), after which ISAR imaging and InISAR 3D reconstruction are completed. Second, a point cloud fusion algorithm combining weighted Random Sample Consensus (RANSAC) and hierarchical Iterative Closest Point (ICP) is proposed. The random sampling process is optimized through a feature stability weighting strategy, which efficiently extracts and matches corresponding feature points in InISAR images and achieves high-precision multi-IPP point cloud fusion.  Results and Discussions  Experimental results demonstrate that the proposed method significantly enhances reconstruction accuracy and target completeness. For simulated ship point target data, the results in Fig. 7 show a significant reduction in reconstruction error. 
Signal-to-noise ratio (SNR) analysis reveals that 3D fusion imaging quality improves continuously as SNR increases from –10 dB to 10 dB, maintaining robust fusion performance even under low SNR conditions. For simulated destroyer Radar Cross Section (RCS) data, the method achieved effective registration, and the detail recovery and structural integrity of the fused image were significantly improved, effectively solving the problem of incomplete 3D information reconstruction caused by overlapping and occlusion of scattering points.  Conclusions  To address the issues of low reconstruction accuracy and information loss caused by target rotation, overlapping, and occlusion in traditional InISAR methods for 3D reconstruction of complex moving ship targets, this paper proposes a multi-IPP InISAR 3D reconstruction method based on point cloud fusion. The method employs a PCA-based optimal imaging time selection strategy and uses weighted RANSAC and hierarchical ICP algorithms to achieve efficient and high-precision registration and fusion of InISAR point clouds under multiple IPPs, yielding high-quality 3D reconstruction results. Multi-scenario experiments are conducted by constructing a ship model with ideal scattering points and an electromagnetic simulation RCS model with occlusion effects, verifying the accuracy of the proposed method under ideal conditions and its applicability in complex real-world scenarios.
A Clipped NMS List Decoding Algorithm of LDPC Codes for 5G URLLC
ZHANG Xiaojun, SONG Xin, GAO Jian, MI Yonghao, NIU Kai
Available online  , doi: 10.11999/JEIT250853
Abstract:
  Objective  As one of the coding schemes in the fifth-generation (5G) wireless communication systems, Low-Density Parity-Check (LDPC) codes can achieve performance close to the Shannon limit through iterative decoding. However, in practical wireless transmission environments, the decoding performance of LDPC codes is susceptible to burst interference in wireless channels. The Normalized Min-Sum (NMS) decoding algorithm is highly sensitive to the distribution characteristics of the input Log-Likelihood Ratios (LLRs). Burst interference causes the LLRs to deviate from the Gaussian distribution, which degrades decoding performance. Meanwhile, 5G LDPC decoders are often equipped with a fixed number of processing units (PEs), sized according to the maximum lifting size, to cover the full range of code lengths. In Ultra-Reliable Low-Latency Communication (URLLC) short-code transmission scenarios, the lifting size is much smaller than the maximum lifting size, so a large number of PEs remain idle for long periods and hardware resources are underutilized. To address these issues, this paper proposes a Clipped Normalized Min-Sum List (CNMSL) decoding algorithm. By co-designing burst interference smoothing and idle resource reuse, it improves hardware resource utilization while enhancing decoding performance.  Methods  The statistical characteristics of LLRs over Additive White Gaussian Noise (AWGN) and interference channels are first analyzed, and the analysis shows qualitatively that burst interference degrades decoding performance by increasing the proportion of saturated LLRs. Next, the dependence of the optimal clipping threshold on the channel noise variance, the burst interference variance, and the burst probability is verified. When the channel parameters vary within a limited range, the optimal threshold stays within a finite interval, termed the optimal threshold interval. On this basis, the CNMSL decoding algorithm is proposed. 
This algorithm constructs a list decoding architecture by reusing idle processing units in 5G LDPC decoders, where each decoding path performs independent and synchronous decoding to generate candidate codewords, and the optimal decoding result is selected via a Cyclic Redundancy Check (CRC). Meanwhile, an independent clipper is configured for each path with parameters set according to the optimal threshold interval, thereby mitigating the adverse effects of burst interference.  Results and Discussions  Experimental results show that the layered NMS algorithm almost fails to decode over interference channels without a clipping mechanism. With a single clipping threshold, the algorithm works normally, and its Block Error Rate (BLER) first decreases and then increases as the clipping threshold is reduced. Under various channel conditions for both short and long codes, the single-clipping layered NMS algorithm with a clipping threshold of 3.5 achieves a gain of about 1 dB at $ \mathrm{BLER}=10^{-2} $ compared with a threshold of 10, and the CNMSL algorithm further yields an additional gain of about 0.5 dB relative to the single-clipping NMS algorithm. In terms of hardware efficiency, when the lifting factor is less than 192, the PE utilization of the CNMSL algorithm is significantly higher than that of the layered NMS algorithm, with a larger improvement as the lifting factor decreases, and the average PE utilization of the CNMSL algorithm is increased by 69% compared with the layered NMS algorithm.  Conclusions  The CNMSL decoding algorithm is proposed in this paper, aiming to improve the error correction performance of the traditional layered NMS decoding algorithm over interference channels. By reusing idle PEs for list decoding to generate multiple candidate paths, the algorithm incurs no additional hardware overhead. 
In addition, an optimal threshold interval is defined to configure the clipper for each decoding path, which limits the proportion of saturated LLRs and makes the input LLRs follow a Gaussian or near-Gaussian distribution. Experimental results show that compared with the layered NMS decoding algorithm with a single clipper, the proposed CNMSL algorithm achieves a gain of approximately 0.5 dB for both short and long codes. Meanwhile, it increases the PE utilization by an average of 69%.
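The clipping and list-selection idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the normalization factor `alpha=0.75`, and the `decode` and `crc_ok` callbacks are assumptions made for illustration, and the paths are tried sequentially here, whereas the abstract describes them running in parallel on idle PEs.

```python
import numpy as np

def clip_llrs(llrs, threshold):
    """Limit LLR magnitudes to +/- threshold to tame burst-induced outliers."""
    return np.clip(np.asarray(llrs, dtype=float), -threshold, threshold)

def nms_check_node(incoming, alpha=0.75):
    """Normalized min-sum update for one check node (needs >= 2 edges).

    Each outgoing message uses the sign product and the minimum magnitude
    of all *other* incoming messages, scaled by alpha (assumed value).
    """
    incoming = np.asarray(incoming, dtype=float)
    signs = np.where(incoming < 0, -1.0, 1.0)
    total_sign = np.prod(signs)
    mags = np.abs(incoming)
    order = np.argsort(mags)
    min1, min2 = mags[order[0]], mags[order[1]]
    # The edge holding the smallest magnitude receives the second smallest.
    out_mag = np.where(np.arange(len(mags)) == order[0], min2, min1)
    # total_sign * signs equals the sign product of the other edges.
    return alpha * total_sign * signs * out_mag

def select_candidate(llrs, thresholds, decode, crc_ok):
    """Run one decoding path per clipping threshold; keep the first CRC pass.

    `decode` and `crc_ok` are hypothetical callbacks supplied by the caller.
    """
    for t in thresholds:
        candidate = decode(clip_llrs(llrs, t))
        if crc_ok(candidate):
            return candidate, t
    return None, None
```

In such a sketch, each entry of `thresholds` would be drawn from the optimal threshold interval mentioned in the abstract; the values 3.5 and 10 are the only thresholds the abstract itself reports.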
Drug Response Prediction Based on Graph Topology Attention Network
XU Peng, XU Hao, BAO Zhenshen, ZHOU Chi, LIU Wenbin
Available online  , doi: 10.11999/JEIT251099
Abstract:
  Objective  A core goal in modern cancer research is to understand why patients respond differently to the same therapy. Achieving this requires developing computational tools that combine genetic information and drug properties to forecast treatment outcomes, which is essential for advancing personalized oncology. Although some existing methods have made progress in predicting cancer drug responses, effectively extracting drug features and integrating multi-omics data from cell lines remain challenging. To address these challenges, employing Graph Neural Networks (GNNs) to process drug molecular graphs has become a promising strategy. This research proposes a model that utilizes a graph topology attention network to capture features from drug molecular graphs, while an attention mechanism is applied to integrate multi-omics data.  Methods  In this study, a drug response prediction method based on the Graph Topology Attention Network (GTAT) is proposed. The model integrates topological graph information to predict drug responses in cell lines. The model utilizes drug SMILES strings to generate two distinct drug representations and incorporates multi-omics data for cell line characterization (Fig. 1). For drug feature extraction, SMILES strings are first parsed to construct molecular graphs, which are then processed by the GTAT. This network captures both graph-level topological information and atom-level features of the molecular graph, thereby producing structured molecular representations. Simultaneously, Extended Connectivity Fingerprints are computed from the same SMILES strings and transformed into continuous feature vectors via a Multi-Layer Perceptron (MLP). The graph-based drug representation and the fingerprint-based representation are subsequently concatenated to form a comprehensive drug feature vector. For cell line representation, multi-omics data are processed through omics-specific neural networks. 
The resulting features are fused using multi-head self-attention mechanisms, enabling the model to capture contextual interactions across omics modalities and generate an integrated cell line representation. Finally, the drug and cell line features are combined and fed into an MLP classifier to predict drug response outcomes. The proposed model effectively integrates heterogeneous biological data sources and significantly enhances prediction accuracy through multi-modal learning and attention-based feature fusion.  Results and Discussions  The proposed method achieves competitive performance on both GDSC and CCLE benchmark datasets (Table 2). Specifically, on the GDSC dataset, our approach outperforms all competing methods across all four metrics—AUC, AUPR, F1-score, and Accuracy. Notably, it improves the AUPR by approximately 1.92% over the second-best method, MOFGCN, demonstrating its advantage in handling class imbalance. On the CCLE dataset, our method still achieves the best performance in terms of AUC and Accuracy. Although it is marginally lower than GADRP in AUPR and F1-score, the gap is minimal, and our approach exhibits more robust overall discriminative ability (as reflected by AUC). These results collectively validate the effectiveness and strong generalizability of our method in drug sensitivity prediction tasks. The observed variation in AUPR and F1-score performance between datasets can be attributed to inherent differences in sample size and class distribution characteristics. The limited scale of the CCLE dataset, combined with its specific class imbalance (approximately 4:1 ratio of resistant to sensitive samples), may constrain the model's capacity to fully learn the underlying data distribution, particularly for minority classes. 
In contrast, the GDSC dataset exhibits greater heterogeneity and a more pronounced class imbalance (approximately 8:1), which collectively contribute to increased prediction difficulty and consequently lower performance on certain metrics.  Conclusions  Accurately predicting drug response in cell lines remains a central challenge in precision medicine, with significant implications for accelerating drug development and advancing personalized treatment. However, constructing a high-accuracy predictive model capable of effectively integrating multi-source biological information is difficult due to the complexity of drug molecular structures and inherent heterogeneity of cell lines. To address this, a cell line drug response prediction model based on Graph Topology Attention Network is proposed. This model employs the graph topology attention network to extract molecular graph features of drugs, which are then fused with molecular fingerprint features. Meanwhile, multi-omics features of cell lines are integrated using an attention mechanism. Experimental results demonstrate that the proposed model achieves superior performance over existing state-of-the-art benchmarks on the employed dataset. This study provides a new perspective for predicting cell line drug response. Certain limitations are acknowledged, such as the use of only three types of omics features for cell line representation and the influence of sample size on predictive outcomes. The integration of more diverse omics features, the application of pre-trained large-scale models, and the clinical translation for personalized medicine will be the primary focus of future work.
Multi-dimensional Spatio-temporal Features Enhancement for Lip reading
MA JinLin, ZHONG YaoWei, MA RuiShi
Available online  , doi: 10.11999/JEIT251111
Abstract:
  Objective  Lip reading is a challenging yet vital frontier in computer vision, dedicated to decoding spoken language solely from visual lip movements. The difficulty arises primarily from inherent ambiguities in the visual speech signal. On one hand, articulatory movements for different visemes can be extremely subtle, with lip displacement differences as small as 0.3–0.7 mm for confusable pairs such as /p/–/b/ and /m/–/n/. These fine-grained spatial variations often lie below the effective resolution limits of conventional 3D convolutional neural networks. On the other hand, the natural co-articulation in speech introduces temporal ambiguity, where mouth shapes transiently blend multiple phonemes, making it difficult to isolate distinct visual units. These challenges are further compounded by real-world variables such as uneven lighting and significant inter-speaker articulation differences. As a result, current lip reading models frequently exhibit limitations in capturing discriminative spatiotemporal features, leading to suboptimal performance, especially for phonemes with minimal visual distinctions. Motivated by these issues, this work aims to develop a robust lip reading framework capable of effectively capturing and leveraging fine-grained spatiotemporal dependencies to improve recognition accuracy under diverse and realistic conditions.  Methods  To address the aforementioned limitations, this study proposes a novel lip reading framework named the Multi-dimensional Spatio-Temporal Enhancement Network (MSTEN), which is systematically designed to enhance spatial and temporal representations through integrated attention mechanisms and advanced residual learning. The framework incorporates three core components that collaboratively model the interdependencies between spatial and temporal features, an aspect often underutilized in conventional architectures. 
The first component, the Self-adjusting Spatio-temporal Attention (SaSTA) module, employs a self-adjusting mechanism operating concurrently across height, width, and temporal dimensions. It generates query, key, and value tensors via 1×1×1 3D convolutions, flattens them across spatial and temporal dimensions, and computes attention weights by multiplying the query with the transposed key, followed by softmax normalization. The resulting attention map is multiplied with the value vector and then combined with the original input via learnable parameters and a residual connection to preserve contextual information, yielding globally enhanced features. The second component, the Three-dimensional Enhanced Residual Block (TE-ResBlock), augments spatiotemporal feature extraction through temporal shift, multi-scale convolution, and channel shuffle. The temporal shift operation moves a quarter of the feature channels along the time axis to fuse adjacent frame information parameter-free, while multi-scale convolution uses parallel branches with kernel sizes of 3×3, 3×1, 1×3, and 1×1 to capture diverse receptive fields. Outputs are concatenated and processed via channel shuffle to improve cross-group information flow, with four TE-ResBlocks stacked for progressive feature refinement. The third component, the Multi-dimensional Adaptive Fusion (MDAF) module, deeply integrates spatial, temporal, and channel dimensions through three sub-modules: a Channel Enhancement Module (CEM) that recalibrates features using max pooling, temporal convolution, and sigmoid activation; a Spatial Enhancement Module (SEM) that expands the receptive field via identity mapping, standard and dilated convolution; and an Adaptive Temporal Capture Module (ATCM) that emphasizes dynamic movements using frame difference features and temporal weight maps. MDAF modules are inserted between TE-ResBlock stacks for iterative refinement. 
Finally, features from the MSTEN front-end are fed into a Densely Connected Temporal Convolutional Network (DC-TCN) back-end, which comprises four blocks, each containing three temporal convolutional layers with dense connections, to effectively model long-range phonological dependencies.  Results and Discussions  The proposed framework is comprehensively evaluated on the widely used LRW and GRID datasets. LRW comprises over 500,000 video clips from more than 1,000 speakers, and GRID consists of video clips from 34 speakers, with each speaker having 1,000 utterances and a total duration of 28 hours. Our model achieves an accuracy of 91.18% on LRW, representing an absolute improvement of 2.82 percentage points over a strong ResNet18 baseline, which underscores its substantial effectiveness. Ablation studies are conducted to dissect the contribution of each key component. The results clearly demonstrate that every proposed module brings a significant performance gain. Specifically, the introduction of the SaSTA module alone leads to an accuracy improvement of 2.09%, highlighting the crucial role of global spatiotemporal attention. The TE-ResBlock contributes a 1.73% increase, confirming its efficacy in multi-scale local feature extraction and inter-frame information fusion. Moreover, the MDAF module further enhances performance by 1.74%, emphasizing the benefit of adaptive multi-dimensional feature fusion, as detailed in Table 2.  Conclusions  This study presents a significant advancement in lipreading via the introduction of the MSTEN front-end network. The work is built upon three core contributions. First, the SaSTA module introduces an innovative mechanism for global context aggregation, effectively performing multi-dimensional feature weighting across height, width, and temporal sequences. 
Second, the TE-ResBlock tackles fundamental challenges in spatio-temporal modeling through a unique combination of temporal displacement, multi-scale convolution, and enhanced channel-wise interaction. Third, the MDAF module facilitates deep and synergistic integration of information from spatial, temporal, and channel dimensions. Together, these components work in concert to achieve state-of-the-art performance, reaching an accuracy of 91.18% on the challenging LRW dataset and 97.82% on the GRID dataset. Ablation studies further validate the individual and collective efficacy of each proposed innovation. Looking forward, future work will explore the extension of this framework to audio-visual speech recognition under noisy conditions, as well as the development of domain adaptation strategies to enhance robustness in low-resolution or resource-constrained scenarios.
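The parameter-free temporal shift used in the TE-ResBlock can be illustrated with a short NumPy sketch. The abstract only states that a quarter of the feature channels is moved along the time axis; the split into one half shifted forward and one half shifted backward, the zero padding at the sequence ends, the `(T, C, H, W)` layout, and the function name are assumptions borrowed from common temporal-shift designs, not details confirmed by the source.

```python
import numpy as np

def temporal_shift(x, shift_fraction=0.25):
    """Shift a fraction of channels along time for an array shaped (T, C, H, W).

    Assumed scheme: half of the shifted channels take values from the
    previous frame, the other half from the next frame; vacated positions
    are zero-filled and the remaining channels pass through unchanged.
    """
    t, c, h, w = x.shape
    n_shift = int(c * shift_fraction)
    half = n_shift // 2
    out = np.zeros_like(x)
    # Channels [0, half): frame i receives frame i-1.
    out[1:, :half] = x[:-1, :half]
    # Channels [half, n_shift): frame i receives frame i+1.
    out[:-1, half:n_shift] = x[1:, half:n_shift]
    # Remaining channels are copied as-is.
    out[:, n_shift:] = x[:, n_shift:]
    return out
```

Because the operation is only indexing and copying, it mixes information from adjacent frames without adding learnable parameters, which matches the "parameter-free" description in the abstract.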
FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization
WANG Yanlin, GAO Lijiang, YANG Haigang
Available online  , doi: 10.11999/JEIT260108
Abstract:
6-input look-up tables (LUTs) are frequently used in commercial Field-Programmable Gate Arrays (FPGAs) to build programmable logic blocks, while related experiments reveal that their average utilization in circuits is less than 30%, resulting in a significant waste of programmable resources. In this paper, the 6-input LUTs are fractured based on fracturable factors and recombined with different granularities to construct several new Hybrid Basic Logic Elements (HBLE). Based on HBLE, several novel Hybrid Programmable Logic Block (HPLB) architectures are proposed. Then the Programmable Logic Blocks (PLBs) of Xilinx are replaced by these HPLB architectures. Concurrently, a statistical evaluation algorithm for the mapped netlist is proposed. Finally, the HPLB architectures are experimentally verified and evaluated. Experimental evaluations of the three enhanced architectures show that the HPLBs achieve an average area reduction of more than 30% when compared to Xilinx’s PLBs without adding more input ports. The HPLB architecture constructed with a fracturable factor N=3 produces the best optimization results when taking into account both HPLB utilization and area optimization. Based on the MCNC and VTR benchmarks, resource consumption is reduced by an average of 8.27% and 27.64%, respectively, thereby improving FPGA logic efficiency.  Objective  Currently, modern commercial FPGA architectures employ 6-LUTs as the fundamental building blocks for Basic Logic Elements (BLEs). Only about 30% of the Logic Elements (LEs) in the circuit are ultimately translated to 6-LUTs when mapping 6-LUT BLEs, according to experimental results. Nevertheless, more than half of the logic resources are wasted when 6-LUTs implement functions with fewer than 6 inputs. Programmable resources will unavoidably be significantly wasted as a result. 
A circuit design mapped to 100 4-LUTs can be mapped to 78 6-LUTs during 6-LUT mapping studies, according to experimental data, with the {6,5,4,3,2}-LUT function distribution being {23,32,17,9,13}. The findings indicate that only around 25% of the 6-LUTs are ultimately mapped to 6-input functions, with the remaining 6-LUTs being underutilized. This illustrates even more how inefficient technology mapping is for LUTs with large input K.  Methods  The fracturable factor N, which is the number of sub-LUTs that may be obtained from a single LUT, characterizes the fracturable and reconfigurable nature of LUT architectures in FPGAs. Motivated by this, we decompose a 6-LUT into several granularities according to the fracturable factor in order to address the previously described problem of low resource utilization. Three novel hybrid-granularity HBLE structures are created by connecting and reconfiguring the resultant sub-LUTs with additional input ports and multiplexer modules. We then investigate how FPGA performance is optimized by these three HBLE topologies. One unfractured 6-LUT and one fracturable 6-LUT, divided into two 5-LUTs with a fracturable factor N=2, make up the HBLE2 structure. One unfractured 6-LUT and one fracturable 6-LUT, divided into one 5-LUT and two 4-LUTs with a fracturable factor N=3, are included in the HBLE3 structure. One unfractured 6-LUT and one fracturable 6-LUT, which divides into four 4-LUTs with a fracturable factor N=4, make up the HBLE4 structure. Adder units are supported by all three HBLE structures, allowing for both latched and direct combinational logic output. Additionally, they allow direct latched output by bypassing combinational logic. A Hybrid Programmable Logic Block (HPLB) is a novel structure created by merging several HBLEs. 
The MCNC circuit set and the VTR circuit set, the two most well-known academic circuit benchmarks (BMs), are chosen for experimental assessment. A Xilinx Virtex-7 FPGA is used to map each circuit set. The mapped netlist is then used to tally the kinds and numbers of LUTs that were utilized. The minimum number of CLBs needed is found once the data has been arranged using the corresponding greedy algorithms. Since each Xilinx CLB has eight 6-LUTs, the greedy approach uses # Total LUT Number / 8 to determine the smallest number of CLBs needed following BM mapping. In order to guarantee similar conditions, each structure also needs to be sorted using the greedy algorithm after Xilinx’s CLB structure is replaced with the HPLB structure suggested in this research. This results in the bare minimum of HPLBs needed. It is not possible to use every LUT in the mapped CLBs during actual packing owing to routing constraints. As a result, the smallest value that may be achieved in a theoretical optimization scenario is represented by the optimized result that is acquired following greedy algorithm restructuring.  Results and Discussions  The average number of HPLBs needed for both HPLB2 and HPLB3 structures drops by about 8% when CLB structures are swapped out for HPLBs in order to map the MCNC circuit set. However, the number of HPLBs needed increases by more than 30% on average as a result of the HPLB4 structure. The needed count is smaller when HPLBs are used in place of CLBs for mapping the VTR circuit set. On average, the HPLB2 and HPLB4 counts drop by less than 10%, whereas the HPLB3 count drops by around 30%. This enables SRAM scheduling and complete input pin use. On the other hand, because of resource waste, the uniform CLB structure results in higher CLB requirements when implementing functions with a tiny LUT input K. The HPLB4 structure performs worse than the HPLB3 structure, according to post-mapping HPLB counts. 
Both the MCNC and VTR circuit sets achieve average area reduction ratios over 30%, according to analysis of post-mapping area optimization. All three HPLB structures attained area optimization ratios of about 31% on the MCNC test set. Different optimization effects were seen in the VTR test circuit set: HPLB2 produced an average area reduction of 30.63%, whereas HPLB4 produced an average decrease of 51.21%. The HPLB3 structure produced a 45.22% area reduction, even though its optimization effect was marginally less than that of HPLB4. A thorough examination of the area optimization results showed that a higher fracturable factor N produces more noticeable benefits for integrating small-scale LUTs in circuits, resulting in higher area reduction ratios from the enhanced architectures.  Conclusions  In order to solve the issue of low resource utilization in 6-LUTs, this research proposes three HPLB enhancement architectures based on fracturing granularity. In addition to establishing an assessment procedure and matching algorithms for the enhanced structures, these HPLBs take the place of Xilinx’s CLB structure in order to examine the new structure’s benefits in resource utilization. Based on the proportion differences of different LUTs in the post-mapping netlist, evaluation experiments using the MCNC and VTR circuit test suites show that, although HPLB4 achieves significant area optimization, it requires additional HPLBs, resulting in increased interconnect area. While both HPLB2 and HPLB3 structures obtain average area optimizations over 30%, HPLB3 produces a significantly greater HPLB count reduction and area optimization than HPLB2 as the test circuit scale grows. Thus, after replacing the CLB structure, the HPLB3 structure provides a more balanced optimization impact, greatly improving the utilization of programmable resources when taking into account the combined aspects of HPLB usage count and area optimization.
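The baseline counting step in the Methods (total LUT number divided by 8, since each Xilinx CLB holds eight 6-LUTs) can be written as a small sketch. The histogram layout, the function names, and the example numbers in the usage note are hypothetical; the packing rules for the HPLB structures themselves are not given in the abstract and are therefore not modeled.

```python
import math

LUTS_PER_CLB = 8  # stated in the abstract: each Xilinx CLB has eight 6-LUTs

def min_clbs(lut_histogram):
    """Theoretical minimum CLB count: ceil(total LUTs / 8).

    `lut_histogram` maps LUT input size K to the number of mapped
    functions of that size (hypothetical input format).
    """
    total = sum(lut_histogram.values())
    return math.ceil(total / LUTS_PER_CLB)

def share_of_k_input(lut_histogram, k=6):
    """Fraction of mapped functions that actually use k inputs."""
    total = sum(lut_histogram.values())
    return lut_histogram.get(k, 0) / total if total else 0.0
```

For an illustrative netlist of 100 functions, of which 20 use all six inputs, `min_clbs` gives 13 CLBs and `share_of_k_input` gives 0.2; as the abstract notes, routing constraints mean real packing cannot reach this lower bound.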
Efficient and Verifiable Ciphertext Retrieval Scheme Based on Trusted Execution Environment
WU Axin, FENG Dengguo, ZHANG Min, CHI Jialin, YI Yuling
Available online  , doi: 10.11999/JEIT251358
Abstract:
The ciphertext retrieval mechanism enables retrieval functionality over encrypted data. Symmetric Searchable Encryption (SSE) is a critical branch of ciphertext retrieval. However, for reasons such as saving computing power, cloud servers may return incorrect or incomplete results. Moreover, attackers can exploit information leaked from search and access patterns to reconstruct keyword details. Therefore, it is necessary and meaningful to protect the privacy of search and access patterns while achieving result verifiability. Nevertheless, existing verifiable SSE schemes that support search and access pattern privacy typically rely on keyword traversal mechanisms, and their verification mechanisms are inefficient, which imposes high computational and communication overheads on users. To address these performance bottlenecks, this paper introduces an efficient and verifiable ciphertext retrieval scheme based on a Trusted Execution Environment (TEE). To improve the efficiency of ciphertext retrieval, the scheme combines hardware-level security isolation with oblivious data rearrangement so that the keyword trapdoor size is independent of the size of the keyword dictionary. Meanwhile, the correctness of the returned results is verified by embedding random numbers and blinding polynomial constant terms. Thanks to these designs, the scheme achieves significant efficiency improvements. First, the scheme ensures that the size of keyword trapdoors depends solely on the number of query keywords, not the global dictionary size, effectively minimizing communication and computational costs. Second, the scheme requires storing only two random numbers to enable verifiability, substantially minimizing local storage overhead for users. 
Thirdly, the adoption of techniques, such as enabling data users to retrieve results via single-server and single-round interaction and leveraging symmetric homomorphic encryption, further enhances operational efficiency. Additionally, confidential computing within TEE weakens the security assumptions and trust level towards TEE. After formally proving the security of the proposed scheme using simulation-based methods, this paper has conducted a comprehensive performance evaluation. The evaluation results confirm that this scheme is significantly more efficient than other schemes with the same functionalities.
Physical Layer Security Game for Large Language Model-Based Inference in the Maritime Network
CHEN Haoyu, XIAO Liang, XU Xiaoyu, LI Jieling, WANG Zicheng, LIU Huanhuan, CHEN Hongyi
Available online  , doi: 10.11999/JEIT251269
Abstract:
  Objective  The physical-layer security game reveals the interaction between user equipment (UE) and attackers, and provides performance bounds of anti-jamming transmission and physical-layer authentication schemes based on the equilibria. However, existing game models overlook smart attackers that send jamming or spoofing signals, fail to account for maritime wireless channels affected by evaporation ducts and sea wave fluctuations, and cannot readily evaluate the performance of large language model (LLM)-based inference, such as vessel traffic monitoring.  Methods  The anti-jamming maritime communication game for LLM inference is formulated, where the jammer first selects the jamming power and channel to reduce the signal-to-interference-plus-noise ratio at the server with less jamming cost, and the UEs then choose the transmit power, channel, LLM sparsity ratio and control center to send sensing data (e.g., images, temperature, and humidity) to enhance the inference accuracy with less latency. The physical-layer authentication game for maritime wireless networks with LLM inference is further formulated. The spoofing attacker first selects the number of spoofing packets to degrade authentication accuracy with less cost. The control center then selects either the fast authentication mode based on channel state or the safe authentication mode based on the received signal strength and the arrival interval of the packet from multiple ambient transmitters, together with the test threshold, to increase accuracy with less cost.  Results and Discussions  Based on the Stackelberg equilibrium (SE) under the LLM with 7 billion parameters, the performance bounds of the reinforcement learning (RL)-based anti-jamming inference scheme are provided to reveal the impact of evaporation duct height, wave height, maximum sparsity ratio of LLM and the quantization level on inference accuracy and latency. 
In addition, the performance bounds of the RL-based maritime spoofing detection scheme are provided based on the SE of the physical-layer authentication game to show the impact of the maximum number of spoofing packets on the authentication accuracy. Simulations are carried out based on the five UEs with the antenna height of 3 meters offloading the image, temperature and humidity using the transmit power up to 200 mW at 5.8 GHz with a bandwidth of 20 MHz to five control centers with antenna heights of 6 m. The jammer applies Deep Q-Network to choose the jamming power with a maximum transmit power of 200 mW for each 5.8 GHz channel, and the spoofing attacker applies the Deep Q-Network to select the number of spoofing packets up to 100. The results show that the inference accuracy and latency of the RL-based anti-jamming maritime communication scheme for LLM inference converge to the performance bounds with gaps of less than 0.6% after 2500 time slots. In addition, the RL-based authentication scheme converges after 1000 time slots with the gap of less than 1.6%.  Conclusions  In this paper, we have formulated the maritime physical-layer security game for LLM inference, addressing scenarios such as anti-jamming sensing data transmission and spoofing detection, aiming at investigating how UEs determine transmit power and channel, and how the control center selects authentication modes and test thresholds to enhance the physical-layer security mechanisms. The attacker chooses attack modes and parameters to degrade the inference accuracy, increase latency, and even cause denial-of-service. Based on the SE and the conditions, the performance bounds of the inference accuracy increase with the maximum transmit power and linearly decrease with the sparsity ratio. Furthermore, the impact of the maximum number of spoofing packets on the inference accuracy is provided. 
Simulation results show that the RL-based maritime physical-layer security schemes converge to the performance bounds, thereby validating the accuracy and effectiveness of the game model.
A Method for Parallel Testing of Interlayer Vias in Monolithic 3D Integrated Circuits
CHEN Tian, CHEN Weikun, LIU Jun, LIANG Huaguo, LU Yingchun
Available online  , doi: 10.11999/JEIT251375
Abstract:
  Objective  As device dimensions in conventional two-dimensional integrated circuits approach fundamental physical limits, further improvements in performance and integration density face significant challenges. Monolithic three-dimensional integrated circuits (M3D ICs), which sequentially stack multiple active device layers on a single wafer, provide an effective solution to overcome these limitations. In M3D ICs, monolithic inter-tier vias (MIVs) are employed to realize vertical interconnections between device tiers. Compared with through-silicon vias (TSVs), MIVs feature much smaller dimensions, lower parasitic capacitance, and shorter interconnect delay. However, their small electrical variations and massive quantity cause defects to manifest mainly as subtle delay shifts, posing stringent requirements on test accuracy, efficiency, and robustness against Process, Voltage, and Temperature (PVT) variations. Existing MIV testing approaches suffer from limited scalability, strong PVT sensitivity, and difficulty in simultaneously achieving small-delay defect detection and fault localization in large-scale arrays. To address these challenges, a parallel MIV testing method based on a time-to-digital converter (TDC) is presented to enable efficient and reliable testing of large MIV arrays with low area and time overhead.  Methods  Large-scale MIVs are logically organized into a two-dimensional array structure. Each basic test cell consists of a device-under-test MIV, a tri-state buffer, and a D flip-flop, and multiple cells are cascaded to form row test chains and column test chains. By systematically exploiting the inherent input capacitance mismatch between the data and clock terminals of the D flip-flop, an embedded TDC structure incorporating the MIV under test is constructed. 
Test stimuli are generated by a digitally controlled delay line (DCDL), which produces START and STOP pulse signals with multiplicatively adjustable phase differences and injects them into different propagation paths of the test chains, enabling time quantization through a signal chasing mechanism. Structural symmetry between the test chains is employed to mitigate the influence of PVT variations. As the START and STOP phase difference is progressively amplified, multiple TDC readings are collected to characterize defect-induced small delay variations and to distinguish them from measurement noise and PVT-induced fluctuations. After fault information is obtained for individual test chains, cross-analysis of row and column test results enables fault localization within the two-dimensional MIV array.  Results and Discussions  Simulation results based on the Nangate 45 nm standard cell library demonstrate that, under fault-free conditions, TDC readings obtained at different phase difference settings exhibit a stable linear proportional relationship (Fig. 7). Extensive Monte Carlo simulations are performed to determine a robust deviation tolerance threshold of 2, which effectively separates normal variations caused by PVT fluctuations from abnormal shifts induced by defects. Fault injection experiments verify that small delay defects occurring on both the START chain and the STOP chain can be effectively detected and distinguished (Fig. 8). In terms of quantitative detection capability, the minimum detectable resistive open defect is approximately 8.4 kΩ, while the maximum detectable leakage defect and resistive short defect are about 67 kΩ and 32 kΩ, respectively, outperforming existing methods (Fig. 9). Moreover, the row–column decomposition architecture effectively alleviates the growth of test time as the MIV array size increases, resulting in a substantial reduction in overall test overhead. 
Area evaluation indicates that the average area overhead of the embedded built-in self-test structure is only 5.594 µm² per MIV, making it suitable for high-density M3D integration.  Conclusions  A parallel TDC-based testing approach for large-scale MIV arrays is presented, which combines row–column decomposition, phase-difference multiplication, and proportional deviation-based decision mechanisms to achieve efficient detection and accurate localization of both hard faults and small delay defects. Structural symmetry within the test chains effectively enhances robustness against PVT variations. Simulation results confirm that the proposed method can reliably detect resistive open, leakage, and short defects while maintaining low area and time overhead. Compared with existing techniques, a favorable balance among test accuracy, PVT robustness, test efficiency, and hardware cost is achieved. Owing to its scalability and practical feasibility, the proposed approach provides an effective and reliable solution for MIV testing in advanced monolithic three-dimensional integrated circuits.
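The decision and localization steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's circuit-level procedure: the abstract reports that fault-free TDC readings scale proportionally with the phase-difference setting and that a deviation tolerance of 2 separates PVT variation from defects, but the exact deviation metric is not given, so the comparison against a scaled first reading and both function names are hypothetical.

```python
def chain_is_faulty(readings, multipliers, tolerance=2):
    """Flag a test chain whose TDC readings break the proportional trend.

    Assumption: the reading at the first setting defines the per-unit
    value, and every other reading should stay within `tolerance` counts
    of that value scaled by its phase-difference multiplier.
    """
    per_unit = readings[0] / multipliers[0]
    return any(
        abs(r - per_unit * m) > tolerance
        for r, m in zip(readings, multipliers)
    )

def locate_faults(faulty_rows, faulty_cols):
    """Cross row and column results to list candidate faulty MIV positions."""
    return [(r, c) for r in sorted(faulty_rows) for c in sorted(faulty_cols)]
```

With a single defect, crossing one faulty row chain with one faulty column chain pinpoints one MIV. With several simultaneous defects, the cross product in this sketch may also list positions that are not actually faulty, so it should be read as a candidate set.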
Modulation Recognition Method for High-Speed Mobile Communication Based on Attention Dynamic Fusion and Hybrid Pruning Transformer
ZHENG Qinghe, CHEN Bin, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT251211
Abstract:
  Objective  Automatic modulation recognition is a critical preprocessing step in dynamic spectrum access and anti-jamming communication systems, directly impacting the robustness and spectrum efficiency of non-cooperative communication. In high-speed mobile communication scenarios such as satellite, high-speed rail, and drone swarm communications, signal modulation features suffer severe distortion due to Doppler shifts, time-varying channels, and non-stationary interference. The above issues pose significant challenges to traditional modulation recognition methods based on static assumptions, leading to feature mismatch and increased misjudgment rates. To address the issues of insufficient robustness and real-time performance in existing deep learning-based modulation recognition models under high-speed mobile environments, this paper proposes a lightweight dynamic fusion Transformer-based approach.  Methods  The proposed method consists of three main components: signal representation fusion block, Transformer model design, and model pruning for lightweight inference. First, a RollingQ mechanism is introduced to dynamically adjust the direction of attention query matrix based on the quality of each signal representation, breaking the cycle of attention fixation and achieving the balanced utilization of all types of signal representations. Then, the multi-head attention frequency enhancement Transformer (MAFE-Transformer) is designed, which integrates local and global spatiotemporal features through modules including lightweight convolutional enhancement, multi-attention feature extraction, and frequency learning and selection. Finally, an attention-based dynamic hybrid pruning strategy is applied to reduce structural redundancy and accelerate inference, enabling real-time modulation recognition.  Results and Discussions  Extensive experiments are conducted on two public datasets, RadioML 2016.10a and RML22, to validate the effectiveness of the proposed method. 
The MAFE-Transformer achieves average classification accuracies of 65.14% and 78.40% on the two datasets, respectively. Under low SNR conditions of –20 dB to 0 dB, the model demonstrates strong robustness, particularly on the RML22 dataset with dynamic channel model ETU70 (Fig. 5). The confusion matrix shows that the error distribution of MAFE-Transformer is relatively uniform among different modulation schemes, reflecting its well-balanced classification performance (Fig. 6). Ablation studies confirm that the RollingQ-based dynamic fusion mechanism improves accuracy by 7.2% on RadioML 2016.10a and 9.5% on RML22 compared to single signal representation (Fig. 7). The hybrid pruning strategy reduces inference latency to 2.2 ms per signal while maintaining high accuracy (Fig. 8). Comparative experiments show that the proposed model outperforms several state-of-the-art deep learning models (e.g., Ms-RaT, MobileViT, MobileRaT, and KA-CNN) by 4%–10% in recognition accuracy, demonstrating superior performance in high-speed mobile communication scenarios (Fig. 9).  Conclusions  This paper proposes a lightweight dynamic fusion Transformer-based automatic modulation recognition method to address the challenges of robustness and real-time performance in high-speed mobile communication environments. By introducing the RollingQ mechanism and the MAFE-Transformer structure combined with dynamic hybrid pruning, the proposed method achieves a better trade-off between recognition accuracy and inference efficiency. Experimental results on public datasets confirm its effectiveness and robustness under complex channel conditions with Doppler shifts and time-varying interference. However, the proposed method has not been systematically evaluated under more complex interference such as impulsive noise or frequency-selective fading. Future work will focus on improving adaptability to non-stationary noise, cross-device generalization, and optimization for edge deployment.
Design and Verification of Robust Modulation Recognition Framework Under Blind Adversarial Attacks
ZHENG Qinghe, ZHOU Fuhui, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT260019
Abstract:
  Objective  Deep learning-based automatic modulation recognition (AMR) models have demonstrated superior performance in non-cooperative communication systems such as cognitive radio and spectrum monitoring. However, the inherent vulnerability of deep learning models to adversarial attacks, where imperceptible perturbations can cause catastrophic misclassification, poses a severe security threat. Existing defense methods, including adversarial training, often rely on prior knowledge of specific attacks, incur significant computational overhead, and face a trade-off between robustness and accuracy on clean samples. To address these limitations, this paper aims to design and validate a robust modulation recognition framework that can operate effectively under blind adversarial attack scenarios without prior knowledge of the attack type and strategy, thereby ensuring the reliable deployment of intelligent communication systems in adversarial environments.  Methods  The proposed framework integrates a novel feature-purifying autoencoder module with standard modulation classifiers (CNN and Transformer). The core innovation lies in the autoencoder’s bottleneck layer, which incorporates a dynamic purification mechanism. This mechanism first calculates an adaptive threshold based on the statistical properties of the encoded latent features to identify anomalies. Subsequently, the Top-K sparsification operation selectively preserves only the most significant feature activations, effectively suppressing noise and adversarial perturbations while retaining essential signal characteristics. The autoencoder is then trained via a three-stage curriculum learning strategy that sequentially optimizes reconstruction fidelity, feature sparsity, and semantic consistency between the purified and original clean signals, ensuring the output aligns with the true modulation manifold. This model-agnostic module can be seamlessly prepended to any trained classifier without retraining.  
Results and Discussions  Comprehensive experiments are conducted on a simulated dataset encompassing 12 digital modulation types under multipath fading channels. The framework demonstrated substantial performance improvements. For the CNN and Transformer, the recognition accuracies under challenging targeted white-box attacks increased to 82.1% and 83.2%, and under non-targeted black-box attacks reached 87.7% and 89.4%, respectively (Table 1). The attack success rate (ASR) and attack effectiveness index (AEI) remained at low levels, confirming strong defensive capability. Figure 4 shows that defense efficacy improves with higher SNR. Crucially, the ablation study in Figure 5 highlights the indispensable role of the autoencoder, whose removal caused accuracy to drop by 4.02% and 2.36% on CNN and Transformer under strong attacks. Further analysis (Figure 6) indicates that the framework maintains robustness across a wide range of perturbation bounds ($\epsilon \leq 0.1$). Moreover, parameter sensitivity studies (Figures 7 and 8) show stable performance for threshold coefficient $\xi$ in [1.5, 1.9] and sparsity rate k around 0.7, confirming its practical deployability.  Conclusions  This paper presents a blind defense framework for robust AMR based on the feature-purifying autoencoder. The key advantages are threefold: 1) It provides effective defense against diverse white-box and black-box attacks without requiring any prior knowledge of the attack method, achieving true blind defense; 2) As a preprocessing module, it eliminates the need for computationally expensive retraining of the primary classifier and is compatible with various backbone networks; 3) The multi-stage training strategy successfully balances robustness against attacks with the preservation of high accuracy on clean samples. Finally, experimental results on the comprehensive dataset validate the framework’s superiority. 
Future work will focus on lightweight architectural designs to reduce inference latency and further investigate performance boundaries under extreme low-SNR conditions combined with complex nonlinear channel impairments.
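The two-step purification in the bottleneck layer (adaptive threshold, then Top-K sparsification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the threshold formula or the anomaly handling, so the sketch assumes a threshold of mean plus ξ times the standard deviation of the activation magnitudes, assumes anomalous activations are zeroed, and reads the sparsity rate k as the fraction of activations kept. `purify_latent` is a hypothetical name.

```python
import math
import statistics


def purify_latent(latent, xi=1.7, k=0.7):
    """Suppress outlier activations, then keep the k fraction with largest magnitude."""
    mags = [abs(v) for v in latent]
    # Assumed adaptive threshold: mean + xi * std of the activation magnitudes.
    threshold = statistics.fmean(mags) + xi * statistics.pstdev(mags)
    # Assumed anomaly handling: activations above the threshold are zeroed.
    cleaned = [0.0 if m > threshold else v for v, m in zip(latent, mags)]
    # Top-K sparsification: keep only the largest remaining activations.
    n_keep = max(1, math.ceil(k * len(cleaned)))
    order = sorted(range(len(cleaned)), key=lambda i: abs(cleaned[i]), reverse=True)
    keep = set(order[:n_keep])
    return [v if i in keep else 0.0 for i, v in enumerate(cleaned)]


# Hypothetical latent vector with one outlier (9.0) standing in for a perturbation.
latent = [0.1, -0.2, 0.3, 0.25, -0.15, 0.05, 0.2, 0.1, -0.3, 9.0]
print(purify_latent(latent, xi=1.7, k=0.5))
```

In a trained autoencoder the same operation would run on the encoder output before decoding; the default values of ξ and k are taken from the stable ranges reported in the abstract.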
UWF-YOLO: A Lightweight Framework for Underwater Object Detection via Redundant Information Optimization
HOU Guojia, MA Jiaqi, WANG Yuechuan, HUANG Baoxiang, LI Kunqian
Available online  , doi: 10.11999/JEIT251129
Abstract:
  Objective  The rapid development of underwater imaging technology has significantly elevated the importance of underwater object detection for resource exploration and environmental monitoring applications. Complex underwater environments generally degrade image quality through color casts, haze-like effects, and non-uniform illumination. Existing vision-based object detection algorithms often perform poorly under these conditions, especially on small objects, resulting in missed detections and false positives. Moreover, existing deep learning-based underwater detection models struggle to balance accuracy and lightweight design when equipment resources are limited. It is therefore important to design efficient underwater object detection methods for water-related vision tasks, which play a crucial role in marine resource exploration, ecological monitoring, underwater robotics, and intelligent perception systems for autonomous underwater vehicles.  Methods  We propose UWF-YOLO, a lightweight underwater object detection network based on redundant information optimization. First, the C2f module is reconstructed with the FasterNet Block to optimize both the backbone and neck networks, and a feature channel selection mechanism is incorporated to reduce redundant features. In addition, because the redundant conventional convolutional features in the YOLO neck adapt poorly to underwater environments, Ghost Convolution is introduced to generate Ghost feature maps and enhance the multi-scale feature fusion capability of the neck network. 
Next, our proposed method achieves parameter sharing by replacing the original detection head with a redundant optimization group detection head (RRG-Head) based on group convolution, thereby reducing computational costs. Finally, the structured channel pruning technique is applied to identify the inter-layer dependencies of the graph and bind the pruning units. Combined with the LAMP weight magnitude score normalization for evaluating the importance of channels, the low-contributing groups are pruned and fine-tuned to achieve network size compression. In addition, because the scenes in existing underwater detection datasets are typically monotonous and the objects they contain are usually small and clustered, we construct an underwater object detection dataset with complex scenes, namely CSUOD. Real-world underwater images are collected from different websites and platforms to ensure both diversity and authenticity, followed by manual annotation and resolution normalization. CSUOD is specifically designed for challenging underwater environments characterized by color casts, haze-like effects, and non-uniform illumination, and contains 1135 manually selected images covering 6 different types.  Results and Discussions  Extensive experiments are conducted on three public underwater object detection datasets (i.e., DUO, RUOD, and TrashCan). The proposed model is evaluated against mainstream detectors, including YOLOv5s, YOLOv7-tiny, YOLOv8s, YOLOv9-tiny, and Deformable DETR. In the computational complexity assessment, experimental results show that the proposed method reduces the FLOPs, model size, and parameters by 60.4%, 77.3%, and 78.4%, respectively, compared to the baseline. 
In addition, with a comparable number of parameters, our method outperforms YOLOv9-tiny by 0.3%, 2.3%, and 3.4% in mAP on the three datasets, respectively. Comparative results on the CSUOD dataset also indicate that the proposed model improves detection and remains stable even in complex underwater environments. Qualitative visualization results further illustrate the model’s robustness and detection stability under various underwater degradations, such as haze-like effects and non-uniform illumination.  Conclusions  Quantitative and qualitative experiments on different datasets validate the effectiveness and robustness of the proposed method. The method achieves superior detection performance in complex underwater environments and mitigates the missed detections and false positives caused by background interference. The experimental results show that UWF-YOLO achieves substantial model compression while maintaining detection accuracy comparable to the benchmark model. This balance between detection accuracy and low computational cost makes it particularly suitable for underwater devices with limited resources. The proposed method also has great potential in practical scenarios such as marine ecological monitoring, underwater resource exploration, and autonomous underwater vehicle perception systems. It provides a reliable and efficient technical foundation for real-time applications, with strong adaptability to different underwater conditions, efficient integration into embedded platforms, and support for real-time perception and decision-making. The CSUOD dataset constructed in this study will help address the limitations of existing underwater object detection datasets and promote the development of underwater object detection. In the future, this work can be further extended to multi-modal perception systems and larger-scale datasets. 
These efforts will enable adaptive models for more dynamic underwater scenarios and support broader applications in intelligent ocean observation and autonomous navigation.
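The LAMP score mentioned in the pruning step can be sketched for a single layer. The block follows the published per-weight LAMP definition (squared weight divided by the sum of squares of all weights in the layer with equal or larger magnitude). How the authors aggregate these scores over channel groups is not stated in the abstract, so this sketch stops at per-weight scores, and `lamp_scores` is a hypothetical helper name.

```python
def lamp_scores(weights):
    """Per-weight LAMP scores for one layer, given as a flat list of floats."""
    # Visit indices from the largest magnitude to the smallest, so the running
    # sum always covers the weights at least as large as the current one.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    scores = [0.0] * len(weights)
    tail = 0.0
    for i in order:
        w2 = weights[i] ** 2
        tail += w2
        scores[i] = w2 / tail if tail > 0.0 else 0.0
    return scores


# The largest weight always scores 1.0; smaller weights score progressively lower.
print(lamp_scores([3.0, 1.0, 2.0]))
```

Because the scores are normalized within each layer, they can be compared across layers with a single global threshold, which is what makes the score convenient for choosing which units to prune.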
Performance Analysis and Rapid Prediction of Long-range Underwater Acoustic Communications in Uncertain Deep-sea Environments
CHEN Xiangmei, TAI Yupeng, WANG Haibin, HU Chenghao, WANG Jun, WANG Diya
Available online  , doi: 10.11999/JEIT251244
Abstract:
  Objective  In complex and dynamically changing deep-sea environments, the performance of underwater acoustic communications shows substantial variability. Feedback-based channel estimation and parameter adaptation are impractical in long-range scenarios because platform constraints prevent reliable feedback channels and the slow propagation of sound introduces significant delay. In typical long-range systems, environmental dynamics are often ignored and communication parameters are selected heuristically, which frequently leads to mismatches with actual channel conditions and causes communication failures or reduced efficiency. Predictive methods able to assess performance in advance and support feed-forward parameter adjustment are therefore required. This study proposes a deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications under uncertain environmental conditions to enable efficient and reliable parameter–channel matching without feedback.  Methods  A feed-forward method for underwater acoustic communication performance analysis and rapid prediction is developed using deep-learning-based sound-field uncertainty estimation. A neural network is first used to estimate probability distributions of Transmission Loss (TL PDFs) at the receiver under dynamic environments. TL PDFs are then mapped to probability distributions of the Signal-to-Noise Ratio (SNR PDFs), enabling communication performance evaluation without real-time feedback. Statistical channel capacity and outage capacity are analyzed to characterize the theoretical upper limits of achievable rates in dynamic conditions. Finally, by integrating the SNR distribution with the bit-error-rate characteristics of a representative deep-sea single-carrier communication system under the corresponding channel, a rate–reliability prediction model is constructed. 
This model estimates the probability of reliable communication at different data rates and serves as a practical tool for forecasting link performance in highly dynamic and feedback-limited underwater acoustic environments.  Results and Discussions  The method is validated using simulation data and sea trial data. The TL PDFs predicted by the deep learning model show strong consistency with the traditional Monte Carlo (MC) method across multiple receiver locations (Fig. 6). Under identical computational settings, deep-learning-based TL PDF prediction reduces computation time by 2 to 3 orders of magnitude compared with the MC method. The chained mapping from TL PDFs to SNR PDFs and then to channel capacity metrics accurately represents the probabilistic features of communication performance under uncertain conditions (Fig. 7 and Fig. 8). The rate–reliability curves derived from the deep-learning-based TL PDFs are highly consistent with MC-based results. In the high sound-intensity region, prediction errors for reliable communication probabilities across data rates range from 0.1% to 3%, and in the low sound-intensity region errors are approximately 0.3% to 5% (Fig. 12). Sea trial results further indicate that predicted rate–reliability performance agrees well with measured data. In the convergence zone, deviations between predicted and measured reliability probabilities at each rate range from 0.9% to 4%, and in the shadow zone from 1% to 9% (Fig. 18). Under a 90% reliability requirement, the maximum achievable rates predicted by the method match the measurements in both the convergence and shadow zones, demonstrating accuracy and practical applicability in complex channel environments.  Conclusions  A deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications in uncertain deep-sea environments is developed and validated. 
The framework builds a chained mapping from environmental parameters to TL PDFs, SNR PDFs, and communication performance metrics, enabling quantitative capacity assessment under dynamic ocean conditions. Predictive “rate–reliability” profiles are obtained by integrating probabilistic propagation characteristics with the performance of a representative deep-sea single-carrier system under the corresponding channel, providing guidance for parameter selection without feedback. Sea trial results confirm strong agreement between predicted and measured performance. The proposed approach offers a technical pathway for feed-forward performance analysis and dynamic adaptation in long-range deep-sea communication systems, and can be extended to other communication scenarios in dynamic ocean environments.
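The chained mapping from TL to SNR to a rate–reliability estimate can be sketched with sampled TL values. This is an illustrative simplification, not the authors' model. It assumes the passive sonar relation SNR = SL − TL − NL in dB, and it uses the Shannon capacity as the rate criterion, whereas the abstract describes using the bit-error-rate characteristics of a real single-carrier system. All function names and numeric values are hypothetical.

```python
import math


def snr_samples_db(tl_samples_db, source_level_db, noise_level_db):
    """Map transmission-loss samples to SNR samples (all in dB)."""
    return [source_level_db - tl - noise_level_db for tl in tl_samples_db]


def capacity_bps(snr_db, bandwidth_hz):
    """Shannon capacity for one SNR value; an upper bound on the achievable rate."""
    return bandwidth_hz * math.log2(1.0 + 10.0 ** (snr_db / 10.0))


def reliability(tl_samples_db, rate_bps, source_level_db, noise_level_db, bandwidth_hz):
    """Fraction of TL samples for which the channel could support rate_bps."""
    snrs = snr_samples_db(tl_samples_db, source_level_db, noise_level_db)
    supported = sum(capacity_bps(s, bandwidth_hz) >= rate_bps for s in snrs)
    return supported / len(snrs)


# Hypothetical TL samples (dB) standing in for draws from a predicted TL PDF.
tl = [80.0, 90.0, 100.0, 110.0]
for rate in (50.0, 200.0, 500.0):
    print(rate, reliability(tl, rate, source_level_db=180.0,
                            noise_level_db=80.0, bandwidth_hz=100.0))
```

Sweeping the rate in this way yields a curve of the same shape as the rate–reliability profiles discussed above, from which the highest rate meeting a reliability target (for example 90%) can be read off.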
Indoor Visible Light Positioning Based on CNN–MLP Multi-Feature Fusion under Random Receiver Tilt Conditions
JIA Kejun, WANG Jian, MAO Lifei, YOU Wei, HUANG Ziyang, PENG Duo
Available online  , doi: 10.11999/JEIT251021
Abstract:
  Objective  Traditional visible light positioning (VLP) methods based on received signal strength (RSS) suffer from instability when the receiver experiences orientation perturbations, which disrupt the correspondence between optical power and spatial position, making reliable three-dimensional (3D) positioning difficult to achieve. Existing approaches typically rely on inertial measurement units (IMUs) to obtain orientation information; however, sensor fusion increases system complexity and hardware cost and introduces cumulative errors. To address these issues, this paper proposes a positioning method that fuses cosine-of-incidence-angle estimation based on a photodiode (PD) array with RSS information, enabling high-accuracy 3D indoor positioning under receiver orientation perturbations.  Methods  In the proposed fusion-based positioning method, a multi-PD array structure is first adopted, and a local coordinate system (LCS) is established at the array center. Constraint equations are then constructed based on the differences in received optical power among PDs in the array. A Gauss–Newton iterative algorithm is employed to estimate the incident light direction vector. By exploiting the orthogonal rotation invariance between the LCS and the global coordinate system (GCS), the cosine of the incident angle is estimated without the need for orientation sensors. Subsequently, a serial CNN–MLP fusion network is constructed, in which the estimated incident-angle cosine is introduced as an additional positioning feature on top of RSS-based localization. The network jointly models the RSS and incident-angle cosine information received by the PD array and maps them to 3D spatial coordinates. Finally, training samples are generated using Latin hypercube sampling (LHS) to uniformly sample spatial positions and orientation dimensions, thereby improving the representativeness of the training dataset.  
Results and Discussions  Simulation experiments are conducted in a 4 m × 4 m × 2.5 m indoor environment. First, the effects of different numbers of PDs and tilt angles on the accuracy of incident-angle cosine estimation and spatial coverage are evaluated (Fig. 6), and the cumulative distribution functions (CDFs) of positioning errors under different array configurations are compared (Fig. 7). The results show that a 3-PD array with a tilt angle of 40° achieves the best balance among cost, coverage, and positioning accuracy. Next, positioning performance under different receiver tilt angles is analyzed. When the tilt angle is small, more than 70% of positioning errors are below 5 cm; even when the receiver is tilted up to 55°, the average error remains within 11.7 cm (Fig. 8). Error component comparisons indicate that the error along the Z-axis is significantly smaller than those along the X and Y axes (Fig. 9). Further tests are conducted at a height of 0.0 m covered by the training data and at an unseen height of 0.6 m not included in the training set (Fig. 10). The results demonstrate that the proposed model does not exhibit strong dependence on a specific height plane and maintains stable 3D positioning performance at unseen heights. Finally, the proposed method is compared with related positioning schemes. It outperforms existing methods in terms of CDF convergence speed, RMSE, and standard deviation (Fig. 11), achieving an average error reduction of approximately 2.5 cm and an RMSE reduction of 31.58% compared with Ref. [12].  Conclusions  This paper estimates the cosine of the incident angle at the receiver by exploiting differences in the optical power received by different PDs in an array and introduces this cosine value as a joint positioning feature into conventional RSS-based localization, thereby alleviating the instability of position mapping caused by relying solely on RSS under random receiver perturbations. 
By further combining the spatial feature extraction capability of CNNs with the nonlinear modeling strength of MLPs, the proposed method effectively maps positioning features to 3D spatial coordinates. The approach reduces reliance on orientation sensors such as IMUs while overcoming the susceptibility of traditional geometric positioning methods to noise and high-dimensional nonlinear features. Under varying heights and receiver orientations, the proposed algorithm demonstrates significant advantages in both positioning accuracy and stability.
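The Latin hypercube sampling step used to generate training samples can be sketched in plain Python. This is a generic LHS routine, not the authors' data pipeline. The room dimensions come from the abstract, while the tilt range of 0° to 55°, the sample count, and the name `latin_hypercube` are assumptions for illustration.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has exactly one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # decouple the stratum order between dimensions
        columns.append([low + (s + rng.random()) / n_samples * (high - low)
                        for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# x, y, z positions in a 4 m x 4 m x 2.5 m room plus an assumed tilt range in degrees.
samples = latin_hypercube(8, [(0.0, 4.0), (0.0, 4.0), (0.0, 2.5), (0.0, 55.0)])
for point in samples:
    print([round(v, 2) for v in point])
```

Compared with independent uniform draws, the stratification ensures that every slice of each position and orientation axis is represented, which is the property the abstract relies on for a representative training set.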
Inverse Design of a Silicon-Based Compact Polarization Splitter-Rotator
HUI Zhanqiang, ZHANG Xinglong, HAN Dongdong, LI Tiantian, GONG Jiamin
Available online  , doi: 10.11999/JEIT250858
Abstract:
  Objective  The Polarization Splitter-Rotator (PSR) is a key device used to control the polarization state of light in Photonic Integrated Circuits (PICs). Device size has become a major constraint on integration density in PICs. Traditional design methods are time-consuming and tend to yield larger device footprints. Inverse design, by contrast, determines structural parameters through optimization algorithms according to target performance and enables compact devices to be obtained while maintaining functionality. This strategy is now applied to wavelength and mode division multiplexers, all-optical logic gates, power splitters, and other integrated photonic components. The objective of this work is to use inverse design to address size limitations in silicon-based PSRs by combining the Momentum Optimization algorithm with the Adjoint Method. This combined approach improves the integration level of PICs and provides a feasible pathway for the miniaturization of other photonic devices.  Methods  The design region is defined on a 220 nm Silicon-on-Insulator (SOI) wafer and is discretized into 25×50 cylindrical elements. Each element has a 50 nm radius, a 150 nm height, and an initial relative permittivity of 6.55. The adjoint method is used to obtain gradient information across the design region, and this gradient is processed with the Momentum Optimization algorithm. The relative permittivity of each element is then updated according to the processed gradient. During optimization, the momentum factor is dynamically adjusted with the iteration number to accelerate convergence, and a linear bias is applied to guide the permittivity toward the values of silicon and air as the iterations progress. After optimization, the elements are binarized based on their final permittivity: values below 6.55 are assigned to air, whereas values above 6.55 are assigned to silicon. This results in a structure containing irregularly distributed air holes. 
To compensate for performance loss introduced during binarization, the etching depth of air holes with pre-binarization permittivity between 3 and 6.55 is optimized. Adjacent air holes are merged to reduce fabrication errors. The final device consists of air holes with five radii, among which three larger-radius types are selected for further refinement. Their etching radii and depths are optimized to recover remaining performance loss. Device performance is evaluated through numerical analysis. Calculated parameters include Insertion Loss (IL), Crosstalk (CT), Polarization Extinction Ratio (PER), and bandwidth. Tolerance analysis is also conducted to assess robustness under fabrication variations.  Results and Discussions   A compact PSR is designed on a 220 nm SOI wafer with dimensions of 5 μm in length and 2.5 μm in width. During optimization, the momentum factor in the Momentum Optimization algorithm is dynamically adjusted. A larger momentum factor is applied in the early stage to accelerate escape from local maxima or plateau regions, whereas a smaller momentum factor is used in later iterations to increase the weight of the current gradient. Compared with other optimization strategies, this algorithm requires only 20%–33% of the iteration count needed by alternative methods to reach a Figure of Merit (FOM) of 1.7, which improves optimization efficiency. Numerical analysis shows that the device achieves stable performance across the 1520–1575 nm wavelength range. The IL remains low (TM0 < 1 dB, TE0 < 0.68 dB), and the CT is effectively suppressed (TM0 < –23 dB, TE0 < –25.2 dB). The PER is high (TM0 > 17 dB, TE0 > 28.5 dB). Tolerance analysis indicates strong robustness to fabrication variations. Within the 1520–1540 nm range, performance remains stable under etching depth offsets of ±9 nm and etching radius offsets of ±5 nm, demonstrating reliable manufacturability.  
Conclusions   Numerical analysis demonstrates that combining the adjoint method with the Momentum Optimization algorithm is a feasible strategy for designing an integrated PSR. The design principle relies on controlling light propagation through adjustments to the relative permittivity, which determine the distribution and placement of air holes to achieve polarization splitting and rotation. Compared with traditional design approaches, inverse design uses the design region more efficiently and enables a more compact device structure. The proposed PSR is markedly smaller and shows enhanced fabrication tolerance. It is suitable for future large-scale PICs and provides useful guidance for the miniaturization of other photonic devices.
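The update loop described in this abstract (momentum-processed gradient, decreasing momentum factor, final binarization at 6.55) can be sketched on a toy problem. This is not the authors' optimizer: the adjoint simulation is replaced by a stand-in gradient function, the linear decay schedule and its end points are assumptions, the linear bias toward silicon and air is omitted, and the permittivity limits of 1.0 (air) and 12.1 (silicon) are assumed values whose midpoint matches the 6.55 threshold in the text.

```python
EPS_AIR, EPS_SI, THRESHOLD = 1.0, 12.1, 6.55


def momentum_factor(iteration, total, start=0.9, end=0.5):
    """Assumed schedule: large momentum early, smaller momentum in later iterations."""
    frac = iteration / max(1, total - 1)
    return start + (end - start) * frac


def optimize(eps, grad_fn, steps=50, lr=0.1):
    """Gradient ascent on the figure of merit with a momentum-averaged gradient."""
    velocity = [0.0] * len(eps)
    for it in range(steps):
        beta = momentum_factor(it, steps)
        grad = grad_fn(eps)  # in the paper this would come from the adjoint method
        velocity = [beta * v + (1.0 - beta) * g for v, g in zip(velocity, grad)]
        eps = [min(EPS_SI, max(EPS_AIR, e + lr * v)) for e, v in zip(eps, velocity)]
    return eps


def binarize(eps):
    """Assign silicon above the threshold and air otherwise."""
    return [EPS_SI if e > THRESHOLD else EPS_AIR for e in eps]


# Toy stand-in for the adjoint gradient: pull each element toward a known target.
target = [EPS_SI, EPS_AIR, EPS_SI]
toy_grad = lambda eps: [t - e for e, t in zip(eps, target)]
print(binarize(optimize([THRESHOLD] * 3, toy_grad)))
```

The abstract does not say how an element exactly at 6.55 is treated; the sketch assigns it to air as an arbitrary choice.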
Research on UAV Swarm Radiation Source Localization Method Based on Dynamic Formation Optimization
WU Sujie, WU Binbin, YANG Ning, WANG Heng, GUO Daoxing, GU Chuan
Available online  , doi: 10.11999/JEIT251023
Abstract:
In dense and structurally complex urban environments, Unmanned Aerial Vehicle (UAV) swarm radiation source localization is affected by signal attenuation, multipath propagation, and building obstructions. To address these limitations, a dynamic formation-optimization method for UAV swarms is proposed. By improving the geometric configuration of the swarm, the method reduces path loss and interference, which strengthens localization accuracy. Received signal strength is used to evaluate signal quality in real time and supports adaptive formation adjustments that improve propagation conditions. Geometric dilution of precision and root mean square error metrics are integrated to refine swarm geometry and improve distance-estimation reliability. Simulation results show that the proposed method converges faster and improves localization accuracy in complex urban environments, reducing errors by more than 80 percent. The method adapts to environmental variation and demonstrates strong robustness and practical value.  Objective  UAV swarm localization and formation control in urban environments are affected by obstacles, signal attenuation, and rapid variation in the surroundings that reduce the reliability of conventional methods. This study proposes a radiation source localization approach that integrates the Received Signal Strength Indicator (RSSI) with dynamic formation adjustment to improve localization accuracy and strengthen system robustness in complex urban scenarios.  Methods  The method uses RSSI measurements to estimate the distance to the radiation source and adjusts UAV swarm formation in real time to reduce localization errors. These adjustments are based on feedback that reflects relative positions, signal strength, and environmental variation. Localization accuracy is strengthened through a multi-sensor fusion strategy that integrates GPS, IMU, and depth-camera data. 
A data-quality assessment mechanism evaluates signal reliability and triggers formation adaptation when the signal drops below a predefined threshold. This optimization process reduces positioning errors and improves system robustness.  Results and Discussions  Simulation experiments in a ROS-based environment were conducted to evaluate the UAV swarm localization method under urban obstacles and multipath conditions. The swarm began in a hexagonal formation and adjusted its geometry according to environmental variation and localization confidence (Fig. 34). As shown in Fig. 5, localization errors fluctuated during initialization but converged to below 1 m after 150 s. Formation comparisons (Fig. 6) showed that symmetric structures such as hexagonal and triangular formations maintained errors below 0.5 m, whereas asymmetric formations (T and Y shape) produced deviations up to 4.9 m. Further comparisons (Fig. 7) showed that traditional RSSI saturated near 15 m, direction of arrival fluctuated between 5 and 14 m, and time difference of arrival failed due to synchronization problems. The proposed method achieved sub-meter accuracy within 60 s and remained robust throughout the mission. These findings indicate that combining RSSI-based distance estimation with dynamic formation adjustment improves localization accuracy, convergence speed, and adaptability under complex environmental conditions.  Conclusions  This study addresses UAV swarm localization in complex urban environments by integrating RSSI-based distance estimation, dynamic formation adjustment, and multi-sensor fusion. 
ROS-based simulations show that: (1) localization errors converge rapidly to sub-meter levels, reaching below 1 m within 150 s under non-line-of-sight conditions; (2) symmetric formations such as hexagonal and triangular configurations outperform asymmetric ones and reduce errors by up to 67 percent compared with fixed Y-shaped formations; and (3) relative to traditional RSSI, direction of arrival, and time difference of arrival approaches, the proposed method shows faster convergence, higher stability, and stronger robustness.
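The abstract above states that RSSI measurements are used to estimate the distance to the radiation source, but it does not say which propagation model is applied. The following is a minimal sketch that assumes the common log-distance path-loss model; the function name and the default values for `rssi_at_ref_dbm` and `path_loss_exponent` are illustrative assumptions, not values from the paper.

```python
def rssi_to_distance(rssi_dbm, rssi_at_ref_dbm=-40.0,
                     path_loss_exponent=3.0, ref_distance_m=1.0):
    """Estimate range from RSSI by inverting the log-distance model.

    Assumed model (not taken from the paper):
        RSSI(d) = RSSI(d0) - 10 * n * log10(d / d0)
    where d0 is the reference distance and n the path-loss exponent.
    """
    if path_loss_exponent <= 0:
        raise ValueError("path_loss_exponent must be positive")
    exponent = (rssi_at_ref_dbm - rssi_dbm) / (10.0 * path_loss_exponent)
    return ref_distance_m * 10.0 ** exponent


if __name__ == "__main__":
    # A weaker received signal maps to a larger estimated distance.
    for rssi in (-40.0, -55.0, -70.0):
        print(rssi, "dBm ->", round(rssi_to_distance(rssi), 2), "m")
```

Under this assumed model, the estimate is very sensitive to the path-loss exponent, which is one reason an approach like the one described would benefit from adjusting the formation when signal quality degrades.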
Conditional Generative Adversarial Networks-based Channel Estimation for ISAC-RIS System
LIU Yu, ZHENG Zelin, LIU Gang
Available online  , doi: 10.11999/JEIT251168
Abstract:
  Objective  In Reconfigurable Intelligent Surface (RIS)-assisted Integrated Sensing and Communication (ISAC) systems, accurate channel estimation is crucial to ensure reliable operation. Although traditional deep learning methods can partially address the channel estimation problem, their generalization ability and estimation accuracy remain limited in complex multi-user channel environments. To tackle these challenges, this paper proposes a two-stage channel estimation method based on a Conditional Generative Adversarial Network (CGAN) for RIS-assisted multi-user ISAC systems, aiming to enhance both the accuracy and stability of channel estimation.  Methods  The proposed two-stage CGAN-based method estimates the SAC channels in RIS-assisted multi-user ISAC systems. By adjusting the switching states of the RIS, the overall estimation problem is decomposed into subproblems, enabling sequential estimation of the direct and reflected channels. Within the proposed CGAN framework, the adversarial training between the generator and discriminator allows the model not only to learn the mapping relationship between the observed signals and the true channels but also to optimize the output according to the discriminator’s feedback, thereby effectively improving both training efficiency and estimation accuracy.  Results and Discussions  Extensive simulation experiments were conducted to verify the effectiveness of the proposed method. First, the estimation performance of the SAC channel under different SNR conditions was compared. The results demonstrate that the proposed CGAN-based method achieves significantly better NMSE performance than the LS benchmark and traditional models such as FNN and ELM (Fig. 4). Then, the impact of increasing the number of antennas and RIS elements on SAC channel estimation performance was investigated. Compared with the LS benchmark, the proposed CGAN method consistently maintains superior performance under various SNR conditions (Figs. 5 and 6).
Conclusions  This paper investigates the channel estimation problem in RIS-assisted multi-user ISAC systems and proposes a two-stage channel estimation method based on CGAN. By adjusting the switching states of the RIS and employing adversarial training between the generator and discriminator networks, the proposed method achieves accurate estimation of the SAC channel. Simulation results demonstrate that, under various SNR conditions and channel dimensions, the CGAN-based estimation method exhibits strong generalization capability and significantly outperforms the benchmark schemes in estimation accuracy. Therefore, it shows great potential as an effective solution for enhancing system stability and efficiency.
Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning in Wireless Networks
LIU Jingyuan, MA Ke, XU Runchen, CHANG Zheng
Available online  , doi: 10.11999/JEIT251221
Abstract:
  Objective  Multimodal Federated Learning (MFL) uses complementary information from multiple modalities, yet in wireless edge networks it is restricted by limited energy and frequent missing modalities because many clients store only images or only reports. This study presents Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning (CREEMFL), which applies selective completion and joint communication–computation optimization to reduce training energy under latency and wireless constraints.  Methods  CREEMFL completes part of the incomplete samples by querying a public multimodal subset, and processes the remaining samples through zero padding. Each selected user downloads the global model, performs image-to-text or text-to-image retrieval, conducts local multimodal training, and uploads model updates for aggregation. An energy–delay model couples local computation and wireless communication and treats the required number of global rounds as a function of retrieval ratios. Based on this model, an energy minimization problem is formulated and solved using a two-layer algorithm with an outer search over retrieval ratios and an inner optimization of transmission time, Central Processing Unit (CPU) frequency, and transmit power.  Results and Discussions  Simulations on a single-cell wireless MFL system show that increasing the ratio of completing text from images improves test accuracy and reduces total energy. In contrast, a large ratio of completing images from text provides limited accuracy gain but increases energy consumption (Fig. 3, Fig. 4). Compared with four representative baselines, CREEMFL achieves shorter completion time and lower total energy across a wide range of maximum average transmit powers (Fig. 5, Fig. 6). For CREEMFL, increased system bandwidth further reduces completion time and energy consumption (Fig. 7, Fig. 8). 
Under different user modality compositions, CREEMFL also attains higher test accuracy than local training, zero padding, and cross-modal retrieval without energy optimization (Fig. 9).  Conclusions  CREEMFL integrates selective cross-modal retrieval and joint communication–computation optimization for energy-efficient MFL. By treating retrieval ratios as variables and modeling their effect on global convergence rounds, it captures the coupling between per-round costs and global training progress. Simulations verify that CREEMFL reduces training completion time and total energy while preserving classification accuracy in resource-constrained wireless edge networks.
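The energy–delay model described above couples local computation with wireless communication, with CPU frequency, transmit power, and transmission time as the inner optimization variables. The abstract does not give the model's equations, so the sketch below assumes a widely used formulation: dynamic CPU energy proportional to cycles times frequency squared, and upload time derived from the Shannon rate. The function name, the `kappa` coefficient, and all parameter names are hypothetical.

```python
import math


def round_energy_and_delay(cycles, cpu_hz, tx_power_w, model_bits,
                           bandwidth_hz, channel_gain, noise_w,
                           kappa=1e-28):
    """Per-round energy (J) and delay (s) for one client.

    Assumed model (illustrative, not the paper's exact formulation):
      computation: E = kappa * cycles * f^2,  T = cycles / f
      upload:      R = B * log2(1 + p * g / N0),  T = bits / R,  E = p * T
    """
    comp_time = cycles / cpu_hz
    comp_energy = kappa * cycles * cpu_hz ** 2
    rate = bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_w)
    tx_time = model_bits / rate
    tx_energy = tx_power_w * tx_time
    return comp_energy + tx_energy, comp_time + tx_time


if __name__ == "__main__":
    # Lowering the CPU frequency saves energy but lengthens the round.
    for f in (0.5e9, 1.0e9, 2.0e9):
        e, t = round_energy_and_delay(1e9, f, 0.1, 1e6, 1e6, 1e-6, 1e-9)
        print(f"f={f:.1e} Hz  energy={e:.4f} J  delay={t:.3f} s")
```

The trade-off visible in the printed values (lower frequency, lower energy, longer delay) is the kind of coupling a joint optimization under a latency constraint has to balance.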
Finite-time Adaptive Sliding Mode Control of Servo Motors Considering Frictional Nonlinearity and Unknown Loads
ZHANG Tianyu, GUO Qinxia, YANG Tingkai, GUO Xiangji, MING Ming
Available online  , doi: 10.11999/JEIT250521
Abstract:
  Objective  Ultra-fast laser processing with an infinite field of view requires servo motor systems with superior tracking accuracy and robustness. However, such systems are highly nonlinear and affected by coupled unknown load disturbances and complex friction, which constrain the performance of conventional controllers. Although Sliding Mode Control (SMC) exhibits inherent robustness, traditional SMC and observer designs cannot achieve accurate finite-time disturbance compensation under strong nonlinearities, thus limiting high-speed and high-precision trajectory tracking. To address this limitation, a novel finite-time adaptive SMC approach is proposed to ensure rapid and precise angular position tracking within a finite time, satisfying the stringent synchronization requirements of advanced laser processing systems.  Methods  A novel control strategy is developed by integrating an adaptive disturbance observer fused with a Radial Basis Function Neural Network (RBFNN) and finite-time Sliding Mode Control (SMC). First, the unknown load disturbance and complex frictional nonlinear dynamics are combined into a unified "lumped disturbance" term, improving model generality and the ability to represent real operating conditions. Second, a finite-time adaptive disturbance observer is constructed to estimate this lumped disturbance. The observer utilizes the universal approximation capability of the RBFNN to learn and approximate the dynamic characteristics of unknown disturbances online. Simultaneously, a finite-time adaptive law based on the error norm is introduced to update the neural network weights in real time, ensuring rapid and accurate finite-time estimation of the lumped disturbance while reducing dependence on precise model parameters. Based on this design, a finite-time SMC is developed. 
The controller uses the observer’s disturbance estimation as a feedforward compensation term, incorporates a carefully formulated finite-time sliding surface and equivalent control law, and introduces a saturation function to suppress control input chattering. A suitable Lyapunov function is then constructed, and the finite-time stability theory is rigorously applied to prove the practical finite-time convergence of both the adaptive observer and the closed-loop control system, guaranteeing that the system tracking error converges to a bounded neighborhood near the origin within finite time.  Results and Discussions  To verify the effectiveness and superiority of the proposed control strategy, a typical Permanent Magnet Synchronous Motor (PMSM) servo system model is constructed in the MATLAB environment, and a simulation scenario with desired trajectories of varying frequencies is established. The proposed method is comprehensively compared with the widely used Proportional–Integral (PI) control and the advanced method reported in reference [7]. Simulation results demonstrate the following: 1. Tracking performance: Under various reference trajectories, the proposed controller enables the system to accurately follow the target trajectory with a tracking error substantially smaller than that of the PI controller. Compared with the method in reference [7], it achieves smoother responses and smaller residual errors, effectively eliminating the chattering observed in some operating conditions of the latter. 2. Disturbance rejection and robustness: The adaptive disturbance observer based on the RBFNN rapidly and effectively learns and compensates for the lumped disturbance composed of unknown load variations and frictional nonlinearities. Even in the presence of these disturbances, the proposed controller maintains high-precision trajectory tracking, demonstrating strong disturbance rejection and robustness to system parameter variations. 3.
Control input characteristics: Compared with the reference methods, the control signal of the proposed approach quickly stabilizes after the initial transient phase, effectively suppressing chattering caused by high-frequency switching. The amplitude range of the control input remains reasonable, facilitating practical actuator implementation. 4. Comprehensive evaluation: Based on multiple error performance indices, including Integral Squared Error (ISE), Integral Absolute Error (IAE), Time-weighted Integral Absolute Error (ITAE), and Time-weighted Integral Squared Error (ITSE), the proposed controller consistently outperforms both PI control and the method in reference [7]. It demonstrates comprehensive advantages in suppressing transient errors rapidly and reducing overall error accumulation. The method also improves steady-state accuracy and achieves a balanced response speed with effective noise attenuation. 5. Observer performance: The RBFNN weight norm estimation converges rapidly and stabilizes at a low level after initial adaptation, confirming the effectiveness of the proposed adaptive law and the learning efficiency of the observer.  Conclusions  A finite-time sliding mode control strategy with an adaptive disturbance observer is proposed for servo systems used in ultra-fast laser processing. The method models unknown load disturbances and frictional nonlinearities as a lumped disturbance term. An adaptive observer, integrating an RBF neural network with a finite-time mechanism, accurately estimates this disturbance for real-time compensation. Based on the observer, a finite-time SMC law is formulated, and the practical finite-time stability of the closed-loop system is theoretically proven. Simulations conducted on a permanent magnet synchronous motor platform confirm that the proposed approach achieves superior tracking accuracy, robustness, and control smoothness compared with conventional PI and existing advanced methods. 
This work offers an effective solution for achieving high-precision control in nonlinear systems subject to strong disturbances.
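The abstract above mentions that a saturation function is introduced to suppress control input chattering. As a minimal sketch of that general idea, the discontinuous sign function in the switching term can be replaced by a function that is linear inside a thin boundary layer; the names `sat` and `switching_term`, the boundary-layer width, and the gain are illustrative assumptions and do not reproduce the paper's control law.

```python
def sat(s, boundary=0.05):
    """Saturation function: s / boundary inside the layer, sign(s) outside."""
    if boundary <= 0:
        raise ValueError("boundary must be positive")
    return max(-1.0, min(1.0, s / boundary))


def switching_term(s, gain, boundary=0.05):
    """Switching part of a sliding mode law, smoothed with sat().

    With sign(s) this term would jump between -gain and +gain whenever s
    crosses zero; with sat() it varies continuously near the surface.
    """
    return -gain * sat(s, boundary)


if __name__ == "__main__":
    for s in (-0.2, -0.01, 0.0, 0.01, 0.2):
        print(s, switching_term(s, gain=2.0))
```

The price of the smoother input is that the sliding variable is only driven into the boundary layer rather than exactly to zero, which matches the abstract's statement that the tracking error converges to a bounded neighborhood of the origin.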
Breakthrough in Solving NP-Complete Problems Using Electronic Probe Computers
XU Jin, YU Le, YANG Huihui, JI Siyuan, ZHANG Yu, YANG Anqi, LI Quanyou, LI Haisheng, ZHU Enqiang, SHI Xiaolong, WU Pu, SHAO Zehui, LENG Huang, LIU Xiaoqing
Available online  , doi: 10.11999/JEIT250352
Abstract:
This study presents a breakthrough in addressing NP-complete problems using a newly developed Electronic Probe Computer (EPC60). The system employs a hybrid serial–parallel computational model and performs large-scale parallel operations through seven probe operators. In benchmark tests on 3-coloring problems in graphs with 2,000 vertices, EPC60 achieves 100% accuracy, outperforming the mainstream solver Gurobi, which succeeds in only 6% of cases. Computation time is reduced from 15 days to 54 seconds. The system demonstrates high scalability and offers a general-purpose solution for complex optimization problems in areas such as supply chain management, finance, and telecommunications.  Objective  NP-complete problems pose a fundamental challenge in computer science. As problem size increases, the required computational effort grows exponentially, making it infeasible for traditional electronic computers to provide timely solutions. Alternative computational models have been proposed, with biological approaches, particularly DNA computing, demonstrating notable theoretical advances. However, DNA computing systems continue to face major limitations in practical implementation.  Methods  Computational Model: EPC is based on a non-Turing computational model in which data are multidimensional and processed in parallel. Its database comprises four types of graphs, and the probe library includes seven operators, each designed for specific graph operations. By executing parallel probe operations, EPC efficiently addresses NP-complete problems. Structural Features: EPC consists of four subsystems: a conversion system, input system, computation system, and output system.
The conversion system transforms the target problem into a graph coloring problem; the input system allocates tasks to the computation system; the computation system performs parallel operations via probe computation cards; and the output system maps the solution back to the original problem format. EPC60 features a three-tier hierarchical hardware architecture comprising a control layer, optical routing layer, and probe computation layer. The control layer manages data conversion, format transformation, and task scheduling. The optical routing layer supports high-throughput data transmission, while the probe computation layer conducts large-scale parallel operations using probe computation cards.  Results and Discussions  EPC60 successfully solved 100 instances of the 3-coloring problem for graphs with 2,000 vertices, achieving a 100% success rate. In comparison, the mainstream solver Gurobi succeeded in only 6% of cases. Additionally, EPC60 rapidly solved two 3-coloring problems for graphs with 1,500 and 2,000 vertices, which Gurobi failed to resolve after 15 days of continuous computation on a high-performance workstation. Using an open-source dataset, we identified 1,000 3-colorable graphs with 1,000 vertices and 100 3-colorable graphs with 2,000 vertices. These correspond to theoretical complexities of O(1.3289^n) for both cases. The test results are summarized in Table 1. Currently, EPC60 can directly solve 3-coloring problems for graphs with up to n vertices, with theoretical complexity of at least O(1.3289^n). On April 15, 2023, a scientific and technological achievement appraisal meeting organized by the Chinese Institute of Electronics was held at Beijing Technology and Business University. A panel of ten senior experts conducted a comprehensive technical evaluation and Q&A session. The committee reached the following unanimous conclusions: 1. The probe computer represents an original breakthrough in computational models. 2.
The system architecture design demonstrates significant innovation. 3. The technical complexity reaches internationally leading levels. 4. It provides a novel approach to solving NP-complete problems. Experts at the appraisal meeting stated, “This is a major breakthrough in computational science achieved by our country, with not only theoretical value but also broad application prospects.” In cybersecurity, EPC60 has also demonstrated remarkable potential. Supported by the National Key R&D Program of China (2019YFA0706400), Professor Xu Jin’s team developed an automated binary vulnerability mining system based on a function call graph model. Evaluation of the system using the Modbus Slave software showed over 95% vulnerability coverage, far exceeding the 75 vulnerabilities detected by conventional depth-first search algorithms. The system also discovered a previously unknown flaw, the “Unauthorized Access Vulnerability in Changyuan Shenrui PRS-7910 Data Gateway” (CNVD-2020-31406), highlighting EPC60’s efficacy in cybersecurity applications. The high efficiency of EPC60 derives from its unique computational model and hardware architecture. Given that all NP-complete problems can be polynomially reduced to one another, EPC60 provides a general-purpose solution framework. It is therefore expected to be applicable in a wide range of domains, including supply chain management, financial services, telecommunications, energy, and manufacturing.  Conclusions   The successful development of EPC offers a novel approach to solving NP-complete problems. As technological capabilities continue to evolve, EPC is expected to demonstrate strong computational performance across a broader range of application domains. Its distinctive computational model and hardware architecture also provide important insights for the design of next-generation computing systems.
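The benchmark above is graph 3-coloring, a standard NP-complete problem: finding a coloring may take exponential time, but checking a proposed coloring takes time linear in the number of edges. The following sketch only illustrates that checking step and says nothing about how EPC60 itself operates; the function name and the data layout (edge list plus a vertex-to-color dictionary) are assumptions made for illustration.

```python
def is_valid_3_coloring(edges, coloring):
    """Return True if `coloring` is a proper 3-coloring of the graph.

    edges:    iterable of (u, v) vertex pairs
    coloring: dict mapping each vertex to a color in {0, 1, 2}
    """
    for u, v in edges:
        cu, cv = coloring.get(u), coloring.get(v)
        # Every endpoint must carry one of the three allowed colors.
        if cu not in (0, 1, 2) or cv not in (0, 1, 2):
            return False
        # Adjacent vertices must not share a color.
        if cu == cv:
            return False
    return True


if __name__ == "__main__":
    triangle = [("a", "b"), ("b", "c"), ("a", "c")]
    print(is_valid_3_coloring(triangle, {"a": 0, "b": 1, "c": 2}))
    print(is_valid_3_coloring(triangle, {"a": 0, "b": 0, "c": 2}))
```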
Personalized Federated Learning Method Based on Collation Game and Knowledge Distillation
SUN Yanhua, SHI Yahui, LI Meng, YANG Ruizhe, SI Pengbo
Available online  , doi: 10.11999/JEIT221203
Abstract:
To overcome the limitations of Federated Learning (FL) when both the data and the models of the clients are heterogeneous, and to improve accuracy, a personalized Federated learning algorithm with Coalition game and Knowledge distillation (pFedCK) is proposed. First, each client uploads its soft predictions on a public dataset and downloads the k most correlated soft predictions. Then, the Shapley value from coalition game theory is applied to measure the multi-wise influences among clients and to quantify their marginal contributions to the personalized learning performance of others. Finally, each client identifies its optimal coalition, distills the knowledge into its local model, and trains on its private dataset. The results show that, compared with state-of-the-art algorithms, this approach achieves superior personalized accuracy, with an improvement of about 10%.
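The abstract above applies the Shapley value to quantify each client's marginal contribution. As a minimal sketch of that quantity, the code below computes exact Shapley values by averaging marginal contributions over all player orderings; the characteristic function `value` is a placeholder supplied by the caller, and nothing here reproduces how pFedCK actually scores coalitions. Exact enumeration is only practical for a handful of players.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable player identifiers
    value:   function mapping a frozenset of players to a number
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    return {p: total / len(orders) for p, total in totals.items()}


if __name__ == "__main__":
    # Toy game: a coalition's worth is the square of its size.
    print(shapley_values(["a", "b", "c"], lambda s: len(s) ** 2))
```

In the toy game all players are interchangeable, so each receives an equal share of the grand coalition's value.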
The Range-angle Estimation of Target Based on Time-invariant and Spot Beam Optimization
Wei CHU, Yunqing LIU, Wenyug LIU, Xiaolong LI
Available online  , doi: 10.11999/JEIT210265
Abstract:
Range-angle estimation of targets using Frequency Diverse Array and Multiple Input Multiple Output (FDA-MIMO) radar has attracted increasing attention. The FDA provides degrees of freedom in the transmit beam pattern in both angle and range simultaneously. However, its performance is degraded by the periodicity and time variance of the beam pattern. Therefore, an improved Estimating Signal Parameter via Rotational Invariance Techniques (ESPRIT) algorithm is proposed to estimate the target parameters, based on a new waveform synthesis model of the Time Modulation and Range Compensation FDA-MIMO (TMRC-FDA-MIMO) radar. Finally, the proposed method is compared with the identical frequency increment FDA-MIMO radar system, the logarithmically increased frequency offset FDA-MIMO radar system, and the MUltiple SIgnal Classification (MUSIC) algorithm in terms of the Cramer-Rao lower bound and the root mean square error of range and angle estimation, and its superior performance is verified.
Satellite Navigation
Research on GRI Combination Design of eLORAN System
LIU Shiyao, ZHANG Shougang, HUA Yu
Available online  , doi: 10.11999/JEIT201066
Abstract:
To solve the problem of Group Repetition Interval (GRI) selection in the construction of supplementary transmission stations for the enhanced LORAN (eLORAN) system, a screening algorithm based on the cross interference rate is proposed, mainly from a mathematical point of view. First, the method considers the requirement of second information and, on this basis, conducts a first screening by comparing the mutual Cross Rate Interference (CRI) with adjacent Loran-C stations in neighboring countries. Second, a second screening is conducted through permutation and pairwise comparison. Third, the optimal GRI combination scheme is given by considering the requirements of data rate and system specification. Finally, in view of the high-precision timing requirements of the new eLORAN system, an optimized selection is made among multiple optimal combinations. The analysis results show that the average interference rate of the optimal combination scheme obtained by this algorithm is comparable to that between the current navigation chains while also meeting the timing requirements, which provides reference suggestions and a theoretical basis for the construction of a high-precision ground-based timing system.