Latest Articles

Articles in press have been peer-reviewed and accepted. They are not yet assigned to volumes/issues but are citable by Digital Object Identifier (DOI).
Design of a Timing-Controlled Non-Volatile Flip-Flop with Low-Switching-Ratio FeFET
DU Shimin, YANG Chang, WANG Lunyao, ZHANG Zhe
Available online, doi: 10.11999/JEIT251059
Abstract:
  Objective  Nonvolatile processors (NVPs) have become a key technology for Internet-of-Things (IoT) and energy-harvesting systems, where maintaining computational states during unexpected power interruptions is essential. Conventional volatile processors rely on external nonvolatile memory (NVM) for state retention; however, this approach incurs significant latency and energy overheads. Integrated nonvolatile flip-flops using ferroelectric field-effect transistors (FeFETs) offer a promising alternative by enabling on-chip state backup and recovery. Nevertheless, existing single-ended FeFET-based flip-flops are prone to contention-induced failures during power-up recovery, especially when the FeFET on/off ratio degrades. This issue originates from competing discharge paths that make internal node voltage settling uncertain, resulting in unreliable state restoration. To address this challenge, this work proposes a flip-flop architecture that replaces contention-based recovery with a timing-controlled two-phase mechanism. The primary objectives of this design are to achieve high-reliability recovery even under degraded FeFET on/off ratios as low as 10², optimize timing parameters such as Hold-Time and Clock-to-Q delay, and maintain low energy consumption suitable for IoT applications.  Methods  The proposed design extends the Static Contention-Free Single-Phase-Clocked Flip-Flop (SSCFF), which inherently eliminates internal node contention through its fully static structure. On this foundation, one FeFET device and five additional MOSFETs are integrated to construct a single-ended nonvolatile flip-flop (NVFF). Two control signals, RES and MOD, are introduced to manage the recovery process. In the normal operation mode, where MOD=0, the circuit functions as a conventional SSCFF and supports state backup during runtime. 
In the recovery mode, where MOD=1, the recovery operation is divided into two distinct phases. In the pre-charge phase, when RES=0, the internal nodes are pre-charged to VDD. In the selective discharge phase, as RES transitions from low to high, the resistance state of the FeFET determines whether discharge occurs. If the FeFET is in the low-resistance state (LRS), a discharge path is formed, pulling the node voltage down to ground. If the FeFET remains in the high-resistance state (HRS), the node retains its charge until the next clock edge. This sequence of pre-charging followed by selective discharge eliminates contention during recovery and ensures that the internal node voltages settle deterministically and reliably. The design is implemented in a 130 nm CMOS process with integrated FeFET models. Simulations, including Monte Carlo analysis, were performed in Cadence Virtuoso across a supply voltage range of 0.6–0.9 V and FeFET on/off ratios ranging from 10² to 10⁴. Key performance metrics, such as Setup-Time, Hold-Time, Clock-to-Q delay, restore energy, and recovery success rate, were evaluated and compared against a traditional Transmission Gate Flip-Flop (TGFF).  Results and Discussions  Simulation results show that the timing-controlled recovery improves reliability even under severe FeFET degradation. The proposed flip-flop achieves 100% restore yield when the FeFET on/off ratio drops to 10², because the proposed structure eliminates the competing discharge paths. Timing metrics are also improved: the 3σ worst-case Hold-Time is reduced by 64.6%, and the Clock-to-Q delay is shortened by 33.9%. Although Setup-Time increases slightly, it can be compensated by device sizing. Restore energy remains in the low-fJ (10⁻¹⁵ J) range across all supply voltages, rising only modestly compared with the TGFF because of the added pre-charge phase.  
Conclusions  A Ferroelectric FET Nonvolatile Flip-Flop with timing-controlled two-phase recovery has been presented, addressing the contention-induced failure modes that limit low-voltage NVFF reliability. By integrating a single FeFET with an enhanced SSCFF structure and using the RES signal to manage the pre-charge and discharge steps, high restore yield is maintained even under severely degraded FeFET on/off ratios, while Hold-Time and Clock-to-Q delay are significantly improved relative to traditional transmission-gate NVFFs. The proposed architecture offers a compelling solution for energy-constrained IoT processors requiring fast, reliable state preservation under unpredictable power conditions.
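The two-phase recovery described in this abstract (pre-charge, then selective discharge through the FeFET) can be pictured with a purely behavioral model. The following is a minimal Python sketch and not the authors' circuit or simulation setup: the function names, the single-RC discharge model, the resistance, capacitance, and timing values, and the mapping of a discharged node to logic 0 are all illustrative assumptions.

```python
import math

VDD = 0.8  # supply voltage in volts (assumed; the abstract sweeps 0.6-0.9 V)


def recover_node(fefet_is_lrs, on_off_ratio=100.0, r_on=1e4, c_node=1e-15,
                 t_window=2e-10):
    """Behavioral model of the two-phase restore; all values are illustrative.

    Phase 1 (RES = 0): the internal node is pre-charged to VDD.
    Phase 2 (RES rises): the node discharges through the FeFET, modeled as a
    resistor (r_on in LRS, r_on * on_off_ratio in HRS) for t_window seconds.
    Returns the node voltage at the end of the discharge window.
    """
    v_node = VDD  # phase 1: unconditional pre-charge, no competing path
    r_fefet = r_on if fefet_is_lrs else r_on * on_off_ratio
    # Phase 2: a single RC discharge path decides the final voltage.
    return v_node * math.exp(-t_window / (r_fefet * c_node))


def restored_bit(fefet_is_lrs, **kwargs):
    """Sense the node against VDD/2 (assumed mapping: discharged -> 0)."""
    return 0 if recover_node(fefet_is_lrs, **kwargs) < VDD / 2 else 1


print(restored_bit(True), restored_bit(False))
```

With these assumed values, the LRS and HRS time constants differ by the on/off ratio, so the two stored states still separate cleanly when the ratio is only 10²; the sketch says nothing about process variation, which the abstract covers through Monte Carlo analysis.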
A Noise Reduction Strategy via Coprime-Spacing Subarrays for Biodiversity Acoustic Indices
CHEN Lei, XU Zhiyong, ZHAO Zhao
Available online, doi: 10.11999/JEIT260237
Abstract:
  Objective  As a popular tool for rapid biodiversity assessment, acoustic indices have attracted increasing attention in the field of soundscape ecology in recent years. Nevertheless, most commonly used acoustic indices are susceptible to background noise. Traditional single-channel noise reduction strategies, including spectral subtraction, high-pass filtering, and threshold detection, have been widely adopted as preprocessing approaches to optimize the calculation of acoustic indices. However, when dealing with anthropogenic interference that overlaps with biotic signals in both time and frequency domains, the denoising capability of single-channel methods degrades severely. Although spatio-temporal adaptive whitening filtering based on microphone arrays provides a feasible approach for suppressing directional interference, it suffers from a non-uniform two-dimensional spatio-temporal amplitude response and from self-cancellation of the target signal under unconstrained interference cancellation. These disadvantages distort the time-frequency distribution of target signals, causing acoustic index calculations to deviate from the ground truth. Therefore, this study proposes a noise reduction strategy via coprime-spacing subarrays for biodiversity acoustic indices. The method suppresses directional interference while preserving, as far as possible, the time-frequency distribution structure of biotic signals.  Methods  A noise reduction strategy based on microphone array spatio-temporal adaptive whitening filtering is proposed. It incorporates the Frequency-dependent Acoustic Diversity Index (FADI), which is insensitive to fluctuations in the array's two-dimensional spatio-temporal amplitude response. A noise-robust acoustic index method, termed Adaptive Interference Cancellation–Frequency-dependent Acoustic Diversity Index (AIC-FADI), is subsequently developed. 
Specifically, a non-uniform linear array is first constructed using three microphones to form two dual-element subarrays with coprime spacing. This design fully exploits the high spatial resolution of wide-spacing arrays to narrow the null width in the direction of interference. Meanwhile, it avoids the physical implementation difficulties and mutual coupling effects associated with small-spacing array designs caused by the ultra-wideband characteristics of target signals. The spatio-temporal adaptive whitening filtering is then performed on each coprime-spacing subarray separately, adaptively forming two-dimensional nulls within the interference support region, thereby suppressing directional anthropogenic interference in analytical data before index calculation. Next, a frequency-dependent threshold scheme is utilized to obtain the binary spectrogram for each coprime-spacing subarray output, abating the influence from gain differences along the frequency axis for a certain direction. Afterwards, by leveraging the high spatial resolution of wide-spacing arrays and the interleaved characteristics of spatial aliasing null positions between the spatio-temporal frequency responses of the two subarrays with coprime spacing, a pointwise maximum fusion is applied to the above two binary spectrograms. This process reconstructs the binary time-frequency distribution structure of target signals outside the interference support region, leading to a single binary spectrogram where biological sound components are preserved to a great extent and anthropogenic interference is considerably suppressed. Ultimately, from this single binary spectrogram, the proportions of non-zero time-frequency bins within each frequency band are calculated and forwarded to the entropy function, resulting in the final AIC-FADI result.  
Results and Discussions  Simulation results indicate that the proposed AIC-FADI maintains numerical robustness across an SINR range down to –15 dB (the yellow line in Fig. 5), substantially outperforming the classical ADI version based on a single-channel noise reduction algorithm (FADI) and the other ADI versions based on single-array interference suppression processing mentioned in this paper (AIC-FADI-s, AIC-FADI1, and AIC-FADI2). The real-world experiment confirms that the proposed spatio-temporal adaptive whitening filtering effectively suppresses wideband interference signals in complex scenarios, thereby improving the SINR of the analyzed recording. This enables some weaker biotic signals to exceed their corresponding frequency-dependent adaptive thresholds, greatly reducing missed detection of the target signal. In addition, by performing pointwise maximum fusion of the binary spectrograms from the two coprime-spacing subarray outputs, AIC-FADI further reduces missed detection of the target signal (Fig. 8). Nevertheless, the real-world experiments also reveal that the interference suppression performance of AIC-FADI degrades for highly time-varying interference components.  Conclusions  This paper addresses the challenge of calculating acoustic indices reliably in complex soundscapes where directional anthropogenic interference overlaps with biotic signals in both time and frequency domains. A noise reduction strategy using coprime-spacing subarrays is proposed, and a new noise-robust acoustic index (AIC-FADI) is then developed. The method is evaluated through simulations and real-world recordings, and the results show that: (1) By applying spatio-temporal adaptive whitening filtering on each coprime-spacing subarray followed by pointwise maximum fusion, the proposed method achieves both wideband interference suppression capability and target information fidelity in complex soundscapes containing strong interference. 
(2) As a result, the proposed AIC-FADI maintains numerical robustness down to –15 dB SINR, substantially outperforming the classical FADI algorithm and other ADI versions based on single-array interference suppression methods. (3) The proposed method provides a feasible technical solution for extending the practical application scenarios and spatio-temporal coverage of biodiversity acoustic indices in human-dominated areas. However, this study only considers directional interference that is relatively stable or slowly time-varying. Hence, the interference suppression performance degrades for highly time-varying or uncorrelated noise components. These challenges should be addressed in future work through more advanced signal processing techniques to further improve the robustness of acoustic indices in highly complex acoustic environments.
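The last three steps of the AIC-FADI pipeline described in this abstract (frequency-dependent thresholding, pointwise maximum fusion, and an entropy over per-band proportions) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the toy spectrogram values, the fixed thresholds, and the ADI-style normalization of the proportions before the Shannon entropy are not taken from the paper, and the adaptive whitening filter itself is not modeled.

```python
import math


def binarize(spectrogram, thresholds):
    """Binarize a spectrogram (rows = frequency bands, columns = time frames)
    using one threshold per frequency band."""
    return [[1 if value > thr else 0 for value in row]
            for row, thr in zip(spectrogram, thresholds)]


def fuse_max(binary_a, binary_b):
    """Pointwise maximum of two binary spectrograms (a logical OR per bin)."""
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(binary_a, binary_b)]


def band_entropy_index(binary):
    """Shannon entropy of the normalized per-band proportions of active bins."""
    proportions = [sum(row) / len(row) for row in binary]
    total = sum(proportions)
    if total == 0:
        return 0.0
    probs = [p / total for p in proportions if p > 0]
    return -sum(p * math.log(p) for p in probs)


# Toy outputs of the two subarrays (3 bands x 3 frames, hypothetical values).
sub1 = [[0.9, 0.1, 0.8], [0.2, 0.1, 0.1], [0.1, 0.7, 0.1]]
sub2 = [[0.1, 0.1, 0.9], [0.6, 0.1, 0.1], [0.1, 0.1, 0.1]]
thresholds = [0.5, 0.5, 0.5]

fused = fuse_max(binarize(sub1, thresholds), binarize(sub2, thresholds))
index = band_entropy_index(fused)
print(fused, index)
```

In the toy data, each subarray misses a bin that the other one detects, so the fused spectrogram keeps both; this mirrors the role the abstract assigns to the interleaved null positions of the two coprime-spacing subarrays.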
A Survey of Quantum Covert Communication Integration Schemes and Application Scenarios
SUN Yiheng, XU Yongjun, ZHANG Haibo, HUANG Zishan
Available online, doi: 10.11999/JEIT260282
Abstract:
  Significance   With the growing demand for network communication security, research and development in covert communication and quantum communication have continued to evolve. However, current covert communication suffers from inherent security vulnerabilities; the transmission reliability of quantum communication has been limited by information eavesdropping and harmful interference. Therefore, quantum covert communication has become a research hotspot, integrating the advantages of both covert and quantum communication while addressing their respective security limitations. To this end, this paper provides a comprehensive survey of quantum covert communication integration schemes and application scenarios, including the principles of covert communication and typical enabling techniques; protocols for quantum communication and important quantum techniques; and three types of quantum covert communication integration schemes summarized by different application scenarios. This paper contributes to the design of advanced secure communication networks while offering guidance for the development of future quantum covert communication systems.  Progress   This paper presents a comprehensive survey of recent advances in quantum covert communication integration schemes and application scenarios, with an in-depth discussion of the principles of covert communication and key enabling techniques, such as Fluid Antenna (FA), Reconfigurable Intelligent Surface (RIS), and Unmanned Aerial Vehicle (UAV). FA actively reshapes wireless channel characteristics, particularly the spatial correlation of multipath components, by dynamically adjusting the transmitter physical configuration, thereby reducing information leakage. In Non-Line-of-Sight (NLoS) scenarios, RIS can dynamically alter the direction of reflected transmission of the incident signal, not only enhancing the Channel State Information (CSI) quality of the covert signal but also reducing signal leakage. 
In flexible or temporary communication networks, UAVs can increase CSI uncertainty, preventing unauthorized users from establishing a stable monitoring model and thereby complicating eavesdropping. Then, key protocols and significant techniques of quantum communication are introduced, including BB84, B92, and E91 for Quantum Key Distribution (QKD), and BF02 and Two-Step for Quantum Secure Direct Communication (QSDC). Quantum repeaters and the Quantum Random Number Generator (QRNG) are also reviewed. Based on different application scenarios, quantum covert communication integration schemes can be categorized into enabling, covert, and symbiotic integration schemes, depending on the integration mechanisms. Specifically, the enabling integration scheme leverages the unconditional security of quantum communication to address the security vulnerabilities in covert communication; the covert integration scheme utilizes enabling techniques in covert communication to reduce the detection probability of quantum communication; and the symbiotic integration scheme combines the advantages of both covert communication and quantum communication to achieve mutual empowerment and deep symbiosis. Finally, critical challenges are highlighted, including stringent hardware precision requirements, low resource allocation efficiency, and obstacles in large-scale applications. Promising directions for future research are also identified, including R&D on precision communication equipment, dynamic resource management, cost control during deployment, and the promotion of standardized development.  Prospects   Despite remarkable progress in preliminary applications and specific scenarios, research on quantum covert communication remains in its infancy. As quantum covert communication scenarios become increasingly diverse and complex, future studies should prioritize challenges that restrict further development and large-scale application of quantum covert communication. 
The stringent hardware precision requirements are the primary challenge, limiting reliable transmission distance and stability. Low resource allocation efficiency is another challenge, as the quantum covert communication system that generates quantum entanglement over lossy channels remains subject to the Square Root Law (SRL) constraints, while signal transmission exhibits burstiness and dynamics. Additionally, high deployment costs and the lack of standardization present significant hurdles. To address the challenges mentioned, future directions should include R&D on precision communication equipment, dynamic resource management, cost control during deployment, and the promotion of standardized development to facilitate the development of high-performance, large-scale, and multi-scenario quantum covert communication.  Conclusions  This paper provides a comprehensive survey of quantum covert communication with particular emphasis on integration schemes and application scenarios. The fundamentals and typical enabling techniques of covert communication are first reviewed, highlighting its Low Probability of Detection (LPD) secure paradigm and unique channel characteristics. The typical protocols and important techniques of quantum communication are then examined, including QKD, QSDC, quantum repeaters, and QRNG. Three types of quantum covert communication integration schemes have been further classified by different integration mechanisms and corresponding application scenarios. Finally, several existing challenges are identified, including stringent hardware precision requirements, low resource allocation efficiency, and obstacles to large-scale applications. Relevant research directions are also outlined, including R&D on precision communication equipment, dynamic resource management, cost control during deployment, and the promotion of standardized development. 
These directions are expected to serve as a valuable reference for advancing and standardizing quantum covert communication in future secure networks.
A Frequency Domain Self-Attention Guided Multi-Scale Inverse Lithography Technology
LUO Binling, WANG Ying, CAI Shuting
Available online, doi: 10.11999/JEIT251382
Abstract:
  Objective  Optical Proximity Effects (OPE) in lithographic processes cause printed patterns on wafers to deviate from target layouts, necessitating Optical Proximity Correction (OPC) through mask optimization prior to exposure. Traditional rule-based OPC methods suffer from significant accuracy degradation when handling complex layouts, while model-based OPC approaches incur high computational cost. In recent years, deep learning-based methods have been introduced to accelerate mask generation; however, their limited receptive fields hinder effective modeling of long-range optical interference effects, thereby constraining optimization accuracy. To address these challenges, this work proposes a Frequency Domain Self-Attention Guided Multi-Scale Inverse Lithography Technology (FMS-ILT), which jointly models local geometric details and global optical interactions, leading to improved printed image fidelity, edge placement accuracy, and process robustness.  Methods  FMS-ILT adopts a residual convolution-based multi-scale encoder–decoder architecture, where shallow layers extract fine-grained geometric features such as edges and corners, while deeper layers capture large-scale layout context. Residual blocks and multi-level skip connections are employed to preserve high-frequency information and stabilize training. To overcome the limited receptive field of spatial convolutions, a Frequency Domain Self-Attention Mechanism (FSAM) is introduced at the encoder output. Global feature interactions are enabled via the Fourier transform, and the resulting attention responses are mapped back to the spatial domain through the inverse Fourier transform to adaptively reweight feature representations. A two-stage training strategy is adopted. During pretraining, a dual-branch structure is used to jointly learn mask geometry and imaging consistency, providing physically meaningful initialization. 
During main training, lithography simulation is applied under nominal, maximum, and minimum process conditions to further refine mask optimization under physical constraints.  Results and Discussions  The comparison results with baseline models are summarized in Tables 2 and 3. Our method is set as the reference (Ratio = 1), and all experiments are conducted on the LithoBench dataset. In terms of overall imaging $\mathcal{L}2$ error, our method achieves the lowest value of 19,998, while the baseline models' errors are 2%–107% higher. For the process robustness metric Process Variation Band (PVB), GAN-OPC obtains the best result of 19,156, which is 31% lower than ours; however, its $\mathcal{L}2$ error and EPE are 107% and 1115% higher, respectively, indicating an imbalance between imaging fidelity and edge accuracy. The remaining baseline models exhibit PVB performance comparable to ours. Regarding Edge Placement Error (EPE), our method also demonstrates a significant advantage, achieving an average EPE of 1.95, whereas the baselines' EPE values are 47%–1115% higher. These improvements can be attributed to three key factors: (1) a multi-scale encoder–decoder fusion mechanism that effectively integrates local and global features, (2) the combination of attention mechanisms and frequency-domain operations to guide the model toward critical regions, and (3) a dual-branch pretraining strategy that injects physical priors into the network. With these modules jointly contributing, FMS-ILT achieves more balanced and superior performance in imaging fidelity, process stability, and edge accuracy.  Conclusions  This work proposes a Frequency Domain Self-Attention Guided Multi-Scale Inverse Lithography Technology (FMS-ILT). 
The model adopts a residual convolution-based multi-scale encoder–decoder architecture to extract rich spatial features and incorporates a frequency-domain self-attention mechanism to jointly model local geometric details and global optical interference characteristics. A two-stage training strategy is employed. In the pretraining stage, a dual-branch task of mask generation and target image reconstruction is used to enhance the physical consistency between the mask and the printed image. In the main training stage, lithography simulation is introduced to further improve imaging accuracy and process robustness. Experimental results on the public LithoBench dataset demonstrate that FMS-ILT achieves superior performance in terms of $\mathcal{L}2$, PVB, and EPE metrics, effectively improving printed image quality and providing a feasible and efficient solution for computational lithography.
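To make the reported metrics concrete, the sketch below computes an L2 error and a process variation band on tiny binary images and converts a value into a "percent higher than the reference" figure, which is how the Ratio = 1 convention in this abstract reads. It is a Python illustration under assumptions: the metric definitions follow common mask-optimization benchmark conventions rather than anything stated in the abstract, the function names and toy images are hypothetical, and EPE is omitted because it needs edge sampling.

```python
def l2_error(printed_nominal, target):
    """Squared L2 error between the nominal printed image and the target.
    For binary images this counts the mismatched pixels."""
    return sum((p - t) ** 2
               for row_p, row_t in zip(printed_nominal, target)
               for p, t in zip(row_p, row_t))


def pvb(printed_max, printed_min):
    """Process variation band: number of pixels on which the prints under the
    maximum and minimum process conditions disagree."""
    return sum(1
               for row_a, row_b in zip(printed_max, printed_min)
               for a, b in zip(row_a, row_b) if a != b)


def percent_higher(value, reference):
    """How much higher `value` is than `reference`, in percent."""
    return (value / reference - 1.0) * 100.0


# Toy 2 x 3 binary images (hypothetical, not benchmark data).
target = [[0, 1, 1], [0, 1, 1]]
nominal = [[0, 1, 0], [0, 1, 1]]
outer = [[1, 1, 1], [0, 1, 1]]
inner = [[0, 1, 0], [0, 1, 0]]

print(l2_error(nominal, target), pvb(outer, inner), percent_higher(30, 20))
```

Under this convention a baseline with Ratio = 2.07 is 107% higher than the reference, which is why percentages above 100 can only describe baselines relative to the reference method and not the reverse.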
Full-Space Covert Integrated Sensing and Communications Assisted by Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface
XIE Wenwu, ZHANG Qinke, YANG Liang, WANG Ji, YU Chao, LIU Xinzhong, CUI Yaru
Available online, doi: 10.11999/JEIT260145
Abstract:
  Objective  The evolution of Sixth Generation (6G) mobile communications toward higher frequencies and larger antenna arrays has made Integrated Sensing And Communication (ISAC) a key enabling technology. However, ISAC systems still face limited communication covertness and resource competition between sensing and communication. Covert communication and Reconfigurable Intelligent Surface (RIS) techniques provide promising solutions. However, most existing studies use reflective RISs with half-space coverage and assume far-field propagation. These assumptions limit deployment flexibility and fail to capture near-field spherical-wave characteristics. To address these issues, this paper proposes a near-field full-space ISAC framework assisted by an Extremely Large-Scale Simultaneously Transmitting And Reflecting Reconfigurable Intelligent Surface (XL-STAR-RIS). The objective is to jointly optimize active transmit beamforming and passive XL-STAR-RIS coefficient design to improve the covert communication rate while satisfying sensing performance and covertness requirements.  Methods  The detection capability of warden Willie is first analyzed, and a closed-form lower-bound expression for the minimum Detection Error Probability (DEP) is derived. A non-convex optimization problem is then formulated to maximize the covert communication rate under sensing Signal-to-Noise Ratio (SNR), covertness, and total transmit power constraints. Direct solution is difficult because the active transmit beamforming vectors and passive XL-STAR-RIS coefficients are strongly coupled. An Alternating Optimization (AO) framework is therefore adopted to decompose the original problem into two tractable subproblems. The active transmit beamforming subproblem is solved using SemiDefinite Relaxation (SDR) combined with a penalty-based successive convex approximation method. The passive XL-STAR-RIS coefficient design subproblem is solved using the Dinkelbach algorithm and a rank-one penalty method. 
The two subproblems are solved alternately until convergence.  Results and Discussions  Simulation results verify the effectiveness of the proposed framework. The algorithm converges within approximately 10 iterations and achieves a covert communication rate of about 11.5 bit/(s·Hz). This rate is higher than those of the passive-RIS scheme (9.8 bit/(s·Hz)) and the non-RIS scheme (8.0 bit/(s·Hz)). The performance gain becomes more evident as the transmit power increases, which indicates strong power adaptability. The proposed framework also maintains robust performance under strict operational constraints. When the sensing SNR threshold increases, it achieves a higher covert communication rate than the benchmark schemes. Under a stricter covertness requirement, it also preserves a higher communication rate. These results show that joint active transmit beamforming and passive XL-STAR-RIS coefficient design can effectively balance communication, sensing, and covertness in near-field ISAC systems.  Conclusions  This paper presents an XL-STAR-RIS-assisted covert communication framework for near-field ISAC systems. By jointly designing active transmit beamforming and passive XL-STAR-RIS coefficients through an efficient AO algorithm, the proposed framework balances communication rate, sensing performance, and communication covertness. Simulation results confirm its advantages over conventional passive-RIS and non-RIS schemes, especially under strict sensing and covertness constraints. The results also indicate the potential of XL-STAR-RIS for secure full-space 6G applications. Future work will consider imperfect Channel State Information (CSI), dynamic propagation environments, and multi-RIS collaboration to improve practical robustness.
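The Dinkelbach algorithm named in the Methods turns a fractional objective, maximize f(x)/g(x), into a sequence of parametric problems, maximize f(x) − λ·g(x), updating λ to the ratio at each new maximizer. The sketch below shows that iteration in Python on a toy one-dimensional problem over a finite candidate grid. The functions `f` and `g`, the grid, and the tolerance are hypothetical stand-ins; the actual XL-STAR-RIS coefficient subproblem, with its rank-one penalty, is not reproduced here.

```python
def dinkelbach(candidates, numerator, denominator, tol=1e-9, max_iter=100):
    """Maximize numerator(x) / denominator(x) over a finite candidate list.

    Requires denominator(x) > 0 for every candidate.
    Returns the selected candidate and the achieved ratio.
    """
    x = candidates[0]
    lam = numerator(x) / denominator(x)
    for _ in range(max_iter):
        # Parametric subproblem: maximize f(x) - lam * g(x).
        x = max(candidates, key=lambda c: numerator(c) - lam * denominator(c))
        gap = numerator(x) - lam * denominator(x)
        lam = numerator(x) / denominator(x)
        if gap < tol:  # a zero gap means lam is the optimal ratio
            break
    return x, lam


def f(x):
    return 1.0 + 2.0 * x  # toy numerator (hypothetical)


def g(x):
    return 1.0 + x * x  # toy denominator, always positive


grid = [i / 100 for i in range(301)]
best_x, best_ratio = dinkelbach(grid, f, g)
print(best_x, best_ratio)
```

On a finite grid the subproblem is solved by enumeration; in a setting like the one in the abstract, each parametric subproblem would instead be a convex program, which is the reason for using the transformation.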
Phase Shift-Based Covert Backdoor Attack Strategy in Deep Neural Networks
ZHANG Heng, XIA Yu, REN Yan, DU Linkang, ZHANG Zhikun
Available online, doi: 10.11999/JEIT251145
Abstract:
  Objective  The proliferation of Deep Neural Networks (DNNs) in safety-critical domains such as autonomous driving and biomedical diagnostics has raised serious concerns about their vulnerability to adversarial threats, particularly backdoor attacks. In these attacks, hidden triggers are embedded during training, causing models to behave normally on clean inputs while producing malicious outputs when specific triggers are present. Existing backdoor methods mainly operate in either the spatial domain or the frequency domain, but they face a fundamental tradeoff between Attack Success Rate (ASR) and stealth. Spatial triggers often introduce visible artifacts, whereas frequency-domain amplitude perturbations disrupt spectral energy distributions and can therefore be detected by advanced defenses such as spectral anomaly detection. This study addresses the need for a backdoor paradigm that simultaneously achieves high attack performance, minimal perceptual distortion, and robustness against state-of-the-art defense methods. The objective is to develop a frequency-domain backdoor attack based on phase manipulation, which is better aligned with human visual perception and structural consistency, thereby overcoming the limitations of existing methods.  Methods  FDPS integrates frequency-domain phase manipulation, perceptual similarity screening, and standard data poisoning. The method first converts input images from RGB to Y'CbCr color space. This conversion isolates the chrominance channels while preserving the luminance component. Discrete Fourier Transform (DFT) is then applied to the chrominance components to obtain complex frequency spectra. Phase information is computed with the atan2 function, and selected high-frequency components are shifted to embed the trigger. Image reconstruction is performed through Inverse Discrete Fourier Transform (IDFT). The framework further incorporates Learned Perceptual Image Patch Similarity (LPIPS) filtering. 
This filter removes generated samples that do not satisfy the similarity threshold. The screening process ensures that all retained triggers remain visually imperceptible. The accepted poisoned samples are assigned the target class labels and then combined with the clean training data according to standard protocols.  Results and Discussions  FDPS achieves near-perfect ASR, reaching 99%, while maintaining Benign Accuracy (BA) across three datasets and two network architectures (Table 1). The method embeds triggers by manipulating phase information in the Cb and Cr chrominance channels through Fourier transforms, and LPIPS filtering helps preserve visual stealth. Experimental results show that poisoned images retain semantic focus, as confirmed by Grad-CAM visualizations that remain aligned with the clean-image patterns (Fig. 4). The method also shows strong resistance to defense mechanisms. Under Neural Cleanse, FDPS yields an anomaly index of 1.73, which is below the detection threshold of 2 (Figs. 3-5). Under STRIP, the entropy distribution of poisoned samples substantially overlaps with that of clean samples. Additional analysis shows that high-frequency phase perturbation achieves strong attack performance with limited poisoning. In particular, on the GTSRB dataset, FDPS achieves 99% ASR with only 2% poisoned training samples, while minimizing the effect on model utility (Fig. 6; Table 3).  Conclusions  An end-to-end frequency-domain strategy is proposed to embed covert triggers into image classification models while preserving fidelity on clean samples. By shifting selected high-frequency phase components in the chrominance channels and applying LPIPS-based filtering, FDPS achieves 99% ASR with negligible BA loss and minimal visible artifacts. It also evades representative detection methods, including Grad-CAM, Neural Cleanse, Adversarial Neuron Pruning (ANP), and STRIP. 
These findings indicate that high-frequency phase perturbation constitutes an effective and stealthy backdoor mechanism. Future work should extend this strategy to broader modalities and develop dedicated frequency-domain anomaly detectors as principled countermeasures.
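The signal-processing property this abstract relies on is that changing only the phase of a Fourier component leaves the magnitude spectrum untouched, which is also why the authors call for dedicated frequency-domain detectors. The sketch below demonstrates that property in Python on a short one-dimensional signal with a naive DFT. It is a toy illustration under assumptions: the function names, the signal, the chosen bin, and the shift value are hypothetical, and the image pipeline in the abstract (Y'CbCr conversion, 2-D transform, LPIPS filtering) is not modeled.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def shift_phase(signal, bins, delta):
    """Add `delta` radians to the phase of the given bins, keeping magnitudes.

    Each bin k must satisfy 0 < k < n/2; its mirror bin n - k receives the
    complex conjugate so that the reconstructed signal stays real.
    """
    n = len(signal)
    spectrum = dft(signal)
    for k in bins:
        if not 0 < k < n / 2:
            raise ValueError("bin index must lie strictly between 0 and n/2")
        magnitude = abs(spectrum[k])
        phase = math.atan2(spectrum[k].imag, spectrum[k].real)
        spectrum[k] = cmath.rect(magnitude, phase + delta)
        spectrum[n - k] = spectrum[k].conjugate()
    return [value.real for value in idft(spectrum)]


signal = [0.2, 0.5, 0.9, 0.4, 0.1, 0.3, 0.8, 0.6]
shifted = shift_phase(signal, bins=[3], delta=0.3)
before = [abs(c) for c in dft(signal)]
after = [abs(c) for c in dft(shifted)]
print(max(abs(a - b) for a, b in zip(before, after)))
```

The samples change while every spectral magnitude stays the same up to floating-point error, so a check that inspects only spectral energy cannot distinguish the two signals; a detector would have to examine phase statistics.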
Millimeter-Wave Air-to-Ground Channel Prediction Assisted by Visual Information of the Propagation Environment
CHENG Yuanxun, HU Qingsong, ZHANG Xiaomin, WANG Xuesong
Available online, doi: 10.11999/JEIT260274
Abstract:
  Objective  Accurate prediction of air-to-ground (A2G) channel states is essential for adaptive transmission and resource optimization in unmanned aerial vehicle (UAV) communications. In urban millimeter-wave scenarios, however, A2G links are highly sensitive to blockage, reflection, scattering, and the rapidly changing geometric relationship among the transmitter, the receiver, and surrounding buildings. As a result, the channel exhibits strong spatial and temporal nonstationarity, and conventional pilot- or feedback-based acquisition methods may become ineffective because the obtained channel state information is easily outdated. Recent data-driven approaches have shown potential, but many of them rely heavily on historical channel observations or directly use raw images as network inputs, which may introduce redundant visual information and weaken physical interpretability. To address these limitations, this paper proposes a vision-assisted millimeter-wave A2G channel prediction method that extracts low-dimensional geometric features from the propagation environment instead of using raw visual data directly. The objective is to preserve the key structural information governing channel evolution while reducing irrelevant redundancy, thereby improving channel prediction.  Methods  A communication-and-sensing integrated dataset with strict spatial and temporal alignment is established for millimeter-wave UAV A2G channel prediction. On the sensing side, a high-fidelity three-dimensional urban scenario containing 23 buildings, roads, and intersections is constructed in Unreal Engine 4.27, where synchronized RGB and depth images are collected through AirSim using a multirotor UAV equipped with RGB and depth cameras. The UAV flies along 10 preset trajectories at a height of 55 m with a spatial sampling interval of 1 m, yielding 2160 valid visual samples (Fig. 1, Fig. 2). 
On the communication side, the same scene is reconstructed in Wireless InSite, and the transmitter-receiver positions are synchronously updated along the same trajectories to ensure frame-level alignment between visual and channel data (Fig. 3). To obtain compact and physically meaningful environmental representations, a cross-modal spatial feature extraction scheme is developed. Buildings are first detected from RGB images using YOLO-V8 (Fig. 4), and the detected regions are then registered with depth images to reconstruct three-dimensional point clouds. After Euclidean clustering and axis-aligned bounding-box fitting, key geometric attributes, including planar position, height, and volume, are extracted. These features are combined with the transmitter-receiver distance to form the spatial feature vector of each frame, and their relevance to path loss, received power, and RMS delay spread is evaluated through cosine-similarity-based correlation analysis (Fig. 6). Based on the extracted features, a hybrid Transformer-MLP network is designed for channel prediction (Fig. 5). Building features are first projected into a latent space, and a stacked Transformer encoder is employed to capture global interactions among buildings through masked multi-head self-attention. Masked average pooling is then used to aggregate building-level representations into a scene-level environmental descriptor, which is concatenated with the link distance feature and fed into a multilayer perceptron regressor to predict the three target channel parameters.  Results and Discussions  The results confirm the effectiveness of the proposed spatial feature representation. Correlation analysis shows that the extracted geometric features are consistently related to path loss, received power, and RMS delay spread under different aggregation strategies (Fig. 6), indicating that compact building descriptors can effectively characterize the propagation environment. 
Among them, building height exhibits the strongest correlation with all three channel parameters, highlighting its important role in blockage, attenuation, and multipath propagation in urban millimeter-wave A2G channels. In prediction experiments, the proposed method accurately tracks the variation trends of all three targets. It remains effective in deep-fading and sharp-fluctuation regions for path loss prediction (Fig. 7), achieves high consistency with the ground truth for RMS delay spread (Fig. 8), and follows rapid local fluctuations of received power with good fidelity (Fig. 9). In contrast, the benchmark model only captures the general trend and shows larger deviations in peaks, valleys, and abrupt-changing intervals. Residual analysis further demonstrates the superiority of the proposed method. Its errors are more concentrated around zero and fluctuate within narrower ranges than those of the benchmark model across all three tasks (Fig. 10). Quantitatively, both the mean absolute error and the root mean squared error are reduced (Fig. 11). In addition, the model maintains acceptable complexity, with about 5.5 M parameters and a single-frame inference delay of about 3.4 ms, indicating good potential for real-time deployment.  Conclusions  A vision-assisted millimeter-wave A2G channel prediction method for UAV communications is proposed. By constructing a strictly aligned communication-and-sensing dataset and extracting low-dimensional spatial features with clear physical meaning, the method establishes an effective mapping from environmental geometry to channel parameters. The proposed Transformer-MLP framework achieves accurate prediction of path loss, received power, and RMS delay spread, while offering better interpretability, robustness, and efficiency than the benchmark model.
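The aggregation step described in the Methods above (masked average pooling over building-level representations, followed by concatenation with the link distance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the two-dimensional feature vectors, and the padding convention (a 0/1 mask over a fixed number of building slots) are assumptions made only for illustration.

```python
import numpy as np


def masked_average_pool(features, mask):
    """Average only the rows marked valid by `mask`.

    features: (num_slots, dim) array of per-building representations.
    mask:     (num_slots,) array, 1 for a real building, 0 for padding.
    (Hypothetical layout; the paper does not specify its padding scheme.)
    """
    features = np.asarray(features, dtype=float)
    mask = np.asarray(mask, dtype=float)
    count = mask.sum()
    if count == 0:
        # No building in view: return a zero scene descriptor.
        return np.zeros(features.shape[1])
    return (features * mask[:, None]).sum(axis=0) / count


def build_regressor_input(building_feats, mask, link_distance):
    """Concatenate the scene-level descriptor with the link distance."""
    scene = masked_average_pool(building_feats, mask)
    return np.concatenate([scene, [link_distance]])


# Two real buildings and one padded slot; the padded row is ignored.
feats = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
mask = np.array([1, 1, 0])
x = build_regressor_input(feats, mask, 120.0)
print(x)  # scene descriptor [2.0, 3.0] followed by the distance 120.0
```

Dividing by the number of valid slots rather than the total slot count keeps the descriptor independent of how many padded entries a frame contains.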
Transfer Learning Aided CNN for Efficient Data Detection in ReRAM with Sneak-Path Interference
DAI Bin, WU Anni
Available online  , doi: 10.11999/JEIT260354
Abstract:
  Objective  Sneak path interference (SPI) in resistive random-access memory (ReRAM) introduces unpredictable inter-cell correlations, significantly increasing the complexity of signal detection. Traditional detection methods typically rely on assumptions about known channel noise states, resulting in limited generalization capability in practical applications. To address this issue, three data detection methods based on convolutional neural networks (CNNs) are proposed, which can effectively model and mitigate interference without relying on prior channel information: first, a method combining constrained coding with a multi-layer CNN, which uses constrained coding to determine the sneak path interference state and recover data; second, a dual-CNN framework that first employs a lightweight CNN for sneak path interference identification, followed by a multi-layer CNN for refined detection; third, an approach incorporating transfer learning, which maintains detection accuracy while reducing the required training sample size to one-thousandth of that of traditional methods. Simulation results demonstrate that the proposed methods achieve superior bit error rate (BER) performance under unknown channel conditions, with a BER reduction of at least half relative to existing algorithms, approaching the theoretical performance limit. Moreover, the integration of transfer learning reduces the required training samples from $10^{6}$ to $1000$, corresponding to a reduction of three orders of magnitude.  Methods  To address distinct challenges in sneak path interference detection, this paper proposes three methods sequentially: 1. The integrated constrained coding aided convolutional neural network (CC-CNN) detection framework effectively addresses the complex inter-cell correlations introduced by sneak path interference. 
This approach first employs constrained coding to detect the presence of interference and subsequently utilizes a CNN to learn and capture the random correlations under the influence of interference, thereby achieving accurate signal recovery. 2. The dual-CNN-based detection method resolves the code rate loss associated with traditional constrained coding. By directly leveraging a CNN to learn and identify sneak path interference patterns from raw data, this method eliminates the need for redundant coding or additional overhead. It ensures high-precision interference detection while preserving the overall code rate performance of the system. 3. The transfer learning-based CNN (TL-CNN) detection method overcomes the dependence of high-performance CNNs on large-scale training datasets. By reusing knowledge from pre-trained models, this method enables rapid adaptation to ReRAM signal detection tasks. It significantly reduces the required number of training samples while maintaining high detection accuracy and resource efficiency, thereby enhancing the feasibility of the solution in practical scenarios.  Results and Discussions  Simulation results demonstrate that the performance of the three proposed methods consistently approaches the theoretical lower bound (Fig. 6), outperforming baseline methods such as the Belief Propagation (BP) detector, Deep Neural Network (DNN) detector, and Elementary Signal Estimator (ESE) detector. The two-step network achieves performance comparable to that of the single-step network while successfully avoiding code rate loss. Notably, the transfer learning-aided CNN attains near-optimal BER with only 1000 target domain samples, and its performance stabilizes when the sample size exceeds 1000 (Fig. 7), fully validating its data efficiency. 
The integration of SK modules enables the models to effectively capture SPI-induced spatial correlations, while the transfer learning strategy ensures the models’ robust performance under different noise conditions.  Conclusions  The crossbar array architecture of ReRAM is susceptible to sneak-path interference during storage operations, leading to reduced data reliability. To address this issue, this paper proposes three deep learning-based detection methods. Type-I integrates constrained coding with a CNN to achieve efficient and fast interference detection. Type-II adopts a two-stage processing approach: it first classifies interference patterns in the memory array and then performs detection specifically on affected units, thereby ensuring high detection accuracy while minimizing coding rate loss. Type-III introduces a transfer learning framework that leverages a pre-trained model from the source domain, significantly reducing the number of training samples required in the target domain and effectively lowering training overhead. Experimental results show that under different noise conditions, all three proposed methods achieve performance close to the theoretical lower bound, providing an effective solution for enhancing the reliability of ReRAM storage systems.
An Overview of Key Technologies on 6G-Enabled Communication and Computing Integration for Energy-Efficiency Optimization
LIU Guangyi, CAI Qing, WANG Xinyao, CHEN Tianjiao, JIN Jing, XUE Yahui, WANG Ailing, WANG Hanning
Available online  , doi: 10.11999/JEIT260399
Abstract:
  Significance   Constrained by physical conditions such as size, power consumption, and cost, high energy consumption has become a key bottleneck for the large-scale application of new intelligent terminals. In contrast to Fifth-Generation (5G) networks, Sixth-Generation (6G) networks will achieve a profound architectural enhancement of the radio access network (RAN), sink computing capabilities toward the RAN side, and enable the RAN to perform part of the tasks originally executed by end devices. With end-edge collaboration, new intelligent terminals are expected to evolve toward lightweight, low-cost, and long-endurance designs, which is of great significance for supporting the large-scale deployment of ubiquitous intelligence in 6G networks.  Progress   Current advancements in terminal energy consumption optimization with 6G end-edge collaboration are discussed, focusing on three primary offloading modes: local execution, full offloading, and partial offloading. Local execution requires the terminal to process all tasks, leading to high computational energy consumption, while full offloading shifts all tasks to the RAN, reducing the terminal's computational energy use but increasing transmission energy costs, particularly in poor channel conditions. Partial offloading combines the advantages of both modes, optimizing energy consumption based on real-time network conditions. For partial offloading, existing research has introduced several optimization techniques to enhance energy efficiency. (1) Feature extraction and filtering: Through semantic encoding and information extraction approaches, feature extraction is performed at the user equipment (UE) to transmit only task-relevant data to the RAN. This reduces the amount of redundant or unnecessary data sent, minimizing transmission energy consumption. (2) Model partitioning for offloading: This technique divides a large deep learning model into different layers based on its network structure, with simpler layers processed at the UE and more complex ones offloaded to the RAN. 
By leveraging end-edge collaborative reasoning, this method optimizes energy consumption by balancing the computational load between the terminal and RAN. (3) Model lightweighting: By reducing model complexity through techniques like pruning, quantization, and knowledge distillation, this method lowers computational overhead while maintaining performance. (4) Incremental reasoning: This method focuses on the changes in data or features, performing localized reasoning only on updated portions and reusing historical computations, significantly reducing redundant calculations. The above optimization techniques collectively enhance the performance and energy efficiency of terminal devices within the 6G end-edge collaboration framework.  Conclusions  This paper provides a comprehensive discussion of terminal energy consumption optimization with 6G end-edge collaboration. It summarizes the functional evolution of enhanced RAN, constructs an end-edge collaborative service framework for communication-computation integration, and establishes a theoretical model including terminal computing energy consumption and transmission energy consumption. The composition and influencing factors of energy consumption under different offloading modes are clarified. Key technologies for energy optimization based on end-edge collaboration are further discussed, including feature extraction and filtering, model partitioning for offloading, model lightweighting, and incremental reasoning. Given the energy consumption fluctuations caused by the dynamic nature of wireless channels, this paper introduces energy optimization mechanisms such as semantic compression, dynamic partitioned offloading, adaptive model pruning, and incremental reasoning to strike a dynamic balance between optimizing energy consumption and maintaining task performance. 
Taking intelligent robot video understanding as a typical application scenario, a test platform is developed to validate the effectiveness of the proposed optimization mechanisms. This paper also analyzes the challenges currently faced in the research and discusses future research directions.  Prospects   Although the end-edge collaborative energy-saving technologies have achieved initial progress, they still face many challenges in practical deployment, especially under real network environments, dynamic wireless channels, and large-scale user access. Future research should focus on the trade-off between optimization overhead and system robustness, and further investigate dynamic communication–computation resource substitution modeling in stochastic resource environments, as well as multi-user collaborative strategies and global energy efficiency optimization. Meanwhile, as the technology matures, the standardization and engineering implementation of end-edge collaborative energy-saving frameworks will become crucial for the large-scale adoption of 6G applications. Future studies should therefore promote deeper integration between algorithm design and network architecture, enabling the practical deployment of low-power, high-efficiency intelligent communication systems.
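The trade-off between terminal computing energy and transmission energy under model partitioning can be illustrated with a small Python sketch. It is not the theoretical model of the paper: the energy formulas (dynamic computing energy of kappa * f^2 per CPU cycle, transmission energy of transmit power times airtime), the function names, and all numerical values are assumptions chosen only to show how the best split point moves with channel quality.

```python
def terminal_energy(split, layer_cycles, activation_bits, input_bits,
                    kappa, freq_hz, tx_power_w, rate_bps):
    """Terminal-side energy (J) when the first `split` layers run locally.

    split = 0 is full offloading (the raw input is sent);
    split = len(layer_cycles) is local execution (nothing is sent).
    All parameters are hypothetical placeholders.
    """
    n_layers = len(layer_cycles)
    # Assumed computing model: kappa * f^2 joules per CPU cycle.
    compute_j = kappa * freq_hz ** 2 * sum(layer_cycles[:split])
    if split == n_layers:
        tx_bits = 0.0
    elif split == 0:
        tx_bits = input_bits
    else:
        # Send the activation produced by the last locally executed layer.
        tx_bits = activation_bits[split - 1]
    # Assumed transmission model: power multiplied by airtime.
    transmit_j = tx_power_w * tx_bits / rate_bps
    return compute_j + transmit_j


def best_split(layer_cycles, activation_bits, input_bits,
               kappa, freq_hz, tx_power_w, rate_bps):
    """Return the split index with the lowest terminal energy, plus all energies."""
    energies = [
        terminal_energy(s, layer_cycles, activation_bits, input_bits,
                        kappa, freq_hz, tx_power_w, rate_bps)
        for s in range(len(layer_cycles) + 1)
    ]
    return min(range(len(energies)), key=energies.__getitem__), energies


# Hypothetical three-layer model on a 1 GHz terminal CPU.
MODEL = dict(layer_cycles=[2e8, 3e8, 5e8], activation_bits=[4e6, 2e5, 1e3],
             input_bits=8e6, kappa=1e-27, freq_hz=1e9, tx_power_w=0.5)

good_split, good_energies = best_split(rate_bps=5e6, **MODEL)
poor_split, poor_energies = best_split(rate_bps=1e5, **MODEL)
print(good_split, poor_split)
```

With these made-up numbers, partial offloading after the second layer is cheapest on the faster link, while local execution wins on the slower link, which mirrors the qualitative behavior described in the Progress section.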
Lightweight Semantic Communication System Driven by User Personalization in UAV Networks
WEI Yuxuan, CHEN Xiao, CHEN Qiuyu, JIANG Hao, YANG Zhaohui
Available online  , doi: 10.11999/JEIT260370
Abstract:
  Objective  With the rapid development of the low-altitude economy and 6G intelligent networks, Unmanned Aerial Vehicle (UAV) image communication shows great promise in scenarios such as target reconnaissance, emergency communications, and intelligent inspection. However, constrained by the limited bandwidth, payload capacity, and onboard computational resources of UAVs, conventional pixel-level transmission fails to meet the demands of efficient, low-latency, and intelligent communications. Semantic Communication (SC), which transmits only task-relevant information, offers an effective solution to enhance communication efficiency in such resource-constrained scenarios. However, existing research on UAV image SC faces several challenges. First, fixed network architectures apply unified semantic encoding and transmission strategies for all users, failing to accommodate diverse personalized requirements. Second, new user onboarding typically requires interest pre-training or model fine-tuning, leading to high deployment overhead. Third, the computational complexity of models is generally high. To address these issues, this paper proposes a lightweight personalized UAV SC system, LPUSC, aimed at achieving a balanced trade-off among computation, bandwidth, and personalized demands. The system enables personalized transmission via low-cost semantic index interaction and a lightweight semantic extraction module, without pre-training for new users. Additionally, a dual-branch end-to-end network is designed, where the semantic index transmission network collaborates with the semantic image transmission network trained with a weighted hybrid loss strategy to ensure high-precision and high-quality transmission of personalized semantic images.  Methods  The proposed LPUSC system adopts a dual-branch architecture to enable accurate task-driven semantic content transmission. 
First, in the semantic index interaction branch, the lightweight object detection model YOLO11s is employed to perform semantic perception on UAV-captured visual scenes, compressing complex image information into low-dimensional semantic index vectors to reduce transmission redundancy and communication overhead. On this basis, an end-to-end semantic index transmission network is designed to enhance the robustness of semantic index transmission under complex wireless channel conditions. Through the semantic index interaction mechanism, the system is capable of accurately identifying targets of user interest, providing prior guidance for subsequent semantic content extraction. Second, in the semantic image transmission branch, the lightweight yet high-precision MobileSAM model is adopted for semantic region extraction. This branch receives the interest target bounding boxes returned by the semantic index interaction branch as heuristic prompt inputs, enabling pixel-level accurate segmentation and extraction of specific semantic targets. Third, to further enhance the reconstruction quality of semantic images, a weighted hybrid loss function is designed. This loss function integrates Mean Squared Error (MSE), L1 norm, Structural Similarity Index Measure (SSIM), gradient, perceptual, and background suppression losses to jointly optimize the network across pixel-level accuracy, structural preservation, and fine detail restoration. Through the joint constraint of multiple loss terms, the proposed system effectively enhances the reconstruction capability of semantic regions, thereby achieving high-quality semantic image transmission.  Results and Discussions  Simulation results validate the proposed LPUSC system in terms of semantic extraction and end-to-end transmission. In terms of semantic extraction, three schemes are compared, including YOLO11s-seg, “YOLO11s + SAM”, and “YOLO11s + MobileSAM” (Fig. 4). 
The results show that the detection-segmentation decoupled architecture achieves superior semantic boundary localization accuracy. Combined with the quantitative analysis in Table 1, the “YOLO11s + MobileSAM” scheme significantly reduces resource consumption while maintaining high extraction accuracy, confirming its suitability for resource-constrained UAV platforms. In terms of end-to-end transmission, the semantic index vector transmission results (Fig. 5) show that the Bit Error Rate (BER) decreases monotonically with increasing Signal-to-Noise Ratio (SNR) across all three channel environments, with rural environments achieving the best performance, followed by suburban and urban environments. The performance differences are primarily attributed to variations in scatterer density and link blockages across environments. The proposed transmission network maintains stable BER under different Doppler frequencies, demonstrating its robustness in dynamic channel conditions. In terms of semantic image transmission, the proposed weighted hybrid loss function demonstrates good training stability (Fig. 6), and LPUSC consistently outperforms the DeepJSCC and “JPEG + LDPC” baselines across the full SNR range (Fig. 7). Specifically, LPUSC achieves SSIM and Peak Signal-to-Noise Ratio (PSNR) gains of 1.3% and 4.8% over DeepJSCC, and 43% and 79.5% over JPEG, respectively. The results indicate that the proposed personalized semantic image transmission network achieves high-quality reconstruction with robustness to channel variations.  Conclusions  To improve the efficiency and flexibility of UAV image communication, this paper proposes a lightweight personalized SC system called LPUSC. The system employs a dual-branch transmission architecture that integrates a lightweight, high-precision object detection model and a semantic segmentation model, enabling personalized content transmission without interest pre-training. 
This design meets personalized user requirements while maintaining low computational and communication overhead. Simulation results demonstrate that the LPUSC system achieves stable and reliable semantic index interaction, and significantly outperforms DeepJSCC and JPEG baselines in semantic region reconstruction. The proposed system offers a valuable reference for efficient UAV image SC in 6G low-altitude intelligent networking.
Load Optimization of Inverter Air Conditioning Cluster Driven by Constraint Surface Projection and Spatial-Fitness Synergy
ZHENG Bowen, PAN Mingming, WANG Lei, LIU Chang, ZHENG Qingrong, TANG Zhuofan, ZHAO Jianli
Available online  , doi: 10.11999/JEIT260149
Abstract:
  Objective  Supply-demand imbalances in modern power distribution networks are exacerbated by the increasing penetration of distributed renewable energy and frequent extreme weather events. Consequently, large-scale inverter air conditioning (IAC) clusters are utilized for Demand Response (DR) as a viable strategy to enhance grid flexibility. However, existing dispatch strategies are often limited by the curse of dimensionality, and aggregate power equality constraints are not strictly met without compromising user comfort. In this study, an optimization framework is developed to achieve precise grid power control while thermal discomfort is minimized and fairness among heterogeneous users is maintained.  Methods  A multi-objective optimization framework based on an Equivalent Thermal Parameter (ETP) model is established to evaluate the thermodynamic states of heterogeneous buildings. To balance collective comfort and individual fairness, a composite fitness function is designed, in which a weighted mean square error term, a temperature variance penalty, and a violation suppression term are integrated. To address the steady-state errors inherent in traditional penalty-based methods, a Spatial-Fitness Adaptive Particle Swarm Optimization (SFA-PSO) algorithm is proposed. Particles are mapped strictly onto the power conservation hyperplane by a geometric constraint surface projection mechanism to ensure power balance. Furthermore, learning factors are dynamically adjusted by a spatial-fitness synergistic strategy based on the cognitive dissonance between a particle's fitness rank and spatial distance rank, whereby premature convergence in high-dimensional spaces is prevented.  
Results and Discussions  Extensive continuous scheduling simulations were conducted under a complex dynamic environment, which comprehensively incorporated multi-source thermal disturbances, a 1% bidirectional communication packet loss rate, and varying part load ratios of 20%, 50%, and 80%. First, regarding the effectiveness of the proposed mechanisms, ablation experiments confirmed that the constraint surface projection guarantees power tracking accuracy. While traditional penalty-based methods (e.g., Penalty-PSO) exhibited steady-state power deviations of approximately $10^{-1}$ kW, SFA-PSO successfully restricted the aggregate power tracking errors within $10^{-9}$ kW (Fig. 3). Furthermore, the introduction of the Spatial-Fitness Adaptive (SFA) strategy effectively prevented the premature convergence observed in Phy-PSO, enabling continuous fitness descent particularly in low-load scenarios with narrow feasible regions (Fig. 4). This is directly attributed to the dynamic evolution of the learning factors, where the cognitive factor remains high initially to encourage global exploration, and subsequently decreases while the social factor rises to enhance precise local exploitation (Fig. 5). Second, in terms of continuous dynamic scheduling performance, a 6-hour simulation during the peak load period (12:00 to 18:00) with 5-minute dispatch intervals, totaling 72 decision steps, was executed. Under extreme power limitations, standard algorithms like GA and WOA suffered from severe power limit violations due to poor synergy with the projection mechanism, whereas SFA-PSO maintained perfect constraint satisfaction (Fig. 7). SFA-PSO consistently positioned itself at the lowest fitness level throughout the real-time evolution curves, demonstrating superior robustness against environmental thermal noise and network transmission delays (Fig. 8). 
Quantitatively, compared to eight baseline algorithms including SLPSO, CSO, and DSCPSO, the proposed SFA-PSO achieved the most outstanding comprehensive performance with an average fitness of 904, a minimum fitness of 243, and the lowest standard deviation of 551 (Table 2). Finally, comprehensive scalability analyses across diverse cluster sizes ranging from 100 to 1,000 nodes further validated the algorithm's high-dimensional solving capability. Across all scale scenarios, SFA-PSO exhibited the strongest optimization capacity, characterized by a rapid initial descent within the first 20 iterations and sustained exploration in later stages (Fig. 9). Although the integration of the projection and SFA mechanisms increased the computational time by 30% to 50% compared to the basic PSO algorithm (Fig. 6), the absolute solving time remained highly stable at approximately 1.5 seconds even for a massive 1,000-node cluster (Fig. 9). This minor computational overhead is entirely negligible for minute-level control cycles, fully satisfying the stringent real-time dispatch requirements of modern smart grids.  Conclusions  The steady-state error limitations of traditional soft-constraint methods in aggregate power control are effectively addressed by the proposed SFA-PSO algorithm. By ensuring precise tracking of dispatch commands and mitigating high-dimensional traps, a robust and scalable solution is provided for the flexible scheduling of large-scale IAC loads in smart grids, and a practical balance between grid-side regulation and user-side comfort is maintained. Objectively, cross-algorithm generalization is restricted by the inherent algorithm dependency of the constraint projection mechanism, and additional computational overhead is introduced to guarantee high-precision tracking. Consequently, adaptive constraint processing and algorithm lightweighting technologies are primary focuses for future research.
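The idea of mapping a particle onto the power conservation hyperplane can be sketched as a Euclidean projection onto the set of power vectors whose sum equals the dispatch target. This is a minimal illustration under stated assumptions, not the SFA-PSO implementation: the function name is hypothetical, and per-unit power limits and comfort constraints, which a practical projection would also have to respect, are deliberately ignored.

```python
import numpy as np


def project_to_power_hyperplane(powers, target_total):
    """Closest point (in Euclidean distance) to `powers` whose entries sum to target_total.

    Illustrative only: per-unit bounds on each air conditioner are not enforced.
    """
    powers = np.asarray(powers, dtype=float)
    excess = powers.sum() - target_total
    # Spreading the mismatch equally over all units is the orthogonal projection
    # onto the hyperplane {p : sum(p) = target_total}.
    return powers - excess / powers.size


# A candidate particle with three units (kW) and a 10 kW aggregate target.
particle = np.array([1.0, 2.0, 4.0])
projected = project_to_power_hyperplane(particle, 10.0)
print(projected, projected.sum())
```

Because the equality constraint is met by construction after every update, the residual tracking error comes only from floating-point rounding, whereas a penalty term can leave a finite steady-state deviation.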
Resource Allocation in Dual-RIS Cooperative Rate-Splitting Multiple Access Networks
CHEN Yuang, WU Chang, PENG Mingyu, LU Hancheng
Available online  , doi: 10.11999/JEIT260171
Abstract:
  Objective  In rate-splitting multiple access (RSMA) systems, the achievable common-stream rate is fundamentally constrained by the user with the weakest channel quality, which limits scalability, robustness, and user fairness in dense 6G networks. Existing cooperative RSMA architectures only partially alleviate this bottleneck and still suffer from rigid channel dependencies and limited interference management capability. To address these issues, this paper proposes a dual-RIS cooperative RSMA architecture, where two collaboratively deployed reconfigurable intelligent surfaces (RISs) jointly create additional controllable propagation paths through cooperative double reflection. The objective is to maximize the system sum rate through the joint optimization of base station (BS) beamforming, rate-splitting (RS) strategies, and dual-RIS phase configurations, thereby improving spectral efficiency, robustness, and user fairness under users’ quality-of-service (QoS) constraints.  Methods  A tractable system model is developed for the dual-RIS cooperative RSMA architecture, accurately capturing cascaded multi-link channels and interference coupling. Based on this model, a joint optimization problem is formulated to maximize the system sum rate by optimizing BS beamforming, RS strategies, and discrete phase shifts of both RISs. Due to strong variable coupling and non-convexity, a low-complexity and efficient alternating optimization (AO) algorithm is designed, which decomposes the original problem into manageable subproblems and solves them iteratively with fast convergence.  Results and Discussions  Extensive simulation results demonstrate the effectiveness of the proposed dual-RIS cooperative RSMA system. The proposed AO algorithm converges rapidly within 6–7 iterations and achieves over 97% of the steady-state sum rate within three iterations for large-scale RIS deployments (Fig. 3). Compared with the classic phase configuration scheme, the proposed phase configuration yields up to at least 10.6% sum-rate gains (Fig. 4). Moreover, the proposed RSMA system outperforms NOMA and SDMA by 10.0% and 14.6%, respectively (Fig. 5). 
Dual-RIS cooperation provides an 11.9% gain over single-RIS, with performance approaching the continuous-phase upper bound (Fig. 6). Balanced RIS element allocation maximizes performance (Fig. 7). In addition, the proposed beamforming significantly surpasses traditional methods, delivering up to at least 33.2% gains at 30 dBm transmit power (Fig. 8). These results highlight the superiority of the proposed dual-RIS cooperative RSMA system in enhancing common-stream decoding and interference suppression, leading to improved robustness and fairness.  Conclusions  This paper investigates a dual-RIS cooperative RSMA system that effectively improves common-stream decoding performance while mitigating complex interference. To maximize the system’s sum rate, this paper jointly optimizes BS beamforming, RS strategies, and discrete phase shifts of both RISs. A low-complexity AO algorithm is developed to address the strongly coupled non-convex problem. Extensive results demonstrate that the proposed dual-RIS cooperative RSMA scheme achieves significant sum-rate gains over state-of-the-art schemes while exhibiting superior robustness and user fairness.
A Social-Aware Ant Colony Optimization with Reproductive Division of Labor for MCS Task Allocation
SHEN Xiaoning, SHE Juan, WANG Zhilong, LI Jiayuan
Available online  , doi: 10.11999/JEIT260018
Abstract:
  Objective  With the rise of handheld/wearable smart devices, Mobile Crowd Sensing (MCS) has become an efficient data collection paradigm. Effective task allocation improves system efficiency, requester/participant satisfaction, and platform sustainability. Existing models overlook task skill requirements, fail to leverage participants' social networks for emergencies, and ignore collaboration efficiency in team tasks. To address this, we propose a Social-Aware MCS Task Allocation Model (SAMCSTA) with dual objectives: maximizing platform total revenue and overall task perceived quality. The model incorporates social networks to build a two-tier collaboration framework, expanding resources and enhancing flexibility. For complex tasks, it quantifies individual capabilities and introduces a collaboration efficiency mechanism to optimize team composition.  Methods  This paper proposes a Multi-objective Ant Colony Optimization Based on Reproductive Division of Labor (MACORDL). The core innovations of the algorithm include: (1) Constructing four subpopulations—queen ants, male ants, scout ants, and worker ants—each equipped with distinct strategies such as local enhancement, memetic crossover, and knowledge transfer, forming a hierarchical collaborative search framework; (2) A mating selection strategy based on statistical learning is designed to enable the intelligent transfer of elite genes; (3) The short-term contribution of each subpopulation is predicted based on its historical performance, allowing for dynamic and adaptive allocation of computational resources; (4) Designing a cooperative update mechanism for node pheromones and participant pheromones, establishing a dual-layer search guidance system.  Results and Discussions  The evaluation uses 8 synthetic and 4 real-world instances, with performance measured by Hypervolume Ratio (HVR) and Inverted Generational Distance (IGD). The Wilcoxon rank-sum test (significance 0.05) is employed for statistical comparison. 
Results show that MACORDL achieves the best HVR and IGD on most instances (Table 2, Table 3). On average, it outperforms the second-best algorithm by 16.41% in HVR and 18.04% in IGD. Visual comparisons confirm that the Pareto front obtained by MACORDL is superior in convergence, distribution uniformity, and breadth (Fig. 4). Although there is still room for slight improvement in fine-grained search on a few large-scale cases, MACORDL demonstrates stable performance and scalability across different scales, enabling the platform to obtain task allocation solutions with higher revenue and better perceived quality.  Conclusions  This paper addresses the task allocation problem in MCS systems, taking into account both interactions among platform participants and those between participants and their social connections. A social-aware MCS task allocation model is established, and the MACORDL algorithm is proposed to solve it. Comparative experiments on 12 real and synthetic instances of varying scales show that MACORDL significantly outperforms six representative algorithms on most instances, obtaining allocation schemes and paths that yield higher revenue and better perceived quality, and demonstrating good scalability. MACORDL incorporates multiple strategies to balance local exploitation and global exploration. However, its limitations include the assumption that all tasks are released at the start with full information, and the lack of participant privacy protection. Future work will focus on dynamic/uncertain MCS task allocation models and privacy-preserving distributed optimization.
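As an illustration of the IGD metric used in the evaluation above, the following is a minimal sketch under stated assumptions. The abstract does not say how the reference front is built or how objectives are normalized, so the function name `igd` and the two small fronts are hypothetical.

```python
import math

def igd(reference_front, approx_front):
    """Inverted Generational Distance: the mean Euclidean distance from each
    reference point to its nearest point in the approximation set.
    Lower values mean the approximation covers the reference front better."""
    if not reference_front or not approx_front:
        raise ValueError("both fronts must be non-empty")
    total = 0.0
    for r in reference_front:
        total += min(math.dist(r, a) for a in approx_front)
    return total / len(reference_front)

# Hypothetical two-objective fronts, used only to exercise the function.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
approx = [(0.0, 1.0), (1.0, 0.0)]
print(igd(reference, approx))
```

Here the approximation misses the middle reference point, so only that point contributes a non-zero distance. An approximation identical to the reference front would score 0.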
KE-HNS: Knowledge-Enhanced Personalized Recommender Model with Hierarchical Noise Suppression
XIE Jun, WANG Dantong, ZHANG Bo, CHEN Guijun, LV Jiaqi, LUO Xiongyan
Available online  , doi: 10.11999/JEIT260051
Abstract:
  Objective  In the big data and AI era, explosive information growth underpins the digital economy, yet filtering value from redundancy remains a key bottleneck. Personalized recommender systems are vital for precise matching and resource optimization. Integrating knowledge graphs enriches user–item representations, but current KG-based models suffer from weak noise suppression, coarse interest capture, and imbalanced information use, impairing performance. This paper proposes KE-HNS, a knowledge-enhanced recommender with a hierarchical multi-layer denoising strategy that fuses graph neural networks and contrastive learning. It systematically tackles noise, fine-grained preferences, and multi-source balance, markedly improving recommendation effectiveness.  Methods  KE-HNS introduces a multi-layer denoising paradigm. At input, an input denoising layer reduces noise via two sub-modules: user-item interaction denoising, which uses a learnable binary mask to drop noisy edges; and KG denoising enhancement, which scores triples by importance, identifies low-score ones, and masks them. The internal denoising layer preserves spatial independence by partitioning the entity-attribute space per relation, limiting high-order noise propagation. A compression denoising layer applies contrastive learning to further suppress noise and reinforce robust signals. To capture fine-grained interests, GCNs enhance user representations from interacted items and linked entities, while weight layers refine item representations using entity attributes and relations. For balanced information use, contrastive learning aligns user–item and item–entity views via positive/negative sampling, adaptively adjusting source weights. Matching is performed via inner product, producing a Top-K recommendation list.  
Results and Discussions  KE-HNS was assessed on three public datasets—Book-Crossing, MovieLens-1M, and Last.FM—via performance comparison, ablation studies, denoising evaluation, case analysis, and complexity assessment. For CTR prediction, it outperforms top baselines by 0.94%–1.01% in AUC and 0.43%–0.90% in F1 (Table 3). In Top-K recommendation, its Recall@K exceeds most state-of-the-art methods across nearly all K values, trailing CG-KGR only slightly on Last.FM (Fig. 7). Ablation results confirm that every sub-module contributes significantly to performance gains (Table 4). Denoising tests show the model filters noise effectively while preserving high prediction accuracy under noisy settings (Fig. 8). Complexity analysis indicates practical deployability in real-world scenarios (Table 5).  Conclusions  This paper presents KE-HNS, a personalized recommendation model that combines knowledge enhancement with a multi-layer suppression mechanism. While it delivers strong performance across multiple domains, it has notable limitations: the view contrast operation hampers computational efficiency; it relies heavily on the completeness of knowledge graph coverage; and current evaluations lack testing in emerging multimedia contexts. Experiments on benchmark datasets show that KE-HNS effectively aligns collaborative filtering signals with knowledge-aware semantics while mitigating noise, pointing to promising avenues for future work on computational optimization and dynamic knowledge integration.
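The final matching step (inner-product scoring followed by a Top-K list) and the Recall@K metric can be sketched as follows. This is an illustrative sketch only: the embeddings, the item ids, and the helper names `top_k` and `recall_at_k` are hypothetical and do not come from KE-HNS itself.

```python
def top_k(user_vec, item_vecs, k):
    """Score every item by its inner product with the user vector and
    return the ids of the k highest-scoring items."""
    scores = {item: sum(u * v for u, v in zip(user_vec, vec))
              for item, vec in item_vecs.items()}
    # Sort by descending score; ties are broken by item id for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

def recall_at_k(recommended, relevant):
    """Fraction of the relevant items that appear in the recommended list."""
    if not relevant:
        return 0.0
    return len(set(recommended) & set(relevant)) / len(relevant)

# Hypothetical 2-dimensional embeddings.
user = [1.0, 0.5]
items = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.8, 0.8], "d": [-0.5, 0.2]}
ranked = top_k(user, items, k=2)
print(ranked, recall_at_k(ranked, {"a", "b"}))
```

In a real system the embeddings would be the learned user and item representations, and Recall@K would be averaged over all test users.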
GNN-driven Beamforming and Resource Allocation for RIS-assisted MISO-OFDMA Multi-group Multicast System
MA Yu, DING Chunxia, JIN Weijie, LI Xiao, JIN Shi
Available online  , doi: 10.11999/JEIT251381
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RISs) have strong potential to improve coverage and Spectral Efficiency (SE) in future wireless networks. However, when RISs are applied to wideband Multiple-Input Single-Output Orthogonal Frequency Division Multiple Access (MISO-OFDMA) systems, their practical benefits are limited by two key challenges. First, RIS reflection coefficients may not match the frequency-selective channel conditions across all subcarriers. Second, subcarrier allocation, Base Station (BS) active beamforming, and RIS passive beamforming are strongly coupled. These challenges become more serious in multi-group multicast scenarios, where shared data streams increase inter-group interference. Therefore, this article proposes a Graph Neural Network (GNN)-driven optimization framework to maximize the system SE through joint active beamforming, passive beamforming, and subcarrier allocation.  Methods  To address the optimization difficulty caused by the strong coupling among subcarrier allocation, BS active beamforming, and RIS passive beamforming, this work develops a model-driven GNN optimization framework. The objective is to maximize the system SE. First, a complete system model containing the BS, RIS, and multi-group multicast users is established (Fig. 1). The formulation includes practical constraints, such as the BS transmit power limit, the unit-modulus constraint of RIS elements, and the binary constraint on subcarrier allocation. To satisfy the multicast requirement, the SE of each group is defined as the minimum SE among all users in that group. This definition further increases the non-convexity of the optimization problem. The first component of the proposed network, GNN1 (Fig. 3), contains an initialization layer and a message-update layer. 
For each subcarrier $ n\in \mathcal{N} $, every user is modeled as a node, and the input to GNN1 is the set of channel matrices $ \left\{{\mathbf{H}}_{k,n},k\in \mathcal{K}\right\} $. Because standard GNNs process real-valued features, each complex channel vector is decomposed into its real and imaginary parts and used as the node feature representation. Group-level aggregation (Fig. 4) and RIS-level aggregation (Fig. 5) are then performed. GNN2 (Fig. 6) takes the subcarrier-wise embeddings generated by GNN1 as input and constructs an expanded graph with group nodes (Fig. 7) and an RIS node (Fig. 8). By aggregating messages among subcarrier nodes, group nodes, and the RIS node, GNN2 fuses cross-subcarrier information and captures the global coupling among system components. Based on the integrated representation, GNN2 outputs the BS active beamforming matrix and RIS passive beamforming vector. Output-layer normalization is used to satisfy the physical constraints. Finally, given the beamforming parameters, subcarrier allocation is performed using the maximum-SE criterion. The learning objective is defined as maximizing the total SE.  Results and Discussions  The proposed GNN algorithm consistently outperforms all random benchmark schemes, including APG-randAllocate, APG-randActive, and APG-randPassive, across the full transmit power range from 0 to 20 dBm. This advantage indicates that the proposed method can dynamically handle subcarrier allocation and joint active and passive beamforming optimization. It also maintains stable and superior performance under large transmit-power variations. Overall, the system SE of all schemes increases monotonically with BS transmit power because higher transmit power improves the received signal-to-noise ratio and increases the achievable rate. 
Compared with the benchmark methods, the GNN adaptively coordinates BS active beamforming and RIS passive beamforming at different power levels and better uses the reflection gain provided by the RIS. Therefore, the GNN maintains a consistent performance advantage across the full power range. Even in the high-power region, it outperforms APG and LAO, which further verifies its robustness (Fig. 10). When the number of RIS elements varies, the GNN maintains a clear performance advantage over both APG and LAO. In general, the system SE increases with the number of RIS elements because a larger RIS provides higher array gain and improves the equivalent channel conditions. According to the numerical results, the proposed GNN achieves a spectral efficiency of 2.066 5 bit/(s·Hz), which is approximately 6.94% and 3.65% higher than those of LAO and APG, respectively. Meanwhile, the average computational time of the GNN is only about 0.007 5 s, which is approximately 4% of that required by the benchmark methods. These results demonstrate that the proposed GNN effectively uses the performance gain provided by RIS scaling and achieves a good balance between system performance and computational complexity (Fig. 11 and Table 2). The relationship between system SE and the number of user groups is then examined under fixed settings for the number of transmit antennas and users. The overall SE decreases as the number of user groups increases. This decrease occurs because more multicast groups lead to stronger inter-group interference and because limited subcarrier resources must be shared among more groups. In all considered scenarios, the proposed GNN consistently outperforms LAO. Although its SE is slightly lower than that of APG, the GNN still achieves about 98% of APG performance while requiring only about 4% of the computational time. 
This result indicates that the proposed method can reduce computational overhead while maintaining near-optimal system performance, which is useful for real-time or large-scale deployment (Fig. 12). The generalization ability of the proposed GNN is further evaluated by training the model at a fixed transmit power and testing it over a wide transmit power range from 0 to 20 dBm. The training and testing curves almost overlap, indicating that the proposed GNN generalizes well to unseen transmit power levels. Across the full power range, the GNN consistently outperforms the LAO and APG benchmarks, further confirming its robustness and adaptability under different transmission conditions (Fig. 13).  Conclusions  For the RIS-assisted MISO-OFDMA system, this paper formulates a joint optimization problem for subcarrier allocation, BS active beamforming, and RIS passive beamforming to maximize the system SE. A model-driven GNN method is proposed to solve this problem. Comparative experiments with benchmark algorithms are conducted to validate the proposed method. The results demonstrate that the proposed GNN algorithm consistently outperforms LAO and APG in overall performance. It also exhibits strong robustness under different numbers of user groups and transmit power settings, which supports its potential for practical deployment in complex engineering scenarios.
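Two ideas from this abstract lend themselves to a small sketch: the multicast SE of a group is set by its weakest user, and each subcarrier is allocated by a maximum-SE criterion. The sketch below assumes that each subcarrier serves exactly one group and that per-user SINRs are already available for every candidate assignment. The nested-list layout of `sinr` and both function names are hypothetical.

```python
import math

def group_se(user_sinrs):
    """Multicast SE of one group on one subcarrier, in bit/(s*Hz).
    The common stream must be decodable by the weakest user."""
    return math.log2(1.0 + min(user_sinrs))

def allocate_subcarriers(sinr):
    """sinr[g][n] lists the linear SINRs of group g's users on subcarrier n.
    Returns (assignment, total_se), where assignment[n] is the group index
    that achieves the largest multicast SE on subcarrier n."""
    num_groups = len(sinr)
    num_subcarriers = len(sinr[0])
    assignment, total = [], 0.0
    for n in range(num_subcarriers):
        se_per_group = [group_se(sinr[g][n]) for g in range(num_groups)]
        best = max(range(num_groups), key=lambda g: se_per_group[g])
        assignment.append(best)
        total += se_per_group[best]
    return assignment, total

# Hypothetical SINRs: 2 groups, 2 subcarriers, 2 users per group.
sinr = [
    [[3.0, 7.0], [1.0, 1.0]],   # group 0
    [[1.0, 15.0], [3.0, 3.0]],  # group 1
]
assignment, total_se = allocate_subcarriers(sinr)
print(assignment, total_se)
```

On subcarrier 0, group 1 has one strong user (SINR 15) but is limited by its weak user (SINR 1), so group 0 wins. This min operation is what makes the objective non-smooth.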
Evaluation of DeepION model based on SPP Navigation Positioning During Active Solar Condition
WANG Zitong, FU Haiyang, JIANG Zhuojun, CAI Dijia
Available online  , doi: 10.11999/JEIT250662
Abstract:
  Objective  Accurate characterization of ionospheric variability is a critical prerequisite for reliable Global Navigation Satellite System (GNSS) positioning, especially during geomagnetic storms when rapid and highly structured disturbances occur. Existing empirical and physics-based ionospheric models often struggle to represent storm-time ionospheric dynamics and small-scale irregularities in real time. This study aims to develop a unified data-driven ionospheric modeling framework that takes GNSS-derived Slant Total Electron Content (STEC) time series (estimated from GNSS observations) as input and learns the spatiotemporal mappings to key ionospheric parameters, including STEC, Vertical Total Electron Content (VTEC), and the Rate of TEC Index (ROTI). By leveraging deep operator learning, the proposed framework seeks to enhance short-term ionospheric modeling and forecasting capability under disturbed conditions and to provide more reliable ionospheric corrections for single-frequency GNSS positioning.  Methods  This study proposes a unified data-driven ionospheric modeling framework, named DeepION, based on the Deep Operator Network (DeepONet) architecture. The framework takes STEC time series as the primary input, and learns nonlinear spatiotemporal mappings to key ionospheric parameters. Specifically, DeepION enables modeling and prediction of STEC and VTEC, while ROTI is subsequently derived from the predicted STEC series. In the network design, a convolutional neural network (CNN) is employed as the branch network to extract spatiotemporal features from historical STEC time series. The trunk network consists of a multi-layer fully connected architecture with periodic time encoding, whose inputs include GNSS observation geometry and temporal information, enabling the model to capture the continuous temporal dynamics of ionospheric behavior. 
During data preprocessing, a VTEC-based modeling strategy is first applied to estimate and remove receiver Differential Code Biases (DCB), thereby obtaining high-quality STEC observations. The model is then trained and validated using the STEC observations during the May 2024 geomagnetic storm. The model outputs include ray-path STEC values, gridded VTEC fields, and derived ROTI time series. Furthermore, the proposed framework is evaluated by incorporating the model-derived VTEC corrections into GNSS Single Point Positioning (SPP) experiments. The modeled and observed ionospheric parameters are compared under both geomagnetically quiet and disturbed conditions to comprehensively assess the modeling accuracy and practical performance of DeepION.  Results and Discussions  The experimental results demonstrate that the proposed DeepION model can robustly characterize ionospheric spatiotemporal variability under different space weather conditions, capturing both large-scale structures and small-scale disturbances during geomagnetic storms. For STEC forecasting, the model achieves a Root Mean Square Error (RMSE) of 12.8 TECU over a 3-day prediction horizon, maintaining high consistency with observed GNSS measurements (Fig. 4). Moreover, the model effectively predicts ionospheric irregularities, as shown by the close match between predicted and observed ROTI time series at the mid-latitude station NVSK (Fig. 5). For VTEC modeling, DeepION-generated global VTEC maps accurately reproduce equatorial anomalies and storm-enhanced density regions, closely matching the CODE-SH benchmark while outperforming empirical models such as Klobuchar and NeQuick in both spatial resolution and structural fidelity (Fig. 6). 
Further analysis of ray-path level performance shows that STEC derived from DeepION-based VTEC mapping yields the lowest residual errors at the mid-to-high latitude station NLIB, achieving an RMSE of 6.80 TECU, outperforming Klobuchar, NeQuick, and slightly improving upon CODE-SH (Fig. 7). In GNSS positioning applications, SPP results indicate that DeepION-derived ionospheric corrections consistently reduce positioning errors at both CUSV and NLIB stations, particularly in the vertical and geometric components during storm-time conditions, demonstrating enhanced robustness under intensified geomagnetic disturbances (Fig. 8, Fig. 9).  Conclusions  This study presents DeepION, a data-driven ionospheric modeling framework based on the Deep Operator Network architecture, which learns spatiotemporal relationships between GNSS-derived STEC observations and key ionospheric parameters. With a CNN-based branch network and a periodically encoded trunk network, DeepION models and predicts STEC and VTEC, and then derives ROTI from the predicted STEC series. Experiments using global GNSS data during the May 2024 geomagnetic storm show that DeepION can capture storm-time ionospheric variability and achieves stable performance in STEC forecasting and global VTEC reconstruction. Compared with conventional empirical and physics-based models, DeepION provides improved modeling accuracy and spatial representation. Furthermore, GNSS Single Point Positioning experiments indicate that ionospheric corrections derived from DeepION lead to reduced positioning errors at both mid- and high-latitude stations, particularly in the vertical and geometric components under disturbed geomagnetic conditions. These results highlight the practical value of DeepION for GNSS ionospheric correction during space weather events. 
Overall, DeepION offers a scalable framework for data-driven ionospheric modeling, and future work will extend it to multi-GNSS constellations, longer prediction lead time, and additional ionospheric observations.
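Deriving ROTI from a predicted STEC series can be illustrated with the commonly used definition: ROT is the time derivative of STEC in TECU/min, and ROTI is its standard deviation over a short sliding window. The abstract does not give the sampling interval or window length, so the 30 s sampling and 10-sample (5-minute) window below are assumptions, as are the function names.

```python
import math

def rot(stec, dt_min):
    """Rate of TEC (TECU/min) from consecutive STEC samples that are
    dt_min minutes apart along one satellite-receiver arc."""
    return [(b - a) / dt_min for a, b in zip(stec, stec[1:])]

def roti(stec, dt_min=0.5, window=10):
    """ROTI: standard deviation of ROT over a sliding window of `window`
    ROT samples, computed as sqrt(<ROT^2> - <ROT>^2)."""
    r = rot(stec, dt_min)
    out = []
    for i in range(window, len(r) + 1):
        w = r[i - window:i]
        mean = sum(w) / window
        mean_sq = sum(x * x for x in w) / window
        # Clamp tiny negative values caused by floating-point rounding.
        out.append(math.sqrt(max(mean_sq - mean * mean, 0.0)))
    return out

# A smooth, linearly varying STEC arc has a constant ROT, so ROTI is ~0.
smooth = [10.0 + 0.2 * i for i in range(20)]
# A rapidly fluctuating arc makes ROT alternate between +1 and -1 TECU/min.
irregular = [0.0, 0.5] * 5 + [0.0]
print(roti(smooth)[:1], roti(irregular))
```

In practice ROT is computed per continuous arc after cycle-slip screening; that preprocessing is omitted here.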
A Point Cloud Slice-based UAV SLAM for 3D Reconstruction of Large Container Port Areas
HU Zhaozheng, ZUO Zhihang, XU Cong, TAO Qianwen, LIU Chao, MENG Jie
Available online  , doi: 10.11999/JEIT251112
Abstract:
  Objective  With the continuous advancement of port intelligence, the demand for digital management in container port areas is increasingly growing. In large container yard scenarios, 3D reconstruction of the yard environment can be achieved by utilizing drone Simultaneous Localization and Mapping (SLAM) technology. However, container port areas contain an abundance of repetitive semantic structural information, where traditional semantic matching methods suffer from low efficiency and poor accuracy. Furthermore, during the 3D reconstruction process conducted by drones over container port areas, the lanes between yards present large feature-sparse regions, which can easily lead to odometry degradation. Additionally, the extensive presence of repetitive scene features also interferes with loop closure detection. To address these issues, this paper proposes a slicing method for rapid feature extraction, which is further optimized based on the characteristics of the container yard scenario. A UAV point cloud slicing SLAM method tailored for large-scale container port 3D reconstruction is introduced, enabling high-precision 3D reconstruction.  Methods  To address point cloud semantic extraction, this paper proposes a point cloud slicing method for rapid feature extraction, which quickly extracts the principal direction and divides the point cloud into multiple layers to efficiently obtain multi-layer semantic point clouds. The slicing method is further optimized based on the characteristics of the container yard scenario: the principal plane extraction is simplified using the direction of gravity, and the elevation range of each container layer is adaptively obtained through point cloud gradient changes to construct multi-layer sliced point clouds. 
Subsequently, a progressive adaptive LiDAR odometry based on sliced point clouds is constructed, which adaptively identifies degraded scenarios using elevation slices and employs an incremental iterative strategy for inter-layer slice fusion matching, thereby improving the accuracy, efficiency, and stability of the LiDAR odometry. In addition, a factor graph optimization method that fuses information from sliced point clouds is designed. By performing fusion voting on the matching results of multi-layer sliced point clouds, erroneous results are filtered out and the impact of repetitive structures on loop closure detection is reduced; slice factors are then used to construct factor graph edges, enhancing global optimization and achieving efficient and stable 3D reconstruction.  Results and Discussions  The feasibility and effectiveness of the proposed method are verified through testing in Carla simulations and real-world scenarios at a large container port in Wuhan. Results are as follows: First, through comparative analysis with three algorithms—RANSAC, Region Growth, and 3DG_SEG—the efficiency and accuracy of the proposed semantic extraction algorithm are demonstrated. Furthermore, by comparing mapping trajectories with two renowned open-source LiDAR algorithms, FAST-LIO2 and Faster-LIO, the superiority of the proposed odometry method is proven. Finally, comparisons of speed and confidence level are conducted with six algorithms: ICP, NDT, GICP, Fast_GICP, Scan Context+ICP, and Quatro. Simultaneously, the loop closure detection module from LIO-SAM is integrated into FAST-LIO2, and the Scan Context module into Faster-LIO. The mapping trajectories are then compared with that of the proposed algorithm, validating the effectiveness of the proposed loop closure detection algorithm. The proposed method achieves high 3D reconstruction accuracy; therefore, it is suitable for practical application in operational processes.  
Conclusions  The proposed method uses an efficient point cloud slicing technique and a multi-layer slice matching mechanism. Points within the same elevation range form a slice point cloud (Slice), and the segmentation process is called slice generation. This enables efficient and robust 3D reconstruction in large-scale scenes with repetitive features. First, the LiDAR point cloud is aligned to the Z-axis using IMU-derived gravity direction. A sliding window records density gradient changes to adaptively determine each layer’s elevation range. This simplifies slicing and reduces the impact of non-standard containers or ground height variations on semantic extraction. Multi-layer slice data are then integrated into the odometry module to detect degenerate scenarios. Under normal conditions, progressive slice matching initializes pose estimation; otherwise, IMU-based iterative Kalman filtering is used. Finally, fusion voting removes outliers from multi-layer slice matching results. The best match initializes loop closure for global container point cloud registration, enabling dual-stage loop closure detection and slice factor construction. Integrating slice point cloud information into factor graph optimization unifies coordinates and achieves efficient, robust 3D reconstruction.
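The slice generation idea (splitting a gravity-aligned point cloud into elevation layers wherever the point density falls and then rises again) can be sketched roughly as below. This is not the authors' algorithm: the fixed histogram bin size, the valley rule, and the function names are assumptions, and real scans would need smoothing before the gradient test.

```python
from bisect import bisect_right
from collections import Counter

def slice_boundaries(z_values, bin_size=0.5):
    """Return heights that separate dense elevation bands.
    A boundary is placed in the middle of each valley of the height
    histogram, i.e. where the density gradient turns from negative to positive."""
    z_min = min(z_values)
    counts = Counter(int((z - z_min) / bin_size) for z in z_values)
    density = [counts.get(i, 0) for i in range(max(counts) + 1)]
    bounds, valley_start = [], None
    for i in range(1, len(density)):
        grad = density[i] - density[i - 1]
        if grad < 0:
            valley_start = i  # density just dropped: a valley may start here
        elif grad > 0 and valley_start is not None:
            # Density rises again: the valley covers bins valley_start..i-1.
            bounds.append(z_min + 0.5 * (valley_start + i) * bin_size)
            valley_start = None
    return bounds

def slice_points(points, bounds):
    """Assign each (x, y, z) point to the slice its elevation falls into."""
    slices = [[] for _ in range(len(bounds) + 1)]
    for p in points:
        slices[bisect_right(bounds, p[2])].append(p)
    return slices

# Hypothetical points on three stacked levels (z already aligned with gravity).
points = [(0, 0, 0.0), (1, 0, 0.1), (2, 0, 0.2),
          (0, 1, 2.6), (1, 1, 2.7), (2, 1, 2.8),
          (0, 2, 5.2), (1, 2, 5.3)]
bounds = slice_boundaries([p[2] for p in points])
slices = slice_points(points, bounds)
print(bounds, [len(s) for s in slices])
```

Because the boundaries come from the data rather than from a fixed container height, the same code adapts to uneven ground or non-standard containers, which is the motivation the abstract gives for the adaptive step.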
Secure and Covert MIMO Short Packet Communications with Location-Uncertain Malicious Nodes
TIAN Bo, YANG Weiwei, YANG Xiaoqin, BAI Mengmeng
Available online  , doi: 10.11999/JEIT260059
Abstract:
  Objective  This paper investigates secure and covert short packet communication in multi-input multi-output (MIMO) wireless systems with location-uncertain malicious nodes over quasi-static Rician fading channels. In the considered scenario, a legitimate transmitter sends confidential short packets to a legitimate receiver, while multiple monitoring nodes attempt to detect whether the transmission exists and multiple eavesdropping nodes attempt to intercept the confidential information. Since malicious nodes may remain silent and their exact positions are unavailable to the legitimate system, their spatial uncertainty brings significant challenges to joint covertness and secrecy analysis. To address this problem, this paper establishes a unified analytical and optimization framework for secure covert short packet transmission, aiming to characterize the coupling relationship among covertness, secrecy, and reliability, and to improve the average effective secrecy and covert rate (AESCR).  Methods  The transmitter adopts singular value decomposition (SVD)-based precoding, and the legitimate receiver applies maximum ratio combining (MRC) to enhance the legitimate link. The monitoring nodes and eavesdropping nodes are modeled as two independent Poisson point processes (PPPs) outside a circular protection zone centered at the transmitter, which captures the spatial randomness of malicious nodes. For covertness analysis, each monitoring node is assumed to perform optimal likelihood ratio detection with full knowledge of the system model, noise power, channel state, and codebook information. By using the Chernoff bound and the Bhattacharyya coefficient, a theoretical lower bound on the minimum detection error probability of a single monitoring node is first derived. Then, by combining stochastic geometry with the distribution of the strongest monitoring node, a tractable lower bound on the average minimum detection error probability is obtained. 
For secrecy analysis, the finite blocklength normal approximation is used to account for both decoding error and information leakage penalties. The legitimate channel is statistically characterized according to the Rician fading condition, while the strongest eavesdropping node is analyzed through stochastic geometry. Based on these results, an approximate analytical expression for the average secrecy rate is derived. Furthermore, AESCR is introduced as a comprehensive performance metric that jointly reflects reliability, secrecy, and covertness. Under the average covert constraint and the short packet length constraint, a joint optimization problem for transmit power and packet length is formulated. By exploiting the monotonic properties of the objective function and the covert constraint, the original coupled optimization problem is transformed into a one-dimensional search problem.  Results and Discussions  Simulation results verify the accuracy of the theoretical derivations and reveal the influence of key system parameters. Both the simulated average minimum detection error probability and its theoretical lower bound decrease as the packet length increases, and higher transmit power further reduces the detection error probability, indicating that excessive power makes the transmission more exposed to monitoring nodes (Fig. 2). Increasing the number of monitoring-node antennas strengthens spatial reception capability and further degrades covertness (Fig. 2). Enlarging the protection zone improves covertness because malicious nodes are forced to remain farther away from the transmitter, whereas increasing the monitoring-node density weakens this benefit by raising the probability that a strong monitoring node appears near the protection-zone boundary (Fig. 3). 
The average secrecy rate increases with packet length and gradually approaches the asymptotic secrecy-capacity upper bound, because the finite blocklength rate penalty becomes smaller when the packet length grows (Fig. 4). The proposed AESCR first increases and then decreases with packet length, confirming the existence of an optimal packet length; this phenomenon results from the tradeoff between reduced finite-blocklength penalty and increased detection exposure (Fig. 5). Larger malicious-node density and more malicious-node antennas reduce system performance, since they enhance both monitoring and eavesdropping capabilities (Fig. 5). Relaxing the covert constraint improves the achievable AESCR, because the system can select a higher transmit power or a more favorable packet length (Fig. 6). The results under different Rician factors also show that the proposed analytical framework is applicable to both Rician and Rayleigh fading conditions (Fig. 6). Increasing the number of legitimate receive antennas improves AESCR, and a larger transmit antenna array brings additional SVD precoding gain (Fig. 7). Compared with benchmark schemes, the proposed joint optimization of transmit power and packet length consistently outperforms the scheme with fixed packet length and power-only optimization, demonstrating the necessity of jointly balancing reliability, secrecy, and covertness in MIMO short packet transmission (Fig. 8).  Conclusions  This paper develops a stochastic-geometry-based analytical framework for MIMO secure covert short packet communication with location-uncertain multi-antenna malicious nodes. By deriving a lower bound on the average minimum detection error probability, obtaining an approximate analytical expression for the average secrecy rate, and introducing AESCR, the proposed framework reveals the fundamental tradeoff among covertness, secrecy, and reliability under finite blocklength transmission. 
The results show that increasing the number of legitimate transmit and receive antennas improves secure covert performance, whereas higher malicious-node density and more malicious-node antennas degrade system performance. The existence of an optimal packet length further demonstrates that packet length and transmit power must be jointly designed. Therefore, the proposed joint optimization method provides an effective solution for secure covert short packet transmission in mission-critical and low-latency wireless systems.
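The finite blocklength normal approximation mentioned in the Methods can be made concrete with the widely used scalar form, in which the secrecy capacity is reduced by one penalty for decoding error and one for information leakage. The sketch below assumes a single equivalent complex AWGN link for each of the legitimate receiver and the eavesdropper. The paper's actual MIMO and Rician expressions are not given in the abstract, and all names and numbers here are hypothetical.

```python
import math
from statistics import NormalDist

def q_inv(p):
    """Inverse Gaussian Q-function: Q^{-1}(p) = Phi^{-1}(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)

def dispersion(snr):
    """Channel dispersion of a complex AWGN channel (nats^2 per channel use)."""
    return 1.0 - (1.0 + snr) ** -2

def secrecy_rate(snr_b, snr_e, n, eps, delta):
    """Approximate secrecy rate in bits per channel use at blocklength n,
    with decoding error probability eps and information leakage delta."""
    capacity_gap = math.log2((1.0 + snr_b) / (1.0 + snr_e))
    penalty_b = math.sqrt(dispersion(snr_b) / n) * q_inv(eps) / math.log(2)
    penalty_e = math.sqrt(dispersion(snr_e) / n) * q_inv(delta) / math.log(2)
    return max(capacity_gap - penalty_b - penalty_e, 0.0)

# Hypothetical link: 10 dB legitimate SNR, 0 dB eavesdropper SNR (linear values).
for n in (200, 2000, 20000):
    print(n, round(secrecy_rate(10.0, 1.0, n, 1e-3, 1e-3), 4))
```

Both penalties shrink as 1/sqrt(n), which matches the reported trend that the average secrecy rate rises with packet length toward the asymptotic secrecy capacity, while longer packets also give monitoring nodes more samples to detect the transmission.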
One-step Reconstruction Diffusion Model for Poisoning Attack on QoS-aware cloud API Recommender System
TAN Zeyu, WANG Haoyuan, QI Mingyang, SUN Mengmeng, SHEN Limin, CHEN Zhen
Available online  , doi: 10.11999/JEIT260115
Abstract:
  Objective  In the cloud era, cloud Application Programming Interface (cloud API), as the best carrier for data output, capability replication and service delivery, has become an indispensable core element for service-oriented software development and operation. With the rapid increase in the number of cloud APIs, it is difficult for users to choose from a large number of cloud APIs with the same functions. For this purpose, researchers introduced Quality of Service (QoS) to effectively differentiate cloud APIs based on their non-functional attributes. Therefore, QoS-aware cloud API recommender systems (QARS) are gradually playing an increasingly important role in guiding users to choose the most suitable cloud API. However, existing research mainly focuses on improving the accuracy of QARS, ignoring the security risks brought about by the economic benefits of cloud APIs and the openness of the network environment. These risks are especially evident in the threats posed by poisoning attacks. Attackers manipulate the recommendations by injecting fake users, causing serious damage to the fairness and credibility of the QoS-aware cloud API recommender system. To counter the threat of poisoning attacks, this paper reveals the attack mechanisms of diffusion model-based attack methods from the perspective of learning defense through attacking, inspiring the design of corresponding defense methods.  Methods  This paper systematically defines the attack process of poisoning attacks and fake user profiles, and proposes attack scales to flexibly simulate poisoning attacks. Then, to reveal the attack principle of the diffusion model-based attack method, this paper further proposes a Preference guided one-step reconstruction Diffusion model-based Poisoning Attack framework (PDPA) to simulate poisoning attacks. 
Following the collaborative principle that similar users may have similar preferences toward cloud APIs, the fake users generated by the attack method need to ensure that both their QoS values and the distribution of cloud API invocations remain similar to those of real users, thereby exploiting the collaborative influence of fake users to interfere with the QARS's modeling of user preferences. Therefore, to effectively carry out poisoning attacks, PDPA aims to generate fake users that are similar to real users. Firstly, PDPA uses the One-step reconstruction Diffusion Model (ODM) to model the QoS data and the invocation distribution of real users, respectively. ODM avoids the error accumulation that occurs during the iterative denoising process caused by the noise dependence of standard diffusion models, enabling ODM to generate fake user cloud API invocation behaviors similar to those of real users, thereby ensuring that fake users can effectively have a collaborative influence. Subsequently, in order to improve the attack performance, PDPA systematically selects fake users with a preference for invoking the target cloud API to fill the maximum QoS value. This not only enhances the aggressiveness of fake users, but also alleviates the interference of the target cloud API's addition on the invocation behavior of fake users, ensuring the concealment of fake users.  Results and Discussions  The experiment was conducted in the real-world QoS dataset WS-DREAM. Firstly, this paper uses six recommendation methods as target recommender systems, and six baseline attack methods to simulate poisoning attacks. The experimental results (Table 3) reveal the vulnerability of the recommender system to poisoning attacks. Each attack method can cause damage to the accuracy of the recommender system. 
PDPA achieves the best attack performance in most experimental settings, which is attributed to its sufficient modeling of user invocation preferences, thereby enabling fake users to effectively exert collaborative influence on the QARS. Secondly, the comparison of the F1 and distribution in latent space of fake users generated by ODM and the standard diffusion model was conducted. The experimental results (Figure 2) verify that ODM is superior to the standard diffusion model not only in terms of stealth but also as reflected in low-dimensional visualization. Subsequently, the ablation study on each module of PDPA was conducted. The experimental results (Tables 4 and 5) verify that each module of PDPA is a necessary guarantee for the attack performance and concealment of fake users. Finally, the comparison of MAE and F1 on various attack scales was conducted to verify the impact of attack scale on the attack effect and concealment of fake users. The experimental results (Figure 3 and Table 6) indicate that increasing the attack scale could effectively enhance the attack performance, but it would also lead to an increase in the number of detected fake users.  Conclusions  To counter the threat of poisoning attacks, this paper explores the attack process and key attack parameters of poisoning attacks, and reveals the vulnerability of the QoS-aware cloud API recommender system by simulating poisoning attacks. This paper simulates poisoning attacks on QARS by constructing the PDPA, which demonstrates the significant potential of diffusion models in poisoning attacks and validates the necessity of separately modeling QoS data and cloud API invocations through ablation studies. Furthermore, PDPA reveals the underlying mechanism of generating fake users via diffusion models, providing insights for designing targeted countermeasures.
Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating Two-row Update Strategy
ZHU Jianyong, CHEN Kun, YANG Hui, NIE Feiping
Available online  , doi: 10.11999/JEIT260111
Abstract:
  Objective  As data structures grow increasingly complex, conventional unsupervised clustering techniques often fail to achieve satisfactory performance. Semi-supervised clustering, which leverages limited prior information, has thus become increasingly popular due to its ability to improve clustering quality. While existing methods have made progress, they suffer from two critical drawbacks. First, traditional constrained projection clustering algorithms typically adopt a two-step independent strategy: learning the projection matrix first and then performing k-means clustering. This separation causes the projection deviation to propagate directly to the clustering process without correction, leading to the accumulation of learning errors. Moreover, applying pairwise constraints only at the projection stage deviates from the core principle of using prior information to guide the clustering process. Second, many current methods, such as those based on spectral clustering, handle pairwise constraints implicitly (e.g., through eigen-decomposition of a modified similarity matrix). This implicit handling often fails to strictly satisfy constraints, particularly Cannot-Link constraints, which are non-transitive, resulting in high constraint violation rates. To this end, this paper proposes a Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating Two-row Update Strategy (PCITUS). The primary objective is to unify dimensionality reduction and clustering into a single framework to avoid information loss and to design an explicit optimization strategy that minimizes constraint violations while enhancing computational efficiency.  Methods  The proposed PCITUS model integrates constraint projection and clustering into a unified objective function to achieve collaborative optimization, while directly optimizing pairwise constraints. 
First, the algorithm utilizes the transitive property of Must-Link (ML) constraints: all samples belonging to the same ML connected component are merged into a single "hyper-point" in the feature space. This preprocessing step naturally ensures that all ML constraints are satisfied. Next, a trade-off parameter is introduced to incorporate projection learning as a regularization term within the clustering framework, enabling the two components to be jointly optimized within a unified objective. Moreover, prior information is embedded into the clustering process by transforming pairwise constraints into row-wise constraints on the indicator matrix. This paper then employs an improved coordinate descent method to solve for the discrete indicator matrix directly, which enhances computational efficiency and yields better solutions. Furthermore, a core innovation is the two-row simultaneous optimization strategy designed for handling Cannot-Link (CL) constraints. PCITUS explicitly checks for CL conflicts by simultaneously evaluating the objective function values for swapping conflicting rows to sub-optimal classes and selects the scenario yielding the higher value.  Results and Discussions  Extensive experiments were conducted on 8 benchmark datasets and compared against 9 state-of-the-art semi-supervised clustering algorithms. The quantitative results in terms of Accuracy (ACC) and Normalized Mutual Information (NMI) demonstrate the superiority of PCITUS (Table 4 and Table 5). PCITUS achieves the highest performance on most datasets. Notably, on the Mushroom dataset, the NMI metric improved by 7.2% compared to the second-best algorithm. 
The comparison with CNP (a two-step projection method) confirms that the unified framework effectively mitigates error propagation and information loss. This benefit also stems from the fact that a better projection space can lead to a clearer clustering structure, while a more reasonable clustering structure, in turn, guides the formation of a more discriminative projection space. The effectiveness of the explicit constraint handling is further illustrated in Fig. 1: PCITUS exhibits no ML constraint violations due to the hyper-point merging strategy. For CL constraints, because of the two-row simultaneous optimization strategy, PCITUS maintains an extremely low violation rate (e.g., 0.57% on Mushroom and 0.41% on Satimage), significantly outperforming methods that handle constraints implicitly. Additionally, parameter sensitivity analysis (Fig. 2) indicates that the performance of PCITUS is stable across a wide range of the trade-off parameter, and noise sensitivity experiments (Fig. 3a and Fig. 3b) highlight its robustness. The convergence curves (Fig. 3c and Fig. 3d) and runtime comparisons (Table 7) validate its computational efficiency, showing rapid convergence and typically reaching a stable objective function value within approximately 10 iterations.  Conclusions  To tackle the difficulties in optimizing cannot-link constraints, as well as the inherent limitations of traditional constraint projection clustering frameworks based on a two-step separation scheme, this paper presents PCITUS, a novel semi-supervised clustering framework that jointly optimizes pairwise constraint projection and clustering structures. By integrating the projection objective into the clustering framework as a regularizer, the proposed method ensures that the subspace learning and data partitioning processes mutually enhance each other, jointly approaching the global optimum. 
Furthermore, pairwise constraints are integrated throughout the entire learning process, ensuring that prior knowledge is fully utilized during optimization. The introduction of the coordinate descent method with a specific "two-row simultaneous update strategy" allows for the direct and precise allocation of Cannot-Link constraints, significantly reducing constraint violations. Experimental results validate that PCITUS not only outperforms existing algorithms in clustering performance but also exhibits strong robustness to parameter variations.
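The Must-Link preprocessing described in the abstract above (merging each ML connected component into one "hyper-point") can be illustrated with a small union-find sketch. This is not the authors' implementation: the function name `merge_must_link` is hypothetical, and representing each hyper-point by the mean of its members is an assumption made here for illustration, since the abstract does not state how a hyper-point is represented.

```python
from collections import defaultdict

import numpy as np


def merge_must_link(X, must_links):
    """Collapse every Must-Link connected component into one hyper-point.

    X          : (n, d) array of samples.
    must_links : iterable of (i, j) index pairs that must share a cluster.
    Returns (hyper_points, members), where members[k] lists the original
    sample indices merged into hyper-point k.
    ASSUMPTION: a hyper-point is the mean of its member samples.
    """
    n = len(X)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union the two components of every ML pair; transitivity follows
    # automatically because components are merged, not just pairs.
    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)

    members = [groups[root] for root in sorted(groups)]
    hyper_points = np.array([X[idx].mean(axis=0) for idx in members])
    return hyper_points, members
```

Because all samples in a component share a single row of the indicator matrix after merging, any cluster assignment of the hyper-points satisfies every ML constraint by construction, which is consistent with the zero ML violation rate reported in the abstract.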
Energy-Efficient Trajectory Planning and Resource Optimization for UAV Relay Communications over Hybrid RF/FSO Links
LI Baolong, PAN Wenwei, JIANG Hao, FENG Simeng, WU Qihui
Available online  , doi: 10.11999/JEIT260139
Abstract:
  Objective  In low-altitude communication networks, hybrid radio-frequency/free-space-optical (RF/FSO) unmanned aerial vehicle (UAV) relaying can effectively alleviate RF spectrum congestion and enhance uplink data aggregation efficiency. However, in obstacle-rich urban environments, FSO backhaul links are highly susceptible to blockage and may experience intermittent outages, resulting in a severe mismatch between the RF uplink arrival rate and the FSO backhaul service rate. Meanwhile, UAV trajectory planning is constrained by obstacle avoidance and flight dynamics. To address these coupled challenges, this paper investigates an energy-efficiency maximization problem by jointly optimizing multiuser non-orthogonal multiple access (NOMA)-based RF access and the UAV’s three-dimensional obstacle-avoiding trajectory, while incorporating buffer-assisted RF/FSO rate decoupling.  Methods  This paper considers a time-slotted UAV relaying model in which multiple ground users upload data to the UAV via an RF link using NOMA. The UAV decodes the superposed signals using successive interference cancellation (SIC) and determines the decoding order in each slot according to the received power ranking. The successfully received data are then forwarded to the base station (BS) through an FSO backhaul link. Urban blockage is modeled using 3D geometric obstacles, and a visibility test is employed to determine whether each relevant link is line-of-sight (LOS) or non-line-of-sight (NLOS), thereby capturing the spatially correlated and time-varying characteristics of the RF access rate and the intermittent FSO backhaul capacity. To suppress the blockage-induced mismatch between uplink and backhaul rates, a finite-capacity buffer is deployed at the UAV. In each slot, the forwardable amount is jointly limited by the instantaneous FSO backhaul capability and the amount of data available in the buffer, while buffer-capacity constraints prevent overflow. 
System energy efficiency is defined as the ratio of the cumulative data successfully delivered to the BS over the mission horizon to the UAV propulsion energy consumption, where the propulsion power is modeled as a function of the UAV’s velocity and acceleration to reflect the impact of flight dynamics. Under 3D flight-region boundaries, prescribed start and end locations, discrete-time kinematic equations, maximum velocity and acceleration limits, and obstacle collision-avoidance constraints, a non-convex optimization problem is formulated with cross-slot multiuser transmit powers and the UAV 3D trajectory as decision variables. Furthermore, an alternating optimization framework is developed. With a fixed trajectory, the propulsion energy is fixed and maximizing energy efficiency becomes equivalent to increasing the end-to-end successfully forwarded data, yielding a power-optimization subproblem. Due to NOMA coupling and logarithmic rate expressions, this subproblem remains non-convex and is handled via successive convex approximation (SCA). With fixed transmit powers, particle swarm optimization (PSO) is used to search candidate 3D trajectories in a continuous space. To ensure feasibility under strict dynamics and safety constraints, a quadratic-programming (QP) projection is employed to enforce velocity and acceleration constraints, and collision checks are performed on trajectory waypoints and inter-slot line segments to guarantee obstacle-free flight. These two optimization procedures are alternately performed, resulting in a joint design that satisfies flight-dynamics feasibility and collision avoidance while significantly improving energy efficiency.  Results and Discussion   Simulations are conducted in an urban airspace containing multiple users, a BS, and dense 3D obstacles. Blockage causes frequent LOS/NLOS switching as the UAV moves. Figures 2 and 3 present comparisons of the 3D trajectory and its planar projection, respectively. 
Compared to the initial trajectory, the optimized trajectory exhibits clear detours and necessary altitude adjustments, and achieves collision-free flight while satisfying velocity and acceleration constraints, thus validating the feasibility and safety of the proposed trajectory planning approach. Figure 4 presents the energy-efficiency convergence behavior under different user transmit-power budgets. The proposed alternating optimization typically stabilizes within a small number of outer iterations. Meanwhile, the converged energy efficiency increases with higher power budgets, demonstrating the synergy between power control and trajectory adaptation. Furthermore, Figure 5 depicts the buffer evolution over time. It is observed that the buffer gradually accumulates when the backhaul is blocked or experiences strong fading, and is quickly drained once the UAV enters regions where LOS backhaul becomes available and FSO capacity improves. In order to further quantify the buffering gain, Figure 6 compares the system energy efficiency achieved by the proposed buffering mechanism and the no-buffer scheme. Compared to the no-buffer scheme, the proposed mechanism enables store-and-forward-based temporal smoothing during backhaul interruptions, thereby significantly improving system energy efficiency. Figure 7 illustrates the energy-efficiency convergence behavior under different buffer capacities. It is observed that as the buffer capacity increases, the converged energy-efficiency level is significantly improved. This is because a larger buffer enhances the UAV’s ability to temporarily store incoming data, thereby effectively alleviating data accumulation and transmission blockage when the access-link rate and backhaul-link rate are mismatched or when the backhaul link is constrained. 
Figure 8 compares the performance of four schemes, namely a non-optimized baseline, a power optimization scheme, a trajectory optimization scheme, and the proposed joint power-and-trajectory optimization scheme. It is found that the coordinated design of power allocation and obstacle-avoiding trajectory substantially improves end-to-end energy efficiency, and that trajectory optimization often plays a more dominant role under blockage-limited conditions.  Conclusion  The paper investigates a hybrid RF/FSO UAV relaying scheme with NOMA and an onboard buffering mechanism for low-altitude urban communication environments. Given the dense obstacles, frequent blockage, the fragility of FSO links, and stringent flight-dynamics constraints, an energy-efficiency maximization problem is formulated for the joint optimization of multiuser NOMA power allocation and the UAV trajectory. Accordingly, an SCA-based power-allocation method and an obstacle-avoiding trajectory design combining PSO with QP projection are developed. The obtained trajectory satisfies flight-dynamics feasibility and collision-avoidance requirements while significantly improving throughput per unit propulsion energy. Simulation results demonstrate that the planned trajectory can effectively avoid obstacles, and the onboard buffer provides an effective cushion between RF access and FSO backhaul to mitigate rate mismatch. In addition, the proposed method consistently outperforms benchmark schemes in terms of energy efficiency. Meanwhile, the trajectory optimization is shown to be generally more effective than power allocation in improving the overall system performance.
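The buffer-assisted store-and-forward behavior and the energy-efficiency ratio described in the abstract above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's model: the function names are hypothetical, per-slot arrivals and FSO capacities are given as plain numbers, and overflow is avoided here by simply clipping accepted arrivals to the remaining buffer room, whereas the paper enforces buffer-capacity constraints inside the optimization.

```python
def simulate_buffer(arrivals, fso_capacity, buffer_size):
    """Simulate a finite relay buffer over time slots.

    arrivals     : data received on the RF uplink in each slot.
    fso_capacity : data the FSO backhaul could carry in each slot
                   (0 models a blocked link).
    buffer_size  : maximum amount of data the buffer can hold.
    Returns (total_delivered, buffer_trace).
    ASSUMPTION: arrivals that do not fit are dropped (clipped).
    """
    buffered = 0.0
    delivered = 0.0
    trace = []
    for arrived, capacity in zip(arrivals, fso_capacity):
        # Forwardable amount is limited by both the instantaneous
        # backhaul capability and the data already in the buffer.
        sent = min(capacity, buffered)
        buffered -= sent
        delivered += sent
        # Accept new uplink data only up to the remaining room.
        accepted = min(arrived, buffer_size - buffered)
        buffered += accepted
        trace.append(buffered)
    return delivered, trace


def energy_efficiency(delivered, propulsion_energy):
    """Delivered data per unit of propulsion energy."""
    return delivered / propulsion_energy
```

With arrivals `[4, 4, 4, 0]`, capacities `[0, 0, 10, 10]`, and a buffer of size 6, the buffer fills while the backhaul is blocked and drains once capacity returns, which mirrors the qualitative behavior the abstract attributes to Figure 5.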
Non-Terrestrial Network Architecture and Key Technologies for Civil Aviation
LIU Xiangnan, QIU Yu, HUANG Zhipeng, ZHANG Haijun
Available online  , doi: 10.11999/JEIT260348
Abstract:
  Significance   Currently, civil aviation communications rely heavily on terrestrial base stations and narrowband satellite communications. This setup leaves significant coverage blind spots in scenarios such as remote airspace, transoceanic routes, and polar flights, and therefore fails to meet the high-reliability requirements of core operations such as real-time flight monitoring and engine health data transmission. It also suffers from bandwidth constraints, poor passenger connectivity, and insufficient communication resilience in emergency scenarios.  Progress   In this context, we review the evolution of non-terrestrial network (NTN) technologies in the civil aviation sector and track the latest research progress worldwide and domestically on civil aviation NTNs, including network frameworks, mobility management, and resource management. To enable NTNs to better serve civil aviation, we approach the topic from these three perspectives, introducing key technologies in network architecture, access and mobility management, and novel resource control mechanisms within NTN systems.  Conclusions  For civil aviation, NTN can not only fill the coverage gaps of terrestrial communications, but also balance high-speed passenger connectivity with efficient transmission of airline operational data, enhancing the industry’s operational efficiency and service quality. It lays a technical foundation for cutting-edge scenarios such as future air-space integrated transportation and civil aviation unmanned aerial vehicle networking, serving as a key enabler to address civil aviation’s communication challenges and drive the industry’s upgrade toward greater safety, efficiency, and intelligence.  
Prospects   With the continuous advancement of key technologies such as networking architecture design, mobility management, and resource management, the proposed solutions are expected to offer more efficient, stable, and intelligent communication support for the civil aviation industry. In the long term, such NTN-enabled communication frameworks will play an essential role in supporting the digital transformation and intelligent upgrading of civil aviation operations.
A Tensor Framework for ISAC: Information Fusion Enhanced Channel Estimation and Target Localization
YU Weijia, DU Jianhe, CHEN Yuanzhi, HE Jing, ZHANG Peng, GUAN Yalin
Available online  , doi: 10.11999/JEIT251371
Abstract:
  Objective  Communication and sensing systems are evolving toward higher frequency bands, larger antenna arrays, and greater miniaturization, driving their increasing convergence in terms of hardware architecture, channel characteristics, and signal processing. This synergy gives rise to integrated sensing and communication (ISAC), in which the joint estimation of channel and sensing target parameters has become a primary research hotspot. Although existing studies have realized the co-estimation of these two categories of parameters based on a unified tensor framework, several limitations remain. On the one hand, current research focuses primarily on parameter estimation itself, without further transforming the multidimensional estimation results into precise localization of scatterer points (SPs), mobile terminals, and sensing targets, which makes it difficult to achieve a complete spatial characterization of the wireless propagation environment. On the other hand, limited attention has been paid to the fusion mechanism between channel and sensing target parameter information, thereby hampering the further improvement of parameter estimation and localization accuracy.  Methods  To address the problems of parameter estimation and localization for channels/sensing targets in millimeter-wave multiple-input multiple-output ISAC systems, a tensor decomposition algorithm based on information fusion is proposed. First, a unified fourth-order parallel factor model is constructed at the base station for the estimation of uplink channel and sensing target parameters. To reduce computational complexity, the fourth-order tensor model is transformed into a third-order form, and the trilinear alternating least squares method is adopted to estimate the three factor matrices. 
Furthermore, by exploiting the special structure of a factor matrix, the proposed algorithm incorporates a closed-form decomposition to decouple the coupled factor matrix, from which the angle of departure, angle of arrival, time delay, Doppler shift, and coefficients are extracted from the four estimated factor matrices. On this basis, the localization of mobile transmitter (MT), SPs, and sensing targets is realized separately using geometric relationships, while the estimation accuracy of SPs is effectively improved by fusing the Doppler shift and position information of SPs and sensing targets. Besides, the Cramér-Rao bound is derived to establish a theoretical performance benchmark for the five parameters.  Results and Discussions  The first simulation experiment shows that the proposed algorithm and the Op-QALS algorithm outperform the Co-SVD-BALS algorithm in both channel/sensing target parameter estimation and localization (Fig. 2, Fig. 3, Fig. 4). With information fusion, the proposed algorithm achieves the best performance in Doppler shift and position estimation for SPs (Fig. 2(d), Fig. 4(a)). This is attributed to the fact that both the proposed algorithm and Op-QALS algorithm fully exploit the multi-dimensional structure of the received signal, and the fusion operation further enhances the estimation capability of the proposed algorithm, whereas the Co-SVD-BALS algorithm suffers from severe error accumulation during its stepwise factor matrix estimation. Moreover, the average processing time (APT) required by the proposed algorithm for localization is slightly higher than that of Co-SVD-BALS algorithm, but significantly lower than that of Op-QALS algorithm (Table 1 and Table 2). Therefore, the proposed algorithm achieves excellent parameter estimation and localization performance at a reasonable computational cost. 
The second simulation experiment shows that under two signal-to-noise ratio levels, the localization accuracy of all algorithms improves gradually as $K$ increases, while the proposed algorithm maintains SP and MT localization accuracy comparable to that of the Op-QALS algorithm, but with notably lower APT (Fig. 5). Furthermore, the incorporation of the fusion operation does not significantly increase the APT of the proposed algorithm (Fig. 5(d)). The third simulation experiment indicates that increasing $M_{\mathrm{RE}}\left(M_{\mathrm{RE}}^{\mathrm{s}}\right)$ and $N$ helps enhance the ability of the proposed algorithm to resolve multipath signals, thereby obtaining more precise localization performance (Fig. 6).  Conclusions  This paper proposes a unified tensor framework-based information fusion algorithm for channel/sensing target parameter estimation and localization. By exploiting the Vandermonde structure of a factor matrix, the proposed algorithm maintains estimation accuracy while reducing complexity. In addition, the fusion operation further improves SP estimation and localization without significantly increasing computational overhead. Future work will extend the algorithm to more general array configurations and explore higher-order tensor processing in multi-base-station cooperation or multi-user access scenarios.
Physical-layer Security in Visible Light Communications: Fundamental Theories, Key Techniques, and Future Challenges
WANG Jinyuan, YAN Xinrun, LIN Zihan, LI Yuanyuan, LI Zheng, ZHANG Xin
Available online  , doi: 10.11999/JEIT260338
Abstract:
  Significance   Due to the broadcast nature of optical signals, information security represents a critical research direction in visible light communication (VLC). Conventional encryption techniques address network security issues at the upper layers of the protocol stack through access control, cryptographic protection, and end-to-end encryption. However, their security relies on the assumption that eavesdroppers possess limited computational capabilities, an assumption that currently faces significant challenges. In recent years, physical layer security (PLS) has emerged as a novel information security paradigm and has attracted considerable attention from researchers worldwide. PLS exploits the randomness, heterogeneity, and distinctiveness between the main channel and the eavesdropping channel to achieve secure information transmission at the physical layer. To date, extensive research achievements have been made regarding PLS techniques in conventional radio frequency wireless communications (RFWC). Nevertheless, due to substantial differences in frequency bands, transmitted signals, power representations, and channel characteristics, PLS research results from RFWC systems cannot be directly applied to VLC. Although scholars worldwide have conducted research on VLC PLS technology, the foundational theories, key techniques, and future challenges involved in VLC PLS still lack a systematic review. To bridge this gap, this paper presents a comprehensive survey of VLC PLS technology.  Progress   To evaluate and enhance system performance, a classic VLC PLS system model—comprising the received signal model, the input constraint model, and the channel gain model—is initially established. A comprehensive theoretical framework for performance evaluation is then developed, encompassing instantaneous performance metrics, statistical performance metrics, and asymptotic performance metrics. 
Specifically, to characterize instantaneous performance, existing works on instantaneous secrecy capacity and instantaneous secrecy rate across different scenarios are summarized. As statistical performance metrics, average secrecy capacity, average secrecy rate, secrecy outage probability, probability of strictly positive secrecy capacity, and interception probability are analyzed. To demonstrate asymptotic performance, secrecy diversity order and secrecy degrees of freedom are derived. Furthermore, to enhance the PLS performance, advanced technologies, including secure beamforming, artificial noise, physical region protection, secure coding, and secure diversity, are summarized.  Prospects   Despite existing research achievements, numerous challenges remain in VLC PLS. This paper identifies four critical challenges: (i) Accurate PLS performance limit: Deriving exact expression of secrecy capacity under VLC's unique physical constraints remains challenging. (ii) Incomplete evaluation framework: Some key metrics widely used in RFWC have not been investigated in VLC, and the construction of a comprehensive VLC PLS performance evaluation framework remains unresolved. (iii) Limitations of existing methods: Conventional PLS performance enhancement methods typically adopt a “modeling-optimization-verification” separated research paradigm, often falling into a vicious cycle of “inaccurate modeling-suboptimal solutions-limited performance gains”. Therefore, it is imperative to integrate novel technologies (such as deep learning, reinforcement learning, and digital twins) to construct a data-model dual-driven framework for VLC PLS performance enhancement. (iv) Hardware platform gap: The absence of dedicated hardware platforms featuring adversarial topologies and real-time processing capabilities significantly impedes the practical deployment of VLC PLS technologies. 
Therefore, addressing these challenges is essential for transitioning VLC PLS from theoretical advances to commercial applications.  Conclusions  The broadcast nature of optical signals renders VLC systems vulnerable to eavesdropping attacks. This paper presents a comprehensive survey of PLS in VLC, covering system models, performance metrics (instantaneous, statistical, and asymptotic), and key performance enhancement technologies including secure beamforming, artificial noise, physical region protection, secure coding, and secure diversity. Despite significant progress, challenges remain in establishing accurate performance bounds, complete evaluation frameworks, novel enhancement techniques, and practical hardware implementations. By exploiting channel disparities at the physical layer without relying on complex encryption, PLS represents a paradigm shift in security assurance, paving the way for next-generation secure and reliable VLC networks.
A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception
PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai
Available online  , doi: 10.11999/JEIT260063
Abstract:
  Objective  Multimodal sentiment analysis is often affected by visual noise from complex environments, image-text sentiment inconsistency, and imbalanced modality contributions. When all modalities are treated without distinction, visual noise can degrade model performance. A robust mechanism is therefore needed to evaluate visual confidence and filter redundant visual information.  Methods  A Multimodal Sentiment Analysis Model with Multi-Source Knowledge-guided Visual confidence Perception (MKVP) is proposed (Fig. 1). A multi-source knowledge guidance matrix is constructed using syntactic-dependency, sentiment-intensity, and aspect-focused operators (Fig. 2). Guided by this matrix, the Visual Confidence Perception (VCP) module measures semantic affinity and dynamically suppresses irrelevant visual noise (Fig. 3). A dual-stream parallel interaction module is then used to support deep cross-modal alignment, and a global gated fusion mechanism further adjusts the fusion weights of different modalities.  Results and Discussions  Extensive experiments are conducted on the MVSA-Single, MVSA-Multiple, and HFM datasets. The proposed MKVP model achieves accuracy and F1 scores of 77.56% and 76.70%, 72.72% and 70.66%, and 87.26% and 86.78%, respectively. Compared with the baseline models, the accuracy and F1 score are improved by 2.45% and 3.68%, 2.19% and 2.21%, and 1.83% and 1.91%, respectively (Table 3). Ablation studies show that each component contributes to performance, especially the VCP module, which filters visual noise and improves feature quality (Table 5). Feature-space visualization further confirms that the VCP module refines semantic representations by promoting clearer clustering of samples with the same sentiment polarity (Fig. 4). Case studies on mismatched image-text samples also verify the ability of the model to resolve cross-modal semantic conflicts (Table 6). 
Model-complexity analysis shows that MKVP maintains high computational efficiency and low inference latency (Table 8).  Conclusions  The proposed MKVP framework reduces the effects of visual noise and image-text sentiment inconsistency in multimodal sentiment analysis. By using multi-source knowledge to guide visual confidence perception and combining dual-stream interaction with dynamic gated fusion, the model learns robust sentiment representations from noisy multimodal data. This method provides an efficient and reliable solution for complex social media scenarios.
Aerial Spatio-Temporal Image Generation via Latent Diffusion Models
SHANG Yuying, HOU Yingyan, LIU Zinan, LU Wanxuan, HUANG Yuhong, WANG Yixiao, YU Hongfeng, FU Kun
Available online  , doi: 10.11999/JEIT260165
Abstract:
  Objective  Aerial Earth observation plays a pivotal role in environmental monitoring, disaster warning, and urban planning. However, constraints such as flight-platform endurance and mission-window timeliness often prevent acquired aerial imagery from fully characterizing the long-term evolution of the Earth's surface. Although pre-trained latent diffusion models have shown strong potential for image generation, their application in aerial scenarios remains challenging because of the scarcity of high-quality temporal annotation data and semantic-visual misalignment caused by variable observation scales. To address these challenges, this paper proposes ASTIG, a training-free framework for Aerial Spatio-Temporal Image Generation. By leveraging the generative priors of pre-trained latent diffusion models and Large Language Models (LLMs), ASTIG provides a new paradigm for semantically controllable aerial spatio-temporal image generation.  Methods  ASTIG consists of three coordinated components. First, a dynamic semantic decomposition process is proposed to parse complex descriptions of aerial scene evolution into frame-level visual prompts, thereby compensating for the lack of temporal semantic annotations in existing aerial image-text datasets. Second, a Linguistic Binding (LB) strategy is proposed to establish explicit associations between key ground objects and their corresponding visual attributes within the cross-attention mechanism of the diffusion model, thereby improving the semantic response precision of the generated images. Third, a Temporal Anchor Attention (TAA) mechanism is incorporated. It uses dual reference frames to maintain subject stability and background consistency across the generated spatio-temporal image sequence, thus suppressing inter-frame temporal drift under training-free conditions.  
Results and Discussions  ASTIG and the baseline methods are evaluated on 7,236 high-quality aerial spatio-temporal descriptions using six automated metrics, including subject consistency, background consistency, temporal flickering, motion smoothness, aesthetic quality, and imaging quality. Quantitative results (Tables 1 and 2) show that ASTIG outperforms the baseline methods in spatio-temporal image generation, with improvements of 3.91% in subject consistency and 4.57% in motion smoothness over the frame-prompt baseline. Qualitative comparisons (Fig. 4) further show its strong ability to model long-term surface evolution in aerial imagery. Ablation studies validate the individual effectiveness of the LB strategy and the TAA mechanism (Table 3 and Fig. 5). Sensitivity analyses of the intervention steps (Table 4 and Fig. 6) and binding strength (Table 5 and Fig. 7) further identify suitable parameter settings. Extension experiments from satellite perspectives (Figs. 8 and 9) also show that ASTIG has the potential to generalize beyond aerial platforms to broader Earth observation scenarios.  Conclusions  This paper proposes ASTIG, a training-free framework for aerial spatio-temporal image generation that addresses the scarcity of high-quality long-term temporal data and semantic-visual misalignment. By leveraging the generative priors of pre-trained latent diffusion models and LLMs, ASTIG integrates a dynamic semantic decomposition process, an LB strategy, and a TAA mechanism to improve temporal semantic construction, semantic response precision, and inter-frame consistency. Experimental results show that ASTIG outperforms existing baseline methods across multiple automated evaluation metrics, providing a new paradigm for aerial spatio-temporal image generation. As a training-free method, ASTIG is still limited by the prior knowledge of the backbone model. 
Future work will examine geometric correction and nadir-view prior constraints to better align the generated results with the physical properties of satellite imagery.
Joint Channel Estimation and Diagnosis for Blocked RIS-Assisted Multi-User Multipath Millimeter-Wave Systems
LI Shuangzhi, LIU Cong, WANG Ning, HAN Gangtao, GUO Xin
Available online  , doi: 10.11999/JEIT260093
Abstract:
  Objective  Reconfigurable Intelligent Surface (RIS) can effectively modulate Millimeter-Wave (mmWave) signals and reshape the wireless propagation environment. In practical deployments, however, RIS elements are vulnerable to adverse weather and physical obstructions, which cause unpredictable distortion and motivate joint channel estimation and blockage diagnosis. Most existing studies focus on single-user systems, whereas multi-user scenarios remain insufficiently studied. This gap creates an opportunity to exploit the common RIS blockage vector and the shared RIS-Base Station (BS) channel across users. This paper therefore proposes a low-complexity framework for joint channel estimation and blockage diagnosis by exploiting the sparsity and correlation of multi-user cascaded channels.  Methods  Under the assumption that all User Equipment (UE) shares the same RIS-BS channel and is affected by a common RIS blockage vector, the problem is divided into two stages. First, a target UE is selected. The sparsity of the mmWave channel and blockage vector, together with the linear dependence among RIS-BS paths, is used to formulate a sparse recovery problem. A hierarchical Bayesian model is then adopted, and an efficient Sparse Bayesian Learning (SBL) algorithm is used for joint recovery. Second, partial Channel State Information (CSI) obtained from the target UE is used to construct a common channel matrix that combines the RIS-BS channel and blockage information. Channel estimation for the remaining UEs is then reformulated as another sparse recovery problem.  Results and Discussions  A low-complexity strategy for cascaded channel estimation and blockage diagnosis is developed by exploiting the sparsity and correlation of multi-user cascaded channels and the commonality of the RIS blockage vector. Ideal estimation results are used as a theoretical lower bound, and the proposed algorithm is compared with two benchmark schemes. 
Simulation results show that the proposed algorithm consistently outperforms the benchmark schemes (Fig. 1). Specifically, a higher target-user Signal-to-Noise Ratio (SNR) improves the Normalized Mean Square Error (NMSE), which confirms the importance of target-user selection (Fig. 2). The algorithm also shows good convergence as the number of iterations increases (Fig. 3), and its performance approaches the ideal case more closely as the number of time frames increases (Fig. 4). In addition, the method remains robust as the number of blocked elements increases (Fig. 5). More BS antennas further improve performance by enhancing array orthogonality (Fig. 6). By exploiting path correlation, the proposed method achieves better estimation accuracy with slightly lower runtime (Table 1). However, estimation accuracy decreases as the number of paths increases because the model becomes more complex (Figs. 7 and 8).  Conclusions  This paper proposes a joint channel estimation and blockage diagnosis framework for blocked RIS-assisted multi-user multipath mmWave systems. Simulation results show that the method approaches the theoretical performance bound in complex multipath environments. It also maintains clear performance advantages under high blockage rates while reducing computational complexity through the use of common channel structures. This study provides a practical solution to performance degradation in RIS deployment, clarifies the effects of key parameters, and offers guidance for system design. Because practical blockages often exhibit block-sparse or structured-sparse characteristics, future work may incorporate structured priors, such as group sparsity and Markov random fields, into the SBL framework to capture spatial correlation and improve diagnostic accuracy and robustness.
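The NMSE metric used in the comparisons above can be sketched as follows. This is a minimal illustration, assuming the common definition NMSE = ||h_est - h_true||^2 / ||h_true||^2 over vectorized channel coefficients; the abstract does not state the exact normalization used in the paper, and the function names are hypothetical.

```python
import math


def nmse(h_true, h_est):
    """Normalized mean square error between a true and an estimated channel.

    Both arguments are equal-length sequences of (possibly complex) channel
    coefficients. Assumed definition: squared error energy divided by the
    energy of the true channel.
    """
    err = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    return err / ref


def nmse_db(h_true, h_est):
    """NMSE on a decibel scale, as simulation curves usually report it."""
    return 10.0 * math.log10(nmse(h_true, h_est))


if __name__ == "__main__":
    h = [3 + 0j, 4j]        # toy two-tap channel (placeholder values)
    h_hat = [3 + 0j, 0j]    # estimate that misses the second tap entirely
    print(nmse(h, h_hat), nmse_db(h, h_hat))
```

A lower NMSE means a better estimate, so the statement that a higher target-user SNR "improves" the NMSE should be read as a reduction of this value.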
Construction of MDS Entanglement-Assisted Quantum Error-Correcting Codes
QU Yuanyue, GAO Jian
Available online  , doi: 10.11999/JEIT251251
Abstract:
  Objective  Entanglement-Assisted Quantum Error-Correcting Codes (EAQECCs) provide an effective way to protect quantum information by using pre-shared entanglement between the sender and receiver. Existing constructions of EAQECCs mainly rely on classical cyclic or constacyclic codes and often require strong algebraic constraints, which limit the range of achievable parameters. This paper develops a general and systematic framework for constructing new families of EAQECCs from Twisted Reed-Solomon (TRS) codes over finite fields. The study has two aims. The first is to extend classical Reed-Solomon-based code design to the twisted setting so that richer algebraic structures can be used. The second is to determine the exact number of maximally entangled pairs required to attain the quantum Singleton bound. The final objective is to construct Maximum-Distance Separable (MDS) EAQECCs with greater flexibility and broader parameter ranges than existing methods.  Methods  The proposed method starts from the definition of TRS codes over finite fields. A twist parameter is introduced into the generator matrix, which changes the structure of the corresponding parity-check matrices. By systematically analyzing the associated coset-sum matrices in the twisted and untwisted cases, the rank of the relevant matrix product is determined. This rank equals the number of required entangled pairs and therefore provides the theoretical basis for the construction of EAQECCs. A detailed algebraic analysis shows that the matrix contains a submatrix with entries $M_{l,j}=\sum_{y\in W}\left(\xi^{j}y\right)^{tl}$, which simplifies to $t\zeta^{jl}$ under suitable group-theoretic conditions. The resulting matrix is a Vandermonde matrix, and its full rank gives an explicit characterization of the entanglement structure. This property is then used to construct MDS EAQECCs. 
Based on these results, two families of EAQECCs are derived according to the number of entangled pairs. The corresponding parameters are tabulated and are shown to satisfy the quantum Singleton bound with equality, which confirms that the constructed codes are MDS.  Results and Discussions  Comprehensive parameter analysis and explicit examples verify the theoretical results. Comparative analysis further shows the flexibility of the proposed framework. Unlike previous constructions that require divisibility conditions such as $a\mid (q+1)$ and $a\mid (q-1)$, the present approach remains applicable under broader algebraic settings and thus extends the feasible range of code parameters. This difference is summarized in the remark section and verified numerically. A systematic comparison with existing MDS EAQECCs (Table 4) reveals several new parameter regimes that are not accessible with classical or cyclic-code-based constructions. In particular, the proposed method yields larger code lengths and more flexible entanglement consumption rates $c/n$, which improves both the efficiency and the generality of EAQECCs. The algebraic consistency observed across all tested cases supports the correctness and general applicability of the TRS-based framework.  Conclusions  This study establishes an algebraic framework for constructing MDS EAQECCs from TRS codes. By rigorously analyzing the rank properties of coset-sum matrices, the required entanglement is determined precisely, and the conditions under which the constructed codes attain the quantum Singleton bound are identified. Two broad classes of MDS EAQECCs are obtained, corresponding to $a\mid (q+1)$ and $a\mid (q-1)$, respectively, and both are verified by explicit examples and tabulated results. 
Compared with existing studies, the proposed approach not only generalizes earlier constructions but also extends the achievable parameter space to cases not covered by Reed-Solomon-code- or cyclic-code-based frameworks. The derived codes show improved structural flexibility, clearer algebraic characterization, and potential value for high-performance quantum information systems. This work therefore provides a unified perspective for the development of algebraically optimized EAQECCs and offers a basis for future studies of TRS-based quantum code families and their efficient encoding implementations.
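The MDS check described above reduces to simple arithmetic on the code parameters. The sketch below assumes the standard entanglement-assisted quantum Singleton bound n + c - k >= 2(d - 1), taken to apply when d <= (n + 2)/2, and the usual Hermitian-type construction in which a classical [n, k0, d] code with c = rank(H H^†) yields an [[n, 2*k0 - n + c, d; c]] code. Whether the paper uses exactly this construction is not stated in the abstract; the function names and the numeric values are hypothetical and chosen only to exercise the arithmetic.

```python
from fractions import Fraction


def eaqecc_from_classical(n, k_classical, d, c):
    """Return (n, k, d, c) of the EAQECC under the assumed construction.

    n, k_classical, d: parameters of the classical code.
    c: number of maximally entangled pairs, assumed equal to rank(H H^dagger).
    """
    return n, 2 * k_classical - n + c, d, c


def is_mds_eaqecc(n, k, d, c):
    """True if the bound n + c - k >= 2(d - 1) is met with equality.

    The range restriction d <= (n + 2) / 2 is checked as 2 * d <= n + 2.
    """
    return 2 * d <= n + 2 and n + c - k == 2 * (d - 1)


def entanglement_rate(n, c):
    """Entanglement consumption rate c/n as an exact fraction."""
    return Fraction(c, n)


if __name__ == "__main__":
    # Placeholder classical MDS parameters [10, 7, 4] with c = 2 (not from the paper).
    params = eaqecc_from_classical(10, 7, 4, 2)
    print(params, is_mds_eaqecc(*params), entanglement_rate(10, 2))
```

Under these assumptions, a classical MDS code (d = n - k0 + 1) always gives n + c - k = 2(n - k0) = 2(d - 1), which is why determining c exactly is the key step in establishing that the resulting quantum codes are MDS.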
Research Status and Prospects of Mid-Wavelength Infrared Superlattice Detector Technology
LIU Ming, ZHAO Yaqi, GUAN Xiaoning, ZHANG Fan, LU Pengfei
Available online  , doi: 10.11999/JEIT260083
Abstract:
  Significance   Mid-Wavelength Infrared (MWIR) detectors are widely used in civilian and military applications because of their high sensitivity and excellent temperature discrimination. Type-II SuperLattice (T2SL) materials, especially the InAs/GaSb and InAs/InAsSb systems, have become promising candidates for third-generation infrared photodetectors. This review systematically analyzes the research status and future trends of MWIR T2SL detector technology. It focuses on key photoelectric parameters, including Quantum Efficiency (QE), dark current density, and Specific Detectivity (D*). This work provides a reference for material selection and performance optimization in this rapidly developing field.  Progress   Considerable progress has been made in dark current suppression and photoresponse enhancement for MWIR T2SL detectors. For dark current suppression, advanced barrier structures, such as nBn, XBn, and M-structures, are designed through band-structure engineering. These structures effectively block majority-carrier transport while allowing efficient collection of photogenerated carriers. For instance, an nBn device with an AlAsSb/InAsSb superlattice barrier shows a dark current density of 2.01×10⁻⁵ A/cm² at 150 K (Fig. 1(a,b)). Strain compensation and optimized epitaxial growth further reduce bulk dark current. One device achieves a dark current density of 4.5×10⁻⁷ A/cm² at 140 K (Fig. 2(c,d)). Device process optimization, including two-step etching and Zn-diffusion-based planar junction formation, also reduces surface leakage current (Fig. 3). For photoresponse enhancement, the main strategies include micro/nano-optical structure integration, epitaxial growth optimization, and device process improvement. Monolithically integrated metalenses increase the peak responsivity to 9.01 A/W at 300 K (Fig. 4(a)). Guided-mode resonance architectures enable a room-temperature External Quantum Efficiency (EQE) of approximately 60% (Fig. 4(b,c)). 
Epitaxial optimization, including stepped absorption layers and interfacial graded doping, increases the QE to 59.4% at 150 K (Fig. 5(c,d)). Device process optimization, such as substrate removal and Anti-Reflection (AR) coating deposition, also improves QE. An average QE of 63.7% is reported in the 3.7–4.8 μm range (Fig. 6(c,d)). Comparative analysis shows that InAs/GaSb detectors are mainly reported at 77–150 K, whereas InAs/InAsSb detectors show stronger potential for higher-temperature operation, especially near 150 K (Figs. 7 and 8). Overall, dark current densities are generally suppressed below 10⁻⁴ A/cm², and peak QEs approach 70%.  Conclusions  T2SL materials, with tunable band structures and low Auger recombination rates, have become a core material platform for high-performance MWIR detection. Current studies have addressed key challenges in dark current suppression and photoresponse enhancement. Through advanced barrier design and device process optimization, dark current densities have been suppressed to the 10⁻⁶ A/cm² level at approximately 150 K. Through optical and epitaxial engineering, QEs have been increased to approximately 60% or higher. The InAs/InAsSb material system is particularly promising for High-Operating-Temperature (HOT) applications.  Prospects  Future development will focus on four main directions. First, the HOT limit should be further increased, with the goal of maintaining diffusion-limited performance at 180 K or higher. Second, large-format Focal Plane Arrays (FPAs) should be developed based on highly uniform material growth through mature Molecular Beam Epitaxy (MBE), aiming for pixel operability higher than 99%. Third, multicolor and multispectral detection should be expanded by precisely tuning superlattice periods, enabling integrated dual-band or multiband MWIR detection with reduced crosstalk. 
Fourth, new device architectures and coupled physical mechanisms should be explored to extend detector performance and application boundaries.
PLS-YOLO: A Lightweight Model for Signal Modulation Recognition
ZHOU Xiaobo, ZHANG Fan, SHE Chao, ZHOU Guofei, MENG Jianping
Available online  , doi: 10.11999/JEIT251377
Abstract:
  Objective  As wireless communication evolves toward high efficiency, low latency, and ubiquitous connectivity, higher requirements are placed on Automatic Modulation Recognition (AMR) to ensure link reliability in complex electromagnetic environments. Deep learning has improved recognition performance compared with traditional methods, which often rely on subjective feature design and have limited robustness. However, existing YOLO-based AMR models are not fully optimized for specific signal characteristics or practical deployment. These models often have excessive parameters and high computational complexity, which makes them unsuitable for resource-constrained hardware, such as edge nodes and Field-Programmable Gate Arrays (FPGAs), and limits their ability to meet real-time communication requirements. To address these bottlenecks, this paper proposes Precision and Lightweight Structure-YOLO (PLS-YOLO), a lightweight AMR model based on YOLOv10n. By optimizing network channels, replacing core modules, and improving the downsampling mechanism, the proposed model enables efficient integration of modulation signal classification and localization. It also reduces the parameter count and computational complexity, thereby supporting AMR deployment in resource-constrained scenarios.  Methods  The method includes two main stages: dataset preprocessing and PLS-YOLO model construction. In the preprocessing stage, the public RadioML2016.10a and RadioML2016.10b benchmark datasets for signal modulation recognition are used. For In-phase and Quadrature (IQ) signals in these datasets, the Short-Time Fourier Transform (STFT) is used to map one-dimensional temporal signals into two-dimensional time-frequency spectrograms containing phase and amplitude information. This process provides richer feature representations for the model. A random sampling strategy without replacement is then used to stitch individual time-frequency samples into 3×3 composite images (Fig. 4). 
Target labels matching the input format of YOLO-series models are generated at the same time. The dataset is divided into training, validation, and test sets at a ratio of 7:1.5:1.5 by stratified sampling to ensure consistent signal-type distributions across all subsets. The model is built on YOLOv10n, with targeted improvements designed to balance the parameter count and recognition performance. The C2f module in the original backbone network is replaced with the CSPPC module, which is based on the CSP architecture and consists of feature splitting, Partial Convolution (PConv) processing, and feature fusion. This design reduces parameters while improving recognition performance. The feature dimensionality reduction process in the backbone network is also reconstructed to reduce the increase in computational complexity caused by parameter redundancy. The traditional downsampling module is replaced with CGBlock, which improves the capture of complex modulation signal features by fusing context-aware information. Finally, standard convolutions in the PSA and v10Detect modules are replaced with PConv to further reduce computational complexity and jointly optimize lightweight design and recognition performance.  Results and Discussions  Experimental results on RadioML2016.10a show that PLS-YOLO achieves a mean Average Precision (mAP) of 68.4% within the Signal-to-Noise Ratio (SNR) range of –20 to 18 dB. The mAP increases to 94.3% when SNR ≥ 0 dB. Compared with the baseline YOLOv10n model, PLS-YOLO improves mAP by 0.6%, reduces the parameter count by 47.33%, and decreases computational complexity by 34.15%. Its inference speed also increases by 5 frames per second (fps) (Table 2). These results show that the model effectively balances recognition performance and lightweight deployment by reducing computational cost while improving precision. To verify robustness, additional experiments are conducted on RadioML2016.10b. 
As shown in Table 4, PLS-YOLO achieves an mAP of 73.30% over the –20 to 18 dB range and 95.4% at SNR ≥ 0 dB. It outperforms mainstream models such as MCNet and LSTM2, confirming its strong recognition performance. Furthermore, Fig. 5 shows that converting IQ data into spectrograms is more suitable for PLS-YOLO recognition of digital modulation signals. By contrast, the recognition performance for analog modulation signals remains limited. Future work should therefore improve feature modeling and recognition capability for analog signals.  Conclusions  This study proposes PLS-YOLO, a lightweight AMR model based on YOLOv10n. To jointly improve modulation recognition performance and model compactness, the network structure is optimized through channel dimensionality reduction, core module replacement, downsampling mechanism improvement, and PConv substitution. These strategies reduce key limitations of existing YOLO-based AMR models, including parameter redundancy, high computational complexity, and limited adaptability to resource-constrained scenarios such as edge nodes and FPGAs. Experiments on the RadioML2016.10a and RadioML2016.10b benchmark datasets show that PLS-YOLO achieves strong overall performance. While integrated signal classification and localization are maintained, both parameter count and computational complexity are substantially reduced compared with the baseline YOLOv10n model, with a clear improvement in recognition performance. The results verify the effectiveness and feasibility of the proposed optimization strategies and provide a practical technical path for AMR implementation. The remaining limitations in analog modulation signal recognition also indicate a clear direction for future research.
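The 3×3 stitching step in the preprocessing stage can be sketched as follows. This is a minimal illustration, assuming that each spectrogram fills exactly one grid cell and that labels follow the usual YOLO text format (class, x_center, y_center, width, height, all normalized to [0, 1]); the helper name and its arguments are hypothetical, and the paper's actual tile sizes and label layout are not given in the abstract.

```python
import random


def make_composite_labels(sample_classes, grid=3, rng=None):
    """Choose grid*grid samples without replacement and build YOLO-style labels.

    sample_classes: list where entry i is the modulation class id of sample i.
    Returns (chosen_indices, labels); each label is a tuple
    (class_id, x_center, y_center, width, height) in normalized coordinates,
    assuming every tile occupies one full cell of the composite image.
    """
    rng = rng or random.Random()
    # random.sample draws distinct indices, i.e. sampling without replacement.
    chosen = rng.sample(range(len(sample_classes)), grid * grid)
    cell = 1.0 / grid
    labels = []
    for pos, idx in enumerate(chosen):
        row, col = divmod(pos, grid)  # row-major placement in the grid
        labels.append((sample_classes[idx],
                       (col + 0.5) * cell,
                       (row + 0.5) * cell,
                       cell,
                       cell))
    return chosen, labels


if __name__ == "__main__":
    classes = [i % 11 for i in range(100)]  # placeholder class ids
    picked, boxes = make_composite_labels(classes, rng=random.Random(0))
    for box in boxes:
        print(" ".join(f"{v:.4f}" if isinstance(v, float) else str(v) for v in box))
```

Each composite therefore carries nine boxes, which is what lets a detection model such as YOLOv10n perform classification and localization of the modulation signals in one pass.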
Dynamic Focus and Semantic Prompt Network for Fine-Grained Pest Classification
LIU Changyuan, ZHAO Haijian, WU Haibin
Available online  , doi: 10.11999/JEIT260044
Abstract:
  Objective  Agricultural pest images are often affected by complex background interference, large appearance differences across morphological stages, diverse shooting angles, and substantial scale variation. These factors limit feature extraction and morphological adaptability in existing fine-grained classification models. To address these challenges, an Agricultural Pest Multi-Dimensional dataset (APMD) is constructed to cover multiple morphological stages, viewing angles, and object scales. In addition, a Dynamic Focus and Semantic Prompt Network for fine-grained pest classification (DFS-PestNet) is proposed. The network adopts a decoupled parallel architecture that combines a main feature stream and a prompt enhancement stream. A Spatial Dependency Perception (SDP) module is designed to dynamically focus on key discriminative regions, such as pest spots and wing veins, thereby improving local subtle feature extraction under complex backgrounds. An Advanced Haptic-Visual Prompting (AHVP) module is introduced to integrate category semantics and spatial position information into shallow and middle-level features, which improves adaptability to morphological variations across developmental stages. Dual-branch Saliency Sampling (DSS) is further adopted to adaptively aggregate key features from essential pest body parts through learnable prototype components and dual-branch saliency fusion. This strategy improves the recognition of small targets, including tiny pests and early-stage larvae. Experimental results show that the proposed model achieves better classification performance than baseline and mainstream methods on both public and self-constructed datasets. These results verify the effectiveness and application potential of the model in complex agricultural scenarios and provide a technical reference for intelligent pest monitoring and precise control in smart agriculture.  
Methods  To improve classification accuracy under complex background interference and multi-morphological conditions, APMD is first constructed. This dataset contains image data covering different pest morphological stages, viewing angles, and scales. Specifically, it includes 15,680 images from 58 species, which are divided into training, validation, and testing sets at a standard ratio of 7:2:1 (Fig. 1, Table 1). The dataset provides high-quality data support for research on fine-grained pest classification. DFS-PestNet is then proposed. In this network, the SDP module is designed to adaptively locate and enhance key discriminative pest regions. By reducing the effects of pose variation and complex background interference, this module enables more accurate fine-grained feature extraction. The AHVP module is also incorporated into the network to embed category semantics and spatial position information. This module guides the network to focus on key discriminative features across different morphological periods, thereby improving recognition robustness under large morphological changes during the pest life cycle. Furthermore, DSS is proposed to adaptively aggregate features from essential pest body parts. This strategy strengthens the recognition of challenging small targets in fine-grained pest classification.  Results and Discussions  The performance of DFS-PestNet in fine-grained pest classification is evaluated through multidimensional experiments. First, qualitative visualization is conducted. Grad-CAM heatmaps show that, compared with the baseline model, which is easily affected by complex farmland backgrounds and plant stems, DFS-PestNet effectively suppresses background noise and focuses on fine-grained discriminative parts, such as pest heads and antennae (Fig. 6). 
The model also shows clear advantages in capturing features of tiny targets, such as leafhopper nymphs, and pests at different life stages, such as Chilo suppressalis hidden within stems. The t-SNE feature reduction results further confirm that the proposed model reduces feature confusion in multi-morphological scenarios. High-dimensional features show clearer inter-class separation and tighter intra-class clustering in a two-dimensional visual space (Fig. 7). Second, quantitative ablation and parameter optimization experiments are performed. The ablation studies validate the synergistic effect of the three improved modules, namely SDP, AHVP, and DSS (Table 2). Their combination increases the classification accuracy of the baseline model by 2.21%, reaching 77.24%, with all core evaluation metrics achieving the best values. Hyperparameter optimization further identifies 6 as the optimal number of prompt position tokens and 0.2 as the optimal feature dropout rate (Fig. 8). This configuration ensures sufficient semantic representation while achieving a good balance between simulating natural occlusion and improving model robustness. Finally, comparative experiments with mainstream state-of-the-art models are conducted. Compared with existing advanced Convolutional Neural Network (CNN) and Transformer architectures, such as Gate-ViT and EST, DFS-PestNet achieves the highest accuracies of 77.24% and 98.01% on the large-scale public dataset IP102 and the challenging self-constructed APMD dataset, respectively (Tables 3 and 4). These results show consistent improvements across fine-grained classification metrics. Moreover, while maintaining high classification accuracy, the proposed model achieves inference speeds of 158 frames/s and 164 frames/s on the two datasets, respectively. In summary, DFS-PestNet achieves strong classification accuracy and high inference efficiency for complex pest feature extraction across large scale variation and multiple morphological stages. 
This provides a practical basis for efficient deployment in smart agriculture.  Conclusions  To address multi-morphological variation and small-target recognition in fine-grained pest classification, the APMD dataset is constructed, and DFS-PestNet is proposed based on the MPSA baseline. Specifically, the SDP module is introduced to adaptively focus on pose- and morphology-invariant discriminative features. The AHVP module embeds category semantics and spatial position information into shallow and middle-level networks. The DSS module adaptively aggregates key body-part features to improve small-target recognition. Experimental results show that DFS-PestNet outperforms mainstream models on both the IP102 and APMD datasets across different developmental stages, angles, and scales. Future work will focus on lightweight model design for efficient edge deployment and open-set recognition for early warning of unknown pest categories in complex real-world environments.
Robust Optimization of Low-altitude Communication and Computation Resources in Uncertain Environments
GONG Yucheng, LI Bin, WANG Xinyi, FEI Zesong
Available online  , doi: 10.11999/JEIT260090
Abstract:
  Objective  Low-altitude edge computing networks provide flexible computing services and extended coverage for user equipment. However, quality of service is often degraded by uncertainty in task data size and by Unmanned Aerial Vehicle (UAV) position jitter caused by environmental disturbances. Existing robust methods commonly rely on deterministic uncertainty sets, which tend to be conservative and cannot accurately describe the stochastic distribution of task demands. To address these challenges, a robust energy minimization framework is proposed for multi-UAV-assisted Mobile Edge Computing (MEC) networks. The objective is to minimize the weighted sum of system energy consumption. This is achieved by developing a joint optimization model that coordinates UAV flight trajectories, task splitting decisions, and computation and communication resource allocation. The model explicitly accounts for the dual uncertainties of task data size and UAV trajectory.  Methods  To handle the nonconvexity and strong coupling among optimization variables, the problem is first modeled as a Markov Decision Process (MDP). A comprehensive state space is defined to characterize real-time system dynamics, and a continuous action space is designed for trajectory control and resource management. A Distributionally Robust Optimization Soft Actor-Critic (DRO-SAC) algorithm is then developed to solve the MDP. In this framework, an ambiguity set based on the L1-norm distance is constructed to characterize the distributional uncertainty of the task demand distribution. A maximum-entropy reinforcement learning mechanism is used to learn an optimal policy under the worst-case distribution within the ambiguity set. In this way, UAV trajectories, task splitting, and computation and communication resource allocation are jointly optimized to improve system robustness under dynamic environmental fluctuations.  
Results and Discussions  The performance of the proposed DRO-SAC algorithm is evaluated through simulations. DRO-SAC achieves faster convergence and higher rewards than Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms (Fig. 3). For energy consumption, the proposed method consistently achieves higher efficiency under different user densities (Fig. 4). The robustness of the system against position errors is also verified, with energy fluctuations kept at a low level (Fig. 5). Dynamic trajectory adjustment further confirms that the proposed method can provide effective user coverage while reducing system energy consumption (Fig. 6).  Conclusions  A DRO-SAC-based joint optimization framework is proposed to address uncertainty in task data size and UAV position jitter in multi-UAV-assisted MEC networks. By constructing an ambiguity set for the task demand distribution and optimizing the worst-case expected objective, the proposed method mitigates the limitations of traditional deterministic models in dynamic environments. Weighted system energy consumption is minimized while latency and safety constraints are satisfied. Simulation results demonstrate that the proposed scheme achieves stable convergence and high energy efficiency, even when communication and computation resources are limited and environmental parameters fluctuate strongly.
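The worst-case step inside an L1-norm ambiguity set can be illustrated on a small discrete example. The sketch below is not the DRO-SAC algorithm itself; it only shows, under stated assumptions, how the worst-case distribution within {q : ||q - p||_1 <= eps} can be found for a finite set of task-demand outcomes by shifting up to eps/2 of probability mass from the cheapest outcomes to the most expensive one. The function names, the cost values, and the nominal distribution are hypothetical.

```python
def worst_case_distribution(costs, nominal, eps):
    """Distribution q maximizing sum(q_i * costs_i) with ||q - nominal||_1 <= eps.

    costs:   cost (e.g. energy) of each discrete outcome.
    nominal: nominal probability of each outcome (non-negative, sums to 1).
    eps:     radius of the L1 ambiguity set (0 <= eps <= 2).
    """
    q = list(nominal)
    order = sorted(range(len(costs)), key=lambda i: costs[i])  # cheapest first
    worst = order[-1]
    # Moving mass m changes the L1 distance by 2*m, so at most eps/2 can move,
    # and never more than the mass currently outside the worst outcome.
    budget = min(eps / 2.0, 1.0 - q[worst])
    q[worst] += budget
    for i in order[:-1]:
        take = min(q[i], budget)  # drain the cheapest outcomes first
        q[i] -= take
        budget -= take
    return q


def expectation(costs, probs):
    """Expected cost under a discrete distribution."""
    return sum(c * p for c, p in zip(costs, probs))


if __name__ == "__main__":
    costs = [1.0, 2.0, 4.0]       # placeholder energy costs per outcome
    nominal = [0.5, 0.3, 0.2]     # placeholder nominal task-demand distribution
    q = worst_case_distribution(costs, nominal, eps=0.4)
    print(q, expectation(costs, nominal), expectation(costs, q))
```

In this toy case the expected cost rises from 1.9 under the nominal distribution to 2.5 under the worst case; a policy trained against such worst-case expectations is less conservative than one designed for a deterministic uncertainty set, because the perturbation is bounded in probability mass rather than in the outcome values themselves.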
Power Side-Channel Leakage Assessment and Chosen-Ciphertext Attack on the Decoding Function of Kyber
QIU Yubo, LI Ziqi, YUAN Chaoxuan, ZHOU Zijian, HU Wandi, HU Wei
Available online  , doi: 10.11999/JEIT251243
Abstract:
  Objective   The standardization of post-quantum cryptography makes the implementation security of Kyber a practical and urgent problem rather than a purely theoretical concern. As a lattice-based key encapsulation mechanism selected by NIST, Kyber achieves favorable efficiency and security based on the hardness of the Module Learning With Errors problem; however, its real-world deployment on embedded devices still exposes measurable physical leakage. Existing studies have shown that side-channel attacks can target several modules of Kyber, but two issues remain insufficiently addressed. First, the leakage strengths of different auxiliary functions along the decapsulation and reencryption path have not been compared within a unified assessment framework, which makes it difficult to identify the most dangerous implementation-level weak point. Second, although chosen-ciphertext attacks and power analysis have both been studied, the decoding function poly_frommsg() has not been fully exploited from the perspective of periodic leakage modeling and low-query key recovery. To address these problems, this work performs function-level leakage assessment for the key operations involved in Kyber decapsulation and then develops a chosen-ciphertext simple power analysis attack against the most vulnerable decoding function. The study is intended to provide both a practical attack method and implementation-oriented security insights for the protection of post-quantum cryptographic software on embedded platforms.  Methods   A function-oriented evaluation-and-attack framework is established around the execution path of Kyber.CCAKEM.Dec(). Four representative target functions are selected: the Barrett reduction function poly_reduce(), the encoding function poly_tomsg(), the decoding function poly_frommsg(), and the hash function G(). 
For each function, the intermediate variable that exhibits the largest data-dependent bit transition under crafted ciphertext inputs is first analyzed from the viewpoint of Hamming-distance leakage. Two ciphertext sets are then constructed so that the selected intermediate variable takes two maximally distinguishable values, and 50 power traces are collected for each set. The experiments are implemented on an STM32F407IG embedded platform, and the power signals are captured by a PicoScope 6406E oscilloscope at a sampling rate of 5 GS/s. Welch’s t-test based TVLA is adopted to quantify leakage significance, with ±4.5 used as the decision threshold for leakage existence. After the decoding function is identified as the most vulnerable point, a chosen-ciphertext SPA attack is designed. The attack first constructs ciphertexts according to the coefficient range of the secret polynomial, then extracts 256 points of interest from reference traces by local-maximum search, and finally builds a grouped threshold model according to the periodic energy structure of the points of interest. The recovered message bits are mapped back to the coefficients of the secret polynomial, enabling full private-key reconstruction for Kyber512 and Kyber768.  Results and Discussions   The leakage assessment demonstrates a clear difference among the four target functions. For poly_reduce(), the intermediate variable t depends directly on the coefficients of the intermediate polynomial mp, and the maximum Hamming distance reaches 13; accordingly, the measured TVLA peaks are concentrated around 50 for both Kyber512 and Kyber768 (Fig.5). For poly_tomsg(), the relevant binary transition corresponds to a Hamming distance of only 1, and the observed TVLA values are much smaller, approximately 6 (Fig.6). 
For poly_frommsg(), the message-dependent mask flips between 0 and 0xffff, yielding a Hamming distance of 16 and the strongest leakage among all tested functions; the TVLA peaks reach about 60, which identifies this module as the primary attack target (Fig.7). For the hash function G(), the leakage is weaker and less regular, but several sampling points still exceed the TVLA threshold, indicating that theoretical IND-CCA reinforcement through the FO transform does not automatically eliminate physical leakage (Fig.8). These results show that implementation-level vulnerability is highly correlated with data-dependent bit transitions and that linearly expanded message-processing functions may expose more stable power signatures than some arithmetic modules. Based on this observation, the proposed attack focuses on poly_frommsg(). The local-extrema analysis shows that the 256 message-bit operations generate 256 stable points of interest, and their energy values exhibit a periodic pattern with an approximate period length of 8 (Fig.10, Fig.11). Instead of applying a single global threshold to all points of interest, the proposed grouped threshold model divides the points according to their positions within the period and computes location-aware thresholds. This design suppresses position-dependent drift and improves the consistency of bit decisions. The resulting message-recovery procedure can reliably reconstruct the bit sequence from one attack trace under each chosen ciphertext. Combined with the precomputed ciphertext table, only 6 chosen ciphertexts are required to recover the private key of Kyber512 and only 9 chosen ciphertexts are required for Kyber768. Compared with the prior poly_frommsg()-based method, which needs 8 and 12 ciphertexts respectively, the proposed method reduces the ciphertext requirement by 25.0% while maintaining a 100% success rate (Table 4). 
Compared with the attack on poly_tomsg(), the proposed method exploits a function with stronger leakage observability and therefore achieves both higher decision stability and equal or better overall efficiency. The periodic points-of-interest model is thus not merely an empirical phenomenon; it directly supports the attack design and explains the practical gain in low-query key recovery.  Conclusions  This work shows that Kyber contains heterogeneous implementation-level vulnerabilities along its decapsulation path and that the decoding function poly_frommsg() is the most critical leakage point under the tested software implementation. By combining function-level TVLA assessment with a chosen-ciphertext SPA attack, the study not only pinpoints leakage sources in poly_reduce(), poly_tomsg(), poly_frommsg(), and G(), but also converts the observed periodic leakage structure of poly_frommsg() into an effective grouped threshold model for key recovery. The resulting attack reduces the number of required ciphertexts for Kyber512 and Kyber768 to 6 and 9, respectively, while preserving a 100% success rate. These findings indicate that practical protection of post-quantum software should go beyond algorithm-level security claims and explicitly consider masking, execution randomization, balanced implementations, and function-level leakage testing during deployment and validation.
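The TVLA step described in the Methods can be illustrated with a minimal sketch. It computes Welch's t-statistic at every sample index for two trace sets and flags indices where |t| exceeds 4.5. The traces below are synthetic stand-ins for measured power traces, and the helper names (welch_t, tvla, fake_trace) are hypothetical; only the 50-traces-per-set size and the ±4.5 threshold come from the abstract.

```python
import math
import random
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two lists of samples taken at one time index."""
    var_a, var_b = variance(group_a), variance(group_b)
    denom = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / denom


def tvla(traces_a, traces_b, threshold=4.5):
    """Return per-sample t-values and the indices where |t| exceeds the threshold."""
    n_samples = len(traces_a[0])
    t_values = [
        welch_t([tr[i] for tr in traces_a], [tr[i] for tr in traces_b])
        for i in range(n_samples)
    ]
    leaky = [i for i, t in enumerate(t_values) if abs(t) > threshold]
    return t_values, leaky


# Synthetic data (assumption): 50 traces per set, 20 samples per trace.
# Only sample index 7 depends on which ciphertext set was used.
random.seed(0)


def fake_trace(leak):
    trace = [random.gauss(0.0, 1.0) for _ in range(20)]
    trace[7] += leak
    return trace


set_a = [fake_trace(0.0) for _ in range(50)]
set_b = [fake_trace(3.0) for _ in range(50)]
t_values, leaky = tvla(set_a, set_b)
```

In this toy setup the data-dependent sample is reported in leaky, which mirrors how a TVLA peak marks a candidate leakage point.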
A Multi-View Feature Extraction and Dual-Edge Contrastive Learning Approach for Image Forgery Detection
XU Zhuang, YE Ziyi, PAN Enkang, LIU Chunxiao
Available online  , doi: 10.11999/JEIT251271
Abstract:
  Objective  With the rapid development and widespread use of advanced image editing tools such as Adobe Photoshop and Meitu, the creation and dissemination of highly realistic forged images have become increasingly prevalent, posing significant challenges to the authenticity verification of visual content across various fields including journalism, forensic analysis, and social security. Conventional image forgery detection methods predominantly formulate the task as a pixel-wise binary classification problem, which often leads to label ambiguity and conflicts, especially around the edges of tampered regions. Additionally, most existing approaches primarily focus on spatial domain features, neglecting the rich complementary information available from other perspectives such as noise and frequency domains, which can be crucial for forgery detection.  Methods  To overcome these limitations, this paper proposes a novel image forgery detection algorithm based on multi-view feature extraction combined with a dual-edge contrastive learning framework. The core idea involves redefining the detection task as an intra-image inconsistency detection problem, thereby effectively avoiding the label conflict issues inherent in traditional pixel classification schemes. To address the semantic ambiguity and blurred boundaries at tampered edges, a dual-edge contrastive learning strategy is designed, which separately extracts and contrasts features from inner and outer edge regions as well as from non-edge tampered and non-tampered areas. This approach encourages the model to pay attention to challenging edge samples, thereby improving edge detection accuracy. Furthermore, the proposed method develops a dual-branch multi-view feature encoder to comprehensively capture diverse clues. 
The spatial domain branch employs a High-Resolution Network (HRNet) backbone to extract multi-scale spatial features, enhanced by a mixture-of-experts gating mechanism that dynamically weights features across scales and fuses residuals between adjacent scales, thus emphasizing subtle forgery traces. The noise domain branch extracts multiple noise-related features, including camera noise fingerprints, SRM filter responses, constrained Bayar convolution outputs, max pooling features, residuals from average pooling, and learnable Fourier domain features with adaptive masking. A mixture-of-experts strategy is also utilized to assign relevance weights to these heterogeneous features dynamically, according to each input image’s specific characteristics. During training, the fused multi-view features are subjected to the dual-edge contrastive learning framework, which employs a contrastive loss to enhance the discrimination between tampered and authentic regions, especially at their edges. At the inference stage, clustering algorithms such as K-means are applied to the learned feature representations to delineate tampered regions without relying on explicit pixel labels, thus providing a more flexible detection process.  Results and Discussions  Extensive experiments are conducted on multiple widely used benchmark datasets, including NIST16, Columbia, COVERAGE, DSO, and CASIA-v1, covering various forgery types such as splicing, copy-move, object removal, and post-processing. The proposed method consistently outperforms state-of-the-art approaches, achieving average permuted F1 and IoU score improvements of 26.0% and 10.1%, respectively, over the best existing methods (Table 3). Visualization results demonstrate superior tampered region localization, especially along tampered edge areas, with reduced false positives and clearer edge delineation (Fig. 5). 
Ablation studies further confirm the effectiveness of each key component, including multi-view feature extraction, the mixture of noise experts fusion mechanism, and the dual-edge contrastive learning strategy (Tables 4, 5, and 6).  Conclusions  This paper presents a novel image forgery detection framework that addresses the limitations of conventional classification-based methods by modeling the task as intra-image inconsistency detection. The introduction of dual-edge contrastive learning effectively mitigates semantic ambiguity at tampered edges, while the multi-view feature extraction encoder comprehensively captures spatial and noise domain clues. Experimental results across diverse datasets demonstrate significant improvements in detection accuracy and edge precision. Future work will explore extending the inconsistency detection paradigm to incorporate additional modalities such as text, enabling multimodal forgery detection.
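Because clustering at inference yields unlabeled groups, a score such as the "permuted F1" mentioned above has to be insensitive to which cluster is called "tampered". The abstract does not define the metric; the sketch below assumes the common convention of taking the better F1 over the two possible labelings. The function names and the toy masks are hypothetical.

```python
def f1_score(pred, truth):
    """Pixel-level F1 for flat binary masks (1 = tampered)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def permuted_f1(cluster_ids, truth):
    """Best F1 over the two ways of naming one cluster 'tampered' (assumed definition)."""
    flipped = [1 - c for c in cluster_ids]
    return max(f1_score(cluster_ids, truth), f1_score(flipped, truth))


truth = [0, 0, 0, 1, 1, 0, 0, 0]
clusters = [1, 1, 1, 0, 0, 1, 1, 0]  # cluster 0 happens to cover the tampered pixels
score = permuted_f1(clusters, truth)
```

Here the raw labeling scores 0, while the flipped labeling has two true positives and one false positive, so the permuted score is 0.8.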
Remote Sensing Land-cover Classification Combining Multi-modal and Multi-scale Fusion with Mamba
XIE Wen, ZHU Chaotao, WANG Jin, MA Xiaomeng
Available online  , doi: 10.11999/JEIT251303
Abstract:
  Objective   The rapid development of remote sensing imaging has generated large-scale and diverse data for remote sensing land-cover classification. In recent years, Mamba-based models have been successfully applied to image processing because of their distinctive architectures and strong global modeling capability. Among them, multi-scale vision Mamba models are well suited to complex spatial distributions. This property matches remote sensing scenes, in which ground objects often have large scale variations and complex orientations. To fully use the advantages of Mamba in feature extraction and fusion for remote sensing data, a Mamba-based Multi-modal and Multi-scale fusion model for Remote Sensing land-cover classification (M3RS) is proposed.  Methods   M3RS mainly contains three stages for feature extraction and fusion. First, a Multi-Scale Spatial Encoder based on Spatial Mamba is used to extract features from Light Detection And Ranging (LiDAR) images and Synthetic Aperture Radar (SAR) images. Considering the unique data structure of HyperSpectral Image (HSI), a Multi-Scale Spatio-Spectral Encoder is proposed to extract complex spatio-spectral features by using Spatial Mamba and Spectral Mamba. Next, a Multi-Modal Feature Fusion Module, consisting of the proposed Cross-Mamba and Channel-Concatenated Mamba, is designed to fuse multi-modal features. Cross-Mamba efficiently fuses multi-modal spatial features through the interaction of State Space Model (SSM) parameters from different modalities. Channel-Concatenated Mamba further fuses multi-modal features by constructing four channel scanning strategies. Finally, an improved Multi-Scale Feature Fusion Module is adopted to fuse multi-scale features layer by layer. This design provides highly discriminative features for classification and improves the accuracy of remote sensing land-cover classification.  
Results and Discussions   Comparative experiments are conducted on three publicly available multi-modal remote sensing land-cover classification datasets. The proposed model is compared with seven mainstream models. The results show that M3RS achieves the best Overall Accuracy (OA), Average Accuracy (AA), and Kappa coefficient among all compared methods. On the Muufl dataset, the OA of M3RS is 3.49%, 3.80%, and 4.02% higher than those of representative Convolutional Neural Network (CNN)-, Transformer-, and Mamba-based models, respectively (Table 1, Fig. 8). On the Houston2013 and Augsburg datasets, the OA of M3RS exceeds those of all compared algorithms by an average of 3.37% and 3.11%, respectively (Tables 2 and 3). These results indicate that integrating a multi-modal and multi-scale architecture with Mamba improves the accuracy of remote sensing land-cover classification. In addition, the ablation experiment verifies the contribution of each proposed module to classification performance (Table 4). Spectral Mamba provides a clear accuracy gain, and the fusion modules further improve the overall performance to different degrees. The hyperparameter experiment also provides a useful configuration for multi-scale remote sensing image fusion (Table 5). Compared with a Transformer model using the same multi-scale architecture, M3RS achieves higher classification accuracy, reduces the parameter count by 37.4%, and shortens the training time by 10.7%. These results show that Mamba improves both accuracy and efficiency in this framework (Fig. 9).  Conclusions   M3RS uses Mamba to fuse multi-modal and multi-scale features, thereby improving remote sensing land-cover classification. The heterogeneous encoders in M3RS address differences among multi-modal data and provide richer complementary information for fusion and classification. Cross-Mamba and Channel-Concatenated Mamba account for both the similarities and differences between Mamba and Transformer. 
They achieve efficient multi-modal spatial feature interaction and comprehensive multi-modal feature fusion, respectively, forming a hierarchical fusion strategy. The multi-scale architecture also alleviates the difficulty caused by complex spatial distributions of remote sensing land covers. The proposed Multi-Scale Feature Fusion Module, composed of Spatial Mamba and channel attention, integrates multi-scale features and provides a reliable basis for subsequent classification. Future work will further optimize the model by exploring the principles of Mamba and refining feature alignment in cross-attention-based multi-modal interaction, thereby improving the reliability of feature fusion.
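The three reported metrics (OA, AA, and the Kappa coefficient) can all be derived from a confusion matrix. The sketch below uses the standard definitions; the toy matrix is made up and does not reproduce any result from the paper.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = correct / total  # overall accuracy
    per_class = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
    aa = sum(per_class) / n_classes  # mean of per-class recalls
    # Chance agreement from the row (true) and column (predicted) marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Imbalanced toy example: 90 samples of class 0, 10 samples of class 1.
toy = [
    [90, 0],
    [5, 5],
]
oa, aa, kappa = classification_metrics(toy)
```

On this imbalanced example OA is 0.95 while AA is only 0.75, which is why land-cover studies usually report both alongside Kappa.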
Optimal Weighted Subspace Fitting-based Direct Position Determination with HF/VHF Collaboration
YANG Gao-yuan, YIN Jie-xin, WANG Ding, YANG Bin
Available online  , doi: 10.11999/JEIT260001
Abstract:
  Objective   Passive localization is essential for target detection, navigation, and trajectory tracking, particularly in military applications involving maritime and aerial targets. These targets often transmit across multiple frequency bands, including shortwave High Frequency (HF) and Very High Frequency (VHF). Existing localization methods largely rely on single-band approaches or two-step positioning techniques. Single-band methods underutilize the positional information available across different bands, while two-step methods lose information during intermediate parameter estimation (e.g., Direction-Of-Arrival (DOA) and Time-Difference-Of-Arrival (TDOA)), reducing localization accuracy. Collaborative fusion of HF signals (via ionospheric reflection) and VHF signals (via Doppler effects from moving arrays) has been rarely addressed. To overcome low positioning accuracy and limited spatial resolution in over-the-horizon multi-target scenarios, this study proposes a novel collaborative Direct Position Determination (DPD) method designed to integrate the complementary strengths of HF and VHF signals, enhancing localization precision and robustness in complex electromagnetic environments.  Methods  An Optimal Weighted Subspace Fitting (OWSF) DPD algorithm is proposed. Comprehensive signal propagation models are established for heterogeneous observation platforms (Fig. 1). HF signal propagation is modeled using a two-dimensional DOA framework based on ionospheric reflection, incorporating azimuth and elevation angles to handle nonlinear over-the-horizon propagation. VHF signals are modeled using a space-time extended signal framework for a moving Unmanned Aerial Vehicle (UAV), exploiting Doppler effects to create a virtual large-aperture array that captures both one-dimensional angle and Frequency-Of-Arrival (FOA) information. 
Unlike traditional methods that process each band separately, the OWSF algorithm constructs a unified cost function that fuses the signal and noise subspaces of both HF and VHF data using optimal weighting matrices, balancing the contributions of different signal qualities. Target positions are then estimated by minimizing this cost function via grid search or Newton iteration. The Cramér-Rao Bound (CRB) under Earth-ellipsoid constraints is derived to provide the theoretical performance limit.  Results and Discussions   Simulations are conducted in a centralized processing scenario, where HF stations and UAV VHF signals are transmitted to a central station for joint processing (Fig. 2). The simulation involves three stationary targets and a collaborative system comprising HF stations and a UAV (Fig. 3, Table 2, Table 3). Performance comparisons demonstrate that the OWSF method consistently outperforms traditional two-step positioning methods and single-system DPD methods (DOA-only or FOA-only) in Root Mean Square Error (RMSE) (Fig. 4). When HF SNR is 5 dB lower than VHF SNR, OWSF exhibits superior robustness compared to Subspace Data Fusion (SDF) and Minimum Variance Distortionless Response (MVDR) methods, approaching the CRB at high SNR (Fig. 5). The impact of system parameters is further analyzed, showing that increasing the number of sampling points (Fig. 6) and array elements (Fig. 7) improves accuracy, particularly in low SNR regimes. Regarding spatial resolution, the OWSF algorithm generates sharper spectral peaks for distant targets and successfully resolves closely spaced targets that the SDF-DPD algorithm fails to distinguish (Fig. 8, Fig. 9).  Conclusions   The HF/VHF collaborative DPD method effectively integrates multidimensional observational information from ionospheric reflection and Doppler-based propagation. 
Simulation results demonstrate substantial improvements in localization accuracy, spatial resolution, and robustness, especially under low-SNR conditions or heterogeneous signal quality between bands. The derived CRB provides a solid theoretical benchmark, confirming that the method overcomes the limitations of single-band and two-step approaches. This approach offers a highly effective solution for over-the-horizon passive localization of multiple stationary targets.
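The idea of minimizing one fused cost directly over candidate positions can be illustrated with a heavily simplified toy. It is not the OWSF algorithm: the sketch assumes two planar, noise-free half-wavelength linear arrays, a single target, and hand-picked scalar weights in place of the derived optimal weighting matrices, with no ionospheric or Doppler modeling. All names and numbers are hypothetical.

```python
import cmath
import math


def steering(array_x, target, n_elements=8):
    """Steering vector of a half-wavelength ULA on the x-axis at (array_x, 0)."""
    dx, dy = target[0] - array_x, target[1]
    sin_theta = dx / math.hypot(dx, dy)  # angle measured from broadside (the y-axis)
    return [cmath.exp(1j * math.pi * m * sin_theta) for m in range(n_elements)]


def subspace_misfit(candidate_vec, signal_vec):
    """1 - |a^H u|^2 / (|a|^2 |u|^2): zero when a lies in the span of u."""
    inner = sum(a.conjugate() * u for a, u in zip(candidate_vec, signal_vec))
    norm_a = sum(abs(a) ** 2 for a in candidate_vec)
    norm_u = sum(abs(u) ** 2 for u in signal_vec)
    return 1.0 - abs(inner) ** 2 / (norm_a * norm_u)


def fused_cost(candidate, observations):
    """Weighted sum of per-array misfits; the weights are illustrative, not optimal."""
    return sum(
        weight * subspace_misfit(steering(array_x, candidate), signal_vec)
        for array_x, signal_vec, weight in observations
    )


true_position = (30.0, 40.0)
# Noise-free "signal subspaces": each array observes exactly its own steering vector.
observations = [
    (0.0, steering(0.0, true_position), 1.0),
    (100.0, steering(100.0, true_position), 0.5),
]
# Grid restricted to y > 0 to avoid the mirror ambiguity of arrays lying on the x-axis.
grid = [(float(x), float(y)) for x in range(0, 101, 5) for y in range(5, 101, 5)]
estimate = min(grid, key=lambda p: fused_cost(p, observations))
```

Each array alone constrains only a bearing, so its misfit is near zero along a whole ray; the weighted sum is minimized only where both rays cross, which is the point of fusing the platforms in a single cost rather than estimating intermediate parameters first.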
Household Appliance Plastics Identification by Fusing Multi-Level Feature Enhancement and Hierarchical Classification
CHONG Penghao, ZHENG Yunlong, YANG Aosong, GUO Mengci, LI Shifeng
Available online  , doi: 10.11999/JEIT260084
Abstract:
  Objective  Accurate plastic identification remains challenging in waste household appliance recycling under low-resolution spectral conditions. In practical recycling environments, plastics often have complex compositions, surface contamination, and aging effects, which increase classification difficulty. Black plastics are especially difficult to identify because their strong light absorption and spectral overlap in the Visible-Near Infrared (Vis-NIR) range reduce feature separability and degrade classification performance. Under these conditions, conventional single-stage classification models often fail to maintain stable accuracy. To address this problem, an automated identification method is proposed for low-dimensional multispectral feature spaces. The method aims to improve the discriminative capability of limited spectral information and enhance classification accuracy for complex plastic categories.  Methods  A compact Vis-NIR multispectral acquisition system based on the AS7265x sensor is used to collect 18-channel reflectance data in the 410~940 nm range. A handheld acquisition device with a controlled optical structure is designed to reduce environmental interference and ensure measurement consistency (Fig. 3). A total of 576 samples are collected from five typical household appliance plastics, including Acrylonitrile Butadiene Styrene (ABS), High-Impact PolyStyrene (HIPS), PolyPropylene (PP), Acrylonitrile Styrene copolymer (AS), and Polycarbonate/Acrylonitrile Butadiene Styrene (PC+ABS) blends. These samples are obtained from waste household appliances and are subjected to preliminary surface cleaning before spectral acquisition. To improve feature representation, a multi-level feature engineering strategy is adopted. This strategy integrates original spectral intensity features, nonlinear polynomial expansion features, and adjacent-channel ratio features to characterize both global and local spectral information. 
The nonlinear expansion enhances the representation of reflectance variations, whereas the ratio features capture local spectral-shape changes and reduce external disturbances. These features are combined into a 53-dimensional feature vector. Linear Discriminant Analysis (LDA) is then applied to enhance interclass separability. To address spectral overlap and class imbalance, a Hierarchical Joint Classifier (HJC) is constructed. HJC uses a two-stage classification framework. In the first stage, an XGBoost-based primary classifier performs coarse classification to separate easily distinguishable samples and group spectrally similar black plastics. In the second stage, a TabTransformer-based secondary classifier performs fine-grained classification of difficult samples (Fig. 6). This hierarchical design reduces classification complexity and improves discrimination for challenging categories. Model performance is evaluated using five-fold cross-validation and an independent test set. Accuracy, precision, recall, and F1-score are calculated from confusion matrices (Fig. 7). Comparative experiments are conducted with traditional machine learning methods, ensemble learning models, and deep learning approaches under different feature-processing strategies (Fig. 8, Fig. 9).  Results and Discussions  The proposed HJC achieves a classification accuracy of 97.4% in five-fold cross-validation and 93.1% on the independent test set (Table 4). Compared with single-stage classifiers and methods without feature enhancement, the proposed method provides higher performance and greater stability under low-resolution spectral conditions. Comparative results show that the proposed method outperforms baseline approaches, such as PCA combined with CNN, which achieves an accuracy of approximately 71.3% on the same dataset (Fig. 8). 
This improvement indicates that the proposed feature engineering strategy effectively strengthens the discriminative capability of low-dimensional spectral data. Combining LDA with feature engineering further improves class separability compared with conventional PCA-based methods. Confusion matrix analysis shows that misclassifications mainly occur between spectrally similar black ABS and black HIPS samples, whereas most other categories achieve high classification accuracy (Fig. 9). These results indicate that spectral overlap remains the main challenge under low-resolution conditions. The hierarchical classification strategy reduces this problem by focusing classification resources on difficult samples, thereby improving the overall generalization ability of the model. Overall, the proposed method shows robustness under practical conditions, including spectral noise, limited channel resolution, and material heterogeneity. These results indicate its suitability for real-world recycling applications.  Conclusions  A hierarchical classification method with multi-level spectral feature engineering is developed for plastic identification under low-resolution Vis-NIR conditions. Nonlinear and spectral-shape features are incorporated into a two-stage framework to improve the identification of spectrally similar materials. The results show stable accuracy across different plastic types. The method is suitable for automated sorting in waste household appliance recycling and can be extended to other material identification tasks with limited spectral information.
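The multi-level feature construction can be sketched as follows. The abstract does not specify which polynomial terms are used; one combination that is consistent with the stated 53 dimensions is 18 raw intensities, 18 squared terms, and 17 adjacent-channel ratios, and that is the assumption made here. The function name, the eps guard, and the sample values are hypothetical.

```python
def build_features(reflectance, eps=1e-9):
    """Expand one 18-channel reading into raw, squared, and adjacent-ratio features."""
    if len(reflectance) != 18:
        raise ValueError("expected 18 channel readings")
    raw = list(reflectance)
    squared = [r * r for r in raw]  # assumed form of the polynomial expansion
    # Ratio of each channel to its lower-wavelength neighbor (17 values).
    ratios = [raw[i + 1] / (raw[i] + eps) for i in range(17)]
    return raw + squared + ratios


sample = [0.10 + 0.01 * k for k in range(18)]  # made-up reflectance values
features = build_features(sample)

# Doubling the illumination scales raw and squared terms but leaves ratios nearly unchanged.
brighter = build_features([2.0 * s for s in sample])
```

The ratio block is what makes the representation less sensitive to overall brightness changes, which matches the stated role of the ratio features in reducing external disturbances.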
Spatial Information-guided Diffusion for Domain Adaptation Semantic Segmentation of Remote Sensing Images
LIANG Yan, LI Jun-Fan, SHAO Kai, HU Lin
Available online  , doi: 10.11999/JEIT260031
Abstract:
  Objective  Domain Adaptation Semantic Segmentation (DASS) is critical for remote sensing applications, including land-cover mapping, urban planning, and environmental monitoring. However, deep learning models often show severe performance degradation under domain shifts caused by imaging variation, geographic differences, and label-semantic heterogeneity. Conventional feature-alignment and generative adversarial network-based methods often fail to preserve semantic consistency. They are also sensitive to noisy supervision, especially when cross-domain gaps are large. This work aims to construct a robust DASS framework for semantically consistent image translation and reliable knowledge transfer.  Methods  A two-stage framework, termed Co-training Spatial-Guided DASS (CoSG-DASS), is proposed by integrating image translation and co-training. In the image-translation stage, a spatial information-guided latent diffusion model enhanced by ControlNet is designed. Semantic pseudo-labels and depth estimates are used as horizontal semantic and vertical spatial conditions to guide target-style image generation. To reduce the effect of noisy pseudo-labels, an Entropy-based Adaptive Guidance Intensity Module (EAGIM) is introduced. EAGIM estimates pixel-level confidence using information entropy and suppresses unreliable features. In the co-training stage, translated target-style images and unlabeled real target-domain images are used to train a segmentation model with a depth-guided segmentation head. Cross-entropy loss and adversarial loss are jointly used for optimization.  Results and Discussions  Extensive experiments are conducted on three cross-domain tasks. CoSG-DASS generates images that better match target-domain distributions. Quantitative results based on Fréchet Inception Distance (FID) show that the proposed method outperforms CycleGAN, UNI-Diff, and CRS-Diff in most settings (Table 1). Visual comparisons (Fig. 
6) show that the method reduces edge blurring and category confusion. It also improves the separation of roads and vegetation and preserves small objects, such as vehicles. In the semantic segmentation stage, CoSG-DASS outperforms state-of-the-art domain adaptation methods. It improves mean Intersection over Union (mIoU) by 1.14%, 3.78%, and 2.49% on the cross-geographic task (Vaihingen IRRG→Potsdam IRRG), cross-imaging-mode task (Vaihingen IRRG→Potsdam RGB), and bidirectional label-semantic-heterogeneity tasks between DFC25 and LoveDA, respectively (Tables 2–4). Visual segmentation results (Fig. 7) confirm its strong boundary preservation and high accuracy in complex scenes. Ablation studies (Table 5) verify the contribution of the core components, including depth control, pseudo-label guidance, EAGIM, and the co-training strategy. Feature-distribution visualization based on Uniform Manifold Approximation and Projection (UMAP) further shows that CoSG-DASS reduces intra-class variation and increases inter-class separation after adaptation (Fig. 8).  Conclusions  CoSG-DASS alleviates domain shifts in remote sensing images through semantic-preserving diffusion-based translation and depth-guided co-training. It improves both image-translation quality and segmentation accuracy over existing methods. The proposed framework provides an effective solution for multi-source remote sensing interpretation. Future work will focus on extreme label-semantic heterogeneity and lightweight diffusion architectures.
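The entropy-based confidence idea behind EAGIM can be sketched at the pixel level. The abstract states only that confidence is estimated from information entropy; the normalization below (one minus entropy divided by its maximum) is an assumption, as are the function names and the toy probability map.

```python
import math


def entropy_confidence(probabilities):
    """Map one pixel's class probabilities to a confidence in [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0.0)
    max_entropy = math.log(len(probabilities))
    return 1.0 - entropy / max_entropy


def guidance_weights(prob_map):
    """Per-pixel guidance strength for a 2-D grid of class-probability lists."""
    return [[entropy_confidence(pixel) for pixel in row] for row in prob_map]


# Toy 2x2 pseudo-label probability map with three classes per pixel.
prob_map = [
    [[0.98, 0.01, 0.01], [1 / 3, 1 / 3, 1 / 3]],
    [[0.70, 0.20, 0.10], [1.0, 0.0, 0.0]],
]
weights = guidance_weights(prob_map)
```

A uniform prediction receives a weight near 0 and a one-hot prediction receives 1, so multiplying conditioning features by these weights would damp the influence of unreliable pseudo-labels.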
A Multi-layer Resilient Control Framework for Networked Microgrids against False Data Injection Attacks
HUANG Yu, CAO Zhengyang, HU Songlin, YUE Dong, CHEN Yonghua, YAN Yunsong
Available online  , doi: 10.11999/JEIT250850
Abstract:
  Objective  With the increasing penetration of distributed renewable energy and the growing dependence on cyber-physical infrastructure, Networked MicroGrids (NMGs) are increasingly vulnerable to False Data Injection Attacks (FDIAs). These attacks threaten frequency stability and system security. Traditional secondary control methods are limited by constrained communication resources and fixed sampling mechanisms. They often fail to maintain resilient operation under stealthy FDIAs and dynamic disturbances. To address these challenges, this study develops a multi-layer resilient control strategy that integrates event-triggered communication/control, data-driven attack observation, and double-replay Q-learning. The objective is to improve communication efficiency, attack detection, and stability recovery in NMGs under complex cyber threats.  Methods  The proposed Event-Triggered Control–Radial Basis Function–Double-Replay Q-Learning (ETC–RBF–DRQL) framework integrates an Event-Triggered Control (ETC) mechanism, a Radial Basis Function Unknown Input Observer (RBF-UIO), and a Double-Replay Q-Learning (DRQL) compensator to achieve resilient frequency control in NMGs under FDIAs. The ETC mechanism reduces redundant data transmission while maintaining system stability. The RBF-UIO estimates system states and detects anomalous deviations. After an attack is detected, the DRQL module adaptively generates compensation signals to suppress the attack effect and restore system stability. The framework is formulated using a modular dynamic model of NMGs, which supports stability analysis under communication and attack constraints. Simulation experiments are conducted on a 4-node distributed microgrid testbed in MATLAB/Simulink. The testbed includes different renewable energy sources and realistic communication links to verify the effectiveness and scalability of the proposed approach.  
Results and Discussions  The proposed ETC–RBF–DRQL framework is validated on a 4-node NMG under FDIA scenarios. Simulation results show that the method achieves better overall performance in frequency regulation, communication efficiency, and attack resilience. Specifically, the frequency deviation peak is reduced from 0.021 8 Hz under periodic Proportional-Integral (PI) control to 0.012 1 Hz. The steady-state average deviation and fluctuation standard deviation are reduced to 0.009 7 Hz and 0.007 4 Hz, respectively (Fig. 4, Table 2). The average communication event rate decreases to 11.9 pkt·s^-1, corresponding to a 76.2% reduction compared with periodic sampling (Table 2). The proposed framework also maintains reliable attack detection performance, with a detection rate of 91.5%, a false alarm rate of 4.8%, and an area under the curve (AUC) of 0.968 (Table 2). These results indicate that the proposed method can coordinate frequency recovery, communication overhead reduction, and FDIA mitigation in NMGs.  Conclusions  This paper investigates a multi-layer resilient control framework for NMGs under FDIAs and communication constraints. The proposed ETC–RBF–DRQL method integrates event-triggered communication/control, RBF-UIO-based attack detection, and DRQL-based adaptive compensation. It therefore enables closed-loop coordination among anomaly detection, attack suppression, and frequency stability recovery. Simulation results on a 4-node NMG show that, compared with conventional PI-based schemes, the proposed approach reduces frequency deviation peaks and shortens recovery time while lowering communication overhead. Theoretical analysis further confirms its feasibility and stability under bounded estimation errors. This study focuses on sensor-side FDIAs and simplified communication conditions. Future work will consider more complex multi-type attacks and hardware-in-the-loop validation to support engineering applications.
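The communication saving of an event-triggered scheme can be illustrated with a minimal sketch. The triggering rule below (relative threshold plus a small absolute floor) is a generic textbook form, not the condition derived in the paper, and the signal, sampling rate, and parameter values are hypothetical. As a side note, the reported 11.9 pkt·s^-1 and 76.2% reduction are consistent with a periodic baseline near 50 pkt·s^-1, since 11.9 / (1 − 0.762) ≈ 50, although the abstract does not state that figure.

```python
import math


def event_triggered_send(samples, sigma=0.1, floor=1e-3):
    """Return the indices at which a sample is transmitted.

    A sample is sent when it differs from the last transmitted value by more
    than sigma * |sample| + floor.
    """
    sent = [0]  # always transmit the first sample
    last = samples[0]
    for k in range(1, len(samples)):
        if abs(samples[k] - last) > sigma * abs(samples[k]) + floor:
            sent.append(k)
            last = samples[k]
    return sent


# Hypothetical frequency deviation in Hz: a decaying oscillation, 200 periodic samples.
samples = [
    0.02 * math.exp(-k / 50) * math.cos(2 * math.pi * k / 50) for k in range(200)
]
sent = event_triggered_send(samples)
reduction = 1.0 - len(sent) / len(samples)
```

As the oscillation decays, successive samples stay within the threshold and transmissions stop, which is the mechanism by which event triggering lowers the average packet rate relative to periodic sampling.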
Graph Representation Learning Driven Adaptive Streaming for Point Cloud Video
LIU Wei, CHEN Ruiyang, WANG Xi, ZHANG Jiawei, XU Jing
Available online  , doi: 10.11999/JEIT251084
Abstract:
  Objective   The increasing demand for immersive media propels point cloud video into the spotlight for applications such as virtual and augmented reality. However, the massive data volume of point cloud streams poses a significant challenge to current network infrastructures, jeopardizing the user's Quality of Experience (QoE) under limited bandwidth. Existing Adaptive Bitrate (ABR) streaming solutions are hindered by two primary limitations. Viewport prediction models often focus solely on temporal features, leading to insufficient accuracy for long-term predictions in complex Six-Degrees-of-Freedom (6DoF) movement. Concurrently, dynamic quality allocation strategies struggle to make optimal online decisions under the uncertainties of prediction errors and network fluctuations, failing to effectively balance conflicting QoE metrics. This research addresses these challenges by proposing an integrated framework that combines high-precision viewport prediction with intelligent, context-aware quality allocation to enhance QoE for point cloud video streaming.  Methods   The proposed method integrates a graph-based viewport prediction scheme with a context-aware quality allocation mechanism. For viewport prediction, an “anchor point graph” is constructed to explicitly model the user's spatial movement patterns. This graph is processed using representation learning to generate low-dimensional embeddings for each anchor point, which encapsulate rich spatial context. These learned spatial features are concatenated with real-time 6DoF viewport data to form a fused feature sequence. A stacked Long Short-Term Memory (LSTM) network processes this sequence to accurately predict the user's future viewport trajectory. For quality allocation, the sequential decision-making process is modeled as a contextual bandit problem, adopting the LinUCB algorithm as the decision engine. 
At each decision epoch, a context vector is constructed for each spatial tile, incorporating critical information such as its predicted utility, historical quality level, and location relative to the predicted viewport. The LinUCB algorithm utilizes this context to select an optimal action for each tile, thereby maximizing cumulative QoE under the bandwidth budget, as detailed in Algorithm 1.  Results and Discussions   Extensive simulations validate the framework's performance using the public 8i Voxelized Full Bodies dataset, real-world user viewport traces, and 5G network bandwidth profiles. In the viewport prediction task, the proposed model significantly outperforms baselines, achieving a stable average F1-score of 0.9838 (Fig. 4) and maintaining a consistently low root-mean-square error (RMSE) over long prediction horizons (Fig. 3). In the end-to-end streaming evaluation, the integrated framework demonstrates remarkable improvements in overall QoE. Cumulative distribution function (CDF) plots reveal that the proposed scheme consistently delivers higher QoE, user-perceived utility, and video quality, while incurring the lowest quality fluctuation (Fig. 5). Notably, under fluctuating network conditions, the solution improves the mean QoE by 54.82% compared to the next-best baseline at an average bandwidth of 100 Mbps (Fig. 6), highlighting its efficiency in resource-constrained environments.  Conclusions   This paper presents a complete adaptive streaming framework to address the QoE optimization challenge for point cloud video. By developing a novel 6DoF viewport prediction model that leverages graph representation learning, long-term prediction accuracy is significantly enhanced. Furthermore, by framing dynamic quality allocation as a contextual bandit problem, the system makes intelligent, online decisions that adapt to both prediction outcomes and dynamic network conditions. 
Comprehensive experimental results validate the effectiveness of this integrated approach, which consistently outperforms existing solutions in both prediction accuracy and overall user QoE.
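The quality-allocation step described above relies on LinUCB. The following is a minimal sketch of disjoint LinUCB in plain Python, in which each quality level is treated as one arm and the inverse design matrix is maintained with the Sherman-Morrison formula. The two-dimensional context, the reward value, and the names `LinUCBArm` and `select_arm` are assumptions for illustration; the bandwidth budget and the paper's actual context features are not modeled here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class LinUCBArm:
    """One arm of disjoint LinUCB with ridge-regression statistics."""

    def __init__(self, dim: int, alpha: float) -> None:
        self.alpha = alpha
        # A starts as the identity, so its inverse is also the identity
        self.a_inv: Matrix = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b: Vector = [0.0] * dim

    def ucb(self, x: Vector) -> float:
        theta = mat_vec(self.a_inv, self.b)                # ridge estimate of the arm weights
        bonus = math.sqrt(dot(x, mat_vec(self.a_inv, x)))  # exploration width
        return dot(theta, x) + self.alpha * bonus

    def update(self, x: Vector, reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        ax = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms: List[LinUCBArm], x: Vector) -> int:
    """Pick the arm with the largest upper confidence bound (first arm wins ties)."""
    scores = [arm.ucb(x) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


# Hypothetical tile context: [bias term, predicted utility]
context = [1.0, 0.5]
arms = [LinUCBArm(dim=2, alpha=0.1) for _ in range(2)]  # two quality levels
arms[1].update(context, reward=1.0)                     # level 1 was rewarded once
choice = select_arm(arms, context)
```

After a single rewarded observation, the second arm's estimated payoff exceeds the pure exploration bonus of the untried arm, so it is selected for the same context.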
From Touch to Semantics: A Cross-Modal Framework for Zero-Shot Spiking Tactile Object Recognition
CHI Wei, XU Jin
Available online  , doi: 10.11999/JEIT260158
Abstract:
  Objective  Tactile perception enables robots to understand object properties and perform dexterous interactions. However, tactile data are costly to collect and difficult to scale, which limits conventional supervised learning in open-world scenarios. Zero-Shot Learning (ZSL) provides a promising solution by transferring knowledge from seen to unseen categories through semantic representations. Existing tactile ZSL methods either rely on auxiliary visual information or use manually designed attributes, which are often subjective and limited in generalization. Event-based spiking tactile signals are sparse and asynchronous, with rich spatiotemporal dynamics. These properties make semantic modeling more challenging. Systematic studies on zero-shot recognition for such data remain limited. To address these issues, this paper proposes a zero-shot object recognition framework for spiking tactile perception. The framework aims to bridge low-level tactile dynamics and high-level semantics in a scalable manner.  Methods  The proposed framework consists of three components (Fig. 1): spiking tactile feature extraction, semantic prototype construction, and cross-modal tactile–semantic alignment. First, a biomimetic Spiking Graph Neural Network (SGNN) is used to model raw event-based spiking tactile signals. By integrating Leaky Integrate-and-Fire (LIF) neurons with graph-based message passing, the SGNN captures temporal firing dynamics and spatial relationships among tactile sensing units. It then generates discriminative and biologically interpretable high-level tactile embeddings. Second, instead of using manually annotated attributes, a Large Language Model (LLM) is used to generate structured, fine-grained, and extensible tactile attribute descriptions for each object category. These textual descriptions are encoded as continuous semantic vectors to form class-level semantic prototypes with consistent dimensionality across categories. 
This strategy supports flexible semantic expansion and avoids labor-intensive attribute engineering. Third, a bidirectional tactile–semantic alignment mechanism is designed to improve generalization to unseen categories. A forward mapping projects tactile embeddings into the semantic space for classification, whereas a reverse mapping reconstructs tactile features from semantic representations. A cycle-consistency constraint is imposed between the two mappings to preserve structural coherence and semantic stability across modalities. The overall framework is trained only on seen categories. During zero-shot inference, tactile embeddings of unseen samples are matched with their corresponding semantic prototypes in the shared embedding space.  Results and Discussions  The proposed method is evaluated on the Ev-Object event-based tactile dataset under a strict zero-shot setting, with disjoint seen and unseen category sets. Performance is assessed using Mean Class Accuracy (MCA), Top-k accuracy, and the Semantic Alignment Score (SAS). The proposed framework consistently outperforms representative tactile ZSL baselines across all metrics. It achieves an MCA of 73.48%, a Top-1 accuracy of 62.68%, and a Top-2 accuracy of 88.75%. Ablation studies show that removing the LLM semantic module, bidirectional mapping, or cycle-consistency constraint reduces recognition performance and semantic alignment quality. Removing the LLM semantic module causes a substantial decrease in MCA, which confirms the role of structured LLM-generated tactile semantics in knowledge transfer. Removing the bidirectional mapping or the cycle-consistency constraint also reduces performance, indicating that both components help maintain stable cross-modal alignment. The t-SNE visualization further shows that cycle-consistent alignment yields more compact intra-class clusters and clearer inter-class separation for unseen categories. 
Semantic prototypes are also better located near the centers of tactile feature clusters. These results indicate that combining biologically inspired spiking models with LLM-generated tactile semantics provides an effective solution for open-world tactile perception.  Conclusions  This paper presents a zero-shot object recognition framework for spiking tactile perception by integrating SGNN-based tactile representation with semantic prototypes. The proposed method addresses key limitations of existing tactile ZSL approaches by avoiding visual data and manual attribute design while effectively modeling the spatiotemporal dynamics of event-based spiking tactile signals. Experimental results under strict zero-shot settings confirm the effectiveness and robustness of the proposed framework. This work provides a strong baseline for zero-shot spiking tactile recognition and offers a principled path toward open-world tactile cognition in robotic systems. Future work will explore generalized zero-shot tactile perception, multimodal extensions, and real-world robotic deployment under noisy and dynamic sensing conditions.
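For readers unfamiliar with the Leaky Integrate-and-Fire (LIF) neurons mentioned in the Methods, the following is a minimal discrete-time sketch in Python. It assumes a multiplicative leak factor `beta`, a fixed firing threshold, and a hard reset to zero after each spike; these are common textbook choices and are not taken from the paper, whose neuron parameters are not given in the abstract.

```python
from typing import List


def lif_neuron(inputs: List[float], beta: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Discrete-time LIF neuron: leak, integrate, fire, then hard reset."""
    v = 0.0  # membrane potential
    spikes: List[int] = []
    for current in inputs:
        v = beta * v + current  # leaky integration of the input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0             # hard reset after a spike (assumption)
        else:
            spikes.append(0)
    return spikes


# Made-up input currents for one tactile sensing unit
spikes = lif_neuron([0.5, 0.5, 0.5, 0.0, 0.0, 1.2], beta=0.8, threshold=1.0)
```

The neuron stays silent while the potential builds up over the first two steps, fires on the third, and fires again only when a sufficiently large input arrives.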
Research on Secure and Covert Transmission for UAV-Assisted Visible Light Communication Systems
WU Mengru, LIN Jiale, LU Weidang, LI Bo, GUO Lei
Available online  , doi: 10.11999/JEIT260239
Abstract:
  Objective  Owing to mobility and on-demand coverage capabilities, unmanned aerial vehicles (UAVs) can serve as aerial base stations to enable visible light communication (VLC). However, air-ground communication links are exposed to open environments, making VLC vulnerable to data eavesdropping and malicious detection. To address this issue, this paper proposes a secure and covert transmission strategy for a UAV-assisted VLC system from the perspectives of physical layer security and covert communication. The proposed strategy jointly optimizes the transmit power and hovering altitude of a UAV to maximize the system’s secrecy capacity, subject to the requirements of covert communication and illumination targets, as well as the operational constraints on UAV transmit power and hovering altitude.  Methods  This paper investigates secure and covert communication for UAV-assisted VLC. First, a UAV-assisted VLC system model is proposed, in which a mobile UAV equipped with light-emitting diodes is employed to establish a VLC link with a ground user in the presence of an eavesdropper and a warden. Subsequently, an optimization problem is formulated to maximize the secrecy capacity of the system by jointly optimizing the transmit power and hovering altitude of the UAV. To solve the formulated problem, we propose a two-layer optimization (TLOP) algorithm to decompose the transformed problem into two subproblems, including an inner-layer subproblem for transmit power optimization and an outer-layer subproblem for the design of UAV hovering altitude. On this basis, a closed-form expression for the optimal transmit power is derived for the inner-layer problem, while a particle swarm optimization (PSO) algorithm is developed to solve the outer-layer problem.  Results and Discussions  In the simulations, the proposed optimization scheme is compared with two baselines. First, the convergence of the proposed TLOP algorithm is verified (Fig. 3). 
The results demonstrate that the algorithm achieves rapid convergence within a limited number of iterations. Second, the spatial distribution of the UAV’s optimal hovering altitude over its horizontal coordinates is illustrated (Fig. 4). The results indicate that as the UAV gradually approaches the ground legitimate user, its optimal hovering altitude exhibits a downward trend. Then, the secrecy capacity with respect to the UAV’s horizontal coordinates is presented (Fig. 5). It can be clearly observed that the secrecy capacity shows an upward trend as the UAV approaches the ground legitimate user. This is because as the UAV gets closer to the user, the constraints imposed by secure and covert communication are gradually relaxed. Furthermore, as $\epsilon$ increases, the secrecy capacity of all schemes exhibits an increasing trend (Fig. 6). This is because the relaxation of covertness requirements enables the UAV to flexibly adjust its hovering altitude and transmit power to increase the secrecy capacity of the system. In addition, the secrecy capacity shows a declining trend as the number of symbols increases (Fig. 7). This is because an increase in the number of symbols provides the warden with more signal samples for detection. Finally, the secrecy capacity of all schemes decreases as the uncertainty region radius of malicious users increases (Fig. 8). This trend arises because the increased uncertainty requires the UAV to address potential threats over a wider area, which forces the UAV to adopt a more conservative strategy. In conclusion, simulation results confirm that the proposed scheme can improve the secrecy capacity of the UAV-assisted VLC system.  Conclusions  This paper investigates secure and covert communication in a UAV-assisted VLC system. 
Our goal is to maximize the secrecy capacity of the system by jointly optimizing the transmit power and hovering altitude of a UAV. Because the formulated problem is highly non-convex, a TLOP algorithm based on PSO is designed to solve it. Simulation results demonstrate that the proposed algorithm achieves fast convergence and improves the system’s secrecy performance compared to baselines.
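The outer-layer search over hovering altitude can be pictured with a basic global-best PSO. The sketch below is a generic one-dimensional PSO in Python, not the paper's TLOP algorithm: the objective `toy_objective` is a placeholder with a known maximum at 3.0, the altitude bounds are made up, and the closed-form inner-layer power solution is not reproduced because the abstract does not state it.

```python
import random
from typing import Callable, Tuple


def pso_1d(objective: Callable[[float], float], lo: float, hi: float,
           n_particles: int = 20, n_iters: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Maximise a scalar objective on [lo, hi] with a basic global-best PSO."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (common defaults)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to the altitude bounds
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val


def toy_objective(h: float) -> float:
    """Placeholder for the inner-layer optimum at altitude h (NOT the paper's secrecy capacity)."""
    return -(h - 3.0) ** 2


best_h, best_val = pso_1d(toy_objective, lo=1.0, hi=10.0)
```

In a two-layer scheme of the kind described, each call to the objective would first solve the inner power subproblem for the candidate altitude and then return the resulting secrecy capacity.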
MG-MoE: Routed Multi-Granularity Expert Ensemble
XIAN Fengyu, JIAN Haifang, XIE Zihui, DU Jun, ZHANG Yuanyuan, NING Xin, DONG Miaomiao, WANG Hongchang
Available online  , doi: 10.11999/JEIT260219
Abstract:
  Objective  Fine-grained image recognition (FGIR) aims to distinguish visually similar subcategories that differ only in subtle local patterns while remaining robust to large intra-class variations such as pose changes, occlusions, illumination shifts, and complex backgrounds. In real-world settings, these challenges are further compounded by long-tailed category distributions, where rare or hard classes are prone to overfitting spurious context and suffering unstable decision boundaries. This motivates a conditional computation paradigm in which complementary inductive biases are explicitly separated into specialized expert branches and combined adaptively per sample. The goal of this work is to develop a routed multi-granularity mixture-of-experts framework that improves discriminative performance under controllable inference cost, while enhancing robustness on difficult samples and long-tailed categories through adaptive sparse expert activation.  Methods  We propose MG-MoE (Multi-Granularity Mixture-of-Experts), a routed ensemble architecture composed of a shared backbone, four heterogeneous experts, and a learnable router that predicts expert weights conditioned on the input (Fig. 2). The experts are deliberately instantiated with complementary inductive biases to cover the key factors in FGIR: (1) MPSA emphasizes global structure and contour-level semantics; (2) PMG captures fine local details through multi-granularity part modeling; (3) TransFG focuses on pose- and deformation-aware modeling; and (4) PIM improves robustness under cluttered backgrounds via background suppression mechanisms. To limit interference and reduce unnecessary computation, MG-MoE adopts sparse fusion, where only the Top-K experts (K=2 by default) contribute to the final prediction at inference. To improve routing stability and generalization, we introduce a two-stage optimization strategy. 
The first stage performs dynamic cluster-level training, where a cluster-level soft teacher distribution is constructed from validation-set statistics and imposed through KL-divergence regularization to stabilize routing behavior and promote effective specialization among experts. The second stage performs residual fine-tuning: while keeping the feature-driven routing mechanism unchanged, the classification heads of the Top-2 experts associated with each cluster are selectively unfrozen, and the router and expert heads are jointly optimized with grouped learning rates. This design reduces fusion bias and strengthens discrimination on difficult samples and long-tailed categories.  Results and Discussions  MG-MoE achieves strong performance on standard FGIR benchmarks. On CUB-200-2011, MG-MoE attains 92.89% Top-1 accuracy, exceeding representative expert backbones when used individually, such as MPSA (91.23%), PIM (91.17%), and TransFG (90.49%), and surpassing multi-granularity baselines such as PMG (88.32%) (Table 1). On the larger Bird-1445 dataset, MG-MoE continues to show consistent improvements over strong baselines, indicating that routed multi-expert specialization remains effective under a higher number of categories and stronger long-tail effects (Table 2). The efficiency–accuracy trade-off is summarized in Table 3. MG-MoE (Top-2) reaches the best accuracy (92.89%) with a compute budget of 143.9 GFLOPs. Importantly, MG-MoE avoids dense expert activation at inference by selecting only the Top-2 experts for each sample, yielding a favorable accuracy–efficiency trade-off. Ablations show that increasing K beyond 2 does not yield consistent gains, suggesting that indiscriminate fusion can dilute discriminative evidence. Specifically, Top-2 fusion delivers the best performance, whereas Top-1 is more sensitive to routing errors and larger K can introduce noise and reduce accuracy (Table 4). We further analyze the role of expert diversity and composition. 
Experiments with fewer experts (two- or three-expert variants) generally underperform the full four-expert configuration, indicating that each inductive bias contributes nontrivially to handling different fine-grained difficulty factors. Conversely, simply adding more experts without introducing genuinely new inductive biases yields diminishing or negative returns, consistent with increased routing ambiguity and limited functional diversity (Table 5). These results support the design choice of a compact set of heterogeneous experts combined with sparse routing. To interpret the learned specialization, we visualize category-wise routing statistics. The expert–category heatmap shows that MPSA dominates routing weight across many categories, reflecting the central role of global structure in fine-grained discrimination; meanwhile, PIM and TransFG exhibit noticeable activation increases on specific difficult categories, aligning with their intended functionality for background suppression and pose/deformation modeling (Fig. 3). Finally, t-SNE visualizations illustrate the qualitative effect of expert fusion on class separability: shared backbone features exhibit stronger inter-class entanglement for visually similar subcategories, whereas fused outputs form clearer clusters with improved between-class separation and within-class compactness, consistent with a more reliable decision space shaped by routed expert aggregation (Fig. 4).  Conclusions  This work presents MG-MoE, a multi-granularity routed mixture-of-experts framework for fine-grained recognition. By combining four complementary experts with Top-2 sparse fusion and a two-stage optimization strategy for stable routing and calibrated fusion, MG-MoE improves recognition accuracy on CUB-200-2011 and Bird-1445 while providing interpretable evidence of expert specialization (Table 1, Table 2, Fig. 3, Fig. 4). 
Ablations confirm that controlled Top-2 fusion and heterogeneous expert design are key to the observed gains, while overly dense fusion or homogeneous expert expansion offers limited benefit (Table 4, Table 5).
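The Top-K sparse fusion described in this abstract can be sketched as follows in Python. The sketch assumes that router scores are passed through a softmax, that the K largest weights are renormalised to sum to one, and that expert outputs are class-probability vectors combined by a weighted sum. The abstract does not specify these details, so the function `top_k_fusion` and all numbers below are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_fusion(router_logits: List[float], expert_outputs: List[List[float]],
                 k: int = 2) -> Tuple[List[int], List[float]]:
    """Fuse only the k experts with the largest router weights."""
    weights = softmax(router_logits)
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in top)
    fused = [0.0] * len(expert_outputs[0])
    for i in top:
        w = weights[i] / norm  # renormalise over the selected experts (assumption)
        for c, p in enumerate(expert_outputs[i]):
            fused[c] += w * p
    return top, fused


# Made-up router scores for four experts (order: MPSA, PMG, TransFG, PIM)
router_logits = [2.0, 0.0, 1.0, -1.0]
# Made-up class probabilities from each expert for a 3-class toy problem
expert_outputs = [
    [0.7, 0.2, 0.1],
    [0.3, 0.3, 0.4],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
selected, fused = top_k_fusion(router_logits, expert_outputs, k=2)
predicted_class = fused.index(max(fused))
```

Only the two selected experts need to be evaluated for a given sample, which is where the inference saving over dense fusion comes from.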
An Inverse-Hybrid-Modeling Digital Twin System for Natural Gas Energy Metrology
LIU Bin, ZHONG Lu, FENG Quanyuan, CHEN Yihong
Available online  , doi: 10.11999/JEIT260289
Abstract:
  Objective  Global natural gas consumption continues to rise at an average annual growth rate of 3.2%. A 0.1% reduction in energy measurement error can reduce trade disputes by approximately $750 million per year. Traditional research primarily adopts indirect methods for energy measurement, with the mainstream chromatographic analysis and acoustic velocity correlation methods facing multiple bottlenecks and certain application limitations. Chromatographic analysis exhibits low anti-interference error but suffers from high flow dynamic response delay and insufficient dynamic calibration capability. Additionally, it has poor adaptability to multi-gas source switching, requires manual calibration, and has high usage and maintenance costs. The lack of interoperability standards for energy networks further exacerbates system integration difficulties. Although acoustic velocity correlation demonstrates low-latency flow dynamic response, it has high anti-interference error, which may increase significantly with single-component content changes (e.g., hydrogen content from 5% to 10%) and even fail under complex operating conditions (e.g., multi-gas source mixing, dynamic pressure fluctuations). To address these challenges, new mechanism modeling-oriented methods have emerged, with the two most representative research directions being “mechanism modeling-driven” and “hybrid modeling”. Both approaches integrate multi-source data fusion with virtual-real interaction to establish mechanism models between flow rate, other parameters, and energy, providing a new paradigm for accurate energy measurement, but new challenges have also arisen. The “mechanism modeling-driven” approach is based on static flow modeling using Computational Fluid Dynamics (CFD), but it has low dynamic parameter update efficiency (delay > 30 seconds), struggles to adapt to real-time operating condition changes, and relies on massive labeled data with insufficient interpretability. 
The “hybrid modeling” approach has unresolved issues in multi-module collaborative optimization. Furthermore, a core challenge of existing research is the lack of industrial-grade verification platform support. This study targets the problems of dynamic response delay, parameter identification difficulties, excessive physical simplification, and weak anti-interference capability in traditional natural gas energy measurement methods under complex conditions. Building on the latest research findings of the “mechanism modeling-driven” and “hybrid modeling” approaches, it introduces a variational autoencoder (VAE)-based operating condition feature extraction algorithm and a dynamic Bayesian network parameter calibration mechanism, combined with a variational expectation-maximization (VEM) algorithm for offline calibration. On this basis, a reverse hybrid modeling-driven digital twin system is proposed to address the aforementioned problems in traditional natural gas energy measurement processes.  Methods   This study proposes a natural gas energy measurement digital twin system based on reverse hybrid modeling, which centers on a three-tier architecture of “algorithm-system-scenario”. It integrates calorific value, flow rate, and energy mechanism models with multi-source real-time data streams. A VAE is introduced to achieve unsupervised operating condition feature mining, and a parameter self-correction loop is constructed by combining a dynamic Bayesian network with VEM-based system calibration. Industrial-grade devices such as ultrasonic flowmeters and gas chromatographs are integrated to ensure real-time data transmission and closed-loop control. The system covers core operating conditions such as dynamic pressure fluctuations, hydrogen-containing gas mixtures, and multi-gas source switching, ensuring a high degree of adaptability between the model and practical applications. 
Twenty-five weeks of continuous verification on a full-scale industrial-grade experimental platform show that the system has an operational delay ≤ 3.8 s, data transmission jitter ≤ 0.5 s, average daily energy consumption per device ≤ 1.2 kW·h, mean time between failures (MTBF) ≥ 4 100 h, energy measurement error ≤ 0.15%, calorific value error ≤ 0.12%, and flow rate indication error ≤ 0.2%. Meanwhile, the system meets security requirements through industrial Ethernet encryption and hierarchical access control, providing engineering support for intelligent pipeline network optimization and standardized integration.  Results and Discussions  First, a multi-level hybrid modeling framework is established: modular hybrid modeling is achieved through an algorithm-system-scenario three-tier architecture. Numerical methods combined with data are generally more flexible than purely analytical models and can be used to represent complex multi-physical systems because they employ fewer physical lumped parameterizations; moreover, under mechanical, energy, and hydrodynamic effects, these parameters may change during the energy measurement process. Deep integration of mechanism models and real-time data is realized through a variational autoencoder (VAE) and a dynamic Bayesian network, reducing parameter synchronization delay to 3.8 seconds, which effectively supports fluid-acoustic co-simulation and rapid response for complex working conditions such as hydrogen-containing natural gas. Second, an integrated algorithm for reverse hybrid modeling and system calibration is proposed: by introducing a VAE, a dynamic Bayesian network, and the variational expectation-maximization (VEM) algorithm, a reverse hybrid modeling algorithm is constructed to form a self-supervised and adaptive intelligent system with inner closed-loop operation. 
The VAE encoder compresses high-dimensional operating condition data into low-dimensional feature vectors, enabling unsupervised feature extraction without massive labeled data. Based on the learned internal distribution of the data, it can also automatically generate perturbed data similar to the input data, simulating abnormal operating conditions to verify anti-interference capability. Combined with a dynamic Bayesian network, a continuous iterative cycle of "prior → evidence → posterior" is constructed to realize system self-correction and adaptive response to operating condition changes. The VEM algorithm specifically compensates for systematic errors that are difficult for dynamic Bayesian networks to cover, overcoming the limitations of traditional static models.  Conclusions  This study describes and progressively validates a hybrid digital twin system that combines experimental data-driven approaches with physical models, successfully simulating the physical characteristics of natural gas energy measurement. A full-scale test platform was constructed, and all major parameters of the system were rigorously validated through experimental measurement data and compared with industry benchmark data. Each independent module within the "algorithm-system-scenario" three-tier hybrid modeling architecture (including calorific value measurement, flow calculation, and energy conversion) underwent 25 weeks of continuous experimental validation, confirming a high degree of consistency between model predictions and actual measurements. On the established digital twin experimental platform for natural gas energy measurement, systematic validation was conducted on the three core functions: flow measurement under dynamic conditions, multi-component calorific value determination, and energy accumulation. The results demonstrated that the output of the digital twin model matched the physical device measurement data with over 99.5% accuracy. 
Notably, under complex operating conditions such as pressure pulsations and hydrogen-containing gas mixtures, the system maintained measurement accuracy within 0.5%, significantly outperforming traditional methods and meeting the Class A accuracy requirements for natural gas measurement. By introducing a multi-tier hybrid modeling framework, this study successfully addressed the challenges of parameter identification difficulties and excessive physical simplifications inherent in traditional natural gas energy measurement methods. The integration of Variational Autoencoders (VAEs), dynamic Bayesian networks, and Variational Expectation-Maximization (VEM) algorithms enabled unsupervised feature extraction for complex operating conditions and adaptive model parameter calibration, reducing reliance on prior physical knowledge and massive labeled datasets. Experimental evidence demonstrates that the proposed method maintains high precision and robustness even in complex scenarios, such as pressure pulsations and hydrogen-containing gas mixtures, where traditional models struggle to provide accurate descriptions.
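The "prior → evidence → posterior" cycle mentioned in this abstract can be illustrated with the simplest possible case: a scalar Gaussian belief about one calibration parameter that is refined by noisy measurements. The Python sketch below is a textbook conjugate Gaussian update and is far simpler than a dynamic Bayesian network; the parameter, variances, and measurement values are assumptions chosen for illustration only.

```python
from typing import Tuple


def gaussian_update(prior_mean: float, prior_var: float,
                    measurement: float, meas_var: float) -> Tuple[float, float]:
    """Combine a Gaussian prior with one Gaussian measurement of the same quantity."""
    gain = prior_var / (prior_var + meas_var)  # weight given to the new evidence
    post_mean = prior_mean + gain * (measurement - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Hypothetical calibration offset: start from mean 0.0 with variance 1.0
mean, var = 0.0, 1.0
for z in [0.2, 0.2]:  # made-up measurements with unit noise variance
    # The posterior of one step becomes the prior of the next
    mean, var = gaussian_update(mean, var, z, meas_var=1.0)
```

Each pass shrinks the variance, so later measurements move the estimate less than earlier ones, which is the self-correcting behavior the iterative cycle is meant to provide.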
S4-UNET: A Long-Sequence Modeling Blind Source Separation Method for Single-Channel Co-Frequency Overlapped Communication Signals
GAO Shaoyuan, GUO Wenpu, SHI Hao, PENG Ruiyan
Available online  , doi: 10.11999/JEIT251144
Abstract:
  Objective  Blind source separation of single-channel co-frequency overlapped communication signals remains a formidable challenge in non-cooperative reception scenarios. Conventional multi-channel methods are inapplicable due to antenna limitations, while existing deep learning approaches suffer from inadequate long-sequence modeling capability, prohibitive computational complexity, and unsatisfactory performance when signals exhibit small carrier frequency offsets. These limitations severely hinder the practical deployment of blind separation techniques in dense electromagnetic environments. There is therefore a critical need for an efficient and robust framework that can effectively capture long-range temporal dependencies while maintaining computational tractability.  Methods  The proposed S4-UNET deeply integrates the U-NET encoder-decoder framework with the Structured State Space Sequence model (S4). A Temporal State Enhancement Module (TSEM) is designed as the backbone building block for both the encoder and decoder to extract local time-frequency features through residual learning. To address the long-range dependency modeling problem, the S4 is strategically embedded in the odd-numbered stages of the encoder, leveraging its inherent capacity to capture global temporal correlations with near-linear computational complexity. The S4 transforms sequence modeling into a state-space evolution process and employs Fast Fourier Transform (FFT) for efficient convolution, complemented by skip connections and Gated Linear Units (GLU) to preserve fine-grained local details. Multi-scale feature fusion is achieved through skip connections between corresponding encoder and decoder stages, and signal resolution is progressively restored via interpolation-based upsampling. The model adaptively tokenizes feature maps either temporally or channel-wise depending on the feature scale, ensuring optimal sequence representation.  
Results and Discussions  Experimental evaluations were conducted on extensive simulation datasets covering identical modulation mixtures, different modulation mixtures, and different bandwidth mixtures with micro frequency offsets, as well as on publicly available benchmarks and hardware-collected measured datasets. Quantitative metrics and visualizations (Fig. 3, Fig. 5, Table 5) demonstrate that S4-UNET consistently outperforms representative deep learning baselines such as ConvTasNet and CTDCRN, as well as the classical TDE-ICA algorithm, across various signal lengths and modulation schemes. The model exhibits robust separation fidelity even under randomly distributed frequency offsets and phase mismatches (Table 3), confirming its strong generalization capacity. Ablation studies and sensitivity analyses (Table 6, Table 7, Table 8) reveal that the selective placement of S4 in odd encoder stages, appropriate convolutional stride configurations, and the adoption of GLU activation collectively contribute to an optimal trade-off between separation accuracy and computational efficiency. Importantly, the model maintains competitive inference latency while effectively handling both long and short sequences, underscoring its practical viability.  Conclusions  The proposed S4-UNET successfully addresses the core challenges of single-channel co-frequency blind source separation by synergistically combining multi-scale convolutional feature extraction with efficient state-space long-sequence modeling. It demonstrates superior separation performance, robustness against frequency offsets, and favorable generalization across diverse data domains. While the current work focuses on dual-source mixtures, the modular architecture provides a solid foundation for future extensions toward handling an unknown number of sources through integration with source enumeration and iterative cancellation strategies.
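The statement that S4 turns sequence modeling into a state-space evolution that can also be computed as a convolution can be checked on a scalar toy system. The Python sketch below assumes the discrete recurrence x_k = a·x_{k-1} + b·u_k, y_k = c·x_k with made-up scalar parameters, and evaluates the equivalent convolution by direct summation rather than by FFT. It illustrates the general principle only and is not the S4 parameterization used in the paper.

```python
from typing import List


def ssm_recurrent(u: List[float], a: float, b: float, c: float) -> List[float]:
    """Run the scalar state-space model step by step."""
    x = 0.0
    ys: List[float] = []
    for uk in u:
        x = a * x + b * uk  # state update
        ys.append(c * x)    # readout
    return ys


def ssm_kernel(length: int, a: float, b: float, c: float) -> List[float]:
    """Convolution kernel K_j = c * a^j * b of the same model."""
    return [c * (a ** j) * b for j in range(length)]


def causal_conv(u: List[float], kernel: List[float]) -> List[float]:
    """Direct causal convolution y_k = sum_j K_j * u_{k-j}; S4 uses an FFT for this step."""
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


u = [1.0, 0.0, 0.0, 1.0]  # made-up input sequence
a, b, c = 0.5, 1.0, 2.0   # made-up scalar parameters
y_rec = ssm_recurrent(u, a, b, c)
y_conv = causal_conv(u, ssm_kernel(len(u), a, b, c))
```

Both routes give the same output sequence; the convolutional form is the one that allows the whole sequence to be processed in parallel during training.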
Mamba-YOWO: An Efficient Spatio-Temporal Representation Framework for Action Detection
MA Li, XIN Jiangbo, WANG Lu, DAI Xinguan, SONG Shuang
Available online  , doi: 10.11999/JEIT251124
Abstract:
  Objective  Spatio-temporal action detection aims to localize and recognize action instances in untrimmed videos. This task is essential for applications such as intelligent surveillance and human–computer interaction. Existing methods, particularly those based on 3D convolutional neural networks (3D CNNs) or Transformers, often face difficulty balancing computational cost and the ability to model long-range temporal dependencies. The YOWO series provides efficient detection but relies on 3D convolutions with limited receptive fields. The Mamba architecture, based on a Selective State Space Model (SSM) with linear computational complexity, has shown strong capability for long-sequence modeling. This study integrates Mamba into the YOWO framework to improve temporal modeling efficiency and representation ability while reducing computational cost, addressing the limited application of Mamba in spatio-temporal action detection.  Methods  The proposed Mamba-YOWO framework is built on the lightweight YOWOv3 architecture. It adopts a dual-branch heterogeneous design for feature extraction. The 2D branch, derived from YOLOv8 with CSPDarknet and PANet structures, processes keyframes to extract multi-scale spatial features. The temporal branch replaces conventional 3D convolutions with a hierarchical architecture composed of a Stem layer and three stages (Stage1–Stage3). Stage1 and Stage2 apply Patch Merging for spatial downsampling and stack Decomposed Bidirectionally Fractal Mamba (DBFM) blocks. The DBFM block employs a bidirectional Mamba structure to capture temporal dependencies in both past-to-future and future-to-past directions. A Spatio-Temporal Interleaved Scan (STIS) strategy is introduced within the DBFM block. This strategy combines bidirectional temporal scanning with spatial Hilbert quad-directional scanning, enabling serialized video representation while maintaining spatial locality and temporal consistency. 
Stage3 applies 3D average pooling to compress temporal features. An Efficient Multi-scale Spatio-Temporal Fusion (EMSTF) module is designed to integrate features from the 2D and temporal branches. This module applies group convolution–guided hierarchical interaction for preliminary fusion and a parallel dual-branch structure for refined fusion, generating an adaptive spatio-temporal attention map. A lightweight detection head with decoupled classification and regression subnetworks produces the final action tubes.  Results and Discussions  Extensive experiments were conducted on the UCF101-24 and JHMDB datasets. Compared with the YOWOv3/L baseline on UCF101-24, Mamba-YOWO achieved a Frame-mAP of 90.24% and a Video-mAP@0.5 of 60.32%, which correspond to improvements of 2.1% and 6.0%, respectively (Table 1). These improvements were obtained while reducing model parameters by 7.3% and computational cost (GFLOPs) by 5.4%. On JHMDB, Mamba-YOWO achieved a Frame-mAP of 83.2% and a Video-mAP@0.5 of 86.7% (Table 2). Ablation experiments verified the contribution of key components. The optimal number of DBFM blocks in Stage2 was four, whereas additional blocks reduced performance, likely due to overfitting (Table 3). The proposed STIS scanning strategy achieved higher accuracy than 1D-Scan, Selective 2D-Scan, and Continuous 2D-Scan (Table 4), which indicates that joint modeling of temporal consistency and spatial structure improves representation quality. The EMSTF module also outperformed other fusion methods, including CFAM, EAG, and EMA (Table 5), which shows its stronger ability to integrate heterogeneous features. These results indicate that the Mamba-based temporal branch effectively models long-range dependencies with linear complexity, whereas the EMSTF module improves multi-scale spatio-temporal feature integration.  
Conclusions  This study proposes Mamba-YOWO, an efficient spatio-temporal action detection framework that integrates the Mamba architecture into YOWOv3. The model replaces conventional 3D convolutions with a DBFM-based temporal branch that incorporates the STIS scanning strategy, which improves long-range temporal modeling with linear computational complexity. The EMSTF module further improves feature representation through group convolution and dynamic gating mechanisms. Experimental results on UCF101-24 and JHMDB show that Mamba-YOWO achieves higher detection accuracy, for example 90.24% Frame-mAP on UCF101-24, while reducing model parameters and computational cost. Future work will examine the theoretical mechanism of Mamba for temporal modeling, extend its capability to longer video sequences, and support lightweight deployment on edge devices.
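The idea of serializing video features along a locality-preserving curve, scanned in both temporal directions, can be illustrated with a small sketch. This is not the STIS strategy of the paper, whose details are not given in the abstract; it assumes a single Hilbert orientation on a square grid whose side is a power of two, and the function names `hilbert_d2xy` and `bidirectional_scan` are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def bidirectional_scan(num_frames, order):
    """Return (forward, backward) token orders as lists of (t, x, y).

    Assumption: each frame is traversed in the same Hilbert order; the
    backward sequence is simply the forward sequence reversed.
    """
    cells = [hilbert_d2xy(order, d) for d in range(4 ** order)]
    forward = [(t, x, y) for t in range(num_frames) for (x, y) in cells]
    backward = forward[::-1]
    return forward, backward


if __name__ == "__main__":
    fwd, bwd = bidirectional_scan(num_frames=2, order=1)
    print(fwd)
    print(bwd)
```

Within a frame, consecutive tokens are always grid neighbours, which is the spatial-locality property that motivates Hilbert-style scanning over a plain row-major scan.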
A Dimension-reduction Attack on Shortest Vector Problem Using Hints
YIN Risheng, CAO Jinzheng, MA Yongliu, WANG Hong, CHENG Qingfeng
Available online  , doi: 10.11999/JEIT251277
Abstract:
  Objective  Cryptographic algorithms based on the Learning With Errors (LWE) problem and its variants are widely used, including the key encapsulation mechanism Kyber and the digital signature scheme Dilithium. In many applications, the LWE secret is a short vector. Therefore, reducing LWE to the Shortest Vector Problem (SVP) is a common approach to cryptanalysis. Traditional SVP algorithms, including enumeration, lattice sieving, and lattice basis reduction, become difficult to apply directly in high-dimensional lattices because of their high computational cost. With the use of side-channel attacks, hints about the secret vector provide a new way to solve SVP. This paper proposes a dimension-reduction attack based on such hints. The method uses hints to reduce the problem dimension, thereby extending the practical range of enumeration and sieving.  Methods  Two types of hints are analyzed: integer hints and modular hints. For integer hints, which provide exact inner-product information about the shortest vector, the problem is formulated as a system of integer equations. The solution space of this system is then used to represent the shortest vector in a shorter linear form. Hermite normal form and Gaussian elimination are applied to obtain a particular solution and a fundamental solution system. This representation reduces the number of unknown coefficients that must be searched in enumeration or sampled in sieving. Thus, the search space is reduced, and the original SVP instance is transformed into a lower-dimensional problem. For modular hints, which provide inner-product information about the shortest vector modulo an integer, a conversion mechanism based on Coppersmith’s lemma is developed. For common-modulus modular equations, Lenstra-Lenstra-Lovász (LLL) lattice basis reduction is first used to reduce the norms of row vectors. Gaussian elimination is then applied to decrease the number of nonzero terms. 
Each resulting modular equation is screened according to Coppersmith’s lemma. Equations that satisfy the conversion condition are transformed into integer equations. For non-common-modulus modular equations, the moduli are first factorized into prime-power moduli. Equations with the same modulus are grouped and processed in the same manner. The resulting integer equations are then solved using the dimension-reduction enumeration or sieving method.  Results and Discussions  To evaluate the proposed dimension-reduction attack, the enumeration-based and sieving-based algorithms are compared with the lattice basis reduction algorithm in Algorithm 5 in terms of runtime and solution exactness. The effect of key parameters on dimension reduction is first analyzed. These parameters include the number of screening rounds (Fig. 2), the small-root bound (Fig. 3), and the modulus size (Fig. 4). The conversion efficiency of Algorithm 3 under different parameter settings is summarized in Table 1. The results show that more screening rounds generally improve the reduction effect, but this improvement has a saturation point. Beyond this point, additional rounds provide limited benefit. Finally, the computational efficiency of the proposed methods is compared with that of lattice basis reduction (Fig. 5). The results show that the computational cost of enumeration and sieving increases rapidly with dimension. However, up to dimension 90, the dimension-reduction attack can still use hints to reduce the dimension and obtain exact solutions more efficiently. Lattice basis reduction shows a slower increase in runtime as the dimension grows and is therefore more suitable for higher-dimensional SVP instances.  Conclusions  The proposed dimension-reduction attack provides a simple and effective method for solving SVP using hints. For integer hints, the solution space of the corresponding equation system is used to reduce the number of variables in enumeration and sieving. 
For modular hints, Coppersmith’s lemma is used to convert selected modular equations into integer equations, reducing the problem to the integer-hint case. The experiments show that, when sufficient hints are available, the method can effectively reduce the lattice dimension and extend the practical range of enumeration and sieving. Compared with lattice basis reduction, enumeration and sieving after dimension reduction can provide exact solutions within their applicable dimension range. Although the reduction effect tends to saturate as the number of hints increases, a moderate number of hints is sufficient to achieve effective dimension reduction. These results indicate that hint-based dimension-reduction attacks offer a practical route for exact SVP solving and provide useful evidence for the security evaluation of lattice-based cryptographic schemes.
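The conversion of a modular hint into an integer hint can be sketched with a simple size argument. The exact screening condition used in the paper is not stated in the abstract, so the code below assumes a simplified sufficient condition: if every unknown coefficient satisfies |x_i| ≤ `bound` and 2 · `bound` · Σ|a_i| < q, then the inner product ⟨a, x⟩ is smaller than q/2 in absolute value, so it must equal the centered residue of c modulo q. The names `centered` and `try_lift` are hypothetical.

```python
def centered(c, q):
    """Representative of c modulo q in the interval (-q/2, q/2]."""
    r = c % q
    return r - q if r > q // 2 else r


def try_lift(a, c, q, bound):
    """Try to turn <a, x> = c (mod q) into an equation over the integers.

    Returns the integer right-hand side if the size condition holds for
    every x with |x_i| <= bound, otherwise None (the equation is screened out).
    """
    if 2 * bound * sum(abs(ai) for ai in a) < q:
        return centered(c, q)
    return None


if __name__ == "__main__":
    q, bound = 97, 4
    a = (3, -2, 5)
    x = (-4, 4, -4)                      # secret coefficients, |x_i| <= 4
    true_value = sum(ai * xi for ai, xi in zip(a, x))
    hint = true_value % q                # what a modular hint would reveal
    print(try_lift(a, hint, q, bound), true_value)
    print(try_lift((30, 20, 50), hint, q, bound))  # coefficients too large
```

Reducing the row norms first (for example with LLL, as the abstract describes) makes Σ|a_i| smaller, so more equations pass a size test of this kind.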
A Lightweight Semi-supervised Brain Tumor Segmentation Network with Counterfactual Reasoning
FAN Yawen, WANG Chaoyuan, WANG Xin, ZHANG Xinchen, ZHOU Quan
Available online  , doi: 10.11999/JEIT251130
Abstract:
  Objective  Brain tumor segmentation plays a key role in clinical diagnosis and treatment planning. However, reliable annotation of medical images is costly and time-consuming, which limits the availability of large annotated datasets. To address this problem, this paper proposes a semi-supervised brain tumor segmentation method that combines a lightweight multimodal fusion segmentation network with counterfactual reasoning. The aim is to improve segmentation accuracy while maintaining sufficient efficiency for deployment in resource-limited clinical scenarios.  Methods  A parameter-sharing multimodal encoder-decoder network is designed to reduce model size and computational cost. An anatomical-structure consistency prior is incorporated to improve alignment with brain anatomy. During training, a teacher-student framework is used to generate counterfactual samples from model predictions. These samples guide learning from unlabeled MRI scans through a counterfactual consistency loss that enforces pixel-level consistency and feature-level semantic stability. This strategy helps the model extract structural information from unlabeled data while reducing the risk of boundary distortion caused by conventional data augmentation.  Results and Discussions  Experiments on the BraTS 2019 and BraTS 2021 datasets show that the proposed method consistently outperforms comparison models under limited-label conditions. On BraTS 2019, the proposed method achieves the best average Dice Similarity Coefficient (DSC) of 66.06%, and its average Intersection over Union (IoU) of 53.16% is comparable to those of other models. More importantly, it obtains the lowest average 95% Hausdorff Distance (HD95) of 7.60 mm, representing reductions of approximately 11% and 6% compared with UNet3D and LightMUnet, respectively (Tables 3 and 4). 
On BraTS 2021, the semi-supervised model improves the average DSC and IoU by 4.51% and 5.29%, respectively, and reduces the average HD95 by 0.68 mm compared with the baseline model (Tables 5 and 6). With only 10% labeled data, the proposed method achieves approximately 94% of the fully supervised performance in the main segmentation metrics. The model is also efficient, with only 1.657M parameters, a computational cost of 0.440 2 T, and an inference time of 0.093 7 s (Table 7). These results indicate that the proposed design achieves a favorable balance among segmentation accuracy, computational efficiency, and clinical deployment. The improvement is attributed to both the lightweight multimodal fusion segmentation network and the counterfactual mechanism, which guides the model to learn anatomically meaningful representations.  Conclusions  The proposed framework provides an effective solution for semi-supervised brain tumor segmentation. It balances accuracy, efficiency, and interpretability, and shows that causal reasoning can be integrated into medical image analysis in a practical manner.
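For readers unfamiliar with the overlap metrics reported above, the following minimal sketch computes the Dice Similarity Coefficient and IoU for a pair of binary masks. It assumes masks are given as collections of foreground voxel coordinates; the function name `dice_and_iou` and the toy masks are hypothetical and are not taken from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as sets of voxel coordinates."""
    pred, target = set(pred), set(target)
    inter = len(pred & target)
    total = len(pred) + len(target)
    union = len(pred | target)
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


if __name__ == "__main__":
    predicted = {(0, 0), (0, 1), (1, 1)}
    reference = {(0, 1), (1, 1), (1, 0)}
    dice, iou = dice_and_iou(predicted, reference)
    print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU; averages over several cases or classes do not have to satisfy this identity exactly.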
SG-DDPG-based Low-intercept Point Beam Design for FDA-MIMO Short-range Detectors
JIA Jinwei, GAO Min, HAN Zhuangzhi, LIU Limin, YIN Yuanwei
Available online  , doi: 10.11999/JEIT260010
Abstract:
  Objective  Radio short-range detectors are widely used in many detection systems. However, in modern battlefields, the electromagnetic environment is increasingly complex, and radio short-range detectors must withstand various forms of electromagnetic interference. In particular, fourth-generation jammers based on Digital Radio Frequency Memory (DRFM) can implement repeater deception jamming. Such jamming may cause failures such as premature detonation in radio short-range detectors and reduce their damage effectiveness. Anti-repeater deception jamming has therefore become a key issue for short-range detectors. Improving the Low Probability of Intercept (LPI) performance of radio short-range detectors is an effective means of resisting repeater deception jamming. This study focuses on the effect of FDA-MIMO array-element frequency-offset settings on beam synthesis and proposes an SG-DDPG-based method for LPI point beam design.  Methods  Frequency Diverse Array-Multiple-Input Multiple-Output (FDA-MIMO) technology is used in this study, and the key factors affecting beam convergence are analyzed. For the spatial LPI beam design of radio short-range detectors, a performance evaluation model for spatial LPI beams is constructed. An FDA-MIMO LPI point beam design method based on the Stage Guidance-Deep Deterministic Policy Gradient (SG-DDPG) algorithm is then proposed. In the SG-DDPG algorithm, a multidimensional staged guidance reward function is designed. An Actor-Critic model is used to maximize the reward value through gradient ascent. The array-element frequency offsets that provide better beam convergence in the current environment are then obtained. The SG-DDPG algorithm is suitable for LPI point beam design under different fall angles of radio short-range detectors. It overcomes the technical limitation of formula-based frequency-offset calculation, which is only applicable when the detector fall angle is close to vertical. 
 Results and Discussions  The simulations show that, after the array-element frequency offsets are optimized by the SG-DDPG algorithm, the FDA-MIMO beam achieves a half-power beam width of 1 m in the range dimension and 9.9° in the angular dimension. The proposed method provides better beam convergence and LPI performance than classical frequency-offset design methods. These results indicate that the proposed algorithm offers an effective approach for array-element frequency-offset optimization and LPI point beam design, thereby improving the LPI performance of radio short-range detectors.  Conclusions  This paper presents an FDA-MIMO LPI point beam design method based on the SG-DDPG algorithm, with the array-element frequency offset used as the optimization objective. The simulation results support two main conclusions. First, the proposed method removes the restriction that the fall angle of the radio short-range detector must be close to vertical when the array-element frequency offset is calculated by a formula-based method. The algorithm can be applied to LPI beam design under different fall angles and improves the LPI performance of radio short-range detectors. Second, the proposed method achieves a half-power beam width of only 1 m in the range dimension and 9.9° in the angular dimension, which is better than that of traditional methods. Under different fall angles, the beam formed by the proposed method has the smallest intercept area, indicating the best LPI performance.
Performance Optimization and Gate Oxide Electric Field Analysis of 1200V Trench SiC MOSFET Based on PCL-CSL Collaborative Design
FANG Shaoming, LI Hongda, GAO Yuan
Available online  , doi: 10.11999/JEIT260164
Abstract:
  Objective  1 200 V Silicon Carbide (SiC) trench Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs) are key devices in medium- and high-voltage power conversion systems. They feature high switching performance, low conduction loss, and high-temperature stability. However, conventional trench structures suffer from electric-field concentration at the trench corner and bottom gate oxide. This effect can cause the peak gate oxide electric field to exceed the industrial reliability criterion of 3 MV/cm, reducing long-term reliability. In addition, strong trade-offs exist among breakdown voltage, specific on-resistance, threshold voltage, and peak gate oxide electric field. These trade-offs make it difficult to achieve high efficiency and high reliability at the same time. To address these issues, this work studies a synergistic structure that combines deep P-type Column (PCL), Carrier Storage Layer (CSL), and locally thickened gate oxide. The aim is to regulate the electric-field distribution, suppress electric-field concentration, improve carrier transport, and achieve balanced device performance. This study provides a systematic design method for high-reliability and high-performance 1 200 V Trench SiC MOSFETs for industrial applications.  Methods  Numerical device simulations were performed using a Technology Computer-Aided Design (TCAD) platform to analyze and optimize the electrical performance of 1 200 V Trench SiC MOSFETs. To ensure reliable simulations, physical models were used for bandgap narrowing, Shockley-Read-Hall (SRH) recombination, Auger recombination, avalanche breakdown, incomplete dopant ionization, doping- and temperature-dependent mobility, and high-field mobility saturation. A device structure with deep PCL, CSL, and locally thickened bottom gate oxide is constructed to reduce the peak gate oxide electric field and improve device reliability. Key structural and process parameters were swept and quantitatively analyzed. 
These parameters included epitaxial layer thickness (TEpi), epitaxial layer doping concentration (NEpi), trench width, trench depth, P-Well (PW) implantation dose, PCL spacing, and CSL implantation dose. Static electrical characteristics, including threshold voltage (Vth), specific on-resistance (Ron,sp), Breakdown Voltage (BV), and peak gate oxide electric field (Eox,max), were extracted and evaluated. The final parameter combination was determined through a trade-off analysis between conduction performance and long-term device reliability.  Results and Discussions  The simulation results show that the deep PCL structure redirects electric-field lines away from the trench bottom gate oxide and reduces electric-field concentration. When this structure is combined with the locally thickened bottom gate oxide, Eox,max is reduced below 3 MV/cm, meeting the industrial reliability criterion. The CSL broadens the vertical conduction path, reduces current crowding, and decreases Ron,sp. Parameter optimization shows that TEpi, NEpi, trench dimensions, PW implantation dose, and CSL implantation dose determine the trade-off between BV and conduction performance (Fig. 5, Fig. 6, Fig. 9, Fig. 10, and Fig. 19). PCL spacing has a strong effect on electric-field shielding and gate oxide protection (Fig. 16 and Fig. 17). After multi-parameter optimization, the device achieves Vth=4.7 V, BV=1 708 V, Ron,sp=1.57 mΩ·cm², and Eox,max=2.5 MV/cm (Table 2). These results indicate balanced performance for high-voltage power applications.  Conclusions  A synergistic PCL-CSL structural design for 1 200 V Trench SiC MOSFETs is studied and validated through TCAD simulation. The design addresses key limitations of conventional Trench SiC MOSFETs, including high peak gate oxide electric field, limited breakdown capability, and the trade-off between conduction performance and reliability. 
The effects of TEpi, NEpi, trench dimensions, PW implantation dose, PCL spacing, and CSL implantation dose on device performance and gate oxide reliability are clarified through parameter sweeping and comparative analysis. With coordinated structural optimization, the optimized device achieves low Ron,sp, high BV, suitable Vth, and suppressed electric-field concentration near the trench bottom oxide. Eox,max is controlled below the 3 MV/cm industrial reliability criterion, which reduces the risk of oxide degradation under high-bias operation. The proposed structural strategy and optimization method provide guidance for the design, simulation, and process development of high-voltage, high-reliability SiC power devices.
Full-round Integral Cryptanalysis of the Lightweight Block Cipher INLEC
YU Bin, LIU Wenfen, CHEN Wen, GUO Ying, LU Yongcan, HUANG Yuehua
Available online  , doi: 10.11999/JEIT251131
Abstract:
  Objective  With the rapid development of telecommunication technology, Internet of Things (IoT) devices have been widely deployed in modern applications. However, their limited computing resources and energy supply create challenges for data privacy and security. To address these issues, Feng et al. proposed INLEC, a low-energy lightweight block cipher designed for resource-constrained IoT environments. The designers claimed that INLEC can resist differential, linear, impossible differential, and side-channel attacks. However, its security against integral cryptanalysis has not yet been evaluated. This paper presents a comprehensive full-round integral cryptanalysis of INLEC to assess its actual resistance to integral cryptanalysis.  Methods  The monomial prediction technique proposed by Hu et al. is used to construct a Mixed Integer Linear Programming (MILP) model for the monomial trails of INLEC. Based on this model, a 9-round integral distinguisher for INLEC is obtained. By further using the structural properties of the diffusion layer, the 9-round integral distinguisher is extended to a 10-round integral distinguisher by adding an initial round. This is the first 10-round integral distinguisher constructed for INLEC. To reduce the complexity of key recovery, a multi-key guessing method is proposed. Combined with the partial-sum technique, this method enables the first 14-round key recovery attack on INLEC. An integral cryptanalysis framework for the full-round INLEC cipher is therefore established.  Results and Discussions  The analysis shows that the 10-round integral distinguisher provides exploitable balanced bits for key recovery. Based on this distinguisher, the proposed 14-round key recovery attack achieves a data complexity of 2^63 chosen plaintexts and a time complexity of 2^89.843 14-round encryptions. These results indicate that the diffusion layer of INLEC does not fully eliminate integral properties within 10 rounds. 
The remaining structural properties can be used to support key recovery. This finding challenges the original security claims for INLEC and shows that integral properties should be considered when evaluating lightweight block ciphers for IoT applications.  Conclusions  This paper evaluates the resistance of the lightweight block cipher INLEC to integral cryptanalysis based on monomial prediction. A 9-round integral distinguisher is first constructed using an MILP model of monomial trails. The 9-round integral distinguisher is then extended to a 10-round integral distinguisher by exploiting the structural properties of the diffusion layer. A 14-round key recovery attack is further achieved by combining the partial-sum technique with the multi-key guessing method. The results show that INLEC has insufficient resistance to integral cryptanalysis and that its practical security may be lower than expected. Therefore, more rounds should be considered in the design of such ciphers to resist known integral attacks.
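The notion of a "balanced" output in an integral distinguisher can be illustrated on a toy cipher. The sketch below is not INLEC: it assumes a hypothetical two-nibble round function (`toy_round`) and borrows the well-known 4-bit S-box of PRESENT purely as a stand-in permutation. One input nibble takes all 16 values while the other stays constant; after two rounds the XOR of all 16 outputs is zero for every choice of round keys.

```python
from itertools import product

# 4-bit S-box of PRESENT, used here only as a convenient bijective S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def toy_round(state, key):
    """One round of a hypothetical toy cipher on two 4-bit words."""
    left, right = state
    key_left, key_right = key
    left, right = SBOX[left ^ key_left], SBOX[right ^ key_right]
    return (left ^ right, left)  # simple linear mixing of the two words


def encrypt(state, round_keys):
    for key in round_keys:
        state = toy_round(state, key)
    return state


def xor_sum(constant, round_keys):
    """XOR of the ciphertexts over the 16 plaintexts (a, constant), a = 0..15."""
    sum_left = sum_right = 0
    for active in range(16):
        left, right = encrypt((active, constant), round_keys)
        sum_left ^= left
        sum_right ^= right
    return sum_left, sum_right


if __name__ == "__main__":
    keys = [(0x3, 0xA), (0x7, 0x1)]
    print(xor_sum(constant=0x5, round_keys=keys))  # both words are balanced
```

A key-recovery attack appends rounds after such a distinguisher, guesses the key bits needed to partially decrypt, and keeps only guesses for which the XOR sum at the distinguisher output is zero.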
Multipath Scheduling Algorithm for UAV Video Streaming
CAO Changlong, LI Lingzhi, SHI Lianmin, ZHAO Qingyue
Available online  , doi: 10.11999/JEIT260002
Abstract:
  Objective   With the rapid growth of the low-altitude economy, Unmanned Aerial Vehicle (UAV) technology has been widely used in emergency rescue, disaster monitoring, urban security, and other applications. In these scenarios, stable, low-latency, and high-fidelity video backhaul is critical for task execution. Multipath transport protocols can improve Quality of Experience (QoE) through bandwidth aggregation, providing an effective basis for UAV video streaming. However, under dynamic and heterogeneous network conditions, the performance of multipath transport protocols depends strongly on the design of multipath scheduling algorithms. Existing heuristic schedulers use predefined rules to reduce head-of-line blocking and inter-path load imbalance, but their adaptability remains limited in highly dynamic environments. Learning-based schedulers can learn the mapping between network states and scheduling rewards from real-time feedback, enabling adaptive performance optimization. However, most existing learning-based schedulers are designed for general network scenarios. They are not optimized for UAV networks, and their ability to guarantee QoE has not been fully validated. A multipath scheduling algorithm tailored to UAV video streaming is therefore needed to better exploit the performance potential of multipath transport protocols.  Methods   To address the dynamic and heterogeneous challenges of UAV video streaming, this paper proposes NeuroFly, a multipath scheduling framework based on the NeuralUCB algorithm. In NeuroFly, multipath traffic scheduling is formulated as a Contextual Multi-Armed Bandit (CMAB) problem. The context space is constructed by integrating path state information, video encoding features, and UAV mobility parameters, which jointly characterize the current transmission environment. In the action space, a frame-priority-driven redundant transmission mechanism is proposed. 
Video frames are assigned different frame priorities according to decoding dependencies, and differentiated redundancy strategies are used to improve the probability of successful video-frame delivery. A multi-objective reward function is further designed to guide policy learning and support adaptive optimization under dynamic and heterogeneous network conditions. In addition, a context monitoring mechanism is integrated into NeuroFly to handle abrupt environmental changes caused by high UAV mobility. This mechanism detects context distribution shifts and triggers a two-stage restart strategy. A soft restart is activated when gradual context drift is detected, removing outdated historical experience. A hard restart is performed under abrupt context changes by clearing the experience replay buffer and reinitializing model parameters, allowing learning to restart under a new distribution.  Results and Discussions   The proposed NeuroFly framework is evaluated in both simulation and field environments. First, Mininet-WiFi is used to simulate realistic UAV network environments and evaluate overall QoE performance. The results (Fig. 4) show that, compared with state-of-the-art heuristic and learning-based schedulers, NeuroFly achieves broad performance gains by fully using aggregated multipath bandwidth. Specifically, the 99th-percentile latency is reduced by 19.9%~51.0%, the average video frame rate is increased by up to 24.6%, image structural similarity is improved by up to 49.2%, and the buffering time ratio is reduced by 13.4%~77.6%. These results demonstrate the strong ability of NeuroFly to guarantee QoE. Field experiments (Fig. 6) further confirm that NeuroFly provides favorable optimization in real UAV operation scenarios. Compared with mainstream transport solutions widely deployed in production environments, NeuroFly achieves better real-time transmission performance and shows strong practical applicability for future large-scale UAV deployment.  
Conclusions   This paper addresses network dynamics, path heterogeneity, and time-varying transmission conditions in UAV video streaming over multipath transport protocols. An intelligent multipath scheduling framework, NeuroFly, is proposed based on the NeuralUCB algorithm. In this framework, multipath traffic scheduling is modeled as a CMAB problem. Through the design of the context space, action space, and multi-objective reward function, online learning and adaptive optimization of traffic allocation policies are achieved. To further improve robustness under severe environmental changes, a lightweight context monitoring mechanism is introduced to detect context distribution drift and restart the learning process when needed. Systematic evaluations are conducted on both simulation platforms and real UAV operation environments. The simulation results show that NeuroFly achieves consistent improvements across QoE metrics compared with state-of-the-art heuristic and learning-based schedulers. The field results further indicate that NeuroFly provides reliable guarantees in actual UAV operation scenarios when compared with mature solutions that have been widely deployed in production environments. These results validate the practicality, robustness, and engineering feasibility of NeuroFly, and suggest its potential for large-scale deployment in UAV applications that are sensitive to real-time video quality, including emergency response, power inspection, agricultural monitoring, and logistics delivery.
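The two-stage restart logic described in the Methods can be sketched as follows. The abstract does not say how context drift is measured or which thresholds are used, so the drift score, both thresholds, the `keep_recent` parameter, and the class name `RestartController` are assumptions made only for illustration.

```python
from collections import deque


class RestartController:
    """Choose between no action, a soft restart, and a hard restart."""

    def __init__(self, soft_threshold, hard_threshold, keep_recent):
        self.soft_threshold = soft_threshold
        self.hard_threshold = hard_threshold
        self.keep_recent = keep_recent  # samples kept after a soft restart

    def handle(self, drift_score, replay_buffer, reinit_model):
        if drift_score >= self.hard_threshold:
            # Abrupt change: drop all experience and reinitialize the model.
            replay_buffer.clear()
            reinit_model()
            return "hard"
        if drift_score >= self.soft_threshold:
            # Gradual drift: discard only the oldest experience.
            while len(replay_buffer) > self.keep_recent:
                replay_buffer.popleft()
            return "soft"
        return "none"


if __name__ == "__main__":
    buffer = deque(range(10))          # oldest sample on the left
    controller = RestartController(0.3, 0.7, keep_recent=4)
    for score in (0.1, 0.5, 0.9):
        action = controller.handle(score, buffer, reinit_model=lambda: None)
        print(score, action, list(buffer))
```

Keeping the newest samples after a soft restart lets the learner adapt without discarding everything it has seen, while the hard restart trades that continuity for a clean start under the new distribution.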
Multi-agent Reinforcement Learning Method for Trajectory Optimization in Dual-UAV Cooperative Railway Inspection
HUANG Gaoyong, SONG Jun, FANG Xuming, YAN Li, HE Rong
Available online  , doi: 10.11999/JEIT251321
Abstract:
  Objective  Conventional railway inspection methods, including manual inspection and dedicated inspection vehicles, suffer from low efficiency, limited coverage, and safety risks, especially in hazardous or inaccessible areas. Unmanned Aerial Vehicles (UAVs) offer a promising alternative. However, deployment in strictly regulated railway protection zones remains challenging. In particular, single-UAV inspection is limited by restricted viewpoints, coverage blind spots, and poor data synchronization. To address these issues, this paper proposes a dual-UAV cooperative railway inspection framework. The objective is to jointly optimize the flight trajectories and inspection task sequence of two UAVs to maximize inspection task quality under coupled constraints, including energy consumption, obstacle avoidance, communication-rate constraints, and cooperative synchronization.  Methods  To solve this high-dimensional, non-convex, NP-hard problem, a two-stage hierarchical framework is proposed. In the first stage, the optimal cooperative observation positions for each inspection task are determined. Particle Swarm Optimization (PSO) is used to obtain the optimal three-dimensional coordinates of the two UAVs, thereby improving coverage and inspection quality. In the second stage, continuous trajectory optimization is formulated as a Multi-Agent Deep Reinforcement Learning (MADRL) problem. To improve convergence stability under strong safety constraints, a Risk-Adaptive Exploration Noise Mechanism (RAENM) is incorporated into the training process. The problem is then solved by an improved Multi-Agent Twin Delayed Deep Deterministic policy gradient (MATD3) algorithm under the Centralized Training with Decentralized Execution (CTDE) paradigm. Each UAV is modeled as an independent agent. Its state includes kinematic information, target position, remaining energy, and obstacle distance. Its action space defines the flight control variables. 
A composite reward function is designed to balance multiple objectives, including target approaching, energy saving, obstacle avoidance, railway-protection-zone compliance, and synchronized cooperative arrival.  Results and Discussions  The proposed framework is evaluated through simulations against several baseline algorithms. The results show that the improved MATD3 method achieves faster and more stable convergence, especially as the number of inspection tasks increases. In path planning, it generates more compact trajectories and the shortest total path length. For example, in the two-task scenario, the total path length is reduced to 13,025 m, about 4.5% shorter than that of the next best method. In addition, the proposed method achieves the lowest cumulative energy consumption in all tested scenarios. It also yields the smallest navigation error and the shortest arrival-time difference between the two UAVs at shared inspection points, indicating higher control accuracy and better spatiotemporal coordination. By reducing position deviation and improving synchronization, the proposed method achieves the highest inspection task quality in all evaluation settings.  Conclusions  This paper proposes a two-stage hierarchical framework for dual-UAV cooperative trajectory optimization in railway inspection. The framework combines PSO-based cooperative observation position optimization with improved MATD3-based trajectory learning. Simulation results show that the proposed method outperforms baseline methods in path efficiency, energy saving, cooperative synchronization, and inspection task quality. This study provides support for the deployment of intelligent multi-UAV systems in railway infrastructure inspection. Future work will consider more realistic factors, including communication uncertainty and dynamic environments.
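The first-stage search for observation positions described in this abstract can be illustrated with a generic global-best PSO. The following is a minimal sketch, not the authors' implementation: the cost function `toy_cost`, the point `TARGET`, the bounds, and the coefficient values are all hypothetical placeholders chosen for illustration, and only one UAV position is searched.

```python
import random

def pso_minimize(cost, bounds, n_particles=30, n_iters=200, seed=0):
    """Basic global-best PSO; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.729, 1.494, 1.494  # commonly used textbook coefficients (assumed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Hypothetical stand-in for an inspection-quality cost: squared distance
# to an assumed ideal 3-D observation point.
TARGET = (120.0, 80.0, 30.0)

def toy_cost(p):
    return sum((a - b) ** 2 for a, b in zip(p, TARGET))

best, best_cost = pso_minimize(toy_cost, [(0, 200), (0, 200), (10, 60)])
```

In a dual-UAV setting the particle would presumably stack both UAVs' coordinates into one six-dimensional vector, with the cost reflecting joint coverage; that extension is not shown here.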
Jointly Improving Information Timeliness and Fidelity under Finite-Blocklength Source Coding in a Wireless IoT System
DUAN Jianxin, ZHANG Tianci, CHEN Zhengchuan, ZHANG Di, ZHU Xu, TIAN Zhong, WANG Min, ZHANG Lütianyang
Available online  , doi: 10.11999/JEIT251057
Abstract:
  Objective  Wireless Internet of Things (IoT) information update systems are essential for time-sensitive applications. In these systems, timely information delivery with high fidelity is critical for accurate sensing, estimation, and decision-making. However, short-packet transmission and strict latency requirements make classical asymptotic rate-distortion theory insufficient for characterizing practical system performance. Under finite-blocklength source coding, shorter source-coding blocklengths reduce latency but increase distortion, whereas longer source-coding blocklengths improve information fidelity at the cost of higher delay. This leads to a fundamental trade-off between information timeliness and information fidelity, which remains insufficiently characterized in the non-asymptotic regime.  Methods  Age of Information (AoI) and Mean Squared Error (MSE) are used to quantify information timeliness and information fidelity, respectively. Closed-form expressions for time-average AoI and time-average MSE are derived under finite-blocklength source coding. Based on distortion tolerance, excess distortion probability, and transmission rate, a joint optimization problem is formulated to minimize the weighted-sum objective of time-average AoI and time-average MSE. The monotonicity and convexity of the objective function are analyzed with respect to these design variables. An alternating iterative algorithm is then developed to jointly optimize distortion tolerance, excess distortion probability, and transmission rate.  Results and Discussions  Numerical simulations are conducted under different weight settings to examine the trade-off between information timeliness and information fidelity in representative operating scenarios. The proposed framework reveals the effect of finite-blocklength parameters on system performance. The results show that the proposed method balances AoI and MSE under different design priorities. 
At a transmit power of 20 dB, the weighted-sum metric of the scheme with the highest distortion tolerance is improved by approximately 33.7% compared with that of the scheme with the lowest distortion tolerance. The maximum relative error between the theoretical analysis and Monte Carlo simulations remains below 0.3%, verifying the accuracy of the derived analytical expressions.  Conclusions  This paper presents a non-asymptotic analysis of the timeliness-fidelity trade-off in a wireless IoT information update system by explicitly considering finite-blocklength source coding. By treating distortion tolerance, excess distortion probability, and transmission rate as design variables, the proposed framework verifies the necessity of finite-blocklength modeling and the advantage of joint parameter optimization. The results provide theoretical guidance for the design and optimization of timely and high-fidelity wireless IoT systems.
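As a rough illustration of the timeliness metric used in this abstract, the sketch below computes a time-average AoI from the sawtooth age curve and combines it with an MSE value in a weighted sum. It follows the generic AoI definition rather than the paper's closed-form finite-blocklength expressions, and the convex-combination form in `weighted_objective` is an assumption.

```python
def time_average_aoi(updates):
    """Time-average Age of Information over [first delivery, last delivery].

    updates: list of (generation_time, delivery_time) pairs, sorted by
    delivery time, each update fresher than the one before it.
    """
    area = 0.0
    for (gen, dlv), (_, next_dlv) in zip(updates, updates[1:]):
        start_age = dlv - gen        # age right after this delivery
        end_age = next_dlv - gen     # age just before the next delivery
        # Area of the trapezoid under the linearly growing age curve.
        area += (end_age ** 2 - start_age ** 2) / 2.0
    span = updates[-1][1] - updates[0][1]
    return area / span

def weighted_objective(avg_aoi, avg_mse, weight):
    """Assumed weighted-sum form: weight on AoI, (1 - weight) on MSE."""
    return weight * avg_aoi + (1.0 - weight) * avg_mse

# Updates generated every 1 time unit and delivered after a 0.5 delay:
# the age oscillates between 0.5 and 1.5, so the average is 1.0.
periodic = [(0.0, 0.5), (1.0, 1.5), (2.0, 2.5), (3.0, 3.5)]
avg_aoi = time_average_aoi(periodic)
```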
Research on Energy Efficiency Optimization of Rotatable Hybrid Intelligent Reflecting Surface Communication
ZHANG Guangchi, GUO Xuan, WANG Luyao, CUI Miao, FU Hao
Available online  , doi: 10.11999/JEIT260119
Abstract:
  Objective  With the evolution of 6G communication networks, reconfigurable intelligent surfaces (RIS) have emerged as a pivotal technology for reshaping wireless environments and enhancing spectral efficiency. However, conventional fixed RIS architectures face two critical challenges in practical deployment: the “angle mismatch” loss, where the effective aperture significantly diminishes when users are located at large angles from the RIS normal, and the “energy consumption bottleneck,” caused by the high cumulative power consumption of radio frequency (RF) circuits and static control elements in large-scale arrays. Existing research often treats mechanical rotation and element switching in isolation, lacking a unified framework to balance the trade-off between mechanical/circuit energy consumption and communication gain. To address these limitations, this paper investigates a rotatable and switchable hybrid RIS (H-RIS) assisted downlink communication system. The primary objective is to maximize the system’s energy efficiency (EE) by jointly optimizing the base station transmit power, subarray activation states, physical rotation angles, and electronic phase shifts. This approach introduces mechanical rotation degrees of freedom to compensate for path loss and employs dynamic switching mechanisms to reduce redundant power consumption, thereby achieving sustainable green communication.  Methods  A joint optimization framework is established for the H-RIS aided single-user multiple-input single-output (MISO) system. The system model explicitly accounts for the dynamic power consumption induced by mechanical rotation and the static power consumption of active subarrays. The resulting optimization problem is formulated as a non-convex mixed-integer nonlinear programming (MINLP) problem, involving coupled binary variables (activation status) and continuous variables (power, angles, phases). 
To solve this challenging problem, a block coordinate descent (BCD)-based alternating optimization (AO) algorithm is proposed to decouple the variables into three sub-problems. First, to tackle the exponential complexity caused by binary switching variables, a channel contribution-based ranking strategy is developed. By performing eigenvalue decomposition on the cascaded channel correlation matrix, the priority of each subarray is quantified, reducing the search space from exponential to linear. Second, for the power allocation sub-problem, the non-convex fractional objective function is transformed into a parametric subtractive form using the Dinkelbach algorithm, which is then solved via the interior-point method. Third, for the physical rotation and electronic phase optimization, the problem is decomposed into single-variable sub-problems. A Golden Section Search algorithm is employed to iteratively find the optimal rotation angle and phase shift for each subarray within bounded constraints, ensuring the monotonic convergence of the objective function.  Results and Discussions  Extensive simulations are conducted to evaluate the performance of the proposed H-RIS scheme compared with benchmark schemes, including “Only-Rotation” (always on), “Only-Switching” (fixed angle), and “Conventional” (fixed and always on). The simulation results regarding the maximum transmit power Pmax (Fig. 2 and Fig. 3) demonstrate that the proposed method achieves the highest energy efficiency across the entire power range. Specifically, in the low power regime, the proposed algorithm intelligently turns off redundant subarrays where the rate gain cannot offset the circuit power cost, thereby significantly outperforming the “Only-Rotation” scheme, which suffers from high static power consumption. The impact of user distance is also analyzed (Fig. 4 and Fig. 5). 
Results indicate that the proposed scheme maintains high spectral efficiency comparable to the “Only-Rotation” scheme by dynamically adjusting the rotation angles to align with the Line-of-Sight (LoS) path, effectively compensating for the angle mismatch loss observed in the “Only-Switching” and “Conventional” schemes. Furthermore, the activation pattern of the subarray varies in a “U” shape with distance (Table 1), which allows for flexible adjustment of array size and orientation according to user-RIS geometry.  Conclusions  This paper proposes an energy-efficient transmission scheme for H-RIS aided communication systems by integrating mechanical rotation and dynamic switching capabilities. A low-complexity BCD-based algorithm is developed to jointly optimize the transceiver design. The results confirm that introducing mechanical rotation significantly mitigates the angle mismatch loss, while the proposed channel contribution-based switching strategy effectively eliminates redundant energy consumption. The proposed H-RIS architecture offers a superior trade-off between spectral efficiency and energy efficiency compared to traditional fixed RIS architectures, providing a viable solution for future green 6G networks.
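Two of the building blocks named in this abstract, the Dinkelbach transform for a fractional energy-efficiency objective and golden-section search for a bounded single-variable problem, can be sketched on a toy single-link model. This is an illustrative sketch under stated assumptions: the rate model `log2(1 + gain * p)`, the parameter values, and the use of golden-section search for the inner power problem (the abstract uses an interior-point method there) are all substitutions made for brevity.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:              # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
        else:                    # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
    return (a + b) / 2.0

def ee_ratio(p, gain, p_circuit):
    """Toy energy efficiency: rate divided by total consumed power."""
    return math.log2(1.0 + gain * p) / (p + p_circuit)

def dinkelbach_power(gain, p_circuit, p_max, iters=50, eps=1e-9):
    """Maximize ee_ratio over p in [0, p_max] via Dinkelbach iterations."""
    lam = 0.0
    p = p_max
    for _ in range(iters):
        # Parametric subtractive form: rate(p) - lam * total_power(p).
        p = golden_section_max(
            lambda x: math.log2(1.0 + gain * x) - lam * (x + p_circuit),
            0.0, p_max)
        gap = math.log2(1.0 + gain * p) - lam * (p + p_circuit)
        lam = ee_ratio(p, gain, p_circuit)
        if abs(gap) < eps:       # F(lam) = 0 characterizes the optimum
            break
    return p, lam

p_star, ee_star = dinkelbach_power(gain=10.0, p_circuit=1.0, p_max=5.0)
```

The subtractive objective is concave in the power variable for this toy rate model, which is what makes the golden-section inner solver valid here.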
CRLB Optimization for O-RIS-Assisted VLP Systems
ZHANG Zengjie, WU Qi, ZHANG Jian, DUAN Ruijie, FENG Yunhan
Available online  , doi: 10.11999/JEIT260120
Abstract:
  Objective  With the rapid development of indoor location-based services, Visible Light Positioning (VLP) has emerged as a promising high-accuracy positioning technology. The integration of Optical Reconfigurable Intelligent Surfaces (O-RIS) into VLP systems can effectively enhance signal coverage and improve positioning performance. However, optimizing the positioning accuracy and fairness across different user areas in RIS-assisted VLP systems remains a challenging issue. This study focuses on optimizing the Cramer-Rao Lower Bound (CRLB) of the system under both near-field and far-field channel models, aiming to enhance overall positioning precision and fairness through RIS configuration.  Methods  Under the far-field channel model assumption, the RIS orientation optimization problem is formulated as a received power maximization problem. A positioning algorithm combining Particle Swarm Optimization (PSO) and N-step iteration is proposed to dynamically adjust the RIS orientation optimally without prior knowledge of the receiver’s position. Under the near-field channel model assumption, the allocation problem between RIS elements and LEDs is constructed as a Markov Decision Process (MDP). A reinforcement learning method based on experience replay and knowledge utilization is designed to solve this problem, aiming to minimize the CRLB while ensuring positioning fairness for users in different regions.  Results and Discussions  Simulation results demonstrate that the proposed algorithms effectively enhance system positioning performance under both models. In the far-field model, the PSO-based iterative algorithm achieves dynamic optimization of RIS orientation, significantly improving positioning accuracy (Fig. 3). 
Under the near-field model, the reinforcement learning approach not only minimizes the CRLB but also considerably improves positioning fairness across the entire area, with a noticeable reduction in performance disparity among users in different zones (Fig. 5, Fig. 6). Comparative experiments show that the proposed methods outperform conventional RIS configuration strategies in terms of both average positioning error and fairness index (Table 1).  Conclusions  This paper investigates CRLB optimization methods for O-RIS-assisted VLP systems under near-field and far-field channel models. In the far-field scenario, a PSO-based iterative algorithm is proposed to optimize RIS orientation, enhancing positioning accuracy without requiring prior receiver location information. In the near-field scenario, a reinforcement learning-based approach is designed to optimize RIS element–LED allocation, which effectively minimizes the CRLB and improves positioning fairness across the whole area. Simulation results validate the effectiveness of the proposed algorithms in both models. Future work may consider more practical channel impairments and multi-user scenarios to further improve the robustness and scalability of the system.
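To make the CRLB objective in this abstract concrete, the sketch below evaluates the bound for a deliberately simplified case: 2-D positioning from range measurements with independent Gaussian noise. This generic ranging model is an assumption chosen because its Fisher information has a simple form; it is not the paper's near-field or far-field VLP channel model, and the anchor layout is hypothetical.

```python
import math

def ranging_crlb_2d(position, anchors, sigma):
    """Trace of the inverse Fisher information matrix (lower bound on
    position MSE) for 2-D ranging with i.i.d. Gaussian noise of
    standard deviation sigma."""
    jxx = jxy = jyy = 0.0
    for ax, ay in anchors:
        dx, dy = position[0] - ax, position[1] - ay
        dist = math.hypot(dx, dy)
        ux, uy = dx / dist, dy / dist   # unit vector from anchor to receiver
        jxx += ux * ux / sigma ** 2
        jxy += ux * uy / sigma ** 2
        jyy += uy * uy / sigma ** 2
    det = jxx * jyy - jxy ** 2
    # For a symmetric 2x2 matrix, trace(J^-1) = (jxx + jyy) / det.
    return (jxx + jyy) / det

# Two anchors seen from orthogonal directions give J = I / sigma^2.
crlb_two = ranging_crlb_2d((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], sigma=0.1)
# A third anchor adds information and can only tighten the bound.
crlb_three = ranging_crlb_2d((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], sigma=0.1)
```

Evaluating such a bound over a grid of receiver positions is one plausible way to compare fairness across regions, although the abstract does not state how its fairness index is defined.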
Intelligent Resource Allocation Algorithm Based on Outdated CSI for Multi-Node URLLC
ZHAO Yizhen, GAO Wei, HU Yulin, ZHU Yao
Available online  , doi: 10.11999/JEIT260216
Abstract:
  Objective  Ultra-Reliable and Low-Latency Communications (URLLC) have found widespread applications in Industrial Internet-of-Things (IIoT) systems. However, in mobile operation scenarios such as transportation and inspection, the acquisition of instantaneous Channel State Information (CSI) is often impractical due to feedback overhead, forcing resource allocation decisions to be made based on outdated CSI. This mismatch significantly limits the achievable energy efficiency of the system. Traditional convex optimization methods have difficulty addressing such challenges, while classical Deep Reinforcement Learning (DRL) algorithms also exhibit inherent limitations in terms of convergence stability and policy performance when confronted with the stringent Quality-of-Service (QoS) constraints in URLLC. Motivated by these challenges, this paper considers a multi-user URLLC system operating under outdated CSI in dynamic scenarios, formulates an energy efficiency maximization problem that guarantees the communication latency and reliability requirements, and aims to design an efficient and stable algorithm for joint power and blocklength allocation.  Methods  To achieve the above objective, this paper proposes a Successive Convex Approximation (SCA)-assisted DRL framework for energy efficiency maximization under outdated CSI. Specifically, an SCA-based algorithm is first developed to derive a pre-allocation of transmit power and blocklength, yielding a feasible and physically interpretable yet relatively conservative baseline solution. Building upon this baseline, a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is employed to perform incremental refinement through interaction with the dynamic environment, thereby alleviating the conservative nature of SCA. 
Meanwhile, the SCA solution is incorporated as prior knowledge together with user location information into the state representation, which effectively narrows the policy search space and enables the DRL agent to better capture large-scale channel characteristics and system dynamics under outdated CSI, thereby enhancing the learning efficiency and stability.  Results and Discussions  The proposed algorithm is evaluated in simulation against SCA, TD3 without SCA guidance, and TD3 without user location information. The results demonstrate that the proposed method significantly outperforms all benchmark schemes in terms of convergence stability and system energy efficiency. During the training phase (Fig. 3), the average reward of the proposed algorithm increases steadily and converges stably, whereas removing location information leads to low and highly fluctuating rewards, and removing SCA guidance results in convergence to a much lower reward level, highlighting the importance of both prior guidance and location-aware state representation. In addition, during the actual operation stage of the system, the proposed algorithm achieves high and stable energy efficiency (Fig. 4), significantly outperforming the comparative algorithms. Under outdated CSI, DRL-based methods outperform conservative optimization when transmission is successful, while the absence of location information or SCA guidance significantly degrades energy efficiency or increases transmission failures, verifying the effectiveness of both factors in improving energy efficiency and ensuring policy validity. The simulations also examine the impact of key system parameters on energy efficiency. For basic resource parameters such as blocklength (Fig. 5) and power (Fig. 6), appropriately increasing the budget helps improve system energy efficiency. Reliability-related parameters (Fig. 7) should be set according to service requirements to avoid wasting resources. 
Finally, the simulation of average energy efficiency versus the number of nodes and the number of network neurons provides a reference for configuring the algorithm structure and designing the network scale (Fig. 8).  Conclusions  In conclusion, this paper addresses the challenge of energy-efficient resource allocation for multi-user URLLC systems operating under outdated CSI by integrating SCA with DRL. Specifically, a TD3-based DRL approach is enhanced by introducing an SCA reference solution as prior guidance and incorporating user location information into the state representation. Such an optimization-learning dual-driven framework combines the interpretability and feasibility of model-based optimization with the adaptivity and expressive power of data-driven learning. The simulations show that: (1) the proposed method achieves higher energy efficiency than pure optimization and conventional TD3 while satisfying URLLC latency and reliability constraints; (2) the SCA reference improves the stability and effectiveness of the policy under outdated CSI; (3) incorporating user location information enables more efficient decision-making. However, this work focuses on a single-cell multi-user scenario, and practical issues such as multi-cell interference, cooperative multi-base-station scheduling, and more complex mobility patterns are not considered. Future work will extend the proposed framework to more realistic multi-cell and multi-agent scenarios and investigate its applicability under more severe CSI imperfections.
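The coupling between power, blocklength, and reliability that this abstract optimizes can be illustrated with the widely used normal approximation for the finite-blocklength error probability over an AWGN channel. The abstract does not state which error model is used, so this is an assumed model; the energy-efficiency definition in `energy_efficiency_bits_per_joule` and all parameter names are likewise hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def block_error_rate(snr, blocklength, info_bits):
    """Normal approximation of the decoding error probability when
    info_bits are sent over blocklength channel uses at the given SNR."""
    capacity = math.log2(1.0 + snr)                       # bits per channel use
    dispersion = (1.0 - (1.0 + snr) ** -2) * math.log2(math.e) ** 2
    return q_function((blocklength * capacity - info_bits)
                      / math.sqrt(blocklength * dispersion))

def energy_efficiency_bits_per_joule(power, channel_gain, blocklength,
                                     info_bits, noise_power=1.0):
    """Assumed metric: expected delivered bits per unit transmit energy,
    with one channel use counted as one unit of time."""
    snr = power * channel_gain / noise_power
    eps = block_error_rate(snr, blocklength, info_bits)
    return info_bits * (1.0 - eps) / (power * blocklength)

# Sending exactly n*C bits sits on the capacity boundary, so the
# approximation gives an error probability of one half.
eps_boundary = block_error_rate(3.0, 100, 200)
eps_backoff = block_error_rate(3.0, 100, 160)
eps_longer = block_error_rate(3.0, 200, 160)
```

A longer blocklength lowers the error probability but spends more energy per packet, which is the trade-off a joint power and blocklength allocation has to balance.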
Facial Expression Recognition Model Based on an Improved YOLO12n
HAN Chuang, HUANG Jingyao, LAN Chaofeng
Available online  , doi: 10.11999/JEIT250936
Abstract:
  Objective  Facial Expression Recognition (FER) is a key technology in affective computing and intelligent human–computer interaction. In practical scenarios, recognition performance is often degraded by low resolution, complex illumination, partial occlusion, and class imbalance. Although deep learning-based methods have made substantial progress, lightweight models such as You Only Look Once version 12 nano (YOLO12n) still have limited feature extraction ability and reduced robustness under degraded imaging conditions. To address these limitations, this paper proposes an improved FER model, termed YOLO-FER. The model is designed to enhance feature representation, improve the discrimination of similar expressions, and maintain real-time detection performance in low-quality environments.  Methods  Based on the YOLO12n model, YOLO-FER introduces several targeted improvements. First, a C3k2_star module is constructed by embedding NewStarBlock into the original bottleneck structure. This design enhances high-dimensional nonlinear feature representation and alleviates feature loss during fusion, as shown in Fig. 2 and Fig. 3. Second, Multidimensional Collaborative Attention (MCA) is integrated with the A2C2f module to form A2C2f_MCA. This module performs joint modeling across the channel, height, and width dimensions to capture fine-grained facial features (Fig. 4). Third, a Low Resolution Feature Extractor (LRFE) module is placed at the end of the backbone. It enhances pixel-level feature representation under low-resolution and low-light conditions through dilated convolution and pixel attention (Fig. 5). Finally, Adaptive Threshold Focal Loss (ATFL) is used to dynamically adjust the contributions of easy and hard samples. This function mitigates class imbalance and improves the discrimination of similar expressions. The overall model structure is shown in Fig. 1. Experiments are conducted on the RAF-DB and Low Light Dataset (LLD) datasets. 
Precision (P), recall (R), F1 score, and mAP@0.5 are used as evaluation metrics.  Results and Discussions  Extensive experiments show that YOLO-FER outperforms the baseline YOLO12n and other YOLO-series models. As shown in Table 2, on the RAF-DB dataset, YOLO-FER achieves P=81.8%, R=81.9%, and mAP@0.5=87.6%, with a 3.8% improvement in mAP@0.5 over the baseline. On the LLD dataset (Table 3), YOLO-FER achieves an mAP@0.5 of 95.9%, representing a 5.0% improvement. These results indicate strong robustness under low-light conditions. The ablation studies in Table 2 and Table 3 confirm that each proposed module contributes to performance improvement. C3k2_star, A2C2f_MCA, LRFE, and ATFL all lead to consistent gains in detection accuracy. Their combination achieves the best performance with only a slight increase in parameters. The comparison with other YOLO variants in Table 5 further shows that YOLO-FER achieves a favorable balance between accuracy and model complexity. The mAP@0.5 curves in Fig. 8 show that the proposed model maintains consistent performance gains during training. The confusion matrix analysis in Fig. 9 and Table 4 demonstrates that the MCA module improves the discrimination of similar expressions, such as Angry and Disgust, and reduces misclassification. Grad-CAM visualization results (Fig. 13) indicate that YOLO-FER focuses more accurately on key facial regions, including the eyes, eyebrows, and mouth, than the baseline model. Experiments under degraded conditions (Fig. 14 and Table 13) further show that YOLO-FER maintains higher detection performance than YOLO12n and has a smaller overall performance drop. These findings confirm its robustness in low-quality scenarios. Although the number of parameters increases slightly from 2.5 M to 3.0 M, the inference speed remains competitive (Table 7), indicating that the proposed method retains real-time capability.  Conclusions  This paper proposes YOLO-FER, an improved FER model based on YOLO12n. 
The model improves feature extraction and robustness in low-quality image scenarios. By integrating C3k2_star, MCA, LRFE, and ATFL, YOLO-FER improves recognition performance and generalization ability. Experimental results on the RAF-DB and LLD datasets confirm that the model achieves high detection performance while maintaining efficient inference speed. The proposed method provides a practical solution for real-time FER applications in complex environments. Future work will focus on improving performance under extremely low-resolution conditions and exploring cross-domain generalization and micro-expression recognition.
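The idea of re-weighting easy and hard samples mentioned in this abstract can be seen in the standard focal loss, sketched below for a single sample. This is the plain focal loss, not ATFL: the adaptive-threshold rule that ATFL adds is not described in the abstract and is not reproduced here.

```python
import math

def cross_entropy(p_true):
    """Cross-entropy for one sample, given the predicted probability
    of its true class."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0):
    """Standard focal loss: cross-entropy scaled by (1 - p_true)^gamma,
    so confidently correct (easy) samples contribute less."""
    return ((1.0 - p_true) ** gamma) * cross_entropy(p_true)

# Relative weight kept by an easy sample versus a hard one.
easy_ratio = focal_loss(0.9) / cross_entropy(0.9)   # about 0.01
hard_ratio = focal_loss(0.3) / cross_entropy(0.3)   # about 0.49
```

With `gamma=0` the expression reduces to ordinary cross-entropy, which makes the modulating factor easy to ablate.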
Optimizing SATisfiability-Based Automatic Test Pattern Generation Systems: Unified Fault Set Construction, Modeling, and Solving
YAN Dapeng, HE Qirun, GUO Jing, WANG Boning, CAI Zhikuang
Available online  , doi: 10.11999/JEIT260025
Abstract:
  Objective  Boolean SATisfiability-Based Automatic Test Pattern Generation (SAT-Based ATPG) is widely used to generate tests for hard-to-detect single stuck-at faults and to prove fault untestability in combinational logic. When SAT-Based ATPG is applied to large netlists with dense fanout and reconvergence, its runtime and memory consumption are often dominated by three interacting issues. Representative fault lists produced by conventional dominance- or equivalence-based fault collapsing can remain large, increasing the number of SAT calls and enlarging the incremental context that must be maintained across faults. Meanwhile, SAT modeling may introduce redundant Conjunctive Normal Form (CNF) overhead, especially when an explicit faulty-circuit copy is constructed or when propagation constraints are encoded globally without locality control. In addition, fanout-reconvergence structures amplify assignment correlations along sensitized paths, and such correlations are often exposed only after repeated decisions and backtracking when only standard unit propagation is used. The unified optimization objective is therefore to reduce overall CNF size and solving cost while preserving completeness, so that a practical SAT-Based ATPG system remains efficient and stable across circuits of different scales.  Methods  A three-part framework is developed and implemented in an incremental SAT-Based ATPG flow, and the overall workflow is illustrated (Fig. 1). First, a checkpoint-driven dynamic fault-set construction method is proposed. Checkpoints are collected during netlist-to-directed-acyclic-graph conversion, including all primary inputs and all fanout branches, and XOR/XNOR outputs are additionally recorded as supplementary checkpoints to avoid over-collapsing XOR-related fault behavior. 
Representative faults are initialized on checkpoints by compact rules that combine dominance-oriented fault collapsing with equivalence-aware refinement, and solver-guided repair is performed when an untestable representative fault indicates potential masking under structural constraints. The procedure is summarized in Algorithm 1. Second, a SAT modeling method based on fault sensitization constraints is adopted to avoid explicit faulty-circuit duplication. Fault activation, propagation, and observability are represented by additional fault sensitization constraints over the original circuit variables, and auxiliary variables are introduced only when local bookkeeping is required. Constraint localization is restricted to the fault fanout cone, and cone-boundary and internal vertices are identified through a graph-traversal procedure (Fig. 2). Third, a dynamic implication learning mechanism oriented to fanout-reconvergence pairs is integrated into the incremental solving loop. Reconvergence pairs within the fault fanout cone are monitored under partial assignments, and structure-induced implications are injected either as implied assignments when a reconvergent output becomes functionally determined or as short conflict clauses when a branch-value combination becomes inconsistent with the fault sensitization constraints. The dynamic implication learning procedure is summarized in Algorithm 2.  Results and Discussions  The unified system is evaluated on ISCAS’85 and ISCAS’89 benchmark circuits, with TG-PRO used as the baseline implementation under the same SAT solver and termination settings. The checkpoint-driven dynamic fault-set construction method substantially reduces the representative fault space entering ATPG. Relative to the uncollapsed fault space, the average representative-fault ratio decreases from 51.38% to 42.01%, corresponding to an average fault-space reduction of 57.99%. 
The best-case ratio reaches 33.19% on large circuits with heavy reconvergence, which indicates that checkpoint-centered representative-fault allocation effectively suppresses redundancy without enlarging the untestable fault set (Table 1). The reduced fault-set size is reflected in preprocessing efficiency, and the total runtime for fault-set construction is consistently reduced, with an average reduction of 8.37% across the evaluated circuits (Fig. 3). For SAT model construction, the fault-sensitization-constraint encoding reduces CNF overhead relative to the baseline model construction. Across the benchmark set, the numbers of CNF clauses and CNF variables are reduced by 11.44% and 3.50%, respectively, which shows that avoiding explicit faulty-circuit duplication and localizing auxiliary constraints to the fault fanout cone effectively lowers memory demand (Table 2). The reduced CNF size and strengthened locality of constraints are further reflected in end-to-end runtime, and the total runtime of SAT modeling and solving is reduced across the evaluated benchmarks (Fig. 4). Dynamic implication learning further improves solving efficiency in reconvergence-heavy structures. Compared with static implication learning, CNF construction time increases by 3.0% on average because of the additional monitoring and injection operations, yet the overall runtime decreases by 4.42% on average, which indicates a favorable cost-benefit trade-off. The overhead attributed to dynamic implication learning accounts for 2.51% of the total runtime aggregated across circuits, which confirms that the injected implications and pruning clauses provide measurable solving benefits at limited extra cost (Table 3).  Conclusions  A unified optimization framework for SAT-Based ATPG is developed by combining checkpoint-driven dynamic fault-set construction, localized fault sensitization constraints for CNF modeling, and fanout-reconvergence-oriented dynamic implication learning. 
Representative faults are compressed through solver-guided repair of dominance and equivalence relations to avoid masking, CNF growth is controlled through duplication-free modeling localized to the fault fanout cone, and reconvergence correlations are exploited through incremental implication injection to strengthen propagation and enable early conflict pruning. Experimental results on standard benchmark circuits show consistent reductions in representative fault scale, CNF size, and total runtime, providing a practical approach for scaling SAT-Based ATPG to larger designs with complex fanout and reconvergence.
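The notions of CNF encoding, fault activation, and fault propagation used in this abstract can be shown on a three-input toy circuit. The sketch below is illustrative only: the circuit `y = (a AND b) OR c` is hypothetical, test patterns are found by exhaustive enumeration rather than by a SAT solver, and the good and faulty circuits are compared directly instead of using the paper's duplication-free sensitization constraints.

```python
from itertools import product

def and_gate_cnf(a, b, z):
    """Tseitin clauses for z = a AND b; literals are signed variable
    indices in DIMACS style."""
    return [[-z, a], [-z, b], [z, -a, -b]]

def satisfied(clauses, assignment):
    """True if every clause has at least one literal satisfied by the
    assignment (a dict from variable index to bool)."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def circuit(a, b, c, and_stuck_at_0=False):
    """Toy circuit y = (a AND b) OR c, optionally with the AND output
    stuck at 0."""
    n = 0 if and_stuck_at_0 else (a & b)
    return n | c

def tests_for_fault():
    """Input patterns whose output differs between good and faulty circuit."""
    return [p for p in product((0, 1), repeat=3)
            if circuit(*p) != circuit(*p, and_stuck_at_0=True)]

patterns = tests_for_fault()
```

The only detecting pattern sets `a = b = 1` to activate the fault and `c = 0` to propagate it through the OR gate, which mirrors the activation and propagation conditions that a SAT-based flow encodes as clauses.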
Research on UAV-assisted Dynamic-weight Edge Computing Offloading Strategy
WANG Yijun, WANG Yachu, SHAHD Batool, MIAO Ruixin
Available online  , doi: 10.11999/JEIT260054
Abstract:
  Objective  The increasing demands of the Internet of Things (IoT) for computational resources and real-time processing have highlighted the significance of Mobile Edge Computing (MEC). Traditional MEC relies on terrestrial base stations, resulting in coverage blind spots in remote or specialized environments. Unmanned Aerial Vehicle (UAV)-assisted MEC architectures exploit UAVs’ flexible deployment to expand service coverage. However, existing approaches for multi-terminal, multi-UAV scenarios often fail to optimize task offloading latency, system energy consumption, and adaptability to dynamic environments simultaneously. They also overlook optimal UAV selection when terminal devices are covered by multiple UAVs and lack adaptive mechanisms to adjust optimization objectives during task execution. This study addresses these challenges by integrating cooperative caching, offloading decision-making, and resource allocation strategies.  Methods  A three-tier microcloud-edge-terminal architecture is constructed, comprising a central cloud, multiple UAV edge servers with caching capabilities, and numerous mobile terminal devices. A cooperative caching mechanism reduces transmission delay during task execution. Task offloading adopts a fine-grained partial offloading mode, dividing complex tasks into dependent subtasks modeled through a Directed Acyclic Graph (DAG). The Cooperative Caching-Adaptive Hierarchical MultiVerse Optimizer (CCAH-MVO) algorithm is proposed. A hybrid coding scheme encodes offloading decisions, caching decisions, and resource allocation uniformly. A dynamic weight mechanism adaptively balances delay and energy consumption according to the system’s real-time energy state. Additionally, a UAV selection strategy is implemented for scenarios where terminals are covered by multiple UAVs. By simulating inter-universe material exchange and local refined search, the algorithm efficiently determines the optimal offloading strategy. 
MATLAB simulations validate the method under various experimental settings.  Results and Discussions  The simulation scenario involves 50 randomly distributed terminal devices and 5 UAVs in a 400 m × 400 m area. UAVs are deployed above terminal cluster centers, while terminals at cluster edges are simultaneously within the coverage of multiple UAVs (Fig. 5). The optimal UAV for each terminal is selected using the UAV selection function (Fig. 6), preventing resource bottlenecks and achieving balanced load distribution. In terms of delay performance, the CCAH-MVO algorithm maintains the lowest task delay across all task volumes, with a gradual increase as the number of tasks grows (Fig. 7). Delay under CCAH-MVO is consistently lower than that under fixed-weight strategies across the full task range, demonstrating the effectiveness of the dynamic adaptive mechanism in preserving low latency (Fig. 10). For energy consumption, differences among the algorithms are minor when task quantities are low. Under high task loads, the activation of the dynamic weight mechanism flattens the energy consumption curve (Fig. 8). When the number of tasks reaches 100, total energy consumption under CCAH-MVO is the lowest among all strategies and remains lower than the fixed-weight approach, reflecting effective control under critical energy conditions (Fig. 9). Regarding total system overhead, the CCAH-MVO algorithm consistently achieves the best performance. The gap with fixed-weight strategies widens when task numbers exceed 80, illustrating the dynamic weight mechanism’s collaborative optimization of delay and energy consumption (Fig. 11). Overall, by integrating the dynamic weight mechanism and balancing load through UAV selection, the CCAH-MVO algorithm effectively mitigates resource constraints and high task processing overhead in complex, dynamic UAV-assisted MEC environments. It ensures precise coordination between task delay and energy consumption across different load stages.  
Conclusions  The proposed CCAH-MVO framework, incorporating a microcloud-edge-terminal architecture, cooperative caching mechanism, fine-grained partial offloading, dynamic weight adjustment, and UAV selection strategy, effectively addresses resource scheduling in complex multi-UAV MEC environments. Simulations show adaptive optimization of objectives, intelligent energy management, low latency, and reduced total system overhead, improving service stability and user experience. This research provides a practical solution for efficient UAV edge computing in dynamic environments. Future work will explore dynamic energy efficiency optimization and multi-node collaboration while maintaining low-latency performance.
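The abstract does not state the rule by which the dynamic weight mechanism trades delay against energy. As a minimal sketch only, the following assumes a piecewise-linear rule: above a critical remaining-energy fraction the delay weight is fixed, and below it the weight shifts linearly toward energy. The function names, the `critical` threshold, and the weight bounds are all hypothetical, and delay and energy are assumed to be normalized to comparable scales.

```python
def dynamic_weights(energy_ratio, critical=0.3, w_delay_max=0.7, w_delay_min=0.3):
    """Return (w_delay, w_energy) for a remaining-energy fraction in [0, 1].

    Assumed rule (not taken from the paper): above `critical`, delay keeps a
    fixed high weight; below it, weight moves linearly toward energy saving.
    """
    e = min(max(energy_ratio, 0.0), 1.0)  # clamp to [0, 1]
    if e >= critical:
        w_delay = w_delay_max
    else:
        w_delay = w_delay_min + (w_delay_max - w_delay_min) * (e / critical)
    return w_delay, 1.0 - w_delay


def system_cost(delay, energy, energy_ratio):
    """Weighted sum of (normalized) delay and energy under the assumed rule."""
    w_delay, w_energy = dynamic_weights(energy_ratio)
    return w_delay * delay + w_energy * energy
```

Under this assumed rule, the same task is scored mostly by its delay while energy is plentiful and mostly by its energy consumption once the remaining energy approaches zero.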
Intelligent Protection Method for Personalized Location Privacy in 3D MCS Scenario
MIN Minghui, YE Jun, WEI Xipeng, MIN Bo, LI Shiyin
Available online  , doi: 10.11999/JEIT251237
Abstract:
  Objective  With the widespread adoption of intelligent mobile devices and growing reliance on location-based services, Mobile CrowdSensing (MCS) systems have become a critical infrastructure for urban sensing and smart city applications. In complex 3D environments such as hospitals and shopping malls, real-time user location data uploaded during task execution can be exploited by untrusted servers or external attackers, resulting in severe privacy risks. Existing location privacy protection methods are largely designed for 2D spaces and rely on fixed privacy budgets, lacking adaptability to dynamic user energy states, personalized privacy requirements, and inference attacks. These limitations hinder the simultaneous optimization of location privacy and service quality in 3D MCS systems. This paper proposes a personalized privacy-protection task assignment mechanism that integrates 3D Geo-Indistinguishability (3DGI) and distortion privacy, enabling dynamic optimization of location perturbation strategies and task allocation in complex 3D environments.  Methods  A dynamic 3D MCS system model is established, incorporating user energy states, task execution costs, individual privacy preferences, and attacker Bayesian inference behaviors. A reinforcement learning approach is adopted to learn personalized location perturbation strategies through continuous interaction with the environment. Specifically, a Proximal Policy Optimization (PPO)-based mechanism, PPOM, is proposed. It employs an Actor-Critic architecture to operate in a continuous action space for effective policy learning. A utility-driven reward function integrating user privacy feedback and server profit allows the system to optimize privacy protection and economic benefit simultaneously.  
Results and Discussions  Extensive simulations on synthetic and GeoLife datasets demonstrate that PPOM outperforms 3DGI, 3DGI-PPOM, and LEAPER under Single-user Single-task (S-S) and Single-user Multi-task (S-M) scenarios. PPOM achieves superior 3D location privacy protection through personalized perturbation and two-dimensional action space design. Server net profit is maintained at a level comparable to 3DGI-PPOM while system utility is significantly improved, even under high user privacy preferences. LEAPER underperforms due to its 2D-oriented design. Overall, PPOM dynamically balances personalized privacy protection and server economic benefits in complex 3D MCS environments.  Conclusions  This study presents a reinforcement learning-based mechanism for personalized 3D location privacy protection and task assignment in dynamic MCS systems. Key contributions include: (1) a personalized privacy protection framework integrating 3DGI and distortion privacy, accounting for user energy status, task costs, privacy preferences, and attacker Bayesian inference in real time; (2) a perturbation policy optimization mechanism, PPOM, based on PPO with an Actor-Critic structure, Gaussian sampling, and advantage-based learning to enhance robustness and stability in continuous high-dimensional action spaces; (3) a privacy-aware task assignment model using inferred locations from perturbed data, with a utility function jointly quantifying privacy protection and server profit, achieving dynamic trade-offs between user privacy and service quality under resource constraints.
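For readers unfamiliar with 3D geo-indistinguishability, the baseline noise mechanism can be sketched as follows. This is not the PPOM policy itself, which is learned; it only illustrates a fixed-budget perturbation whose density decays as exp(-epsilon × distance). The function name `perturb_3d` and the use of metres as units are assumptions made for illustration.

```python
import math
import random


def perturb_3d(location, epsilon, rng=random):
    """Perturb a 3D point with noise whose density decays as exp(-epsilon * distance)."""
    # In 3D the radial density is proportional to r^2 * exp(-epsilon * r),
    # which is a Gamma distribution with shape 3 and scale 1 / epsilon.
    radius = rng.gammavariate(3.0, 1.0 / epsilon)
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    gx, gy, gz = (rng.gauss(0.0, 1.0) for _ in range(3))
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    x, y, z = location
    return (x + radius * gx / norm,
            y + radius * gy / norm,
            z + radius * gz / norm)
```

A smaller `epsilon` gives stronger privacy and a larger expected displacement (3 / epsilon under this distribution). A personalized scheme would choose the budget per user and per step rather than fixing it.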
A Radio Frequency Fingerprint Open-set Identification Method Combining Multi-scale Wavelet Front-end and Hyperspherical Metric Learning
TIAN Xinyu, LI Zirui, ZHENG Qinghe, ZHOU Fuhui, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT260214
Abstract:
  Objective  Open-set Radio Frequency Fingerprint (RFF) identification under low Signal-to-Noise Ratio (SNR) conditions is challenging because fingerprint features are easily masked by noise, multipath effects induce nonlinear distortions, and existing methods struggle with feature extraction and unknown device detection. This study proposes a deep learning framework that integrates a multi-scale wavelet front-end with hyperspherical metric learning to achieve robust open-set RFF identification.  Methods  The proposed method, MS-RANet, comprises three key components. First, a multi-scale wavelet front-end based on one-dimensional stationary wavelet transform performs full-resolution, multi-scale decomposition of I/Q signals, preserving discriminative fingerprint information while suppressing noise. Second, a multi-scale residual attention network incorporates deep residual learning, global self-attention, and Bidirectional LSTM (BiLSTM) to enhance sensitivity to subtle fingerprint features and capture long-range temporal dependencies. Third, hyperspherical metric learning constrains the feature space onto a unit hypersphere, optimizing angular margins to produce compact intra-class and separable inter-class feature distributions. Unknown devices are subsequently detected using cosine similarity.  Results and Discussions  Experiments on a high-fidelity IEEE 802.11 simulation dataset demonstrate the effectiveness of MS-RANet. The method achieves an average classification accuracy of 65.34% across SNR levels from –5 dB to 20 dB, and an Area Under the Curve (AUC) of 0.81 at –5 dB SNR, outperforming DNN, GRU, CNN-LSTM, ResNet50, and DRSN-CA. Confusion matrices and Receiver Operating Characteristic (ROC) curves confirm robustness under extreme channel conditions. t-SNE visualization shows well-separated, compact clusters for known devices, while unknown samples are effectively isolated from known class regions. 
Ablation studies verify the contributions of the multi-scale wavelet front-end, global attention, BiLSTM, and hyperspherical metric learning modules.  Conclusions  This study presents a robust open-set RFF identification method combining a multi-scale wavelet front-end with hyperspherical metric learning. The framework exhibits strong noise resilience, enhanced feature discrimination, and reliable detection of unknown devices under low-SNR and multipath fading conditions. Future work will focus on reducing computational complexity, improving inference speed, evaluating generalization across diverse scenarios and protocols, and integrating the method with complementary physical-layer security mechanisms for collaborative authentication.
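The cosine-similarity rejection step described in the Methods can be sketched as below. This is a minimal illustration, assuming each known device is represented by a single prototype vector and that a fixed threshold separates known from unknown devices; the function names, the prototype representation, and the threshold value are hypothetical and not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def classify_open_set(feature, prototypes, threshold=0.8):
    """Return the best-matching known device label, or None for an unknown device."""
    f = l2_normalize(feature)
    best_label, best_sim = None, -1.0
    for label, proto in prototypes.items():
        p = l2_normalize(proto)
        sim = sum(a * b for a, b in zip(f, p))  # cosine similarity of unit vectors
        if sim > best_sim:
            best_label, best_sim = label, sim
    # Reject the sample as unknown when even the closest prototype is too far away.
    return best_label if best_sim >= threshold else None
```

With prototypes `{"dev_a": [1, 0], "dev_b": [0, 1]}`, a feature close to one axis is assigned to that device, while a feature midway between the two falls below the threshold and is reported as unknown.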
Prior-guided Temporal Fusion Method for Multi-UAV Cooperative Obstacle-avoidance Route Planning
WANG Ao, LI Dapeng, XU Yifan, FAN Bingyang, HAN Guang, ZHAO Haitao
Available online  , doi: 10.11999/JEIT251231
Abstract:
  Objective  Traditional multi-agent reinforcement learning methods for multi-Unmanned Aerial Vehicle (UAV) cooperative obstacle-avoidance route planning in cluttered 3D environments often suffer from slow convergence, weak coordination, and limited global awareness under partial observability. To address these limitations, this paper proposes a prior-guided temporal fusion value-decomposition framework, termed Prior-Guided-LSTM-QMIX (PGL-QMIX). The method uses local heuristic scores derived from offline A* reference paths to guide decision-making under partial observability. The aim is to reduce route length, avoid collisions, and preserve real-time planning capability.  Methods  The multi-UAV cooperative obstacle-avoidance route-planning task is formulated as a Partially Observable Markov Decision Process (POMDP). In the offline stage, A* is used to generate a reference path for each UAV. During online execution, only the locally visible path segment is extracted, and heuristic scores are constructed from this local prior information and fused with each UAV’s local observation. An individual-level Long Short-Term Memory (LSTM) network is used to capture temporal dependencies in local perception and prior guidance, whereas a system-level LSTM-based mixing network dynamically generates the mixing weights and bias for value decomposition, thereby enabling coordinated joint action-value estimation. Potential-based reward shaping is further adopted to improve training stability.  Results and Discussions  Simulation results in 3D grid environments show that PGL-QMIX converges faster and more stably than QMIX, VDN, and MAPPO. Compared with QMIX, the proposed method reduces the average route length by 8.8%, 12.3%, and 16.1% in three scenarios, respectively. It also improves convergence speed by 20.5%, 26.6%, and 38.1%, and increases the steady-state task success rate by 5.22, 14.99, and 37.25 percentage points, respectively. 
In addition, the generated trajectories are shorter and more efficient across different map sizes.  Conclusions   PGL-QMIX improves coordination, safety, and route efficiency for multi-UAV cooperative obstacle avoidance in cluttered 3D environments. By integrating heuristic prior guidance, recurrent temporal fusion, and value decomposition, the proposed method achieves faster convergence, higher success rates, and better generalization than existing baselines. Future work will incorporate real UAV dynamic constraints and communication-aware cooperative obstacle avoidance.
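Potential-based reward shaping, mentioned in the Methods, adds the term gamma × Φ(s') − Φ(s) to the environment reward. The sketch below assumes a potential equal to the negative Manhattan distance to the goal on a 3D grid. That potential is an illustrative assumption; the abstract does not specify the one actually used.

```python
def potential(position, goal):
    """Assumed potential: negative Manhattan distance to the goal on a 3D grid."""
    return -sum(abs(p - g) for p, g in zip(position, goal))


def shaped_reward(reward, state, next_state, goal, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return reward + gamma * potential(next_state, goal) - potential(state, goal)
```

Because the shaping term is a difference of potentials, a step toward the goal earns a bonus and a step away earns an equal penalty (for gamma = 1), which is why this form of shaping guides exploration without rewarding cycles.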
A Quantum-resistant Threshold Signature Scheme for Database Audit Logs
CHEN Dajiang, ZHANG Yiwen, JIAO Lihua, WANG Baizheng, CHEN Ruidong
Available online  , doi: 10.11999/JEIT251320
Abstract:
  Objective  Database audit logs are a core basis for ensuring data integrity, accountability, and traceability in distributed systems. However, current audit-log protection mechanisms still rely on classical public-key signature algorithms such as RSA and ECDSA, which are vulnerable to quantum attacks. Shor’s algorithm can break integer-factorization- and discrete-logarithm-based cryptography in polynomial time, while Grover’s algorithm reduces the brute-force security of hash-based and symmetric primitives. These threats weaken the long-term reliability of existing database audit-log protection mechanisms in cloud and data-intensive environments. To address this issue, a quantum-resistant framework for database audit logs is proposed to satisfy practical requirements for efficiency, real-time verification, scalable deployment, and distributed trust management. The goal is to provide a robust cryptographic foundation for next-generation database audit-log systems with unforgeability and tamper resistance under quantum threats.  Methods  A hybrid hash-based signature layer is constructed by combining the Forest of Random Subsets (FORS) few-time signature scheme and eXtended Merkle Signature Scheme-Tree (XMSS-T). FORS supports efficient signing for high-frequency log events, whereas XMSS-T organizes authentication paths in a Merkle-tree hierarchy for scalable state management. This combination yields a multi-level quantum-resistant signing structure. A Shamir (r, n) threshold secret-sharing mechanism is then adopted to split the signing key into multiple shares managed by independent audit agents. This design avoids a single point of failure, supports collaborative attestation, and ensures that no single party holds complete signing authority. In addition, a chained-hash structure is used to bind consecutive log entries through one-way linkage, thereby ensuring tamper evidence and chronological integrity. 
The framework further defines a complete set of system algorithms, including setup, key distribution, partial-signature generation, signature aggregation, log-chain update, and verification, all of which operate efficiently in a distributed setting. For formal security analysis, the scheme is modeled in the Quantum Random Oracle Model (QROM), and adversarial capabilities are characterized through UF-CMA, IND-CCA2, and IND-CKA2 games to capture forgery, decryption misuse, and index-indistinguishability attacks. A prototype implementation is developed and evaluated under realistic multi-node settings across different log scales, message sizes, interval configurations, and threshold ratios.  Results and Discussions  Experimental results show that the proposed scheme achieves a good balance between quantum-resistant security and system performance. For large-scale logs, the average signing latency increases linearly with log volume, which supports the efficiency of the chained-hash structure (Table 2). Compared with representative quantum-resistant signatures such as Dilithium and SPHINCS+, the threshold-signing design reduces the peak computational burden on individual nodes while preserving strong security guarantees. The system also maintains a stable throughput of about 2 000 operations per second. The message-size analysis shows that latency increases with message size but remains manageable even when the message exceeds 4 kB (Fig. 2(b)). Additionally, variation in the threshold ratio (r/n) has a measurable but moderate effect on system latency. A higher threshold improves resistance to collusion, but slightly increases delay (Fig. 2(e)). The interval-based chained-signing strategy further reduces the signing frequency and improves throughput without weakening log-integrity guarantees. These results indicate that the proposed scheme is well suited to cloud-based and distributed database environments that require real-time auditing and high-volume log processing.  
Conclusions  A quantum-resistant mechanism for database audit logs is presented by integrating hash-based signatures, threshold secret sharing, and chained log-integrity protection. The scheme provides strong quantum-resistant security guarantees, including provable unforgeability, confidentiality, and tamper resistance, supported by formal proofs in the QROM. Experimental results show that the mechanism maintains high signing and verification efficiency under large-scale deployment, with good scalability across different log volumes, message sizes, and threshold settings. Owing to its distributed trust model and quantum-resistant cryptographic basis, the proposed scheme offers a practical and secure solution for next-generation database audit systems in cloud computing, big-data processing, and compliance-critical environments.
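Two of the building blocks named above, Shamir (r, n) secret sharing and the chained-hash log structure, can be sketched independently of the signature layer. The code below is a minimal illustration under stated assumptions: the secret is a plain integer rather than a FORS/XMSS-T signing key, the field prime and SHA-256 are illustrative choices, and no threshold signing is performed. All function names are hypothetical.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # Mersenne prime; the field size is an illustrative choice


def split_secret(secret, r, n):
    """Split `secret` (an integer below PRIME) into n shares; any r reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(r - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def chain_logs(entries, genesis=b"\x00" * 32):
    """Link each entry to its predecessor: h_i = SHA-256(h_{i-1} || entry_i)."""
    digests, prev = [], genesis
    for entry in entries:
        prev = hashlib.sha256(prev + entry).digest()
        digests.append(prev)
    return digests
```

Any r of the n shares recover the secret, while fewer reveal nothing about it. In the chain, modifying one log entry changes its digest and every later digest, which is what makes tampering evident.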
A Study of the Effects of Amplitude and Phase Errors on the Angle-Measurement Accuracy of Phased Array Radar under Interference Cancellation Conditions
ZHAN Siheng, ZHOU Liang, SHEN Ruobin, ZHANG Jiahao, WANG Bin, MENG Jin
Available online  , doi: 10.11999/JEIT251195
Abstract:
  Objective  The electromagnetic environment is becoming increasingly complex, and mainlobe interference constrains the detection performance of phased array radars. Adaptive interference cancellation (AIC) can effectively suppress such interference but distorts the mainlobe pattern and introduces azimuth angle measurement errors. Most existing studies focus on interference cancellation mechanisms, with little attention paid to the angle measurement errors introduced by this technique. Amplitude-phase errors in the radar receive channels also degrade angle measurement accuracy. This paper investigates the influence of these amplitude-phase errors on the angle measurement errors of monopulse phased array radars without a difference-difference channel.  Methods  A monopulse phased array radar without a difference-difference channel is studied, and the amplitude-phase errors in the receiving channels are modeled as normally distributed: the mean represents the systematic offset, and the standard deviation represents random fluctuations. The operating principles of phased array radar receivers, monopulse radar systems, angle measurement theory, and mainlobe interference suppression and cancellation theory are introduced. Two angle measurement models are established through theoretical derivation: an ideal reference model and an amplitude-phase error model. Simulation results show that the radar’s effective angle measurement range is ±2.5° under ideal interference-free and error-free conditions. The jamming source is set at –1.2°, and the angle measurement results are taken as a reference for subsequent experiments. Monte Carlo simulations (100 independent trials for each parameter set) are used to analyze the statistical characteristics of angle measurement errors. Heatmaps are used to show the absolute errors and reveal how they vary.  
Results and Discussions  (1) Without channel amplitude-phase errors and with the jamming angle fixed at –1.2°, the measured target bearing matches the true value before interference cancellation. After cancellation, the absolute error between the measured target angle and the true value near the beam normal is no more than 0.1°, but null dips near the jamming angle cause abrupt changes in the measured azimuth angle, and the error increases with the deviation from the beam normal. (2) Before cancellation, the azimuth angle measurement error increases with the absolute value of the amplitude mean and the incident angle, peaking above approximately 0.06° at an amplitude mean of ±0.9 dB and an incident angle of ±2.5°. Within an incident angle range of ±2°, the error is typically <0.02°. When the amplitude mean is fixed, the error increases with the amplitude standard deviation. When the phase standard deviation is fixed, the error increases with the absolute value of the phase mean, exceeding 0.15° at a phase mean of ±0.9°; it reaches approximately 0.6° at a phase standard deviation of 6° and an incident angle of ±2.5°. (3) After cancellation, the measurement is most sensitive to phase error at an incident angle of 0.5°, where the azimuth angle measurement error reaches 0.4°. Outside this region, the error can be controlled within 0.2° and decreases rapidly as the deviation from the beam normal increases.  Conclusions  This paper quantifies the impact of amplitude-phase errors in the receiving channel on azimuth angle measurement errors before and after interference cancellation. 
The main conclusions are as follows: (1) Both amplitude and phase errors cause random fluctuations in azimuth angle measurements, with phase errors having a more significant impact; (2) In the absence of jamming, azimuth angle measurement errors are smallest near the beam axis and increase as the measurement approaches the boundaries of the effective angle measurement range; (3) In the presence of jamming and during cancellation, the azimuth angle measurement error peaks near the beam normal and decays rapidly. This study provides engineering guidance for azimuth angle measurement error assessment, error budgeting, and mainlobe interference suppression. Future research will focus on non-normal amplitude-phase errors, calibration dynamics, scenarios with multiple jamming sources, and experimental validation.
A Survey of Processor Security
CHEN Congcong, GU Zhiyang, ZHANG Jiliang
Available online  , doi: 10.11999/JEIT260026
Abstract:
  Significance   Processor security is a cornerstone of modern information security. Cryptographic algorithms, operating systems, and applications have long relied on processors as trusted computing bases. However, as Moore’s Law slows, modern processors increasingly adopt aggressive microarchitectural optimization techniques to improve performance and energy efficiency, often without sufficient security consideration. This trend has led to frequent security vulnerabilities in recent years. In particular, microarchitectural timing channels, exemplified by Meltdown and Spectre, exploit timing differences caused by microarchitectural state changes to break fundamental hardware and software isolation, affecting billions of devices worldwide. At the same time, the boundary between architectural and microarchitectural behavior has become less clear, giving rise to new attack paradigms and turning timing channels from isolated hardware flaws into cross-layer system security problems.  Progress   Although substantial progress has been made in the study of timing channels, existing surveys still have several limitations. First, the mechanisms of timing channels are highly diverse, and the set of exploitable components continues to grow. Hardware-centric classification schemes are therefore insufficient to capture emerging and previously unknown attacks, and they often obscure the common features shared across different techniques. Second, as traditional microarchitectural channels become better understood and partially mitigated, leakage increasingly shifts to higher-level shared resources, including operating system policies and software-managed shared resources. However, previous studies have often treated software mainly as an execution context rather than a direct source of timing leakage. In addition, current discussions of defenses tend to emphasize individual techniques, with limited analysis of their scope and failure modes.  
Contributions   This survey systematically reviews timing channels from a cross-layer perspective and unifies hardware- and software-based timing channels under a common abstraction. Four necessary conditions for timing channel exploitation are identified, and a unified classification framework is established based on the nature of shared mutable state and the mechanisms that make timing differences observable. Within this framework, representative attacks from the past decade are comprehensively reviewed, their attack procedures are systematically analyzed, and their common features are clarified. In addition, existing defense mechanisms are classified according to the leakage conditions they are intended to disrupt, and their scope and possible failure modes are examined. This survey also reviews current automated vulnerability detection methods.  Prospects   Future research on timing channels faces several emerging challenges. New microarchitectural optimization techniques continue to create new attack surfaces, while resource sharing at the software level may produce additional forms of timing leakage. Moreover, emerging platforms, including chiplet-based architectures, cloud computing environments, hardware accelerators, and heterogeneous systems, are likely to expose new types of timing channels that require systematic study.
Shallow-Water Geoacoustic Parameter Inversion Using Stokes Parameters and an Attention-Enhanced Multi-Task U-Net
HUANG Qianzhuo, LI Xiaoman, BI Xuejie, ZHANG Zishi, TONG Han, LI Fei
Available online  , doi: 10.11999/JEIT251085
Abstract:
  Objective  Geoacoustic parameters in shallow water are critical for characterizing underwater acoustic propagation. Traditional inversion methods, however, are limited by high computational complexity, high cost, and strong dependence on the accuracy of environmental models. To address these issues, an efficient and robust inversion method is proposed to improve the reliability and stability of shallow-water geoacoustic parameter estimation while preserving computational efficiency.  Methods  This method is developed from the Stokes parameters of the vector acoustic field. Signals received by a single vector hydrophone are processed with a warping transform to separate and extract the normal modes propagating in a shallow-water waveguide. The extracted signals are then used to calculate the Stokes parameters, which are normalized and used as input features for the inversion model. An attention-enhanced multi-task U-Net is constructed with a shared encoder and multiple prediction branches to estimate key geoacoustic parameters, including compressional wave velocity, shear wave velocity, density, compressional wave attenuation, and shear wave attenuation. In addition, channel attention and spatial attention, together with a multi-task loss function with uncertainty weighting, are used to improve feature extraction and adaptively balance the different parameter inversion tasks.  Results and Discussions  The attention mechanism is shown to suppress fluctuations in model predictions and to improve the accuracy and stability of geoacoustic parameter inversion. When 200 test samples are evaluated, the mean absolute percentage errors of both compressional wave velocity and seabed density remain below 5% (Table 3). After the attention mechanism is introduced, the errors in compressional wave velocity and seabed density are further reduced to below 3% (Table 5), which indicates improved prediction accuracy for these key parameters. 
The proposed method is also shown to be insensitive to parameter mismatch and to have strong robustness to environmental variation. Furthermore, the method is validated with measured data from a shallow-water region in the northern South China Sea, and its effectiveness and reliability in practical applications are confirmed (Table 6 and Fig. 9). These results show that the attention-enhanced multi-task U-Net effectively captures critical features from the Stokes parameters and yields more stable and accurate geoacoustic parameter estimation in shallow-water environments.  Conclusions  The inversion method based on the Stokes parameters and an attention-enhanced multi-task U-Net effectively improves the accuracy and stability of shallow-water geoacoustic parameter estimation and shows strong performance in the prediction of compressional wave velocity, shear wave velocity, and density. However, limitations remain in the inversion of seabed attenuation. Future work should focus on improving feature extraction methods and network architecture and on testing the applicability of the method under more complex marine conditions.
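The abstract mentions a multi-task loss with uncertainty weighting but does not give its form. One widely used form combines the per-task losses L_i as the sum of exp(-s_i) × L_i + s_i, where s_i is a learnable log-variance for task i. The sketch below shows only this arithmetic in plain Python and assumes that form; in an actual network the s_i would be trainable parameters, and the paper's exact loss may differ.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2)."""
    if len(task_losses) != len(log_vars):
        raise ValueError("one log-variance per task is required")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))
```

For a single task, the expression exp(-s) × L + s is minimized at s = log(L), so a task with a persistently large loss (attenuation, for instance) is automatically down-weighted relative to better-behaved tasks such as compressional wave velocity.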
Index Modulation Design with Sparse Spatial Constellation and Dynamic Multi-RIS Block Selection for RIS-MIMO Systems
HUANG Fuchun, ZHU Han, TANG Xiaoqing, YANG Fan, HUANG Jie
Available online  , doi: 10.11999/JEIT251289
Abstract:
  Objective  This paper aims to address two main challenges in RIS-assisted MIMO index modulation (IM) systems: (1) the practical deployment difficulty of using a single large-scale RIS panel, and (2) the high complexity of designing efficient transmit spatial signal vectors. To overcome these issues, this paper proposes a joint design of sparse spatial constellation and dynamic multi-RIS block selection to enhance spectral efficiency, bit error rate (BER) performance, and deployment flexibility.  Methods  Inspired by the extended space index modulation (ESIM) paradigm, a new design of sparse spatial constellation with two active antennas (SCTA) is proposed, which leads to the SCTA-RIS-SM system. The idea is to mix primary and secondary PAM constellations to form a spatial constellation vector [x1, x2]^T that is modulated onto two active antennas. This design not only maximizes the minimum Euclidean distance between transmit vectors but also significantly enhances the anti-interference capability. To circumvent the deployment difficulties of a single large RIS panel, an enhanced scheme, SCTA-MBRIS-SM, is further proposed. This system employs a distributed array of multiple small RIS blocks and dynamically selects a subset of blocks for cooperative reflection, treating different “RIS block selection combinations” as a new index modulation dimension. Finally, theoretical analysis of spectral efficiency and average bit error rate is carried out, and Monte Carlo simulations are conducted to compare the proposed systems with several existing schemes.  Results and Discussions  Simulation results demonstrate that the proposed SCTA-RIS-SM system achieves notable signal-to-noise ratio (SNR) gains over RIS-SIM, RIS-SM, and DH RIS-SM systems under the same spectral efficiency (e.g., 10–12 bits/s/Hz) in near-field wideband scenarios. For instance, at BER = 10^-3, SCTA-RIS-SM outperforms RIS-SIM by about 1.5–2.5 dB and DH RIS-SM by more than 6 dB. 
Furthermore, the SCTA-MBRIS-SM system, by exploiting additional index modulation from RIS block selection, further improves the BER performance and spectral efficiency compared to SCTA-RIS-SM without increasing the number of radio frequency chains. With the total number of reflecting elements kept identical, the proposed multi-block scheme achieves up to 5 dB gain over RIS-SIM at BER = 10^-3. Theoretical BER curves match well with simulation results in the high SNR region, validating the analytical derivations. The results also show that the performance advantage is maintained as the number of transmit antennas increases, and the system exhibits good compatibility with channel coding.  Conclusions  This paper addresses the challenges of large-scale RIS deployment and high-complexity spatial signal design in RIS-assisted MIMO systems. The proposed sparse spatial constellation with two active antennas optimizes the Euclidean distance distribution in the signal space, effectively improving system reliability. The introduction of dynamic multi-RIS block selection transforms hardware deployment constraints into a new dimension for spectral efficiency enhancement, offering a feasible path for practical large-scale RIS applications. Simulation results confirm that jointly optimizing the transmit spatial vector and the degrees of freedom of RIS reflections is an effective strategy for performance improvement. Future work will focus on robustness under imperfect channel state information, construction of higher-dimensional sparse constellations, extension to extremely large-scale MIMO scenarios, and multi-user communications.
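To illustrate how RIS block selection can carry information, the sketch below counts the index bits available when a fixed number of blocks is activated out of the total. It follows the usual index-modulation convention of using only a power-of-two number of selection patterns. The function name and the assumption of a fixed number of active blocks are hypothetical; the abstract does not state the paper's exact mapping or its full spectral-efficiency expression.

```python
import math


def index_bits(num_blocks, num_active):
    """Bits conveyed by choosing which `num_active` of `num_blocks` RIS blocks reflect."""
    if not 0 < num_active <= num_blocks:
        raise ValueError("need 0 < num_active <= num_blocks")
    combos = math.comb(num_blocks, num_active)
    # Only a power-of-two subset of the combinations can be mapped to bit patterns,
    # so the bit count is floor(log2(combos)), computed exactly on integers.
    return combos.bit_length() - 1
```

For example, `index_bits(4, 2)` gives 2, because C(4, 2) = 6 selection patterns exist and the largest usable power of two is 4. These bits come on top of whatever the transmit-side constellation conveys, without extra radio frequency chains.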
Semantic-guided Unified Multi-scale Deep Unrolling Network for Pansharpening
CHEN Junjie, WANG Tingting, FANG Faming, ZHANG Guixu
Available online  , doi: 10.11999/JEIT251252
Abstract:
  Objective  With the rapid advancement of satellite imaging technologies, the demand for high-resolution multispectral remote sensing imagery has grown substantially across a wide range of applications. Due to the wide variety of satellite platforms, there exists a significant domain shift across datasets collected from different satellites. As a result, most existing deep learning (DL)-based pansharpening methods are trained individually for each satellite dataset, and consequently exhibit limited generalization capability across different satellites. To address these limitations, this study proposes a Semantic-guided Unified Multi-scale Deep Unrolling Network (SUM-DUN), which is designed based on classical optimization theory, adopting a 3D multi-scale deep unfolding architecture for integrated feature extraction and fusion. Leveraging multimodal large language models (MLLMs), the proposed method derives semantic textual prompts from the input images, which direct the model to adaptively adjust its feature representations and thereby enhance fusion quality. The proposed method aims to achieve unified remote sensing image fusion through tailored network architecture and prompt-guided mechanisms, thereby providing reliable support for high-level image interpretation tasks.  Methods  Following the Maximum A Posteriori(MAP) estimation principle, the optimization process for HRMS recovery is unfolded into the proposed SUM-DUN(Fig. 1). Each iteration stage of SUM-DUN consists of two main modules: a Gradient Descent Module (GDM) and a Semantic-guided Proximal Mapping Network (SPMN), which are used to approximate the operations in Eq. (5) and Eq. (6), respectively. GDM performs a gradient descent update based on the current feature estimate and the degradation model. The SPMN, implemented with a Transformer-based architecture as illustrated in Fig. 2(b), incorporates semantic textual prompts generated from the input image pair by MLLMs. 
These prompts guide the network to adaptively select appropriate feature propagation strategies for the current pair, helping suppress noise and mitigate discrepancies across different satellite sensors. Moreover, leveraging upsampling and downsampling operations, the network transmits MS and PAN features between iterative stages, thereby progressively preserving and enhancing multi-scale spatial and spectral information throughout the unfolding process.  Results and Discussions  To demonstrate the effectiveness of the proposed method, we compare it against seven representative baselines, including two traditional methods (BDSD and PRACS) and five DL-based methods (AWFLN, FusionMamba, PanMamba, WFANet, and TMDiff). For the reduced-resolution evaluation, where ground-truth HRMS images are available, we adopt several widely used reference-based metrics, including Spectral Angle Mapper (SAM), Spatial Correlation Coefficient (SCC), Peak Signal-to-Noise Ratio (PSNR), Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS), Averaged Universal Image Quality Index (QAVE), and the Universal Image Quality Index for 4-band and 8-band images. These metrics jointly evaluate spectral fidelity, spatial consistency, and overall image quality. For the full-resolution evaluation, where ground-truth HRMS images are unavailable, we rely on no-reference quality indices. Specifically, we employ the Hybrid Quality with No Reference (HQNR) metric, along with its spectral distortion component and spatial distortion component, to assess the fusion quality in real-world scenarios. Quantitative evaluations on the GF-1, QB, WV-2, and WV-4 test datasets demonstrate that the proposed method consistently achieves either the best or second-best performance across all metrics, under both reduced-resolution and full-resolution settings (Tables 2–3). 
These results clearly indicate that the proposed method is capable of simultaneously preserving spectral fidelity and spatial consistency, while maintaining robust performance across different satellites and remaining effective in more challenging scenarios. The ablation studies validate the effectiveness of the 3D architecture, the multi-scale network design, and the spatial–channel prompt guidance mechanism, as removing or altering any of these components leads to varying degrees of performance degradation (Tables 4–5).  Conclusions  This study proposes a semantic-guided unified multi-scale deep unfolding method for pansharpening, which leverages semantic prompts generated by an MLLM to facilitate efficient and unified fusion of images from different satellites. The proposed approach is built upon a deep unfolding framework and employs a 3D convolutional architecture to accommodate varying numbers of spectral bands across satellite datasets. The multi-scale network design is further incorporated to extract spatial and spectral features at different levels, thereby enhancing the fusion capability. In addition, the semantic prompt integration module is introduced to adaptively route spatial and channel features based on the extracted semantic information, enabling more effective feature propagation and improving both spatial detail reconstruction and spectral consistency. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance in terms of both visual quality and quantitative evaluation metrics.
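As an illustration of two of the reference-based metrics named in the abstract above, the following minimal sketch computes SAM and PSNR from their standard definitions. It is not the authors' evaluation code: the function names, the representation of an image as a list of per-pixel spectra, and the `peak` default of 1.0 are assumptions made for this example.

```python
import math


def spectral_angle_mapper(reference, fused):
    """Mean spectral angle (degrees) between two images.

    Each image is a list of per-pixel spectra (one value per band).
    This layout is an assumption made for the sketch.
    """
    angles = []
    for ref_px, fus_px in zip(reference, fused):
        dot = sum(r * f for r, f in zip(ref_px, fus_px))
        norm = math.sqrt(sum(r * r for r in ref_px)) * math.sqrt(
            sum(f * f for f in fus_px)
        )
        if norm == 0.0:
            continue  # an all-zero spectrum has no defined angle; skip it
        cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
        angles.append(math.degrees(math.acos(cosine)))
    return sum(angles) / len(angles) if angles else 0.0


def psnr(reference, fused, peak=1.0):
    """Peak signal-to-noise ratio (dB) over all pixels and bands."""
    squared_errors = [
        (r - f) ** 2
        for ref_px, fus_px in zip(reference, fused)
        for r, f in zip(ref_px, fus_px)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

A lower SAM indicates better spectral fidelity, while a higher PSNR indicates a closer match to the reference image. SAM is insensitive to a uniform scaling of a spectrum, which is why it is usually reported together with intensity-sensitive metrics.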
A Spatiotemporal Coupling Traffic Flow Prediction Model with Dynamic Graph Recursion and State Space
ZHANG Hong, QI Fangzheng, LUO Shengjun, ZHANG Xijun, HOU Liang, HUANG Hairong
Available online  , doi: 10.11999/JEIT251198
Abstract:
  Objective  Accurate traffic flow prediction is crucial for intelligent transportation systems, but it remains challenging due to dynamically evolving spatial dependencies and long-range temporal correlations in urban road networks. To address these issues, this study proposes DGGRU-Mamba, a spatiotemporal traffic forecasting framework that integrates dynamic graph recurrent modeling with a structured state space mechanism and jointly captures adaptive spatial structures and long-term temporal dynamics.  Methods  The proposed DGGRU-Mamba consists of two core modules: Dynamic Graph Recurrent Modeling (DGRM) and Spatiotemporal Mamba (ST-Mamba). A spatiotemporal embedding generator is introduced to encode periodic temporal information and node-specific spatial features for adaptive graph construction. The DGRM module dynamically updates time-varying adjacency structures through gated graph recurrent units, enabling adaptive modeling of evolving spatial dependencies, while the ST-Mamba module employs structured state transitions to efficiently capture long-range temporal correlations. In addition, a dual-branch prediction scheme, including Forecast and Backcast branches, is adopted to improve multi-step prediction accuracy and alleviate cumulative errors.  Results and Discussions  DGGRU-Mamba is evaluated on four benchmark datasets, PEMS03, PEMS04, PEMS07, and PEMS08, using MAE, RMSE, and MAPE as evaluation metrics. Experimental results show that the proposed model achieves competitive performance across all datasets. On PEMS04, compared with the mainstream attention-based model STAEformer, DGGRU-Mamba reduces MAE, RMSE, and MAPE by about 4.2%, 3.8%, and 2.9%, respectively, while shortening the inference time by 4.82 s. These results indicate that the proposed framework improves prediction accuracy while maintaining high computational efficiency. 
The gains mainly stem from the complementary effects of DGRM and ST-Mamba, which enhance dynamic spatial dependency modeling and long-range temporal learning at lower computational cost.  Conclusions  A novel spatiotemporal traffic flow prediction framework, DGGRU-Mamba, is proposed for modeling dynamic spatial structures and long-term temporal dependencies in complex traffic networks. By integrating dynamic graph recurrent modeling with a structured state space mechanism, the framework achieves a favorable balance between prediction accuracy and computational efficiency. Extensive experiments on multiple benchmark datasets verify its effectiveness and scalability for multi-step traffic forecasting. Future work will consider external factors such as weather and traffic events to further improve practical applicability.
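For reference, the three evaluation metrics used in the abstract above (MAE, RMSE, and MAPE) can be sketched from their standard definitions as follows. The function names are illustrative, and skipping zero-valued ground truth in MAPE is an assumption of this sketch, not a detail taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Samples whose ground truth is zero are skipped (an assumption of
    this sketch) because the ratio is undefined for them.
    """
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(ratios) / len(ratios)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error metric versus `baseline`."""
    return 100.0 * (baseline - proposed) / baseline
```

The helper `relative_reduction` shows how a statement such as "reduces MAE by about 4.2%" is typically computed from two absolute error values.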
Secure Multi-Task Federated Panoptic Perception Algorithm for Connected Autonomous Vehicles
HUANG Xiaoge, CHEN Ming, TANG Yi, LIANG Chengchao, CHEN Qianbin
Available online  , doi: 10.11999/JEIT250749
Abstract:
With the rapid development of vehicular networks and deep learning, Connected Autonomous Vehicles (CAVs) are now capable of collecting image data from driving scenarios and leveraging Convolutional Neural Networks for feature extraction and processing, thereby enabling efficient perception of their surroundings. However, due to the inherent complexity of driving scenarios, single-task models struggle to address various perception demands. Moreover, the performance of deep learning models relies heavily on large-scale data, while the data collected by individual vehicles is insufficient for training models with generalization capabilities. Federated learning overcomes data silos by enabling CAVs to upload local model gradients instead of raw data to a central server for aggregation, which preserves data privacy. Therefore, we present a Secure Multi-Task Federated Panoptic Perception algorithm for vehicular network scenarios. First, a panoptic perception model is constructed to allow CAVs to execute multiple perception tasks simultaneously. Second, a CAV selection strategy based on hybrid scoring is designed to select high-quality local models from vehicles. Finally, a global model aggregation scheme based on Shamir secret sharing is introduced, which applies secret sharing during aggregation to prevent data leakage in the event of server attacks or outages. Simulation results validate the effectiveness of the proposed algorithm.
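The Shamir secret sharing primitive mentioned in the abstract above can be sketched as follows. This is a generic textbook (t, n) scheme over a prime field, not the aggregation protocol of the paper. The field modulus, the function names, and the idea of representing a model update as an integer are assumptions made for this illustration.

```python
import random

# Field modulus: the Mersenne prime 2**61 - 1 (a choice made for this sketch).
PRIME = 2**61 - 1


def split_secret(secret, threshold, num_shares, rng=None):
    """Split `secret` into `num_shares` shares; any `threshold` of them recover it."""
    rng = rng or random.SystemRandom()
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def add_shares(shares_a, shares_b):
    """Add two share lists point-wise; the result shares the sum of the secrets."""
    return [(xa, (ya + yb) % PRIME) for (xa, ya), (_, yb) in zip(shares_a, shares_b)]
```

Because the scheme is additively homomorphic, shares of several quantized updates can be summed share by share, and only the aggregate is reconstructed. This property is what makes the primitive a natural fit for secure aggregation, although how the paper applies it is not specified in the abstract.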
Recent Advances in Remote Sensing Image-Text Retrieval Driven by Vision–Language Foundation Models
WU Hui, ZHAO Yan, ZHANG Peirong, HOU Yingyan, QI Xiyu, WANG Lei
Available online  , doi: 10.11999/JEIT260189
Abstract:
  Significance   Remote sensing image–text retrieval (RS-TIR) connects massive Earth observation imagery with natural-language queries and has become an important interface for geospatial intelligence systems. Compared with conventional content-based retrieval, RS-TIR enables users to search scenes, objects, spatial layouts, and functional regions through semantic descriptions instead of handcrafted visual cues. This capability is increasingly needed in natural resource monitoring, urban governance, disaster response, environmental assessment, and on-demand retrieval from rapidly growing satellite archives. However, the task remains fundamentally challenging because remote sensing imagery is captured from a nadir or near-nadir perspective, exhibits strong rotation invariance, contains extreme scale variation from tiny vehicles to large airports, and often involves domain-specific semantic descriptions such as land-use attributes, spatial distributions, and geoscientific relations. Meanwhile, the amount of high-quality image–text annotation is still limited relative to the scale of remote sensing data. These properties enlarge the semantic gap between images and language and constrain the generalization ability of traditional cross-modal retrieval methods. Against this background, the review focuses on how vision–language foundation models (VLMs) reshape RS-TIR by introducing large-scale contrastive pre-training, stronger transferable representations, and more flexible multimodal interaction mechanisms. The review also clarifies why remote sensing adaptation is necessary and why a dedicated synthesis of architectures, datasets, alignment mechanisms, and future directions is timely for the field.  Progress   The technical development of RS-TIR is organized from three complementary perspectives. 
First, the review summarizes the domain-specific challenges that shape the task, including visually isotropic topology with extreme scale variation, professionalized and fine-grained textual semantics, and the compounded semantic gap between overhead imagery and natural-language descriptions (Fig.3). The overall survey structure is then outlined to show the logical progression from task formulation to future challenges (Fig.1). From the methodological timeline, RS-TIR evolves from handcrafted visual descriptors and shallow semantic mapping to deep representation learning, and then to VLM-driven paradigms with broader generalization and zero-shot transfer ability (Fig.4, Table 2). Early methods rely on color, texture, shape, and hash-based retrieval, but they struggle to model high-level geospatial semantics and complex scene composition. Deep learning methods improve retrieval by learning joint embedding spaces, adopting dual-encoder or interaction-based architectures, and introducing multi-scale feature fusion and region-aware matching. These methods substantially enhance semantic consistency, yet they still depend heavily on labeled data and often suffer from limited robustness in open or cross-sensor scenarios. Second, the review summarizes the benchmark ecosystem used to evaluate these methods. Representative datasets span small-scale test sets such as Sydney-Caption and UCM-Caption, mainstream benchmarks such as RSICD and RSITMD, and recent large-scale training resources such as RS5M and SkyScript (Table 1). These datasets reveal a clear transition from small manually annotated corpora to web-scale or automatically generated image–text pairs, which in turn supports domain pre-training and larger model adaptation. Third, the review analyzes the core VLM techniques now driving progress in RS-TIR. 
The model spectrum and representative architecture families, including contrastive dual-encoder models, multimodal interaction models, and remote sensing foundation models integrated with large language models, are summarized systematically (Fig.5, Fig.6, Table 3). Domain adaptation routes are further grouped into continued remote sensing pre-training, parameter-efficient transfer learning, adapter-based tuning, prompt learning, and instruction tuning. At the semantic alignment level, the review emphasizes contrastive joint embedding, fine-grained multi-scale alignment, and the incorporation of remote sensing priors such as spatial topology and geolocation. Performance comparisons on RSICD and RSITMD show that the introduction of remote sensing VLMs, especially RemoteCLIP, GeoRSCLIP, iEBAKER, and LRSCLIP, leads to consistent gains in mean Recall and overall retrieval robustness (Table 4). In parallel, the review also tracks the extension of retrieval capability into unified multi-task remote sensing models, where retrieval, grounding, segmentation, and reasoning begin to share a common multimodal representation space.  Conclusions  Several conclusions are drawn from the comparative analysis. First, VLMs establish a new dominant paradigm for RS-TIR because they significantly narrow the cross-modal semantic gap while improving transferability across datasets and scenes. Second, there is no universally optimal architecture: dual-encoder models remain attractive for large-scale retrieval because of their efficiency, whereas interaction-based or instruction-enhanced models offer finer semantic alignment at higher computational cost. Third, domain adaptation is indispensable. 
Continued pre-training on remote sensing image–text corpora, parameter-efficient tuning, and prompt-based adaptation consistently outperform direct reuse of Internet-trained VLMs, indicating that remote sensing imagery differs too strongly from natural-image distributions to rely on generic pre-training alone. Fourth, the most effective recent methods do not improve performance through scale alone; they also exploit remote sensing-specific information, including multi-scale structures, foreground entities, explicit keyword reasoning, and spatial priors. Finally, the review shows that the field is shifting from isolated retrieval models toward more general geospatial multimodal systems. Retrieval is no longer treated only as a matching task, but also as a key capability supporting question answering, instruction following, knowledge augmentation, and coordinated reasoning in remote sensing applications.  Prospects   Future research is expected to move in four closely related directions. One direction is the unified representation of multi-source heterogeneous data, especially the integration of optical imagery with synthetic aperture radar, hyperspectral data, thermal infrared observations, and multi-temporal acquisitions. Another direction is knowledge-enhanced retrieval, where geospatial priors, land-use rules, remote sensing terminology, and external knowledge bases are incorporated into multimodal alignment and retrieval-augmented reasoning. A third direction is lifelong and open-world learning. Real deployment requires models to remain reliable under seasonal changes, sensor updates, regional domain shifts, cloud contamination, and newly emerging categories without catastrophic forgetting. The fourth direction concerns efficiency and deployability. 
Because practical remote sensing systems often operate under tight computational budgets, lightweight tuning, sparse computation, token reduction, model compression, and on-orbit or edge inference will become increasingly important. Interactive and explainable retrieval is also likely to grow in importance, allowing analysts to refine queries through dialogue and inspect the image regions or semantic cues that support retrieval decisions. Overall, continued progress in data construction, domain adaptation, semantic alignment, and efficient multimodal modeling is expected to make RS-TIR a more robust infrastructure capability for Earth observation applications.
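To make the "mean Recall" figure discussed in the review above concrete, the following sketch computes Recall@K from a similarity matrix and averages it over both retrieval directions. Averaging R@1, R@5, and R@10 for image-to-text and text-to-image retrieval is the convention commonly used on RSICD and RSITMD, but it is stated here as an assumption. The function names and data layout are illustrative.

```python
def recall_at_k(similarity, relevant, k):
    """Percentage of queries with at least one relevant item in the top k.

    similarity[q][j] is the score between query q and candidate j.
    relevant[q] is the set of candidate indices that match query q.
    """
    hits = 0
    for q, scores in enumerate(similarity):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in relevant[q] for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


def mean_recall(sim_i2t, rel_i2t, sim_t2i, rel_t2i, ks=(1, 5, 10)):
    """Average of Recall@K over both retrieval directions (assumed convention)."""
    values = [recall_at_k(sim_i2t, rel_i2t, k) for k in ks]
    values += [recall_at_k(sim_t2i, rel_t2i, k) for k in ks]
    return sum(values) / len(values)
```

With a dual-encoder model, the similarity matrix is simply the matrix of dot products between normalized image and text embeddings, which is why such models remain attractive for large archives.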
Semi-passive Intelligent Reflecting Surface-assisted Integrated Sensing and Communication for Distributed and High-precision Joint Localization
HUANG Yi, XIONG Chaorui, TANG Xiaowei, SHI Yunmei
Available online  , doi: 10.11999/JEIT251039
Abstract:
  Objective   Integrated Sensing And Communication (ISAC) enables communication and sensing on a shared radio platform, supporting emerging applications such as autonomous driving and smart city infrastructure while improving spectral efficiency and reducing system cost. A key feature of ISAC systems is the reuse of communication signals for sensing and localization, which enables high-precision positioning without dedicated localization pilots. In semi-passive Intelligent Reflecting Surface (IRS)-aided ISAC systems, sensing performance is improved while low hardware complexity and power consumption are maintained. Compared with fully passive IRSs, semi-passive IRSs provide limited signal-processing capability for more flexible beam control, while avoiding the high hardware cost of fully active IRSs. In addition, a semi-passive IRS can cooperate with the sensing array at the Base Station (BS) to form a distributed sensing architecture. Through joint processing of the signals received at the BS and the IRS sensing arrays, the effective sensing aperture is enlarged, which improves the accuracy and robustness of channel-parameter estimation. However, existing studies mainly address fully passive or fully active IRSs in communication scenarios, whereas the sensing capability of semi-passive IRSs and their cooperation with BS arrays for high-precision localization remain insufficiently studied. Therefore, high-precision Three-Dimensional (3D) target localization under semi-passive IRS-assisted cooperative sensing is investigated.  Methods  A semi-passive IRS-assisted ISAC framework is proposed for cooperative 3D target localization. Sensing arrays are deployed at both the BS and IRS to jointly receive target-reflected Orthogonal Frequency Division Multiplexing (OFDM) signals, which are then delivered through reliable backhaul links to a central processor for joint processing. Two localization algorithms are proposed. 
The first is a parameter-decoupled two-step localization method. In this method, the Angle of Arrival (AoA) is first estimated by Fast Fourier Transform (FFT) with a refinement procedure, and the propagation delay is then estimated by the Spatial Smoothing MUltiple SIgnal Classification (MUSIC) algorithm. The target position is subsequently obtained by solving linear equations constructed from the estimated channel parameters and the geometric relationships among the arrays. The second is a Direct Position Determination (DPD) method, in which a maximum-likelihood optimization problem is formulated and a Newton-like algorithm is used to estimate the target position directly. By jointly using prior information, including spatial correlation among arrays, communication symbols, beamforming vectors, and IRS reflection coefficients, this method reduces the error propagation of the two-step localization method and improves localization accuracy and robustness. Furthermore, the Cramér-Rao Lower Bound (CRLB) for target-position estimation is derived under circularly symmetric complex Gaussian noise to provide a theoretical benchmark. Monte Carlo simulations are conducted to verify the proposed algorithms, examine the effect of the Rician K-factor on localization performance, and compare the proposed methods with conventional AoA/ToA-based localization methods.  Results and Discussions  Under the proposed semi-passive IRS-assisted ISAC framework, the two-step localization method achieves statistically efficient channel-parameter estimation, and its estimation error approaches the CRLB at high Signal-to-Noise Ratio (SNR) (Figs. 2–4). At low BS transmit power, severe path loss and noise distortion cause a clear gap between the Root Mean Square Error (RMSE) and the CRLB. As the transmit power increases, the sensing SNR increases and parameter-estimation accuracy is improved. 
Because the target position in the two-step localization method is obtained from linear equations constructed from the estimated channel parameters and known array geometry, the final localization accuracy follows the same trend as the intermediate parameter-estimation performance. However, because of error propagation in the two-stage process, the localization error deviates more clearly from the CRLB (Fig. 5). Increasing the number of OFDM symbols improves localization accuracy, but also increases latency, which indicates a trade-off between accuracy and delay in practical systems. Compared with the two-step localization method, the DPD method achieves higher localization accuracy under the same number of OFDM symbols (Fig. 5). By jointly processing the signals received from all sensing arrays and directly optimizing the target position under the maximum-likelihood criterion, error propagation is effectively avoided. In addition, spatial correlation among arrays, communication symbols, beamforming vectors, and IRS reflection coefficients are fully used, which further improves estimation performance. For the same localization accuracy, the DPD method requires fewer OFDM symbols or lower transmit power than the two-step localization method, which shows clear advantages in latency and energy efficiency. Simulation results also show that both proposed methods benefit from a larger Rician K-factor (Fig. 6), because a stronger line-of-sight component suppresses multipath interference. This effect is more evident in the high-SNR region, where small-scale fading becomes the main factor limiting performance. Finally, compared with conventional AoA/ToA-based localization methods, the proposed methods provide better localization accuracy and robustness (Fig. 7).  Conclusions  A semi-passive IRS-assisted ISAC system is proposed for 3D cooperative localization with reduced localization pilot overhead. 
Two localization algorithms are developed: a low-complexity two-step localization method and a high-accuracy DPD method. The theoretical performance limit is established through derivation of the CRLB. Simulation results verify that the two-step localization method enables high-precision localization, whereas the DPD method provides better performance, and its RMSE approaches the CRLB at high SNR. Both methods also show good scalability and robustness. Future work will address multi-target scenarios and resource optimization.
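The first stage of the two-step method described above, a coarse FFT-style AoA estimate followed by refinement, can be illustrated with the sketch below. It is a simplified stand-in, not the authors' algorithm: it assumes a single snapshot from a uniform linear array with half-wavelength spacing, evaluates the spatial spectrum on a DFT-like grid in sin(theta), and uses a dense local grid search as the refinement step. All names and parameter defaults are hypothetical.

```python
import cmath
import math


def beam_power(snapshot, u):
    """Power of the array output steered to spatial frequency u = sin(theta).

    Assumes a uniform linear array with half-wavelength element spacing,
    so element n sees the phase exp(j * pi * n * u).
    """
    acc = 0j
    for n, x in enumerate(snapshot):
        acc += x * cmath.exp(-1j * math.pi * n * u)
    return abs(acc) ** 2


def estimate_aoa_deg(snapshot, coarse_bins=64, refine_points=200):
    """Coarse DFT-style peak search followed by a local grid refinement."""
    step = 2.0 / coarse_bins
    # Coarse stage: bins uniform in u, as a zero-padded spatial DFT would give.
    coarse = [-1.0 + k * step for k in range(coarse_bins)]
    u0 = max(coarse, key=lambda u: beam_power(snapshot, u))
    # Refinement stage: dense search within one coarse bin on either side.
    fine = [u0 - step + 2.0 * step * k / refine_points for k in range(refine_points + 1)]
    fine = [min(1.0, max(-1.0, u)) for u in fine]  # keep asin() in its domain
    u_hat = max(fine, key=lambda u: beam_power(snapshot, u))
    return math.degrees(math.asin(u_hat))
```

In a full two-step pipeline, angle estimates of this kind from the BS and IRS arrays would be combined with delay estimates to form the linear equations for the target position. Any error made at this stage carries over to the position estimate, which is the error propagation that the DPD method is designed to avoid.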
Cell-Free Joint Beamforming and AP–User/Target Association Optimization for Integrated Sensing and Communication
FANG Zhiyu, XIA Xiaochen, XU Kui, WEI Chen, XIE Wei, YE Zilü
Available online  , doi: 10.11999/JEIT250574
Abstract:
  Objective  Integrated Sensing And Communication (ISAC) is a key technology for Sixth-Generation (6G) networks. The cell-free architecture is a promising regional coverage paradigm for 6G. Cooperation among Access Points (APs) mitigates coverage imbalance, interference, and capacity limitations in conventional cellular systems, while enabling communication and sensing services for low-altitude targets with wide-area continuous coverage. However, existing studies on cell-free systems often rely on statistical channel models, which fail to capture realistic propagation characteristics in complex environments. The global Channel State Information (CSI) required for transmission optimization is difficult to obtain, and instantaneous CSI cannot be guaranteed due to the high mobility of low-altitude targets. To address these issues, a joint beamforming and AP–user/target association optimization method based on a Binary Radio Map (BRM) is proposed. The environmental information provided by the BRM is used to predict channels between APs and users/targets, thereby providing global channel information for joint optimization. On this basis, an ISAC satisfaction-based optimization model is constructed, and an iterative optimization algorithm for beamforming design and AP–user/target association is developed using a genetic algorithm.  Methods  First, the channels between APs and users/targets are predicted using environmental information derived from the BRM. An ISAC satisfaction-based optimization model is then established to unify communication and sensing performance. Due to the coupling between communication and sensing and the non-convex nature of the problem, the optimization problem is decomposed into two subproblems corresponding to communication and sensing beamforming. In each iteration, the beamforming design is reformulated as a Second-Order Cone Program (SOCP) to obtain beamforming matrices that maximize the satisfaction function. 
An iterative solution algorithm is applied to compute the communication and sensing beamforming matrices efficiently. Subsequently, based on the optimized satisfaction function, an AP–user/target association optimization method is designed using a genetic algorithm.  Results and Discussions  Simulation results verify the effectiveness of the BRM-assisted channel prediction and association optimization method. Compared with the conventional AP association method based on the shortest path, the proposed approach reduces the required transmission power by approximately 5 dBm while achieving higher user/target satisfaction (Fig. 7). As the transmission power increases, the satisfaction of users/targets gradually improves and approaches 1. In contrast, under the conventional scheme, a large gap remains between the maximum and minimum satisfaction values at the same transmission power (Fig. 8). When the transmission power is 40 dBm, the proposed method effectively reduces this disparity and balances performance among different users/targets. Although the null-space projection scheme leads to some degradation in sensing performance, the minimum received sensing power remains stable. This indicates that the overall system satisfaction is not affected and that sensing requirements are still satisfied (Fig. 9).  Conclusions  This study addresses the AP-user/target association problem in low-altitude airspace. The BRM is used to predict channels between APs and users/targets and to provide global channel information for joint optimization. By maximizing the minimum user/target satisfaction, ISAC beamforming is optimized, and AP-user/target association is iteratively refined using a genetic algorithm. Simulation results show that the proposed method effectively improves AP-user/target association and enhances integrated communication and sensing performance compared with existing approaches.
Joint Optimization of Service Placement and Task Offloading for QoS Balancing in Satellite-Terrestrial Integrated Networks
DAI Cuiqin, WANG Hongyun, LIAO Rongpeng, CHEN Qianbin
Available online  , doi: 10.11999/JEIT251294
Abstract:
  Objective  Satellite-Terrestrial Integrated Networks (STIN) integrate multi-source and multi-dimensional services from terrestrial and satellite networks, providing wide coverage, large capacity, and flexible networking. These features support global coverage and ubiquitous access for diverse services. However, the dynamic topology and heterogeneous, resource-constrained nodes in STIN complicate service placement at satellite-terrestrial edge nodes. This further increases the difficulty of matching user service requests with edge computing resources during task offloading, making it difficult to satisfy Quality of Service (QoS) requirements. To address this issue, a joint optimization scheme for QoS-Balanced Service Placement and Task Offloading (BQSPTO) is proposed. The scheme integrates a Delay, Security, and Privacy-aware QoS (DSPQoS) evaluation model with satellite-terrestrial collaboration, inter-satellite cooperation, and service migration. It enables joint optimization of service placement and task offloading in a cloud-edge-end architecture, while satisfying task latency, security, and privacy requirements.  Methods  The proposed scheme integrates service placement, task offloading, and QoS evaluation into a unified framework. First, a cloud-edge-end collaborative STIN model is constructed, including terminal devices, terrestrial edge servers, satellite edge nodes, and cloud servers. Task security is quantified using the attack avoidance probability derived from key-cracking capability, and task privacy is characterized by usage-pattern privacy and location privacy. A DSPQoS evaluation model is established by combining task completion latency, attack avoidance probability, and privacy level. Second, a service placement strategy is designed based on task popularity prediction and service migration. 
A cloud-edge-end collaborative full offloading strategy is developed by determining offloading locations and multi-node cooperation modes according to QoS performance. Based on the service placement strategy and task offloading decisions, an optimization problem is formulated to maximize the total QoS performance under communication and computation resource constraints. Third, the joint optimization problem is decomposed into service placement and task offloading subproblems. A Non-dominated Sorting Genetic Algorithm II (NSGA-II) is applied to the service placement subproblem, while a hybrid Grey Wolf Optimization (GWO) and Whale Optimization Algorithm (WOA) is applied to the task offloading subproblem. Alternating optimization is employed to iteratively update both decisions and obtain the final solution.  Results and Discussions  The QoS performance of the proposed BQSPTO scheme is evaluated through MATLAB simulations. The cloud-edge-end collaborative task processing model (Fig. 2) and the overall BQSPTO framework (Fig. 3) are analyzed. The proposed scheme is compared with three baseline methods: GWOBQ (Grey Wolf Optimization Algorithm-based BQSPTO Scheme), BSSLM (BQSPTO Scheme Without Service Migration), and HWGWTO (Hybrid Grey Wolf Optimization with Whale Algorithm Fusion for Task Offloading). Results show that BQSPTO achieves faster convergence and better avoids local optima, resulting in higher QoS performance (Fig. 4). Compared with GWOBQ, HWGWTO, and BSSLM, the QoS performance is improved by approximately 2.1%, 5.4%, and 4.8%, respectively. As the number of tasks increases, QoS performance improves for all methods, while BQSPTO consistently achieves the highest performance (Fig. 5). Latency, security, and privacy metrics increase with task volume, and BQSPTO maintains superior performance across these metrics, although trade-offs appear due to multi-objective optimization (Fig. 6). 
QoS performance decreases as the number of malicious users increases, while BQSPTO shows stronger robustness and stability (Fig. 7). As satellite capacity increases, the number of deployable service types grows, and QoS performance improves for all methods. BQSPTO remains superior under different capacity settings (Fig. 8).  Conclusions  A joint optimization scheme for service placement and task offloading in STIN is proposed under multi-objective QoS constraints. The DSPQoS evaluation model integrates latency, security, and privacy into a unified evaluation framework. The joint optimization problem is decomposed and solved using alternating optimization, enabling effective coordination between service placement and task offloading. Simulation results demonstrate that the proposed scheme achieves higher QoS performance, better convergence stability, and improved multi-objective balance under varying task loads, malicious user scales, and satellite capacities.
Context-Aware Fine-Grained Multimodal Emotion Recognition Based on Mamba
SUN Linhui, CHENG Leyang, YANG Xinyue, CHEN Shuaitong, LI Pingan, SHAO Xi
Available online  , doi: 10.11999/JEIT251307
Abstract:
  Objective  Multimodal Emotion Recognition (MER) aims to infer human emotional states by integrating speech and text signals. Existing MER methods often fail to use temporal and speaker context effectively and lack fine-grained intra- and inter-modal interaction modeling. These limitations reduce the ability to distinguish similar emotions. This study proposes a Context-Aware Fine-Grained Multimodal Emotion Recognition model based on the Mamba State Space Model (SSM), termed CA-FGMER-Mamba, to improve recognition accuracy in complex scenarios.  Methods  The CA-FGMER-Mamba model consists of five modules. First, text features are encoded using RoBERTa with explicit speaker identity injection and a three-segment contextual input. Audio features are extracted using OpenSMILE and reduced to 512 dimensions. Second, a Bidirectional Gated Recurrent Unit (Bi-GRU) integrates historical and future contextual dependencies. Third, intra-modal fine-grained filtering applies multi-head self-attention to emphasize key emotional cues and suppress redundancy. Fourth, inter-modal fine-grained fusion uses a Mamba SSM module to recalibrate features across time steps. This stage includes higher-order outer-product fusion, mean pooling, and a cross-modal interaction modulation module to adaptively adjust modality contributions. Finally, fused features are processed by a Bi-LSTM, followed by a self-attention layer and a fully connected network for classification. The model is optimized using a joint triplet loss and cross-entropy loss.  Results and Discussions  Experiments are conducted on the IEMOCAP and MELD datasets. On the IEMOCAP four-class task, CA-FGMER-Mamba achieves a Weighted Accuracy (WA) of 0.781 and an Unweighted Accuracy (UA) of 0.790, outperforming seven representative methods. On the six-class task, the model achieves a Weighted F1-score of 0.703 and shows strong performance in distinguishing similar emotions such as “happy” (0.646) and “excited” (0.803). 
On the MELD dataset, the model achieves a Weighted F1-score of 0.665, indicating strong generalization. Ablation experiments confirm that combining intra-modal and inter-modal fusion improves performance.  Conclusions  The CA-FGMER-Mamba model addresses key limitations in existing MER methods by integrating context-aware modeling with fine-grained intra- and inter-modal fusion based on the Mamba SSM. The Bi-GRU with speaker identity enhances modeling of temporal and role-related context and alleviates recency bias. Intra-modal self-attention and Mamba-based inter-modal recalibration improve feature extraction and cross-modal interaction modeling, enabling accurate discrimination of similar emotions. The cross-modal interaction modulation module adaptively adjusts modality contributions and enhances robustness. Experimental results demonstrate strong performance in WA, UA, and Weighted F1-score, with good generalization. Future work will explore multi-scale interaction mechanisms, multi-task learning strategies, and noise-aware modeling to further improve fusion accuracy and robustness.
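The joint objective mentioned in the Methods (triplet loss plus cross-entropy loss) can be illustrated with a minimal sketch. The abstract does not give the weighting or the margin, so `alpha`, `margin`, and all function names below are assumptions for illustration only, not the authors' implementation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on (distance to same-class sample) - (distance to other-class sample) + margin."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

def joint_loss(logits, label, anchor, positive, negative, alpha=0.5, margin=1.0):
    """Cross-entropy plus a weighted triplet term; alpha and margin are assumed values."""
    return cross_entropy(logits, label) + alpha * triplet_loss(anchor, positive, negative, margin)
```

In this sketch the triplet term vanishes once the other-class embedding is at least `margin` farther from the anchor than the same-class embedding, so only the classification term remains for well-separated samples.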
Near-Field Tomographic Imaging for Uplink Communication and Coordinate Reconstruction Algorithm
YIN Lannuo, WANG Yong
Available online  , doi: 10.11999/JEIT250715
Abstract:
  Objective  With the rapid evolution of 6G network technology, communication systems are evolving toward high bandwidth, low latency, and massive connectivity. Against this backdrop, integrated sensing and communications (ISAC), as a novel system architecture, enables wireless signals to perform dual functions—transmitting information while simultaneously sensing the environment—thereby providing more intelligent and efficient services for 6G networks. Environmental reconstruction, a core component of ISAC systems, aims to restore the true spatial structure of targets and scenes using echo signals. However, current environmental reconstruction techniques in practical applications still face the following three major challenges: First, in 6G communication systems, the dense deployment of base stations (BS) causes building targets to reside in the near-field region of the imaging system, leading to severe coupling among the range, azimuth, and elevation dimensions in tomographic imaging and resulting in significant discrepancies between the reconstructed target geometry and the actual shape. Second, because the positioning error of user equipment (UE) far exceeds the wavelength used by existing communication systems, traditional SAR imaging autofocus algorithms become ineffective, necessitating the development of new methods to circumvent the issues posed by positioning errors. Finally, conventional TomoSAR algorithms adopt a per-channel processing framework by independently generating SLC images for each channel; however, when each channel employs ISAR techniques to generate SLC images, inherent data discrepancies among the channels result in inconsistent translational compensation, which introduces phase errors during the elevation focusing process and ultimately leads to the occurrence of spurious targets in the imaging outcomes.  
Methods  In this paper, we first propose applying the nonparametric translational compensation method originally developed for ISAR imaging to the generation of single-look complex (SLC) images, thereby effectively circumventing the adverse effects introduced by positioning errors. Existing ISAR-related literature typically assumes that the target adheres to a turntable model, yet the actual SAR imaging geometry diverges significantly from this idealized assumption. Based on the SAR imaging scenario, we have rederived the mathematical mapping that links the ISAR tomographic imaging results to the target’s true spatial coordinates. Leveraging this mapping, we formulate the coordinate reconstruction challenge as a system of nonlinear equations and subsequently propose a novel coordinate reconstruction method that integrates a particle swarm optimization (PSO) algorithm, ultimately achieving an accurate restoration of the target's genuine geometric shape. Furthermore, in order to address the inherent issue of inconsistent translational compensation among channels within traditional per-channel processing frameworks, we have designed a joint phase calibration tomographic imaging algorithm that employs a unified phase calibration strategy to eliminate inter-channel phase discrepancies, thereby markedly improving both the elevation focusing performance and the overall imaging quality.  Results and Discussions  We validate the proposed methods through simulation experiments on complex building targets under both ideal and non-ideal trajectory conditions, using the CD distance as the evaluation metric for coordinate reconstruction accuracy. The experimental results demonstrate that the CD distances under ideal and non-ideal trajectories are 1.34 and 1.54, respectively, indicating only a slight performance degradation under non-ideal conditions. Notably, imaging point clouds obtained under non-ideal trajectories exhibit evident point dropout. 
A comparative analysis of the cumulative probability distribution curves of distance errors under the two trajectory conditions reveals that the overall distribution trends are very similar; significant differences in the probability distributions emerge only when the distance error exceeds 2 m. This observation indicates that, in terms of the CD distance evaluation metric, the primary discrepancies between imaging results obtained under ideal and non-ideal trajectories are concentrated in regions exhibiting point cloud dropout and in areas outside the main target. Hence, the influence of non-ideal trajectories is mainly manifested in the variation of scattering intensity distribution. Moreover, comparative experiments between the joint phase calibration framework and traditional algorithm frameworks show that conventional tomographic imaging methods exhibit marked stacking effects at different elevations, with false targets appearing at incorrect elevation levels. This behavior suggests that independently compensating for translational motion in each channel is prone to inducing inter-channel phase discrepancies, thereby severely impairing elevation focusing performance. In contrast, the incorporation of joint phase calibration yields a substantial improvement in imaging quality.  Conclusions  The experimental results validate the effectiveness of the proposed methods: by adopting the ISAR nonparametric translational compensation and the PSO-based coordinate reconstruction techniques, the true geometric shape of the target is successfully recovered. Moreover, the joint phase calibration strategy effectively eliminates the issue of false targets in elevation focusing that arises from conventional per-channel processing, thereby significantly enhancing both the elevation focusing capability and the overall image quality.
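As a rough illustration of the evaluation metric, the sketch below assumes that "CD distance" refers to the Chamfer distance between the reconstructed point cloud and the reference point cloud; the abstract does not expand the abbreviation. It uses one common variant (the sum of the two mean nearest-neighbor Euclidean distances), which may differ from the authors' exact definition. Function names are hypothetical.

```python
import math

def nearest_distance(point, cloud):
    """Euclidean distance from one point to its nearest neighbor in a cloud."""
    return min(math.dist(point, q) for q in cloud)

def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions.

    Assumed variant; squared-distance and unnormalized versions also exist.
    """
    a_to_b = sum(nearest_distance(p, cloud_b) for p in cloud_a) / len(cloud_a)
    b_to_a = sum(nearest_distance(q, cloud_a) for q in cloud_b) / len(cloud_b)
    return a_to_b + b_to_a
```

Under this definition, points missing from the reconstruction raise the reference-to-reconstruction term, and points outside the main target raise the reconstruction-to-reference term. This is consistent with the observation that dropout regions and off-target areas dominate the difference between the two trajectory conditions.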
Research on Inverse QR Decomposition Optimization for Sparse Adaptive System Identification Algorithms
PENG Yi, ZHANG Pengfei, WANG Xiaoyong, GAO Junqi, LI Changlong, ZHANG Zhiyuan, SUN Tianxiang
Available online  , doi: 10.11999/JEIT250562
Abstract:
  Objective  The traditional sparse regularization recursive least squares algorithm, L1/L0 Norm Recursive Least Squares (L1/L0-RLS), demonstrates theoretical superiority in sparse parameter space estimation and has become a significant method in system identification and channel equalization. However, under limited numerical precision conditions, its covariance matrix iterative computation process can lead to successive accumulation of rounding errors, inducing divergence and instability in the least squares solution.  Methods  To address this issue, this paper proposes an improved algorithm based on the Inverse QR Decomposition (IQRD) framework. This framework not only effectively suppresses the accumulation of rounding errors in traditional regularized RLS algorithms, but also eliminates the calculation step of weight coefficient replacement in traditional QR decomposition, thereby significantly improving the numerical robustness and system identification efficiency of the algorithm in finite precision environments. Specifically, this article first systematically constructs the L1-IQRD-RLS and L0-IQRD-RLS algorithms under the L1/L0 constrained inverse QR decomposition architecture. Through theoretical derivation, a universal recursive expression for weight coefficients is obtained, and an innovative automatic parameter selection mechanism is introduced into the algorithm framework to solve the dynamic optimization problem of sparse regularization parameters.  Results and Discussions  To verify the effectiveness of the proposed algorithm in sparse constraints and robustness, Monte Carlo simulation experiments were used to quantitatively evaluate the algorithm performance. The results showed that L1-IQRD-RLS and L0-IQRD-RLS can maintain long-term numerical stability in an 11 decimal fixed-point computing environment. 
Compared with traditional algorithms, they exhibit significant performance advantages in key indicators such as system sparse representation, parameter estimation variance, and covariance matrix condition number. Further verification on measured data confirms that the improved algorithm maintains numerical stability even in limited-precision environments, with significantly better robustness than traditional methods. On the measured data, the regularized RLS algorithm improved by the inverse QR framework also shows clear advantages in system sparsity representation, parameter estimation, and numerical stability, and its iterative convergence success rate is significantly higher than that of traditional methods.  Conclusions  This paper focuses on sparse system identification in adaptive filtering. Traditional sparse-regularized recursive least squares (RLS) algorithms still face numerical stability challenges under limited numerical precision. To address this problem, this study constructs an inverse QR decomposition framework to overcome the numerical ill-conditioning caused by successive rounding errors in sparse-regularized RLS algorithms. This approach significantly enhances the algorithm's numerical robustness in low-precision environments. Additionally, it introduces an automatic parameter selection mechanism into the algorithm framework, eliminating the need for repeated parameter tuning and ensuring stable performance optimization through sparse constraints. In practical electromagnetic signal processing, tasks such as system identification and beamforming are constrained by the finite precision of hardware implementation and often face the inherent sparsity characteristics of the system itself. 
This paper's algorithm provides targeted solutions: its enhanced finite word-length robustness effectively suppresses numerical divergence in adaptive weight updates, ensuring stable implementation on fixed-point processors; meanwhile, the introduced sparse constraints naturally align with the physical structure of sparse arrays, improving the accuracy of algorithm estimation results. This research offers a practical algorithmic approach for achieving high-performance, high-stability sparse-constrained systems on precision-limited hardware platforms.
A Physics-Constrained Deep Learning Framework for High-Fidelity Sea Clutter Generation under Small-Sample Conditions
SUN Dianxing, LIU Xinliang, LIU Ningbo, DING Hao, YU Hengli, SONG Guanglei
Available online  , doi: 10.11999/JEIT250697
Abstract:
  Objective  The verification and validation of radar target detection algorithms, particularly in maritime surveillance, rely heavily on the availability of high-fidelity synthetic sea clutter data. However, generating realistic sea clutter under high sea-state conditions (e.g., Sea State 4 and above) is a significant challenge due to the non-stationary and non-Gaussian nature of the signal. Traditional statistical models often fail to capture the complex time-frequency characteristics of such data, especially when direct measurement is difficult or unavailable. A novel framework is proposed that combines a complex-valued generative adversarial network with physics-constrained learning and an adaptive transfer learning mechanism to address the issue of small-sample sea clutter generation. The primary goal is to develop a robust and efficient method for generating high-quality synthetic sea clutter data that closely mimics real-world conditions, thereby providing a reliable data foundation for the development and testing of advanced radar systems.  Methods  The proposed framework integrates a Complex Variational Autoencoder Wasserstein Generative Adversarial Network (CVAE-WGAN) with a transfer learning strategy to address the challenge of generating high-fidelity sea clutter data under small-sample conditions. The model operates in the complex domain to jointly process in-phase and quadrature components, preserving the orthogonality and phase relationships of the signal. A Magnitude-Phase Attention (APA) module is introduced to enhance the joint modeling of amplitude and phase, while complex residual blocks are designed to improve gradient propagation and training stability. A physics-constrained loss function system, comprising a time-frequency ridge loss and a Doppler band loss, is implemented to guide the generation process to align with the physical characteristics of sea clutter. 
To handle data scarcity, an adaptive transfer learning mechanism based on Kullback-Leibler Divergence (KLD) is employed to dynamically adjust the model during fine-tuning in target domains, enabling efficient knowledge transfer across different sea-state scenarios.  Results and Discussions  The performance of the proposed CVAE-WGAN framework is evaluated using real-world sea clutter datasets, demonstrating its effectiveness in generating high-fidelity synthetic data. In the source domain (Sea State 4), the generated data closely matches real measurements in terms of amplitude statistics (PDF-CS = 0.872) (Fig. 5), temporal correlation (ACF-CS = 0.9382) (Fig. 7), and time-frequency characteristics (SPEC-RMSE = 4.5379 dB) (Fig. 6). The time-frequency ridge accuracy reaches 95.2% (|z|≤1) (Fig. 10). The adaptive transfer learning mechanism is validated by applying the pre-trained model to a more challenging scenario (Sea State 5) with only 20% of the target domain samples. The generated clutter maintains a strong fit to the empirical amplitude distribution (PDF-CS = 0.8448) (Fig. 11, Table 2) and exhibits good autocorrelation properties (ACF-CS = 0.9557) (Fig. 12, Table 2), with time-frequency ridge accuracy at 95.24% (∣z∣≤1) (Fig. 14, Table 2). Ablation studies reveal that the Magnitude-Phase Attention (APA) module is critical for joint amplitude and phase modeling, as its removal significantly degrades performance (e.g., PDF-CS drops 17.3%, SPEC-RMSE increases 35.0%) (Table 1). The method proves stable even with as little as 15% of the target data (PDF-CS > 0.6, Z=1 > 82%) (Table 3), underscoring its suitability for data-scarce environments.  Conclusions  This study presents a novel framework for generating high-fidelity sea clutter data under small-sample conditions, combining a complex-valued generative adversarial network with physics-constrained learning and an adaptive transfer learning mechanism. 
The proposed CVAE-WGAN model, guided by a sophisticated loss function system, demonstrates a strong capability to capture both the statistical and physical properties of high sea-state environments. The integration of the KLD-based transfer learning mechanism significantly enhances the model's adaptability, enabling high-quality data generation even with limited target domain samples. By addressing the challenge of small-sample sea clutter generation, this framework provides a reliable and robust data foundation for the development and testing of advanced radar anti-clutter and anti-jamming algorithms. Future work focuses on further optimizing the framework for extreme data scarcity and exploring its application in other non-stationary radar signal scenarios.
A Risk-modulated Learning Framework for Physical-layer RFID Authentication under Dynamic Interference
WU Haifeng, YU Wenbo, ZENG Yu, YANG JiangFeng
Available online  , doi: 10.11999/JEIT251108
Abstract:
  Objective  Dynamic interference and metallic reflections severely affect the reliability of coupled Radio Frequency IDentification (RFID) authentication. Conventional static models cannot adapt to time-varying noise and multipath effects, which leads to unstable recognition. To address this problem, this paper proposes a Risk-Modulated Learning Identification Framework (RMLIF) that integrates stochastic channel modeling, adaptive risk regulation, and risk-regularized classification. The aim is to achieve stable and interpretable physical-layer authentication under nonstationary interference, thereby improving the anti-counterfeiting reliability of RFID systems.  Methods  A Stochastic Differential Equation (SDE)-based coupled channel model is first established to jointly characterize drift, diffusion, and impulsive interference (Eq.(1)), and the existence and uniqueness of its solution are proved. A Target-Driven Adaptive Risk (TDAR) algorithm is then designed to dynamically adjust physical-layer parameters based on the Recognition Risk Index (RRI). The RRI is derived from classification posterior probabilities (Eq.(3)), and its exponential mapping to the Signal-to-Interference-plus-Noise Ratio (SINR) is characterized analytically (Eq.(11), Fig. 3), which enables real-time risk estimation and closed-loop control. For feature representation, a difference-based compressive feature modeling method is used to capture the perturbation between normalized and reference signals (Fig. 1), and Theorem 1 establishes the stability of the compressed mapping. Parallel steady-state and perturbation feature paths are further designed (Table 1), and their joint robustness is proved in Corollary 4. In addition, the framework shows that TDAR regulation is equivalent to a risk-regularized classification process (Theorem 3), which effectively enlarges the classification margin without modifying the classifier structure.  
Results and Discussions  Theoretical analysis derives the generalization error bound, sample complexity, and robustness limits (Theorems 4 to 7), showing that filtering high-risk samples reduces redundancy and improves learning efficiency. The Asymptotic Real Risk Index (ARRI) is further defined to explain long-term convergence and structural self-consistency (Theorem 8). Experiments conducted on a USRP N2000 platform (Table 3) use six types of EPC C1 Gen2 tags under four interference conditions, namely no copper plate and small, medium, and large copper plates (Fig. 4). Compared with conventional methods, including Coupling_14, Hu_Fu, CNN_Vgg19, and PCFM, the corresponding RMLIF-enhanced versions achieve clear gains in classification accuracy (Fig. 5). In all four copper-plate interference scenarios (none, small, medium, and large), the proposed framework achieves accuracy above 90%, with an average improvement of 10% to 20% over traditional methods. PCFM_RMLIF achieves the best overall performance. PCA visualization confirms the stability of the compressed features (Fig. 6) and the clearer class separation after risk regulation (Fig. 7). The TDAR algorithm converges rapidly, generally within two iterations (Fig. 9). As the effective sample ratio and feature dimension increase, the RRI decreases monotonically (Fig. 10), in agreement with Theorem 6. Entropy analysis (Fig. 11) shows that risk regulation reduces system uncertainty and improves stability. Cross-condition tests further verify the robustness and generalization ability of the framework (Fig. 12).  Conclusions  This paper develops a unified risk-modulated learning framework for physical-layer RFID authentication under dynamic interference. The RMLIF framework combines SDE-based channel modeling, adaptive TDAR regulation, and compressive feature reconstruction into a closed-loop mechanism that links physical signals with recognition risk. 
Both theoretical analysis and experimental results show that risk-driven regulation effectively suppresses disturbance, improves feature separability, and reduces generalization error. The proposed approach achieves high accuracy, rapid convergence, and strong robustness, and provides an effective solution for dynamic RFID anti-counterfeiting authentication.
Pearson Correlation Fusion Sensing Method for Noncircular Signals
LAI Huadong, LIN Cong, LUO Peng, XU Jinqiang, LIU Mingxin, XU Weichao
Available online  , doi: 10.11999/JEIT251247
Abstract:
  Objective  With the rapid growth of wireless devices and communication services, spectrum resources have become increasingly scarce. Spectrum sensing, as a fundamental function of cognitive radio, enables dynamic spectrum access and improves spectrum utilization efficiency. However, conventional spectrum sensing methods based on circular signal assumptions cannot effectively detect noncircular signals. In addition, some detectors designed for noncircular signals show degraded performance under low signal-to-noise ratio (SNR) or limited sample conditions. To address these limitations, a nonparametric spectrum sensing scheme based on the Weighted Pearson Correlation Coefficient (WPCC) is proposed. The scheme applies a linear fusion strategy to the real-valued composite coherence matrix, which captures the second-order statistical characteristics of noncircular signals.  Methods  The WPCC detector constructs a real-valued composite observation vector and computes the corresponding composite coherence matrix. Pearson Correlation Coefficients (PCCs) are extracted from this matrix to characterize the statistical properties of noncircular signals. The first two product moments of squared sample PCCs are derived, and optimal fusion weights are obtained based on the deflection coefficient. The true PCCs are approximated by their sample estimates to obtain data-driven fusion weights that do not require prior knowledge of sensing channels. These weights are then linearly combined with the squared sample PCCs to construct the WPCC test statistic, thereby exploiting the spatial diversity of sensing antennas. The final decision is made by comparing the WPCC statistic with a sensing threshold determined by the specified false alarm probability. Specifically, a WPCC value below the threshold indicates the null hypothesis of an idle frequency band, whereas a value above the threshold indicates the alternative hypothesis that the frequency band is occupied by primary users.  
Results and Discussions  Simulation experiments evaluate the sensing performance of the proposed nonparametric WPCC-based method (Algorithm 1) in terms of sensing probability, deflection coefficient, Receiver Operating Characteristic (ROC) curve, and Area Under the Curve (AUC), with comparisons to NCLMPIT, NCAGM, NCHDM, and NCJT. The numerical results show that the proposed method outperforms the compared detectors under various simulation conditions. In particular, the WPCC detector achieves the highest sensing probability and exhibits superior performance at low false alarm probabilities of 0.05 (Fig. 2), 0.01 (Fig. 3(a)), and 0.005 (Fig. 3(b)), with sample sizes not exceeding 100. In addition, the proposed method shows clear advantages under different numbers of antennas (Fig. 4), different noise variance conditions (Fig. 5), and different levels of correlation strength (Fig. 6). The applicability of the WPCC method to circular signals is also demonstrated by its high sensing probability for QPSK and 16PSK signals (Fig. 7). The superior overall performance of the proposed detector is further confirmed by higher deflection coefficient curves and ROC curves (Figs. 8, 9). The largest AUC values quantitatively demonstrate its overall optimality among all considered methods (Table 1). These results indicate strong robustness under low SNR and small-sample conditions.  Conclusions  A Pearson correlation fusion sensing method for noncircular signals is proposed based on the real-valued composite covariance representation and the Locally Most Powerful Invariant Test (LMPIT) framework. By combining optimal fusion weights derived from sample PCCs with a linear weighting scheme, the method fully exploits second-order statistical information. It enhances strongly correlated components while suppressing weak correlations and noise interference. Analytical expressions for the false alarm probability and sensing threshold are derived. 
Both theoretical analysis and simulation results show that the proposed method achieves superior performance compared with existing noncircular signal sensing methods in terms of sensing probability, deflection coefficient, ROC curve, and AUC.
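The structure of the test statistic described in the Methods can be sketched as follows: stack the real and imaginary parts of each antenna's samples into real-valued rows, compute the squared sample PCC for every pair of rows, and combine them linearly. The deflection-optimal weights derived in the paper are not given in the abstract, so the sketch takes the weights as an input and falls back to uniform weights, which is purely a placeholder. The function names and the mean-subtracted form of the sample PCC are assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length real sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant rows

def composite_rows(antenna_samples):
    """Map M rows of complex samples to 2M real rows: real parts, then imaginary parts."""
    reals = [[s.real for s in row] for row in antenna_samples]
    imags = [[s.imag for s in row] for row in antenna_samples]
    return reals + imags

def squared_pccs(rows):
    """Squared sample PCC for every unordered pair of composite rows."""
    return [pearson(rows[i], rows[j]) ** 2 for i, j in combinations(range(len(rows)), 2)]

def weighted_statistic(sq_pccs, weights=None):
    """Linear fusion of squared PCCs; uniform weights are a placeholder assumption."""
    if weights is None:
        weights = [1.0 / len(sq_pccs)] * len(sq_pccs)
    return sum(w * r for w, r in zip(weights, sq_pccs))
```

The resulting statistic would then be compared with a threshold set by the target false alarm probability: a value above the threshold indicates an occupied band and a value below it indicates an idle band.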
Entropy-driven Adaptive Fusion Network for Scene Classification of High-Resolution Remote Sensing Images
SONG Wanying, LIU Yuchen, WANG Jie, WANG Anyi
Available online  , doi: 10.11999/JEIT251147
Abstract:
  Objective  Remote sensing image scene classification is intended to assign semantic labels to aerial or satellite images. With the rapid development of Earth observation technologies, high-resolution remote sensing images provide abundant detail but also present major challenges, including complex spatial structures, large scale variations, high intra-class variance, and strong inter-class similarity. Traditional Convolutional Neural Networks (CNNs) have achieved notable success in local spatial modeling, but they cannot adequately capture long-range dependencies because of their fixed receptive fields. To address this limitation, CNN-Transformer hybrid architectures have been proposed to balance local detail and global semantics. However, these models usually adopt simple concatenation for multi-scale feature fusion, which introduces redundancy and reduces discriminability. In addition, although the Swin Transformer uses window-based self-attention to capture contextual information, it still shows clear limitations in the analysis of complex high-resolution images. Specifically, long-range dependency modeling across windows is constrained by the fixed window size. The extraction of fine-grained local features is also limited because deep networks tend to overlook crucial fine-texture information from low- and mid-level features. Moreover, existing multi-level feature fusion strategies lack semantic guidance and therefore readily introduce background noise. Therefore, a network that can balance global contextual modeling and local discriminability while enabling adaptive fusion is still needed.  Methods  To address limited cross-window interaction and the absence of semantic guidance in multi-level feature fusion, an Entropy-driven Adaptive Fusion Swin Transformer (E-AF-ST) network is proposed. 
The architecture uses a lightweight Swin-Tiny backbone and incorporates two key modules: the Attention-guided region Selection and feature Optimization module (ASO) and the Entropy-driven Gated Fusion module (EGF) (Fig. 1). The ASO module addresses weak cross-window interaction and insufficient fine-grained feature extraction in the Swin Transformer through three consecutive stages (Fig. 2a). First, cross-window sparse attention is computed to remove physical window boundaries. By enlarging the patch partition size, sparse attention is applied to the entire image sequence, allowing global contextual correlations across the whole image to be captured. Second, dynamic region selection is performed. On the basis of pixel-level entropy measurement, a multilayer perceptron maps entropy features to attention scores, and a Top-k masking strategy dynamically selects the most informative discriminative regions. Third, recursive feature optimization is performed. Multi-head self-attention and layer normalization are applied at the local scale to progressively enhance boundaries and microstructural information. The EGF module then integrates the Swin Transformer output features, the globally enhanced contextual features, and the locally optimized features to reduce semantic discrepancies (Fig. 2b). First, energy normalization is performed using the Frobenius norm to obtain a probabilistic energy distribution. Next, an entropy-driven gated fusion mechanism calculates the Shannon entropy for each branch. A learnable soft-normalization gating function then maps the entropy information to normalized fusion weights, automatically reducing the weight of branches with high entropy caused by cluttered backgrounds. Finally, the fused representations undergo lightweight recursive optimization using depthwise separable convolutions and GELU activation functions with residual connections to suppress redundant information. 
The forward propagation process is systematically summarized in Algorithm 1.  Results and Discussions  To validate the discriminative capability of the proposed network, extensive experiments were conducted on two widely used public datasets, AID and NWPU-RESISC45. The proposed E-AF-ST network shows superior classification performance compared with existing advanced methods (Table 1). On the AID dataset, the model achieves state-of-the-art overall accuracies of 95.56% and 97.21% at training ratios of 20% and 50%, respectively. On the challenging NWPU-RESISC45 dataset, it achieves the highest accuracies of 92.45% and 94.59% at training ratios of 10% and 20%, respectively. The confusion matrices show that the recognition accuracy of most categories exceeds 95% (Figs. 3, 4), and the misclassification proportions for classes with complex backgrounds are significantly lower than those of the baseline model (Table 2). Visual analysis based on Grad-CAM further confirms the advantages of the E-AF-ST network in global contextual modeling and critical region selection. Compared with the Swin-Tiny baseline, the proposed network demonstrates more precise semantic focus (Fig. 5). In “airport” and “port” scenes, background noise is effectively suppressed and key targets are accurately highlighted. In structurally complex scenes such as “viaducts” and “railway stations”, extension directions and texture characteristics are comprehensively captured. Ablation experiments confirm that the cross-window sparse attention in the ASO module and the dynamic weight allocation in the EGF module are highly complementary. Furthermore, this performance gain is achieved with only a minimal increase in model complexity, with a total of 30.45M parameters and 4.72 GFLOPs.  
Conclusions  An E-AF-ST network is proposed to address insufficient extraction of local discriminative information, cross-scale feature inconsistency, and semantic redundancy in high-resolution remote sensing image scene classification. With information entropy used as a guiding metric, the ASO module enables precise selection and recursive optimization of discriminative regions, whereas the EGF module achieves adaptive and redundancy-reduced integration of multi-source features. Experimental and visual results show that the proposed method effectively reduces interference from complex backgrounds and outperforms existing mainstream CNN-Transformer hybrid architectures. This study provides a new theoretical perspective and technical route for multi-scale target perception and feature semantic alignment.
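The entropy-driven gating described in the Methods above can be illustrated with a small sketch. It is not the authors' EGF module: the function names, the plain nested-list feature maps, and the fixed softmax over negative entropy (with an assumed temperature `tau`) standing in for the learnable soft-normalization gate are all assumptions made for illustration. It only shows the idea that squared entries normalized by the squared Frobenius norm give a probability-like energy distribution, and that branches with higher Shannon entropy receive smaller fusion weights.

```python
import math

def energy_distribution(feature):
    """Squared entries divided by the squared Frobenius norm (sums to 1)."""
    total = sum(v * v for row in feature for v in row)
    return [v * v / total for row in feature for v in row]

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_gate(entropies, tau=1.0):
    """Softmax over negative entropy: higher entropy gives a smaller weight.
    A fixed softmax is an assumption; the abstract describes a learnable gate."""
    scores = [-h / tau for h in entropies]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def fuse(branches, tau=1.0):
    """Weighted sum of equally shaped branches using entropy-derived weights."""
    entropies = [shannon_entropy(energy_distribution(b)) for b in branches]
    weights = entropy_gate(entropies, tau)
    rows, cols = len(branches[0]), len(branches[0][0])
    fused = [[sum(w * b[i][j] for w, b in zip(weights, branches))
              for j in range(cols)] for i in range(rows)]
    return fused, weights, entropies

# A branch with concentrated energy versus one with evenly spread energy.
focused = [[1.0, 0.0], [0.0, 0.0]]
cluttered = [[1.0, 1.0], [1.0, 1.0]]
fused, weights, entropies = fuse([focused, cluttered])
```

In this toy case the concentrated branch has zero entropy and the evenly spread branch has entropy ln 4, so the first branch receives the larger weight.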
A Closed-loop Feedback Adaptive Beam Alignment Algorithm for Shipborne Low Earth Orbit Satellite Communication Terminals
CHEN Haotian, MA Zixian, XIE Xinhong, LI Nayu, LI Baozhu, SONG Chunyi, XU Zhiwei
Available online  , doi: 10.11999/JEIT251324
Abstract:
  Objective  The 6G-based SATellite COMmunication (SATCOM) network has become a primary solution for ubiquitous and oceanic communications. Compared with traditional Geostationary Earth Orbit (GEO) satellites, the latest generation of Low Earth Orbit (LEO) satellites offers higher throughput, lower end-to-end latency, and lower deployment cost. Phased arrays are therefore widely used in LEO SATCOM because of their beam agility. However, maritime wind-wave disturbances cause nonlinear relative motion between shipborne terminals and LEO satellites, which creates major challenges for high-precision satellite acquisition and tracking. To address this issue, a new beam alignment algorithm is required for LEO SATCOM systems. Such an algorithm should first obtain the instantaneous target state and motion characteristics through target acquisition, and then use a multi-target tracking method to predict satellite trajectories on the basis of the target states, thereby compensating for estimation errors caused by severe coupled motions.  Methods  The proposed closed-loop feedback adaptive beam alignment algorithm consists of two tightly coupled components: target acquisition and target state updating. In the target acquisition stage, a RAnk Reduction Estimator (RARE) is first used to decompose the array factor matrix and convert the original two-dimensional Direction Of Arrival (DOA) estimation problem into two sequential one-dimensional estimation problems. This process greatly reduces the computational complexity of each Sparse Bayesian Learning (SBL) iteration. On the basis of the coarse grid generated by RARE, an Adaptive Newton Sparse Bayesian Learning (ANSBL) method is developed. ANSBL uses block-sparse Bayesian learning to achieve initial target acquisition on the coarse grid, and then performs two-stage Newton refinement to reduce off-grid mismatch. 
This strategy provides high-accuracy DOA estimation in both $\theta$ and $\varphi$ and improves angular observation precision. In the target state updating stage, an Unscented Kalman Filter (UKF)-based ternary joint prediction mechanism is proposed. The UKF simultaneously predicts the target motion state, signal variance, and noise variance for the next target acquisition process. These predicted probability distributions are then used to update the initial grid and hyperparameters of the subsequent SBL acquisition stage, providing more consistent and comprehensive initial values. Through this closed-loop interaction, target acquisition and state tracking are deeply integrated, which substantially reduces the number of SBL iterations required for convergence. This advantage is particularly evident under high sea-state conditions, where reduced beam alignment time is critical.  Results and Discussions  The proposed closed-loop feedback adaptive beam alignment algorithm first uses on-grid DOA estimation to reduce array factor correlation and improve target acquisition efficiency, and then uses Newton iteration to achieve higher off-grid accuracy (Fig. 3). The proposed method is subsequently validated using real ship attitude data collected from a 28000-DWT bulk carrier under actual sea conditions (Fig. 4). The UKF refines the DOA results through state updating. Its predictions of signal position, signal variance, and noise variance provide accurate initial values for the hyperparameters, thereby reducing the number of iterations and enabling faster convergence than other algorithms (Fig. 5). Under low sea-state conditions, the proposed method not only achieves satellite alignment in less than 0.2 s, but also reduces the satellite position estimation error from ±1° to ±0.5° (Fig. 6(a)). 
Under high sea-state conditions, the UKF effectively predicts satellite positions and reduces the satellite position estimation error from ±2.5° to ±0.65°, which verifies the robust tracking accuracy and error mitigation capability of the proposed method in harsh marine environments (Fig. 6(b)).  Conclusions  To meet the performance requirements of beam alignment algorithms for LEO communication satellites, this paper proposes a closed-loop feedback adaptive beam alignment algorithm. The algorithm first uses a block-based SBL algorithm to obtain grid-based DOA estimation results, and then achieves super-resolution direction estimation under off-grid conditions through adaptive Newton iteration. Through the UKF, the estimation results are dynamically calibrated in real time. The UKF further predicts the target motion state, signal variance, and noise variance for the next target acquisition process, thereby improving tracking continuity and alignment accuracy. Numerical simulations show that the proposed algorithm outperforms traditional beam alignment methods in both numerical accuracy and robustness, and effectively mitigates severe terminal shaking under complex sea conditions.
Real-Time Sub-bottom Horizon Picking Based on Maximum Correlated Kurtosis Deconvolution Combined with Continuity Constraint
MENG Xinbao, ZHOU Tian, ZHU Jianjun, LI Tie, WANG Peihong, ZHAO Guoqing
Available online  , doi: 10.11999/JEIT250727
Abstract:
  Objective  Sub-bottom profiling is widely employed in seabed geological and resource exploration, pipeline route inspection, and port and channel safety assurance, and is regarded as a frontier in underwater acoustic detection research. Accurate extraction of sub-bottom horizons plays a critical role in the interpretation of sedimentary structures, analysis of seabed substrate characteristics, and identification of buried objects. However, existing horizon picking techniques often face difficulty in balancing picking quality, false-alarm control, and online real-time performance. To address this issue, a real-time sub-bottom horizon picking method integrating maximum correlated kurtosis deconvolution and continuity constraint is proposed.  Methods  The proposed method consists of three stages: preprocessing, coarse horizon extraction, and fine horizon extraction. In preprocessing, the raw echoes are enhanced via cascaded band-pass filtering and matched filtering, followed by a fixed delay correction to align picked positions with the pulse leading-edge arrivals. In coarse extraction, synthesized periodic signals are constructed under multiple slicing step lengths, and maximum correlated kurtosis deconvolution is applied to enhance impulsive horizon responses, yielding potential horizon sequences. These candidates are then screened and fused using a cross-step-length consistency criterion to suppress false alarms. In fine extraction, a continuity constraint is introduced within an online sliding window to filter isolated points, segment horizons, and perform curve fitting and correction, further reducing residual false alarms and improving continuity.  Results and Discussions  Simulation and field-data experiments were conducted to evaluate detection probability, false alarm probability, horizon positioning error, processing time, and extracted horizon profiles. 
Monte Carlo results show that the fine extraction stage further reduces false alarms and positioning errors while maintaining detection performance close to that of the coarse extraction stage (Fig.5, Fig.6). When the echo signal-to-noise ratio is higher than –15 decibels, the detection probability exceeds 70.000% and the false alarm probability remains below 0.200%; when it is higher than –10 decibels, the detection probability exceeds 99.000%, the false alarm probability falls below 0.100%, and the positioning error approaches one sample interval (Fig.6). In sub-bottom survey simulation, the proposed method successfully extracts both the seabed surface and the buried sedimentary horizon under different noise conditions, with results more refined than those of the comparative algorithm based on fractional Fourier transform and overall comparable to manual interpretation (Fig.7, Fig.8). Field-data results further confirm its effectiveness: among the signal-based algorithms, the proposed method achieves an average detection probability of 91.833%, an average false alarm probability of 0.004%, and an average positioning error of 10.15 samples, whereas the comparative algorithm based on fractional Fourier transform shows a much higher false alarm probability of 3.987% (Table 1). For the image-based comparative algorithms, although their detection probabilities are above 95%, their false alarm probabilities and processing times remain markedly higher than those of the proposed method (Table 2). Qualitative results also show that the extracted horizons agree well with manual interpretation trends, with lower background noise, no obvious large-scale false layers, and good preservation of local fluctuations and interruptions (Fig.9–12). 
Overall, the proposed method achieves a more favorable balance for online horizon extraction by combining acceptable detection probability and positioning accuracy with extremely low false alarm probability and real-time processing capability (Table 1, Table 2).  Conclusions  This study presents a real-time sub-bottom horizon picking method based on maximum correlated kurtosis deconvolution combined with continuity constraint, structured into three stages: preprocessing, coarse extraction, and fine extraction. The method effectively extracts the seabed surface and sedimentary horizons while meeting real-time processing requirements. Simulation results show that when the signal-to-noise ratio exceeds –10 dB, the method achieves a detection probability greater than 99.000%, a false alarm probability below 0.100%, and a positioning error near one sample. Field data processing results indicate an average detection probability of 91.833%, an average false alarm probability of 0.004%, and an average positioning error of 10.15 samples. These findings validate the effectiveness and practical value of the proposed approach for real-time extraction of shallow sub-bottom horizons. The method demonstrates the ability to maintain high detection accuracy while minimizing false alarms and ensuring millisecond-level processing times, making it highly suitable for online sub-bottom horizon extraction tasks in practical applications.
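The quantity that maximum correlated kurtosis deconvolution maximizes can be made concrete with a short sketch. The code below only evaluates the commonly used correlated kurtosis of a signal for a given period and shift order, with samples before the start of the signal treated as zero. It does not implement the deconvolution filter update or the authors' multi-step-length slicing and fusion, and the function and parameter names are hypothetical.

```python
def correlated_kurtosis(y, period, shifts=1):
    """Correlated kurtosis of y for an integer period (in samples).

    numerator:   sum over n of (y[n] * y[n - period] * ... * y[n - shifts*period])**2
    denominator: (sum over n of y[n]**2) ** (shifts + 1)
    Samples with a negative index are treated as zero.
    """
    energy = sum(v * v for v in y)
    if energy == 0.0:
        return 0.0
    numerator = 0.0
    for n in range(len(y)):
        product = 1.0
        for m in range(shifts + 1):
            j = n - m * period
            product *= y[j] if j >= 0 else 0.0
        numerator += product * product
    return numerator / energy ** (shifts + 1)

# Unit impulses every 5 samples in a 20-sample record.
signal = [1.0 if n % 5 == 0 else 0.0 for n in range(20)]
ck_matched = correlated_kurtosis(signal, period=5)
ck_mismatched = correlated_kurtosis(signal, period=4)
```

The measure is large only when impulses repeat at the tested period, which is why it rewards periodic impulsive responses and ignores isolated spikes.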
A Cross-Precision Motion Compensation Technique for Security Surveillance Video Coding
JIANG Wei, MA Wei, LU Jinghui, ZHANG Yue, ZHANG Yundong
Available online  , doi: 10.11999/JEIT251301
Abstract:
  Objective  In the field of modern security surveillance, high-altitude dome cameras are often deployed at critical locations such as bridges and tower tops that are susceptible to external interference, resulting in problems such as jitter and blurring in captured videos, which pose great challenges to video coding. In video compression coding, high-precision motion compensation is the key to improving coding efficiency. The existing Ultimate Motion Vector Expression (UMVE) technique suffers from insufficient precision and lack of flexibility in adaptive adjustment. Although high-precision coding tools such as Registration-Based Coding Mode (RCM) and Affine Motion Compensation Prediction (AFFINE) can improve compensation accuracy, they have disadvantages of high computational complexity and hardware cost, making it difficult to meet the multiple requirements of coding efficiency, power consumption and real-time performance in high-altitude surveillance scenarios. Therefore, aiming at the core pain points of video coding for high-altitude dome cameras, it is of important academic value and practical application significance to design an optimized UMVE scheme that combines high-precision motion compensation, low computational complexity and scene adaptability, so as to improve coding efficiency and balance resource consumption.  Methods  This study proposes an Ultimate Motion Vector Expression technique supporting Cross-Precision Motion Compensation (UMVE_CPMC). Its core is to improve motion compensation accuracy by constructing an extended Up-Precision Motion Vector (UPMV), whose mathematical expression is UPMV = BaseMV + MMV(p, angle), where BaseMV is the basic motion vector obtained by the existing UMVE method, and MMV is the refined fine-tuning motion vector based on specific precision p and angle, with incremental candidates only provided at the 1/8 precision level to balance computational complexity and compression efficiency. 
For step-size adaptive adjustment, an improved scheme with six modes is proposed, covering enhanced UMVE, conventional UMVE and four precision-improved modes, allowing the encoder to switch flexibly according to scene characteristics. The average image gradient is adopted as an objective evaluation index; test scenes are divided into Class A (high-definition motion scenes) and Class B (low-definition scenes), and different coding configurations, sequences and parameters are set to compare coding gains and computational efficiency under different modes.  Results and Discussions  Experiments show that UMVE_CPMC achieves effective performance improvement in various scenes and modes. In Class A high-definition motion scenes, with the adaptive strategy disabled and RCM disabled, the average gains of Y, U and V components in Fusion Mode 1 reach -2.912%, -1.656% and -1.654% respectively, and the average coding time is reduced to 94.55% of the baseline; the average gain of the Y component in Independent Mode 1 reaches -2.925%, with coding time reduced to 91.91% of the baseline. Compared with traditional UMVE, when CPMC Independent Mode 1 is enabled under the scenario where RCM is enabled and other tools work collaboratively, the gain is improved from -0.276% to -1.310%, showing significantly higher cost performance. In Class B low-definition scenes, after enabling adaptive adjustment, the gain losses of Fusion Mode 1 and Mode 0 are significantly reduced, with average gain losses controlled at 0.071% and 0.108% respectively, successfully maintaining the original coding gain. In multi-scene comprehensive tests, when RCM and AFFINE are disabled, 9 out of 10 test sequences in adaptive Fusion Mode 1 show positive gains, including a Y-component gain of -10.691% for the yuxuedaolu sequence and -11.400% for the BQTerrace sequence. 
When all existing coding tools are enabled, the Y-component gains of dianjing, yuxuedaolu and BQTerrace sequences reach -1.29%, -2.05% and -1.21% respectively, with coding time reduced to 94%–96% of the baseline. In addition, correlation analysis between average image gradient and gain reveals a significant positive correlation: images with high average gradient (high definition) achieve greater gains from UMVE_CPMC, while those with low average gradient (low definition) hardly benefit. Principle analysis indicates that pixel changes in low-definition images are gentle, and high-precision interpolation fails to generate effective pixel values, resulting in insignificant compensation effects. Performance differences among modes match computational complexity: the fusion mode balances gain and stability, while the independent mode further reduces computation. The six step-size adaptive modes can meet real-time and precision requirements of different scenes.  Conclusions  The proposed UMVE_CPMC technique, by integrating cross-precision motion compensation with the UMVE algorithm, effectively solves the core problems of insufficient precision in traditional UMVE and high computational complexity of high-precision coding tools, achieving a favorable balance among coding efficiency, computational complexity and scene adaptability. This technique delivers remarkable coding gains in Class A high-definition motion scenes, with gains exceeding 10% for some sequences without other high-precision compensation tools and 1%–2% when cooperating with other tools. In Class B low-definition scenes, the original coding gain can be maintained through frame-level adaptive adjustment interfaces. Meanwhile, the fusion mode does not increase hardware complexity, and the independent mode significantly reduces coding time, suitable for encoder designs with limited resources or simplified requirements. 
UMVE_CPMC provides a new effective approach to solving the low coding efficiency caused by jitter and blurring in high-altitude dome camera video coding, enriches the video coding toolset, and offers important practical guidance for the optimization of video coding technologies in the security surveillance field. Future work can further optimize the adaptive strategy, explore integration with other advanced coding technologies, develop personalized coding schemes, and improve performance in complex scenarios.
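The expression UPMV = BaseMV + MMV(p, angle) from the Methods can be sketched as a small candidate generator. This is an illustration under stated assumptions, not the authors' encoder logic: the four axis-aligned angles, the function names, and the use of exact fractions of a pixel are all hypothetical choices. Only the additive structure and the 1/8-precision increment come from the abstract.

```python
from fractions import Fraction

# Assumed direction set: unit steps along the horizontal and vertical axes.
DIRECTIONS = {0: (1, 0), 90: (0, 1), 180: (-1, 0), 270: (0, -1)}

def mmv(precision, angle):
    """Fine-tuning offset of magnitude `precision` (in pixels) along `angle`."""
    dx, dy = DIRECTIONS[angle]
    return (dx * precision, dy * precision)

def upmv(base_mv, precision=Fraction(1, 8), angle=0):
    """UPMV = BaseMV + MMV(p, angle), with components in pixels."""
    off_x, off_y = mmv(precision, angle)
    return (base_mv[0] + off_x, base_mv[1] + off_y)

def candidates(base_mv, precision=Fraction(1, 8)):
    """All refined candidates around one base motion vector."""
    return [upmv(base_mv, precision, angle) for angle in sorted(DIRECTIONS)]

base = (Fraction(3, 4), Fraction(-1, 2))
refined = upmv(base, angle=90)
all_candidates = candidates(base)
```

Exact fractions avoid rounding when mixing the base precision with the finer 1/8 increment; a real encoder would more likely store motion vectors as integers in the finest supported unit.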
Modeling and Characterization of Broadband Earth-Moon-Earth Communication Channels
LI Chengqian, QIAN Xiaowei, HU Xiaoling
Available online  , doi: 10.11999/JEIT251028
Abstract:
  Objective  This paper presents a comprehensive channel model for wideband Earth-Moon-Earth (EME) communication, tackling the shortcomings of traditional simplified models that cannot accurately represent the Moon’s complex scattering behavior and terrain-induced effects. Existing approaches, which treat the Moon as a point reflector or depend on empirical scattering laws, are inadequate for broadband, high-capacity systems. To address this, a unified large-scale link model is proposed to statistically capture terrain-driven reflection characteristics, while a small-scale model systematically analyzes multipath and Doppler effects, decomposing the channel and quantifying dynamic impairments. Link-level simulations validate the model’s accuracy. This work fills a critical gap in broadband EME channel modeling, providing a necessary foundation for the design and optimization of future deep space communication systems.  Methods  A dual-scale modeling approach is proposed for wideband Earth-Moon-Earth (EME) channels. At the large scale, a unified integral path loss model is developed for both wide- and narrow-beam scenarios, with lunar terrain statistically represented by a Gaussian height distribution to capture shadowing and roughness effects. A distributed integration method is used to compute effective RCS under narrow-beam conditions. At the small scale, the channel is decomposed into quasi-specular and diffuse components, with delay-power profiles derived from surface roughness and scattering mechanisms. Doppler shift and spread are analytically modeled based on Earth-Moon orbital dynamics. Monte Carlo simulations and numerical integration verify the models, and system-level performance is evaluated in terms of BER under various channel conditions with different equalization and frequency offset correction schemes.  
Results and Discussions  A comprehensive channel model is developed to capture both large- and small-scale fading in wideband Earth-Moon-Earth (EME) communication. The large-scale model, validated by simulations, accurately represents the non-uniform power distribution across the lunar disk through an integrated RCS approach. At the small scale, quasi-specular and diffuse components characterize multipath delay spread, while the Doppler model quantifies effects from Earth’s rotation and lunar orbital motion, with a two-way shift of ~4.5 kHz and a spread of ±39.88 Hz at 1.296 GHz. Low-SNR simulations show that conventional equalizers (LMS, RLS, RAKE) stagnate near BER = 0.1, and frequency correction methods (FFT-based, MLE) degrade under large frequency offsets, highlighting the challenges of accurate compensation.  Conclusions  This paper develops and validates a comprehensive channel model for broadband Earth-Moon-Earth (EME) communication. The model more accurately predicts path loss, shadowing, multipath delay, and Doppler effects than conventional point-target or empirical methods. Results show that lunar terrain and surface properties cause severe signal degradation, which traditional equalization and frequency correction cannot effectively mitigate. Future work should integrate high-resolution lunar DEMs and measured RCS data to improve accuracy and explore adaptive methods, such as machine learning, to handle severe delay spread. This model offers a foundation for reliable EME links and future deep-space communication networks.
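As a rough plausibility check on the reported two-way Doppler shift, the first-order relation for a reflected path, f_d = 2 v f_c / c, can be inverted to obtain the radial velocity it implies. The sketch assumes a single effective radial velocity for both legs of the path and speeds far below the speed of light; it is not the analytical orbital model used in the paper, and the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def two_way_doppler_shift(radial_velocity, carrier_hz):
    """First-order two-way Doppler shift in Hz (positive for a closing path)."""
    return 2.0 * radial_velocity * carrier_hz / C

def implied_radial_velocity(shift_hz, carrier_hz):
    """Radial velocity in m/s that would produce the given two-way shift."""
    return shift_hz * C / (2.0 * carrier_hz)

# Values quoted in the abstract: about 4.5 kHz at 1.296 GHz.
velocity = implied_radial_velocity(4.5e3, 1.296e9)
```

Under these assumptions the quoted shift corresponds to an effective radial velocity of roughly 520 m/s.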
Slice Pricing and Access Control with QoS Guarantee for Vehicular Networks
CUI Yaping, ZHANG Feng, WU Dapeng, HE Peng, WANG Ruyan, WANG Pan
Available online  , doi: 10.11999/JEIT251219
Abstract:
  Objective  Vehicular applications have diverse Quality of Service (QoS) needs that traditional spectrum-focused networks struggle to meet. While network slicing over Mobile Edge Computing (MEC) offers customized provisioning, current approaches often overlook the holistic generation of slices and adaptive access control. To address these limitations, this paper proposes a two-stage vehicular network slicing framework that integrates resource-aware slice generation with intelligent pricing and access control. This framework enables efficient, dynamic resource allocation and access management, benefiting both the MEC-based Network Slice Provider (MEC-NSP) and vehicles by improving service quality, utilization, and adaptability through a Stackelberg game-based interaction mechanism.  Methods  The proposed solution features a two-layer coupled mechanism: “resource pre-allocation” and “Stackelberg game pricing and access control”. In the first stage, a 3D resource pre-allocation mechanism jointly optimizes communication, computation, and caching resources to satisfy vehicular latency and bandwidth requirements. This allocation is formulated as a Mixed-Integer Nonlinear Programming (MINLP) problem and decoupled into uplink and downlink sub-problems, solved via branch-and-bound and interior-point methods, respectively. In the second stage, a Stackelberg game balances the MEC-NSP’s profit and vehicles’ QoS. The MEC-NSP acts as the leader, setting dynamic slice prices, while the network controller (the follower) determines the optimal slice selection probabilities. This interaction is resolved using the Iterative Slice Pricing Algorithm (ISPA), which has been proven to converge to a Nash equilibrium.  Results and Discussions  Simulations demonstrate that the proposed framework consistently outperforms baseline algorithms (Fixed Slice Pricing, Average Resource Allocation, and Random Selection) under various network conditions. 
In bandwidth-constrained scenarios, it increases MEC-NSP profit by up to 20.77% compared to the Random Selection approach. With abundant resources (150% capacity), it maintains profit gains of 3–9% over other baselines. The ISPA algorithm exhibits fast convergence to equilibrium (approx. 175 iterations). The flexible pricing mechanism effectively balances network loads, improves cache hit rates, and reduces resource bottlenecks, ensuring high QoS satisfaction.  Conclusions  The proposed dual-layer framework successfully integrates slice generation and pricing to address resource-aware network slicing in vehicular MEC environments. By coupling 3D resource pre-allocation with a Stackelberg game-based pricing strategy, the system significantly improves MEC-NSP profit, resource utilization, and vehicle QoS. Future work will explore blockchain-based mechanisms to facilitate trust negotiation and decentralized resource orchestration for cross-domain cooperation in multi-operator, multi-vendor environments.
Available online  , doi: 10.11999/JEIT251134
Abstract:
An Ultra-Wideband Low-Profile Dipole Patch Antenna for VHF-Band Probing Radars
TIAN Yuxiao, ZHANG Feng, MA Zhangjun, WANG Jiacheng, JI Yicai
Available online  , doi: 10.11999/JEIT260105
Abstract:
  Objective  In radar systems, the limitations of traditional narrowband antennas in data transmission rate and resolution have become increasingly evident. Ultra-WideBand (UWB) antennas therefore receive broad attention because they provide high range resolution and strong interference suppression capability. However, at low frequencies, existing UWB antennas usually suffer from excessively large physical size, which makes installation on airborne or vehicle-mounted platforms difficult. By contrast, compact antennas that are easier to deploy often exhibit insufficient gain and cannot satisfy the penetration-depth requirement of deep subsurface detection. Thus, achieving a proper balance among antenna size, bandwidth, and gain over an ultra-wideband range remains a major challenge for VHF-band probing radars. To address this issue, a planar dipole antenna loaded with an Artificial Magnetic Conductor (AMC) structure and metallic shorting walls is proposed. The antenna maintains stable radiation performance over a wide frequency range while preserving a low-profile and structurally simple configuration.  Methods  The reflection-phase characteristics of AMC unit cells with different geometries are compared, and square unit cells are selected to construct a 9 × 7 AMC reflective layer. Owing to its in-phase reflection property, the AMC structure removes the conventional requirement for a quarter-wavelength spacing between the antenna and a metallic ground plane, thereby reducing the profile height. The dipole patch adopts an optimized meandered current-bending structure to reduce the lateral size. Metallic shorting walls are further loaded at both ends of the antenna. According to image theory, equivalent currents are generated on the outer surfaces of these metal walls during operation, which effectively extends the electrical length and improves low-frequency performance without increasing the physical size. 
In addition, two vertical metallic walls are connected to the ground plane on both sides of the antenna to form a reflective back cavity, which strengthens unidirectional radiation and improves antenna gain. As part of the overall co-design, four 125 Ω resistors are inserted between the feed region and the metallic sidewalls. This resistive loading suppresses strong low-frequency resonances and broadens the impedance bandwidth at the cost of acceptable ohmic loss.  Results and Discussions  A prototype with favorable simulated performance is fabricated and measured in a microwave anechoic chamber. The measured impedance bandwidth for VSWR<2 is 50~400 MHz, which agrees well with the simulated range of 84~366 MHz. The measured impedance matching is slightly better than the simulated result, mainly because cable loss and power-divider loss in the feeding network reduce the reflected power. The measured gain follows the same trend as the simulated gain, with deviations within 1 dB. Radiation-pattern measurements show that at 100, 200, and 300 MHz, the measured co-polarization patterns agree well with the simulated results, and the maximum radiation direction remains normal to the antenna plane, which confirms the effectiveness of the proposed design. As shown in Fig. 5, the current on the radiating patch layer mainly flows along the +x direction and generates a radiated electric field along the +z direction. The current on the AMC unit can be represented by an equivalent current loop oriented along the +z direction. At this frequency, the x-direction current and the parasitic current loop on the AMC jointly enhance the antenna gain. This result explains the gain-improvement mechanism of the AMC structure. When the operating frequency increases to 400 MHz, the electrical size of the antenna reaches approximately $1.6\lambda$, which causes main-lobe splitting and shifts the maximum radiation direction toward 90°. 
Although this high-frequency beam splitting introduces spatial clutter, it is an acceptable physical trade-off for achieving the ultra-low profile of 0.07 λL, while the overall UWB characteristic still supports high time-domain resolution in probing radar systems. At 400 MHz, the measured H-plane co-polarization level is slightly higher than the simulated value, possibly because of coupling between the feeding cable and the vertically mounted antenna.  Conclusions  A low-profile UWB planar dipole antenna is proposed for VHF-band probing radar applications. By combining the AMC layer, metallic shorting walls, and resistive loading, the proposed design improves impedance matching while preserving a compact size. The reflective back cavity further improves the realized gain. The fabricated prototype shows good agreement between measurement and simulation. The antenna operates over 100–366 MHz and exhibits a measured VSWR<2 bandwidth of 50~400 MHz. It maintains a compact electrical size of 0.38λL × 0.18λL × 0.07λL, and the maximum measured gain within the operating band reaches 6 dBi. The proposed co-design provides a practical solution for low-frequency probing radar antennas that require wide bandwidth, low profile, and relatively high gain.
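The VSWR<2 criterion and the quoted band edges can be related to more familiar matching figures with a few standard formulas. The sketch below uses the textbook relations between VSWR, reflection-coefficient magnitude, and return loss, plus the usual centre-frequency definition of fractional bandwidth. The function names are hypothetical, and nothing here models the antenna itself.

```python
import math

def gamma_from_vswr(vswr):
    """Reflection-coefficient magnitude for a given VSWR."""
    return (vswr - 1.0) / (vswr + 1.0)

def vswr_from_gamma(gamma):
    """VSWR for a reflection-coefficient magnitude below 1."""
    return (1.0 + gamma) / (1.0 - gamma)

def return_loss_db(gamma):
    """Return loss in dB (a positive number for gamma below 1)."""
    return -20.0 * math.log10(gamma)

def fractional_bandwidth(f_low, f_high):
    """Bandwidth divided by the arithmetic centre frequency."""
    return 2.0 * (f_high - f_low) / (f_high + f_low)

gamma_limit = gamma_from_vswr(2.0)            # 1/3
loss_limit = return_loss_db(gamma_limit)      # about 9.54 dB
measured_fbw = fractional_bandwidth(50.0, 400.0)  # measured 50~400 MHz band
```

VSWR<2 therefore means that at most one ninth of the incident power is reflected, and the measured 50~400 MHz band is an 8:1 frequency ratio, or about 156% fractional bandwidth.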
Semantic Relation-enhanced Adaptive Graph Representation Learning for Next POI Recommendation
WANG Zhuolu, XU Shenghua, WANG Yong, JIANG Shunshun
Available online  , doi: 10.11999/JEIT251357
Abstract:
  Objective  In recent years, next Point Of Interest (POI) recommendation has played an increasingly important role in Location-Based Social Networks (LBSNs). However, existing Graph Representation Learning (GRL)-based recommendation methods have struggled to balance node distributions across different domains (i.e., node types) effectively and have often overlooked feature differences among heterogeneous relations. Thus, complex semantic dependencies in contextual information cannot be fully captured when users’ temporal preference patterns are modeled.  Methods  To address these issues, a next POI recommendation method based on Semantic Relation-enhanced adaptive Graph Representation Learning (SR-GRL) is proposed. A heterogeneous transition graph is constructed to integrate three entity types, namely POIs, POI categories, and regions, and their complex interrelationships. An adaptive balanced random walk sampling strategy is designed to balance node distributions across different domains dynamically and to reduce information redundancy. A type-aware attention mechanism is then used to learn semantic associations among nodes through relation-specific transformation matrices, so that feature differences across node types can be identified effectively. The obtained disentangled POI representations are then used for spatiotemporal encoding of user check-in sequences, and a self-attention mechanism is applied to aggregate users’ temporal preference features. Finally, the next POI recommendation is generated through a Softmax function.  
Results and Discussions  Experiments on the Foursquare datasets from Tokyo and New York and the Sina Weibo dataset from Shanghai show that, compared with state-of-the-art baselines, the SR-GRL method achieves Recall@10 improvements of 2.22%~24.16%, F1@10 improvements of 1.16%~10.48%, and NDCG@10 improvements of 3.01%~17.37%, indicating better recommendation performance.  Conclusions  Overall, the SR-GRL approach can balance the distributions of different node types dynamically and strengthen the modeling of complex semantic dependencies in heterogeneous contextual information.
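The Recall@10 and NDCG@10 figures reported in this abstract are top-K ranking metrics. The following is a minimal sketch of how such metrics are commonly computed, assuming each test case has exactly one ground-truth next POI (an assumption; the abstract does not describe the evaluation protocol). The function names and POI identifiers are hypothetical.

```python
import math


def recall_at_k(ranked, target, k=10):
    """1.0 if the ground-truth POI appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k=10):
    """With a single relevant item, NDCG@k reduces to 1 / log2(rank + 1)."""
    top = ranked[:k]
    if target not in top:
        return 0.0
    rank = top.index(target) + 1  # 1-based position of the hit
    return 1.0 / math.log2(rank + 1)


def evaluate(cases, k=10):
    """Average both metrics over (ranked_list, target) pairs."""
    n = len(cases)
    recall = sum(recall_at_k(r, t, k) for r, t in cases) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in cases) / n
    return recall, ndcg


if __name__ == "__main__":
    # Hypothetical ranked POI lists and ground-truth targets.
    cases = [(["cafe", "park", "gym"], "park"), (["mall", "bar", "zoo"], "museum")]
    print(evaluate(cases, k=2))
```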
Design of an Aerospace-grade Radiation-hardened SRAM Cell for High-speed Read/Write Applications
CAI Shuo, SHUAI Wei, HU Xing, LIANG Xinjie, HUANG Zhu, YU Fei
Available online  , doi: 10.11999/JEIT251287
Abstract:
  Objective  With the continued scaling of Complementary Metal-Oxide-Semiconductor (CMOS) technology nodes and the reduction in supply voltage, Static Random Access Memory (SRAM) in aerospace environments becomes increasingly sensitive to high-energy particle radiation and is prone to Single-Node Upset (SNU) and Double-Node Upset (DNU). This sensitivity poses a serious challenge to the reliability of spaceborne Systems-on-Chip (SoC). Existing Radiation-Hardened-By-Design (RHBD) structures, however, usually cannot balance strong radiation tolerance with high-speed access performance. This work therefore aims to design an aerospace-grade radiation-hardened SRAM cell for high-speed read/write applications that provides both strong radiation resistance and fast access performance.  Methods  The proposed Read Fast and Write Fast 16-Transistor (RFWF16T) SRAM is built on a dual-source isolation architecture composed of 16 transistors (8 PMOS and 8 NMOS) (Fig. 1, Fig. 2). By using a symmetric recovery mechanism, the RFWF16T reduces the number of key sensitive nodes to only two. Redundant transistors (P2 and P6) are used to establish a stable high-level isolation state, which isolates the storage nodes from potential disturbances during the non-access phase. To achieve high-speed operation, the RFWF16T combines a short feedback path with a low-impedance voltage discharge loop. Unlike conventional hardened cells that rely on stacked transistors, which increase resistance and delay, the RFWF16T adopts a parallel access topology connected to word lines and bit lines. This configuration forms a low-impedance path during write operations and significantly accelerates node voltage switching (Fig. 3). Performance verification confirms the self-recovery capability of the four data nodes. A comprehensive variation analysis is conducted, including Process-Voltage-Temperature (PVT) variations and 2,000-point Monte Carlo simulations. 
Additionally, an improved Electrical Quality Metric (EQM) is proposed to evaluate multidimensional performance quantitatively.  Results and Discussions  The RFWF16T exhibits strong overall performance, particularly in overcoming the speed bottleneck of hardened SRAM cells. In terms of access speed, the RFWF16T performs substantially better than typical models such as S8P8N, SAW16T, and RH20T. Under standard conditions (28 nm CMOS process, 1.0 V, 25 °C, TT corner), the RFWF16T achieves a Read Access Time (RAT) of 20.97 ps and a Write Access Time (WAT) of 2.72 ps. These values correspond to average speed improvements of 46.65% and 14.77%, respectively, over eight comparable hardened structures (Table 2). PVT analysis confirms that the RFWF16T maintains the lowest latency across voltages from 0.7 V to 1.1 V and temperatures from –25 °C to 125 °C (Fig. 6). This write-speed advantage is attributed to the removal of write contention through optimized discharge paths. In terms of noise margin and stability, the RFWF16T demonstrates strong robustness and achieves the highest Write Word-line Toggle Voltage (WWTV) among nine comparative structures. Its Hold Static Noise Margin (HSNM) and Read Static Noise Margin (RSNM) also rank among the best, which ensures stability under disturbances (Fig. 7). In radiation hardening, the RFWF16T achieves a 100% self-recovery rate for SNUs and an 83.3% recovery rate for DNUs, reaching the state-of-the-art level among DNU-recoverable units (Table 1). Monte Carlo simulations confirm that the average recovery times of the internal nodes range from 1.09 ns to 1.19 ns (Fig. 4, Fig. 5). In terms of overhead, the RFWF16T maintains a normalized area of 1.00× (4.3 μm × 1.9 μm) (Table 3, Fig. 2) and an average power consumption of 23.45 nW (Table 4). Although the power consumption is slightly higher, this increase is a reasonable trade-off for the substantial speed advantage. 
In the EQM evaluation, the RFWF16T obtains the highest score, which confirms its overall advantage in balancing reliability, speed, and stability (Fig. 9).  Conclusions  A radiation-hardened SRAM cell, RFWF16T, is proposed for aerospace-grade high-speed read/write applications. The cell contains only two sensitive nodes and achieves 100% self-recovery for SNUs and an 83.3% recovery rate for DNUs, which demonstrates strong radiation tolerance. Compared with eight other SRAM cells, the RFWF16T significantly reduces read and write delay with only a slight increase in area and power consumption, while maintaining good noise immunity and the best electrical quality metric. PVT and Monte Carlo simulations further confirm the stability and robustness of the proposed cell under different operating conditions. Future work will focus on array-level integration and tape-out verification, and on its application in satellite-borne high-speed data processing.
Construction of Maximum Distance Separable Codes and Near Maximum Distance Separable Codes Based on Cyclic Subgroup of $\mathbb{F}_{q^{2}}^{*}$
DU Xiaoni, XUE Jing, QIAO Xingbin, ZHAO Ziwei
Available online  , doi: 10.11999/JEIT251204
Abstract:
  Objective  The demand for higher performance and efficiency in error-correcting codes has increased with the rapid development of modern communication technologies. These codes detect and correct transmission errors. Because of their algebraic structure, straightforward encoding and decoding, and ease of implementation, linear codes are widely used in communication systems. Their parameters follow classical bounds such as the Singleton bound: for a linear code with length $n$ and dimension $k$, the minimum distance $d$ satisfies $d\leq n-k+1$. When $d=n-k+1$, the code is a Maximum Distance Separable (MDS) code. MDS codes are applied in distributed storage systems and random error channels. If $d=n-k$, the code is Almost MDS (AMDS); when both a code and its dual are AMDS, the code is Near MDS (NMDS). NMDS codes have geometric properties that are useful in cryptography and combinatorics. Extensive research has focused on constructing structurally simple, high-performance MDS and NMDS codes. This paper constructs several families of MDS and NMDS codes of length $q+3$ over the finite field $\mathbb{F}_{q^{2}}$ of even characteristic using the cyclic subgroup $U_{q+1}$. Several families of optimal Locally Repairable Codes (LRCs) are also obtained. LRCs support efficient failure recovery by accessing a small set of local nodes, which reduces repair overhead and improves system availability in distributed and cloud-storage settings.  Methods  In 2021, Wang et al. constructed NMDS codes of dimension 3 using elliptic curves over $\mathbb{F}_{q}$. In 2023, Heng et al. 
obtained several classes of dimension-4 NMDS codes by appending appropriate column vectors to a base generator matrix. In 2024, Ding et al. presented four classes of dimension-4 NMDS codes, determined the locality of their dual codes, and constructed four classes of distance-optimal and dimension-optimal LRCs. Building on these works, this paper uses the unit circle $U_{q+1}$ in $\mathbb{F}_{q^{2}}$ and elliptic curves to construct generator matrices. By augmenting these matrices with two additional column vectors, several classes of MDS and NMDS codes of length $q+3$ are obtained. The locality of the constructed NMDS codes is also determined, yielding several classes of optimal LRCs.  Results and Discussions  In 2023, Heng et al. constructed generator matrices with second-row entries in $\mathbb{F}_{q}^{*}$ and with the remaining entries given by nonconsecutive powers of the second-row elements. In 2025, Yin et al. extended this approach by constructing generator matrices using elements of $U_{q+1}$ and obtained infinite families of MDS and NMDS codes. Following this direction, the present study expands these matrices by appending two column vectors whose elements lie in $\mathbb{F}_{q^{2}}$. The resulting matrices generate several classes of MDS and NMDS codes of length $q+3$. Several classes of NMDS codes with identical parameters but different weight distributions are also obtained. Computing the minimum locality of the constructed NMDS codes shows that some are optimal LRCs satisfying the Singleton-like, Cadambe–Mazumdar, Plotkin-like, and Griesmer-like bounds. All constructed MDS codes are Griesmer codes, and the NMDS codes are near Griesmer. 
These results show that the proposed constructions are more general and unified than earlier approaches.  Conclusions  This paper constructs several families of MDS and NMDS codes of length $q+3$ over $\mathbb{F}_{q^{2}}$ using elements of the unit circle $U_{q+1}$ and oval polynomials, and by appending two additional column vectors with entries in $\mathbb{F}_{q}$. The minimum locality of the constructed NMDS codes is analyzed, and some of these codes are shown to be optimal LRCs. The framework generalizes earlier constructions, and the resulting codes are optimal or near-optimal with respect to the Griesmer bound.
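The Singleton-bound definitions of MDS, AMDS, and NMDS codes in this abstract can be checked by brute force on toy examples. The sketch below does so over small prime fields only; it is not the authors' construction over a field of order q squared, and the two example codes (a length-4 Reed-Solomon code over GF(5) and the binary (7,4) Hamming code) are chosen purely for illustration. All function names are hypothetical.

```python
from itertools import product


def min_distance(G, p):
    """Brute-force minimum Hamming weight of the code generated by G over GF(p).

    Assumes p is prime and the rows of G are linearly independent.
    """
    k, n = len(G), len(G[0])
    best = n
    for msg in product(range(p), repeat=k):
        if not any(msg):
            continue  # skip the zero message
        word = [sum(m * g for m, g in zip(msg, col)) % p for col in zip(*G)]
        best = min(best, sum(1 for s in word if s))
    return best


def singleton_class(n, k, d):
    """Classify a linear [n, k, d] code against the Singleton bound."""
    if d == n - k + 1:
        return "MDS"
    if d == n - k:
        return "AMDS"
    return "other"


def is_nmds(G, H, p):
    """NMDS: both the code (generator G) and its dual (generator H) are AMDS."""
    n = len(G[0])
    code = singleton_class(n, len(G), min_distance(G, p))
    dual = singleton_class(n, len(H), min_distance(H, p))
    return code == "AMDS" and dual == "AMDS"


# Toy example 1: [4, 2] Reed-Solomon code over GF(5), evaluation points 0..3.
RS_G = [[1, 1, 1, 1], [0, 1, 2, 3]]

# Toy example 2: binary (7,4) Hamming code, G = [I | P] and H = [P^T | I].
HAM_G = [[1, 0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]
HAM_H = [[1, 1, 0, 1, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1]]

if __name__ == "__main__":
    print(singleton_class(4, 2, min_distance(RS_G, 5)))
    print(is_nmds(HAM_G, HAM_H, 2))
```

The exhaustive search grows as p to the power k, so it is only practical for very small parameters; the families in the paper require the algebraic arguments the authors describe.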
A Miniaturized Steady-State Visual Evoked Potential Brain-Computer Interface System
CAI Yu, WANG Junyang, JIANG Chuanli, LUO Ruixin, LÜ Zhengchao, YU Haiqing, HUANG Yongzhi, ZHONG Ziping, XU Minpeng
Available online  , doi: 10.11999/JEIT251223
Abstract:
  Objective  The practical use of Brain-Computer Interface (BCI) systems in daily settings is limited by bulky acquisition hardware and the cables required for stable performance. Although portable systems exist, achieving compact hardware, full mobility, and high decoding performance at the same time remains difficult. This study aims to design, implement, and validate a wearable Steady-State Visual Evoked Potential (SSVEP) BCI system. The goal is to create an integrated system with ultra-miniaturized and concealable acquisition hardware and a stable cable-free architecture, and to show that this approach provides online performance comparable with laboratory systems.  Methods  A system-level solution was developed based on a distributed architecture to support wearability and hardware simplification. The core component is an ultra-miniaturized acquisition node. Each node functions as an independent EEG acquisition unit and integrates a Bluetooth Low Energy (BLE) system-on-chip (CC2640R2F), a high-precision analog-to-digital converter (ADS1291), a battery, and an electrode in one encapsulated module. Through an optimized 6-layer PCB design and stacked assembly, the module size was reduced to 15.12 mm × 14.08 mm × 14.31 mm (3.05 cm³) with a weight of 3.7 g. Each node uses one active electrode, and all nodes share a common reference electrode connected by a thin short wire. This structure reduces scalp connections and allows concealed placement in hair using a hair-clip form factor. Multiple nodes form a star network coordinated by a master device that manages communication with a stimulus computer. A cable-free synchronization strategy was implemented to handle timing uncertainties in distributed wireless operation. Hardware-event detection and software-based clock management were combined to align stimulus markers with multi-channel EEG data without dedicated synchronization cables. 
The master device coordinates this process and streams synchronized data to the computer for real-time processing. System evaluation was conducted in two phases. Foundational performance metrics included physical characteristics, electrical parameters (input-referred noise: 3.91 mVpp; common-mode rejection ratio: 132.99 dB), and synchronization accuracy under different network scales. Application-level performance was assessed using a 40-command online SSVEP spelling task with six subjects in an unshielded room with common RF interference. Four nodes were placed at Pz, PO3, PO4, and Oz. EEG epochs (0.14~3.14 s post-stimulus) were analyzed using Canonical Correlation Analysis (CCA) and ensemble Task-Related Component Analysis (e-TRCA) to compute recognition accuracy and Information Transfer Rate (ITR).  Results and Discussions  The system met its design objectives. Each acquisition node achieved an ultra-compact form factor (3.05 cm³, 3.7 g) suitable for concealed wear and provided more than 5 hours of battery life at a 1000 Hz sampling rate. Electrical performance supported high-quality SSVEP acquisition. The cable-free synchronization strategy ensured stable operation. More than 95% of event markers aligned with the EEG stream with less than 1 ms error (Fig. 4), meeting SSVEP-BCI requirements. This stability supported the quality of recorded neural signals. Grand-averaged SSVEP responses showed clear and stable waveforms with precise phase alignment (Fig. 5). The signal-to-noise ratio at the fundamental stimulation frequency exceeded 10 dB for all 40 commands (Fig. 6). In the online spelling experiment, the system showed strong decoding performance. With the e-TRCA algorithm and a 3-s window, the average accuracy was (95.00 ± 2.04)%. The system reached a peak ITR of (147.24 ± 30.52) bits/min with a 0.4-s data length (Fig. 7). 
Comparison with existing SSVEP-BCI systems (Table 1) indicates that, despite constraints of miniaturization, cable-free use, and four channels, the system achieved accuracy comparable with several cable-dependent laboratory systems while offering improved wearability.  Conclusions  This work presents a wearable SSVEP-BCI system that integrates ultra-miniaturized hardware with a distributed cable-free architecture. The results show that coordinated hardware and system design can overcome tradeoffs between device size, user mobility, and decoding capability. The acquisition node (3.7 g, 3.05 cm³) supports concealable wearability, and the synchronization strategy provides reliable cable-free operation. In a realistic environment, the system produced online performance comparable with many cable-dependent setups, achieving 95.00% accuracy and a peak ITR of 147.24 bits/min in a 40-target task. Therefore, this study provides a practical system-level solution that supports progress toward wearable high-performance BCIs.
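The Information Transfer Rate quoted in this abstract is usually computed with the Wolpaw formula, which combines the number of targets, the recognition accuracy, and the time per selection. The sketch below assumes that formula; the abstract does not say which ITR definition or which gaze-shift interval the authors used, so the time per selection is left as an explicit parameter and the function names are hypothetical.

```python
import math


def bits_per_selection(n_targets, accuracy):
    """Wolpaw information per selection, in bits."""
    if n_targets < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_targets >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_targets)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits


def itr_bits_per_min(n_targets, accuracy, seconds_per_selection):
    """ITR in bits/min; seconds_per_selection should include any gaze-shift time."""
    if seconds_per_selection <= 0.0:
        raise ValueError("seconds_per_selection must be positive")
    return bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection


if __name__ == "__main__":
    # 40 targets at 95% accuracy carry about 4.77 bits per selection.
    print(bits_per_selection(40, 0.95))
```

Because the rate scales inversely with the time per selection, shorter data lengths can yield a higher ITR even when accuracy drops, which is consistent with the peak ITR being reported at a much shorter window than the peak accuracy.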
Model-Free Adaptive Resilient Control of Vehicle Platoons Against Hybrid Cyberattacks
HAN Qiaoni, MA Jianguo, LI Peng, ZUO Zhiqiang
Available online  , doi: 10.11999/JEIT251135
Abstract:
  Objective  Connected and automated vehicle platoons represent a key technology for improving traffic efficiency, driving safety, and fuel economy in intelligent transportation systems. Through inter-vehicle information exchange and cooperative control, vehicle platoons achieve safe and efficient car-following operations. However, the strong dependence on vehicular communication networks makes such systems vulnerable to cyberattacks, particularly hybrid threats that combine Denial-of-Service (DoS) and False Data Injection (FDI) attacks. These attacks may interrupt communication or tamper with transmitted information, which threatens the safety and stability of vehicle platoon systems. In addition, vehicle platoon control is affected by environmental disturbances, parametric uncertainties, and nonlinear vehicle dynamics. Existing model-based control methods often experience performance degradation under such complex conditions. Therefore, a resilient data-driven control strategy that does not rely on accurate mechanical models is required. This paper develops an attack-compensated Model-Free Adaptive Control (MFAC) framework to ensure secure and stable operation of heterogeneous nonlinear vehicle platoons under hybrid cyberattacks.  Methods  To address the resilient control problem of connected vehicle platoons under cyberattacks, an MFAC method with attack compensation is proposed for hybrid attacks that include both DoS and FDI attacks. First, a nonlinear longitudinal vehicle dynamics model of the platoon is established. Using the dynamic linearization technique, the model is converted into an equivalent compact-form dynamic linearized data model. This transformation decouples controller design from the specific mechanical model of the vehicle. An output tuning factor is further introduced to balance the tracking of position and velocity states. 
Second, a hybrid attack model is constructed to represent persistent FDI attacks that inject malicious data and aperiodic DoS attacks that interrupt communication. A pseudo-gradient estimator is then designed to capture system dynamics from real-time input-output data. The influence of hybrid attacks on this estimator is analyzed, and an adaptive update strategy is proposed for operation during DoS attacks. Finally, an intelligent attack compensation mechanism is designed. During DoS attack periods, the mechanism utilizes historical control input information to maintain controller operation. This design enables the system to operate continuously even when real-time vehicle state information is unavailable and further improves control performance under DoS attacks.  Results and Discussions  Rigorous theoretical analysis proves that the tracking error of the closed-loop system remains bounded under specific conditions on the frequency and duration of cyberattacks (Theorem 1). Extensive simulations verify the effectiveness of the proposed method. During cyberattacks, the MFAC method with the proposed compensation mechanism adaptively adjusts the attenuation rate of the control input and maintains system control performance (Fig. 3). Follower vehicles successfully track the leader’s velocity variations and maintain the desired inter-vehicle spacing (Fig. 4a, 4b). The tracking error exhibits satisfactory convergence behavior (Fig. 4d), which confirms the stability of the closed-loop system. Comparative studies highlight the role of the compensation mechanism. When the mechanism is disabled, the platoon experiences clear performance degradation during cyberattacks (Fig. 5). In contrast, the proposed method maintains higher tracking accuracy and faster error recovery. Additional simulations analyze the effect of FDI attack intensity. As attack intensity increases, the steady-state error bound expands (Fig. 6). 
This observation quantitatively supports the theoretical robustness analysis and provides useful guidance for determining security thresholds in applications.  Conclusions  This paper advances secure control of heterogeneous nonlinear connected vehicle platoons by proposing an attack-compensated MFAC framework. The framework addresses the combined challenges of hybrid cyberattacks (DoS and FDI attacks) and nonlinear system dynamics. Specifically, three key contributions are made: (1) A data-driven dynamic linearization framework is developed, and an output tuning factor is introduced to enable simultaneous position and velocity tracking based on the nonlinear longitudinal vehicle dynamics model and its equivalent data-based linearized model. (2) A hybrid attack model is established that includes aperiodic DoS attacks that interrupt communication and bounded additive FDI attacks that inject malicious data, capturing their essential characteristics. (3) An intelligent historical input-driven compensation mechanism is designed and integrated with a pseudo-gradient estimator to improve control performance during DoS-induced communication interruptions. Theoretical analysis and simulation results confirm the effectiveness of the proposed method. When attack parameters satisfy specific conditions, the system tracking error remains bounded, and follower vehicles accurately track the leader’s states. The proposed method also achieves better velocity tracking accuracy and faster error convergence than the compensation-free baseline scheme. By focusing on hybrid scenarios with aperiodic DoS and bounded additive FDI attacks, this study provides a practical model-free approach to improve cybersecurity in connected vehicle platoons. Future work will examine more stealthy hybrid attack modes, including non-additive FDI, spoofing, and DoS attacks, to analyze their coupling mechanisms and develop targeted defense strategies. 
In addition, a communication-efficient MFAC strategy that integrates an event-triggered mechanism will be investigated to reduce network load and improve scalability.
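To make the compact-form dynamic linearization idea in this abstract concrete, the sketch below runs a textbook single-input single-output MFAC loop: a projection-type pseudo-gradient estimator followed by an incremental control law. Everything specific is an assumption made for illustration: the scalar plant, the gains, the reset rule, and especially the DoS handling, which here simply holds the last control input while no measurement arrives. The abstract does not specify the authors' compensation law, platoon model, or FDI handling, and none of those are reproduced here.

```python
def plant(y, u):
    """Hypothetical scalar plant; the controller never uses this model."""
    return 0.6 * y + 0.5 * u


def run_mfac(steps=200, y_ref=1.0, dos_steps=frozenset(range(20, 25)),
             eta=0.5, mu=1.0, rho=0.6, lam=1.0, phi0=1.0, eps=1e-5):
    """Return the tracking-error history of a basic MFAC loop.

    dos_steps: time steps at which the measurement is unavailable (assumed
    stand-in for a DoS attack); the controller then holds its last input.
    """
    y = 0.0        # true plant output
    y_seen = 0.0   # last output actually received by the controller
    u = 0.0        # control input
    du_prev = 0.0  # previous control increment
    phi = phi0     # pseudo-gradient estimate
    errors = []
    for k in range(steps):
        if k not in dos_steps:
            dy = y - y_seen
            # Projection-type pseudo-gradient update from input-output data.
            phi += eta * du_prev / (mu + du_prev ** 2) * (dy - phi * du_prev)
            # Reset keeps the estimate positive and avoids stale data.
            if phi <= eps or abs(du_prev) <= eps:
                phi = phi0
            # Incremental control law driven by the tracking error.
            du = rho * phi / (lam + phi ** 2) * (y_ref - y)
            y_seen = y
        else:
            du = 0.0  # assumed compensation: keep the previous input
        u += du
        du_prev = du
        y = plant(y, u)
        errors.append(y_ref - y)
    return errors


if __name__ == "__main__":
    errs = run_mfac()
    print(abs(errs[-1]) < 1e-3)
```

The controller touches only measured outputs and past inputs, which is the sense in which the design is decoupled from the mechanical model.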
SCUNet-Based Decoding Algorithm for Rayleigh Fading Channels Integrating Feature Extraction and Recovery Mechanisms
WANG Leijun, WANG Kuan, XIE Jinfa, PENG Xidong, LI Jiawen, CHEN Rongjun
Available online  , doi: 10.11999/JEIT251138
Abstract:
  Objective  This study examines limitations of conventional Deep Neural Network (DNN) decoding algorithms in Rayleigh fading channels, including constrained performance, limited generalization, and weak fading resistance. To address these issues, a decoding algorithm based on the SCUNet (Swin Conv UNet) architecture, termed SCUNetDec, is proposed. In 6G communication scenarios, wireless channels exhibit strong dynamics and complexity, which restrict the ability of traditional decoding methods to meet requirements for high reliability, low latency, and robustness. Intelligent decoding methods with adaptive feature learning are therefore valuable. SCUNetDec integrates multi-dimensional feature extraction and recovery modules and uses a noise-level map to strengthen channel-state perception. These components enable the network to learn channel characteristics, reduce fading effects, and improve decoding performance. The study provides an approach for intelligent decoding in complex channel environments and supports the development of efficient 6G communication systems.  Methods  The SCUNetDec network combines three mechanisms—data preprocessing, feature extraction and recovery, and noise-level mapping—to enhance signal representation learning and decoding in Rayleigh fading channels. In the preprocessing stage, dimensionality expansion converts the one-dimensional received signal into a two-dimensional feature map, improving structural visibility and supporting spatial correlation learning. The feature extraction and recovery module uses multi-layer convolution and attention mechanisms to capture essential channel features, whereas deconvolution layers and residual connections suppress interference introduced during dimensionality transformation. This improves reconstruction quality and decoding accuracy. 
A noise-level map embeds SNR (Signal to Noise Ratio)-related information aligned with the feature maps, allowing the model to adjust to channel variation and adapt decoding strength. The combined effect of these mechanisms increases noise robustness, generalization, and decoding stability, offering a systematic decoding solution for complex 6G wireless environments.  Results and Discussions  SCUNetDec enhances signal learning and decoding in Rayleigh fading channels through its feature extraction–recovery module and noise-level map. Simulations under different coding schemes validate its effectiveness. For the (7,4) Hamming code, SCUNetDec outperforms conventional DNN decoding and approaches Maximum Likelihood (ML) performance; at BER (Bit Error Rate) = 10⁻⁴, the gap to ML is about 1.5 dB, and at FER (Frame Error Rate) = 10⁻³, the gap is about 2.0 dB (Fig. 4). This indicates that SCUNetDec captures complex signal relationships and learns associations between information and parity-check nodes. For the (2,1,3) convolutional code, SCUNetDec performs close to the Viterbi algorithm at BER = 10⁻³, with a gap of roughly 2.0 dB, while conventional DNN decoding degrades at high SNRs (Fig. 5). For Polar codes with a rate of 0.5, SCUNetDec shows a gain of about 4.0 dB over successive cancellation (SC) decoding at BER = 10⁻⁴ and maintains an advantage of about 1.0 dB at FER = 10⁻³, with SC performing slightly better only in the low-SNR region (Fig. 6). Decoding-time comparisons show that SCUNetDec reduces decoding latency relative to traditional methods (Table S1). Ablation experiments confirm that integrating the feature extraction and recovery modules into SCUNet improves decoding performance (Fig. 7). Overall, results show that SCUNetDec provides robust decoding performance across coding schemes and SNR levels.  Conclusions  This study proposes SCUNetDec to address performance limitations of DNN decoders in Rayleigh fading channels. 
The method enhances SCUNet using signal feature extraction and recovery modules. Simulations and ablation experiments on Hamming, convolutional, and Polar codes show strong generalization capability and effectiveness. Compared with traditional DNN models, SCUNetDec achieves decoding performance close to optimal decoding algorithms and reduces decoding time. These findings indicate that SCUNetDec has practical potential for complex channel environments. Future work will examine fusion of neural and traditional algorithms to balance performance and complexity through dynamic parameter optimization and explore intelligent decoding strategies for long codes. Research will also investigate joint modulation–decoding modeling and end-to-end architectures to improve adaptability under high-order modulation and complex channels.
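For context on the Maximum Likelihood baseline mentioned for the (7,4) Hamming code, the sketch below shows exhaustive ML decoding of BPSK symbols over an independent Rayleigh fading channel. It is not SCUNetDec and not the authors' simulation: the generator matrix, the BPSK mapping, the unit-power fading model, and the assumption that the receiver knows the fading gains exactly are all illustrative choices.

```python
import itertools
import math
import random

# One common systematic generator matrix for the (7,4) Hamming code (assumed).
G = [[1, 0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]


def encode(msg):
    """Multiply a 4-bit message by G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]


CODEBOOK = {msg: encode(msg) for msg in itertools.product((0, 1), repeat=4)}


def rayleigh_gain(rng):
    """Magnitude of a complex Gaussian with unit average power."""
    return math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)


def transmit(msg, noise_std, rng):
    """BPSK (0 -> +1, 1 -> -1) through per-symbol Rayleigh fading plus AWGN."""
    symbols = [1.0 - 2.0 * bit for bit in CODEBOOK[msg]]
    gains = [rayleigh_gain(rng) for _ in symbols]
    received = [h * x + noise_std * rng.gauss(0.0, 1.0)
                for h, x in zip(gains, symbols)]
    return received, gains


def ml_decode(received, gains):
    """Pick the codeword minimising squared distance, given known gains."""
    def metric(msg):
        return sum((y - h * (1.0 - 2.0 * bit)) ** 2
                   for y, h, bit in zip(received, gains, CODEBOOK[msg]))
    return min(CODEBOOK, key=metric)


if __name__ == "__main__":
    rng = random.Random(0)
    y, h = transmit((1, 0, 1, 1), 0.3, rng)
    print(ml_decode(y, h))
```

The search visits all 16 codewords, which is trivial here but grows exponentially with message length; that cost is one reason learned decoders are studied for longer codes.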
Multimodal Pedestrian Trajectory Prediction with Multi-Scale Spatio-Temporal Group Modeling and Diffusion
KONG Xiangyan, GAO YuLong, WANG Gang
Available online  , doi: 10.11999/JEIT250900
Abstract:
  Objective  The rapid development of autonomous driving and social robotics has increased the need for accurate pedestrian trajectory prediction to improve safety and interaction efficiency. Existing group-based methods mainly emphasize local spatial interaction and often overlook latent grouping characteristics across time. This study proposes a multi-scale spatiotemporal feature construction method that separates trajectory shape from absolute spatiotemporal coordinates. This enables the model to capture latent group associations across different temporal intervals. A spatiotemporal interaction three-element encoding mechanism is incorporated to extract dynamic relationships between individuals and groups. By integrating the reverse process length mechanism of diffusion models, the system progressively reduces prediction uncertainty. This approach provides an effective solution for multimodal trajectory prediction in complex, crowded scenes and offers theoretical support for improving the accuracy and stability of long-range trajectory forecasting.  Methods  The algorithm performs deep modeling of pedestrian trajectories through multi-scale spatiotemporal group modeling across three components: group construction, interaction modeling, and trajectory generation. First, to address the limitations of methods that focus on local spatiotemporal patterns but overlook cross-dimensional latent characteristics, a multiscale trajectory grouping model is developed. Its core design extracts trajectory offsets to represent trajectory shapes, separating motion features from absolute positions. This enables the system to identify latent group associations among agents who follow similar motion patterns across different periods. Second, a spatiotemporal interaction three-element encoding method is proposed. 
By defining neural interaction strength, interaction categories, and category functions, the method captures detailed individual interactions and the global dynamic evolution of collective behavior. Finally, a Diffusion Model is introduced for multimodal prediction. Through the reverse process length mechanism, the model converges gradually, reduces uncertainty, and transforms a diffuse prediction space into plausible future trajectories.  Results and Discussions  The proposed model was evaluated against 11 state-of-the-art baselines on the NBA dataset (Table 1). The results show clear advantages in minADE20. It achieves substantial gains over GroupNet+CVAE in long-term prediction tasks, improving minADE20 and minFDE20 by 0.18 and 0.36, respectively, at the 4-second horizon. Although it is slightly inferior to MID in long-term trend prediction, possibly because group dynamics shift rapidly and intensely in NBA scenarios, the model maintains strong instantaneous accuracy. This supports the effectiveness of the multi-scale grouping strategy, which uses historical trajectories to capture complex dynamic interactions. On the ETH/UCY datasets (Table 2), MSGD provides consistent improvements across all five sub-scenes. In the dense and highly interactive UNIV scene, the method exceeds all baselines by leveraging the strengths of multi-scale modeling. Although MSGD is marginally behind PPT in long-distance endpoint constraints, it maintains a lead in minADE20. It also outperforms Trajectory++ in velocity smoothness and directional coherence (std dev: 0.7012) (Table 3), indicating that the generated trajectories maintain natural smoothness aligned with human motion. Ablation studies verify the independent effects of the diffusion model, spatiotemporal feature extraction, and multi-scale grouping modules (Table 4). 
Grouping sensitivity analysis on the NBA dataset shows that full-court grouping (group size 11) enhances long-term stability, reducing minFDE20 by 0.026–0.03 at 4 seconds (Table 5). Configurations with group sizes of 5 or 2 further support the importance of team formations and “one-on-one” local offensive and defensive dynamics (Table 6). Diffusion-step and training-epoch sensitivity analysis reveals a complementary relationship: moderate diffusion steps (30–40) refine denoising and improve accuracy, whereas excessive steps may cause overfitting (Table 7). Qualitative visualization confirms that MSGD generates multimodal trajectories with high overlap with ground truth (Fig. 2).  Conclusions  This study presents a trajectory prediction algorithm that improves performance in two primary ways: (1) it captures pedestrian interactions by extracting spatiotemporal features, and (2) it strengthens collective behavior modeling through multi-scale grouping. Experiments show that the method achieves state-of-the-art performance on the NBA and ETH/UCY datasets, and ablation studies confirm the effectiveness of all modules. Two limitations remain. First, explicit environmental information, such as maps or obstacles, is not yet incorporated. Second, the diffusion model requires substantial computational cost during inference. Future research will address these issues.
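The minADE20 and minFDE20 figures quoted above follow the usual best-of-K convention for multimodal prediction: K = 20 trajectories are sampled, and the lowest average and final displacement errors are reported. The sketch below illustrates that convention only; the function name and toy coordinates are hypothetical and do not come from the paper.

```python
import math

def min_ade_fde(samples, truth):
    """Return (minADE_K, minFDE_K) for K sampled trajectories.

    samples: list of K trajectories, each a list of (x, y) points.
    truth:   ground-truth trajectory with the same number of points.
    """
    ades, fdes = [], []
    for traj in samples:
        # Per-step Euclidean distance between prediction and ground truth.
        dists = [math.dist(p, q) for p, q in zip(traj, truth)]
        ades.append(sum(dists) / len(dists))  # average displacement error
        fdes.append(dists[-1])                # final displacement error
    # The two minima are taken independently, so they may come from different samples.
    return min(ades), min(fdes)

# Toy data (hypothetical): K = 2 samples over three time steps.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away from the truth
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # accurate until the last step
]
min_ade, min_fde = min_ade_fde(samples, truth)
```

In the reported experiments K would be 20 and the horizon would match the dataset protocol (for example, 4 seconds on NBA).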
Design of Dynamic Resource Awareness and Task Offloading Schemes in Multi-Access Edge Computing Networks
ZHANG Bingxue, LI Xisheng, YOU Jia
Available online  , doi: 10.11999/JEIT250640
Abstract:
  Objective  With the growth of the industrial Internet of Things and the widespread use of multimode terminals, multi-access edge computing has become a key technology that supports low-latency and energy-efficient industrial applications. Task offloading is central to addressing the large volume and complex processing requirements of multimode terminals. In multi-access edge computing systems, end-user network selection strongly affects offloading and resource allocation. However, existing network-selection mechanisms emphasize user decisions while neglecting the effects of task execution, task-data transmission, and processing on network performance. Current studies on offloading design emphasize delay, energy optimization, and resource allocation, but overlook how collaborative computing across heterogeneous networks affects resource cost and dynamic resource balance. To address these issues, this study considers users’ diverse requirements and the differentiated capabilities of heterogeneous resource providers. It focuses on cost-efficient task-execution decisions and dynamic-resource allocation in multi-access heterogeneous networks to reduce system cost, improve service quality, and support cooperative use of heterogeneous resources.  Methods  Following the MEC network model, this study establishes cost-calculation models for task-execution time, energy consumption, and communication-resource consumption for different networks during end-user task selection. Using auction theory, it constructs a cost-effectiveness model for task evaluation and bidding between users and edge servers, and formulates the objective optimization problem based on combinatorial two-way auction theory. A dynamic resource-sensing and task offloading algorithm based on an auction mechanism is then proposed. 
Through two-way broadcasting of pending tasks and required resources, the algorithm performs network-selection assessment and dynamic allocation of computing and communication resources. Servers submit valid bids only when their available resources satisfy user constraints. Servers that issue valid bids compete for task-execution opportunities until the user obtains the optimal bid and corresponding server, which completes the auction-matching process.  Results and Discussions  The proposed dynamic-resource allocation and task offloading algorithm accounts for heterogeneous-network conditions and resource usage, and selects offloading locations based on resource availability. A heterogeneous wireless-network cooperation model is constructed in simulation, and the effects of network size on offloading cost and offloaded data volume are analyzed. Simulation results show that the algorithm reduces system cost by at least 5% compared with benchmark algorithms (Fig. 3), with larger advantages as the number of end users increases. Changes in the number of servers influence users’ network-selection behavior (Figs. 4–6). The proposed method increases the amount of offloaded data by approximately 10% relative to benchmark schemes (Figs. 7 and 8). Finally, the study analyzes how variation in communication-resource cost parameters affects users’ preference for offloading via the 5G public network. Higher communication-cost parameters markedly reduce the data volume offloaded through the 5G network (Fig. 9).  Conclusions  To address complex data-processing demands from multimode terminals, this study develops a cooperative multi-access edge computing architecture for multimode devices. Flexible and intelligent wireless-network selection provides additional resources for end-user task offloading. 
A server-bidding and user-target bidding model is built using an auction framework, and a dynamic resource-perception and task offloading algorithm is proposed. The algorithm first adjusts and selects the offloading network and allocates computing and communication resources according to incoming tasks. It then determines the offloading location with minimum execution cost based on competition among edge servers. Results indicate that the proposed algorithm lowers system cost compared with benchmark approaches, increases the amount of data offloaded to multiple edge servers, improves utilization of edge-computing resources, and enhances system energy efficiency and operational efficiency.
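The matching rule described in this abstract (servers bid only when their free resources satisfy the user's constraints, and the user takes the best bid) can be sketched as follows. This is a simplified one-round illustration, not the paper's combinatorial two-way auction; the `Server` fields, the linear cost rule, and all numbers are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    free_cpu: float   # available computing resource (hypothetical unit)
    free_bw: float    # available bandwidth (hypothetical unit)
    unit_cost: float  # assumed cost charged per unit of CPU demand

def select_server(servers, cpu_need, bw_need):
    """Return (name, bid) of the cheapest valid bid, or None if no server can bid."""
    bids = []
    for s in servers:
        # A bid is valid only if the server's free resources cover the task's needs.
        if s.free_cpu >= cpu_need and s.free_bw >= bw_need:
            bids.append((s.unit_cost * cpu_need, s.name))
    if not bids:
        return None
    cost, name = min(bids)
    return name, cost

servers = [
    Server("edge-A", free_cpu=4.0, free_bw=10.0, unit_cost=1.5),
    Server("edge-B", free_cpu=8.0, free_bw=20.0, unit_cost=2.0),
    Server("edge-C", free_cpu=1.0, free_bw=50.0, unit_cost=0.5),  # too little CPU to bid
]
winner = select_server(servers, cpu_need=2.0, bw_need=5.0)
```

Here `edge-C` is the cheapest per unit but cannot bid, so the task goes to `edge-A`. A fuller model would also price execution time, energy, and communication resources, as the abstract describes.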
Multi-Agent Deep Reinforcement Learning Strategy for Multi-Spacecraft Long-Distance Orbital Game
DI Peng, YIN Zengshan, LIN Zheng, YAO Ye
Available online  , doi: 10.11999/JEIT251384
Abstract:
This paper introduces a novel research scenario for multi-spacecraft Orbital Pursuit-Evasion Game (OPEG), which has not yet been systematically studied. To enhance the decision-making capabilities of spacecraft and enable them to formulate more robust policies in complex multi-agent games, this paper proposes a multi-agent deep reinforcement learning algorithm based on a progressive adversarial training framework to solve the game policies of each spacecraft. Two sets of examples with different orbital characteristics and various simulation conditions were set up for simulation verification, and behavioral deviation analysis is conducted to verify the robustness of the policy. The impact of different orbital characteristics, simulation conditions, and behavioral deviations on the game policy was analyzed. Simulation results show that the proposed method enables each spacecraft to formulate an effective game policy that satisfies all set constraints and has good robustness.  Objective  As the space environment becomes increasingly complex, space security has become a hot research area. The existence of a large amount of space debris and failed spacecraft poses a serious threat to high-value spacecraft in orbit. Therefore, the study of Orbital Pursuit-Evasion Game (OPEG) for non-cooperative target spacecraft has attracted widespread attention. Existing research focuses on OPEG for two spacecraft, but less on OPEG for multiple spacecraft. When there are more than two players in the game, zero-sum game design is not feasible, and it is difficult to solve using traditional methods. Furthermore, existing research ignores engineering dynamic constraints and simplifies or defines the dynamics as a two-dimensional scene when modeling the problem, which can cause considerable errors. To overcome the limitations of existing spacecraft game scenarios, this paper proposes a novel multi-spacecraft OPEG research scenario. 
The aim is to investigate how a Multi-Agent Deep Reinforcement Learning (MADRL) algorithm can solve the approximate steady-state policies of each spacecraft in long-distance multi-spacecraft OPEG, to highlight the advantages of MADRL for this problem, and to provide a feasible route toward autonomous multi-spacecraft game play.  Methods  The Multi-Agent Proximal Policy Optimization (MAPPO) algorithm based on the Progressive Adversarial Training Framework (PATF) is used to solve the optimal game policy for each spacecraft in the multi-spacecraft OPEG. First, a multi-constrained multi-spacecraft OPEG model is established based on actual engineering constraints, and the problem is transformed into a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Second, to improve the decision-making ability of agents in complex multi-agent game environments and to formulate more robust game policies, a novel PATF is introduced, with different reward functions designed for the specific missions of each spacecraft. Finally, two sets of simulation examples with different orbital characteristics are simulated under four different conditions, and a behavioral deviation analysis is performed.  Results and Discussions  The MAPPO algorithm based on the PATF proposed in this paper is compared with the original MAPPO (Fig. 3). The results show that the proposed method learns effective policies more quickly, reduces ineffective exploration, and achieves a higher final convergence reward value with less fluctuation in the reward curve. This also demonstrates that the PATF can significantly enhance the decision-making ability of agents, enabling them to formulate robust policies more effectively. Simulation verification was performed using two sets of examples in four different settings (Figs. 4, 5, 6, and 7). 
Simulation results (Tables 3 and 4) show that the proposed method performs well in both sets of examples. Furthermore, it was verified that when the pursuer and the interceptor are on the same orbital plane, the pursuer is more likely to be intercepted. When the interceptor and the target are not on the same orbital plane, the interceptor has a relatively easier time carrying out the interception mission. This paper also analyzes the situation where both sides of the game have behavioral biases, and models this by adding control noise. Simulation results (Tables 5 and 6) show that both sides adopt relatively conservative policies to counter the control noise. The game policy formulated by the method in this paper is an approximate steady-state policy. Behavioral deviations will lead to a decrease in one’s own payoff and an increase in the opponent's payoff, and the game policy has good robustness.  Conclusions  The method proposed in this paper can be well applied to solving the long-distance OPEG problem involving multiple spacecraft in non-coplanar elliptical orbits, enabling each spacecraft to formulate excellent game policies. The PATF facilitates better decision-making by the spacecraft in complex multi-spacecraft dynamic systems, with robust control policies developed by the pursuer and interceptors. The results also demonstrate the accuracy and effectiveness of the reward function design. Through two sets of examples and simulation results with different settings, the impact on the policies of both parties when the pursuer and interceptor have different orbital characteristics is analyzed. When interceptors have different maximum thrusts, the decision-making of each spacecraft changes accordingly. The behavior deviation analysis proves that the game policies of each spacecraft have good robustness. 
When one party’s behavior deviates, the approximate steady-state policy balance will change, resulting in a decrease in its own benefits and an increase in the other party’s benefits. The research scenario formulated in this paper expands the scope of existing research on multi-spacecraft game problems.
Multi-dimensional Resource Joint Optimization Algorithm for UAV Inspection of Collaborative Tasks of Perception and AI
LI Shiyang, ZHU Xiaorong
Available online  , doi: 10.11999/JEIT251284
Abstract:
  Objective  With increasing demand for aerial operations, the capabilities of various aircraft are steadily expanding across all airspace levels and multiple industries. The application of Unmanned Aerial Vehicles (UAVs) now spans multiple altitude layers, from low to high altitudes, and covers micro, medium, and large models. UAVs are widely used in public safety, transportation, emergency management, logistics and distribution, geographic surveying and mapping, and other fields, thereby promoting innovation and transformation in production and daily life. Compared with traditional manual inspection, UAV inspection, as an emerging operational approach, can acquire image information that is difficult for the human eye to capture. Labor costs are therefore significantly reduced, and the accuracy and efficiency of inspection operations are improved. However, UAV inspection also creates new challenges for multidimensional resource allocation and task scheduling. In power system inspection, for example, transmission lines are exposed to outdoor environments for long periods and are vulnerable to corrosion, aging, and even damage. Regular inspections are therefore required to ensure operational safety.   Methods  A four-stage multidimensional resource inspection and scheduling collaborative optimization algorithm is proposed. The original optimization problem is decomposed into four subproblems according to the inspection process. After mathematical analysis of each subproblem, a corresponding solution method is proposed. For the node selection problem, a dual-aided Mixed-Integer Linear Programming (MILP) transformation method is used. For the UAV data acquisition problem, a data-driven boundary learning method is adopted. For UAV communication resource allocation, a bandwidth-power joint optimization algorithm based on Successive Convex Approximation (SCA) is used. For node computing power allocation, a lower-bound analytical allocation method is adopted. 
Finally, the original problem is solved by an alternating optimization method across the subproblems, thereby forming the complete algorithm.  Results and Discussions  Simulation results show that the proposed algorithm reduces overall UAV energy consumption compared with the benchmark algorithms. Simulation training is conducted for visual positioning and fault detection services to examine the relationship among compression ratio, data volume, and service performance. Figures 2–5 show that fault detection accuracy reaches its optimum at 60% data volume and 60% compression ratio. Visual positioning accuracy reaches its optimum at 80% data volume and 80% compression ratio. Figure 6 shows that the proposed algorithm achieves higher accuracy than the benchmark algorithms for AI services. As shown in Figures 7 and 8, under varying bandwidth, computing power, and other resource conditions, the proposed algorithm consistently performs better than the benchmark algorithms in terms of energy consumption and effectively reduces total energy consumption.  Conclusions  A multidimensional resource joint optimization algorithm is proposed for intelligent UAV inspection with collaborative perception and AI tasks. An optimization problem is formulated with the objective of minimizing UAV energy consumption, using bandwidth, power, computing power, node selection, data volume, and actual compression ratio as variables. The algorithm jointly minimizes UAV energy consumption for two AI services, fault detection and visual localization. Simulation results show that the algorithm reduces total UAV energy consumption and improves model training accuracy. This study focuses on the application scenario of single-UAV inspection. More complex multi-UAV collaborative inspection scenarios can be examined in future work, and additional services can be incorporated for a more comprehensive analysis.
Joint Power Allocation and AP On-Off Control for Long-Term Energy Efficient Cell-Free Massive MIMO Systems
WEI Siqi, GUO Fengqian, CHONG Baolin, CHENG Guo, LU Hancheng
Available online  , doi: 10.11999/JEIT260014
Abstract:
  Objective   With the rapid development of wireless communication technologies, Cell-Free Massive Multiple-Input Multiple-Output (CF-mMIMO) has emerged as an effective paradigm to overcome the limitations of traditional cell-centric networks, such as limited performance for edge users. By deploying a large number of distributed Access Points (APs) connected to a Central Processing Unit (CPU) to cooperatively serve users, CF-mMIMO improves spectral efficiency and macro-diversity gain. However, dense AP deployment also introduces a critical challenge: high energy consumption. In practical systems, if all APs remain continuously active, especially during periods of low traffic load, substantial and unnecessary energy consumption occurs. This behavior reduces network sustainability and conflicts with global “dual-carbon” goals. Existing studies on energy efficiency in CF-mMIMO systems mainly focus on short-term performance optimization. These short-term approaches often ignore long-term traffic dynamics and the requirement of queue stability. Therefore, they lack robustness under time-varying traffic conditions and may cause queue congestion and significant performance fluctuations, which are unacceptable for next-generation wireless networks with strict reliability requirements. Although several recent studies examine long-term energy efficiency optimization, most assume that all APs remain active at all times. Therefore, the energy-saving potential of adaptive AP on-off control is not fully utilized.  Methods   To address these issues, a joint power allocation and AP on-off control strategy is proposed for downlink CF-mMIMO systems. The optimization problem aims to maximize long-term energy efficiency subject to user queue stability and AP power constraints. 
Because the problem has stochastic and long-term characteristics, the Lyapunov optimization framework is applied to transform the original long-term fractional programming problem into a sequence of deterministic drift-plus-penalty minimization problems solved in each time slot. The resulting per-slot problems remain nonconvex. Therefore, each problem is decomposed into two subproblems: power allocation and AP on-off control. The Successive Convex Approximation (SCA) method is used to convert the nonconvex formulations into solvable convex problems. An alternating optimization algorithm is then developed to jointly solve the two subproblems, which enables adaptive resource configuration under dynamic network conditions and stochastic traffic arrivals.  Results and Discussions   The proposed algorithm is evaluated through extensive simulations. First, the convergence behavior is examined. Numerical results (Fig. 2) show that per-slot energy efficiency increases rapidly and stabilizes after several iterations, which verifies the convergence of the alternating optimization procedure. Second, the effect of the control parameter is analyzed. As the parameter increases, the algorithm places greater emphasis on energy efficiency. Average power consumption decreases and then stabilizes (Fig. 3), whereas long-term energy efficiency increases and eventually stabilizes (Fig. 4). These results confirm the trade-off between energy efficiency and queue stability. Third, the proposed scheme is compared with three baseline methods. The results (Fig. 5) show that the proposed joint optimization approach consistently achieves higher long-term energy efficiency than the baseline methods. Fourth, the necessity of long-term optimization is demonstrated by comparing queue lengths with a short-term baseline (Fig. 6). 
Under the same traffic arrival rate, the short-term method shows cumulative queue growth, whereas the Lyapunov-based approach maintains queue lengths within a stable range and ensures network stability. Finally, robustness under imperfect Channel State Information (CSI) is evaluated (Fig. 7). Although energy efficiency decreases as channel uncertainty increases, the proposed method consistently outperforms the baseline approaches, which demonstrates strong robustness to channel estimation errors.  Conclusions   A long-term energy efficiency optimization framework is proposed for CF-mMIMO systems with stochastic traffic arrivals. By applying Lyapunov optimization theory, the stochastic long-term problem is transformed into slot-level drift-plus-penalty problems based on queue states. This transformation enables per-slot resource scheduling decisions while maintaining queue stability. On this basis, an efficient joint resource scheduling algorithm that integrates power allocation and AP on-off control is developed. The original problem is decomposed into power allocation and AP on-off control subproblems and solved through alternating optimization. Simulation results show that the proposed method adapts to dynamic traffic conditions. By placing underutilized APs into sleep mode, the algorithm improves long-term system energy efficiency and maintains queue stability. These results provide guidance for the design of green and sustainable wireless networks.
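The queue-stability argument can be illustrated with the backlog recursion commonly used in Lyapunov optimization, Q(t+1) = max(Q(t) - b(t), 0) + a(t), where a(t) is the arrival and b(t) the service in slot t. The abstract does not state its exact queue model, so this form and the constant rates below are assumptions made for illustration.

```python
def simulate_queue(arrivals, services):
    """Track a backlog with Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    q, history = 0.0, []
    for a, b in zip(arrivals, services):
        q = max(q - b, 0.0) + a
        history.append(q)
    return history

slots = 100
arrivals = [3.0] * slots  # hypothetical constant traffic arrival per slot

# Service below the arrival rate: the backlog grows without bound.
under_served = simulate_queue(arrivals, [2.0] * slots)
# Service above the arrival rate: the backlog stays bounded.
well_served = simulate_queue(arrivals, [4.0] * slots)
```

A drift-plus-penalty controller chooses the per-slot service (here, through power allocation and AP on-off decisions) so that the backlog behaves like the second case while the penalty term pushes energy efficiency up; the control parameter discussed above sets the balance between the two.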
Research on Time Slots Aggregation and Topology Aggregation Model for Unmanned Aerial Vehicle Swarm Overall Time Synchronization
WANG Zhenling, TAO Haihong, WEI Haitao, WANG Zhengyong
Available online  , doi: 10.11999/JEIT251274
Abstract:
  Objective  Unmanned Aerial Vehicle (UAV) swarms overcome the technical and performance limitations of individual UAVs and enable complex missions that cannot be accomplished by a single platform. High-precision time synchronization among swarm nodes serves as a fundamental requirement for key swarm operations, including resource scheduling, cooperative positioning, and multi-node data fusion. Existing research on UAV time synchronization mainly focuses on improving the accuracy of basic synchronization approaches. However, limitations remain in adapting to topological changes during swarm formation flights and in achieving global synchronization among multiple nodes. As the scale of UAV swarms increases, the connectivity of time-comparison links between nodes during formation flights exhibits clear time-varying characteristics. These characteristics create challenges for maintaining continuous, reliable, and precise overall time synchronization. To address stable formation flight and formation transformation scenarios in different mission stages of UAV swarms, an Observation Time Slots Aggregation (OTSA) model and a Time-Varying Topology Aggregation (TVTA) model are proposed to enhance the robustness of global time synchronization among swarm nodes and to improve Time Synchronization Accuracy (TSA). This study proposes an effective solution for Leader-Following Consistency Time Synchronization (LFCTS) in UAV swarms and provides references for time synchronization applications in heterogeneous and distributed systems.  Methods  Compared with the traditional Quasi Real-time Bidirectional Time Comparison (QRBTC) scheme, the time synchronization method based on the OTSA model fully uses all synchronization signal transmission and reception link resources within each time slot of the system synchronization period. 
Based on the “one transmission and multiple receptions” mechanism of all nodes, the Follower Node (FN) performs direct synchronization or single-hop indirect synchronization with the Leader Node (LN) in each time slot according to the OTSA model. This process produces tens of times more clock-skew observation samples than the traditional QRBTC scheme. The OTSA method improves the robustness of global time synchronization. It also enables secondary data processing using multi-slot synchronization samples, which further improves TSA compared with the QRBTC method. Based on the LFCTS results obtained during the system signal synchronization period, the TVTA model extends the direct comparison and single-hop indirect comparison mechanism of the OTSA model to cross-period multi-hop comparison. This extension addresses overall time synchronization instability caused by the time-varying characteristics of synchronization link relationships during UAV swarm takeoff, assembly, and formation transformation.  Results and Discussions  In the OTSA method, all time-comparison link resources of the total time slots are fully used during the synchronization period (Fig. 2). Based on the constructed error model and simulation analysis, for a UAV swarm with 50 nodes and a time slot allocation of 20 ms, time synchronization using the OTSA model achieves a single-slot TSA of 4.10–4.27 ns (Fig. 6). Within a complete time synchronization period, the overall TSA reaches 2.46–2.56 ns, which is better than the QRBTC scheme under the same conditions (Fig. 5(a)). The TVTA method uses cross-period synchronization comparison relationships to construct multi-hop time comparison links (Figs. 3 and 4). When the FN obtains external comparison relationships of other nodes through aggregation processing, one-way or two-way Dijkstra’s algorithm is applied to determine the multi-hop comparison link with optimal connectivity. 
Time tracing and comparison with the LN are then completed through edge computing. Error analysis indicates that during UAV swarm takeoff, assembly, and transitions to triangle or rhombus formations, time synchronization based on the TVTA model achieves an overall TSA better than 8.6 ns, which provides stronger global time synchronization capability.  Conclusions  This study addresses the robustness of time synchronization in UAV swarm formation flights. For stable formation flight and formation transformation scenarios during different mission stages, the OTSA and TVTA models are proposed. An error model is constructed and performance is analyzed. The results show the following. (1) The OTSA model improves the robustness of overall time synchronization through direct comparison and single-hop indirect comparison across multiple time slots within one synchronization period. The model achieves an overall TSA better than 2.56 ns and performs better than the traditional QRBTC method. (2) The TVTA model achieves overall UAV swarm time synchronization through multi-hop relay between nodes. Even when time-comparison links change, the model maintains global TSA better than 8.6 ns. (3) These two methods consider the time-varying characteristics of comparison links among UAV swarm nodes and have been verified through small-scale UAV swarm flight tests. They maintain synchronization robustness and performance and provide necessary support for coordinated UAV swarm operations. Future work will focus on practical flight verification, adaptation in complex scenarios, and further improvement of overall synchronization accuracy.
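The multi-hop link selection in the TVTA model relies on Dijkstra's algorithm. A minimal one-way version is sketched below, assuming each time-comparison link carries a non-negative cost such as an estimated per-hop error; the abstract does not define the actual metric behind "optimal connectivity", and the node names and weights are hypothetical.

```python
import heapq

def best_comparison_path(links, source, target):
    """Dijkstra's algorithm over an undirected graph of time-comparison links.

    links: dict mapping (node_a, node_b) -> non-negative cost (hypothetical
           per-hop error weight). Returns (total_cost, path) or None.
    """
    graph = {}
    for (a, b), w in links.items():
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    heap = [(0.0, source, [source])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None  # the target cannot be reached over the known links

# Links aggregated by follower FN3 (hypothetical topology and weights).
links = {
    ("FN3", "FN1"): 1.0,
    ("FN1", "LN"): 1.0,
    ("FN3", "FN2"): 0.5,
    ("FN2", "LN"): 2.5,
}
result = best_comparison_path(links, "FN3", "LN")
```

In this toy topology FN3 traces time to the LN through FN1, because the route through FN2 accumulates a larger cost despite its cheaper first hop.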
Spherical Geometry-guided and Frequency-Enhanced Segment Anything Model for 360° Salient Object Detection
CHEN Xiaolei, SHEN Yujie, ZHONG Zhihua
Available online  , doi: 10.11999/JEIT251254
Abstract:
  Objective  With the rapid development of Virtual Reality (VR) and Augmented Reality (AR) technologies and the increasing demand for omnidirectional visual applications, accurate salient object detection in complex 360° scenes has become critical for system stability and intelligent decision-making. The Segment Anything Model (SAM) demonstrates strong transferability across two-dimensional vision tasks. However, it is primarily designed for planar images and lacks explicit modeling of spherical geometry, which limits its direct application to 360° Salient Object Detection (360° SOD). To address this limitation, this study integrates the generalization capability of SAM with spherical-aware multi-scale geometric modeling to improve 360° SOD. Specifically, a Multi-Cognitive Adapter (MCA), Spherical Geometry Guided Attention (SGGA), and Spatial-Frequency Joint Perception Module (SFJPM) are proposed to enhance multi-scale structural representation, mitigate projection-induced geometric distortions and boundary discontinuities, and strengthen joint global and local feature modeling.  Methods  The proposed 360° SOD framework is built on SAM and consists of an image encoder and a mask decoder. During encoding, spherical geometry modeling is incorporated into patch embedding by mapping image patches onto a unit sphere and explicitly modeling spatial relationships between patch centers. This strategy injects geometric priors into the attention mechanism, which improves sensitivity to non-uniform geometric characteristics and reduces information loss caused by omnidirectional projection distortion. The encoder adopts a partial freezing strategy and is organized into four stages, each containing three encoder blocks. Each block integrates the MCA for multi-scale contextual fusion and the SGGA to model long-range dependencies in spherical space. Multi-level features are concatenated along the channel dimension to form a unified representation. 
The representation is then refined by the SFJPM, which jointly captures spatial structures and frequency-domain global information. The fused features are subsequently fed into the SAM mask decoder. Saliency maps are optimized under ground-truth supervision to achieve accurate object localization and boundary refinement.  Results and Discussions  Experiments are conducted using the PyTorch framework on an RTX 3090 GPU with an input resolution of 512 × 512. Evaluations are performed on two public datasets, 360-SOD and 360-SSOD, and compared with 14 state-of-the-art methods. The proposed approach consistently achieves superior performance across six evaluation metrics. On the 360-SOD dataset, the model achieves a Mean Absolute Error (MAE) of 0.015 2 and a maximum F-measure of 0.849 2, outperforming representative methods such as MDSAM and DPNet. Qualitative results show that the proposed method produces saliency maps that are highly consistent with ground-truth annotations. The model handles challenging scenarios effectively, including projection distortion, boundary discontinuity, multi-object scenes, and complex backgrounds. Ablation studies further show that MCA, SGGA, and SFJPM each contribute to performance improvement and operate complementarily.  Conclusions  This study proposes an SAM-based framework for 360° salient object detection that jointly addresses multi-scale representation, spherical distortion awareness, and spatial-frequency feature modeling. The MCA improves multi-scale feature fusion, the SGGA compensates for Equirectangular Projection (ERP)-induced geometric distortion, and the SFJPM enhances long-range dependency modeling. Extensive experiments verify the effectiveness and feasibility of applying SAM to 360° SOD. Future research will extend this framework to omnidirectional video and multi-modal scenarios to further improve spatiotemporal modeling and scene understanding.
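The idea of mapping patch centers onto a unit sphere can be sketched as below. The coordinate convention (longitude across the image width, latitude down the height) and the choice of patch centers are assumptions for illustration, not the paper's SGGA implementation; only the 512 × 512 input size is taken from the abstract.

```python
import math

def erp_to_sphere(u, v, width, height):
    """Map ERP pixel coordinates (u, v) to a point on the unit sphere."""
    lon = (u / width) * 2.0 * math.pi - math.pi   # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v / height) * math.pi  # latitude in [-pi/2, pi/2]
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def great_circle(p, q):
    """Angular distance in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

W, H = 512, 512
# Two patch centers near the left and right image borders, on the equator.
left = erp_to_sphere(8, 256, W, H)
right = erp_to_sphere(504, 256, W, H)
seam_angle = great_circle(left, right)
```

The two centers are 496 pixels apart in the planar image but only π/16 radians apart on the sphere, which is the kind of boundary discontinuity a spherical prior in the attention mechanism is meant to correct.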
Communication, Computation, and Caching Resource Collaboration for Heterogeneous Artificial Intelligence Generated Content Service Provisioning
WU Mengru, GAO Yu, ZHAO Bo, XU Bo, SUN Hao, GUO Lei
Available online  , doi: 10.11999/JEIT251300
Abstract:
  Objective  In the Artificial Intelligence of Things (AIoT), Edge Servers (ESs) provide intelligent content generation services to AIoT devices by utilizing cached Artificial Intelligence Generated Content (AIGC) models. However, the limited computing resources and caching capacity of ESs make it difficult to support the large-scale caching demands of heterogeneous AIGC services. To address this issue, a communication, computation, and caching resource collaboration scheme is proposed based on a combined cloud-edge and edge-edge collaborative framework. The scheme considers three representative AIGC services: lightweight AIGC services, computation-intensive AIGC services, and preprocessing-based AIGC services. The objective is to minimize the total AIGC service latency through joint optimization of transmit power, computing resource allocation, model caching strategies, and offloading decisions.  Methods  Communication, computation, and caching resource collaboration for heterogeneous AIGC services is investigated. First, an AIGC service-oriented AIoT system model is established to incorporate both cloud-edge and edge-edge collaboration. An optimization problem is then formulated to minimize the total latency of AIGC services through joint optimization of transmit power, computing resource allocation, model caching strategies, and offloading decisions. Because the formulated problem is non-convex, an Alternating Optimization (AO) algorithm is proposed. The original problem is decomposed into three subproblems. These subproblems are solved using the Successive Convex Approximation (SCA) method, Karush-Kuhn-Tucker (KKT) conditions, and an improved Harris Hawks Optimization (HHO) algorithm.  Results and Discussions  Simulation experiments compare the proposed joint optimization scheme with three baseline methods: Particle Swarm Optimization (PSO), fixed resource allocation, and random offloading and caching. 
First, the convergence of the proposed AO algorithm is verified (Fig. 2). The results show that the algorithm converges rapidly within a limited number of iterations across different subproblems. Second, increasing transmission bandwidth significantly reduces the total AIGC service latency (Fig. 3). This occurs because each device obtains more bandwidth resources for task transmission, and the ES can allocate more bandwidth to deliver generated content in the downlink. Furthermore, the total AIGC service latency decreases as the ES storage capacity increases for all schemes (Fig. 4). Greater storage capacity enables the ES to store more AIGC models, which reduces the transmission delay between the ES and the cloud server. Moreover, when the required floating-point operations per bit increase, the total AIGC service latency rises significantly across all schemes (Fig. 5). Finally, the total AIGC service latency decreases as the maximum transmit power of the Base Station (BS) increases (Fig. 6). This occurs because higher BS transmit power improves the downlink signal-to-noise ratio, which increases the downlink transmission rate and reduces overall service latency. The proposed scheme demonstrates better performance than the baseline schemes, particularly under high computational demand.  Conclusions  Communication, computation, and caching resource collaboration for heterogeneous AIGC services is investigated. The objective is to minimize total AIGC service latency through joint optimization of the transmit power of AIoT devices and BSs, computing resource allocation, AIGC model deployment, and service offloading decisions under computation and caching resource constraints. Because the formulated problem is a mixed-integer nonlinear programming problem, an efficient AO algorithm is developed. The original optimization problem is decomposed into three subproblems, which are solved using the SCA algorithm, KKT conditions, and the HHO algorithm, respectively. 
Simulation results show that the proposed algorithm reduces the total AIGC service latency compared with the baseline schemes.
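As a rough illustration of how KKT conditions can yield a closed-form computing-resource allocation, the sketch below solves a deliberately simplified subproblem that is an assumption and not the formulation of the paper: minimize the sum of c_i / f_i subject to the sum of f_i not exceeding F, where c_i is a hypothetical task workload and F is a hypothetical ES computing capacity. Stationarity of the Lagrangian gives f_i proportional to sqrt(c_i). The names `kkt_allocation` and `total_latency` are illustrative only.

```python
import math


def kkt_allocation(workloads, capacity):
    """Closed-form minimizer of sum(c_i / f_i) subject to sum(f_i) <= capacity.

    From the KKT stationarity condition -c_i / f_i**2 + lam = 0, each f_i is
    proportional to sqrt(c_i); the capacity constraint is tight at the optimum.
    """
    roots = [math.sqrt(c) for c in workloads]
    total = sum(roots)
    return [capacity * r / total for r in roots]


def total_latency(workloads, freqs):
    """Total computing latency under the simplified model c_i / f_i."""
    return sum(c / f for c, f in zip(workloads, freqs))


# Hypothetical workloads and capacity (arbitrary units).
workloads = [1.0, 4.0, 9.0]
capacity = 6.0

kkt = kkt_allocation(workloads, capacity)
equal = [capacity / len(workloads)] * len(workloads)

print("KKT allocation:", kkt, "latency:", total_latency(workloads, kkt))
print("Equal split:   ", equal, "latency:", total_latency(workloads, equal))
```

Under this toy model the KKT allocation gives heavier tasks more computing resources and yields a lower total latency than an equal split, which is the kind of gain a fixed resource allocation baseline would miss.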
Wavelet Transform and Attentional Dual-Path EEG Model for Virtual Reality Motion Sickness Detection
CHEN Yuechi, HUA Chengcheng, DAI Zhian, FU Jingqi, ZHU Min, WANG Qiuyu, YAN Ying, LIU Jia
Available online  , doi: 10.11999/JEIT251233
Abstract:
  Objective  Virtual Reality Motion Sickness (VRMS) presents a barrier to the wider adoption of immersive Virtual Reality (VR). It is primarily caused by sensory conflict between the vestibular and visual systems. Existing assessments rely on subjective reports that disrupt immersion and do not provide real-time measurements. An objective detection method is therefore needed. This study proposes a dual-path fusion model, the Wavelet Transform ATtentional Network (WTATNet), which integrates wavelet transform and attention mechanisms. WTATNet is designed to classify resting-state ElectroEncephaloGram (EEG) signals collected before and after VR motion stimulus exposure, supporting VRMS detection and research on its mechanisms and mitigation strategies.  Methods  WTATNet contains two parallel pathways for EEG feature extraction. The first applies a Two-Dimensional Discrete Wavelet Transform (2D-DWT) to both the time and electrode dimensions of the EEG, reshaping the signal into a two-dimensional matrix based on the spatial layout of the scalp electrodes in horizontal or vertical form. This decomposition captures multi-scale spatiotemporal features, which are then processed using Convolutional Neural Network (CNN) layers. The second pathway applies a one-dimensional CNN for initial filtering followed by a dual-attention structure consisting of a channel attention module and an electrode attention module. These modules recalibrate the importance of features across channels and electrodes to emphasize task-relevant information. Features from both pathways are fused and passed through fully connected layers to classify EEG signals into pre-exposure (non-VRMS) and post-exposure (VRMS) states based on subjective questionnaire validation. EEG data were collected from 22 subjects exposed to VRMS using the game “Ultrawings2.” Ten-fold cross-validation was used for training and evaluation with accuracy, precision, recall, and F1-score as metrics.  
Results and Discussions  WTATNet achieved high VRMS-related EEG classification performance, with an average accuracy of 98.39%, F1-score of 98.39%, precision of 98.38%, and recall of 98.40%. It outperformed classical and state-of-the-art EEG models, including ShallowConvNet, EEGNet, Conformer, and FBCNet (Table 2). Ablation experiments (Tables 3 and 4) showed that removing the wavelet transform path, the electrode attention module, or the channel attention module reduced accuracy by 1.78%, 1.36%, and 1.01%, respectively. The 2D-DWT performed better than the one-dimensional DWT, supporting the value of joint spatiotemporal analysis. Experiments with randomized electrode ordering (Table 4) produced lower accuracy than spatially coherent layouts, indicating that 2D-DWT leverages inherent spatial correlations among electrodes. Feature visualizations using t-SNE (Figures 5 and 6) showed that WTATNet produced more discriminative features than baseline and ablated variants.  Conclusions  The dual-path WTATNet model integrates wavelet transform and attention mechanisms to achieve accurate VRMS detection using resting-state EEG. Its design combines interpretable, multi-scale spatiotemporal features from 2D-DWT with adaptive channel-level and electrode-level weighting. The experimental results confirm state-of-the-art performance and show that WTATNet offers an objective, robust, and non-intrusive VRMS detection method. It provides a technical foundation for studies on VRMS neural mechanisms and countermeasure development. WTATNet also shows potential for generalization to other EEG decoding tasks in neuroscience and clinical research.
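To make the 2D-DWT idea more concrete, the following minimal sketch applies a one-level orthonormal Haar transform along the time axis and then along the electrode axis of a small matrix. The Haar wavelet, the 4×4 toy matrix, and the function names are assumptions for illustration; the abstract does not state which wavelet or electrode layout WTATNet uses.

```python
import math


def haar_1d(seq):
    """One-level orthonormal Haar step on an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(seq[i] + seq[i + 1]) / s for i in range(0, len(seq), 2)]
    detail = [(seq[i] - seq[i + 1]) / s for i in range(0, len(seq), 2)]
    return approx, detail


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def haar_2d(matrix):
    """One-level 2D Haar DWT of an (electrodes x time) matrix.

    Returns four subbands: low/low, low-time/high-electrode,
    high-time/low-electrode, and high/high.
    """
    # Step 1: filter along the time axis (each row).
    lo_rows, hi_rows = [], []
    for row in matrix:
        a, d = haar_1d(row)
        lo_rows.append(a)
        hi_rows.append(d)

    # Step 2: filter along the electrode axis (each column).
    def along_columns(block):
        lo_cols, hi_cols = [], []
        for col in transpose(block):
            a, d = haar_1d(col)
            lo_cols.append(a)
            hi_cols.append(d)
        return transpose(lo_cols), transpose(hi_cols)

    ll, lh = along_columns(lo_rows)
    hl, hh = along_columns(hi_rows)
    return ll, lh, hl, hh


def energy(*bands):
    return sum(v * v for band in bands for row in band for v in row)


# Toy "EEG" matrix: 4 electrodes (rows) x 4 time samples (columns).
eeg = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
ll, lh, hl, hh = haar_2d(eeg)
print("LL subband:", ll)
print("energy in / out:", energy(eeg), energy(ll, lh, hl, hh))
```

Because neighbouring rows and columns of this toy matrix are identical, all the energy lands in the low/low subband. Shuffling the rows would push energy into the detail subbands, which loosely mirrors why a spatially coherent electrode ordering matters for a 2D transform.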
SAR Saturated Interference Suppression Method Guided by Precise Saturation Model
DUAN Lunhao, LU Xingyu, TAN Ke, LIU Yushuang, YANG Jianchao, YU Jing, GU Hong
Available online  , doi: 10.11999/JEIT251283
Abstract:
  Objective  With the increasing number of electromagnetic devices, Synthetic Aperture Radar (SAR) is highly susceptible to Radio Frequency Interference (RFI) within the same frequency band. RFI typically appears as bright streaks in SAR images and severely degrades image quality. Considerable research has been conducted on interference suppression, and many effective methods have been proposed. However, most existing approaches do not consider the nonlinear saturation of interfered echoes. In practical scenarios, the interference power is usually high, and the gain controller in the SAR receiver cannot effectively regulate the amplitude of interfered echoes. Therefore, the input signal amplitude of the Analog-to-Digital Converter (ADC) exceeds its dynamic range. This condition drives the SAR receiver into saturation and leads to nonlinear distortion in the interfered echoes. Such phenomena have been observed in multiple SAR systems. Documented cases include receiver saturation in the LuTan-1 satellite and several airborne SAR platforms. Analyses of SAR data further confirm the presence of saturated interference in systems such as Sentinel-1, Gaofen-3, and other spaceborne SAR platforms. After saturation occurs, the echo spectrum exhibits spurious components and spectral artifacts. These effects cause a mismatch between existing suppression methods and the actual characteristics of saturated interference. Therefore, many current methods cannot effectively mitigate this type of interference. Moreover, accurate models that precisely describe the output components of saturated interfered echoes remain limited. To address these issues, a precise analytical model for saturated interference is established, and an effective saturated interference suppression method is proposed based on this model.  
Methods  Based on the processing of the basic saturation model, a mathematical model is first developed to accurately characterize the output components of saturated interference. The accuracy of the model in describing amplitude and phase is validated through simulations. A detailed analysis of the output components of interfered echoes under saturation conditions is also conducted. Compared with the one-bit sampling model and the traditional tanh saturation model, the proposed model provides higher accuracy in describing amplitude information. In addition, the model is not limited by the sampling bit width of ADCs and can theoretically be extended to describe saturation outputs in other radar receivers. Based on the observation that harmonic phases can be expressed as a linear combination of the phases of the original signal components, and by exploiting the high-power characteristic of the interference fundamental harmonic, a saturated interference suppression method is proposed. First, because the interference fundamental harmonic has relatively high power, it is extracted using eigen-subspace decomposition. Then, based on harmonic phase relationships, the extracted interference fundamental harmonic, and the SAR transmitted signal, various interference harmonics are systematically constructed. These include higher-order interference harmonics, target harmonics, and intermodulation harmonics, which together form a complete dictionary. Finally, a sparse optimization problem is solved to achieve separation and suppression of saturated interference. The effectiveness of the proposed method is verified using measured Gaofen-3 data.  Results and Discussions  Experiments are conducted using both simulated and measured data to verify the effectiveness of the proposed method in suppressing saturated interference. For simulated data, the proposed method completely removes interference stripes in the SAR image (Fig. 7). 
Analysis of the time-frequency spectra of the processed echoes (Fig. 8 and Fig. 9) shows that traditional methods cannot effectively eliminate higher-order harmonics. In contrast, the proposed method improves the Target-to-Background Ratio (TBR) by 1.76 dB and achieves the lowest Root Mean Square Error (RMSE) of 0.0783 (Table 3). For the measured Gaofen-3 data, analysis of the processed images and the time-frequency spectra of echoes confirms that the proposed method effectively suppresses interference. Conventional methods still exhibit residual interference in the processed results (Fig. 10 and Fig. 11).  Conclusions  With the increasing deployment of electromagnetic devices, SAR systems are increasingly susceptible to in-band interference. High-power interference can drive the SAR receiver into saturation and cause nonlinear distortion, which reduces the effectiveness of traditional interference suppression methods. To address this issue, a model that precisely characterizes the saturated output components of interfered echoes is established. Based on this model, an interference suppression method for saturated interference is proposed. Simulation and experimental results show that the model accurately describes saturation behavior and that the proposed method effectively suppresses saturated interference.
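The link between receiver saturation and spurious harmonics can be seen with a very small experiment. The sketch below assumes an idealized hard-clipping ADC and a single real sinusoid, which is far simpler than the precise saturation model described in the abstract; the amplitude, clipping limit, and function names are hypothetical.

```python
import math


def clip(value, limit):
    """Idealized ADC saturation: hard clipping to [-limit, +limit]."""
    return max(-limit, min(limit, value))


def dft_magnitude(signal, k):
    """Normalized magnitude of DFT bin k (direct evaluation, no FFT)."""
    n = len(signal)
    re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(signal))
    im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(signal))
    return math.hypot(re, im) / n


n = 256  # samples in the analysis window
f0 = 8   # tone frequency in cycles per window

# A tone whose amplitude (2.0) exceeds the clipping limit (1.0).
tone = [2.0 * math.sin(2 * math.pi * f0 * i / n) for i in range(n)]
saturated = [clip(v, 1.0) for v in tone]

clean_h3 = dft_magnitude(tone, 3 * f0)      # third harmonic before clipping
sat_h3 = dft_magnitude(saturated, 3 * f0)   # third harmonic after clipping
sat_h2 = dft_magnitude(saturated, 2 * f0)   # second harmonic after clipping

print("3rd harmonic, unsaturated:", clean_h3)
print("3rd harmonic, saturated:  ", sat_h3)
print("2nd harmonic, saturated:  ", sat_h2)
```

In this toy case symmetric clipping creates odd harmonics only. With several signal components present (interference plus target echo), the same nonlinearity also produces intermodulation products, which is why a dictionary of harmonic and intermodulation terms is a natural basis for separation.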
Reconfigurable Intelligent Surface Assisted Key Generation Resistant to Signal Injection Attacks
YANG Lijun, WANG Haomin, ZHU Tiancheng, WU Meng
Available online  , doi: 10.11999/JEIT251281
Abstract:
  Objective  This study examines the potential threat of signal injection attacks to Physical Layer Key Generation (PLKG) in Reconfigurable Intelligent Surface (RIS)-assisted wireless systems. The threat is especially pronounced in quasi-static channels, where the channel state remains highly correlated across multiple probing rounds. From both attack and defense perspectives, the study clarifies how spatial correlation between RIS reflection channels and eavesdropping channels can be exploited to improve key inference. A channel-randomization mechanism is designed that uses the controllability of RIS to suppress key leakage, reduce the eavesdropper’s key capacity, and improve the security of RIS-assisted PLKG in future 6G scenarios. Quantitative analysis further examines the relationships among injection power, Signal-to-Noise Ratio (SNR), and spatial correlation. These results provide reference guidance for robust RIS configuration and secure system design.  Methods  An RIS-assisted Time-Division Duplex (TDD) system is considered. Single-antenna Alice and Bob generate symmetric keys from a reciprocal channel, whereas a two-antenna active eavesdropper, Eve, injects signals using previously observed Channel State Information (CSI) (Fig. 1). The links follow quasi-static Rayleigh block fading. CSI for Alice, Bob, and Eve is defined for each time slot within a coherence interval. A conventional injection attack is first modeled. Eve estimates the eavesdropping channel in one slot, precodes an injected waveform, and contaminates the subsequent probing at Alice and Bob, partially steering their key source. A joint key inference strategy is then proposed. This strategy exploits the spatial correlation between RIS reflection channels and eavesdropping channels, as well as the common RIS-induced subchannel shared by legitimate and eavesdropping links (Table 1). As a defense, a channel-randomization PLKG scheme is proposed. 
Alice randomly reconfigures RIS coefficients at each probing round. Therefore, the effective channels of Alice-Bob, Alice-Eve, and Bob-Eve vary independently across rounds, whereas Alice-Bob reciprocity within a single round is preserved. Injection signals precoded with outdated CSI therefore appear as uncorrelated interference at the legitimate nodes. Mutual-information-based bounds on secret-key capacity are derived to obtain key capacities. The eavesdropper’s Key Recovery Rate (KRR) is defined for performance evaluation. The theoretical results are validated through MATLAB Monte Carlo simulations with 10,000 trials using an information-theoretic estimator toolbox. The simulations examine different SNR levels, injection power values, and spatial correlation conditions (Figs. 2~5, Table 2).  Results and Discussions  Analysis of the conventional injection attack without RIS defense shows that at high SNR, Alice and Bob observe nearly identical reciprocal channels due to channel reciprocity. Eve’s estimate, derived from injected signals, follows a similar trend but shows noticeable mismatch (Fig. 2). Eve can therefore recover some key bits, although errors remain, and the KRR remains moderate. When the proposed joint key inference strategy is applied, Eve’s reconstructed channel more closely matches the legitimate response (Fig. 3). This effect arises because RIS-assisted PLKG causes legitimate and eavesdropping links to share an RIS-induced subchannel. The resulting spatial correlation provides additional exploitable information beyond the known injected signal. Therefore, Eve’s key capacity and KRR increase significantly, which indicates a stronger RIS-specific security threat. At fixed SNR (Fig. 4), Eve’s key capacity without defense increases rapidly with injection power and may approach or exceed the legitimate key capacity. 
Under RIS randomization, the legitimate capacity decreases slightly, whereas Eve’s capacity remains small and nearly constant. This result indicates that randomization converts structured injection signals into noise. Spatial-correlation analysis in Fig. 5 shows that Eve’s capacity without defense increases rapidly and becomes critical as correlation approaches one. In contrast, under RIS randomization the increase is gradual, and the capacity may remain near zero at moderate correlation levels. Table 2 confirms these trends in terms of KRR. The KRR is about 50% without correlation and injection. It increases to about 62.5% when injection is applied but spatial correlation is zero, whereas the defense keeps the value close to random guessing. When spatial correlation and injection power are higher, the KRR exceeds 80%. The proposed defense reduces this value to approximately 57%~66%.  Conclusions  This study examines the dual role of RIS in PLKG security. RIS can increase vulnerability but can also serve as an effective defensive mechanism. By exploiting the correlation between RIS reflection channels and eavesdropping channels, a joint key inference attack is developed that increases the eavesdropper’s key capacity and recovery rate compared with conventional injection attacks. This result reveals a new attack vector in RIS-assisted systems. A channel-randomization PLKG scheme is then proposed by exploiting the dynamic controllability of RIS. The scheme shortens the effective coherence time to a single probing round and decorrelates successive channel realizations from the attacker’s perspective. Theoretical analysis and Monte Carlo simulations show that the proposed scheme converts malicious injection signals into uncorrelated interference, reduces the eavesdropping key capacity, and pushes the eavesdropper’s KRR close to random guessing. This property remains effective even under high SNR, strong spatial correlation, and high injection power. 
The scheme achieves these security improvements with low hardware overhead compared with reconfigurable antenna-based solutions, because RIS devices are expected to serve as infrastructure elements in future 6G networks. The results provide guidance for the secure design of RIS-assisted PLKG systems and suggest that the controllable characteristics of RIS should be used for both performance improvement and security protection.
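For readers unfamiliar with the KRR metric, the sketch below shows one plausible way to compute it. It assumes, without confirmation from the abstract, that KRR is the fraction of legitimate key bits that the eavesdropper reproduces and that key bits come from a simple sign quantizer. The channel samples are made-up numbers, not simulation results.

```python
def quantize(samples, threshold=0.0):
    """One-bit quantizer: 1 if the sample exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in samples]


def key_recovery_rate(legit_bits, eve_bits):
    """Fraction of key bits on which the eavesdropper agrees with the legitimate key."""
    matches = sum(1 for a, b in zip(legit_bits, eve_bits) if a == b)
    return matches / len(legit_bits)


# Hypothetical channel measurements (arbitrary units).
alice_samples = [0.9, -0.4, 0.3, -1.2, 0.7, -0.1, 0.5, -0.8]
eve_samples = [0.6, -0.2, -0.5, -0.9, 0.1, 0.4, 0.2, -0.3]

alice_bits = quantize(alice_samples)
eve_bits = quantize(eve_samples)
krr = key_recovery_rate(alice_bits, eve_bits)

print("Alice key bits:", alice_bits)
print("Eve key bits:  ", eve_bits)
print("KRR:", krr)
```

Under this definition a KRR near 0.5 means the eavesdropper does no better than guessing each bit, which is the sense in which the defense is described as pushing the KRR toward random guessing.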
Research on Monophonic Speech Separation Method Using Time-Frequency Domain Multi-scale Information Interaction Strategy
LAN Chaofeng, YANG Guotao, CHEN Yingqi, GUO Xiaoxia
Available online  , doi: 10.11999/JEIT251340
Abstract:
  Objective  Monaural speech separation aims to extract individual speaker signals from a single-channel mixture. It is a core technology for addressing the “cocktail party problem” and has substantial application value in low-resource, low-latency scenarios such as mobile voice assistants, teleconferencing, and hearing aids. However, the lack of spatial cues in single-channel signals, together with the substantial overlap of multiple speakers in both time-domain waveforms and frequency-domain spectra, makes accurate separation highly challenging, especially when the integrity and clarity of the target speech must be preserved. Current deep learning-based models often show limitations in three closely related aspects: effective coordination of multi-scale dependencies, efficient fusion of time-frequency information, and control of computational complexity. To address these challenges, a novel Multi-Scale Attention model integrating Time-Frequency domain information (MSA-TF) is proposed to improve separation performance, computational efficiency, and generalization capability.  Methods  The MSA-TF model contains three key components. First, a lightweight Time-Frequency fusion module is designed. The module first divides the frequency band into four subbands on the basis of speech priors, such as low-frequency energy concentration and high-frequency detail sensitivity, to extract spectral features efficiently. A dynamic gating mechanism with decomposed convolutions and SiLU activation is then applied to adaptively enhance speaker-discriminative features and suppress redundant channels associated with noise. Finally, a cross-attention mechanism is used to promote deep interaction between time-domain and frequency-domain features during the encoding stage. Global semantic information from the time domain guides the selection and weighting of useful frequency-domain features, allowing mutual correction and complementarity. This module adds only 0.8 M parameters. 
Second, a Multi-scale Interaction Separator is proposed to address the limitations of sequential or loosely coupled multi-scale processing in models such as SepFormer. Multi-granularity features, ranging from frame-level $\boldsymbol{F}_1$ to syllable-level semantic $\boldsymbol{F}_4$, are extracted through cascaded dilated convolutions. Its core is the “GF-LF Iterative Feedback” mechanism. The Global Flash module, based on efficient FLASH attention, captures long-range dependencies and syllable-level context. This global information is upsampled and injected into local features ($\boldsymbol{F}_k$) through residual connections. Local Flash modules, also based on FLASH attention, then process the enhanced local features ($\boldsymbol{F}_k^{\prime}$) to model fine-grained structures and suppress frame-level noise. The updated local features are subsequently fed back through adaptive pooling to refine the global representation in the next iteration. This closed-loop bidirectional flow enables deep synergy between global semantics and local details. A gated fusion mechanism at the end dynamically balances the contributions of different scales. Third, to control computational complexity, an efficient hierarchical grouped attention mechanism is adopted, reducing the complexity from quadratic to nearly linear with sequence length. The overall MSA-TF architecture is end-to-end and consists of a 1D convolutional encoder, the integrated time-frequency and multi-scale modules, a mask network, and a symmetric decoder.  Results and Discussions  Extensive experiments are conducted on the standard WSJ0-2mix and Libri-2mix datasets, with Scale-Invariant Signal-to-Noise Ratio (SI-SNR) and Signal-to-Distortion Ratio (SDR) used as evaluation metrics. Ablation studies (Table 1) confirm the individual and joint contributions of the proposed modules. 
When only the time-frequency module is added to the TDAnet baseline, SI-SNR increases by 0.3 dB and SDR by 0.4 dB with only a small increase in parameters, confirming its contribution to signal structure modeling, particularly for high-frequency details. When only the multi-scale interaction module is incorporated, SI-SNR increases by 2.5 dB and SDR by 2.7 dB, highlighting its central role in modeling long-term dependencies. When the time-frequency and multi-scale modules are combined in the complete MSA-TF core, a synergistic effect is obtained, reaching 17.6 dB SI-SNR, which exceeds the sum of the individual gains. This result indicates that the dual-dimensional features provided by time-frequency fusion and the deep dependency modeling enabled by multi-scale interaction strengthen each other. Spectrogram analysis (Fig. 3) further shows that the time-frequency module effectively suppresses residual high-frequency noise and produces clearer spectral contours for the target speech. On the WSJ0-2mix test set (Table 2), MSA-TF achieves state-of-the-art performance, with 17.6 dB SI-SNR and 17.8 dB SDR. It matches the performance of SuperFormer and substantially outperforms strong baselines such as Conv-Tasnet by 2.3 dB SI-SNR, while maintaining a reasonable parameter count of 15.6 M. Compared with models with larger parameter sizes, such as SignPredictionNet at 55.2 M, MSA-TF shows more efficient modeling. For generalization evaluation on the completely unseen Libri-2mix dataset (Table 4), MSA-TF, trained only on WSJ0-2mix, achieves 14.2 dB SI-SNR and 14.7 dB SDR. Its performance is comparable to that of Conv-Tasnet models trained specifically on Libri-2mix, which achieve 14.4 dB SI-SNR, and it outperforms BLSTM-Tasnet trained on Libri-2mix. This strong cross-dataset adaptability indicates that the model captures universal time-frequency characteristics and multi-scale dependency structures in speech signals rather than overfitting to a specific dataset distribution.  
Conclusions  An MSA-TF model is presented to address key challenges in monaural speech separation through deep integration of multi-scale time-frequency information interaction. The proposed lightweight Time-Frequency fusion module efficiently supplements time-domain features with discriminative frequency-domain information. The Multi-scale Interaction Separator, with its iterative feedback mechanism, enables dynamic bidirectional information flow across scales and substantially improves the joint modeling of short-term details and long-term dependencies. Combined with an efficient attention design, the model achieves superior performance without excessive computational cost. Experimental results show that MSA-TF achieves leading separation performance on standard benchmarks and shows strong generalization ability on unseen data distributions, confirming the effectiveness of this comprehensive design. The model provides an efficient, robust, and generalizable solution for practical low-resource application scenarios. Future work may examine advanced cross-modal fusion techniques and dynamic scale adjustment strategies to further improve robustness and performance in more complex and variable acoustic environments.
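Since SI-SNR is the headline metric, a minimal sketch of its usual definition may help: both signals are made zero-mean, the estimate is projected onto the target, and the ratio of projected energy to residual energy is reported in dB. The function name, the `eps` stabilizer, and the four-sample signals are illustrative assumptions, not the evaluation code of the paper.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for two equal-length signals."""
    # Remove the mean of each signal.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Everything not explained by the target counts as noise.
    noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(n * n for n in noise) + eps
    return 10.0 * math.log10(num / den + eps)


# Toy signals (hypothetical values).
target = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 1.0, -1.2]
louder = [3.0 * x for x in estimate]   # same estimate, different gain
worse = [1.5, -0.5, 0.5, -1.5]         # larger deviation from the target

print("SI-SNR:", si_snr(estimate, target))
print("SI-SNR after rescaling:", si_snr(louder, target))
print("SI-SNR of a worse estimate:", si_snr(worse, target))
```

Rescaling the estimate leaves the score essentially unchanged, which is the property that makes SI-SNR suitable for comparing separation models whose outputs differ in gain.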
Intelligent Sorting Algorithm for Multi-station Radar Signals Based on Federated Learning
YE Chengji, XIE Jian, ZHANG Zhaolin, WANG Ling
Available online  , doi: 10.11999/JEIT251355
Abstract:
  Objective  Radar signal sorting is a critical step in electronic reconnaissance and battlefield situational awareness. It is used to accurately separate interleaved pulse streams in complex electromagnetic environments. Although multi-station cooperative reconnaissance systems provide spatial diversity gains that can mitigate the parameter ambiguity and aliasing problems of single-station systems, their practical deployment faces major challenges. Traditional centralized processing architectures require massive volumes of raw Pulse Description Word (PDW) data to be transmitted to a central server. This requirement leads to prohibitive communication bandwidth costs and increases the risk of leakage of sensitive electromagnetic spectrum intelligence. In addition, because stations are geographically distributed and differ in antenna scanning patterns, the data collected at different stations often show significant Non-Independent and Identically Distributed (Non-IID) characteristics. Such heterogeneity reduces the generalization ability of local models trained on isolated data islands. To resolve the conflict between data isolation and the need for collaborative intelligence, a multi-station collaborative radar signal sorting method is proposed based on a Federated Learning (FL) framework. Collaborative model training is enabled without exchange of raw data, so that data privacy is preserved, communication overhead is reduced, and sorting robustness is improved in heterogeneous and noisy battlefield environments.  Methods  A centralized federated sorting framework is constructed to coordinate multiple reconnaissance stations. The method contains three main components: feature preprocessing, a lightweight local temporal model, and a heterogeneity-aware aggregation strategy. First, in data preprocessing, the raw PDW parameters, including TOA, CF, and PW, are normalized to address substantial differences in scale. 
Specifically, TOA is transformed into first-order differential values to extract Pulse Repetition Interval (PRI) information, which prevents numerical saturation and captures periodic patterns effectively (Fig. 3). Second, a local time-series sorting model is designed for the resource constraints of edge devices. A bidirectional Long Short-Term Memory (LSTM) network is used as the backbone to capture long-range dependencies and dynamic patterns in pulse sequences from both forward and backward directions. To accelerate convergence and prevent gradient vanishing, residual connections are added to fuse static and dynamic features. The extracted features are then mapped to the radiation source category space through a cascaded linear classification layer. Third, to address model drift caused by Non-IID data, including feature distribution shift and label distribution shift, a new aggregation strategy is proposed based on parameter decomposition and proximal regularization. Model parameters are decoupled into a feature extractor and a classifier. During federated aggregation, only the parameters of the generic feature extractor are uploaded and globally averaged, whereas the personalized classifier parameters are retained locally to adapt to the class distribution of each station. Furthermore, a proximal regularization term is added to the local loss function (Eq. 20). This constraint limits the deviation of local updates from the global model and ensures that the optimization direction does not diverge substantially because of local data heterogeneity, thereby improving the stability and convergence speed of the global model.  Results and Discussions  Extensive simulation experiments are conducted on core datasets with 3 stations and 5 radars, and on extended datasets with 9 stations and 12 radars, including complex modulation patterns such as jitter, sliding, and staggering. 
Quantitative analysis shows that the proposed method achieves sorting performance comparable to that of Centralized Learning (CL). On the core dataset, the Precision, Recall, and F1-score of the proposed method reach 96.51%, 96.35%, and 96.42%, respectively, exceeding those of FedAvg by approximately 0.67% in F1-score. On the more challenging extended dataset, the performance advantage becomes more significant, with an F1-score improvement of 3.86% over FedAvg (Table 4). These results indicate that the parameter decomposition strategy effectively balances common feature learning with personalized decision-making. Analysis by class further shows that, for categories that are difficult to distinguish, such as Radar 7 and Radar 10, the proposed method improves recognition accuracy by up to 15% and 6%, respectively, compared with FedAvg (Fig. 7 and Fig. 8). Robustness tests further demonstrate the adaptability of the method. When the number of participating stations increases from 3 to 9 (Fig. 9), the F1-score rises steadily from 73.53% to 83.75%. This result confirms that enlarging node scale in the FL framework produces collaborative gains through more diverse samples and reduced geographic statistical heterogeneity, which substantially improve model generalization and robustness. Under severe class skew conditions, the method maintains an F1-score above 80% on the core dataset (Fig. 10 and Fig. 11). Furthermore, under extreme electromagnetic conditions characterized by high pulse loss rates of 70% and spurious pulse rates of 70%, the model maintains sorting performance above 75%, which demonstrates strong robustness against noise and interference (Fig. 12).  Conclusions  An FL-based framework is proposed for multi-station collaborative radar signal sorting to address data privacy and transmission constraints in distributed reconnaissance. 
By integrating a lightweight LSTM with a heterogeneity-aware aggregation mechanism, the method effectively captures temporal pulse features and mitigates model drift caused by Non-IID data. Experimental results verify that the approach achieves accuracy comparable to that of centralized methods and shows superior robustness under label skew and severe data degradation, including high pulse loss and spurious pulse rates. This study provides a privacy-preserving and efficient solution for intelligent signal processing in distributed electronic warfare systems.
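The three ingredients described in the Methods (TOA differencing, aggregation of the feature extractor only, and a proximal term) can be sketched in a few lines. Everything below is a simplified assumption for illustration: parameters are flat lists, the average is unweighted, the penalty is the common (mu / 2) * ||w - w_global||^2 form, and the dictionary keys and function names are hypothetical rather than taken from the paper.

```python
def toa_to_pri(toa):
    """First-order difference of Time of Arrival values, giving pulse repetition intervals."""
    return [later - earlier for earlier, later in zip(toa, toa[1:])]


def average_extractors(stations):
    """Unweighted average of the shared feature-extractor parameters only."""
    n = len(stations)
    dim = len(stations[0]["extractor"])
    return [sum(s["extractor"][i] for s in stations) / n for i in range(dim)]


def proximal_penalty(local_w, global_w, mu):
    """Proximal regularizer (mu / 2) * ||w_local - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((l - g) ** 2 for l, g in zip(local_w, global_w))


# Hypothetical TOA sequence (microseconds) from one emitter with a fixed PRI.
pri = toa_to_pri([0, 100, 200, 300])

# Two hypothetical stations with toy parameter vectors.
stations = [
    {"extractor": [1.0, 2.0], "classifier": [0.1]},
    {"extractor": [3.0, 4.0], "classifier": [0.9]},
]
global_extractor = average_extractors(stations)

# Penalty a station would pay for drifting from the global extractor.
penalty = proximal_penalty(stations[0]["extractor"], global_extractor, mu=0.1)

# Broadcast the shared extractor; each classifier stays local and personalized.
for station in stations:
    station["extractor"] = list(global_extractor)

print("PRI:", pri)
print("Global extractor:", global_extractor)
print("Proximal penalty for station 0:", penalty)
print("Classifiers kept local:", [s["classifier"] for s in stations])
```

Keeping the classifier out of the aggregation step is what lets each station adapt to its own label distribution, while the proximal term discourages the shared extractor from drifting too far during local training.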
Dynamic Scale Perception-Driven Multi-UAV Collaborative 3D Object Detection Method
DUAN Shujing, WANG Zhirui, CHENG Peirui, FU Kun
Available online  , doi: 10.11999/JEIT251378
Abstract:
  Objective  Multi-UAV collaborative 3D object detection is a core technology for low-altitude intelligent perception, and the Bird’s-Eye View (BEV) feature representation paradigm provides support for global spatial consistency. However, in practical UAV remote-sensing scenarios, targets are extremely small, sparsely distributed, and embedded in a large proportion of background regions. Existing Transformer-based BEV perception methods adopt a homogeneous full-image feature-processing strategy. This strategy not only wastes computing resources because of excessive computation in large background areas, but also tends to dilute small-target features with background noise, making it difficult to balance computational efficiency and detection accuracy. Meanwhile, multi-UAV collaboration requires cross-device information interaction to achieve view complementarity and information gain, but this process is prone to redundant information and even feature conflicts. Traditional fixed-weight aggregation methods cannot accurately identify effective information or suppress redundancy, resulting in poor consistency of global BEV features and reduced collaborative detection accuracy. Therefore, the development of a detection network that is adaptive to multi-UAV aerial scenarios is of clear practical value.  Methods  A dynamic scale-aware detection network is proposed for efficient and accurate 3D object detection through two core modules: the Dynamic Scale-aware BEV Generation (DSBG) module and the Adaptive Collaborative BEV-Feature Aggregation (ACFA) module. The network establishes an end-to-end pipeline of “multi-view image input-dynamic scale adaptive feature encoding-BEV space 3D detection” (Fig. 1). First, the observed images collected by each UAV are processed independently by a parameter-sharing ResNet-50 backbone network to generate feature maps with a consistent structure. 
The DSBG module then takes these feature maps as input, calculates the amplitude of feature responses in each spatial region through the Local Scale-Aware Unit, and estimates the target distribution. On this basis, differentiated BEV grid encoding is dynamically allocated: high-resolution dense grids are assigned to high-response target regions to preserve fine-grained features, whereas low-resolution sparse grids are assigned to low-response background regions to reduce invalid computation. At the same time, target query vectors with spatial position priors are generated. The ACFA module receives the multi-resolution BEV features generated by the DSBG module, concatenates the dual-resolution features from different UAVs in the channel dimension, upsamples the low-resolution features to align them with the high-resolution features, models the local correlations of two-scale features through 3×3 convolution, and obtains a globally consistent BEV feature map through element-wise weighted summation. Finally, the global BEV features are fed into the DETR decoder for 3D target prediction, with Focal Loss used for classification and Smooth L1 Loss used for regression (Eqs. 5–6).  Results and Discussions  Extensive experiments are conducted on two public multi-UAV collaborative simulation datasets, AeroCollab3D and Air-Co-Pred. The results show that the proposed method achieves strong performance on both datasets. Compared with current state-of-the-art methods and baseline models, it not only improves mean Average Precision (mAP) by up to 7.2 percentage points, but also substantially reduces key error metrics, including mean size error (by more than 48%), mean localization error, and mean orientation error. In particular, clear advantages are observed in small-target detection and fine-grained category recognition, with pedestrian detection accuracy improved by nearly 10 percentage points.
Ablation experiments verify the effectiveness of both the DSBG and ACFA modules. The proposed method steadily improves detection accuracy while significantly reducing computational cost by up to 41.6%, thereby achieving coordinated optimization of accuracy and efficiency. Visualization results (Fig. 3) show that the predicted bounding boxes have higher spatial alignment with the ground truth, effectively alleviating the common problems of target overlap and missed detection in traditional methods. Fig. 4 further illustrates the technical advantages of multi-UAV collaborative detection. Even for targets occluded by obstacles, the proposed method achieves efficient detection, thereby enhancing the comprehensive perception capability of the global region.  Conclusions  A dynamic scale-aware detection network is proposed for multi-UAV collaborative 3D object detection to address the core challenges of the efficiency-accuracy tradeoff and poor feature consistency in traditional methods. The DSBG module achieves dynamic matching between the BEV encoding scale and target distribution, thereby reducing redundant computation, whereas the ACFA module improves multi-scale and multi-view feature aggregation to ensure global feature consistency and accuracy. Experimental results on two datasets confirm that the proposed method outperforms existing advanced methods in detection accuracy, computational efficiency, and robustness. Future work will focus on optimizing dynamic scale-adjustment strategies with temporal information and exploring multi-sensor fusion with lightweight LiDAR data to improve detection stability in complex scenarios.
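The region-wise allocation described in the Methods above (dense grids for high-response regions, sparse grids for background) can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the function name `allocate_grid_resolution`, the use of mean absolute activation as the "amplitude", the fixed threshold, and the toy feature map are all hypothetical.

```python
def allocate_grid_resolution(feature_map, region_size, threshold):
    """Label each square region 'high' or 'low' from its mean absolute response.

    feature_map is a list of rows of floats (a single-channel toy feature map).
    The amplitude measure and the threshold rule are assumptions for illustration.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    allocation = {}
    for r0 in range(0, rows, region_size):
        for c0 in range(0, cols, region_size):
            cells = [
                abs(feature_map[r][c])
                for r in range(r0, min(r0 + region_size, rows))
                for c in range(c0, min(c0 + region_size, cols))
            ]
            amplitude = sum(cells) / len(cells)
            key = (r0 // region_size, c0 // region_size)
            allocation[key] = "high" if amplitude >= threshold else "low"
    return allocation


# Toy 4x4 map: only the top-left 2x2 region carries a strong response.
toy_map = [
    [0.9, 0.8, 0.0, 0.1],
    [0.7, 1.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
]
toy_allocation = allocate_grid_resolution(toy_map, region_size=2, threshold=0.5)
```

In this toy case only region `(0, 0)` is marked for dense encoding, which is the intended effect: computation is concentrated where targets are likely to be.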
Construction Methods of Two-Dimensional Golay-Zero Correlation Zone Array Sets with Flexible Parameters
WANG Meiyue, LIU Tao, CHEN Xiaoyu, LI Yubo
Available online  , doi: 10.11999/JEIT251360
Abstract:
  Objective  Sequences with good correlation properties are widely used in wireless communications, cryptography, and radar systems. However, a sequence set cannot simultaneously achieve ideal autocorrelation and ideal cross-correlation. This limitation has led to the study of two signal classes with ideal correlation properties: Zero Correlation Zone (ZCZ) sequences and Golay Complementary Sets (GCS). A Golay-ZCZ sequence set combines the advantages of both. Its constituent sequences exhibit ideal periodic autocorrelation and cross-correlation within the ZCZ, and the sums of their aperiodic autocorrelations are zero at all nonzero shifts. Therefore, a Golay-ZCZ set is both a ZCZ set and a GCS. It can thus be used in the applications of both sequence classes. An array set is a two-dimensional extension of a sequence set. Although Golay-ZCZ sequence sets have been widely studied and constructed, research on Two-Dimensional (2D) Golay-ZCZ array sets remains limited. This study proposes three constructions of 2D Golay-ZCZ array sets based on 2D multivariable functions and the concatenation operator. These array sets can be used as precoding matrices for massive Multiple Input Multiple Output (MIMO) omnidirectional transmission.  Methods  Three construction methods for 2D Golay-ZCZ array sets are proposed, including one direct construction and two indirect constructions. The resulting parameters have not been reported in existing studies. In the first construction, a 2D Golay-ZCZ array set is generated using 2D multivariable functions, with parameters expressed as prime powers. This direct function-based approach enables efficient synthesis of the target arrays. The second and third constructions generate 2D Golay-ZCZ array sets through horizontal and vertical concatenation of Two-Dimensional Complete Complementary Codes (2D CCC), respectively. In these indirect constructions, the parameters are not restricted to prime powers.
This property broadens the applicability of the methods and increases parameter flexibility.  Results and Discussions  The first construction generates a 2D Golay-ZCZ array set with array size $p_{1}^{{m}_{1}}\times p_{2}^{{m}_{2}}$ and ZCZ size $({p}_{1}-1)p_{1}^{{\pi }_{1}(2)-1}\times ({p}_{2}-1)p_{2}^{{\sigma }_{1}(2)-1}$ through a direct function-based method, where ${p}_{1}$ and ${p}_{2}$ are prime numbers. For clarity, the magnitudes of the 2D periodic cross-correlation function of the constructed array set are illustrated in Example 1 (Fig. 1). The second construction generates a ZCZ array set with array size ${L}_{1}\times {N}^{2}{L}_{2}$ and ZCZ size $({L}_{1}-1)\times (N-1){L}_{2}$ based on the horizontal concatenation of $(N,N,{L}_{1},{L}_{2})$ 2D CCC. The third construction generates a ZCZ array set with array size ${N}^{2}{L}_{1}\times {L}_{2}$ and ZCZ size $(N-1){L}_{1}\times ({L}_{2}-1)$ based on the vertical concatenation of $(N,N,{L}_{1},{L}_{2})$ 2D CCC. An illustrative example of Construction 2 is provided, and the corresponding correlation magnitudes are shown in Figs. 2 and 3. As summarized in Table 1, the construction methods proposed in this paper generate parameter sets that have not been reported in the existing literature. The constructed array sets provide considerable flexibility in array dimensions and ZCZ sizes. This flexibility is valuable for the design of precoding matrices in MIMO omnidirectional transmission systems.
In practical implementations, the dimension of a precoding matrix is typically determined by the number of transmit antennas, whereas the ZCZ size must match the maximum multipath delay spread of the channel. Owing to this parameter flexibility, the proposed 2D Golay-ZCZ array sets support adaptive selection under different antenna configurations and channel conditions.  Conclusions  Three construction methods for 2D Golay-ZCZ array sets are proposed. These methods generate array sets with flexible array sizes and large ZCZ widths. The first construction is based on a 2D multivariable function and can include previous results as special cases without using kernels. The second and third constructions rely on the concatenation operator and provide greater parameter flexibility. The proposed 2D Golay-ZCZ arrays have potential applications in MIMO omnidirectional transmission. The parameter-flexible array sets can be selected according to different antenna configurations and channel conditions. This property suppresses multi-antenna interference within the zero-correlation zone and maintains uniform transmitted energy.
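The complementary property stated in the Objective (the sums of aperiodic autocorrelations vanish at all nonzero shifts) can be checked numerically in the one-dimensional case. The sketch below is only an illustration with a textbook binary Golay pair of length 4; it does not reproduce the paper's 2D constructions, and the helper names are assumptions.

```python
def aperiodic_autocorr(seq, shift):
    """Aperiodic autocorrelation of a real-valued sequence at a non-negative shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_golay_pair(a, b):
    """True if the autocorrelation sums of a and b are zero at every nonzero shift."""
    return all(
        aperiodic_autocorr(a, u) + aperiodic_autocorr(b, u) == 0
        for u in range(1, len(a))
    )


# A well-known binary Golay complementary pair of length 4.
golay_a = [1, 1, 1, -1]
golay_b = [1, 1, -1, 1]

# Autocorrelation sums at shifts 0..3; only the zero-shift term should survive.
autocorr_sums = [
    aperiodic_autocorr(golay_a, u) + aperiodic_autocorr(golay_b, u)
    for u in range(4)
]
```

For this pair the individual autocorrelations at shifts 1 and 3 are nonzero but cancel in the sum, which is exactly the behaviour a Golay-ZCZ set combines with zero periodic correlation inside the ZCZ.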
PSAQNet: A Perceptual Structure Adaptive Quality Network for Authentic Distortion Oriented No-reference Image Quality Assessment
JIA Huizhen, ZHAO Yuxuan, FU Peng, WANG Tonghan
Available online  , doi: 10.11999/JEIT251220
Abstract:
  Objective  No-Reference Image Quality Assessment (NR-IQA) is critical for practical imaging systems when pristine reference images are unavailable. However, many existing methods face three major challenges: limited robustness under complex distortions, weak generalization when distortion distributions shift (e.g., from synthetic to real-world settings), and insufficient modeling of geometric or structural degradations such as spatially varying blur, misalignment, and texture-structure coupling. These limitations cause models to rely excessively on dataset-specific statistics and reduce their effectiveness when applied to diverse scenes with mixed degradations. To address these issues, the Perceptual Structure Adaptive Quality Network (PSAQNet) is proposed to improve the accuracy and adaptability of NR-IQA under complex distortion conditions.  Methods  PSAQNet is designed as a unified CNN-Transformer framework that preserves hierarchical perceptual cues and supports global context reasoning. Instead of relying on late-stage pooling, distortion evidence is progressively enhanced throughout the network. The architecture contains several key components. The Advanced Distortion Enhanced Module (ADEM) operates on multi-scale features extracted from a pre-trained backbone. It adopts multi-branch gating and a distortion-aware adapter to emphasize degradation-related signals and reduce interference from dominant image content. This mechanism dynamically selects feature branches that correspond to perceptual degradation patterns, which is beneficial for spatially non-uniform or mixed distortions. To model geometric degradations, PSAQNet integrates Spatial-Guided Convolution (SGC) and Channel-Aware Adaptive Kernel convolution (CA_AK). SGC improves spatial sensitivity by guiding convolutional responses with structure-aware cues and focusing on regions where geometric distortions are prominent. 
CA_AK further improves geometric modeling by adaptively adjusting receptive behavior and recalibrating channels to preserve distortion-sensitive components. Additionally, PSAQNet incorporates efficient feature fusion strategies. Group Convolutional Block Attention Module (GroupCBAM) enables lightweight attention-based fusion of multi-level CNN features, whereas AttInjector selectively injects local distortion cues into global Transformer representations. This design allows global semantic reasoning to be guided by localized degradation evidence without introducing redundancy or instability.  Results and Discussions  Extensive experiments on six benchmark datasets containing both synthetic and real-world distortions demonstrate that PSAQNet achieves strong performance and stable agreement with human subjective judgments. The proposed method outperforms several recent approaches, particularly on real-world distortion datasets. These results indicate that PSAQNet effectively enhances distortion evidence, models geometric degradation, and integrates local distortion cues with global semantic representations. Such capabilities improve robustness under distribution shifts and reduce reliance on narrow distortion priors. Ablation studies confirm the contribution of each module. ADEM increases distortion saliency, SGC and CA_AK improve sensitivity to geometric degradations, and GroupCBAM and AttInjector strengthen the interaction between local and global features. Cross-dataset evaluations further demonstrate the generalization capability of PSAQNet across different content categories and distortion types. Scalability experiments also show that the framework benefits from stronger pretrained backbones without compromising its modular design.  Conclusions  PSAQNet addresses several key limitations in NR-IQA by integrating local distortion enhancement, geometric-aware feature modeling, and global semantic fusion within a unified framework. 
The modular architecture improves robustness and generalization across diverse distortion conditions and supports practical deployment in real-world scenarios. Future work will explore vision–language pre-training to improve cross-scene adaptability.
A Lightweight and High-Reliability Challenge Generation Strategy for APUF
LAN Guohao, ZHANG Hui, DUO Bin, WANG Zibin, ZHOU Rang, LI Dongfen
Available online  , doi: 10.11999/JEIT251073
Abstract:
  Objective  The Arbiter Physical Unclonable Function (APUF) is a lightweight security primitive that has been widely adopted in identity authentication and key generation for resource-constrained devices. However, its response consistency is highly sensitive to environmental perturbations, leading to inconsistent responses for the same challenge under different conditions, severely undermining the reliability of APUF-based security systems. Existing reliability improvement schemes for APUF, which mainly rely on hardware modification or challenge screening, generally suffer from high resource overhead and low efficiency. To address the limitations of these existing solutions, a Delay-Constrained Challenge Generation Strategy (DCGS) is proposed to enhance APUF reliability without extra hardware overhead or screening-related inefficiencies.  Methods  The core of DCGS lies in modeling APUF path delay properties and constructing challenges with constrained delay differences to ensure response stability. First, a logistic regression (LR) model is established to characterize the relationship between APUF challenge bits and path delays. From the trained LR model, a delay weight vector is derived to quantify the contribution of each challenge bit to the overall path delay. Second, a two-stage challenge generation mechanism is designed to integrate delay constraint control: The first stage is prefix bit initialization, which generates distinct prefix sequences to establish a stable delay baseline for subsequent bit extension. The second stage is bit-wise extension, where each remaining challenge bit is dynamically determined based on the delay weight vector. During this extension process, the cumulative delay difference of the challenge is monitored in real time, ensuring it stays within a preset threshold range. 
Unlike traditional screening methods that post-process candidate challenges, DCGS directly generates stable challenges by design, eliminating the need for candidate pools and improving generation efficiency.  Results and Discussions  Performance evaluations of DCGS are conducted under varying noise intensities. At a noise intensity of 0.3 (maximum practical level), the reliability of DCGS-generated challenges remains at 100% (Fig. 2). In terms of generation efficiency, DCGS consumes only 0.017 seconds to generate 10,000 challenges (Table 4). For response uniformity, DCGS achieves a value of 50.02% (Table 4). For uniqueness, it reaches 50.46% (Table 4). These two key metrics are both close to the ideal theoretical value of 50%. Security analysis shows that the average bit entropy of DCGS-generated challenges is 0.9807 (Fig. 3), and the conditional entropy is 0.9878, only 0.0023 lower than that of random challenges (0.9901).  Conclusions  This paper proposes a delay-constrained challenge generation strategy for APUF, aiming to address the problems of inconsistent responses, low generation efficiency, and high hardware resource consumption of traditional schemes in high-noise environments. By modeling the path delay characteristics of APUF using LR and integrating a prefix initialization mechanism with a bit-wise extension mechanism, the strategy ensures that the generated challenges meet the preset delay difference threshold range. Through this method, DCGS achieves high reliability, high efficiency, and good response uniformity without increasing hardware overhead. Experimental results show that DCGS can effectively enhance the reliability of APUF in complex environments, providing strong technical support for secure applications in resource-constrained devices.
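The two-stage idea in the Methods above (a prefix, then bit-wise extension under a delay constraint) can be sketched with a toy model. This is not the authors' DCGS: it assumes a simplified linear delay model in which challenge bit 0 adds `+w` and bit 1 adds `-w`, a greedy extension rule that pushes the cumulative delay difference away from zero, and a hand-picked weight vector standing in for the one derived from the trained LR model. All names and values are hypothetical.

```python
import itertools


def delay_difference(weights, challenge):
    """Assumed linear model: bit 0 contributes +w, bit 1 contributes -w."""
    return sum(w * (1 - 2 * c) for w, c in zip(weights, challenge))


def generate_challenge(weights, prefix, margin):
    """Extend a prefix bit by bit so the cumulative |delay difference| never shrinks.

    Returns the full challenge, or None if the final margin is not reached.
    """
    challenge = list(prefix)
    delta = delay_difference(weights, challenge)  # zip stops at the prefix length
    for w in weights[len(prefix):]:
        sign = 1 if delta >= 0 else -1
        # Choose the bit whose contribution has the same sign as the running total.
        bit = 0 if w * sign >= 0 else 1
        challenge.append(bit)
        delta += w * (1 - 2 * bit)
    return challenge if abs(delta) >= margin else None


def response(weights, challenge, noise=0.0):
    """Arbiter decision: 1 if the (noisy) delay difference is non-negative."""
    return 1 if delay_difference(weights, challenge) + noise >= 0 else 0


# Hypothetical 8-stage weight vector; 3-bit prefixes give 8 distinct challenges.
toy_weights = [0.9, -0.4, 0.7, -1.1, 0.3, 0.8, -0.6, 0.5]
toy_challenges = [
    generate_challenge(toy_weights, prefix, margin=1.0)
    for prefix in itertools.product([0, 1], repeat=3)
]
```

Because every extension bit moves the delay difference further from zero, each generated challenge ends with a margin larger than the assumed noise bound, so its response does not flip under that noise. The price in this toy version is that the response is fixed by the prefix alone, which is one reason a real scheme needs a more careful constraint than this greedy rule.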
Review of Non-invasive Brain–Computer Interfaces for Continuous Motor Control
XU Minpeng, JIA Leyi, ZHOU Xiaoyu, CHEN Enze, WANG Junyang, XIAO Xiaolin, MING Dong
Available online  , doi: 10.11999/JEIT260011
Abstract:
  Significance   Continuous motor control is a fundamental capability for brain–computer interface (BCI) systems aiming at natural and efficient interaction with external devices. Compared with discrete command-based control, continuous control enables real-time and smooth regulation of motion parameters such as position, velocity, and trajectory, which is essential for applications including assistive mobility, neurorehabilitation, robotic manipulation, and immersive human–machine interaction. Although invasive BCIs have demonstrated high-performance continuous control benefiting from high-quality neural recordings, their reliance on surgical implantation restricts long-term use and large-scale deployment. Therefore, a systematic review of non-invasive continuous motor control BCI technologies is necessary to clarify research progress, methodological characteristics, and remaining challenges.  Progress   This review summarizes advances in non-invasive continuous motor control BCIs from four closely related aspects: control paradigms, decoding algorithms, application scenarios, and performance evaluation. At the paradigm level, motor imagery, steady-state visual evoked potentials, event-related potentials, and hybrid paradigms have been investigated to support continuous control through sustained intention modulation, dynamic stimulus encoding, and hierarchical or shared-control strategies. Regarding decoding algorithms, two major frameworks are identified: motion parameter mapping methods and motion parameter regression methods. Motion parameter mapping methods achieve continuous output by temporally integrating discrete classification results or mapping them to velocity or state variables, whereas motion parameter regression methods directly establish relationships between EEG features and continuous kinematic parameters.
In recent studies, nonlinear models and deep learning approaches have been increasingly incorporated to improve robustness under non-stationary EEG conditions. At the application level, non-invasive continuous control has evolved from two-dimensional cursor tasks to more practical scenarios such as wheelchair navigation, robotic arm manipulation, unmanned systems, and virtual or augmented reality environments. In addition, existing studies evaluate continuous control performance using both objective metrics (e.g., trajectory error, task success rate, and information transfer rate) and subjective measures (e.g., workload and user experience), reflecting diverse experimental designs and control objectives.  Conclusions  Overall, existing studies demonstrate that non-invasive BCIs are capable of supporting continuous motor control; however, current research remains at a stage where diverse methods coexist without a unified framework. At the paradigm level, different approaches vary in their ability to reliably elicit and sustain continuous motor intentions. In terms of decoding algorithms, both motion parameter mapping and regression methods face limitations in robustness, generalization, and long-term stability due to the non-stationary nature of EEG signals. At the application level, many studies are still constrained to specific tasks and controlled environments, and the transferability of continuous control strategies to complex real-world scenarios requires further validation. Moreover, the lack of standardized evaluation protocols hinders direct comparison and systematic optimization across studies.  Prospects   Future research should focus on improving the stability and reliability of continuous control paradigms, enhancing decoding robustness under realistic EEG conditions, and strengthening the alignment between control strategies and application requirements. 
Establishing unified evaluation frameworks that integrate both objective and subjective indicators will be critical for methodological convergence and fair comparison. With continued advances, non-invasive continuous motor control BCIs are expected to play an increasingly important role in assistive technologies, rehabilitation systems, and advanced human–machine interaction.
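The "motion parameter mapping" framework described in the Progress section (discrete classification results mapped to velocities and integrated over time) can be illustrated with a minimal sketch. The class labels, velocity table, time step, and speed below are hypothetical choices for illustration and are not taken from any specific study in the review.

```python
# Hypothetical mapping from decoded class labels to unit velocity directions.
VELOCITY_MAP = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
    "rest": (0.0, 0.0),
}


def integrate_decisions(labels, dt, speed, start=(0.0, 0.0)):
    """Turn a stream of discrete decoder outputs into a 2D cursor trajectory.

    Each label selects a velocity direction; position is its running integral.
    """
    x, y = start
    trajectory = [(x, y)]
    for label in labels:
        vx, vy = VELOCITY_MAP[label]
        x += speed * vx * dt
        y += speed * vy * dt
        trajectory.append((x, y))
    return trajectory


# Four consecutive decoder outputs, one every 0.5 s, at 2 units per second.
toy_trajectory = integrate_decisions(["right", "right", "up", "rest"], dt=0.5, speed=2.0)
```

A regression-based decoder would instead predict `(vx, vy)` directly from EEG features at each step; the integration loop would stay the same, and only the source of the velocity would change.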
Rotatable-Antenna-Aided Near-Field Wideband Integrated Sensing and Communication Systems: Hybrid Beamforming Design
XU Hongbo, MO Minghui, XIN Wei, WANG Shuli, WANG Ji, LI Xingwang, ZHENG Le
Available online  , doi: 10.11999/JEIT260023
Abstract:
  Objective  With the rapid evolution of sixth-generation (6G) mobile communication systems, integrated sensing and communication (ISAC) has emerged as a key enabling paradigm for simultaneously supporting high-precision sensing and high-rate data transmission under limited spectrum resources. In near-field wideband scenarios, however, ISAC systems face two fundamental challenges: pronounced near-field effects and wideband beam splitting. These impairments significantly degrade both communication throughput and sensing reliability, especially when conventional fixed-orientation antenna arrays and phase-shifter-based beamforming architectures are employed. Due to their limited spatial adaptability and inherent frequency-independent characteristics, traditional architectures are unable to fully exploit the spatial–frequency degrees of freedom available in near-field wideband channels. Therefore, it is important to develop a new antenna architecture and beamforming framework that can effectively mitigate beam splitting, enhance energy focusing capability, and maintain robustness across wide bandwidths. To address these challenges, a rotatable-antenna-assisted near-field wideband ISAC architecture is investigated, aiming to improve system sum-rate performance while satisfying sensing-related constraints.  Methods  A novel near-field wideband ISAC system architecture assisted by rotatable antennas (RAs) is proposed. By introducing mechanically or electronically adjustable antenna boresight directions, additional angular degrees of freedom are provided at the antenna element level, enabling flexible spatial coverage and adaptive energy focusing.
Furthermore, a true-time-delay (TTD)-based hybrid beamforming architecture is adopted, which provides frequency-dependent phase shifts in the frequency domain to compensate for the frequency-independent characteristics of conventional phase shifters, thereby ensuring consistent beam focusing across all subcarriers and effectively suppressing wideband beam splitting. The channel is described by a spherical-wave near-field model that explicitly incorporates propagation distance, angular information, and the orientation gain of rotatable antennas, so that the array response depends jointly on angle and distance and the limitations of the planar-wave assumption are overcome. Based on this model, a joint optimization problem is formulated to maximize the system sum rate, while simultaneously considering transmit power constraints, sensing power thresholds, and physical limitations on antenna rotation angles. To address the formulated non-convex optimization problem, a penalty-based fully digital approximation (PBFDA) algorithm is developed. In each iteration, the orientations of the rotatable antennas are first optimized using a particle swarm optimization (PSO) method to enhance the weighted channel gain. Then, with the antenna orientations fixed, a reduced-dimensional formulation combined with successive convex approximation (SCA) is employed to solve the fully digital beamforming problem. Finally, a block coordinate descent (BCD) algorithm based on manifold optimization is adopted to jointly optimize the analog beamformer, digital beamformer, and TTD units, thereby progressively approximating the fully digital solution, with the three components iteratively updated until convergence is achieved (Algorithms 1–4).  Results and Discussions  Simulation results demonstrate the effectiveness and superiority of the proposed RA-assisted near-field wideband ISAC framework.
The convergence behavior of the proposed PBFDA algorithm indicates that the objective function monotonically increases and stabilizes within a limited number of iterations, confirming its numerical stability and efficiency (Fig. 2). Compared with conventional fixed-antenna architectures, the proposed RA-based scheme achieves a substantial improvement in system sum rate under the same transmit power constraints (Fig. 3). Furthermore, the impact of system bandwidth on spectral efficiency is investigated. As the system bandwidth increases, TTD-based hybrid beamforming schemes experience weakened frequency-dependent compensation capability due to the limited number of TTD units and the constrained maximum delay, which exacerbates wideband beam splitting and leads to a degradation in spectral efficiency. In contrast, the optimal fully digital beamforming approach enables accurate control over each subcarrier, so its spectral efficiency remains essentially constant as the bandwidth varies (Fig. 4). The trade-off between communication performance and sensing power is also evaluated. As the sensing power threshold increases, the achievable sum rate decreases for all schemes, while the proposed method consistently outperforms the others (Fig. 5). The effects of antenna array size, antenna directivity factor, and maximum rotation angle are further investigated. Increasing the number of antennas improves spectral efficiency due to higher array gain, with the RA-based system consistently outperforming benchmark schemes (Fig. 6). As the antenna directivity factor increases, the RA system leverages adaptive orientation to focus energy toward desired users, achieving continuous performance gains, whereas fixed-orientation and isotropic schemes degrade (Fig. 7). Moreover, enlarging the allowable rotation range provides greater spatial alignment flexibility and further improves system performance (Fig. 8).
Overall, the results demonstrate that the proposed architecture enhances near-field energy focusing and achieves performance close to fully digital beamforming with lower hardware complexity.  Conclusions  A rotatable-antenna-assisted near-field wideband ISAC system with a TTD-based fully connected hybrid beamforming architecture is investigated. By jointly exploiting antenna rotation and true time delay, the proposed framework effectively mitigates near-field effects and wideband beam splitting. A penalty-based fully digital approximation (PBFDA) optimization algorithm is developed to address the resulting highly non-convex problem. Numerical results demonstrate that the proposed scheme significantly improves system sum rate under sensing constraints and approaches the performance of fully digital beamforming, validating its effectiveness for near-field wideband ISAC applications.
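The beam-splitting effect that motivates the TTD units can be reproduced with a small numerical sketch. It is a deliberate simplification: a far-field uniform linear array with half-wavelength spacing, an ideal per-element delay line, and assumed values for the carrier frequency, array size, and steering angle. The paper's setting is near-field with a limited number of TTD units, which this sketch does not model.

```python
import cmath
import math

SPEED_OF_LIGHT = 3e8  # m/s


def gain_toward_target(num_antennas, fc, f, theta, use_ttd):
    """Normalized array gain at frequency f toward angle theta (far-field ULA).

    Phase shifters are tuned once at the centre frequency fc, so their phase is
    frequency-independent; an ideal true time delay applies the phase 2*pi*f*tau.
    """
    spacing = SPEED_OF_LIGHT / (2 * fc)  # half wavelength at the centre frequency
    total = 0j
    for n in range(num_antennas):
        tau = n * spacing * math.sin(theta) / SPEED_OF_LIGHT  # delay offset of element n
        channel_phase = -2 * math.pi * f * tau
        compensating_freq = f if use_ttd else fc
        weight_phase = 2 * math.pi * compensating_freq * tau
        total += cmath.exp(1j * (channel_phase + weight_phase))
    return abs(total) / num_antennas


# Assumed example values: 64 antennas, 28 GHz carrier, target at 60 degrees.
FC = 28e9
THETA = math.radians(60)
gain_ps_edge = gain_toward_target(64, FC, 1.1 * FC, THETA, use_ttd=False)
gain_ttd_edge = gain_toward_target(64, FC, 1.1 * FC, THETA, use_ttd=True)
```

With phase shifters alone, the gain toward the target collapses at a subcarrier 10% above the carrier, whereas the ideal delay line keeps full gain at every frequency. This is the frequency-dependent compensation that the hybrid architecture approximates with a limited number of TTD units.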
Vision-Guided and Force-Controlled Method for Robotic Screw Assembly
ZHANG Chunyun, MENG Xintong, TAO Tao, ZHOU Huaidong
Available online  , doi: 10.11999/JEIT251193
Abstract:
  Objective  With the rapid development of intelligent manufacturing and industrial automation, robots are increasingly applied to high-precision assembly tasks, especially screw assembly. However, current systems still face several challenges. The pose of assembly objects is often uncertain, which makes initial localization difficult. Small features such as threaded holes are blurred and difficult to identify accurately. Conventional vision-based open-loop control may also cause assembly deviation or jamming. This study proposes a vision–force cooperative method for robotic screw assembly. The method establishes a closed-loop assembly system that covers coarse positioning and fine alignment. A semantic-enhanced 6D pose estimation algorithm and a lightweight hole detection model are used to improve perception accuracy. Force-feedback control then adjusts the end-effector posture dynamically. This approach improves the accuracy and stability of screw assembly.  Methods  The proposed screw-assembly method is based on a vision–force cooperative strategy that forms a closed-loop process. In the visual perception stage, a semantic-enhanced 6D pose estimation algorithm addresses disturbances and pose uncertainty in complex industrial environments. During initial pose estimation, Grounding DINO and SAM2 generate pixel-level masks that provide semantic priors for the FoundationPose module. In the continuous tracking stage, semantic cues from Grounding DINO support translational correction. To detect small threaded holes, an improved lightweight hole detection algorithm based on NanoDet is designed. It uses MobileNetV3 as the backbone and adds a CircleRefine module in the detection head to estimate hole centers precisely. In the assembly positioning stage, a hierarchical vision-guided strategy is used. The global camera performs coarse positioning for overall guidance, while the hand–eye camera conducts local correction using hole detection results. 
In the closed-loop assembly stage, force-feedback control adjusts the posture to achieve accurate alignment between the screw and the threaded hole.  Results and Discussions  The method is validated experimentally in robotic screw assembly scenarios. The improved 6D pose estimation algorithm reduces the average position error by 18% and the orientation error by 11.7% compared with the baseline (Table 1). The tracking success rate in dynamic sequences increases from 72% to 85% (Table 2). For threaded hole detection, the lightweight NanoDet-based algorithm is evaluated on a dataset collected from assembly environments. It achieves 98.3% precision, 99.2% recall, and 98.7% mAP (Table 3). The model size is 11.7 MB and the computational cost is 2.9 GFLOPs, which are both lower than most benchmark models while maintaining high accuracy. A circular branch is introduced to fit hole edges (Fig. 8), providing accurate center predictions for visual guidance. Under different inclination angles (Fig. 10), the assembly success rate remains above 91.6% (Table 4). For screws of different sizes (M4, M6, and M8), the success rate remains above 90% (Table 5). Under small external disturbances (Fig. 12), the success rates reach 93.3%, 90%, and 83.3% for translational, rotational, and mixed disturbances, respectively (Table 6). Force-feedback comparison experiments show that the success rate is 66.7% under visual guidance alone. With force-feedback control, the rate increases to 96.7% (Table 7). The system maintains stable performance throughout complete screw-assembly cycles and achieves an average cycle time of 9.53 s (Table 8), meeting industrial assembly requirements.  Conclusions  This study presents a vision–force cooperative method that addresses key challenges in robotic screw assembly. The approach enhances target localization accuracy through a semantic-enhanced 6D pose estimation algorithm and a lightweight threaded hole detection network. 
The integration of hierarchical vision guidance and force-feedback control enables precise alignment between screws and threaded holes. Experimental results show that the method ensures reliable assembly under varied conditions, providing a practical solution for intelligent robotic assembly. Future work will focus on adaptive force control, multimodal perception fusion, and intelligent task planning to further improve generalization and self-optimization in complex industrial environments.
Crosstalk-Free Frequency-Spin Multiplexed Multifunctional Device Realized by Nested Meta-Atoms
ZHANG Ming, DONG Peng, TAO En, YANG Lin, HAN Qi, HE Yuhang, HOU Weimin, LI Kang
Available online  , doi: 10.11999/JEIT251202
Abstract:
  Objective  To address high fabrication costs and signal crosstalk in existing multidimensional multiplexed metasurfaces, a crosstalk-free, frequency-spin multiplexed single-layer metasurface based on nested bi-spectral meta-atoms is proposed. Two C-shaped split-ring resonators are physically superimposed to target the Ku band (12.5 GHz) and the K band (22 GHz). This configuration enables four fully independent information channels, defined by two frequencies and two spin states, without spatial division or multilayer stacking. The objective is to demonstrate independent, high-performance vortex beam generation and holographic imaging, providing a simplified and cost-effective solution for advanced 6G communication and sensing systems.  Methods  A reflective metal–dielectric–metal metasurface architecture is adopted, in which each unit cell integrates an Outer C-Shaped Split-Ring Resonator (OCSRR) and an Inner C-Shaped Split-Ring Resonator (ICSRR). Parameter sweeps performed using CST Microwave Studio are used to select structures that provide high cross-polarization conversion at the target frequencies while maintaining negligible responses in non-target bands. Independent spin multiplexing is achieved through the combined use of transmission phase and geometric phase, controlled by resonator rotation. Two prototypes are fabricated using printed circuit board technology. MS1 is designed for focused vortex beam generation with topological charges l = +1, +2, +3, and +4, whereas MS2 is designed for holographic imaging of the letters “H”, “B”, “K”, and “D”. Device performance is validated by near-field scanning measurements under oblique incidence using a vector network analyzer.  Results and Discussions  Simulation and experimental results confirm strong frequency selectivity and effective spin decoupling enabled by the nested meta-atom design. 
The OCSRR and ICSRR dominate the electromagnetic responses at 12.5 GHz and 22 GHz, respectively, and exhibit linear superposition behavior with minimal crosstalk. MS1 generates four focused vortex beams with clearly separated topological charges, achieving an average mode purity of 88.25%. MS2 reconstructs four independent and well-defined holographic images with high channel isolation. The close agreement between measured and simulated results demonstrates the robustness of the device and validates the effectiveness of the crosstalk-free design strategy under practical illumination conditions.  Conclusions  A reliable approach for realizing crosstalk-free frequency-spin multiplexed metasurfaces using nested meta-atoms is demonstrated. Simultaneous and independent manipulation of electromagnetic waves across four channels is achieved on a single metasurface layer, substantially reducing design complexity and fabrication cost. The successful demonstration of multi-channel vortex beam generation and holographic imaging indicates strong potential for integrated multifunctional applications in next-generation wireless communication and optical systems.
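The combined use of transmission phase and geometric phase described in the Methods can be illustrated with the standard spin-decoupling relation. This is a sketch under assumptions, not the authors' design equations. The symbols are introduced here for illustration only: \varphi_{L} and \varphi_{R} are the target phases of the two spin channels at one frequency, \varphi_{t} is the transmission phase set by the resonator dimensions, and \theta is the resonator rotation angle. The sign of the geometric term depends on the handedness convention.

```latex
\begin{aligned}
\varphi_{L} &= \varphi_{t} + 2\theta, &\qquad \varphi_{R} &= \varphi_{t} - 2\theta,\\
\varphi_{t} &= \tfrac{1}{2}\left(\varphi_{L} + \varphi_{R}\right), &\qquad \theta &= \tfrac{1}{4}\left(\varphi_{L} - \varphi_{R}\right).
\end{aligned}
```

Under this relation, any pair of target phases for the two spins maps to one transmission phase and one rotation angle per resonator, which is why the two spin channels at each frequency can be encoded independently.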
Routing and Resource Scheduling Algorithm Driven by Mixture of Experts in Large-scale Heterogeneous Local Power Communication Network
JING Chuanfang, ZHU Xiaorong
Available online  , doi: 10.11999/JEIT251176
Abstract:
  Objective  Emerging power services, such as distributed energy consumption, place stringent performance requirements on Large-Scale Heterogeneous Local Power Communication Networks (LHLPCNs). Limited communication resources and increasing service demands make it challenging to provide on-demand services and improve network capacity while ensuring Quality of Service (QoS). Conventional routing and resource scheduling algorithms based on optimization or heuristics depend on precise mathematical models and parameters, and their computational cost increases as network size and variables grow. These limitations reduce their adaptability to expanding power application scenarios. Advances in Mixture-of-Experts (MoE) frameworks offer a promising direction because they reduce the need to train task-specific models by using an ensemble of specialized AI experts. Motivated by these challenges, this study proposes an MoE-based routing and resource scheduling algorithm (RASMoE) for LHLPCNs integrating High-Power Line Carrier (HPLC) and Radio Frequency (RF). RASMoE is designed to meet personalized QoS requirements and support more power services within limited resources.  Methods  An optimization problem that minimizes the difference between QoS supply and demand in LHLPCNs is formulated as a 0–1 integer linear programming model considering multimodal links, channels, and modulation methods. To solve this NP-hard problem, a new MoE framework comprising expert networks and gated networks is designed. The framework supports personalized service requirements in terms of data rate, delay, and reliability, while improving convergence. The expert networks include shared and QoS-specific experts that generate optimal next hops and compute allocation strategies for links, channels, and modulation modes between node pairs. The gated networks dynamically combine and reuse these experts to support known and unforeseen service types. 
Extensive comparative experiments are conducted, and RASMoE shows improved resource utilization, reduced delay, and higher reliability relative to multiple baselines.  Results and Discussions  The performance supply-demand differences of five algorithms under varying service numbers are compared (Fig. 3). RASMoE consistently achieves the smallest differences across scenarios due to its gating network, which combines QoS-specific experts to align resource allocation with service requirements. Because control and compute-intensive services have strict delay requirements, their average End-to-End (E2E) latency under different service numbers is evaluated (Fig. 4). The proposed algorithm achieves the lowest average E2E latency because its GAT-enhanced expert networks extract node load states and interact with the network environment in real time through a Multi-Armed Bandit (MAB) mechanism. This supports adaptive allocation strategies. The average reliability of E2E paths for different numbers of control, compute-intensive, and acquisition services is also illustrated (Fig. 5).  Conclusions  This study proposes a MoE-driven routing and resource scheduling algorithm for LHLPCNs. The framework integrates expert networks and a gating network. The expert networks include GAT-based shared experts for E2E path selection and MAB-based QoS-specific experts for adaptive allocation of links, channels, and modulation schemes according to QoS demands and link states. The gated networks orchestrate and reuse these experts to support services with single or multiple QoS requirements, including previously unseen service types. Theoretical analysis shows that the method improves resource utilization in LHLPCNs, with notable advantages in multi-service scenarios characterized by diverse QoS demands. Future work will examine integrating the MoE framework with domain-specific models, including power load forecasting and predictive analytics, to enhance the use of renewable energy sources.
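The way a gating network can combine shared and QoS-specific experts may be sketched as follows. This is a minimal illustration assuming a softmax gate over per-expert next-hop scores. The function names, the score values, and the gate logits are hypothetical and are not taken from RASMoE.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_next_hop(expert_scores, gate_logits):
    """Mix per-expert scores for each candidate next hop with softmax gate weights.

    expert_scores: one list of candidate scores per expert (hypothetical values).
    gate_logits:   one logit per expert, e.g. derived from the service's QoS demand.
    Returns (index of best candidate, gate weights, mixed scores).
    """
    gate = softmax(gate_logits)
    num_candidates = len(expert_scores[0])
    mixed = [
        sum(g * scores[j] for g, scores in zip(gate, expert_scores))
        for j in range(num_candidates)
    ]
    best = max(range(num_candidates), key=mixed.__getitem__)
    return best, gate, mixed

# Hypothetical experts: data rate, delay, reliability; three candidate next hops.
scores = [
    [0.9, 0.2, 0.1],  # rate expert prefers hop 0
    [0.1, 0.8, 0.3],  # delay expert prefers hop 1
    [0.2, 0.3, 0.9],  # reliability expert prefers hop 2
]
delay_critical_hop, delay_gate, _ = gated_next_hop(scores, [0.0, 3.0, 0.0])
rate_critical_hop, _, _ = gated_next_hop(scores, [3.0, 0.0, 0.0])
```

With the gate weighted toward the delay expert, the mixed score favors the hop that expert prefers. A service with several QoS requirements would simply use a flatter gate.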
A Fast and Accurate Programming Strategy for Analog In-Memory Computing Validated With a Transposable RRAM Macro and 0.64% Fully-Parallel RMS Error
XIE Lifan, WEI Songtao, YAO Peng, WU Dong, TANG Jianshi, QIAN He, GAO Bin, WU Huaqiang
Available online  , doi: 10.11999/JEIT251174
Abstract:
  Objective  Non-Volatile Memory (NVM)-based Compute-in-Memory (CIM) is considered a promising candidate for next-generation artificial intelligence accelerators because of its high energy efficiency and instant wake-up capability. However, the conventional Write-and-Verify (W&V) scheme cannot satisfy the speed and precision requirements of highly parallel CIM macros. The main limitation arises from the inefficient verification stage. Cell-by-cell reading must be repeated for the entire array, which significantly increases programming time. In addition, switching from the verify state, where only one row is active, to the compute state, where all rows are active, introduces systematic errors such as reference drift and IR-drop-induced weight inaccuracy. Analog CIM macros with on-chip programming must also tolerate large and non-uniform offsets under massive parallel operation. This work proposes three techniques: (1) a Back-Propagation-Assisted Programming (BPAP) scheme that rapidly and accurately locates failing cells without full-array verification; (2) an Analog-domain Offset-Canceling Structure (AOSC) that compensates channel-wise offsets in situ; and (3) a transposable Resistive Random-Access Memory (RRAM) macro equipped with parallel Two-Channel current-domain Analog-to-Digital Converters (TC-ADC), which doubles the effective sampling rate with only 15% additional ADC area.  Methods  As shown in Fig. 2, the transposable RRAM macro contains two processing elements (PEs) and a shared backward-processing ADC (BP-ADC). Each PE includes an input loader (IL), a Digital-to-Analog Converter (DAC) array, a Bit-Line (BL) buffer and switch array, and 32 TC-ADCs. This configuration supports fully parallel forward computation. An Error Loader (EL) and a Source-Line (SL) buffer are also included to provide an error input vector for transposed matrix-vector multiplication (MVM). Fig. 3 illustrates the programming flow of the BPAP scheme. 
After AOSC calibration, a forward calculation is first executed. The differences between the expected outputs (yexp) and the measured outputs (yreal) are then computed on chip and used as inputs for the following back-propagation phase. The derivatives of the RRAM weights are calculated using several validation patterns. This training-like process adapts to the actual RRAM states and detects programming failures under the highly parallel computing condition. Weights with derivatives exceeding a predefined error threshold are selected for remapping. This approach enables accurate programming without performing cell-by-cell verification across the entire array. In the forward phase (Fig. 4a), each 2T2R cell is configured as a signed weight, and the SLs are clamped at VCM by the TC-ADCs. For each PE, a fully parallel 4b-IN/4b-W MVM operation is completed with 320 active rows of 2T2R cells, and 32 ADCs perform simultaneous conversions. In the backward phase (Fig. 4b), only the upper half of the reference voltages drives the SL buffers, and the weight is configured in 1T1R mode. Differential computation between the positive and negative 1T1R cells is performed by an external processor. Fig. 5 shows the operation of the AOSC scheme. Redundant rows in the RRAM array are programmed to compensate the analog computing offsets in situ. Offset currents are first measured by applying an all-zero input pattern to the regular weights. The redundant RRAM weights are then programmed to minimize the offset currents under a constant input voltage. During normal computation, these programmed redundancy rows receive the same input voltage to cancel the offsets. The macro supports this AOSC operation with only about 1% additional array area. Fig. 6 shows the TC-ADC architecture. A class-AB output stage, together with associated switches and capacitors, enables two-channel conversion and reduces the computation latency by half. 
This design increases the ADC area by only about 15% while achieving a 2× sampling rate.  Conclusions  Replacing the conventional W&V procedure with BPAP, together with AOSC calibration and TC-ADC acceleration, enables reliable and high-precision programming of analog RRAM-CIM macros under massive parallel operation. The measured results show 96.5% classification accuracy on MNIST and a 4.8% improvement on ImageNet. The proposed techniques are compatible with standard 2T2R and 1T1R RRAM bit cells and can be extended to larger arrays and deeper neural networks.
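The BPAP flow described in the Methods (forward pass, on-chip error, back-propagated weight derivatives, thresholding) can be sketched numerically. This is a simplified software model under stated assumptions, not the macro's implementation: inputs are random ±1 validation patterns rather than 4-bit codes, the array is ideal and noise-free, and the names `find_failing_cells`, `w_target`, and `w_actual` are hypothetical.

```python
import numpy as np

def find_failing_cells(w_target, forward, patterns, threshold):
    """Flag weights whose error derivative exceeds a threshold.

    w_target:  intended weight matrix, shape (rows, cols).
    forward:   callable returning measured outputs for patterns (P x rows).
    patterns:  validation input patterns, shape (P, rows).
    threshold: assumed error threshold on the averaged derivative.
    """
    y_exp = patterns @ w_target            # expected outputs
    y_real = forward(patterns)             # fully parallel "measured" outputs
    err = y_real - y_exp                   # error fed to the backward phase
    # Derivative of 0.5 * ||err||^2 w.r.t. each weight, averaged over patterns.
    grad = patterns.T @ err / patterns.shape[0]
    return np.argwhere(np.abs(grad) > threshold), grad

# Illustrative array with three mis-programmed cells (all values are made up).
rng = np.random.default_rng(0)
rows, cols = 32, 8
w_target = rng.integers(-3, 4, size=(rows, cols)).astype(float)
w_actual = w_target.copy()
failed = [(3, 0), (10, 5), (20, 7)]
for (r, c), delta in zip(failed, (4.0, -4.0, 4.0)):
    w_actual[r, c] += delta

patterns = rng.choice([-1.0, 1.0], size=(512, rows))
flagged, grad = find_failing_cells(
    w_target, lambda x: x @ w_actual, patterns, threshold=2.0
)
```

In this model the failing cells are located from full-array outputs alone, without reading any cell individually, which is the property the abstract attributes to BPAP.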
Design of a Narrowband Energy-Selective Protective Antenna Integrating Electromagnetic Protection and Out-of-Band Interference Suppression
GAI Longjie, XU Yanlin, WANG Sijun, LIU Peiguo, HU Ning, HE Zhengwei
Available online  , doi: 10.11999/JEIT251363
Abstract:
  Objective  With the rapid development of wireless communication technologies, the Electromagnetic (EM) environment is becoming increasingly complex. Electronic information equipment is facing growing challenges from High-Intensity Radiation Fields (HIRFs) and out-of-band interference. This trend makes the co-design of EM protection and out-of-band interference suppression in electronic information systems an urgent issue. As the front end of the radio-frequency channel, antennas provide the main path by which EM waves in free space are converted into guided waves in microwave circuits. High-power EM waves can couple into a system through an antenna and cause EM damage. In single-frequency applications, if an antenna does not exhibit narrowband characteristics, out-of-band interference signals may also enter the system through the antenna and disrupt normal operation. A narrowband energy-selective protective antenna should therefore be developed to provide both out-of-band interference suppression and in-band EM protection against strong EM threats, thereby improving the operational stability and environmental adaptability of electronic information equipment in complex EM environments.  Methods  A coaxial-fed microstrip patch antenna is designed, and its structure is optimized through simulation for operation at 915 MHz. The antenna structure is designed to provide both narrowband behavior and EM protection, thereby achieving integrated EM protection and out-of-band interference suppression. A high dielectric constant is used to support both antenna miniaturization and narrowband operation. Accordingly, a TP-2 substrate with a dielectric constant of 20 is selected to obtain the required narrowband response. In a conventional coaxial-fed microstrip patch antenna, the probe passes directly through the dielectric substrate and connects to the radiating patch, which leaves insufficient space for the integration of a protective structure. 
To solve this problem, a layered-substrate design with a central hollow cavity is adopted. This configuration forms a layered cavity protective structure and enables the antenna itself to exhibit energy-selective protection characteristics.  Results and Discussions  To verify the performance of the proposed antenna, physical fabrication and experimental measurements are carried out (Fig. 14). The measured center frequency is 928.5 MHz, and the operating bandwidth is 927.0-930.0 MHz. Although the measured center frequency is shifted by 12.8 MHz from the simulated design value, the antenna still exhibits favorable narrowband characteristics (Fig. 15). The measured radiation pattern agrees well with the simulated result. In the Phi = 0 deg plane, a stable omnidirectional radiation pattern is observed, and the measured maximum gain reaches 2.5 dBi (Figs 11 and 16). The Shielding Effectiveness (SE) is measured by a high-power injection method. As the injected power increases, the radiated power increases linearly. When the injected power reaches 22 dBm, the increase in radiated power begins to saturate, which indicates that the diodes in the protective structure start to conduct and that the energy-selective mechanism is activated. As the injected power increases further, the SE rises gradually. When the injected power reaches 48 dBm, the radiated power rises sharply to the level of the original linear radiation curve, and the SE drops abruptly, which indicates diode breakdown and failure of the protective structure. In summary, the activation threshold of the protection function is 26 dBm, and the device failure threshold is 48 dBm. Within this range, the maximum SE reaches 26 dB (Fig. 18).  Conclusions  Based on a coaxial-fed microstrip patch antenna, a narrowband energy-selective protective antenna with integrated EM protection and out-of-band interference suppression is designed and demonstrated. 
The complete process is covered, including theoretical analysis, structural simulation and optimization, prototype fabrication, and experimental verification. First, Characteristic Mode Analysis (CMA) is used to examine the potential operating modes of the microstrip patch antenna. By analyzing the electric- and magnetic-field modal distributions, the impedance-matching characteristics are clarified, and the optimal coaxial feed position is determined. Next, the use of a high-permittivity substrate enables both antenna miniaturization and narrowband performance, and an Interference Suppression Capability (ISC) better than 22.1 dB is achieved. A layered-substrate structure with a central hollow cavity is then proposed, and a cavity-based protective structure integrated into the feed-probe region is established. An equivalent-circuit model is also developed to explain the operating mechanisms of the antenna in both the normal and protective states. Finally, the antenna prototype is fabricated and tested. The measured results show favorable narrowband characteristics, good agreement between the measured and simulated radiation patterns, and a measured maximum gain of 2.5 dBi. In addition, by applying the reciprocity principle and using a high-power injection method for SE testing, a maximum SE of 26 dB is obtained, which confirms the excellent EM protection capability of the antenna. Compared with existing protective antennas, the proposed structure achieves both out-of-band interference suppression and EM protection within the antenna itself. This design advances the integration of frequency-domain interference suppression and energy-domain protection. It should also be noted that the deviation between the measured and simulated center frequencies is caused in part by nonuniform substrate permittivity and fabrication tolerances, which reflects the sensitivity of narrowband antennas to structural parameters. 
In future work, a tunable mechanism may be adopted to develop a frequency-reconfigurable narrowband energy-selective protective antenna, so that frequency deviations can be compensated dynamically and the design robustness and environmental adaptability can be improved.
A Neural Network-Based Robust Direction Finding Algorithm for Mixed Circular and Non-Circular Signals Under Array Imperfections
YU Qi, YIN Jiexin, LIU Zhengwu, WANG Ding
Available online  , doi: 10.11999/JEIT250884
Abstract:
  Objective   Direction Of Arrival (DOA) estimation is affected by low Signal-to-Noise Ratios (SNR), the coexistence of Circular Signals (CSs) and Non-Circular Signals (NCSs), and multiple forms of array imperfections. Conventional subspace-based estimators exhibit model mismatch in such environments and show reduced accuracy. Although neural-network methods provide data-driven alternatives, the effective use of the distinctive statistical properties of NCSs and the maintenance of robustness against diverse array errors remain insufficiently addressed. The objective is to design a DOA estimation algorithm that operates reliably for mixed CSs and NCSs in the presence of array imperfections and provides improved estimation accuracy in challenging operating conditions.  Methods   A robust DOA estimation algorithm is proposed based on an improved Vision Transformer (ViT) model. A six-channel image-like input is first constructed by fusing features derived from the covariance matrix and pseudo-covariance matrix of the received signal. These channels include the real component, imaginary component, magnitude, phase, magnitude ratio reflecting the NCS characteristic, and the phase of the pseudo-covariance matrix. A gradient-masking mechanism is introduced to adaptively fuse core and auxiliary features. The ViT architecture is then modified: the standard patch-embedding module is replaced with a convolutional layer to extract local information, and a dual-class-token attention mechanism, placed at the sequence head and tail, is designed to enhance feature representation. A standard Transformer encoder is used for deep feature learning, and DOA estimation is performed through a multi-label classification head.  Results and Discussions   Extensive simulations are carried out to assess the proposed algorithm (6C-ViT) against MUSIC, NC-MUSIC, a Convolutional Neural Network (6C-CNN), a Residual Network (6C-ResNet), and a MultiLayer Perceptron (6C-MLP). 
Performance is evaluated using Root Mean Square Error (RMSE) and angular estimation error under different operating conditions. Under single-source scenarios with low SNR and no array errors, 6C-ViT achieves near-zero RMSE across most angles and shows minor edge deviations (Fig. 2). It maintains the lowest RMSE across the SNR range from –20 dB to 15 dB (Fig. 3), indicating good generalization to unseen SNR levels. In dual-source scenarios containing mixed CSs and NCSs under array errors, 6C-ViT shows clear advantages. Its estimation errors fluctuate slightly around zero, whereas competing techniques present larger errors and pronounced instabilities, especially near array edges (Fig. 4). Its RMSE decreases steadily as SNR increases and reaches below 0.1° at high SNR, while traditional approaches saturate around 0.4° (Fig. 5). Robust behavior is further observed across different numbers of signal sources (K = 1, 2, 3) and snapshot counts (100 to 2000). 6C-ViT preserves high accuracy and stability under these variations, whereas other methods show marked degradation or instability, most evident at low snapshot counts or with multiple sources (Fig. 6). When evaluated using unknown modulation types, including UQPSK with a non-circularity rate of 0.6 and 64QAM, under array errors, 6C-ViT continues to produce the lowest RMSE across most angles (Fig. 7), demonstrating strong generalization capability. Ablation studies (Fig. 8) confirm the contributions of the six-channel input, the gradient masking module, the convolutional embedding, and the dual-class-token mechanism. The complete configuration yields the highest accuracy and the most stable performance.  Conclusions   Strong robustness is demonstrated in complex scenarios that contain mixed CSs and NCSs, multiple array imperfections, low SNR, and closely spaced sources. 
By fusing multi-dimensional features of the received signal and using an enhanced Transformer architecture, the algorithm attains higher estimation accuracy and improved generalization across different signal types, error conditions, snapshot counts, and noise levels compared with subspace- and neural-network-based baselines. The method provides a reliable DOA estimation solution for demanding practical environments.
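The six-channel input described in the Methods can be sketched as below. This is a minimal illustration, not the authors' preprocessing: the sample estimators and the exact form of the magnitude-ratio channel (|C| / |R|, element-wise) are assumptions, and the function name `six_channel_input` is hypothetical.

```python
import numpy as np

def six_channel_input(X, eps=1e-12):
    """Build a (6, M, M) image-like tensor from array snapshots.

    X: complex snapshots, shape (M sensors, N snapshots).
    Channels: Re(R), Im(R), |R|, angle(R), |C|/|R| (assumed ratio), angle(C),
    where R is the sample covariance and C the sample pseudo-covariance.
    """
    X = np.asarray(X, dtype=complex)
    n = X.shape[1]
    R = X @ X.conj().T / n   # covariance estimate, E[x x^H]
    C = X @ X.T / n          # pseudo-covariance estimate, E[x x^T]
    return np.stack([
        R.real,
        R.imag,
        np.abs(R),
        np.angle(R),
        np.abs(C) / (np.abs(R) + eps),  # near 1 for strictly non-circular sources
        np.angle(C),
    ])

# Noise-free single-source demo on a half-wavelength uniform linear array.
rng = np.random.default_rng(1)
M, N = 4, 2000
steering = np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(20.0)))[:, None]
bpsk = rng.choice([-1.0, 1.0], size=(1, N))                                 # non-circular
gauss = (rng.normal(size=(1, N)) + 1j * rng.normal(size=(1, N))) / np.sqrt(2)  # circular
channels_nc = six_channel_input(steering * bpsk)
channels_c = six_channel_input(steering * gauss)
```

In this demo the ratio channel is close to 1 for the BPSK source and close to 0 for the circular Gaussian source, which is the statistical difference the pseudo-covariance channels are meant to expose.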
Adversarial Attacks on 3D Target Recognition Driven by Gradient Adaptive Adjustment
LIU Weiquan, SHEN Xiaoying, LIU Dunqiang, SUN Yanwen, CAI Guorong, ZANG Yu, SHEN Siqi, WANG Cheng
Available online  , doi: 10.11999/JEIT251264
Abstract:
  Objective   Robust environmental perception is essential for intelligent driving systems. Light Detection And Ranging (LiDAR) provides high-resolution 3D point cloud data and serves as a core information source for object detection and recognition. However, deep learning models for 3D point cloud recognition show notable vulnerability to adversarial attacks. Small, imperceptible perturbations can cause severe classification errors and threaten system safety. Existing attack methods have improved the Attack Success Rate (ASR), but the perturbations they generate often lack concealment, create outliers, and show poor imperceptibility because they do not adequately preserve the geometric structure of point clouds. This reduces their suitability for realistic security evaluation of optoelectronic perception systems. Developing an attack method that maintains a high success rate while preserving geometric consistency and imperceptibility is therefore critical. This study addresses this need by proposing a framework that incorporates point cloud geometry into perturbation generation.  Methods   A Gradient Adaptive Adjustment (GAA) adversarial attack method for 3D point cloud recognition is proposed. The framework (Fig. 2) includes three coordinated modules. The 3D Point Cloud Salient Region Extraction module evaluates decision-level vulnerability using Shapley value analysis to identify and rank point subsets with the strongest influence on classifier output. Perturbations are then concentrated in these sensitive regions. A curvature-weighted gradient mechanism integrates local geometric priors. For each point in the salient region, a local covariance matrix is computed from its k-nearest neighbors. Principal component analysis generates eigenvalues and eigenvectors, which are used to compute a curvature measure. A Gaussian kernel function produces curvature-dependent weights that are applied to backpropagated gradients. 
This suppresses perturbations in high-curvature areas and encourages them in low-curvature regions to preserve local shape morphology. A principal-curvature-direction-constrained optimization module further refines the perturbation direction. The weighted gradient is projected onto the principal curvature directions, and the projection components are fused using coefficients derived from the corresponding eigenvalues. This aligns the perturbation with natural geometric trends and avoids unnatural deformation. An adaptive optimization algorithm then minimizes a multi-objective loss balancing attack success, geometric similarity (via Chamfer distance and Hausdorff distance), and perturbation sparsity. The adversarial point cloud is iteratively updated based on the saliency map, curvature-weighted gradients, and principal direction constraints.  Results and Discussions   Experiments on ModelNet40, ShapeNetPart, and KITTI were conducted using PointNet, DGCNN, and PointConv. The GAA method showed strong performance. On ModelNet40 with PointNet, it achieved a 97.69% ASR with an average of 28 perturbed points, outperforming ten baselines such as AL-Adv (92.92% ASR, 40 points) and Kim et al. (89.38% ASR, 36 points) (Table 1). It also produced lower geometric distortion, as indicated by smaller Chamfer distance and Hausdorff distance values. Visual results (Fig. 4) show that GAA produces fewer outliers and more natural adversarial point clouds compared with methods such as AL-Adv. The method generalized well across architectures, reaching 99.78% ASR on DGCNN and 96.91% on PointConv (Table 2), with similar performance on ShapeNetPart (Table 3). Ablation experiments on the number of salient regions (K) showed consistent improvements in ASR and reduced geometric distortion as K increased from 1 to 6 (Table 4, Fig. 5), confirming the advantage of targeting multiple critical regions. Tests on the KITTI dataset demonstrated strong performance in real-world, noisy environments. 
The method maintained high ASRs, such as 99.33% on PointNet, with limited perturbations (Table 5). An ablation study on K indicated that K=4 offers an effective balance between success rate and perturbation cost for PointNet (Table 6).  Conclusions   This study presents a GAA method for adversarial attacks on 3D point cloud recognition. By combining a Shapley value-based saliency analyzer, a curvature-weighted gradient mechanism, and a principal curvature direction constraint, the method generates adversarial examples that achieve high attack success while preserving geometric consistency. Experiments show that GAA minimizes perceptual distortion and perturbs fewer points across datasets and models. The method provides a practical tool for vulnerability analysis and supports the development of more robust and secure optoelectronic perception systems for intelligent driving. Future work will examine robustness under adverse conditions and assess physical-world implications.
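The curvature-weighted gradient step described in the Methods (k-nearest-neighbor covariance, eigen-decomposition, curvature measure, Gaussian kernel) can be sketched as follows. The abstract does not give the curvature formula or kernel width, so this sketch assumes the common surface-variation proxy, smallest eigenvalue divided by the eigenvalue sum, and a kernel exp(-c^2 / (2 sigma^2)). The names `curvature_weights`, `k`, and `sigma` are hypothetical.

```python
import numpy as np

def curvature_weights(points, k=8, sigma=0.05):
    """Return (curvature, weight) per point; weights shrink as curvature grows.

    points: array of shape (n, 3).
    k:      number of nearest neighbours for the local covariance (assumed value).
    sigma:  Gaussian kernel width (assumed value).
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    # Brute-force pairwise squared distances; adequate for small clouds.
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # k nearest neighbours, skipping the first column (the point itself).
    nbr_idx = np.argsort(d2, axis=1)[:, 1:k + 1]

    curvature = np.empty(n)
    for i in range(n):
        nbrs = points[nbr_idx[i]]
        centered = nbrs - nbrs.mean(axis=0)
        cov = centered.T @ centered / k        # 3x3 local covariance
        eigvals = np.linalg.eigvalsh(cov)      # ascending eigenvalues
        total = eigvals.sum()
        curvature[i] = eigvals[0] / total if total > 0 else 0.0

    weights = np.exp(-curvature**2 / (2.0 * sigma**2))
    return curvature, weights

# Demo: a flat patch versus points on a unit sphere.
rng = np.random.default_rng(0)
flat = np.c_[rng.uniform(size=(50, 2)), np.zeros(50)]
sphere = rng.normal(size=(200, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
flat_curv, flat_w = curvature_weights(flat)
sphere_curv, sphere_w = curvature_weights(sphere)
```

Multiplying back-propagated gradients by such weights leaves flat regions essentially untouched by the scaling and damps updates where the surface bends, which matches the stated aim of preserving local shape.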
A Channel Phase Self-compensation Method for Active-Integrated Arrays
SUN Liying, LU Yunlong, XU Jun, HU Yang
Available online  , doi: 10.11999/JEIT251325
Abstract:
The seamless integration of active circuitry and antennas can effectively improve link performance and system integration. At present, active-integrated antennas are mainly designed by adjusting the antenna impedance while maintaining the desired radiation characteristics to achieve direct matching with active transistors. However, the effect of the antenna’s complex impedance on the phase response of the active channel, as well as its potential application in active-integrated phased arrays, has not been thoroughly studied. This paper proposes a channel phase self-compensation method for active-integrated arrays. For each active channel, the active transistor is directly integrated with the radiating element, where the load impedance at the transistor drain is matched to the input impedance of the antenna element. Under a constant active gain, the required complex load impedance is solved to establish an explicit mapping between the phase response of each active channel and its corresponding load impedance. According to the phase-shift requirements among array channels, appropriate load impedances are selected as the input impedances of the corresponding radiating elements. This approach applies a predefined phase distribution to each channel without using external phase-shifting structures. It can control the initial beam direction or compensate for the path difference between elements in conformal arrays. An active-integrated phased-array antenna with a preset beam direction is designed as a demonstration example to verify the effectiveness of the proposed method. The method provides an efficient design approach for next-generation active-integrated arrays.  Objective  In the traditional design approach, active circuit channels and antenna arrays are matched to 50 Ω before interconnection. This configuration occupies considerable physical space and limits system-level integration. 
In addition, insertion loss in passive matching networks and mismatch loss at the interconnections reduce overall link performance. Direct co-integration of active circuitry and antenna elements can address these limitations. However, multi-channel active-integrated antenna arrays often require one or multiple superimposed phase distributions across the channels to satisfy different application requirements, such as initial beam offset in fuze systems, wavefront compensation in conformal active phased arrays, and wide-angle beam scanning. These phase gradients are typically realized through backend phase-shifting networks. In this work, the complex impedance characteristics of the antenna are adjusted when it is directly integrated with the active circuitry. The phase response of the active-integrated channels can therefore be tuned within a certain range without using complex matching networks or additional phase shifters. This strategy reduces the complexity and performance requirements of the backend phase-shifting network. The advantages are more evident in millimeter-wave, high-frequency, and terahertz systems, where the available phase-shift range of phase shifters is limited.  Methods  Phase self-compensation of the active channels is achieved through the direct integration of the active transistor and the radiating element. In this configuration, the drain output of the transistor is directly connected to the input of the radiating element, and impedance transformation is realized within the antenna element. The proposed method includes three main steps. (1) The active transistor is first modeled as a two-port network. By evaluating the antenna element’s complex impedance as the load on different constant-gain circles, the mapping between the phase response of the active channel and the load impedance is established. The achievable phase-shift range of the active channel is then determined. 
(2) According to the required phase-shift distribution among the array channels, suitable combinations of active gain and corresponding complex load impedances are selected. These combinations are not unique. (3) The realizability of the selected impedances is examined according to the characteristics of the radiating element. The impedance values with the highest feasibility are implemented by optimizing the radiating element, which includes fine adjustment of its geometry and feed position to meet the target impedance. When the radiating element is modified, particularly for circularly polarized elements, desirable radiation characteristics must also be preserved, including good axial ratio and beam-scanning performance.  Results and Discussions  The proposed phase self-compensation mechanism enables the array to achieve initial beam pointing and compensate for path-length differences caused by special array geometries, such as conformal or curved surfaces, without using additional phase-shifting structures. Therefore, the performance requirements of the backend phase-shifting network in active phased arrays can be reduced. To verify the effectiveness of the proposed method, a 1×4 circularly polarized active-integrated linear array (Fig. 9) is designed and demonstrated. Based on channel-level impedance calculations (Fig. 6) and an analysis of the antenna-element impedance characteristics (Fig. 8), a phase gradient of 38° between adjacent channels is synthesized and applied to the circularly polarized active-integrated array. Without degrading the circular polarization performance and without external phase-shifting circuitry, the initial beam direction of the active-integrated phased array is shifted to the desired angle of θ0 = 12° (Fig. 13). The phase self-compensation design does not degrade the beam-scanning capability of the array. After an additional phase gradient is applied for beam steering, the array achieves a scanning range of up to 50°. 
The gain reduction remains within 2 dB relative to the initial pointing direction, and the axial ratio remains below 4 dB throughout the scanning range.  Conclusions  Within the framework of active-integrated arrays, this work uses the phase-tuning effect produced by the complex impedance at the antenna port when the radiating element is directly matched to the active transistor. A desired phase-gradient distribution can therefore be synthesized among the channels of an active-integrated phased array within an achievable range. This capability enables compensation for required phase distributions, such as preset beam direction and path-length equalization in conformal-array applications, without relying on additional phase shifters. Therefore, the complexity and performance requirements of the backend phase-shifting circuitry are reduced. The effectiveness of the proposed method is validated through a multi-channel circularly polarized active-integrated phased-array prototype with a preset beam direction. Both full-wave simulations and experimental measurements confirm that the phase self-compensation mechanism provides the required initial beam pointing while preserving beam-scanning capability and polarization performance. This study provides a new approach for the design of high-efficiency next-generation active-integrated phased arrays.
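As a rough consistency check on the figures quoted in this abstract, the textbook relation for a uniform linear array, Δφ = 2π(d/λ)·sin θ0, links the inter-channel phase gradient to the beam offset from broadside. The sketch below is illustrative only: the element spacing of 0.5λ is an assumption (the abstract does not state it), and the function name `beam_angle_deg` is hypothetical.

```python
import math


def beam_angle_deg(phase_step_deg: float, spacing_wavelengths: float) -> float:
    """Beam offset from broadside for a uniform linear array.

    Uses delta_phi = 2*pi*(d/lambda)*sin(theta0), solved for theta0.
    phase_step_deg      : phase gradient between adjacent channels, in degrees
    spacing_wavelengths : element spacing d/lambda (ASSUMED, not from the abstract)
    """
    s = phase_step_deg / (360.0 * spacing_wavelengths)
    if abs(s) > 1.0:
        raise ValueError("phase step too large for this spacing (no real angle)")
    return math.degrees(math.asin(s))


# 38 degrees is the phase gradient reported in the abstract;
# 0.5 wavelengths is an assumed half-wavelength spacing.
theta0 = beam_angle_deg(38.0, 0.5)
print(f"beam offset: {theta0:.2f} degrees")
```

Under the assumed half-wavelength spacing, the computed offset is close to the 12° initial beam direction reported above; a different spacing would give a different angle.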
TTSPD: A Multimodal Traffic Scene Perception Dataset Integrating Tire Data
YING Zongchen, GUI Lin, YANG Jiahan, ZHANG Fangwei, WANG Junfan, DONG Zhekang
Available online  , doi: 10.11999/JEIT260022
Abstract:
  Objective  With the rapid development of Intelligent Transportation Systems (ITS) and autonomous driving technologies, accurate traffic environment perception is a fundamental prerequisite for vehicle safety and decision making. Current perception frameworks primarily rely on high-resolution cameras and LiDAR sensors. Although these sensors provide rich information, they create severe challenges across the Perception-Storage-Calculation pipeline. High acquisition costs limit large-scale deployment. In addition, the massive data volume produced by high-dimensional sensors places heavy pressure on onboard storage and computational resources, often exceeding the power and thermal budgets of vehicle-grade edge platforms. These constraints motivate the exploration of alternative sensing paradigms that are cost-effective, compact, and computationally efficient while maintaining reliable perception accuracy. In response, the present study shifts the perception perspective from conventional external sensors to the tire-road contact interface, where abundant physical interaction information naturally exists. The objective is to construct a novel multimodal dataset, termed the Tire-integrated Traffic Scene Perception Dataset (TTSPD), which combines internal tire dynamics with external visual observations. This dataset is used to examine whether low-dimensional tire sensing data can complement or partially substitute high-dimensional visual data for accurate road surface classification. The study also aims to establish a new data morphology that balances perception performance and system efficiency for future intelligent vehicles.  Methods  To construct a high-quality and practically usable multimodal dataset, an integrated hardware-software acquisition framework is developed. From a hardware perspective, a specialized sensing system is designed by coupling tire-mounted multi-parameter sensors with a vehicle-mounted camera. 
To ensure reliable operation under the harsh mechanical conditions of a rotating tire, sensing nodes are encapsulated using a rubber-based composite material that provides mechanical protection and long-term stability. Wireless transmission is implemented using Bluetooth Low Energy (BLE) 5.0 with an adaptive frequency-hopping mechanism, enabling low-power and reliable communication during high-speed rotation. During data acquisition, the system synchronously collects six types of internal tire signals, including radial acceleration, tire temperature, and tire pressure, producing approximately 1.8 million sampling points. In parallel, a dashboard-mounted camera records high-resolution traffic scene images totaling 309 GB across four representative road surface conditions. To address the heterogeneity between high-frequency one-dimensional tire signals and two-dimensional visual data, a timestamp-based association strategy is adopted to achieve scene-level temporal alignment rather than strict frame-by-frame correspondence. Sensor sequences and image segments are grouped according to shared temporal windows and driving scenarios. This approach ensures semantic and temporal consistency at the scene level. The alignment strategy reflects practical deployment conditions and forms the basis of the final TTSPD dataset for multimodal fusion research.  Results and Discussions  The effectiveness of the proposed TTSPD is evaluated through comprehensive road surface classification experiments using mainstream deep learning models. Initial experiments based solely on visual data demonstrate strong baseline performance, with classification accuracies ranging from 87.25% to 93.75% (Table 7). These results confirm the quality and diversity of the visual modality in the dataset. The primary contribution of this study is the quantification of efficiency gains enabled by tire-based sensing. 
Comparative experiments progressively reduce the amount of visual data while integrating low-dimensional tire signals, particularly radial acceleration (Table 9). The results show that the multimodal model achieves approximately 95% of the full-data baseline accuracy while using only about 38.75% of the original data volume. This reduction in data dependency produces significant system-level benefits. Storage requirements decrease by approximately 61.25%, and overall model training time decreases by about 54.10% (Fig. 8). These findings indicate that tire dynamics encode high-value physical features related to road texture and surface conditions that complement visual cues. The proposed dataset therefore supports the development of lighter perception pipelines without reducing recognition performance.  Conclusions  This study addresses the long-standing Perception-Storage-Calculation bottleneck in vision-dominated autonomous driving systems by proposing the TTSPD. Multi-parameter sensors are embedded within tires using rubber-based encapsulation, and stable wireless communication is achieved through BLE 5.0. A robust tire-camera data acquisition system is therefore established. The resulting dataset covers four common and safety-critical road surface types: cement, asphalt, damaged, and water-covered roads. It provides a comprehensive foundation for multimodal perception research. Experimental results show that combining low-dimensional tire sensing data with visual information significantly improves perception efficiency. Approximately 95% of peak classification accuracy is achieved using only about 38.75% of the original data volume. This result effectively reduces storage pressure and computational cost, reflected in a 61.25% reduction in data storage and a 54.10% reduction in training time. The TTSPD dataset therefore proposes a practical data morphology that supports efficient and high-performance perception under vehicle-grade computational constraints. 
It also provides valuable resources for the future development of ITS.
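The scene-level, timestamp-based association described in this abstract can be pictured with a minimal sketch. It assumes both modalities carry timestamps in seconds on a common clock and groups them into fixed-length windows; the function name `group_by_window`, the one-second window, and the sample values are all hypothetical and are not taken from the TTSPD acquisition pipeline.

```python
from collections import defaultdict


def group_by_window(tire_samples, image_frames, window_s=1.0):
    """Group tire samples and image frames that share a temporal window.

    tire_samples : list of (timestamp_s, value) pairs
    image_frames : list of (timestamp_s, frame_id) pairs
    window_s     : window length in seconds (assumed value)
    Returns {window_index: {"tire": [...], "frames": [...]}} containing only
    the windows in which both modalities are present.
    """
    windows = defaultdict(lambda: {"tire": [], "frames": []})
    for t, value in tire_samples:
        windows[int(t // window_s)]["tire"].append(value)
    for t, frame_id in image_frames:
        windows[int(t // window_s)]["frames"].append(frame_id)
    # Scene-level alignment: drop windows that lack either modality.
    return {k: w for k, w in sorted(windows.items()) if w["tire"] and w["frames"]}


# Made-up example data: radial-acceleration readings and two camera frames.
tire = [(0.1, 1.0), (0.6, 1.2), (1.2, 0.9), (2.5, 1.1)]
frames = [(0.5, "a"), (1.9, "b")]
aligned = group_by_window(tire, frames)
print(aligned)
```

The many-to-few pairing in each window reflects the abstract's point that high-frequency tire signals are aligned to images per scene rather than frame by frame.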
Multi-path Resource Allocation for Confidential Services Based on Network Coding and Fragmentation Awareness in EONs
LIU Huanlin, AN Dongxin, CHEN Yong, CHEN Haonan, MA Bing, ZOU Jiachen
Available online  , doi: 10.11999/JEIT251222
Abstract:
  Objective  Each fiber in Elastic Optical Networks (EONs) provides enormous bandwidth capacity and carries a large volume of services and data. If any element in EONs is eavesdropped on or attacked, even for a short period, a large amount of data may be leaked or lost, which significantly reduces network performance. Moreover, confidential services are increasingly sensitive to data leakage and loss during transmission. Network attacks may therefore compromise a large number of confidential services. Network Coding (NC) combines data from different services using the XOR operation and transmits the coded data through EONs. Decoding is then performed at the receiver to recover the original information, providing a potential method to mitigate data eavesdropping during transmission. However, NC requires encryption constraints in EONs. Specifically, the routing and Frequency Slot (FS) allocation of other services must overlap with those of the confidential service to be encrypted. Therefore, routing and spectrum allocation for confidential services should consider both NC constraints and the efficiency of resource allocation.  Methods  A Multi-path Resource Allocation based on Network Coding and Fragmentation Awareness (MRA-NCFA) method is proposed to support secure and reliable transmission of confidential services under eavesdropping attacks. First, the proposed method applies NC to encrypt service data and adopts multi-path protection to improve transmission reliability. Second, in the routing stage, different strategies are designed for confidential and non-confidential services. For non-confidential services, the objective is to balance network load and improve resource utilization. A path weight function based on path load is designed. This function considers path hop count, the maximum idle spectrum block on the path, and the required FS of the service. The path with the largest function value is selected as the transmission path. 
For confidential services, routing selection focuses on preventing information leakage while considering path resource availability. Therefore, a path cost function based on eavesdropping probability is designed, and a routing strategy that considers this probability is adopted. Finally, different resource allocation strategies are applied. For non-confidential services, the objective is to maximize spectrum efficiency. Spectrum fragmentation should be minimized to maintain resource continuity and consistency. Therefore, a fragmentation-aware spectrum allocation strategy is designed. A fragmentation measurement formula evaluates the effect of service allocation on link resources. For confidential services, encryption constraints and FS matching must be satisfied. Therefore, a spectrum allocation strategy based on FS and fragmentation sensing is designed. This strategy considers both the effect of spectrum fragments and the effect of established service resources, which improves transmission security for confidential services.  Results and Discussions  The proposed MRA-NCFA algorithm achieves the lowest service blocking probability (Fig. 2). During routing selection, both confidential and non-confidential services consider path resource conditions. During resource allocation, fragmentation effects are also considered, which preserves idle resources for subsequent services as much as possible. In addition, confidential services adopt a multi-path transmission method. Large services can be divided into multiple sub-services, which improves spectrum resource utilization. As the number of services increases, the spectrum utilization of the MRA-NCFA algorithm improves significantly. This improvement results from the multi-path transmission mechanism, which divides large services into smaller ones and allows efficient use of small spectrum fragments. 
In addition, both confidential and non-confidential services consider path resource quantity during routing and prefer paths with lower spectrum consumption. During resource allocation, fragmentation effects are considered to avoid generating new fragments, which improves spectrum utilization (Fig. 3). As the number of services increases, the proposed MRA-NCFA algorithm shows the slowest and smallest increase in spectrum fragmentation ratio compared with the other two algorithms. This result occurs because the algorithm combines multi-path transmission with fragmentation-aware resource allocation, which improves the utilization of small spectrum fragments and reduces fragmentation in EONs. Moreover, both confidential and non-confidential services consider fragmentation effects during resource allocation and apply strategies to reduce fragmentation. Therefore, the proposed algorithm performs better than the Survivable Multipath Fragmentation-Sensitive Fragmentation-Aware Routing and Spectrum Assignment (SM-FSFA-RSA) algorithm and the Network Coding-based Routing and Spectrum Allocation (NC-RSA) algorithm (Fig. 4).  Conclusions  This study examines resource allocation for services that require protection against eavesdropping attacks in elastic optical networks. The objective is to satisfy the security requirements of confidential services and reduce spectrum fragmentation. The proposed MRA-NCFA algorithm applies NC to encrypt confidential services and adopts multi-path protection to improve transmission reliability. For non-confidential services, a path weight function based on path resources is designed for routing selection, and fragmentation-aware spectrum metrics are used for resource allocation. For confidential services, a path cost function that considers both path resources and eavesdropping probability is designed for routing selection. 
A bandwidth segmentation strategy based on eavesdropping probability supports multi-path transmission, and an FS and fragmentation sensing function based on encryption constraints is used for spectrum allocation. These mechanisms improve both reliability and security for confidential services. As the number of security-sensitive services on the Internet increases, the proposed MRA-NCFA algorithm can effectively reduce traffic blocking probability and improve spectrum resource utilization.
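The XOR-based network coding idea mentioned in this abstract can be illustrated with a minimal sketch: a confidential payload is combined with the data of another service, and the receiver recovers it by XORing again with the same data. The payloads and the helper name `xor_bytes` are hypothetical, and the sketch ignores routing, FS overlap constraints, and padding of unequal-length data.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length (padding not handled here)")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical payloads of equal length (14 bytes each).
confidential = b"secret-payload"
other_service = b"other-service!"

# Coded data carried through the network: unreadable without other_service.
coded = xor_bytes(confidential, other_service)

# The receiver, which also holds other_service, decodes by XORing again.
recovered = xor_bytes(coded, other_service)
print(recovered)
```

Because XOR is its own inverse, an eavesdropper who captures only the coded stream cannot recover the confidential payload without also obtaining the data it was combined with.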
Blind Parameter Estimation Method for PSK Modulated Frequency-Hopping Signals Based on Improved Maximum Likelihood
ZHANG Tianhao, ZHANG Yushu, XU Zhongqiu, TANG Xinyi, DANG Wenhua, LI Guangzuo
Available online  , doi: 10.11999/JEIT260005
Abstract:
  Objective  Blind parameter estimation of non-cooperative Frequency-Hopping (FH) signals is a critical task in electronic reconnaissance and countermeasures. Estimation methods based on time-frequency analysis typically suffer from limited resolution or high computational complexity. Furthermore, methods based on compressive sensing rely heavily on the consistency between the predefined dictionary and the actual signal characteristics, and the estimation precision will be significantly compromised by grid mismatch or modulation-induced energy dispersion. Maximum Likelihood (ML)-based methods offer the advantage of high theoretical estimation accuracy with relatively low computational complexity. However, existing studies typically assume an ideal unmodulated signal model with a single frequency transition. Consequently, these ML-based methods suffer from severe model mismatch when processing FH signals with digital modulation, such as Phase Shift Keying (PSK), or multi-hop signals. Moreover, the conventional iterative solution of ML-based methods is prone to divergence or trapping in local optima. To address these limitations, this paper proposes an improved ML-based method for the blind parameter estimation of PSK-modulated FH signals.  Methods  To handle received multi-hop signals, a signal slicing technique based on the Short-Time Fourier Transform (STFT) is proposed to extract slices containing individual frequency transitions. Subsequently, to mitigate the model mismatch caused by digital modulation in conventional ML-based methods, a model-matching signal extraction approach based on the ML objective function is developed for PSK-modulated FH signals. Furthermore, a weighted iterative solving algorithm for ML estimation is designed to enhance convergence, thereby achieving robust and accurate estimation of frequency-hopping parameters.  
Results and Discussions  To validate the effectiveness of the model-matching signal extraction approach, ablation experiments were carried out under various modulation schemes, including binary PSK (BPSK), quadrature PSK (QPSK), and 8-ary PSK (8PSK). The results indicate that the proposed approach (Group D) significantly reduces the Mean Square Error (MSE) of hopping frequency estimation compared to that without the proposed extraction (Group ND). These results demonstrate that the proposed method effectively mitigates the model mismatch (Fig. 5). Simulation results also illustrate that the designed weighted iterative algorithm achieves superior convergence performance compared with linear weighting and non-weighting schemes (Fig. 6). Moreover, the experiments verify the algorithm's insensitivity to initial frequency offsets, showing that it tolerates offsets of up to 2 MHz at SNR of -10 dB with little performance degradation (Fig. 7). Finally, comparative analysis with representative existing methods indicates that the proposed method outperforms the others in terms of estimation accuracy (Fig. 8).  Conclusions  To achieve blind parameter estimation for PSK-modulated FH signals, this paper proposes an improved ML-based method. By utilizing a signal slicing technique based on the STFT, the proposed method successfully extends the applicability of the ML-based estimator to continuous multi-hop signals. To mitigate the model mismatch induced by PSK modulation, a model-matching signal extraction approach is developed to isolate valid signal segments that conform to the ML model. Furthermore, a weighted iterative algorithm incorporating a dynamic weighting function is introduced to address the instability of the conventional iterative ML solver. Simulation results confirm that the proposed method effectively eliminates model mismatch and ensures superior convergence performance with insensitivity to initial frequency offsets. 
Moreover, it is shown to achieve high estimation precision for both hopping frequencies and hopping times.
A Semantic-Enhanced Cybersecurity Named Entity Recognition Approach Oriented to Lightweight Adaptation of Large Language Models
HU Ze, XU Tongwu, YANG Hongyu
Available online  , doi: 10.11999/JEIT251260
Abstract:
  Objective  Named Entity Recognition (NER) in the field of cybersecurity is a fundamental technology supporting threat intelligence analysis, vulnerability management, and security incident response. However, this field generally faces challenges such as dense technical terms, scarce labeled data, dynamic changes in entity categories, and highly complex semantic features, which make traditional deep learning models and existing Large Language Models (LLMs) significantly inadequate in terms of domain adaptability and semantic fusion capability. To address the aforementioned key issues while also considering the need for lightweight model deployment, this paper aims to construct a cybersecurity NER approach that can enhance domain semantic representation, improve the ability to identify rare entities, and apply to low-resource environments, providing a reliable technical path for intelligent threat analysis in cybersecurity scenarios.  Methods  To address the complex semantic features of cybersecurity texts, this paper proposes a semantically enhanced, lightweight, and LLMs-adaptable cybersecurity NER approach. The proposed approach uses LLM2Vec to achieve bidirectional semantic reconstruction of large model decoders and combines Low-Rank Adaptation (LoRA) for low-rank fine-tuning, so as to maintain deep semantic encoding capability while significantly reducing the amount of parameter updates. To address the challenges of sparse keywords and severe noise interference in cybersecurity texts, a sparse gated attention mechanism is introduced to strengthen keyword-focused feature extraction by dynamically selecting high-contribution cybersecurity terms through global gating and sparse inference. 
A SecRoBERTa-based semantic enhancement component is introduced, which utilizes a domain-pre-trained model to generate similar word embeddings, optimizes feature robustness in small-sample scenarios, and alleviates the challenges of identifying out-of-vocabulary words and low-frequency terms. Finally, a masked conditional random field is employed to constrain label transitions and guarantee BIO-compliant output sequences, achieving robust and consistent entity boundary prediction.  Results and Discussions  Extensive experiments were conducted on two public cybersecurity datasets, DNRTI and APTNER. The proposed approach achieved an F1 score of 91.91% on DNRTI, surpassing the previous state-of-the-art model by 2.14%. On APTNER, it reached an F1 score of 80.37%, outperforming the best baseline by 2.97%. Ablation studies confirmed the contribution of each key component: the Sparse Gated Attention mechanism improved F1 by 3.57% over standard Multi-Head Attention on DNRTI; the semantic enhancement module contributed a 2.32% F1 gain; and the MCRF (Masked Conditional Random Field) layer provided a 10.63% F1 improvement over traditional CRF (Conditional Random Field). The model also demonstrated efficient training and inference characteristics, aligning with its lightweight design goals.  Conclusions  This paper proposes a lightweight adaptation approach based on LLMs for NER in the cybersecurity domain, which effectively addresses the limitations of existing LLMs-based NER methods in domain adaptation and rare entity recognition. By integrating LLM2Vec and LoRA for lightweight fine-tuning, a sparse gated attention mechanism for domain feature fusion, and a SecRoBERTa-based semantic enhancement component for similar word precomputation, the proposed approach achieves high performance on DNRTI and APTNER datasets. 
The research provides an efficient technical path for NER tasks in low-resource cybersecurity scenarios and offers strong support for downstream tasks such as automated threat intelligence analysis.
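The BIO constraint that a masked conditional random field enforces can be sketched as a table of allowed label transitions: an `I-X` tag may only follow `B-X` or `I-X`. In a masked CRF, the disallowed transitions are typically given a very large negative score so that decoding never selects them. The label set and function names below are hypothetical and are not the tag sets of DNRTI or APTNER.

```python
def allowed_transitions(labels):
    """Return the set of (previous, next) label pairs that respect BIO rules."""
    allowed = set()
    for prev in labels:
        for nxt in labels:
            if nxt.startswith("I-"):
                # I-X must continue an entity of the same type X.
                ok = prev[:2] in ("B-", "I-") and prev[2:] == nxt[2:]
            else:
                # O and B-X may follow any label.
                ok = True
            if ok:
                allowed.add((prev, nxt))
    return allowed


def is_valid_sequence(seq, allowed):
    """Check a label sequence against the transition table."""
    if seq and seq[0].startswith("I-"):
        return False  # a sequence cannot start inside an entity
    return all((a, b) in allowed for a, b in zip(seq, seq[1:]))


# Hypothetical label set for illustration only.
LABELS = ["O", "B-TOOL", "I-TOOL", "B-ORG", "I-ORG"]
ALLOWED = allowed_transitions(LABELS)
print(is_valid_sequence(["O", "B-TOOL", "I-TOOL", "O"], ALLOWED))
print(is_valid_sequence(["B-ORG", "I-TOOL"], ALLOWED))
```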
A High-Performance Eye Tracking Method Based on Event Camera and Dual-Channel Differential Illumination
SONG Sishun, FENG Junchi, PU Chengyu, GUO Yu, LIU Shijie, HE Xin, CHENG Yuwei
Available online  , doi: 10.11999/JEIT251162
Abstract:
  Objective  Eye tracking has become an essential technology in human–computer interaction, medical diagnostics, cognitive neuroscience, and augmented/virtual reality applications. However, traditional eye tracking systems often suffer from two major limitations: low spatial accuracy and restricted temporal resolution, particularly in high-speed eye movement scenarios. These limitations hinder precise gaze estimation and reduce the reliability of real-time interactive systems. To address these challenges, this research integrates an event camera with the dual-channel differential illumination strategy to enhance the signal-to-noise ratio of corneal reflection events. By introducing the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, accurate localization of corneal reflection points is achieved. On this basis, the corneal reflection point coordinates are utilized in combination with Singular Value Decomposition (SVD) and the least-squares method to determine the corneal curvature center, thereby significantly improving the accuracy of gaze direction estimation. This research provides an efficient technical pathway for next-generation eye tracking systems and offers theoretical support for their deployment in complex interactive environments.  Methods  The proposed event-camera-based gaze tracking method integrates asynchronous eye movement event data through a dual-channel differential illumination framework, thereby enhancing gaze direction estimation accuracy under high-speed and dynamic conditions. Firstly, the event camera asynchronously captures brightness-change events with microsecond-level temporal resolution, enabling precise tracking of rapid eye movements, while the dual-channel differential illumination mechanism suppresses redundant reflections and enhances the contrast of corneal reflection points. 
Secondly, the DBSCAN algorithm is employed to process event data, effectively removing noise and optimizing the spatial localization accuracy of corneal reflection features. Finally, a ray-tracing model is reconstructed using SVD and least-squares fitting to determine the corneal curvature center, thereby achieving robust and high-precision gaze direction estimation. Experimental results on a biomimetic eye movement dataset demonstrate that the proposed method achieves high temporal resolution, localization accuracy, and robustness in dynamic tracking scenarios.  Results and Discussions  Experiments demonstrate that the proposed method achieves a temporal resolution of 25 kHz (Fig. 6), far exceeding that of conventional cameras. Differential illumination significantly improves the signal-to-noise ratio of corneal reflection events. The DBSCAN algorithm localizes corneal reflection points more efficiently than K-Means, Agglomerative Clustering, Mean Shift, and OPTICS, achieving accurate results within 10 ms without requiring a predefined number of clusters (Fig. 8, Table 3). For gaze estimation, the proposed method maintains stable accuracy across sampling frequencies from 2 kHz to 25 kHz. At a 15° cone angle, the mean error (ME) and root mean square error (RMSE) are approximately 0.66° and 0.67°, respectively, while at 25° they increase slightly to 0.87° and 0.90° (Table 4). Compared with existing state-of-the-art (SOTA) gaze tracking methods, the proposed approach demonstrates superior overall performance in terms of both temporal resolution and accuracy (Table 5). Trajectory results (Fig. 9) show close alignment between estimated and ground truth gaze paths, and distribution analyses (Fig. 10) confirm concentrated error ranges below 1°.  Conclusions  This paper presents a novel eye tracking method that integrates an event camera with dual-channel differential illumination. 
The method achieves high temporal resolution (25 kHz), enhances event signal quality, and reduces localization errors, yielding gaze estimation errors of less than 1°. The proposed approach provides a reliable technical pathway for next-generation high-performance eye tracking systems. Future work should consider sensor noise modeling and computational optimization to further improve real-world applicability.
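How density-based clustering can separate corneal reflection events from isolated noise events is sketched below with a small pure-Python DBSCAN. This is an illustrative implementation under stated assumptions, not the authors' code: the event coordinates, `eps`, and `min_pts` values are made up, and cluster centroids are used here as a simple stand-in for reflection-point locations.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (>= 0) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels


def centroids(points, labels):
    """Mean (x, y) of each cluster, ignoring noise."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab >= 0:
            sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
            sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


# Made-up event pixel coordinates: two dense blobs plus one stray event.
events = [(10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50), (50, 51), (51, 50), (51, 51),
          (30, 90)]
labels = dbscan(events, eps=1.5, min_pts=3)
centers = centroids(events, labels)
print(labels, centers)
```

Unlike K-Means, this procedure does not need the number of clusters in advance, which matches the property highlighted in the abstract.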
Multi-projection plane InISAR 3D reconstruction method for complex moving ship targets
LI Ning, NIU Jinfa, WANG Weibin, HU Xingwang, WU Lin
Available online  , doi: 10.11999/JEIT251268
Abstract:
  Objective  Interferometric Inverse Synthetic Aperture Radar (InISAR) is a three-dimensional (3D) reconstruction technique for non-cooperative targets. However, the complex 3D rotational motion of a ship target causes unstable Doppler frequency changes, and Inverse Synthetic Aperture Radar (ISAR) imaging inevitably suffers from target overlap and occlusion, which makes complete, high-precision 3D reconstruction difficult under a single projection plane. Therefore, a multi-projection-plane InISAR 3D reconstruction method for complex moving ship targets based on point cloud fusion is proposed. Efficient, high-precision point cloud registration and fusion supplement the target's 3D information and significantly improve 3D reconstruction quality.  Methods  First, the method exploits the multi-plane observations made available by the severe motion of ship targets. It extracts the ship's centerline and estimates the vertical rotation vector via Principal Component Analysis (PCA) to select the optimal imaging times corresponding to different Imaging Projection Planes (IPPs), and then completes ISAR imaging and InISAR 3D reconstruction. Second, a point cloud fusion algorithm combining weighted Random Sample Consensus (RANSAC) and hierarchical Iterative Closest Point (ICP) is proposed. The random sampling process is optimized through a feature stability weighting strategy, which efficiently extracts and matches corresponding feature points in InISAR images and achieves high-precision multi-IPP point cloud fusion.  Results and Discussions  Experimental results demonstrate that the proposed method significantly enhances reconstruction accuracy and target completeness. For simulated ship point target data, Fig. 7 shows a significant reduction in reconstruction error. 
Signal-to-noise ratio (SNR) analysis reveals that 3D fusion imaging quality improves continuously as SNR increases from –10 dB to 10 dB, maintaining robust fusion performance even under low SNR conditions. For simulated destroyer Radar Cross Section (RCS) data, the method registers the point clouds effectively, and the detail recovery and structural integrity of the fused image improve significantly, which resolves the incomplete 3D reconstruction caused by overlapping and occlusion of scattering points.  Conclusions  To address the low reconstruction accuracy and information loss caused by target rotation, overlapping, and occlusion in traditional InISAR methods for 3D reconstruction of complex moving ship targets, this paper proposes a multi-IPP InISAR 3D reconstruction method based on point cloud fusion. The method employs a PCA-based optimal imaging time selection strategy and uses weighted RANSAC and hierarchical ICP algorithms to achieve efficient, high-precision registration and fusion of InISAR point clouds under multiple IPPs, yielding high-quality 3D reconstruction results. Multi-scenario experiments are conducted on a ship model with ideal scattering points and on an electromagnetic simulation RCS model with occlusion effects, verifying the accuracy of the proposed method under ideal conditions and its applicability in complex real-world scenarios.
A Clipped NMS List Decoding Algorithm of LDPC Codes for 5G URLLC
ZHANG Xiaojun, SONG Xin, GAO Jian, MI Yonghao, NIU Kai
Available online  , doi: 10.11999/JEIT250853
Abstract:
  Objective  As one of the coding schemes in the fifth-generation (5G) wireless communication systems, Low-Density Parity-Check (LDPC) codes can achieve performance close to the Shannon limit through iterative decoding. However, in practical wireless transmission environments, the decoding performance of LDPC codes is susceptible to burst interference in wireless channels. The Normalized Min-Sum (NMS) decoding algorithm is highly sensitive to the distribution of the input log-likelihood ratios (LLRs). Burst interference causes LLRs to deviate from the Gaussian distribution, which degrades decoding performance. Meanwhile, 5G LDPC decoders are often equipped with a fixed number of processing units (PEs) sized for the maximum lifting size to cover the full code length range. In Ultra-Reliable Low-Latency Communications (URLLC) short-code transmission scenarios, the lifting size is much smaller than the maximum lifting size, so a large number of processing units sit idle for long periods and hardware resources are underutilized. To address these issues, this paper proposes a Clipped Normalized Min-Sum List (CNMSL) decoding algorithm. By co-designing burst interference smoothing and idle resource reuse, it improves hardware resource utilization while enhancing decoding performance.  Methods  The statistical characteristics of LLRs over additive white Gaussian noise (AWGN) and interference channels are first analyzed, and the negative impact of burst interference on decoding performance is shown qualitatively to stem from the increased proportion of saturated LLRs that such interference induces. Next, the dependence of the optimal clipping threshold on the channel noise variance, burst interference variance, and burst probability is verified; the optimal threshold converges to a finite interval, the optimal threshold interval, when the channel parameters vary within a limited range. On this basis, the CNMSL decoding algorithm is proposed. 
This algorithm constructs a list decoding architecture by reusing idle processing units in 5G LDPC decoders, where each decoding path performs independent and synchronous decoding to generate candidate codewords, and the best decoding result is selected via a cyclic redundancy check (CRC). Meanwhile, an independent clipper is configured for each path with parameters set according to the optimal threshold interval, thereby suppressing the adverse effects of burst interference.  Results and Discussions  Experimental results show that the layered NMS algorithm almost fails to decode over interference channels without a clipping mechanism. With a single clipping threshold, the algorithm works normally, and its block error rate (BLER) first decreases and then increases as the clipping threshold is reduced. Under various channel conditions for both short and long codes, the single-clipping layered NMS algorithm with a clipping threshold of 3.5 achieves a gain of about 1 dB at $ \mathrm{BLER}=10^{-2} $ compared with a threshold of 10, and the CNMSL algorithm yields an additional gain of about 0.5 dB relative to the single-clipping NMS algorithm. In terms of hardware efficiency, when the lifting factor is less than 192, the PE utilization of the CNMSL algorithm is significantly higher than that of the layered NMS algorithm, with the improvement growing as the lifting factor decreases, and the average PE utilization of the CNMSL algorithm is 69% higher than that of the layered NMS algorithm.  Conclusions  The CNMSL decoding algorithm is proposed in this paper, aiming to improve the error correction performance of the traditional layered NMS decoding algorithm over interference channels. By reusing idle PEs for list decoding to generate multiple candidate paths, the algorithm incurs no additional hardware overhead. 
In addition, an optimal threshold interval is defined to configure the clipper for each decoding path, which limits the proportion of saturated LLRs and makes the input LLRs follow a Gaussian or near-Gaussian distribution. Experimental results show that compared with the layered NMS decoding algorithm with a single clipper, the proposed CNMSL algorithm achieves a gain of approximately 0.5 dB for both short and long codes. Meanwhile, it increases the PE utilization by an average of 69%.
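As a rough illustration of the two building blocks named in the abstract above, the sketch below clips input LLRs to a threshold and then applies a textbook normalized min-sum update at a single check node. The threshold 3.5 comes from the abstract; the normalization factor `alpha=0.75`, the function names, and the sample LLR values are assumptions, and the list structure, layered scheduling, and CRC selection of CNMSL are not reproduced here.

```python
import numpy as np

def clip_llr(llr, threshold):
    """Limit LLR magnitudes to [-threshold, threshold] to reduce saturated inputs."""
    return np.clip(np.asarray(llr, dtype=float), -threshold, threshold)

def nms_check_node(incoming, alpha=0.75):
    """Normalized min-sum update for one check node.

    incoming: variable-to-check messages on the node's edges.
    Returns the check-to-variable message for each edge, computed from
    all the *other* edges (extrinsic information).
    alpha is an assumed normalization factor, not a value from the paper.
    """
    incoming = np.asarray(incoming, dtype=float)
    signs = np.where(incoming < 0, -1.0, 1.0)
    total_sign = np.prod(signs)
    mags = np.abs(incoming)
    order = np.argsort(mags)
    min1, min2 = mags[order[0]], mags[order[1]]
    # The edge holding the smallest magnitude receives the second smallest.
    out_mag = np.where(np.arange(len(mags)) == order[0], min2, min1)
    # total_sign * signs[i] equals the product of the other edges' signs.
    return alpha * total_sign * signs * out_mag

# Hypothetical channel LLRs, two of them inflated by burst interference
channel_llr = [12.0, -0.5, -20.0]
clipped = clip_llr(channel_llr, threshold=3.5)
messages = nms_check_node(clipped)
```

In a list decoder of the kind described, each path would presumably call `clip_llr` with its own threshold drawn from the optimal interval before decoding.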
Drug Response Prediction Based on Graph Topology Attention Network
XU Peng, XU Hao, BAO Zhenshen, ZHOU Chi, LIU Wenbin
Available online  , doi: 10.11999/JEIT251099
Abstract:
  Objective  A core goal in modern cancer research is to figure out why patients respond differently to the same therapy. Achieving this requires developing computational tools that combine genetic information and drug properties to forecast treatment outcomes, which is essential for advancing personalized oncology. Although some existing methods have made progress in predicting cancer drug responses, effectively extracting features of drugs and integrating multi-omics data from cell lines have become challenges. To address these challenges, employing Graph Neural Networks (GNNs) to process drug molecular graphs has become a promising strategy. This research proposes a model that utilizes a graph topology attention network to capture features from drug molecular graphs, while an attention mechanism is applied to integrate multi-omics data.  Methods  In this study, a drug response prediction method based on Graph Topology Attention Network(GTAT) is proposed. The model integrates topological graph information to predict drug responses in cell lines. The model utilizes drug SMILES strings to generate two distinct drug representations and incorporates multi-omics data for cell line characterization (Fig. 1). For drug feature extraction, SMILES strings are first parsed to construct molecular graphs, which are then processed by the GTAT. This network captures both the topological information of the molecular graph-level and atom-level features, thereby producing structured molecular representations. Simultaneously, Extended Connectivity Fingerprints are computed from the same SMILES strings and transformed into continuous feature vectors via a Multi-Layer Perceptron (MLP). The graph-based drug representation and the fingerprint-based representation are subsequently concatenated to form a comprehensive drug feature vector. For cell line representation, multi-omics data are processed through omics-specific neural networks. 
The resulting features are fused using multi-head self-attention mechanisms, enabling the model to capture contextual interactions across omics modalities and generate an integrated cell line representation. Finally, the drug and cell line features are combined and fed into an MLP classifier to predict drug response outcomes. The proposed model effectively integrates heterogeneous biological data sources and significantly enhances prediction accuracy through multi-modal learning and attention-based feature fusion.  Results and Discussions  The proposed method achieves competitive performance on both GDSC and CCLE benchmark datasets (Table 2). Specifically, on the GDSC dataset, our approach outperforms all competing methods across all four metrics—AUC, AUPR, F1-score, and Accuracy. Notably, it improves the AUPR by approximately 1.92% over the second-best method, MOFGCN, demonstrating its advantage in handling class imbalance. On the CCLE dataset, our method still achieves the best performance in terms of AUC and Accuracy. Although it is marginally lower than GADRP in AUPR and F1-score, the gap is minimal, and our approach exhibits more robust overall discriminative ability (as reflected by AUC). These results collectively validate the effectiveness and strong generalizability of our method in drug sensitivity prediction tasks. The observed variation in AUPR and F1-score performance between datasets can be attributed to inherent differences in sample size and class distribution characteristics. The limited scale of the CCLE dataset, combined with its specific class imbalance (approximately 4:1 ratio of resistant to sensitive samples), may constrain the model's capacity to fully learn the underlying data distribution, particularly for minority classes. 
In contrast, the GDSC dataset exhibits greater heterogeneity and a more pronounced class imbalance (approximately 8:1), which collectively contribute to increased prediction difficulty and consequently lower performance on certain metrics.  Conclusions  Accurately predicting drug response in cell lines remains a central challenge in precision medicine, with significant implications for accelerating drug development and advancing personalized treatment. However, constructing a high-accuracy predictive model capable of effectively integrating multi-source biological information is difficult due to the complexity of drug molecular structures and inherent heterogeneity of cell lines. To address this, a cell line drug response prediction model based on Graph Topology Attention Network is proposed. This model employs the graph topology attention network to extract molecular graph features of drugs, which are then fused with molecular fingerprint features. Meanwhile, multi-omics features of cell lines are integrated using an attention mechanism. Experimental results demonstrate that the proposed model achieves superior performance over existing state-of-the-art benchmarks on the employed dataset. This study provides a new perspective for predicting cell line drug response. Certain limitations are acknowledged, such as the use of only three types of omics features for cell line representation and the influence of sample size on predictive outcomes. The integration of more diverse omics features, the application of pre-trained large-scale models, and the clinical translation for personalized medicine will be the primary focus of future work.
Multi-dimensional Spatio-temporal Features Enhancement for Lip reading
MA JinLin, ZHONG YaoWei, MA RuiShi
Available online  , doi: 10.11999/JEIT251111
Abstract:
  Objective  Lip reading is a challenging yet vital frontier in computer vision, dedicated to decoding spoken language solely from visual lip movements. The difficulty arises primarily from inherent ambiguities in the visual speech signal. On one hand, articulatory movements for different visemes can be extremely subtle: for instance, lip displacement differences can be as small as 0.3–0.7 mm for confusable pairs such as /p/–/b/ and /m/–/n/. These fine-grained spatial variations often lie below the effective resolution limits of conventional 3D convolutional neural networks. On the other hand, the natural co-articulation in speech introduces temporal ambiguity, where mouth shapes transiently blend multiple phonemes, making it difficult to isolate distinct visual units. These challenges are further compounded by real-world variables such as uneven lighting and significant inter-speaker articulation differences. As a result, current lip reading models frequently exhibit limitations in capturing discriminative spatiotemporal features, leading to suboptimal performance, especially for phonemes with minimal visual distinctions. Motivated by these issues, this work aims to develop a robust lip reading framework capable of effectively capturing and leveraging fine-grained spatiotemporal dependencies to improve recognition accuracy under diverse and realistic conditions.  Methods  To address the aforementioned limitations, this study proposes a novel lip reading framework named the Multi-dimensional Spatio-Temporal Enhancement Network (MSTEN), which is systematically designed to enhance spatial and temporal representations through integrated attention mechanisms and advanced residual learning. The framework incorporates three core components that collaboratively model the interdependencies between spatial and temporal features, an aspect often underutilized in conventional architectures. 
The first component, the Self-adjusting Spatio-temporal Attention (SaSTA) module, employs a self-adjusting mechanism operating concurrently across height, width, and temporal dimensions. It generates query, key, and value tensors via 1×1×1 3D convolutions, flattens them across spatial and temporal dimensions, and computes attention weights by multiplying the query with the transposed key, followed by softmax normalization. The resulting attention map is multiplied with the value vector and then combined with the original input via learnable parameters and a residual connection to preserve contextual information, yielding globally enhanced features. The second component, the Three-dimensional Enhanced Residual Block (TE-ResBlock), augments spatiotemporal feature extraction through temporal shift, multi-scale convolution, and channel shuffle. The temporal shift operation moves a quarter of the feature channels along the time axis to fuse adjacent frame information parameter-free, while multi-scale convolution uses parallel branches with kernel sizes of 3×3, 3×1, 1×3, and 1×1 to capture diverse receptive fields. Outputs are concatenated and processed via channel shuffle to improve cross-group information flow, with four TE-ResBlocks stacked for progressive feature refinement. The third component, the Multi-dimensional Adaptive Fusion (MDAF) module, deeply integrates spatial, temporal, and channel dimensions through three sub-modules: a Channel Enhancement Module (CEM) that recalibrates features using max pooling, temporal convolution, and sigmoid activation; a Spatial Enhancement Module (SEM) that expands the receptive field via identity mapping, standard and dilated convolution; and an Adaptive Temporal Capture Module (ATCM) that emphasizes dynamic movements using frame difference features and temporal weight maps. MDAF modules are inserted between TE-ResBlock stacks for iterative refinement. 
Finally, features from the MSTEN front-end are fed into a Densely Connected Temporal Convolutional Network (DC-TCN) back-end, which comprises four blocks, each containing three temporal convolutional layers with dense connections, to effectively model long-range phonological dependencies.  Results and Discussions  The proposed framework is comprehensively evaluated on the widely used LRW and GRID datasets. LRW comprises over 500,000 video clips from more than 1,000 speakers, and GRID consists of video clips from 34 speakers, each contributing 1,000 utterances, with a total duration of 28 hours. On LRW, our model achieves an accuracy of 91.18%, representing an absolute improvement of 2.82 percentage points over a strong ResNet18 baseline, which underscores its substantial effectiveness. Ablation studies are conducted to dissect the contribution of each key component. The results clearly demonstrate that every proposed module brings a significant performance gain. Specifically, the introduction of the SaSTA module alone leads to an accuracy improvement of 2.09%, highlighting the crucial role of global spatiotemporal attention. The TE-ResBlock contributes a 1.73% increase, confirming its efficacy in multi-scale local feature extraction and inter-frame information fusion. Moreover, the MDAF module further enhances performance by 1.74%, emphasizing the benefit of adaptive multi-dimensional feature fusion, as detailed in Table 2.  Conclusions  This study presents a significant advancement in lip reading via the introduction of the MSTEN front-end network. The work is built upon three core contributions. First, the SaSTA module introduces an innovative mechanism for global context aggregation, effectively performing multi-dimensional feature weighting across height, width, and temporal sequences. 
Second, the TE-ResBlock tackles fundamental challenges in spatio-temporal modeling through a unique combination of temporal displacement, multi-scale convolution, and enhanced channel-wise interaction. Third, the MDAF module facilitates deep and synergistic integration of information from spatial, temporal, and channel dimensions. Together, these components work in concert to achieve state-of-the-art performance, reaching an accuracy of 91.18% on the challenging LRW dataset and 97.82% on the GRID dataset. Ablation studies further validate the individual and collective efficacy of each proposed innovation. Looking forward, future work will explore the extension of this framework to audio-visual speech recognition under noisy conditions, as well as the development of domain adaptation strategies to enhance robustness in low-resolution or resource-constrained scenarios.
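The parameter-free temporal shift described for the TE-ResBlock in the abstract above can be sketched as follows. The abstract states only that a quarter of the channels is moved along the time axis; splitting that quarter evenly into a forward and a backward shift with zero padding is an assumption borrowed from the usual temporal-shift convention, and the function name and tensor layout `(T, C, H, W)` are illustrative.

```python
import numpy as np

def temporal_shift(x, shift_fraction=0.25):
    """Shift a fraction of the channels of x (shape T, C, H, W) along time.

    Half of the shifted channels move one frame forward and half one frame
    backward (an assumed split); vacated positions are zero-filled and the
    remaining channels are left untouched. No learnable parameters are used.
    """
    t, c = x.shape[0], x.shape[1]
    n = int(c * shift_fraction)   # number of channels to shift
    fwd = n // 2                  # channels shifted forward in time
    out = x.copy()
    # Channels [0, fwd): frame t receives the features of frame t-1.
    out[1:, :fwd] = x[:-1, :fwd]
    out[0, :fwd] = 0
    # Channels [fwd, n): frame t receives the features of frame t+1.
    out[:-1, fwd:n] = x[1:, fwd:n]
    out[-1, fwd:n] = 0
    return out

# Hypothetical feature map: 3 frames, 8 channels, 1x1 spatial size
features = np.arange(3 * 8, dtype=float).reshape(3, 8, 1, 1)
shifted = temporal_shift(features)
```

Because the operation is pure indexing, it mixes adjacent-frame information at no parameter cost, which matches the motivation given in the abstract.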
FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization
WANG Yanlin, GAO Lijiang, YANG Haigang
Available online  , doi: 10.11999/JEIT260108
Abstract:
6-input look-up tables (LUTs) are frequently used in commercial Field-Programmable Gate Arrays (FPGAs) to build programmable logic blocks, yet related experiments reveal that their average utilization in circuits is less than 30%, resulting in a significant waste of programmable resources. In this paper, the 6-input LUTs are fractured based on fracturable factors and recombined with different granularities to construct several new Hybrid Basic Logic Elements (HBLEs). Based on the HBLEs, several novel Hybrid Programmable Logic Block (HPLB) architectures are proposed. The Programmable Logic Blocks (PLBs) of Xilinx are then replaced by these HPLB architectures. Concurrently, a statistical evaluation algorithm for the mapped netlist is proposed. Finally, the HPLB architectures are experimentally verified and evaluated. Experimental evaluations of the three enhanced architectures show that the HPLBs achieve an average area reduction of more than 30% compared with Xilinx’s PLBs without adding more input ports. The HPLB architecture constructed with a fracturable factor N=3 produces the best results when both HPLB utilization and area optimization are taken into account. Based on the MCNC and VTR benchmarks, resource consumption decreased by an average of 8.27% and 27.64%, respectively, thereby improving FPGA logic efficiency.  Objective  Modern commercial FPGA architectures employ 6-LUTs as the fundamental building blocks of Basic Logic Elements (BLEs). According to experimental results, only about 30% of the Logic Elements (LEs) in a circuit are ultimately mapped to 6-input functions when 6-LUT BLEs are used. Moreover, more than half of the logic resources are wasted when a 6-LUT implements a function with fewer than 6 inputs. Programmable resources are therefore unavoidably wasted to a significant degree. 
A circuit design mapped to 100 4-LUTs can be mapped to 78 6-LUTs during 6-LUT mapping studies, according to experimental data, with the {6,5,4,3,2}-LUT function distribution being {23,32,17,9,13}. The findings indicate that only around 25% of the 6-LUTs are ultimately mapped to 6-input functions, with the remaining 6-LUTs being underutilized. This further illustrates how inefficient technology mapping is for LUTs with a large input count K.  Methods  The fracturable factor N, which is the number of sub-LUTs that can be obtained from a single LUT, characterizes the fracturable and reconfigurable nature of LUT architectures in FPGAs. Motivated by this, we decompose a 6-LUT into several granularities according to the fracturable factor in order to address the low resource utilization described above. Three novel hybrid-granularity HBLE structures are created by connecting and reconfiguring the resulting sub-LUTs with additional input ports and multiplexer modules, and their effect on FPGA performance is investigated. The HBLE2 structure consists of one unfractured 6-LUT and one fracturable 6-LUT that splits into two 5-LUTs (fracturable factor N=2). The HBLE3 structure consists of one unfractured 6-LUT and one fracturable 6-LUT that splits into one 5-LUT and two 4-LUTs (N=3). The HBLE4 structure consists of one unfractured 6-LUT and one fracturable 6-LUT that splits into four 4-LUTs (N=4). Adder units are supported by all three HBLE structures, allowing for both latched and direct combinational logic output. Additionally, they allow direct latched output that bypasses the combinational logic. A Hybrid Programmable Logic Block (HPLB) is a novel structure created by merging several HBLEs. 
The MCNC circuit set and the VTR circuit set, the two most well-known academic circuit benchmarks (BMs), are chosen for experimental assessment. A Xilinx Virtex-7 FPGA is used to map each circuit set. The mapped netlist is then used to tally the kinds and numbers of LUTs that were utilized. The minimum number of CLBs needed is found once the data has been arranged using the corresponding greedy algorithms. Since each Xilinx CLB has eight 6-LUTs, the greedy approach uses # Total LUT Number / 8 to determine the smallest number of CLBs needed following BM mapping. In order to guarantee similar conditions, each structure also needs to be sorted using the greedy algorithm after Xilinx’s CLB structure is replaced with the HPLB structure suggested in this research. This results in the bare minimum of HPLBs needed. It is not possible to use every LUT in the mapped CLBs during actual packing owing to routing constraints. As a result, the smallest value that may be achieved in a theoretical optimization scenario is represented by the optimized result that is acquired following greedy algorithm restructuring.  Results and Discussions  The average number of HPLBs needed for both HPLB2 and HPLB3 structures drops by about 8% when CLB structures are swapped out for HPLBs in order to map the MCNC circuit set. However, the number of HPLBs needed increases by more than 30% on average as a result of the HPLB4 structure. The needed count is smaller when HPLBs are used in place of CLBs for mapping the VTR circuit set. On average, the HPLB2 and HPLB4 counts drop by less than 10%, whereas the HPLB3 count drops by around 30%. This enables SRAM scheduling and complete input pin use. On the other hand, because of resource waste, the uniform CLB structure results in higher CLB requirements when implementing functions with a tiny LUT input K. The HPLB4 structure performs worse than the HPLB3 structure, according to post-mapping HPLB counts. 
Both the MCNC and VTR circuit sets achieve average area reduction ratios over 30%, according to analysis of post-mapping area optimization. All three HPLB structures attained area optimization ratios of about 31% on the MCNC test set. Different optimization effects were seen in the VTR test circuit set: HPLB2 produced an average area reduction of 30.63%, whereas HPLB4 produced an average reduction of 51.21%. The HPLB3 structure produced a 45.22% area reduction, marginally less than that of HPLB4. A thorough examination of the area optimization results showed that a higher fracturable factor N produces more noticeable benefits for integrating small-scale LUTs in circuits, resulting in higher area reduction ratios from the enhanced architectures.  Conclusions  To solve the issue of low resource utilization in 6-LUTs, this research proposes three HPLB architectures based on different fracturing granularities. These HPLBs take the place of Xilinx’s CLB structure, and an assessment procedure with matching algorithms is established for them, in order to examine the benefits of the new structures in resource utilization. Based on the proportions of different LUTs in the post-mapping netlist, evaluation experiments using the MCNC and VTR circuit test suites show that, although HPLB4 achieves significant area optimization, it requires additional HPLBs, resulting in increased interconnect area. While both HPLB2 and HPLB3 structures obtain average area optimizations over 30%, HPLB3 achieves a significantly greater reduction in HPLB count and area than HPLB2 as the test circuit scale grows. Thus, after replacing the CLB structure, the HPLB3 structure provides a more balanced optimization, greatly improving the utilization of programmable resources when both HPLB count and area optimization are taken into account.
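The baseline CLB count used in the evaluation above ("Total LUT Number / 8") can be written out as a small sketch. Rounding up to a whole CLB is an assumption about how the division is applied, the LUT counts below are hypothetical and do not come from the paper, and the HPLB packing rules themselves are not modeled.

```python
import math

LUTS_PER_CLB = 8  # each Xilinx CLB has eight 6-LUTs, as stated in the abstract

def min_clbs(lut_counts):
    """Theoretical minimum number of CLBs for a mapped netlist.

    lut_counts maps the number of used LUT inputs (2..6) to how many
    LUTs of that kind appear in the netlist. Every LUT occupies one
    full 6-LUT slot regardless of how many inputs it actually uses.
    """
    total = sum(lut_counts.values())
    return math.ceil(total / LUTS_PER_CLB)

def six_input_share(lut_counts):
    """Fraction of mapped LUTs that really use all six inputs."""
    total = sum(lut_counts.values())
    return lut_counts.get(6, 0) / total

# Hypothetical post-mapping tally: {inputs used: number of LUTs}
netlist = {6: 20, 5: 30, 4: 15, 3: 7, 2: 6}
clbs = min_clbs(netlist)
share = six_input_share(netlist)
```

As the abstract notes, this count is a theoretical lower bound, since routing constraints prevent every LUT slot in a CLB from being used during real packing.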
Efficient and Verifiable Ciphertext Retrieval Scheme Based on Trusted Execution Environment
WU Axin, FENG Dengguo, ZHANG Min, CHI Jialin, YI Yuling
Available online  , doi: 10.11999/JEIT251358
Abstract:
The ciphertext retrieval mechanism enables retrieval functionality over encrypted data. Symmetric Searchable Encryption (SSE) is a critical branch of ciphertext retrieval. However, cloud servers may return incorrect or incomplete results, for example to save computing power. Moreover, attackers can exploit the information leaked by search and access patterns to reconstruct keyword details. Therefore, it is necessary and meaningful to protect the privacy of search and access patterns while achieving result verifiability. Nevertheless, existing verifiable SSE schemes that support search and access pattern privacy typically rely on keyword traversal mechanisms, and their verification mechanisms are inefficient, which imposes high computational and communication overheads on users. To address these performance bottlenecks, this paper introduces an efficient and verifiable ciphertext retrieval scheme based on a Trusted Execution Environment (TEE). To improve the efficiency of ciphertext retrieval, the scheme combines hardware-level security isolation with oblivious data rearrangement so that the keyword trapdoor size is independent of the size of the keyword dictionary. Meanwhile, the correctness of the returned results is verified by embedding random numbers and blinding polynomial constant terms. Thanks to these designs, the scheme achieves significant efficiency improvements. First, the size of keyword trapdoors depends solely on the number of query keywords, not on the global dictionary size, which reduces communication and computational costs. Second, the scheme requires storing only two random numbers to enable verifiability, substantially reducing local storage overhead for users. 
Thirdly, the adoption of techniques, such as enabling data users to retrieve results via single-server and single-round interaction and leveraging symmetric homomorphic encryption, further enhances operational efficiency. Additionally, confidential computing within TEE weakens the security assumptions and trust level towards TEE. After formally proving the security of the proposed scheme using simulation-based methods, this paper has conducted a comprehensive performance evaluation. The evaluation results confirm that this scheme is significantly more efficient than other schemes with the same functionalities.
Physical Layer Security Game for Large Language Model-Based Inference in the Maritime Network
CHEN Haoyu, XIAO Liang, XU Xiaoyu, LI Jieling, WANG Zicheng, LIU Huanhuan, CHEN Hongyi
Available online  , doi: 10.11999/JEIT251269
Abstract:
  Objective  The physical-layer security game reveals the interaction between user equipment (UE) and attackers, and provides performance bounds of anti-jamming transmission and physical-layer authentication schemes based on the equilibria. However, existing game models overlook smart attackers that send jamming or spoofing signals, fail to account for maritime wireless channels affected by evaporation ducts and sea wave fluctuations, and cannot readily evaluate the performance of large language model (LLM)-based inference tasks such as vessel traffic monitoring.  Methods  The anti-jamming maritime communication game for LLM inference is formulated, where the jammer first selects the jamming power and channel to reduce the signal-to-interference-plus-noise ratio at the server with less jamming cost, and the UEs then choose the transmit power, channel, LLM sparsity ratio, and control center to send sensing data (e.g., images, temperature, and humidity) so as to enhance the inference accuracy with less latency. The physical-layer authentication game for maritime wireless networks with LLM inference is further formulated. The spoofing attacker first selects the number of spoofing packets to degrade authentication accuracy with less cost. The control center then selects either the fast authentication mode based on channel state or the safe authentication mode based on the received signal strength and the arrival interval of the packets from multiple ambient transmitters, together with the test threshold, to increase accuracy with less cost.  Results and Discussions  Based on the Stackelberg equilibrium (SE) under the LLM with 7 billion parameters, the performance bounds of the reinforcement learning (RL)-based anti-jamming inference scheme are provided to reveal the impact of evaporation duct height, wave height, maximum sparsity ratio of the LLM, and the quantization level on inference accuracy and latency. 
In addition, the performance bounds of the RL-based maritime spoofing detection scheme are provided based on the SE of the physical-layer authentication game to show the impact of the maximum number of spoofing packets on the authentication accuracy. Simulations are carried out based on the five UEs with the antenna height of 3 meters offloading the image, temperature and humidity using the transmit power up to 200 mW at 5.8 GHz with a bandwidth of 20 MHz to five control centers with antenna heights of 6 m. The jammer applies Deep Q-Network to choose the jamming power with a maximum transmit power of 200 mW for each 5.8 GHz channel, and the spoofing attacker applies the Deep Q-Network to select the number of spoofing packets up to 100. The results show that the inference accuracy and latency of the RL-based anti-jamming maritime communication scheme for LLM inference converge to the performance bounds with gaps of less than 0.6% after 2500 time slots. In addition, the RL-based authentication scheme converges after 1000 time slots with the gap of less than 1.6%.  Conclusions  In this paper, we have formulated the maritime physical-layer security game for LLM inference, addressing scenarios such as anti-jamming sensing data transmission and spoofing detection, aiming at investigating how UEs determine transmit power and channel, and how the control center selects authentication modes and test thresholds to enhance the physical-layer security mechanisms. The attacker chooses attack modes and parameters to degrade the inference accuracy, increase latency, and even cause denial-of-service. Based on the SE and the conditions, the performance bounds of the inference accuracy increase with the maximum transmit power and linearly decrease with the sparsity ratio. Furthermore, the impact of the maximum number of spoofing packets on the inference accuracy is provided. 
Simulation results show that the RL-based maritime physical-layer security schemes converge to the performance bounds, thereby validating the accuracy and effectiveness of the game model.
A Method for Parallel Testing of Interlayer Vias in Monolithic 3D Integrated Circuits
CHEN Tian, CHEN Weikun, LIU Jun, LIANG Huaguo, LU Yingchun
Available online  , doi: 10.11999/JEIT251375
Abstract:
  Objective  As device dimensions in conventional two-dimensional integrated circuits approach fundamental physical limits, further improvements in performance and integration density face significant challenges. Monolithic three-dimensional integrated circuits (M3D ICs), which sequentially stack multiple active device layers on a single wafer, provide an effective solution to overcome these limitations. In M3D ICs, monolithic inter-tier vias (MIVs) are employed to realize vertical interconnections between device tiers. Compared with through-silicon vias (TSVs), MIVs feature much smaller dimensions, lower parasitic capacitance, and shorter interconnect delay. However, their small electrical variations and massive quantity cause defects to manifest mainly as subtle delay shifts, posing stringent requirements on test accuracy, efficiency, and robustness against Process, Voltage, and Temperature (PVT) variations. Existing MIV testing approaches suffer from limited scalability, strong PVT sensitivity, and difficulty in simultaneously achieving small-delay defect detection and fault localization in large-scale arrays. To address these challenges, a parallel MIV testing method based on a time-to-digital converter (TDC) is presented to enable efficient and reliable testing of large MIV arrays with low area and time overhead.  Methods  Large-scale MIVs are logically organized into a two-dimensional array structure. Each basic test cell consists of a device-under-test MIV, a tri-state buffer, and a D flip-flop, and multiple cells are cascaded to form row test chains and column test chains. By systematically exploiting the inherent input capacitance mismatch between the data and clock terminals of the D flip-flop, an embedded TDC structure incorporating the MIV under test is constructed. 
Test stimuli are generated by a digitally controlled delay line (DCDL), which produces START and STOP pulse signals with multiplicatively adjustable phase differences and injects them into different propagation paths of the test chains, enabling time quantization through a signal chasing mechanism. Structural symmetry between the test chains is employed to mitigate the influence of PVT variations. As the START and STOP phase difference is progressively amplified, multiple TDC readings are collected to characterize defect-induced small delay variations and to distinguish them from measurement noise and PVT-induced fluctuations. After fault information is obtained for individual test chains, cross-analysis of row and column test results enables fault localization within the two-dimensional MIV array.  Results and Discussions  Simulation results based on the Nangate 45 nm standard cell library demonstrate that, under fault-free conditions, TDC readings obtained at different phase difference settings exhibit a stable linear proportional relationship (Fig. 7). Extensive Monte Carlo simulations are performed to determine a robust deviation tolerance threshold of 2, which effectively separates normal variations caused by PVT fluctuations from abnormal shifts induced by defects. Fault injection experiments verify that small delay defects occurring on both the START chain and the STOP chain can be effectively detected and distinguished (Fig. 8). In terms of quantitative detection capability, the minimum detectable resistive open defect is approximately 8.4 kΩ, while the maximum detectable leakage defect and resistive short defect are about 67 kΩ and 32 kΩ, respectively, outperforming existing methods (Fig. 9). Moreover, the row–column decomposition architecture effectively alleviates the growth of test time as the MIV array size increases, resulting in a substantial reduction in overall test overhead. 
Area evaluation indicates that the average area overhead of the embedded built-in self-test structure is only 5.594 µm² per MIV, making it suitable for high-density M3D integration.  Conclusions  A parallel TDC-based testing approach for large-scale MIV arrays is presented, which combines row–column decomposition, phase-difference multiplication, and proportional deviation-based decision mechanisms to achieve efficient detection and accurate localization of both hard faults and small delay defects. Structural symmetry within the test chains effectively enhances robustness against PVT variations. Simulation results confirm that the proposed method can reliably detect resistive open, leakage, and short defects while maintaining low area and time overhead. Compared with existing techniques, a favorable balance among test accuracy, PVT robustness, test efficiency, and hardware cost is achieved. Owing to its scalability and practical feasibility, the proposed approach provides an effective and reliable solution for MIV testing in advanced monolithic three-dimensional integrated circuits.
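The proportional deviation check and the row–column cross-analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the first reading as the proportionality reference, and the way the tolerance of 2 is compared against the readings are all assumptions made for illustration.

```python
def chain_is_faulty(readings, gains, tolerance=2):
    """Flag a test chain whose TDC readings break proportional scaling.

    readings[i] is the TDC code observed at phase-difference multiplier
    gains[i]. The first reading serves as the reference (an assumption).
    """
    base = readings[0] / gains[0]
    return any(abs(r - base * g) > tolerance for r, g in zip(readings, gains))


def localize(row_readings, col_readings, gains, tolerance=2):
    """Return (row, col) candidates at intersections of faulty chains."""
    bad_rows = [i for i, r in enumerate(row_readings)
                if chain_is_faulty(r, gains, tolerance)]
    bad_cols = [j for j, c in enumerate(col_readings)
                if chain_is_faulty(c, gains, tolerance)]
    return [(i, j) for i in bad_rows for j in bad_cols]


# Hypothetical readings for a 3 x 3 array at multipliers 1, 2 and 4.
gains = [1, 2, 4]
rows = [[10, 20, 40], [10, 25, 52], [10, 20, 41]]
cols = [[8, 16, 32], [8, 16, 32], [8, 21, 44]]
candidates = localize(rows, cols, gains)
```

With these made-up readings only row chain 1 and column chain 2 deviate by more than the tolerance, so the single candidate is the MIV at their intersection. With several simultaneous faults, intersections only give a candidate set.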
Modulation Recognition Method for High-Speed Mobile Communication Based on Attention Dynamic Fusion and Hybrid Pruning Transformer
ZHENG Qinghe, CHEN Bin, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT251211
Abstract:
  Objective  Automatic modulation recognition is a critical preprocessing step in dynamic spectrum access and anti-jamming communication systems, directly impacting the robustness and spectrum efficiency of non-cooperative communication. In high-speed mobile communication scenarios such as satellite, high-speed rail, and drone swarm communications, signal modulation features suffer severe distortion due to Doppler shifts, time-varying channels, and non-stationary interference. The above issues pose significant challenges to traditional modulation recognition methods based on static assumptions, leading to feature mismatch and increased misjudgment rates. To address the issues of insufficient robustness and real-time performance in existing deep learning-based modulation recognition models under high-speed mobile environments, this paper proposes a lightweight dynamic fusion Transformer-based approach.  Methods  The proposed method consists of three main components: signal representation fusion block, Transformer model design, and model pruning for lightweight inference. First, a RollingQ mechanism is introduced to dynamically adjust the direction of attention query matrix based on the quality of each signal representation, breaking the cycle of attention fixation and achieving the balanced utilization of all types of signal representations. Then, the multi-head attention frequency enhancement Transformer (MAFE-Transformer) is designed, which integrates local and global spatiotemporal features through modules including lightweight convolutional enhancement, multi-attention feature extraction, and frequency learning and selection. Finally, an attention-based dynamic hybrid pruning strategy is applied to reduce structural redundancy and accelerate inference, enabling real-time modulation recognition.  Results and Discussions  Extensive experiments are conducted on two public datasets, RadioML 2016.10a and RML22, to validate the effectiveness of the proposed method. 
The MAFE-Transformer achieves average classification accuracies of 65.14% and 78.40% on the two datasets, respectively. Under low-SNR conditions of –20 to 0 dB, the model demonstrates strong robustness, particularly on the RML22 dataset with dynamic channel model ETU70 (Fig. 5). The confusion matrix shows that the error distribution of MAFE-Transformer is relatively uniform among different modulation schemes, reflecting its well-balanced classification performance (Fig. 6). Ablation studies confirm that the RollingQ-based dynamic fusion mechanism improves accuracy by 7.2% on RadioML 2016.10a and 9.5% on RML22 compared to single signal representation (Fig. 7). The hybrid pruning strategy reduces inference latency to 2.2 ms per signal while maintaining high accuracy (Fig. 8). Comparative experiments show that the proposed model outperforms several state-of-the-art deep learning models (e.g., Ms-RaT, MobileViT, MobileRaT, and KA-CNN) by 4%–10% in recognition accuracy, demonstrating superior performance in high-speed mobile communication scenarios (Fig. 9).  Conclusions  This paper proposes a lightweight dynamic fusion Transformer-based automatic modulation recognition method to address the challenges of robustness and real-time performance in high-speed mobile communication environments. By introducing the RollingQ mechanism and the MAFE-Transformer structure combined with dynamic hybrid pruning, the proposed method achieves a better trade-off between recognition accuracy and inference efficiency. Experimental results on public datasets confirm its effectiveness and robustness under complex channel conditions with Doppler shifts and time-varying interference. However, the proposed method has not been systematically evaluated under more complex interference such as impulsive noise or frequency-selective fading. Future work will focus on improving adaptability to non-stationary noise, cross-device generalization, and optimization for edge deployment.
Design and Verification of Robust Modulation Recognition Framework Under Blind Adversarial Attacks
ZHENG Qinghe, ZHOU Fuhui, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT260019
Abstract:
  Objective  Deep learning-based automatic modulation recognition (AMR) models have demonstrated superior performance in non-cooperative communication systems such as cognitive radio and spectrum monitoring. However, the inherent vulnerability of deep learning models to adversarial attacks, where imperceptible perturbations can cause catastrophic misclassification, poses a severe security threat. Existing defense methods, including adversarial training, often rely on prior knowledge of specific attacks, incur significant computational overhead, and face a trade-off between robustness and accuracy on clean samples. To address these limitations, this paper aims to design and validate a robust modulation recognition framework that can operate effectively under blind adversarial attack scenarios without prior knowledge of the attack type and strategy, thereby ensuring the reliable deployment of intelligent communication systems in adversarial environments.  Methods  The proposed framework integrates a novel feature-purifying autoencoder module with standard modulation classifiers (CNN and Transformer). The core innovation lies in the autoencoder’s bottleneck layer, which incorporates a dynamic purification mechanism. This mechanism first calculates an adaptive threshold based on the statistical properties of the encoded latent features to identify anomalies. Subsequently, the Top-K sparsification operation selectively preserves only the most significant feature activations, effectively suppressing noise and adversarial perturbations while retaining essential signal characteristics. The autoencoder is then trained via a three-stage curriculum learning strategy that sequentially optimizes reconstruction fidelity, feature sparsity, and semantic consistency between the purified and original clean signals, ensuring the output aligns with the true modulation manifold. This model-agnostic module can be seamlessly prepended to any trained classifier without retraining.  
Results and Discussions  Comprehensive experiments are conducted on a simulated dataset encompassing 12 digital modulation types under multipath fading channels. The framework demonstrates substantial performance improvements. For the CNN and Transformer, the recognition accuracies under challenging targeted white-box attacks increase to 82.1% and 83.2%, and under non-targeted black-box attacks reach 87.7% and 89.4%, respectively (Table 1). The attack success rate (ASR) and attack effectiveness index (AEI) remain at low levels, confirming strong defensive capability. Figure 4 shows that defense efficacy improves with higher SNR. Crucially, the ablation study in Figure 5 highlights the role of the autoencoder, whose removal reduces accuracy by 4.02% and 2.36% on CNN and Transformer under strong attacks. Further analysis (Figure 6) indicates that the framework maintains robustness across a wide range of perturbation bounds ($\epsilon \leq 0.1$). Moreover, parameter sensitivity studies (Figures 7 and 8) show stable performance for threshold coefficient $\xi$ in [1.5, 1.9] and sparsity rate k around 0.7, confirming its practical deployability.  Conclusions  This paper presents a blind defense framework for robust AMR based on the feature-purifying autoencoder. The key advantages are threefold: 1) It provides effective defense against diverse white-box and black-box attacks without requiring any prior knowledge of the attack methods, achieving true blind defense; 2) As a preprocessing module, it eliminates the need for computationally expensive retraining of the primary classifier and is compatible with various backbone networks; 3) The multi-stage training strategy successfully balances robustness against attacks with the preservation of high accuracy on clean samples. Finally, experimental results on the comprehensive dataset validate the framework’s superiority. 
Future work will focus on lightweight architectural designs to reduce inference latency and further investigate performance boundaries under extreme low-SNR conditions combined with complex nonlinear channel impairments.
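The bottleneck purification step described in this abstract (adaptive threshold followed by Top-K sparsification) might look roughly like the Python sketch below. The abstract does not give the formulas, so the threshold rule (mean plus ξ times the standard deviation of the magnitudes), the clipping of anomalous activations, and the reading of the sparsity rate as the fraction of activations kept are all assumptions. The function name `purify` is hypothetical.

```python
import math
import statistics


def purify(latent, xi=1.7, keep_ratio=0.7):
    """Clip anomalous activations, then keep only the largest ones.

    xi         -- threshold coefficient (the abstract reports 1.5 to 1.9)
    keep_ratio -- fraction of activations kept (assumed meaning of k)
    """
    mags = [abs(v) for v in latent]
    # Assumed adaptive threshold: mean + xi * std of activation magnitudes.
    limit = statistics.fmean(mags) + xi * statistics.pstdev(mags)
    clipped = [max(-limit, min(limit, v)) for v in latent]
    # Top-K sparsification: zero everything outside the k largest magnitudes.
    k = max(1, math.ceil(keep_ratio * len(clipped)))
    ranked = sorted(range(len(clipped)), key=lambda i: abs(clipped[i]),
                    reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(clipped)]


# Hypothetical latent vector with one outlier activation (9.0).
z = [0.1, -0.2, 0.3, 9.0, -0.05, 0.4, 0.0, 0.25, -0.35, 0.15]
purified = purify(z)
```

In this example the outlier is pulled back to the threshold and the three smallest activations are zeroed, leaving seven non-zero entries.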
UWF-YOLO: A Lightweight Framework for Underwater Object Detection via Redundant Information Optimization
HOU Guojia, MA Jiaqi, WANG Yuechuan, HUANG Baoxiang, LI Kunqian
Available online  , doi: 10.11999/JEIT251129
Abstract:
  Objective  The rapid development of underwater imaging technology has significantly elevated the importance of underwater object detection for resource exploration and environmental monitoring applications. Complex underwater environments degrade image quality in various ways, such as color casts, haze-like effects, and non-uniform illumination. Existing vision-based object detection algorithms often perform poorly under these conditions, especially on small objects, resulting in missed detections and false positives. Moreover, existing deep-learning-based underwater detection models struggle to balance accuracy and lightweight design when equipment resources are limited. Efficient underwater object detection methods are therefore needed for water-related vision tasks, which play a crucial role in marine resource exploration, ecological monitoring, underwater robotics, and intelligent perception systems for autonomous underwater vehicles.  Methods  This paper proposes UWF-YOLO, a lightweight underwater object detection network based on redundant information optimization. First, the C2f module is reconstructed with the FasterNet Block to optimize both the backbone and neck networks, and a feature channel selection mechanism is incorporated to reduce redundant features. In addition, because the redundant conventional convolutional features in the YOLO neck adapt poorly to the underwater environment, Ghost Convolution is introduced to generate Ghost feature maps and enhance the multi-scale feature fusion capability of the neck network. 
Next, the original detection head is replaced with a redundant optimization group detection head (RRG-Head) based on group convolution, which achieves parameter sharing and thereby reduces computational costs. Finally, structured channel pruning is applied to identify the inter-layer dependencies of the graph and bind the pruning units. Combined with the LAMP weight magnitude score normalization for evaluating the importance of channels, the low-contributing groups are pruned and the network is fine-tuned to achieve network size compression. In addition, because the scenes in existing underwater detection datasets are typically monotonous and the objects they contain are usually small and clustered, an underwater object detection dataset with complex scenes, namely CSUOD, is constructed by collecting real-world underwater images from different websites and platforms to ensure both diversity and authenticity, followed by manual annotation and resolution normalization. CSUOD is specifically designed for various challenging underwater environments characterized by color casts, haze-like effects, and non-uniform illumination. It comprises 1135 manually selected images containing 6 object types.  Results and Discussions  Extensive experiments are conducted on three public underwater object detection datasets (i.e., DUO, RUOD, and TrashCan). The proposed model is evaluated against mainstream detectors, including YOLOv5s, YOLOv7-tiny, YOLOv8s, YOLOv9-tiny, and Deformable DETR. In the computational complexity assessment, the proposed method reduces FLOPs, model size, and parameters by 60.4%, 77.3%, and 78.4%, respectively, compared to the baseline. 
In addition, with a comparable parameter count, the method outperforms YOLOv9-tiny by 0.3%, 2.3%, and 3.4% in mAP across the three datasets. Comparative results on the CSUOD dataset also indicate that the proposed model improves detection and remains stable even in complex underwater environments. Qualitative visualization results further illustrate the model’s robustness and detection stability under various underwater degradations, such as haze-like effects and non-uniform illumination.  Conclusions  Quantitative and qualitative experiments on different datasets validate the effectiveness and robustness of the proposed method. The method achieves superior detection performance in complex underwater environments, mitigating missed detections and false positives caused by background interference. Experimental results show that UWF-YOLO is substantially lighter than the benchmark model while maintaining comparable detection accuracy. This balance between detection accuracy and low computational cost makes it particularly suitable for underwater devices with limited resources. The proposed method also has great potential in practical scenarios such as marine ecological monitoring, underwater resource exploration, and autonomous underwater vehicle perception systems. It provides a reliable and efficient technical foundation for real-time applications, with strong adaptability to different underwater conditions, efficient integration into embedded platforms, and support for real-time perception and decision-making. The CSUOD dataset constructed in this study will help address the limitations of existing underwater object detection datasets and promote the development of underwater object detection. In the future, this work can be further extended to multi-modal perception systems and larger-scale datasets. 
These efforts will enable adaptive models for more dynamic underwater scenarios and support broader applications in intelligent ocean observation and autonomous navigation.
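The LAMP score mentioned in the Methods of this abstract can be sketched in a few lines of Python. In the commonly cited definition, each weight's squared magnitude is divided by the sum of squared magnitudes of all weights in the same layer that are at least as large. The sketch below applies that rule to a flat list of per-channel magnitudes. How UWF-YOLO aggregates scores over bound pruning groups is not stated in the abstract and is not modeled here.

```python
def lamp_scores(weights):
    """Return LAMP scores for one layer's weights (or channel norms).

    score(u) = w_u^2 / sum of w_v^2 over all v with |w_v| >= |w_u|
    The largest weight in a layer always scores 1.0.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    tail = 0.0  # running sum of squares over the larger-or-equal weights
    for i in reversed(order):
        sq = weights[i] ** 2
        tail += sq
        scores[i] = sq / tail if tail > 0 else 0.0
    return scores


# Hypothetical channel magnitudes for a single layer.
scores = lamp_scores([3.0, 1.0, 2.0])
```

Because every layer's top weight scores 1.0, a single global threshold on the scores can be applied across layers without removing a layer entirely.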
Performance Analysis and Rapid Prediction of Long-range Underwater Acoustic Communications in Uncertain Deep-sea Environments
CHEN Xiangmei, TAI Yupeng, WANG Haibin, HU Chenghao, WANG Jun, WANG Diya
Available online  , doi: 10.11999/JEIT251244
Abstract:
  Objective  In complex and dynamically changing deep-sea environments, the performance of underwater acoustic communications shows substantial variability. Feedback-based channel estimation and parameter adaptation are impractical in long-range scenarios because platform constraints prevent reliable feedback channels and the slow propagation of sound introduces significant delay. In typical long-range systems, environmental dynamics are often ignored and communication parameters are selected heuristically, which frequently leads to mismatches with actual channel conditions and causes communication failures or reduced efficiency. Predictive methods able to assess performance in advance and support feed-forward parameter adjustment are therefore required. This study proposes a deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications under uncertain environmental conditions to enable efficient and reliable parameter–channel matching without feedback.  Methods  A feed-forward method for underwater acoustic communication performance analysis and rapid prediction is developed using deep-learning-based sound-field uncertainty estimation. A neural network is first used to estimate probability distributions of Transmission Loss (TL PDFs) at the receiver under dynamic environments. TL PDFs are then mapped to probability distributions of the Signal-to-Noise Ratio (SNR PDFs), enabling communication performance evaluation without real-time feedback. Statistical channel capacity and outage capacity are analyzed to characterize the theoretical upper limits of achievable rates in dynamic conditions. Finally, by integrating the SNR distribution with the bit-error-rate characteristics of a representative deep-sea single-carrier communication system under the corresponding channel, a rate–reliability prediction model is constructed. 
This model estimates the probability of reliable communication at different data rates and serves as a practical tool for forecasting link performance in highly dynamic and feedback-limited underwater acoustic environments.  Results and Discussions  The method is validated using simulation data and sea trial data. The TL PDFs predicted by the deep learning model show strong consistency with the traditional Monte Carlo (MC) method across multiple receiver locations (Fig. 6). Under identical computational settings, deep-learning-based TL PDF prediction reduces computation time by 2 to 3 orders of magnitude compared with the MC method. The chained mapping from TL PDFs to SNR PDFs and then to channel capacity metrics accurately represents the probabilistic features of communication performance under uncertain conditions (Fig. 7 and Fig. 8). The rate–reliability curves derived from the deep-learning-based TL PDFs are highly consistent with MC-based results. In the high sound-intensity region, prediction errors for reliable communication probabilities across data rates range from 0.1% to 3%, and in the low sound-intensity region errors are approximately 0.3% to 5% (Fig. 12). Sea trial results further indicate that predicted rate–reliability performance agrees well with measured data. In the convergence zone, deviations between predicted and measured reliability probabilities at each rate range from 0.9% to 4%, and in the shadow zone from 1% to 9% (Fig. 18). Under a 90% reliability requirement, the maximum achievable rates predicted by the method match the measurements in both the convergence and shadow zones, demonstrating accuracy and practical applicability in complex channel environments.  Conclusions  A deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications in uncertain deep-sea environments is developed and validated. 
The framework builds a chained mapping from environmental parameters to TL PDFs, SNR PDFs, and communication performance metrics, enabling quantitative capacity assessment under dynamic ocean conditions. Predictive “rate–reliability” profiles are obtained by integrating probabilistic propagation characteristics with the performance of a representative deep-sea single-carrier system under the corresponding channel, providing guidance for parameter selection without feedback. Sea trial results confirm strong agreement between predicted and measured performance. The proposed approach offers a technical pathway for feed-forward performance analysis and dynamic adaptation in long-range deep-sea communication systems, and can be extended to other communication scenarios in dynamic ocean environments.
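The chained mapping from TL to SNR to rate–reliability in this abstract can be illustrated with a simplified Python sketch. It makes three assumptions. The TL distribution is represented by samples rather than a predicted PDF. SNR is obtained from a basic sonar-equation form, SNR = SL − TL − NL in dB. Each data rate is treated as reliable above a single required SNR, which stands in for the bit-error-rate characteristics of the single-carrier system. All numerical values are made up.

```python
def snr_samples(tl_samples_db, source_level_db, noise_level_db):
    """Map transmission-loss samples to SNR samples (all values in dB)."""
    return [source_level_db - tl - noise_level_db for tl in tl_samples_db]


def reliability(snr_db, required_snr_db):
    """Fraction of SNR samples that meet the required SNR."""
    return sum(s >= required_snr_db for s in snr_db) / len(snr_db)


def max_rate(snr_db, required_snr_by_rate, target=0.9):
    """Highest rate whose reliability meets the target, or None."""
    ok = [rate for rate, req in required_snr_by_rate.items()
          if reliability(snr_db, req) >= target]
    return max(ok) if ok else None


# Hypothetical TL samples (dB), source level and noise level.
tl = [80, 82, 85, 90, 95, 81, 83, 84, 86, 88]
snr = snr_samples(tl, source_level_db=190, noise_level_db=100)
# Hypothetical rate (bit/s) -> required SNR (dB) table.
table = {50: -6, 100: 0, 200: 5}
best = max_rate(snr, table, target=0.9)
```

Here the 100 bit/s setting is the fastest one that meets the 90% reliability requirement. The 200 bit/s setting is reliable for only 60% of the samples.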
Towards Privacy-Preserving and Lightweight Modulation Recognition for Short-Wave Signals under Channel Shifts
YAO Yizhou, DENG Wen, LI Baoguo
Available online  , doi: 10.11999/JEIT251017
Abstract:
  Objective  Existing short-wave signal modulation recognition methods based on the supervised learning paradigm typically assume that training data (source domain) and test data (target domain) follow identical distributions. However, short-wave channels are susceptible to ionospheric variations, leading to significant distribution discrepancies across domains, which consequently causes model performance degradation. Furthermore, deployment on the edge side of unmanned platforms is constrained by limited device resources, scarce labeled samples, and data privacy requirements. To address these challenges, a lightweight recognition method based on source-model transfer is proposed in this paper, enabling privacy-preserving model adaptation without the need to access source domain data.  Methods  A multi-modal source-model transfer framework (M-SMOT) is developed, which utilizes information maximization loss and self-supervised pseudo-labeling techniques to facilitate model adaptation without revisiting source domain data. This approach achieves effective cross-channel recognition of short-wave modulation signals while reducing computational resource consumption and preserving data privacy. Additionally, multi-modal information, comprising in-phase/quadrature (I/Q) components, amplitude-phase (AP) characteristics, and spectral features, is fused to leverage complementary feature representations, thereby enhancing the robustness of the recognition network against complex channel variations.  Results and Discussions  Experimental results demonstrate that the recognition performance of the proposed method consistently surpasses that of the Source-Only baseline across six cross-channel scenarios, with improvements ranging from 0.31% to 10.81% (Table 1). In terms of few-shot adaptation, average recognition accuracies are maintained at 98.3% and 96% relative to the full-sample baseline, even when target domain training samples are reduced to 10% and 1%, respectively (Fig. 12). 
Ablation studies verify the necessity and effectiveness of the self-supervised pseudo-labeling module (Fig. 16) and the multi-modal fusion strategy (Fig. 17), confirming that both components contribute to the overall performance. Furthermore, the lightweight advantages are quantified: the method requires zero storage for source data, exhibits a peak memory consumption of only 6.00 MB, and achieves convergence within a single fine-tuning epoch (Table 2). These findings validate the capability of the proposed mechanism to mitigate domain discrepancies and protect privacy under resource-constrained conditions.  Conclusions  The M-SMOT method successfully integrates data privacy protection, source model adaptation, few-shot generalization, and low resource consumption. Consequently, it provides a practical solution for cross-channel modulation recognition in short-wave communications, demonstrating significant potential for deployment on resource-limited edge devices.
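The information maximization loss named in the Methods of this abstract is not spelled out there. A form widely used in source-free domain adaptation combines two terms: the average entropy of each prediction, which pushes predictions to be confident, and the negative entropy of the mean prediction, which pushes predictions to be spread over classes. The Python sketch below implements that common form on plain lists of class probabilities. Whether M-SMOT uses exactly this weighting is an assumption.

```python
import math


def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def information_maximization_loss(probs):
    """Mean per-sample entropy minus entropy of the mean prediction.

    probs is a list of per-sample class-probability vectors on unlabeled
    target-domain data. Lower is better: confident and class-diverse.
    """
    n = len(probs)
    mean_entropy = sum(entropy(p) for p in probs) / n
    marginal = [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]
    return mean_entropy - entropy(marginal)


# Hypothetical two-class predictions on two target samples.
diverse = information_maximization_loss([[1.0, 0.0], [0.0, 1.0]])
collapsed = information_maximization_loss([[1.0, 0.0], [1.0, 0.0]])
```

Confident predictions that cover both classes reach the minimum of −ln 2. Predictions that collapse onto one class score 0, which is why the diversity term is needed alongside entropy minimization.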
Indoor Visible Light Positioning Based on CNN–MLP Multi-Feature Fusion under Random Receiver Tilt Conditions
JIA Kejun, WANG Jian, MAO Lifei, YOU Wei, HUANG Ziyang, PENG Duo
Available online  , doi: 10.11999/JEIT251021
Abstract:
  Objective  Traditional visible light positioning (VLP) methods based on received signal strength (RSS) suffer from instability when the receiver experiences orientation perturbations, which disrupt the correspondence between optical power and spatial position, making reliable three-dimensional (3D) positioning difficult to achieve. Existing approaches typically rely on inertial measurement units (IMUs) to obtain orientation information; however, sensor fusion increases system complexity and hardware cost and introduces cumulative errors. To address these issues, this paper proposes a positioning method that fuses cosine-of-incidence-angle estimation based on a photodiode (PD) array with RSS information, enabling high-accuracy 3D indoor positioning under receiver orientation perturbations.  Methods  In the proposed fusion-based positioning method, a multi-PD array structure is first adopted, and a local coordinate system (LCS) is established at the array center. Constraint equations are then constructed based on the differences in received optical power among PDs in the array. A Gauss–Newton iterative algorithm is employed to estimate the incident light direction vector. By exploiting the orthogonal rotation invariance between the LCS and the global coordinate system (GCS), the cosine of the incident angle is estimated without the need for orientation sensors. Subsequently, a serial CNN–MLP fusion network is constructed, in which the estimated incident-angle cosine is introduced as an additional positioning feature on top of RSS-based localization. The network jointly models the RSS and incident-angle cosine information received by the PD array and maps them to 3D spatial coordinates. Finally, training samples are generated using Latin hypercube sampling (LHS) to uniformly sample spatial positions and orientation dimensions, thereby improving the representativeness of the training dataset.  
Results and Discussions  Simulation experiments are conducted in a 4 m × 4 m × 2.5 m indoor environment. First, the effects of different numbers of PDs and tilt angles on the accuracy of incident-angle cosine estimation and spatial coverage are evaluated (Fig. 6), and the cumulative distribution functions (CDFs) of positioning errors under different array configurations are compared (Fig. 7). The results show that a 3-PD array with a tilt angle of 40° achieves the best balance among cost, coverage, and positioning accuracy. Next, positioning performance under different receiver tilt angles is analyzed. When the tilt angle is small, more than 70% of positioning errors are below 5 cm; even when the receiver is tilted up to 55°, the average error remains within 11.7 cm (Fig. 8). Error component comparisons indicate that the error along the Z-axis is significantly smaller than those along the X and Y axes (Fig. 9). Further tests are conducted at a height of 0.0 m covered by the training data and at an unseen height of 0.6 m not included in the training set (Fig. 10). The results demonstrate that the proposed model does not exhibit strong dependence on a specific height plane and maintains stable 3D positioning performance at unseen heights. Finally, the proposed method is compared with related positioning schemes. It outperforms existing methods in terms of CDF convergence speed, RMSE, and standard deviation (Fig. 11), achieving an average error reduction of approximately 2.5 cm and an RMSE reduction of 31.58% compared with Ref. [12].  Conclusions  This paper estimates the cosine of the incident angle at the receiver by exploiting differences in the optical power received by different PDs in an array and introduces this cosine value as a joint positioning feature into conventional RSS-based localization, thereby alleviating the instability of position mapping caused by relying solely on RSS under random receiver perturbations. 
By further combining the spatial feature extraction capability of CNNs with the nonlinear modeling strength of MLPs, the proposed method effectively maps positioning features to 3D spatial coordinates. The approach reduces reliance on orientation sensors such as IMUs while overcoming the susceptibility of traditional geometric positioning methods to noise and high-dimensional nonlinear features. Under varying heights and receiver orientations, the proposed algorithm demonstrates significant advantages in both positioning accuracy and stability.
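The Latin hypercube sampling step used to generate training samples in this abstract can be sketched with the Python standard library. Each dimension is split into n equal strata and every stratum is used exactly once per dimension. The bounds below reuse the 4 m × 4 m × 2.5 m room and the 55° maximum tilt from the abstract. Treating tilt as the only orientation dimension is an assumption made to keep the example short, and the function name is hypothetical.

```python
import random


def latin_hypercube(n, bounds, seed=0):
    """Draw n points; each dimension uses each of its n strata once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n))
        rng.shuffle(strata)  # independent stratum order per dimension
        columns.append([low + (s + rng.random()) / n * (high - low)
                        for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


# x, y, z in metres and receiver tilt in degrees.
bounds = [(0.0, 4.0), (0.0, 4.0), (0.0, 2.5), (0.0, 55.0)]
samples = latin_hypercube(8, bounds)
```

Compared with independent uniform draws, this keeps the marginal coverage of every position and orientation dimension even for small sample counts.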
Inverse Design of a Silicon-Based Compact Polarization Splitter-Rotator
HUI Zhanqiang, ZHANG Xinglong, HAN Dongdong, LI Tiantian, GONG Jiamin
Available online  , doi: 10.11999/JEIT250858
Abstract:
  Objective  The Polarization Splitter-Rotator (PSR) is a key device used to control the polarization state of light in Photonic Integrated Circuits (PICs). Device size has become a major constraint on integration density in PICs. Traditional design methods are time-consuming and tend to yield larger device footprints. Inverse design, by contrast, determines structural parameters through optimization algorithms according to target performance and enables compact devices to be obtained while maintaining functionality. This strategy is now applied to wavelength and mode division multiplexers, all-optical logic gates, power splitters, and other integrated photonic components. The objective of this work is to use inverse design to address size limitations in silicon-based PSRs by combining the Momentum Optimization algorithm with the Adjoint Method. This combined approach improves the integration level of PICs and provides a feasible pathway for the miniaturization of other photonic devices.  Methods  The design region is defined on a 220 nm Silicon-on-Insulator (SOI) wafer and is discretized into 25×50 cylindrical elements. Each element has a 50 nm radius, a 150 nm height, and an initial relative permittivity of 6.55. The adjoint method is used to obtain gradient information across the design region, and this gradient is processed with the Momentum Optimization algorithm. The relative permittivity of each element is then updated according to the processed gradient. During optimization, the momentum factor is dynamically adjusted with the iteration number to accelerate convergence, and a linear bias is applied to guide the permittivity toward the values of silicon and air as the iterations progress. After optimization, the elements are binarized based on their final permittivity: values below 6.55 are assigned to air, whereas values above 6.55 are assigned to silicon. This results in a structure containing irregularly distributed air holes. 
To compensate for performance loss introduced during binarization, the etching depth of air holes with pre-binarization permittivity between 3 and 6.55 is optimized. Adjacent air holes are merged to reduce fabrication errors. The final device consists of air holes with five radii, among which three larger-radius types are selected for further refinement. Their etching radii and depths are optimized to recover remaining performance loss. Device performance is evaluated through numerical analysis. Calculated parameters include Insertion Loss (IL), Crosstalk (CT), Polarization Extinction Ratio (PER), and bandwidth. Tolerance analysis is also conducted to assess robustness under fabrication variations.  Results and Discussions   A compact PSR is designed on a 220 nm SOI wafer with dimensions of 5 μm in length and 2.5 μm in width. During optimization, the momentum factor in the Momentum Optimization algorithm is dynamically adjusted. A larger momentum factor is applied in the early stage to accelerate escape from local maxima or plateau regions, whereas a smaller momentum factor is used in later iterations to increase the weight of the current gradient. Compared with other optimization strategies, this algorithm requires only 20%~33% of the iteration count needed by alternative methods to reach a Figure of Merit (FOM) of 1.7, which improves optimization efficiency. Numerical analysis shows that the device achieves stable performance across the 1 520~1 575 nm wavelength range. The IL remains low (TM0 < 1 dB, TE0 < 0.68 dB), and the CT is effectively suppressed (TM0 < –23 dB, TE0 < –25.2 dB). The PER is high (TM0 > 17 dB, TE0 > 28.5 dB). Tolerance analysis indicates strong robustness to fabrication variations. Within the 1 520~1 540 nm range, performance remains stable under etching depth offsets of ±9 nm and etching radius offsets of ±5 nm, demonstrating reliable manufacturability.  
Conclusions   Numerical analysis demonstrates that combining the adjoint method with the Momentum Optimization algorithm is a feasible strategy for designing an integrated PSR. The design principle relies on controlling light propagation through adjustments to the relative permittivity, which determine the distribution and placement of air holes to achieve polarization splitting and rotation. Compared with traditional design approaches, inverse design uses the design region more efficiently and enables a more compact device structure. The proposed PSR is markedly smaller and shows enhanced fabrication tolerance. It is suitable for future large-scale PICs and provides useful guidance for the miniaturization of other photonic devices.
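The update rule described in the Methods (a gradient processed with momentum, a momentum factor that shrinks with the iteration number, and a final binarization at 6.55) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear decay of the momentum factor from 0.9 to 0.5, the learning rate, the silicon permittivity of 12.1 (chosen so that its midpoint with air is 6.55), and the toy quadratic figure of merit that stands in for the adjoint-method gradient. The linear bias toward silicon and air mentioned in the abstract is omitted.

```python
EPS_AIR = 1.0      # relative permittivity of air
EPS_SI = 12.1      # assumed value for silicon; midpoint with air is 6.55
THRESHOLD = 6.55   # binarization threshold quoted in the abstract


def momentum_factor(step, total_steps, start=0.9, end=0.5):
    """Linearly decay the momentum factor with the iteration number (assumed schedule)."""
    frac = step / max(total_steps - 1, 1)
    return start + (end - start) * frac


def optimize(grad_fn, eps_init, total_steps=200, lr=0.1):
    """Gradient ascent with momentum on a list of element permittivities."""
    eps = list(eps_init)
    velocity = [0.0] * len(eps)
    for step in range(total_steps):
        beta = momentum_factor(step, total_steps)
        grad = grad_fn(eps)
        # Exponential moving average of the gradient (the "processed" gradient).
        velocity = [beta * v + (1.0 - beta) * g for v, g in zip(velocity, grad)]
        # Update and clamp to the physically meaningful range [air, silicon].
        eps = [min(max(e + lr * v, EPS_AIR), EPS_SI) for e, v in zip(eps, velocity)]
    return eps


def binarize(eps):
    """Assign each element to silicon or air according to the threshold."""
    return ["silicon" if e > THRESHOLD else "air" for e in eps]


# Hypothetical stand-in for the adjoint gradient: FOM = -sum((eps - target)^2).
TARGET = [EPS_SI, EPS_AIR, EPS_SI, EPS_AIR]


def toy_gradient(eps):
    return [-2.0 * (e - t) for e, t in zip(eps, TARGET)]


final_eps = optimize(toy_gradient, [THRESHOLD] * 4)
layout = binarize(final_eps)
print(layout)
```

In a real inverse-design loop, `toy_gradient` would be replaced by the gradient returned by forward and adjoint electromagnetic simulations.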
Research on UAV Swarm Radiation Source Localization Method Based on Dynamic Formation Optimization
WU Sujie, WU Binbin, YANG Ning, WANG Heng, GUO Daoxing, GU Chuan
Available online  , doi: 10.11999/JEIT251023
Abstract:
In dense and structurally complex urban environments, Unmanned Aerial Vehicle (UAV) swarm radiation source localization is affected by signal attenuation, multipath propagation, and building obstructions. To address these limitations, a dynamic formation-optimization method for UAV swarms is proposed. By improving the geometric configuration of the swarm, the method reduces path loss and interference, which strengthens localization accuracy. Received signal strength is used to evaluate signal quality in real time and supports adaptive formation adjustments that improve propagation conditions. Geometric dilution of precision and root mean square error metrics are integrated to refine swarm geometry and improve distance-estimation reliability. Simulation results show that the proposed method converges faster and improves localization accuracy in complex urban environments, reducing errors by more than 80 percent. The method adapts to environmental variation and demonstrates strong robustness and practical value.  Objective  UAV swarm localization and formation control in urban environments are affected by obstacles, signal attenuation, and rapid variation in the surroundings that reduce the reliability of conventional methods. This study proposes a radiation source localization approach that integrates the Received Signal Strength Indicator (RSSI) with dynamic formation adjustment to improve localization accuracy and strengthen system robustness in complex urban scenarios.  Methods  The method uses RSSI measurements to estimate the distance to the radiation source and adjusts UAV swarm formation in real time to reduce localization errors. These adjustments are based on feedback that reflects relative positions, signal strength, and environmental variation. Localization accuracy is strengthened through a multi-sensor fusion strategy that integrates GPS, IMU, and depth-camera data. 
A data-quality assessment mechanism evaluates signal reliability and triggers formation adaptation when the signal drops below a predefined threshold. This optimization process reduces positioning errors and improves system robustness.  Results and Discussions  Simulation experiments in a ROS-based environment were conducted to evaluate the UAV swarm localization method under urban obstacles and multipath conditions. The swarm began in a hexagonal formation and adjusted its geometry according to environmental variation and localization confidence (Fig. 34). As shown in Fig. 5, localization errors fluctuated during initialization but converged to below 1 m after 150 s. Formation comparisons (Fig. 6) showed that symmetric structures such as hexagonal and triangular formations maintained errors below 0.5 m, whereas asymmetric formations (T and Y shape) produced deviations up to 4.9 m. Further comparisons (Fig. 7) showed that traditional RSSI saturated near 15 m, direction of arrival fluctuated between 5 and 14 m, and time difference of arrival failed due to synchronization problems. The proposed method achieved sub-meter accuracy within 60 s and remained robust throughout the mission. These findings indicate that combining RSSI-based distance estimation with dynamic formation adjustment improves localization accuracy, convergence speed, and adaptability under complex environmental conditions.  Conclusions  This study addresses UAV swarm localization in complex urban environments by integrating RSSI-based distance estimation, dynamic formation adjustment, and multi-sensor fusion. 
ROS-based simulations show that: (1) localization errors converge rapidly to sub-meter levels, reaching below 1 m within 150 s under non-line-of-sight conditions; (2) symmetric formations such as hexagonal and triangular configurations outperform asymmetric ones and reduce errors by up to 67 percent compared with fixed Y-shaped formations; and (3) relative to traditional RSSI, direction of arrival, and time difference of arrival approaches, the proposed method shows faster convergence, higher stability, and stronger robustness.
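As a rough illustration of RSSI-based distance estimation with a quality threshold, the sketch below uses the standard log-distance path-loss model. The abstract does not say which propagation model the authors use, so the model, the reference RSSI of -40 dBm at 1 m, the path-loss exponent of 2, and the -80 dBm threshold are all assumptions, and both function names are hypothetical.

```python
def rssi_to_distance(rssi_dbm, rssi_at_ref_dbm=-40.0,
                     path_loss_exponent=2.0, ref_distance_m=1.0):
    """Invert the log-distance path-loss model to get a distance in metres.

    Model (assumed): RSSI(d) = RSSI(d0) - 10 * n * log10(d / d0)
    """
    exponent = (rssi_at_ref_dbm - rssi_dbm) / (10.0 * path_loss_exponent)
    return ref_distance_m * 10.0 ** exponent


def needs_reformation(rssi_dbm, threshold_dbm=-80.0):
    """Flag a formation adjustment when signal quality drops below a threshold."""
    return rssi_dbm < threshold_dbm


for rssi in (-40.0, -60.0, -85.0):
    print(rssi, round(rssi_to_distance(rssi), 2), needs_reformation(rssi))
```

In cluttered urban channels the path-loss exponent is typically larger than the free-space value of 2, which is one reason a fixed-model RSSI estimate degrades there.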
Conditional Generative Adversarial Networks-based Channel Estimation for ISAC-RIS System
LIU Yu, ZHENG Zelin, LIU Gang
Available online  , doi: 10.11999/JEIT251168
Abstract:
  Objective  In RIS-assisted ISAC systems, accurate channel estimation is crucial to ensure reliable operation. Although traditional deep learning methods can partially address the channel estimation problem, their generalization ability and estimation accuracy remain limited in complex multi-user channel environments. To tackle these challenges, this paper proposes a two-stage channel estimation method based on Conditional Generative Adversarial Network(CGAN) for RIS-assisted multi-user ISAC systems, aiming to enhance both the accuracy and stability of channel estimation.  Methods  This paper proposes a two-stage channel estimation method based on CGAN for estimating the SAC channels in RIS-assisted multi-user ISAC systems. By adjusting the switching states of the RIS, the overall estimation problem is decomposed into subproblems, enabling sequential estimation of the direct and reflected channels. Within the proposed CGAN framework, the adversarial training between the generator and discriminator allows the model not only to learn the mapping relationship between the observed signals and the true channels but also to optimize the output according to the discriminator’s feedback, thereby effectively improving both training efficiency and estimation accuracy.  Results and Discussions  Extensive simulation experiments were conducted to verify the effectiveness of the proposed method. First, the estimation performance of the SAC channel under different SNR conditions was compared. The results demonstrate that the proposed CGAN-based method achieves significantly better NMSE performance than the LS benchmark and traditional models such as FNN and ELM (Fig. 4). Then, the impact of increasing the number of antennas and RIS elements on SAC channel estimation performance was investigated. Compared with the LS benchmark, the proposed CGAN method consistently maintains superior performance under various SNR conditions (Figs. 5 and 6).  
Conclusions  This paper investigates the channel estimation problem in RIS-assisted multi-user ISAC systems and proposes a two-stage channel estimation method based on CGAN. By adjusting the switching states of the RIS and employing adversarial training between the generator and discriminator networks, the proposed method achieves accurate estimation of the SAC channel. Simulation results demonstrate that, under various SNR conditions and channel dimensions, the CGAN-based estimation method exhibits strong generalization capability and significantly outperforms the benchmark schemes in estimation accuracy. Therefore, it shows great potential as an effective solution for enhancing system stability and efficiency.
Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning in Wireless Networks
LIU Jingyuan, MA Ke, XU Runchen, CHANG Zheng
Available online  , doi: 10.11999/JEIT251221
Abstract:
  Objective  Multimodal Federated Learning (MFL) uses complementary information from multiple modalities, yet in wireless edge networks it is restricted by limited energy and frequent missing modalities because many clients store only images or only reports. This study presents Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning (CREEMFL), which applies selective completion and joint communication–computation optimization to reduce training energy under latency and wireless constraints.  Methods  CREEMFL completes part of the incomplete samples by querying a public multimodal subset, and processes the remaining samples through zero padding. Each selected user downloads the global model, performs image-to-text or text-to-image retrieval, conducts local multimodal training, and uploads model updates for aggregation. An energy–delay model couples local computation and wireless communication and treats the required number of global rounds as a function of retrieval ratios. Based on this model, an energy minimization problem is formulated and solved using a two-layer algorithm with an outer search over retrieval ratios and an inner optimization of transmission time, Central Processing Unit (CPU) frequency, and transmit power.  Results and Discussions  Simulations on a single-cell wireless MFL system show that increasing the ratio of completing text from images improves test accuracy and reduces total energy. In contrast, a large ratio of completing images from text provides limited accuracy gain but increases energy consumption (Fig. 3, Fig. 4). Compared with four representative baselines, CREEMFL achieves shorter completion time and lower total energy across a wide range of maximum average transmit powers (Fig. 5, Fig. 6). For CREEMFL, increased system bandwidth further reduces completion time and energy consumption (Fig. 7, Fig. 8). 
Under different user modality compositions, CREEMFL also attains higher test accuracy than local training, zero padding, and cross-modal retrieval without energy optimization (Fig. 9).  Conclusions  CREEMFL integrates selective cross-modal retrieval and joint communication–computation optimization for energy-efficient MFL. By treating retrieval ratios as variables and modeling their effect on global convergence rounds, it captures the coupling between per-round costs and global training progress. Simulations verify that CREEMFL reduces training completion time and total energy while preserving classification accuracy in resource-constrained wireless edge networks.
Finite-time Adaptive Sliding Mode Control of Servo Motors Considering Frictional Nonlinearity and Unknown Loads
ZHANG Tianyu, GUO Qinxia, YANG Tingkai, GUO Xiangji, MING Ming
Available online  , doi: 10.11999/JEIT250521
Abstract:
  Objective  Ultra-fast laser processing with an infinite field of view requires servo motor systems with superior tracking accuracy and robustness. However, such systems are highly nonlinear and affected by coupled unknown load disturbances and complex friction, which constrain the performance of conventional controllers. Although Sliding Mode Control (SMC) exhibits inherent robustness, traditional SMC and observer designs cannot achieve accurate finite-time disturbance compensation under strong nonlinearities, thus limiting high-speed and high-precision trajectory tracking. To address this limitation, a novel finite-time adaptive SMC approach is proposed to ensure rapid and precise angular position tracking within a finite time, satisfying the stringent synchronization requirements of advanced laser processing systems.  Methods  A novel control strategy is developed by integrating an adaptive disturbance observer fused with a Radial Basis Function Neural Network (RBFNN) and finite-time Sliding Mode Control (SMC). First, the unknown load disturbance and complex frictional nonlinear dynamics are combined into a unified "lumped disturbance" term, improving model generality and the ability to represent real operating conditions. Second, a finite-time adaptive disturbance observer is constructed to estimate this lumped disturbance. The observer utilizes the universal approximation capability of the RBFNN to learn and approximate the dynamic characteristics of unknown disturbances online. Simultaneously, a finite-time adaptive law based on the error norm is introduced to update the neural network weights in real time, ensuring rapid and accurate finite-time estimation of the lumped disturbance while reducing dependence on precise model parameters. Based on this design, a finite-time SMC is developed. 
The controller uses the observer’s disturbance estimation as a feedforward compensation term, incorporates a carefully formulated finite-time sliding surface and equivalent control law, and introduces a saturation function to suppress control input chattering. A suitable Lyapunov function is then constructed, and the finite-time stability theory is rigorously applied to prove the practical finite-time convergence of both the adaptive observer and the closed-loop control system, guaranteeing that the system tracking error converges to a bounded neighborhood near the origin within finite time.  Results and Discussions  To verify the effectiveness and superiority of the proposed control strategy, a typical Permanent Magnet Synchronous Motor (PMSM) servo system model is constructed in the MATLAB environment, and a simulation scenario with desired trajectories of varying frequencies is established. The proposed method is comprehensively compared with the widely used Proportional–Integral (PI) control and the advanced method reported in reference [7]. Simulation results demonstrate the following: 1. Tracking performance: Under various reference trajectories, the proposed controller enables the system to accurately follow the target trajectory with a tracking error substantially smaller than that of the PI controller. Compared with the method in reference [7], it achieves smoother responses and smaller residual errors, effectively eliminating the chattering observed in some operating conditions of the latter. 2. Disturbance rejection and robustness: The adaptive disturbance observer based on the RBFNN rapidly and effectively learns and compensates for the lumped disturbance composed of unknown load variations and frictional nonlinearities. Even in the presence of these disturbances, the proposed controller maintains high-precision trajectory tracking, demonstrating strong disturbance rejection and robustness to system parameter variations. 3. 
Control input characteristics: Compared with the reference methods, the control signal of the proposed approach quickly stabilizes after the initial transient phase, effectively suppressing chattering caused by high-frequency switching. The amplitude range of the control input remains reasonable, facilitating practical actuator implementation. 4. Comprehensive evaluation: Based on multiple error performance indices, including Integral Squared Error (ISE), Integral Absolute Error (IAE), Time-weighted Integral Absolute Error (ITAE), and Time-weighted Integral Squared Error (ITSE), the proposed controller consistently outperforms both PI control and the method in reference [7]. It demonstrates comprehensive advantages in suppressing transient errors rapidly and reducing overall error accumulation. The method also improves steady-state accuracy and achieves a balanced response speed with effective noise attenuation. 5. Observer performance: The RBFNN weight norm estimation converges rapidly and stabilizes at a low level after initial adaptation, confirming the effectiveness of the proposed adaptive law and the learning efficiency of the observer.  Conclusions  A finite-time sliding mode control strategy with an adaptive disturbance observer is proposed for servo systems used in ultra-fast laser processing. The method models unknown load disturbances and frictional nonlinearities as a lumped disturbance term. An adaptive observer, integrating an RBF neural network with a finite-time mechanism, accurately estimates this disturbance for real-time compensation. Based on the observer, a finite-time SMC law is formulated, and the practical finite-time stability of the closed-loop system is theoretically proven. Simulations conducted on a permanent magnet synchronous motor platform confirm that the proposed approach achieves superior tracking accuracy, robustness, and control smoothness compared with conventional PI and existing advanced methods. 
This work offers an effective solution for achieving high-precision control in nonlinear systems subject to strong disturbances.
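The four error indices named in the comprehensive evaluation have standard definitions: ISE integrates e², IAE integrates |e|, ITAE integrates t·|e|, and ITSE integrates t·e². The sketch below evaluates them on sampled data with the trapezoidal rule. The function names and the sample error signal are hypothetical and are not taken from the paper.

```python
def trapezoid(values, times):
    """Integrate sampled values over time with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def error_indices(times, errors):
    """Return ISE, IAE, ITAE and ITSE for a sampled tracking error e(t)."""
    return {
        "ISE": trapezoid([e * e for e in errors], times),
        "IAE": trapezoid([abs(e) for e in errors], times),
        "ITAE": trapezoid([t * abs(e) for t, e in zip(times, errors)], times),
        "ITSE": trapezoid([t * e * e for t, e in zip(times, errors)], times),
    }


# Hypothetical example: a constant tracking error of -1 over one second.
sample_times = [i / 10 for i in range(11)]
sample_errors = [-1.0] * 11
indices = error_indices(sample_times, sample_errors)
print(indices)
```

The time-weighted indices (ITAE, ITSE) penalize errors that persist late in the response, which is why they are commonly reported alongside ISE and IAE when comparing settling behavior.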
Breakthrough in Solving NP-Complete Problems Using Electronic Probe Computers
XU Jin, YU Le, YANG Huihui, JI Siyuan, ZHANG Yu, YANG Anqi, LI Quanyou, LI Haisheng, ZHU Enqiang, SHI Xiaolong, WU Pu, SHAO Zehui, LENG Huang, LIU Xiaoqing
Available online  , doi: 10.11999/JEIT250352
Abstract:
This study presents a breakthrough in addressing NP-complete problems using a newly developed Electronic Probe Computer (EPC60). The system employs a hybrid serial–parallel computational model and performs large-scale parallel operations through seven probe operators. In benchmark tests on 3-coloring problems in graphs with 2,000 vertices, EPC60 achieves 100% accuracy, outperforming the mainstream solver Gurobi, which succeeds in only 6% of cases. Computation time is reduced from 15 days to 54 seconds. The system demonstrates high scalability and offers a general-purpose solution for complex optimization problems in areas such as supply chain management, finance, and telecommunications.  Objective   NP-complete problems pose a fundamental challenge in computer science. As problem size increases, the required computational effort grows exponentially, making it infeasible for traditional electronic computers to provide timely solutions. Alternative computational models have been proposed, with biological approaches, particularly DNA computing, demonstrating notable theoretical advances. However, DNA computing systems continue to face major limitations in practical implementation.  Methods  Computational Model: EPC is based on a non-Turing computational model in which data are multidimensional and processed in parallel. Its database comprises four types of graphs, and the probe library includes seven operators, each designed for specific graph operations. By executing parallel probe operations, EPC efficiently addresses NP-complete problems. Structural Features: EPC consists of four subsystems: a conversion system, input system, computation system, and output system. 
The conversion system transforms the target problem into a graph coloring problem; the input system allocates tasks to the computation system; the computation system performs parallel operations via probe computation cards; and the output system maps the solution back to the original problem format. EPC60 features a three-tier hierarchical hardware architecture comprising a control layer, optical routing layer, and probe computation layer. The control layer manages data conversion, format transformation, and task scheduling. The optical routing layer supports high-throughput data transmission, while the probe computation layer conducts large-scale parallel operations using probe computation cards.  Results and Discussions  EPC60 successfully solved 100 instances of the 3-coloring problem for graphs with 2,000 vertices, achieving a 100% success rate. In comparison, the mainstream solver Gurobi succeeded in only 6% of cases. Additionally, EPC60 rapidly solved two 3-coloring problems for graphs with 1,500 and 2,000 vertices, which Gurobi failed to resolve after 15 days of continuous computation on a high-performance workstation. Using an open-source dataset, we identified 1,000 3-colorable graphs with 1,000 vertices and 100 3-colorable graphs with 2,000 vertices. These correspond to theoretical complexities of O(1.3289^n) for both cases. The test results are summarized in Table 1. Currently, EPC60 can directly solve 3-coloring problems for graphs with up to n vertices, with theoretical complexity of at least O(1.3289^n). On April 15, 2023, a scientific and technological achievement appraisal meeting organized by the Chinese Institute of Electronics was held at Beijing Technology and Business University. A panel of ten senior experts conducted a comprehensive technical evaluation and Q&A session. The committee reached the following unanimous conclusions: 1. The probe computer represents an original breakthrough in computational models. 2. 
The system architecture design demonstrates significant innovation. 3. The technical complexity reaches internationally leading levels. 4. It provides a novel approach to solving NP-complete problems. Experts at the appraisal meeting stated, “This is a major breakthrough in computational science achieved by our country, with not only theoretical value but also broad application prospects.” In cybersecurity, EPC60 has also demonstrated remarkable potential. Supported by the National Key R&D Program of China (2019YFA0706400), Professor Xu Jin’s team developed an automated binary vulnerability mining system based on a function call graph model. Evaluation of the system using the Modbus Slave software showed over 95% vulnerability coverage, far exceeding the 75 vulnerabilities detected by conventional depth-first search algorithms. The system also discovered a previously unknown flaw, the “Unauthorized Access Vulnerability in Changyuan Shenrui PRS-7910 Data Gateway” (CNVD-2020-31406), highlighting EPC60’s efficacy in cybersecurity applications. The high efficiency of EPC60 derives from its unique computational model and hardware architecture. Given that all NP-complete problems can be polynomially reduced to one another, EPC60 provides a general-purpose solution framework. It is therefore expected to be applicable in a wide range of domains, including supply chain management, financial services, telecommunications, energy, and manufacturing.  Conclusions   The successful development of EPC offers a novel approach to solving NP-complete problems. As technological capabilities continue to evolve, EPC is expected to demonstrate strong computational performance across a broader range of application domains. Its distinctive computational model and hardware architecture also provide important insights for the design of next-generation computing systems.
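For readers unfamiliar with the benchmark problem, the sketch below shows what a 3-coloring is and how a candidate solution is checked. Finding a coloring is the hard part; verifying one takes time linear in the number of edges, which is the property that places the problem in NP. This is a generic check and says nothing about how EPC60 or Gurobi search for solutions; the function name and the example graph are hypothetical.

```python
def is_valid_3_coloring(edges, coloring):
    """Check that every vertex has a color in {0, 1, 2} and no edge is monochromatic.

    edges: iterable of (u, v) vertex pairs
    coloring: dict mapping each vertex to its color
    """
    if any(color not in (0, 1, 2) for color in coloring.values()):
        return False
    for u, v in edges:
        # Every endpoint must be colored, and the two endpoints must differ.
        if u not in coloring or v not in coloring:
            return False
        if coloring[u] == coloring[v]:
            return False
    return True


# Hypothetical example: a triangle needs all three colors.
triangle = [(0, 1), (1, 2), (0, 2)]
print(is_valid_3_coloring(triangle, {0: 0, 1: 1, 2: 2}))
print(is_valid_3_coloring(triangle, {0: 0, 1: 1, 2: 0}))
```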
Personalized Federated Learning Method Based on Collation Game and Knowledge Distillation
SUN Yanhua, SHI Yahui, LI Meng, YANG Ruizhe, SI Pengbo
Available online  , doi: 10.11999/JEIT221203
Abstract:
To overcome the limitations of Federated Learning (FL) when both the data and the models of the clients are heterogeneous, and to improve accuracy, a personalized Federated learning algorithm with Coalition game and Knowledge distillation (pFedCK) is proposed. First, each client uploads its soft predictions on a public dataset and downloads the k most correlated soft predictions. Then, the Shapley value from coalition game theory is applied to measure the multi-wise influences among clients and to quantify each client's marginal contribution to the personalized learning performance of the others. Finally, each client identifies its optimal coalition, distills the coalition's knowledge into its local model, and trains on its private dataset. The results show that, compared with state-of-the-art algorithms, this approach achieves superior personalized accuracy, with an improvement of about 10%.
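The Shapley value mentioned above averages a player's marginal contribution over every order in which the players could join a coalition. The sketch below computes it exactly by enumerating all orderings, which is only practical for a handful of clients. The value function used here (coalition size squared) is a hypothetical placeholder; the abstract does not specify how pFedCK scores a coalition or whether it computes the value exactly.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    players: list of hashable player identifiers
    value: function mapping a frozenset of players to a real-valued payoff
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical symmetric game: a coalition's payoff is its size squared.
clients = ["client_a", "client_b", "client_c"]
phi = shapley_values(clients, lambda coalition: float(len(coalition) ** 2))
print(phi)
```

Because the enumeration grows factorially with the number of players, practical systems usually restrict the candidate set (for example to the k downloaded predictions) or sample orderings instead.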
The Range-angle Estimation of Target Based on Time-invariant and Spot Beam Optimization
Wei CHU, Yunqing LIU, Wenyug LIU, Xiaolong LI
Available online  , doi: 10.11999/JEIT210265
Abstract:
The use of Frequency Diverse Array and Multiple Input Multiple Output (FDA-MIMO) radar for range-angle estimation of targets has attracted increasing attention. The FDA provides degrees of freedom in both angle and range in the transmit beam pattern. However, its performance is degraded by the periodicity and time variance of the beam pattern. Therefore, an improved Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) algorithm is proposed to estimate the target’s parameters based on a new waveform synthesis model of the Time Modulation and Range Compensation FDA-MIMO (TMRC-FDA-MIMO) radar. Finally, the proposed method is compared with the identical-frequency-increment FDA-MIMO radar system, the logarithmically increased frequency offset FDA-MIMO radar system, and the MUltiple SIgnal Classification (MUSIC) algorithm in terms of the Cramer-Rao lower bound and the root mean square error of range and angle estimation, and its superior performance is verified.
Cryption and Information Security
A Testability Evaluation Method Based on Reconvergent Fan-Out
WU Wenjun, LIANG Huaguo, YOU Chang, DOU Xianrui, XIAO Jiahui, LU Yingchun
Available online  , doi: 10.11999/JEIT251286
Abstract:
  Objective  As the scale and structural complexity of integrated circuits continue to increase, accurate testability evaluation becomes essential for Trojan detection, fault diagnosis, and test-point optimization in modern Design-for-Testability (DFT) flows. Metrics such as controllability, observability, and fault coverage depend on reliable probabilistic modeling of signal propagation. However, existing analytical and learning-based approaches often lose accuracy in circuits with dense Reconvergent Fan-Out (RFO) structures, where strong signal correlation invalidates classical independence assumptions and causes substantial estimation bias. Although several enhanced techniques attempt to incorporate structural information, many have high computational cost or limited scalability in deeper or highly reconvergent logic networks. This work addresses these limitations by proposing a testability evaluation method that incorporates RFO structural characteristics to improve modeling accuracy while maintaining practical computational efficiency.  Methods  The proposed approach starts with a structural analysis algorithm that identifies RFO regions through topological traversal of the circuit. A dedicated RFO recognition mechanism maps each root fan-out node to its corresponding RFO nodes, capturing the structural dependencies that govern correlated signal behavior and providing the basis for accurate probabilistic modeling. Building on this structural extraction, a weighted conditional probability model is formulated to correct testability distortion in reconvergent regions. Unlike previous optimization schemes, the weighting strategy assigns influence-based weights derived from the contribution of each root node to the target node, yielding probability estimates that more accurately reflect actual testability behavior. 
An efficient computational framework is also developed to integrate conditional probability propagation and weight selection into a single topological traversal process, thereby maintaining low algorithmic complexity while improving accuracy.  Results and Discussions  The proposed method is evaluated on representative benchmark circuits from the ISCAS-85, ISCAS-89, ITC’99, and EPFL suites. Performance is assessed in terms of controllability accuracy, ordering consistency, fault coverage estimation, and runtime efficiency. For controllability prediction, the method achieves an average RMSE of 0.0568, which corresponds to an average reduction of 25% relative to existing techniques, as reported in Table 2. Ordering consistency also improves, with the average Spearman correlation coefficient reaching 0.935, outperforming existing techniques. Fault coverage estimation shows similarly strong performance, with an average relative error of 3.64%, which is lower than that of previously reported methods, as shown in Table 1. Runtime analysis further indicates that the proposed framework maintains practical computational efficiency. Across all benchmark circuits, the method achieves an average speedup of 7× while preserving high accuracy, as illustrated in Figure 5.  Conclusions  This work addresses the degradation in testability evaluation accuracy caused by RFO structures in integrated circuits by proposing a reconvergent-fan-out-aware testability analysis method. The presented RFO structure identification algorithm extracts reconvergent information at the topological level and establishes explicit mappings between root nodes and RFO nodes. On this structural basis, a weighted conditional probability model is constructed to mitigate probability distortion induced by signal correlation in RFO regions. An efficient computational framework is further developed to integrate the full computation into a streamlined traversal-based process. 
Experimental results show that the proposed technique achieves low controllability RMSE and high ordering consistency relative to simulation-based ground truth. In testability estimation, the predicted fault coverage values also closely match the simulation results. The method maintains this accuracy with low computational overhead.
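The bias that reconvergent fan-out introduces can be seen on a three-input toy circuit. In the sketch below, input a fans out to an AND gate and an OR gate whose outputs reconverge at a final AND gate, so the two branch signals are correlated. Treating them as independent underestimates the probability that the output is 1. The circuit, the function names, and the uniform input probability of 0.5 are illustrative assumptions; the code does not implement the paper's weighted conditional probability model.

```python
from itertools import product


def circuit(a, b, c):
    """Toy circuit: a fans out to n1 and n2, which reconverge at the output."""
    n1 = a and b
    n2 = a or c
    return n1 and n2


def exact_probability(p=0.5):
    """P(output = 1) by exhaustive enumeration of independent inputs."""
    total = 0.0
    for a, b, c in product([0, 1], repeat=3):
        weight = 1.0
        for bit in (a, b, c):
            weight *= p if bit else (1.0 - p)
        if circuit(a, b, c):
            total += weight
    return total


def independent_estimate(p=0.5):
    """Classical estimate that wrongly treats n1 and n2 as independent."""
    p_n1 = p * p                           # AND of two independent inputs
    p_n2 = 1.0 - (1.0 - p) * (1.0 - p)     # OR of two independent inputs
    return p_n1 * p_n2


print(exact_probability(), independent_estimate())
```

Here the output simplifies to a AND b, so the exact value depends only on those two inputs, while the independence-based estimate multiplies in a spurious factor from the OR branch.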
Aperiodic Total Squared Ambiguity Function: Theoretical Bounds for Binary Sequence Sets and Optimal Constructions
WEI Wenbo, SHEN Bingsheng, YANG Yang, ZHOU Zhengchun
Available online  , doi: 10.11999/JEIT251327
Abstract:
  Objective  In direct-sequence code division multiple access systems, the performance of spreading sequence sets is commonly evaluated using the total squared correlation metric. Traditional metrics such as total squared correlation and aperiodic total squared correlation are applicable only to synchronous communication systems and asynchronous systems with time shifts only, respectively. In modern high-speed mobile and satellite communications, the Doppler effect becomes significant. It causes both time and Doppler shifts in the received signal and leads to severe signal distortion. In communication scenarios that consider only time shift, the one-dimensional correlation function is typically used to measure system interference. However, in high-speed mobile environments the Doppler effect appears during signal transmission. Both time shift and Doppler shift of the sequence must therefore be considered simultaneously. In such cases, the two-dimensional ambiguity function should replace the one-dimensional correlation function. To mitigate Doppler effects, recent studies have focused on the design of Doppler-resilient sequences for mobile channels. Existing work mainly studies theoretical bounds of the ambiguity function, particularly the maximum ambiguity magnitude. Sequence sets are then constructed to achieve or asymptotically approach these bounds. This study instead examines the overall ambiguity function performance of binary sequence sets in asynchronous communication, namely the Aperiodic Total Squared Ambiguity Function (ATSAF). The objectives are as follows. First, the theoretical lower bound for the ATSAF of binary sequence sets is derived. Second, several classes of optimal binary sequence sets that achieve this bound are constructed based on the derived ATSAF bound.  
Methods  The aperiodic time-phase cycling extension matrix ${\boldsymbol{S}}_{a}$ is defined for a binary sequence set $\boldsymbol{S}$ consisting of $K$ sequences of length $L$ to account for both time shifts and Doppler shifts. This definition converts the computation of the ATSAF for the sequence set $\boldsymbol{S}$ into the calculation of the total squared correlation of the matrix ${\boldsymbol{S}}_{a}$. The theoretical lower bounds for the ATSAF of the binary sequence set $\boldsymbol{S}$ are then derived for different combinations of the set size $K$, sequence length $L$, and Doppler shift $V$. To design binary sequence sets that achieve these ATSAF lower bounds, it is first proven that binary aperiodic complementary sets form ATSAF-optimal binary sequence sets. Furthermore, two additional classes of optimal binary sequence sets are constructed using Hadamard matrices and specific sequences. These sets are proven to achieve the theoretical ATSAF lower bound.  Results and Discussions  Existing studies mainly examine the maximum ambiguity magnitude of sequence sets, whereas this study analyzes the overall ambiguity function performance. The one-dimensional aperiodic total squared correlation analysis for asynchronous communication with delay only, studied by Ganapathy et al., is extended to the two-dimensional ATSAF, which considers both time delay and Doppler shift. First, the aperiodic time-phase cycling extension matrix ${\boldsymbol{S}}_{a}$ is defined for a binary sequence set $\boldsymbol{S}$ (Definition 3). 
The theoretical lower bounds for the ATSAF of the binary sequence set $\boldsymbol{S}$ are then derived for different parameters, including set size $K$, sequence length $L$, and Doppler shift $V$ (Theorem 1). When the Doppler shift $V=1$, the derived ATSAF bound reduces to the aperiodic total squared correlation bound. Binary sequence sets that achieve these ATSAF bounds maintain the overall cross-interference energy in the two-dimensional delay-Doppler domain at its theoretical minimum. To construct such sequence sets, it is first proven that binary aperiodic complementary sets are ATSAF-optimal binary sequence sets (Theorem 2). Furthermore, two further classes of ATSAF-optimal binary sequence sets are constructed using Hadamard matrices and specific sequences (Theorems 3 and 4). Finally, an example demonstrates that the sequence set constructed in Theorem 4 is ATSAF-optimal (Example 1).  Conclusions  In high-speed mobile communication scenarios, Doppler effects cause distortion in received signals. By defining the aperiodic time-phase cycling extension matrix ${\boldsymbol{S}}_{a}$ for a binary sequence set $\boldsymbol{S}$, the theoretical lower bound for the ATSAF is derived. This bound specifies the minimum theoretical value of the total energy of the binary sequence set $\boldsymbol{S}$ in the two-dimensional delay-Doppler domain. When Doppler shifts are not considered, the derived ATSAF bound reduces to the aperiodic total squared correlation bound. Furthermore, three classes of ATSAF-optimal binary sequence sets that achieve this theoretical bound are constructed using binary aperiodic complementary sets, Hadamard matrices, and specific sequences. 
These sequence sets maintain the overall cross-interference energy at the theoretical minimum in the two-dimensional delay-Doppler domain.
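As a rough illustration of the quantity discussed in this abstract, the Python sketch below sums the squared magnitude of an aperiodic ambiguity function over all sequence pairs, delays, and Doppler bins. The abstract does not state the formal definition, so the normalization, the Doppler grid (phase step 2π/L with bins v = 0, ..., V-1), and the function names `aperiodic_af` and `atsaf` are assumptions made here, not the authors' definitions.

```python
import numpy as np

def aperiodic_af(a, b, tau, v, grid):
    """Aperiodic cross ambiguity value at delay tau and Doppler bin v.

    Assumed form: sum over the overlapping indices t of
    a[t] * conj(b[t + tau]) * exp(2j*pi*v*t/grid).
    """
    L = len(a)
    # Only indices where both a[t] and b[t + tau] exist contribute.
    if tau >= 0:
        t = np.arange(0, L - tau)
    else:
        t = np.arange(-tau, L)
    phase = np.exp(2j * np.pi * v * t / grid)
    return np.sum(a[t] * np.conj(b[t + tau]) * phase)

def atsaf(S, V):
    """Total squared ambiguity of a sequence set S (K rows, length L).

    Sums |AF|^2 over all ordered pairs (i, j), all delays
    -(L-1)..(L-1), and Doppler bins 0..V-1 (assumed convention).
    """
    S = np.asarray(S, dtype=complex)
    K, L = S.shape
    total = 0.0
    for i in range(K):
        for j in range(K):
            for tau in range(-(L - 1), L):
                for v in range(V):
                    total += abs(aperiodic_af(S[i], S[j], tau, v, L)) ** 2
    return total

# A length-2 binary complementary pair, used only as a small example.
S = np.array([[1, 1], [1, -1]])
print(atsaf(S, 1))  # zero-Doppler bin only
print(atsaf(S, 2))  # two Doppler bins
```

With V = 1 only the zero-Doppler bin remains, so this sketch reduces to the aperiodic total squared correlation, in line with the reduction stated in the abstract. For the pair above, `atsaf(S, 1)` evaluates to 16 under these assumptions. The lower bounds of Theorem 1 are not reproduced here because the abstract does not give them.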
Satellite Navigation
Research on GRI Combination Design of eLORAN System
LIU Shiyao, ZHANG Shougang, HUA Yu
Available online  , doi: 10.11999/JEIT201066
Abstract:
To address Group Repetition Interval (GRI) selection when building supplementary transmission stations for the enhanced LORAN (eLORAN) system, a screening algorithm based on the cross interference rate is proposed, mainly from a mathematical point of view. First, the method takes the requirement of second information into account and, on this basis, performs an initial screening by comparing the mutual Cross Rate Interference (CRI) with the adjacent Loran-C stations in neighboring countries. Second, a further screening is carried out through permutation and pairwise comparison. Third, the optimal GRI combination schemes are given by considering the requirements of data rate and system specification. Finally, in view of the high-precision timing requirements of the new eLORAN system, an optimized selection is made among the multiple optimal combinations. The analysis results show that the average interference rate of the optimal combination scheme obtained by this algorithm is comparable to that between the current navigation chains while also meeting the timing requirements, which provides a reference and a theoretical basis for the construction of a high-precision ground-based timing system.
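The pairwise comparison step can be pictured with a deliberately simplified Python sketch. It treats each chain as a periodic train of pulse groups and reports the fraction of one chain's groups that overlap in time with a group from the other chain over one common repetition period. The model, the function name `overlap_fraction`, the aligned starting epochs, and the integer time units are all assumptions made for illustration. The abstract does not describe the authors' actual CRI computation.

```python
from math import gcd

def overlap_fraction(gri_a, gri_b, group_len):
    """Fraction of chain A pulse groups that overlap a chain B pulse group.

    Simplified model (assumption): both chains start a group at time 0,
    chain A starts groups every gri_a units, chain B every gri_b units,
    and every group occupies group_len units. All values are integers
    in the same arbitrary time unit.
    """
    if not 0 < group_len < min(gri_a, gri_b):
        raise ValueError("group_len must be positive and shorter than both GRIs")
    # The joint pattern repeats after the least common multiple of the GRIs.
    period = gri_a * gri_b // gcd(gri_a, gri_b)
    n_groups = period // gri_a
    hits = 0
    for k in range(n_groups):
        # Time since the most recent chain B group started.
        offset = (k * gri_a) % gri_b
        overlaps_previous = offset < group_len          # B group still in progress
        overlaps_next = gri_b - offset < group_len      # next B group starts too soon
        if overlaps_previous or overlaps_next:
            hits += 1
    return hits / n_groups

print(overlap_fraction(4, 6, 1))  # toy values, not real GRIs
```

In a screening loop, a candidate GRI would be compared in this way against each existing chain and discarded when the overlap fraction exceeds a chosen threshold. The threshold itself is a further assumption not given in the abstract.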