Latest Articles

Articles in press have been peer-reviewed and accepted. They have not yet been assigned to volumes or issues, but are citable by Digital Object Identifier (DOI).
Privacy-Preserving Computation in Trustworthy Face Recognition: A Comprehensive Survey
YUAN Lin, WU Yanshang, ZHANG Liyuan, ZHANG Yushu, WANG Nannan, GAO Xinbo
Available online, doi: 10.11999/JEIT251063
Abstract:
  Significance   With the widespread deployment of face recognition in Cyber-Physical Systems (CPS), including smart cities, intelligent transportation, and public safety infrastructures, privacy leakage has become a central concern for both academia and industry. Unlike many biometric modalities, face recognition operates in highly visible and loosely controlled environments such as public spaces, consumer devices, and online platforms, where facial image acquisition is effortless and pervasive. This exposure makes facial data especially vulnerable to unauthorized collection and misuse. Insufficient protection may lead to identity theft, unauthorized tracking, and deepfake generation, undermining individual rights and eroding trust in digital systems. Consequently, facial data protection is not merely a technical problem but a critical societal and ethical challenge. This work is significant in that it integrates fragmented research efforts across computer vision, cryptography, and privacy-preserving computation, providing a unified perspective to guide the development of trustworthy face recognition ecosystems that balance usability, compliance, and public trust.  Contributions   This paper systematically reviews recent advances in privacy-preserving computation for face recognition, covering both theoretical foundations and practical implementations. It begins by examining the core architecture and application pipeline of face recognition systems, identifying privacy risks at each stage. At the data collection stage, unauthorized or covert capture of facial images introduces immediate risks of misuse. During model training and deployment, gradient leakage, membership inference, and overfitting can expose sensitive information about individuals included in training data. 
At the inference stage, adversaries may reconstruct facial images, perform unauthorized recognition, or link identities across datasets, compromising anonymity. To address these threats, the paper categorizes existing approaches into four major privacy-preserving paradigms: data transformation, distributed collaboration, image generation, and adversarial perturbation. Within these categories, ten representative techniques are analyzed. Cryptographic computation, including homomorphic encryption and secure multiparty computation, enables recognition without revealing raw data but often incurs high computational overhead. Frequency-domain learning transforms images into spectral representations to suppress identifiable details while retaining discriminative features. Federated learning decentralizes training to reduce centralized data exposure, though it remains vulnerable to gradient inversion attacks. Image generation techniques, such as face synthesis and virtual identity modeling, reduce reliance on real facial data for training and testing. Differential privacy introduces calibrated noise to provide statistical privacy guarantees, while face anonymization obscures identifiable traits to protect visual privacy. Template protection and anti-reconstruction mechanisms defend stored features against reverse engineering, and adversarial privacy protection introduces imperceptible perturbations that disrupt machine recognition while preserving human perception. In addition, several representative studies from each category are examined in depth. The commonly used evaluation datasets are summarized, and a comparative analysis is conducted across multiple dimensions, including face recognition performance, privacy protection effectiveness, and practical usability, thereby systematically outlining the strengths and limitations of different types of methods.  Prospects   Looking forward, several research directions are identified. 
A primary challenge is achieving a dynamic balance between privacy protection and system utility, as excessive protection can degrade recognition performance while insufficient safeguards expose users to unacceptable risks. Adaptive mechanisms that adjust privacy levels based on context, task requirements, and user consent are therefore essential. Another promising direction is the development of inherently privacy-aware recognition paradigms, such as representations designed to minimize identity leakage by construction. Equally important is the establishment of standardized evaluation frameworks for privacy risk and usability, enabling reproducible benchmarking and facilitating real-world adoption. The emergence of generative foundation models, including diffusion and large multimodal models, further reshapes the landscape. While such models enable synthetic data generation and controllable identity representations, they also empower more sophisticated attacks such as high-fidelity face reconstruction and impersonation. Addressing these dual effects will require interdisciplinary collaboration spanning computer vision, cryptography, law, and ethics, alongside regulatory support and continuous methodological innovation.  Conclusions  This paper provides a comprehensive reference for researchers and practitioners working on trustworthy face recognition. By integrating advances across multiple disciplines, it aims to promote the development of effective facial privacy protection technologies and support the secure, reliable, and ethically responsible deployment of face recognition in real-world scenarios. Ultimately, the goal is to establish face recognition as a trustworthy component of Cyber-Physical Systems, balancing functionality, privacy, and societal trust.
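As an illustrative aside (not code from any surveyed work), the differential-privacy paradigm mentioned above can be sketched by perturbing a face embedding with Gaussian noise. The function name and the use of the classic analytic Gaussian-mechanism noise scale are our assumptions:

```python
import numpy as np

def dp_perturb_embedding(embedding, epsilon, delta, sensitivity=1.0, seed=None):
    """Gaussian-mechanism sketch: add calibrated noise to a face embedding.

    Noise scale follows the classic bound
    sigma = sensitivity * sqrt(2 * ln(1.25/delta)) / epsilon,
    so smaller epsilon (stronger privacy) means larger noise.
    """
    rng = np.random.default_rng(seed)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return embedding + rng.normal(0.0, sigma, size=embedding.shape)

emb = np.zeros(128)                       # stand-in for a real 128-d embedding
noisy = dp_perturb_embedding(emb, epsilon=1.0, delta=1e-5, seed=0)
print(noisy.shape)  # (128,)
```

The privacy/utility trade-off the survey highlights is visible directly in `sigma`: tightening `epsilon` inflates the noise and degrades recognition accuracy.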
SAR Saturated Interference Suppression Method Guided by Precise Saturation Model
DUAN Lunhao, LU Xingyu, TAN Ke, LIU Yushuang, YANG Jianchao, YU Jing, GU Hong
Available online, doi: 10.11999/JEIT251283
Abstract:
  Objective  With the increasing number of electromagnetic devices, Synthetic Aperture Radar (SAR) is highly vulnerable to Radio Frequency Interference (RFI) in the same frequency band. RFI appears as bright streaks in SAR images, seriously degrading image quality. Researchers have studied interference suppression in depth and proposed many effective methods. However, most methods fail to consider the nonlinear saturation of interfered echoes. In practical scenarios, because interference power is generally high, the gain controller in the SAR receiver struggles to effectively adjust the amplitude of the interfered echoes. This causes the input signal amplitude of the Analog-to-Digital Converter (ADC) to exceed its dynamic range, thus driving the SAR receiver into saturation and eventually leading to nonlinear distortion in the interfered echoes. This phenomenon is commonly observed in SAR systems, with documented cases of receiver saturation in the LuTan-1 satellite and various airborne SAR platforms. Analysis of SAR data further confirms the presence of saturated interference in systems including Sentinel-1, Gaofen-3, and several other spaceborne SAR platforms. Following saturation, the echo spectrum exhibits various spurious components and spectral artifacts, which leads to a mismatch between existing suppression methods and the actual characteristics of saturated interference. Many existing interference suppression methods therefore have difficulty effectively mitigating this type of saturated interference. Moreover, there is currently a lack of accurate models capable of precisely characterizing the output components of saturated interfered echoes. To address these issues, this paper introduces a precise saturated interference analytical model and, based on this model, further proposes an effective saturated interference suppression method.  
Methods  Through the processing of the basic saturation model, this paper first establishes a mathematical model capable of accurately characterizing the output components of saturated interference. The model's accuracy in amplitude and phase characterization is validated through simulation, and a comprehensive analysis is conducted on various output components of the interfered echoes under saturation conditions. Compared with the one-bit sampling model and the traditional tanh saturation model, the proposed model achieves higher accuracy in describing amplitude information. In addition, it is not limited to the sampling bit width of ADCs and can theoretically be extended to the saturation output description of other types of radar receivers. Based on the finding that harmonic phases can be expressed as a linear combination of the phases of the original signal components, and leveraging the high-power characteristic of the interference fundamental harmonic, a saturated interference suppression method is proposed. First, given the relatively high power of the interference fundamental harmonic, it is extracted through eigen-subspace decomposition; then, by leveraging the harmonic phase relationships together with the extracted interference fundamental harmonic and the SAR transmitted signal, the interference harmonics (including higher-order interference harmonics, target harmonics, and intermodulation harmonics) are systematically constructed, thus forming a complete dictionary; finally, a sparse optimization problem is solved to achieve the separation and suppression of saturated interference. The superiority and effectiveness of the proposed method are validated using Gaofen-3 measured data.  Results and Discussions  Experiments are conducted on both simulated and measured data to validate the effectiveness of the proposed method in mitigating saturated interference. 
For the simulated data, the proposed method completely removes interference stripes in the SAR image (Fig. 7). Analysis of the time-frequency spectrum of the processed echoes (Fig. 8 and Fig. 9) shows that traditional methods struggle to eliminate higher-order harmonics. The proposed approach improves the Target-to-Background Ratio (TBR) by 1.76 dB and achieves the lowest Root-Mean-Square Error (RMSE) of 0.0783 (Table 3). For the measured data from Gaofen-3, analysis of the processed images and time-frequency spectra of the echoes confirms the proposed method's effective interference suppression capability, whereas conventional approaches consistently leave residual interference (Fig. 10 and Fig. 11).  Conclusions  With the increasing deployment of electromagnetic devices, SAR has become highly susceptible to in-band interference. High-power interference can easily drive the SAR receiver into saturation, producing nonlinear distortion that renders traditional interference suppression methods ineffective against saturated interference. To address this challenge, this paper establishes a model capable of precisely characterizing the saturated output components of interfered echoes. Based on this model, an interference suppression method capable of effectively dealing with saturated interference is proposed. Simulations and experiments demonstrate that the model accurately characterizes saturation behavior and that the method effectively suppresses saturated interference.
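As an illustrative aside (not the authors' implementation), the eigen-subspace extraction step described in the Methods can be sketched in a simplified form: when the interference fundamental is far stronger than the target echoes and approximately low-rank across pulses, projecting out the dominant singular component removes most of its energy. The rank-1 interference model and all names below are our assumptions:

```python
import numpy as np

def suppress_dominant_interference(echoes, n_components=1):
    """Null the highest-power eigen-subspace of a pulse-by-range echo matrix."""
    U, s, Vh = np.linalg.svd(echoes, full_matrices=False)
    s_sup = s.copy()
    s_sup[:n_components] = 0.0          # zero out the dominant singular values
    return (U * s_sup) @ Vh             # reconstruct without that subspace

rng = np.random.default_rng(1)
# weak target-like clutter plus a high-power rank-1 RFI component
target = 0.1 * (rng.standard_normal((64, 256)) + 1j * rng.standard_normal((64, 256)))
rfi = 10.0 * np.outer(np.exp(1j * 2 * np.pi * 0.10 * np.arange(64)),
                      np.exp(1j * 2 * np.pi * 0.05 * np.arange(256)))
cleaned = suppress_dominant_interference(target + rfi)
print(np.linalg.norm(cleaned) < np.linalg.norm(target + rfi))  # True
```

In the paper's full method this extracted fundamental then seeds a harmonic dictionary for sparse separation; the sketch stops at the subspace step.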
Genetic-Algorithm-Optimized All-Metal Metasurface for Cross-Band Stealth via Low-Cost CNC Fabrication
ZHANG Ming, ZHANG Najiao, LI Jialei, LI Kang, MELIKYAN MELIKYAN, YANG Lin, HOU Weimin
Available online, doi: 10.11999/JEIT251080
Abstract:
  Objective  Traditional electromagnetic stealth materials face the practical challenge of simultaneously achieving both microwave absorption and infrared stealth, while conventional solutions (geometric optimization, multi-layer composite coatings) have drawbacks such as narrowband operation, complex fabrication, and poor cross-band compatibility. This study proposes a genetic algorithm-optimized all-metal random coding metasurface, which enables concurrent broadband Radar Cross Section (RCS) reduction and low infrared emissivity on a monolithic metallic platform, thus addressing the above implementation hurdles.  Methods  We employ monolithic all-metal C-shaped resonant units (based on the Pancharatnam–Berry geometric phase, with reflection phase regulated by rotation angle) and design 2/3/4-bit coding (corresponding to 4/8/16 discrete phase states). A MATLAB-CST co-simulation framework is established (CST extracts unit responses via the Finite Element Method (FEM), while MATLAB uses a genetic algorithm to optimize the phase distribution for scattering energy diffusion). All-metal metasurface prototypes (150×150 mm², 10×10 array) are fabricated via Computer Numerical Control (CNC) cutting.  Results and Discussions  Genetic algorithm optimization converges within 6–8 generations, and increased coding bits enhance phase randomness. The 4-bit metasurface achieves an average 10 dB RCS reduction over 11–18.4 GHz, with consistent simulation and anechoic chamber measurement results under 0–60° oblique incidence. Infrared imaging verifies its low emissivity. Compared with traditional composite/multi-layer structures, the all-metal design simplifies fabrication, avoids interfacial mismatches, and ensures structural stability, exhibiting broadband, wide-angle, and cross-band stealth performance.  
Conclusions  This study presents a genetic algorithm-optimized all-metal random coding metasurface that achieves cross-band stealth compatibility for the first time, overcoming the long-standing challenge of concurrently realizing both microwave performance and thermal management in conventional stealth materials. The work advances the field through three key innovations: 1) The monolithic copper structure enables >99.9% infrared reflectivity (8–14 μm band, via FLIR imaging) and an average 10 dB RCS reduction over 11–18.4 GHz; 2) The single-material design eliminates delamination risks, and the CNC-fabricated prototype maintains structural integrity under 60° oblique incidence, reducing fabrication costs by ~78% compared to lithography; 3) The co-simulation framework converges in 8 generations (for 4-bit coding), enabling 7.4 GHz broadband scattering manipulation. This metasurface combines fabrication reliability, cost-effectiveness, and dual-band performance, laying critical groundwork for large-scale deployment in military stealth systems and satellite platforms where multispectral concealment and durability are paramount.
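As an illustrative aside (not the authors' MATLAB-CST framework), the genetic-algorithm loop for coding metasurfaces can be shown on a toy 1-D analogue: each individual is a sequence of discrete phase codes, and fitness is the peak of a simple FFT array-factor proxy, so minimizing it diffuses scattered energy. All parameters and the fitness proxy are our assumptions:

```python
import numpy as np

def fitness(codes, bits=2):
    """Peak scattered power of a 1-D coded aperture (lower = better diffusion)."""
    phases = 2 * np.pi * codes / (2 ** bits)
    af = np.fft.fft(np.exp(1j * phases), 256)   # zero-padded array factor
    return np.max(np.abs(af))

def ga_optimize(n_units=10, bits=2, pop=30, gens=40, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.integers(0, 2 ** bits, size=(pop, n_units))
    for _ in range(gens):
        f = np.array([fitness(ind, bits) for ind in P])
        P = P[np.argsort(f)]                    # elitist sort: best first
        children = []
        while len(children) < pop // 2:
            a, b = P[rng.integers(0, pop // 2, 2)]   # parents from top half
            cut = rng.integers(1, n_units)
            child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
            if rng.random() < 0.2:                      # occasional mutation
                child[rng.integers(n_units)] = rng.integers(2 ** bits)
            children.append(child)
        P = np.vstack([P[:pop // 2]] + children)
    f = np.array([fitness(ind, bits) for ind in P])
    return P[np.argmin(f)], f.min()

best, score = ga_optimize()
print(best.shape[0], round(float(score), 2))   # best coding sequence and fitness
```

A uniform (all-zero) coding scores exactly 10 on this proxy (all units in phase); the optimized random coding scores strictly lower, mirroring the RCS-reduction principle of the paper.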
Total Coloring on Planar Graphs of Nested n-Pointed Stars
SU Rongjin, FANG Gang, ZHU Enqiang, XU Jin
Available online, doi: 10.11999/JEIT250861
Abstract:
  Objective  Many combinatorial optimization problems can be regarded as graph coloring problems. A classic topic in this field is total coloring, which combines vertex coloring and edge coloring. Previous studies and current research focus on the Total Coloring Conjecture (TCC), proposed in the 1960s. For graphs, including planar graphs, with maximum degree less than six, the correctness of the TCC has been verified through case enumeration. For planar graphs with maximum degree greater than six, the discharging technique has been used to confirm the conjecture by identifying reducible configurations and establishing detailed discharging rules. This method becomes limited when applied to planar graphs with maximum degree exactly six. Only certain restricted classes of graphs have been shown to satisfy the TCC, such as graphs without 4-cycles and graphs without adjacent triangles. More recent work demonstrates that the TCC holds for planar graphs without 4-fan subgraphs and for planar graphs with maximum average degree less than twenty-three fifths. Thus, it remains unclear whether planar graphs with maximum degree six that contain a 4-fan subgraph or have maximum average degree at least twenty-three fifths satisfy the conjecture. To address this question, this paper studies total coloring of a class of planar graphs known as nested n-pointed stars and aims to show that the TCC holds for these graphs.  Methods  The study relies on theoretical methods, including mathematical induction, constructive techniques, and case enumeration. An n-pointed star is obtained by connecting each edge of an n-polygon (n ≥ 3) to a triangle and then joining the triangle vertices not on the polygon to form a new n-polygon. Repeating this operation produces a nested n-pointed star with l layers, denoted by \begin{document}$ G_{n}^{l} $\end{document}. These graphs have maximum degree exactly six. 
Their structural properties, including the presence of 4-fan subgraphs and maximum average degree greater than twenty-three fifths, are established. Induction on the number of layers is then used to show that \begin{document}$ G_{n}^{l} $\end{document} has a total 8-coloring: (1) \begin{document}$ G_{n}^{1} $\end{document} has a total 8-coloring; (2) suppose that \begin{document}$ G_{n}^{l-1} $\end{document} has a total 8-coloring; (3) prove that \begin{document}$ G_{n}^{l} $\end{document} has a total 8-coloring. A graph \begin{document}$ G_{n}^{l} $\end{document} is defined as a type I graph if it has a total 7-coloring. When \begin{document}$ n=3k $\end{document}, constructive arguments show that \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph. The value of \begin{document}$ k $\end{document} is considered in two cases, \begin{document}$ (k=2m-1) $\end{document} and \begin{document}$ (k=2m) $\end{document}. In both cases, a total 7-coloring of \begin{document}$ G_{3k}^{l} $\end{document} is obtained by directly assigning colors to all vertices and edges.  Results and Discussions  Induction on the number of layers of \begin{document}$ G_{n}^{l} $\end{document} shows that nested n-pointed stars satisfy the Total Coloring Conjecture (Fig. 5). Five colors are assigned to the vertices and edges of \begin{document}$ G_{3k}^{1} $\end{document} to obtain a total 5-coloring (Fig. 6(a) and Fig. 8(a)). Two additional colors are then applied alternately to the edges connecting the polygons in layers 1 and 2. This produces a total 7-coloring of \begin{document}$ G_{3k}^{2} $\end{document} (Fig. 7(a) and Fig. 9(a)). After a permutation of the colors, another total 7-coloring of \begin{document}$ G_{3k}^{3} $\end{document} is obtained (Fig. 7(b) and Fig. 9(b)). 
The coloring pattern on the outermost layer is identical to that of \begin{document}$ G_{3k}^{1} $\end{document}, which allows the same extension to construct total 7-colorings for \begin{document}$ G_{3k}^{4},G_{3k}^{5},\cdots ,G_{3k}^{l} $\end{document}. Therefore, \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph.  Conclusions  This study verifies that the Total Coloring Conjecture holds for nested n-pointed stars, which have maximum degree six and contain 4-fan subgraphs. It shows that \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph. A further question arises regarding whether \begin{document}$ G_{n}^{l} $\end{document} is a type I graph when \begin{document}$ n\neq 3k $\end{document}. A total 7-coloring can be constructed when \begin{document}$ n=4 $\end{document} or \begin{document}$ n=5 $\end{document}, and therefore both \begin{document}$ G_{4}^{l} $\end{document} and \begin{document}$ G_{5}^{l} $\end{document} are type I graphs. For other values of \begin{document}$ n\neq 3k $\end{document}, whether \begin{document}$ G_{n}^{l} $\end{document} is a type I graph remains open.
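As an illustrative aside (not from the paper), the three constraints defining a proper total coloring — adjacent vertices get distinct colors, incident edges get distinct colors, and each edge differs from both endpoints — can be stated as a small checker and exercised on the triangle K3, which admits a total 3-coloring (Δ + 1 colors, so K3 is type I):

```python
import itertools

def is_total_coloring(vertices, edges, vcol, ecol):
    """Check the three conditions of a proper total coloring."""
    for u, v in edges:
        if vcol[u] == vcol[v]:
            return False                      # adjacent vertices share a color
        if ecol[frozenset((u, v))] in (vcol[u], vcol[v]):
            return False                      # edge clashes with an endpoint
    for e1, e2 in itertools.combinations(edges, 2):
        if set(e1) & set(e2) and ecol[frozenset(e1)] == ecol[frozenset(e2)]:
            return False                      # incident edges share a color
    return True

# total 3-coloring of K3: each edge takes the color of its opposite vertex
V = [0, 1, 2]
E = [(0, 1), (1, 2), (0, 2)]
vcol = {0: 0, 1: 1, 2: 2}
ecol = {frozenset((0, 1)): 2, frozenset((1, 2)): 0, frozenset((0, 2)): 1}
print(is_total_coloring(V, E, vcol, ecol))  # True
```

The paper's constructions for nested n-pointed stars amount to exhibiting colorings that pass exactly these checks with 7 colors.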
A Class of Double-twisted Generalized Reed-Solomon Codes and Their Extended Codes
CHENG Hongli, ZHU Shixin
Available online, doi: 10.11999/JEIT251045
Abstract:
  Objective  Twisted Generalized Reed-Solomon (TGRS) codes have attracted considerable attention in coding theory due to their flexible structural properties. However, studies on their extended codes remain limited. Existing results indicate that only a small number of works examine extended TGRS codes, leaving gaps in the understanding of their error-correcting capability, duality properties, and applications. In addition, previously proposed parity-check matrix forms for TGRS codes lack clarity and do not cover all parameter ranges. In particular, the case h = 0 is not addressed, which limits applicability in scenarios requiring diverse parameter settings. Constructing non-Generalized Reed-Solomon (non-GRS) codes is of interest because such codes resist Sidelnikov-Shestakov and Wieschebrink attacks, whereas GRS codes are vulnerable. Maximum Distance Separable (MDS) codes, self-orthogonal codes, and almost self-dual codes are valued for their error-correcting efficiency and structural properties. MDS codes achieve the Singleton bound and are essential for distributed storage systems that require data reliability under node failures. Self-orthogonal and almost self-dual codes, due to their duality structures, are applied in quantum coding, secret sharing schemes, and secure multi-party computation. 
Accordingly, this paper aims to: (1) characterize the MDS and Almost MDS (AMDS) properties of double-twisted GRS codes \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and their extended codes \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document}; (2) derive explicit and unified parity-check matrices for all valid parameter ranges, including h = 0; (3) establish non-GRS properties under specific parameter conditions; (4) provide necessary and sufficient conditions for self-orthogonality of the extended codes and almost self-duality of the original codes; and (5) construct a class of almost self-dual double-twisted GRS codes with flexible parameters for secure and reliable communication systems.  Methods   The study is based on algebraic coding theory and finite field methods. Explicit parity-check matrices are derived using properties of polynomial rings over \begin{document}$ {F}_{q} $\end{document}, Vandermonde matrix structures, and polynomial interpolation. The Schur product method is applied to determine non-GRS properties by comparing the dimensions of the Schur squares of the codes and their duals with those of GRS codes. Linear algebra and combinatorial techniques are used to characterize MDS and AMDS properties. Conditions are obtained by analyzing the nonsingularity of generator-matrix submatrices and solving systems involving symmetric sums of finite field elements. These conditions are expressed using the sets \begin{document}$ {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}, \begin{document}$ {L}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}, and \begin{document}$ {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. Duality theory is used to study orthogonality. 
A code C is self-orthogonal if \begin{document}$ C\subseteq {C}^{\bot } $\end{document} and its generator matrix satisfies \begin{document}$ {\boldsymbol{G}}{{\boldsymbol{G}}}^{\rm T}=\boldsymbol{O} $\end{document}. For almost self-dual codes with odd length and dimension (n-1)/2, this condition is combined with the structure of the dual code and symmetric sum relations of \begin{document}$ {\alpha }_{i} $\end{document} to obtain necessary and sufficient conditions.  Results and Discussions   For MDS and AMDS properties, the following results are obtained. The extended double-twisted GRS code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} is MDS if and only if \begin{document}$ 1\notin {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document} and \begin{document}$ 1\notin {L}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. The double-twisted GRS code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} is AMDS if and only if \begin{document}$ 1\in {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document} and \begin{document}$ (0,1)\notin {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. The code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} fails to be AMDS when \begin{document}$ (0,1)\in {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. Unified parity-check matrices of \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} are derived for all \begin{document}$ 0\leq h\leq k-1 $\end{document}, removing previous restrictions that exclude h = 0. 
For non-GRS properties, when \begin{document}$ k\geq 4 $\end{document} and \begin{document}$ n-k\geq 4 $\end{document}, both \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and its extended code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} are non-GRS, in both the case \begin{document}$ 2k\geq n $\end{document} and the case \begin{document}$ 2k \lt n $\end{document}. This conclusion follows from the fact that the dimensions of their Schur squares exceed those of the corresponding GRS codes, which ensures resistance to Sidelnikov-Shestakov and Wieschebrink attacks. Regarding orthogonality, the extended code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} with \begin{document}$ h=k-1 $\end{document} is self-orthogonal under specific algebraic conditions. The code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} with \begin{document}$ h=k-1 $\end{document} and \begin{document}$ n=2k+1 $\end{document} is almost self-dual if and only if there exists \begin{document}$ \lambda \in F_{q}^{*} $\end{document} such that \begin{document}$ \lambda {u}_{j}=v_{j}^{2} (j=1,2,\cdots ,2k+1) $\end{document} together with a symmetric sum condition on \begin{document}$ {\alpha }_{i} $\end{document} involving \begin{document}$ {\eta }_{1} $\end{document} and \begin{document}$ {\eta }_{2} $\end{document}. For odd prime power \begin{document}$ q $\end{document}, an almost self-dual code with parameters \begin{document}$ [q-t-1,(q-t-2)/2,\geq (q-t-2)/2] $\end{document} is constructed using the roots of \begin{document}$ m(x)=({x}^{q}-x)/f(x) $\end{document}, where \begin{document}$ f(x)={x}^{t+1}-x $\end{document}. 
An example over \begin{document}$ {F}_{11} $\end{document} yields a \begin{document}$ [5,2,\geq 2] $\end{document} code.  Conclusions   The study advances the theory of double-twisted GRS codes and their extensions through five contributions: (1) a complete characterization of MDS and AMDS properties using the sets \begin{document}$ {S}_{k} $\end{document}, \begin{document}$ {L}_{k} $\end{document}, and \begin{document}$ {D}_{k} $\end{document}; (2) unified parity-check matrices for all \begin{document}$ 0\leq h\leq k-1 $\end{document}; (3) non-GRS properties for \begin{document}$ k\geq 4 $\end{document} and \begin{document}$ n-k\geq 4 $\end{document}, ensuring resistance to known structural attacks; (4) necessary and sufficient conditions for self-orthogonal extended codes and almost self-dual original codes; and (5) a flexible construction of almost self-dual double-twisted GRS codes. These results extend the theoretical understanding of TGRS-type codes and support the design of secure and reliable coding systems.
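As an illustrative aside (a plain GRS toy over F_11, not the paper's double-twisted construction), the MDS property discussed above — minimum distance meeting the Singleton bound, d = n − k + 1 — can be verified by brute force on a small code:

```python
import itertools
import numpy as np

P = 11  # work over the prime field F_11

def grs_generator(alpha, v, k):
    """k x n generator of GRS_k(alpha, v): entry (i, j) is v_j * alpha_j^i mod p."""
    return np.array([[(vj * pow(aj, i, P)) % P for aj, vj in zip(alpha, v)]
                     for i in range(k)])

def min_distance(G):
    """Exhaustive minimum Hamming weight over all nonzero codewords."""
    k, n = G.shape
    best = n
    for msg in itertools.product(range(P), repeat=k):
        if any(msg):
            cw = np.asarray(msg) @ G % P
            best = min(best, int(np.count_nonzero(cw)))
    return best

alpha, v, k = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 2
G = grs_generator(alpha, v, k)
d = min_distance(G)
print(d == len(alpha) - k + 1)  # True: d = n - k + 1, so the code is MDS
```

The twisted constructions studied in the paper perturb such generator matrices with extra "twist" monomials; the same brute-force check then distinguishes MDS from AMDS behavior on small instances.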
A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents
WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian
Available online, doi: 10.11999/JEIT250818
Abstract:
  Objective   The rapid advancement of remote sensing (RS) technology has fundamentally reshaped the scope of Earth observation research, driving a paradigm shift from static image analysis toward intelligent, goal-oriented cognitive decision-making. Modern RS applications increasingly require systems that can autonomously perceive complex scenes, reason over heterogeneous information sources, decompose high-level objectives into executable subtasks, and make informed decisions under uncertainty. This evolution motivates the concept of remote sensing agents, which extend beyond conventional perception models to encompass reasoning, planning, and interaction capabilities. Despite this growing demand, existing RS datasets remain largely task-centric and fragmented, typically designed for single-purpose supervised learning such as object detection or land-cover classification. These datasets rarely support multimodal reasoning, instruction following, or multi-step decision-making, all of which are essential for agentic workflows. Furthermore, current RS vision-language datasets often suffer from limited scale, narrow modality coverage, and simplistic text annotations, with insufficient inclusion of non-optical data such as Synthetic Aperture Radar (SAR) and infrared imagery. They also lack explicit instruction-driven interactions that mirror real-world human–agent collaboration. To address these limitations, this study constructs a large-scale multimodal image–text instruction dataset explicitly designed for RS agents. The primary objective is to establish a unified data foundation that supports the entire cognitive chain, including perception, reasoning, planning, and decision-making. By enabling models to learn from structured instructions across diverse modalities and task types, the dataset aims to facilitate the development, training, and evaluation of next-generation RS foundation models with genuine agentic capabilities.  
Methods   The dataset construction follows a systematic and extensible framework that integrates multi-source RS imagery with complex, instruction-oriented textual supervision. First, a unified input–output paradigm is defined to ensure compatibility across heterogeneous RS tasks and model architectures. This paradigm explicitly formalizes the interaction between visual inputs and language instructions, allowing models to process not only image pixels and text descriptions, but also structured spatial coordinates, region-level references, and action-oriented outputs. A standardized instruction schema is developed to encode task objectives, constraints, and expected responses in a consistent format. This schema is flexible enough to support diverse task types while remaining sufficiently structured for scalable data generation and automatic validation. The overall methodology comprises three key stages. (1) Data Collection and Integration: Multimodal RS imagery is aggregated from multiple authoritative sources, covering optical, SAR, and infrared modalities with diverse spatial resolutions, scene types, and geographic distributions. (2) Instruction Generation: A hybrid strategy is adopted that combines rule-based templates with Large Language Model (LLM)-assisted refinement. Template-based generation ensures task completeness and structural consistency, while LLM-based rewriting enhances linguistic diversity, naturalness, and instruction complexity. (3) Task Categorization and Organization: The dataset is organized into nine core task categories, spanning low-level perception, mid-level reasoning, and high-level decision-making, with a total of 21 sub-datasets. To ensure high data quality and reliability, a rigorous validation pipeline is implemented. This includes automated syntax and format checking, cross-modal consistency verification, and manual auditing of representative samples to ensure semantic alignment between visual content and textual instructions.  
Results and Discussions   The resulting dataset comprises over 2 million multimodal instruction samples, making it one of the largest and most comprehensive instruction datasets in the RS domain. The integration of optical, SAR, and infrared data enables robust cross-modal learning and supports reasoning across heterogeneous sensing mechanisms. Compared with existing RS datasets, the proposed dataset places greater emphasis on instruction diversity, task compositionality, and agent-oriented interaction, rather than isolated perception objectives. Extensive baseline experiments are conducted using several state-of-the-art multimodal large language models (MLLMs) and RS-specific foundation models. The results demonstrate that the dataset effectively supports evaluation across the full spectrum of agentic capabilities, from visual grounding and reasoning to high-level decision-making. At the same time, the experiments reveal persistent challenges posed by RS data, such as extreme scale variations, dense object distributions, and long-range spatial dependencies. These findings highlight important research directions for improving multimodal reasoning and planning in complex RS environments.  Conclusions   This paper presents a pioneering large-scale multimodal image–text instruction dataset tailored for remote sensing agents. By systematically organizing information across nine core task categories and 21 sub-datasets, it provides a unified and extensible benchmark for agent-centric RS research. The main contributions include: (1) the establishment of a unified multimodal instruction paradigm for RS agents; (2) the construction of a 2-million-sample dataset covering optical, SAR, and infrared modalities; (3) empirical validation of the dataset’s effectiveness in supporting end-to-end agentic workflows from perception to decision-making; and (4) the provision of a comprehensive evaluation benchmark through baseline experiments across all task categories. 
Future work will focus on extending the dataset to temporal and video-based RS scenarios, incorporating dynamic decision-making processes, and further enhancing the reasoning and planning capabilities of RS agents in real-world, time-varying environments.
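The hybrid instruction-generation strategy described above (rule-based templates followed by LLM-assisted refinement and automatic validation) can be sketched in a few lines. The task names, template strings, and record fields below are hypothetical placeholders, not the paper's actual schema; the LLM rewriting and validation steps are only noted in comments.

```python
# Hypothetical sketch of rule-based instruction generation; task names,
# templates, and schema fields are illustrative, not the dataset's format.
TEMPLATES = {
    "detection": "Locate every {category} in the {modality} image and return bounding boxes.",
    "counting": "Count the {category} instances visible in the {modality} scene.",
    "grounding": "Which region of the {modality} image corresponds to '{category}'?",
}

def make_instruction(task: str, category: str, modality: str) -> dict:
    """Fill a task template into a standardized instruction record.
    A real pipeline would rewrite the text with an LLM for linguistic
    diversity, then run automated format and cross-modal consistency checks."""
    return {
        "task": task,
        "modality": modality,
        "instruction": TEMPLATES[task].format(category=category, modality=modality),
    }

record = make_instruction("counting", "aircraft", "SAR")
```

A template pass like this guarantees structural consistency across sub-datasets, which is what makes the later automatic validation tractable.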
Identification of Novel Protein Drug Targets for Respiratory Diseases by Integrating Human Plasma Proteome with Genome
MA Xinqian, NI Wentao
Available online  , doi: 10.11999/JEIT250796
Abstract:
  Objective  Respiratory diseases are a major cause of global morbidity and mortality and place a heavy socioeconomic burden on healthcare systems. Epidemiological data indicate that Chronic Obstructive Pulmonary Disease (COPD), pneumonia, asthma, lung cancer, and tuberculosis are the five most significant pulmonary diseases worldwide. The COronaVIrus Disease 2019 (COVID-19) pandemic has introduced additional challenges for respiratory health and emphasizes the need for new diagnostic and therapeutic strategies. Integrating proteomics with Genome-Wide Association Studies (GWAS) provides a framework for connecting genetic variation to clinical phenotypes. Genetic variants associated with plasma protein levels, known as protein Quantitative Trait Loci (pQTLs), link the genome to complex respiratory phenotypes. This study evaluates the causal effects of druggable proteins on major respiratory diseases through proteome-wide Mendelian Randomization (MR) and colocalization analyses. The aim is to identify causal associations that can guide biomarker development and drug discovery, and to prioritize candidates for therapeutic repurposing.  Methods  Summary-level data for circulating protein levels are obtained from two large pQTL studies: the deCODE study and the UK Biobank Pharma Proteomics Project (UKB-PPP). Strictly defined cis-pQTLs are selected to ensure robust genetic instruments, yielding 2,918 proteins for downstream analyses. For disease outcomes, large GWAS summary statistics for 27 respiratory phenotypes are collected from previously published studies and international consortia. A two-sample MR design is applied to estimate the effects of plasma proteins on these phenotypes. To reduce confounding driven by Linkage Disequilibrium (LD), Bayesian colocalization analysis is used to assess whether genetic signals for protein levels and respiratory outcomes share a causal variant. 
The Posterior Probability of hypothesis 4 (PP4) serves as the primary metric, and PP4 > 0.8 is considered strong evidence of shared causality. Summary-data-based Mendelian Randomization (SMR) and the HEterogeneity In Dependent Instruments (HEIDI) test are used to validate the causal associations. Bidirectional MR and the Steiger test are applied to evaluate potential reverse causality. Protein-Protein Interaction (PPI) networks are generated through the STRING database to visualize functional connectivity and biological pathways associated with the causal proteins.  Results and Discussions  The causal effects of 2,918 plasma proteins on 27 respiratory phenotypes are evaluated (Fig. 1). A total of 694 protein–trait associations meet the Bonferroni-corrected threshold (P < 1.7×10⁻⁵) when cis-instrumental variables are used (Fig. 2). The MR-Egger intercept test identifies 94 protein–disease associations with evidence of directional pleiotropy, which are excluded. Colocalization analysis indicates that 29 protein–phenotype associations show high-confidence evidence of a shared causal variant (PP4 > 0.8), and 39 show medium-level evidence (0.5 < PP4 < 0.8). SMR validation confirms 26 associations (P < 1.72×10⁻³), and 21 pass the HEIDI test (P > 0.05). The findings provide insights into several respiratory diseases. For COPD, five proteins (NRX3A, NRX3B, ERK-1, COMMD1, and PRSS27) are identified as causal. The association between NRXN3 and COPD suggests a genetic connection between nicotine-addiction pathways and chronic lung decline. For asthma, TEF, CASP8, and IL7R show causal evidence, and the robust association between IL7R and asthma suggests that modulation of T-cell homeostasis may provide a therapeutic opportunity. The FUT3_FUT5 complex is uniquely associated with Idiopathic Pulmonary Fibrosis (IPF). CSF3 and LTBP2 are significantly associated with severe COVID-19. 
For lung cancer, subtype-specific causal proteins are identified, including BTN2A1 for squamous cell lung cancer, BTN1A1 for small cell lung carcinoma, and EHBP1 for lung adenocarcinoma. These findings provide a basis for the development of subtype-specific precision therapies.  Conclusions  This study identifies 29 plasma proteins with high-confidence causal associations across major respiratory diseases. Using MR and colocalization, a comprehensive map of molecular drivers of respiratory conditions is generated. These findings may support precision medicine strategies. However, the findings are limited by the focus on European populations and potential heterogeneity arising from different proteomic platforms. The associations are based on computational analysis, and further validation in independent cohorts and animal models is needed. Additional experimental studies and clinical trials are required to clarify the pathogenic roles and biological mechanisms of the identified proteins to support therapeutic innovation in respiratory medicine.
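For a single strong cis-pQTL instrument, the two-sample MR step described above reduces to a Wald ratio with a delta-method standard error. The sketch below uses invented effect sizes purely for illustration, not figures from the study.

```python
import math

def wald_ratio_mr(beta_exposure, se_exposure, beta_outcome, se_outcome):
    """Single-instrument Wald ratio for two-sample MR.

    The causal effect of the exposure (protein level) on the outcome
    (disease) is beta_outcome / beta_exposure; the standard error uses
    the first-order delta-method approximation.
    """
    estimate = beta_outcome / beta_exposure
    var = (se_outcome**2 / beta_exposure**2
           + beta_outcome**2 * se_exposure**2 / beta_exposure**4)
    se = math.sqrt(var)
    z = estimate / se
    # Two-sided p-value from the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p

# Illustrative numbers only (not from the study):
est, se, p = wald_ratio_mr(0.25, 0.01, 0.10, 0.02)
```

In a proteome-wide scan such as this one, each ratio's p-value would then be compared against the Bonferroni-corrected threshold before colocalization.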
Research on Generation and Optimization of Dual-channel High-current Relativistic Electron Beams Based on a Single Magnet
AN Chenxiang, HUO Shaofei, SHI Yanchao, ZHAI Yonggui, XIAO Renzhen, CHEN Changhua, CHEN Kun, HUANG Huijie, SHEN Liuyang, LUO Kaiwen, WANG HongGuang, LI YuQing
Available online  , doi: 10.11999/JEIT250487
Abstract:
  Objective  High-Power Microwave (HPM) technology is a strategic frontier in defense, military, and civilian systems. The microwave output power of a single HPM source has reached a bottleneck imposed by physical limits, material constraints, and fabrication challenges. To address this issue, researchers have proposed HPM power synthesis, which increases peak power by combining the outputs of multiple HPM sources.  Methods  This study addresses the time-synchronization problem in multipath HPM synthesis by designing a dual-channel high-current relativistic electron-beam generator. The device uses one pulsed-power driver to drive two diodes simultaneously and a single coil magnet to confine both electron beams. Three-dimensional particle-in-cell simulations reveal angular nonuniformity of the beam current, and a cathode stalk modification is proposed to improve beam quality; its effectiveness is subsequently validated by experiments.   Results and Discussions  Three-dimensional UNIPIC particle-in-cell simulations of the device’s physical processes show that, owing to side emission from the cathode stalk, the dual electron beams exhibit significant angular nonuniformity: the beam current density is relatively low near the center of the magnetic field and higher in regions farther from the magnetic center. To suppress side emission, the structure of the cathode stalk is modified. The angular current fluctuation of cathode emission in Tube 1 decreases from 35.61% to 2.93%, and that in Tube 2 from 33.17% to 3.13%, markedly improving beam quality. Simulations and experiments show that the device stably generates high-quality electron beams with a voltage of 800 kV and a current of 20 kA, reaching a total power of 16 GW. The current waveform remains stable within the 45 ns voltage half-width without impedance collapse.  
Conclusions  The study provides a reliable basis for generating multipath high-current relativistic electron beams and for synthesizing the power of multiple HPM sources, demonstrating strong application potential.
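As a rough illustration of the angular-uniformity statistic quoted above, the sketch below computes a fluctuation percentage for synthetic azimuthal current-density profiles before and after suppressing a modulation. The definition used here (standard deviation over mean) is an assumption for illustration, not necessarily the paper's metric, and the profiles are invented.

```python
import numpy as np

def angular_fluctuation(j_theta):
    """Fluctuation (%) of a binned azimuthal current-density profile,
    defined here (assumed) as 100 * std / mean."""
    j = np.asarray(j_theta, dtype=float)
    return 100.0 * j.std() / j.mean()

# Illustrative profiles: strong azimuthal modulation from side emission
# versus a near-uniform profile after modifying the cathode stalk.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
before = 1.0 + 0.5 * np.cos(theta)   # side emission biases the profile
after = 1.0 + 0.04 * np.cos(theta)   # suppressed side emission
```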
Spatio-Temporal Constrained Refined Nearest Neighbor Fingerprinting Localization
WANG Yifan, SUN Shunyuan, QIN Ningning
Available online  , doi: 10.11999/JEIT250777
Abstract:
  Objective  Indoor fingerprint-based localization faces three key challenges. First, Dimensionality Reduction (DR), used to reduce storage and computational costs, often disrupts the geometric correlation between signal features and physical space, which reduces mapping accuracy. Second, signal features present temporal variability caused by human movement or environmental changes. During online mapping, this variability introduces bias and distorts similarity between target and reference points in the low-dimensional space. Third, pseudo-neighbor interference persists because environmental noise or imperfect similarity metrics lead to inaccurate neighbor selection and skew position estimates. To address these issues, this study proposes a Spatio-Temporal Constrained Refined Nearest Neighbor (STC-RNL) fingerprinting localization algorithm designed to provide robust, high-accuracy localization under complex interference conditions.  Methods  In the offline phase, a robust DR framework is constructed by integrating two constraints into a MultiDimensional Scaling (MDS) model. A spatial correlation constraint uses physical distances between reference points and assigns stronger associations to proximate locations to preserve alignment between low-dimensional features and the real layout. A temporal consistency constraint clusters multiple temporal signal samples from the same location into a compact region to suppress feature drift. These constraints, combined with the MDS structure-preserving loss, form the optimization objective, from which low-dimensional features and an explicit mapping matrix are obtained. In the online phase, a progressive refinement mechanism is applied. An initial candidate set is selected using a Euclidean distance threshold. 
A hybrid similarity metric is then constructed by enhancing shared-neighbor similarity with a Sigmoid-based strategy, which truncates low and smooths high similarities, and fusing it with Euclidean distance to improve discrimination of true neighbors. Subsequently, an iterative Z-score-based filtering procedure removes reference points that deviate from local group characteristics in feature and coordinate domains. The final position is estimated through a similarity-weighted average over the refined neighbor set, assigning higher weights to more reliable references.  Results and Discussions  The performance of STC-RNL is assessed on a private ITEC dataset and a public SYL dataset. The spatio-temporal constraints enhance the robustness of the mapping matrix under noisy conditions (Table 2). Compared with baseline DR methods, the proposed module reduces mean localization error by at least 6.30% in high-noise scenarios (Fig. 9). In the localization stage, the refined neighbor selection reduces pseudo-neighbor interference. On the ITEC dataset, STC-RNL achieves an average error of 0.959 m, improving performance by 9.61% to 33.68% compared with SSA-XGBoost and SPSO (Table 1). End-to-end comparisons show that STC-RNL reduces the average error by at least 12.42% on ITEC and by at least 7.08% on SYL (Table 2), and its CDF curves demonstrate faster convergence and higher precision, especially within the 1.2 m range (Fig. 10). These results indicate that the algorithm maintains high stability and accuracy with a lower maximum error across datasets.  Conclusions  The STC-RNL algorithm addresses structural distortion and mapping bias found in traditional DR-based localization. By jointly optimizing offline feature embedding with spatio-temporal constraints and online neighbor selection with progressive refinement, the coupling between signal features and physical coordinates is strengthened. 
The main innovation lies in a synergistic framework that ensures only high-confidence neighbors contribute to the final estimate, improving accuracy and robustness in dynamic environments. Experiments show that the model reduces average localization error by 12.42% to 32.80% on ITEC and by 7.08% to 13.67% on SYL relative to baseline algorithms, while achieving faster error convergence. Future research may incorporate nonlinear manifold modeling to further improve performance in heterogeneous access point environments.
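The online refinement stage can be sketched on a toy fingerprint database. The sigmoid slope, neighbor count, and z-score threshold below are illustrative values rather than the paper's tuned parameters, and the full hybrid metric additionally fuses shared-neighbor similarity, which is omitted here.

```python
import numpy as np

def estimate_position(target, fingerprints, coords, k=5, z_max=2.0):
    # 1) Candidate neighbors by Euclidean distance in feature space.
    d = np.linalg.norm(fingerprints - target, axis=1)
    idx = np.argsort(d)[:k]
    # 2) Sigmoid-shaped weighting: truncates weak and saturates strong neighbors.
    sim = 1.0 / (1.0 + np.exp(4.0 * (d[idx] - d[idx].mean())))
    # 3) Z-score filtering removes coordinate outliers among the neighbors.
    c = coords[idx]
    z = np.abs(c - c.mean(axis=0)) / (c.std(axis=0) + 1e-9)
    keep = (z < z_max).all(axis=1)
    sim, c = sim[keep], c[keep]
    # 4) Similarity-weighted average over the refined neighbor set.
    return (sim[:, None] * c).sum(axis=0) / sim.sum()

# Toy database: reference points on a 10 m x 10 m grid, with a feature
# vector that (unrealistically) just repeats the coordinates twice.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([xs.ravel(), ys.ravel()])
fingerprints = np.hstack([coords, coords])
target = np.array([5.2, 5.1])
est = estimate_position(np.hstack([target, target]), fingerprints, coords)
```

On this toy layout the refined weighted average lands close to the true position, which is the behavior the progressive refinement is designed to preserve under noise.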
Improved Related-tweak Attack on Full-round HALFLOOP-48
SUN Xiaomeng, ZHANG Wenying, YUAN Zhaozhong
Available online  , doi: 10.11999/JEIT251014
Abstract:
  Objective  HALFLOOP is a family of tweakable AES-like lightweight block ciphers used to encrypt automatic link establishment messages in fourth-generation high-frequency radio systems. Because the RotateRows and MixColumns operations diffuse differences rapidly, long differentials with high probability are difficult to construct, which limits attacks on the full cipher. This study examines full HALFLOOP-48 and evaluates its resistance to sandwich attacks in the related-tweak setting, a critical method in lightweight-cipher cryptanalysis.  Methods  A new truncated sandwich distinguisher framework is proposed to attack full HALFLOOP-48. The cipher is decomposed into three sub-ciphers, $E_0$, $E_m$, and $E_1$. A model is built by applying an automatic search method based on the Boolean Satisfiability Problem (SAT) to each part: byte-wise models for $E_0$ and $E_1$, and a bit-wise model for $E_m$. For $E_m$, the Affine Subspace Dimensional Reduction (ADR) method is proposed to model large S-boxes with SAT. ADR converts the modeling of a high-dimensional set into two sub-problems over low-dimensional sets. It ensures that the SAT-searched differentials exist and that their probabilities are accurate, while reducing the number of Conjunctive Normal Form (CNF) clauses, and it enables the SAT method to search longer differentials efficiently when large S-boxes appear. To improve probability accuracy in $E_m$, dependencies between $E_0$ and $E_1$ are evaluated across three layers, and their probabilities are multiplied. 
Two key-recovery attacks, a sandwich attack and a rectangle-like sandwich attack, are mounted on the distinguisher in the related-tweak scenario.  Results and Discussions  The SAT-based model reveals a critical weakness in HALFLOOP-48. A practical sandwich distinguisher for the first 8 rounds with probability $2^{-43.415}$ is identified. An optimal truncated sandwich distinguisher for 8-round HALFLOOP-48 with probability $2^{-43.2}$ is then established by exploiting the clustering effect of the identified differentials. Compared with earlier results, this distinguisher is practical and extends the reach by two rounds. Using the 8-round distinguisher, both a sandwich attack and a rectangle-like sandwich attack are mounted on full-round HALFLOOP-48 under related tweaks. The sandwich attack requires a data complexity of $2^{32.8}$, a time complexity of $2^{92.2}$, and a memory complexity of $2^{42.8}$. For the rectangle-like sandwich attack, the data complexity is $2^{16.2}$, with time complexity $2^{99.2}$ and memory complexity $2^{26.2}$. Compared with previous results, these attacks reduce time complexity by a factor of $2^{25.4}$ and memory complexity by a factor of $2^{10}$.  Conclusions  To handle the rapid diffusion of differences in HALFLOOP, a new perspective on sandwich attacks based on truncated differentials is developed by combining byte-wise and bit-wise models. The models for $E_0$ and $E_1$ are byte-wise, and these two parts are extended forward and backward into the middle layer $E_m$, which is modeled bit-wise. 
To model the 8-bit S-box in the bit-wise layer $E_m$ efficiently, an affine subspace dimensional reduction approach is proposed. This model ensures compatibility between the two truncated differential trails and covers as many rounds as possible with high probability. It supports a new 8-round truncated boomerang distinguisher that outperforms previous distinguishers for HALFLOOP-48. Based on this 8-round truncated boomerang distinguisher, a key-recovery attack is achieved with a success probability of 63%. The results show that (1) the ADR method offers an efficient way to handle large S-boxes in lightweight ciphers, (2) the truncated boomerang distinguisher construction can be applied to other AES-like lightweight block ciphers, and (3) HALFLOOP-48 does not provide an adequate security margin for use in the U.S. military standard.
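The differential probabilities that such SAT models search over come from an S-box's Differential Distribution Table (DDT). The sketch below builds the DDT for a small 4-bit S-box (PRESENT's, used here only as a stand-in: HALFLOOP's 8-bit AES S-box has a 256×256 table built the same way).

```python
# Differential distribution table (DDT) of a small S-box.
# The 4-bit S-box below is PRESENT's, chosen for brevity; it is not
# HALFLOOP's S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def ddt(sbox):
    """T[dx][dy] counts inputs x with S(x) ^ S(x ^ dx) == dy."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for dx in range(n):
            dy = sbox[x] ^ sbox[x ^ dx]
            table[dx][dy] += 1
    return table

T = ddt(SBOX)
# The probability of one differential (dx -> dy) is T[dx][dy] / n;
# "best" is the largest count over all nonzero input differences.
best = max(max(row) for row in T[1:])
```

For PRESENT's S-box the best nonzero transition occurs 4 times out of 16, i.e. probability 2⁻², which is the kind of per-S-box probability that gets multiplied along a trail.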
Split-architecture Non-contact Optical Seismocardiography Triggering System for Cardiac Magnetic Resonance Imaging
GAO Qiannan, ZHANG Jiayu, ZHU Yingen, WANG Wenjin, JI Jiansong, JI Xiaoyue
Available online  , doi: 10.11999/JEIT251098
Abstract:
  Objective  Cardiac-cycle synchronization is required in Cardiovascular Magnetic Resonance (CMR) to reduce motion artifacts and maintain quantitative accuracy. At high field strengths, ElectroCardioGram (ECG) triggering is affected by magnetohydrodynamic effects and scanner-related ElectroMagnetic Interference (EMI). Electrode placement and lead routing also increase setup burden. Contact-based mechanical sensors still require skin contact, and optical photoplethysmography can introduce long physiological delay. A fully contactless, EMI-robust mechanical surrogate is therefore needed. This study develops a split-architecture, non-contact optical SeismoCardioGraphy (SCG) triggering system for CMR and evaluates its availability, beatwise detection performance, and timing characteristics under practical body-coil coverage.  Methods  The split-architecture system consists of a near-magnet optical acquisition unit and a far-magnet computation-and-triggering unit connected by fiber-optic links to minimize conductive pathways near the scanner (Fig. 2). The acquisition unit uses a defocused industrial camera and laser illumination to record speckle-pattern dynamics over the anterior chest without physical contact (Fig. 3). Dense optical flow is computed in a chest region of interest, and the displacement field is projected onto a principal motion direction to form a one-dimensional SCG sequence (Fig. 4). Drift suppression, smoothing, and short-window normalization are applied. Trigger timing is refined with a valley-constrained gradient search within a physiologically bounded window to reduce spurious detections and improve temporal consistency (Fig. 4). A benchmark dataset is collected from 20 healthy volunteers under three coil configurations: no body coil, an ultra-flexible body coil, and a rigid body coil (Fig. 5, Fig. 6, Table 3). ECG serves as the reference, and CamPPG and radar are recorded for comparison. 
Beatwise precision, recall, and F1 score are computed against ECG R peaks, and availability is reported as the fraction of usable segments under unified quality criteria (Table 4). Backward and forward physiological delays and delay variability are summarized across subjects and coil conditions (Table 5, Table 6). Key windowing and refractory parameters are assessed for sensitivity (Table 2). Runtime is measured to evaluate real-time feasibility, including the cost of dense optical flow and the overhead of one-dimensional processing and triggering (Table 7).  Results and Discussions  Under no-coil and ultra-flexible-coil conditions, the optical SCG trigger achieves high availability (about 97.6%) and strong beatwise performance. F1 reaches about 0.91 under the ultra-flexible coil (Table 4, Table 5). The backward physiological delay remains on the order of several tens of milliseconds, and delay jitter is generally within a few tens of milliseconds (Table 5, Table 6). Under the rigid body coil, performance decreases sharply. Mechanical decoupling between the coil surface and the chest wall weakens and distorts the vibration signature, which blurs AO-related features and increases false triggers (Fig. 1). This effect appears as lower precision and F1 and as a shift toward longer and more variable delays compared with the other conditions (Table 4, Table 6). Relative to CamPPG, which reflects peripheral blood-volume dynamics and typically lags further behind the ECG R peak, the optical SCG surrogate provides a more proximal mechanical marker with reduced trigger phase lag (Fig. 9, Table 5). EMI robustness is supported by representative segments: ECG waveforms show visible distortion under interference, whereas the optical SCG surrogate remains interpretable because acquisition and transmission near the scanner are fully optical and electrically isolated (Fig. 8). 
Parameter analysis supports a moderate processing window and a 0.5 s minimum interbeat interval as a stable choice across subjects (Table 2). Runtime analysis shows that dense optical flow dominates computational cost, whereas one-dimensional processing and triggering add little overhead. Throughput exceeds the acquisition frame rate, supporting real-time triggering (Table 7).  Conclusions  A split-architecture, non-contact optical SCG triggering system is developed and validated under three representative body-coil configurations. Fiber-optic separation between near-magnet acquisition and far-magnet processing improves EMI robustness while maintaining real-time trigger output. High availability, strong beatwise performance, and short physiological delay are demonstrated under no-coil and ultra-flexible-coil conditions (Table 4, Table 5). Rigid-coil coverage exposes a clear limitation caused by reduced mechanical coupling, which motivates further optimization for mechanically decoupled or heavily occluded scenarios (Fig. 1, Table 6).
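The optical-flow-to-trigger pipeline can be sketched as follows. The principal-direction projection and the 0.5 s minimum interbeat interval follow the description above; the synthetic displacement field, the sampling rate, and all other constants are illustrative.

```python
import numpy as np

def scg_trace(flows):
    """flows: (T, N, 2) per-frame displacement vectors in the chest ROI.
    Average over the ROI, then project onto the principal motion direction
    (leading eigenvector of the 2x2 covariance of the mean flow)."""
    mean_flow = flows.mean(axis=1)                  # (T, 2)
    centered = mean_flow - mean_flow.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(centered.T))
    return centered @ vecs[:, -1]

def detect_triggers(x, fs, min_ibi=0.5):
    """Simple local-maximum trigger with a refractory (minimum interbeat)
    interval; the paper's valley-constrained gradient search is richer."""
    refractory = int(min_ibi * fs)
    peaks, last = [], -refractory
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1] and x[i] > 0 and i - last >= refractory:
            peaks.append(i)
            last = i
    return peaks

# Synthetic chest motion: a 1 Hz vertical oscillation as a heartbeat stand-in.
fs = 100
t = np.arange(0, 5, 1 / fs)
beat = np.sin(2 * np.pi * 1.0 * t)
flows = np.zeros((len(t), 8, 2))
flows[:, :, 1] = beat[:, None]
trace = scg_trace(flows)
trig = detect_triggers(trace, fs)
```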
Construction of Maximum Distance Separable Codes and Near Maximum Distance Separable Codes Based on Cyclic Subgroup of $\mathbb{F}_{q^2}^{*}$
DU Xiaoni, XUE Jing, QIAO Xingbin, ZHAO Ziwei
Available online  , doi: 10.11999/JEIT251204
Abstract:
  Objective  The demand for higher performance and efficiency in error-correcting codes has increased with the rapid development of modern communication technologies. These codes detect and correct transmission errors. Because of their algebraic structure, straightforward encoding and decoding, and ease of implementation, linear codes are widely used in communication systems. Their parameters follow classical bounds such as the Singleton bound: for a linear code with length $n$ and dimension $k$, the minimum distance $d$ satisfies $d \leq n-k+1$. When $d = n-k+1$, the code is a Maximum Distance Separable (MDS) code. MDS codes are applied in distributed storage systems and random error channels. If $d = n-k$, the code is Almost MDS (AMDS); when both a code and its dual are AMDS, the code is Near MDS (NMDS). NMDS codes have geometric properties that are useful in cryptography and combinatorics. Extensive research has focused on constructing structurally simple, high-performance MDS and NMDS codes. This paper constructs several families of MDS and NMDS codes of length $q+3$ over the finite field $\mathbb{F}_{q^2}$ of even characteristic using the cyclic subgroup $U_{q+1}$. Several families of optimal Locally Repairable Codes (LRCs) are also obtained. LRCs support efficient failure recovery by accessing a small set of local nodes, which reduces repair overhead and improves system availability in distributed and cloud-storage settings.  Methods  In 2021, Wang et al. constructed NMDS codes of dimension 3 using elliptic curves over $\mathbb{F}_q$. In 2023, Heng et al. 
obtained several classes of dimension-4 NMDS codes by appending appropriate column vectors to a base generator matrix. In 2024, Ding et al. presented four classes of dimension-4 NMDS codes, determined the locality of their dual codes, and constructed four classes of distance-optimal and dimension-optimal LRCs. Building on these works, this paper uses the unit circle $U_{q+1}$ in $\mathbb{F}_{q^2}$ and elliptic curves to construct generator matrices. By augmenting these matrices with two additional column vectors, several classes of MDS and NMDS codes of length $q+3$ are obtained. The locality of the constructed NMDS codes is also determined, yielding several classes of optimal LRCs.  Results and Discussions  In 2023, Heng et al. constructed generator matrices with second-row entries in $\mathbb{F}_q^{*}$ and with the remaining entries given by nonconsecutive powers of the second-row elements. In 2025, Yin et al. extended this approach by constructing generator matrices using elements of $U_{q+1}$ and obtained infinite families of MDS and NMDS codes. Following this direction, the present study expands these matrices by appending two column vectors whose elements lie in $\mathbb{F}_{q^2}$. The resulting matrices generate several classes of MDS and NMDS codes of length $q+3$. Several classes of NMDS codes with identical parameters but different weight distributions are also obtained. Computing the minimum locality of the constructed NMDS codes shows that some are optimal LRCs satisfying the Singleton-like, Cadambe–Mazumdar, Plotkin-like, and Griesmer-like bounds. All constructed MDS codes are Griesmer codes, and the NMDS codes are near-Griesmer codes. 
These results show that the proposed constructions are more general and unified than earlier approaches.  Conclusions  This paper constructs several families of MDS and NMDS codes of length $q+3$ over $\mathbb{F}_{q^2}$ using elements of the unit circle $U_{q+1}$ and oval polynomials, and by appending two additional column vectors with entries in $\mathbb{F}_q$. The minimum locality of the constructed NMDS codes is analyzed, and some of these codes are shown to be optimal LRCs. The framework generalizes earlier constructions, and the resulting codes are optimal or near-optimal with respect to the Griesmer bound.
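The MDS condition $d = n-k+1$ can be checked by brute force on a toy example. The sketch below uses a Reed-Solomon-style code over the prime field GF(7), since the paper's codes over $\mathbb{F}_{q^2}$ with even $q$ require extension-field arithmetic beyond a short illustration; the parameters are illustrative only.

```python
import itertools
import numpy as np

# Toy [n, k] Reed-Solomon-style code over GF(7): rows of G are powers
# of distinct evaluation points, i.e. polynomial evaluation of degree < k.
q, n, k = 7, 6, 3
alphas = np.arange(1, n + 1) % q                 # distinct points in GF(7)
G = np.array([[pow(int(a), i, q) for a in alphas] for i in range(k)])

def min_distance(G, q):
    """Brute-force minimum Hamming weight over all nonzero codewords."""
    k, n = G.shape
    d = n
    for msg in itertools.product(range(q), repeat=k):
        if any(msg):
            c = np.mod(np.array(msg) @ G, q)
            d = min(d, int(np.count_nonzero(c)))
    return d

d = min_distance(G, q)   # MDS iff d == n - k + 1 (Singleton bound met)
```

A nonzero polynomial of degree below k has at most k−1 roots, so every nonzero codeword has weight at least n−k+1, which the exhaustive check confirms.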
A Miniaturized Steady-State Visual Evoked Potential Brain-Computer Interface System
CAI Yu, WANG Junyang, JIANG Chuanli, LUO Ruixin, LÜ Zhengchao, YU Haiqing, HUANG Yongzhi, ZHONG Ziping, XU Minpeng
Available online  , doi: 10.11999/JEIT251223
Abstract:
  Objective  The practical use of Brain-Computer Interface (BCI) systems in daily settings is limited by bulky acquisition hardware and the cables required for stable performance. Although portable systems exist, achieving compact hardware, full mobility, and high decoding performance at the same time remains difficult. This study aims to design, implement, and validate a wearable Steady-State Visual Evoked Potential (SSVEP) BCI system. The goal is to create an integrated system with ultra-miniaturized and concealable acquisition hardware and a stable cable-free architecture, and to show that this approach provides online performance comparable with laboratory systems.  Methods  A system-level solution was developed based on a distributed architecture to support wearability and hardware simplification. The core component is an ultra-miniaturized acquisition node. Each node functions as an independent EEG acquisition unit and integrates a Bluetooth Low Energy (BLE) system-on-chip (CC2640R2F), a high-precision analog-to-digital converter (ADS1291), a battery, and an electrode in one encapsulated module. Through an optimized 6-layer PCB design and stacked assembly, the module size was reduced to 15.12 mm × 14.08 mm × 14.31 mm (3.05 cm³) with a weight of 3.7 g. Each node uses one active electrode, and all nodes share a common reference electrode connected by a thin short wire. This structure reduces scalp connections and allows concealed placement in hair using a hair-clip form factor. Multiple nodes form a star network coordinated by a master device that manages communication with a stimulus computer. A cable-free synchronization strategy was implemented to handle timing uncertainties in distributed wireless operation. Hardware-event detection and software-based clock management were combined to align stimulus markers with multi-channel EEG data without dedicated synchronization cables. 
The master device coordinates this process and streams synchronized data to the computer for real-time processing. System evaluation was conducted in two phases. Foundational performance metrics included physical characteristics, electrical parameters (input-referred noise: 3.91 mVpp; common-mode rejection ratio: 132.99 dB), and synchronization accuracy under different network scales. Application-level performance was assessed using a 40-command online SSVEP spelling task with six subjects in an unshielded room with common RF interference. Four nodes were placed at Pz, PO3, PO4, and Oz. EEG epochs (0.14~3.14 s post-stimulus) were analyzed using Canonical Correlation Analysis (CCA) and ensemble Task-Related Component Analysis (e-TRCA) to compute recognition accuracy and Information Transfer Rate (ITR).  Results and Discussions  The system met its design objectives. Each acquisition node achieved an ultra-compact form factor (3.05 cm³, 3.7 g) suitable for concealed wear and provided more than 5 hours of battery life at a 1 000 Hz sampling rate. Electrical performance supported high-quality SSVEP acquisition. The cable-free synchronization strategy ensured stable operation. More than 95% of event markers aligned with the EEG stream with less than 1 ms error (Fig. 4), meeting SSVEP-BCI requirements. This stability supported the quality of recorded neural signals. Grand-averaged SSVEP responses showed clear and stable waveforms with precise phase alignment (Fig. 5). The signal-to-noise ratio at the fundamental stimulation frequency exceeded 10 dB for all 40 commands (Fig. 6). In the online spelling experiment, the system showed strong decoding performance. With the e-TRCA algorithm and a 3-s window, the average accuracy was (95.00 ± 2.04)%. The system reached a peak ITR of (147.24 ± 30.52) bits/min with a 0.4-s data length (Fig. 7). 
Comparison with existing SSVEP-BCI systems (Table 1) indicates that, despite constraints of miniaturization, cable-free use, and four channels, the system achieved accuracy comparable with several cable-dependent laboratory systems while offering improved wearability.  Conclusions  This work presents a wearable SSVEP-BCI system that integrates ultra-miniaturized hardware with a distributed cable-free architecture. The results show that coordinated hardware and system design can overcome tradeoffs between device size, user mobility, and decoding capability. The acquisition node (3.7 g, 3.05 cm3) supports concealable wearability, and the synchronization strategy provides reliable cable-free operation. In a realistic environment, the system produced online performance comparable with many cable-dependent setups, achieving 95.00% accuracy and a peak ITR of 147.24 bits/min in a 40-target task. Therefore, this study provides a practical system-level solution that supports progress toward wearable high-performance BCIs.
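As context for the accuracy and ITR figures reported above, the Wolpaw formula commonly used to score SSVEP spellers can be sketched as follows. This is a generic illustration, not the authors' code; the selection time passed in must include stimulation window plus any gaze-shift interval, which is why a back-of-envelope number differs from the reported peak of 147.24 bits/min at a 0.4 s data length.

```python
import math

def itr_bits_per_min(n_targets: int, accuracy: float, selection_time_s: float) -> float:
    """Wolpaw ITR for an N-target BCI speller.

    bits per selection = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1)),
    scaled by selections per minute (60 / T). Returns 0 at or below chance.
    """
    p = accuracy
    if p <= 1.0 / n_targets:
        return 0.0
    bits = math.log2(n_targets)
    if p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n_targets - 1))
    return bits * 60.0 / selection_time_s

# 40-target speller at 95% accuracy with an assumed 3 s total selection time.
print(round(itr_bits_per_min(40, 0.95, 3.0), 1))
```

Shortening the selection time raises ITR sharply, which is the trade-off behind reporting a peak ITR at a 0.4 s data length despite lower per-selection accuracy there.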
Wavelet Transform and Attentional Dual-Path EEG Model for Virtual Reality Motion Sickness Detection
CHEN Yuechi, HUA Chengcheng, DAI Zhian, FU Jingqi, ZHU Min, WANG Qiuyu, YAN Ying, LIU Jia
Available online  , doi: 10.11999/JEIT251233
Abstract:
  Objective  Virtual Reality Motion Sickness (VRMS) presents a barrier to the wider adoption of immersive Virtual Reality (VR). It is primarily caused by sensory conflict between the vestibular and visual systems. Existing assessments rely on subjective reports that disrupt immersion and do not provide real-time measurements. An objective detection method is therefore needed. This study proposes a dual-path fusion model, the Wavelet Transform ATtentional Network (WTATNet), which integrates wavelet transform and attention mechanisms. WTATNet is designed to classify resting-state ElectroEncephaloGraph (EEG) signals collected before and after VR motion stimulus exposure to support VRMS detection and research on the mechanisms and mitigation strategies.  Methods  WTATNet contains two parallel pathways for EEG feature extraction. The first applies a Two-Dimensional Discrete Wavelet Transform (2D-DWT) to both the time and electrode dimensions of the EEG, reshaping the signal into a two-dimensional matrix based on the spatial layout of the scalp electrodes in horizontal or vertical form. This decomposition captures multi-scale spatiotemporal features, which are then processed using Convolutional Neural Network (CNN) layers. The second pathway applies a one-dimensional CNN for initial filtering followed by a dual-attention structure consisting of a channel attention module and an electrode attention module. These modules recalibrate the importance of features across channels and electrodes to emphasize task-relevant information. Features from both pathways are fused and passed through fully connected layers to classify EEGs into pre-exposure (non-VRMS) and post-exposure (VRMS) states based on subjective questionnaire validation. EEG data were collected from 22 subjects exposed to VRMS using the game “Ultrawings2.” Ten-fold cross-validation was used for training and evaluation with accuracy, precision, recall, and F1-score as metrics.  
Results and Discussions  WTATNet achieved high VRMS-related EEG classification performance, with an average accuracy of 98.39%, F1-score of 98.39%, precision of 98.38%, and recall of 98.40%. It outperformed classical and state-of-the-art EEG models, including ShallowConvNet, EEGNet, Conformer, and FBCNet (Table 2). Ablation experiments (Tables 3 and 4) showed that removing the wavelet transform path, the electrode attention module, or the channel attention module reduced accuracy by 1.78%, 1.36%, and 1.01%, respectively. The 2D-DWT performed better than the one-dimensional DWT, supporting the value of joint spatiotemporal analysis. Experiments with randomized electrode ordering (Table 4) produced lower accuracy than spatially coherent layouts, indicating that 2D-DWT leverages inherent spatial correlations among electrodes. Feature visualizations using t-SNE (Figures 5 and 6) showed that WTATNet produced more discriminative features than baseline and ablated variants.  Conclusions  The dual-path WTATNet model integrates wavelet transform and attention mechanisms to achieve accurate VRMS detection using resting-state EEG. Its design combines interpretable, multi-scale spatiotemporal features from 2D-DWT with adaptive channel-level and electrode-level weighting. The experimental results confirm state-of-the-art performance and show that WTATNet offers an objective, robust, and non-intrusive VRMS detection method. It provides a technical foundation for studies on VRMS neural mechanisms and countermeasure development. WTATNet also shows potential for generalization to other EEG decoding tasks in neuroscience and clinical research.
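WTATNet's first pathway applies a 2D-DWT jointly over the electrode and time dimensions of the EEG matrix. As a minimal self-contained sketch (the abstract does not name the wavelet basis, so a single-level Haar decomposition is used here as a stand-in):

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """Single-level 2D Haar DWT of a (electrodes, time) EEG matrix.

    Returns the approximation (LL) and detail (LH, HL, HH) sub-bands,
    each half the input size along both axes. The orthonormal 1/sqrt(2)
    scaling preserves total signal energy across the four sub-bands.
    """
    a = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)   # low-pass over electrodes
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)   # high-pass over electrodes
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)  # low-pass over time
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

eeg = np.random.randn(8, 256)   # 8 electrodes x 256 time samples
ll, lh, hl, hh = haar_dwt2(eeg)
print(ll.shape)                 # (4, 128)
```

The randomized-electrode-ordering ablation makes sense in this light: the electrode-axis filtering above only captures spatial structure if adjacent rows correspond to spatially adjacent scalp positions.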
Performance Analysis and Rapid Prediction of Long-range Underwater Acoustic Communications in Uncertain Deep-sea Environments
CHEN Xiangmei, TAI Yupeng, WANG Haibin, HU Chenghao, WANG Jun, WANG Diya
Available online  , doi: 10.11999/JEIT251244
Abstract:
  Objective  In complex and dynamically changing deep-sea environments, the performance of underwater acoustic communications shows substantial variability. Feedback-based channel estimation and parameter adaptation are impractical in long-range scenarios because platform constraints prevent reliable feedback channels and the slow propagation of sound introduces significant delay. In typical long-range systems, environmental dynamics are often ignored and communication parameters are selected heuristically, which frequently leads to mismatches with actual channel conditions and causes communication failures or reduced efficiency. Predictive methods able to assess performance in advance and support feed-forward parameter adjustment are therefore required. This study proposes a deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications under uncertain environmental conditions to enable efficient and reliable parameter–channel matching without feedback.  Methods  A feed-forward method for underwater acoustic communication performance analysis and rapid prediction is developed using deep-learning-based sound-field uncertainty estimation. A neural network is first used to estimate probability distributions of Transmission Loss (TL PDFs) at the receiver under dynamic environments. TL PDFs are then mapped to probability distributions of the Signal-to-Noise Ratio (SNR PDFs), enabling communication performance evaluation without real-time feedback. Statistical channel capacity and outage capacity are analyzed to characterize the theoretical upper limits of achievable rates in dynamic conditions. Finally, by integrating the SNR distribution with the bit-error-rate characteristics of a representative deep-sea single-carrier communication system under the corresponding channel, a rate–reliability prediction model is constructed. 
This model estimates the probability of reliable communication at different data rates and serves as a practical tool for forecasting link performance in highly dynamic and feedback-limited underwater acoustic environments.  Results and Discussions  The method is validated using simulation data and sea trial data. The TL PDFs predicted by the deep learning model show strong consistency with the traditional Monte Carlo (MC) method across multiple receiver locations (Fig. 6). Under identical computational settings, deep-learning-based TL PDF prediction reduces computation time by 2 to 3 orders of magnitude compared with the MC method. The chained mapping from TL PDFs to SNR PDFs and then to channel capacity metrics accurately represents the probabilistic features of communication performance under uncertain conditions (Fig. 7 and Fig. 8). The rate–reliability curves derived from the deep-learning-based TL PDFs are highly consistent with MC-based results. In the high sound-intensity region, prediction errors for reliable communication probabilities across data rates range from 0.1% to 3%, and in the low sound-intensity region errors are approximately 0.3% to 5% (Fig. 12). Sea trial results further indicate that predicted rate–reliability performance agrees well with measured data. In the convergence zone, deviations between predicted and measured reliability probabilities at each rate range from 0.9% to 4%, and in the shadow zone from 1% to 9% (Fig. 18). Under a 90% reliability requirement, the maximum achievable rates predicted by the method match the measurements in both the convergence and shadow zones, demonstrating accuracy and practical applicability in complex channel environments.  Conclusions  A deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications in uncertain deep-sea environments is developed and validated.
The framework builds a chained mapping from environmental parameters to TL PDFs, SNR PDFs, and communication performance metrics, enabling quantitative capacity assessment under dynamic ocean conditions. Predictive “rate–reliability” profiles are obtained by integrating probabilistic propagation characteristics with the performance of a representative deep-sea single-carrier system under the corresponding channel, providing guidance for parameter selection without feedback. Sea trial results confirm strong agreement between predicted and measured performance. The proposed approach offers a technical pathway for feed-forward performance analysis and dynamic adaptation in long-range deep-sea communication systems, and can be extended to other communication scenarios in dynamic ocean environments.
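The chained mapping from TL distribution to SNR distribution to a rate–reliability profile can be sketched in a few lines. Everything numeric here (the TL distribution's parameters, the source level SL, noise level NL, and bandwidth B) is an illustrative placeholder, not a value from the paper; the sketch only shows the structure of the probabilistic pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples standing in for a predicted TL distribution at the receiver (dB).
tl_samples = rng.normal(loc=95.0, scale=4.0, size=100_000)

SL, NL = 185.0, 60.0             # source / noise levels in dB (illustrative)
snr_db = SL - tl_samples - NL    # sonar-equation style TL -> SNR mapping
snr_lin = 10.0 ** (snr_db / 10.0)

B = 100.0                                  # bandwidth in Hz (illustrative)
capacity = B * np.log2(1.0 + snr_lin)      # Shannon capacity per TL sample, bit/s

def reliability(rate_bps: float) -> float:
    """Probability that the uncertain channel supports rate_bps: P(C >= R)."""
    return float(np.mean(capacity >= rate_bps))

print(reliability(500.0))
```

Sweeping `reliability` over candidate rates yields exactly the kind of rate–reliability curve the abstract describes, and inverting it under a 90% reliability requirement gives the maximum feed-forward rate.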
Privacy-Preserving Federated Weakly-Supervised Learning for Cancer Subtyping on Histopathology Images
WANG Yumeng, LIU Zhenbing, LIU Zaiyi
Available online  , doi: 10.11999/JEIT250842
Abstract:
  Objective  Data-driven deep learning methods are widely applied to cancer subtyping, yet their performance depends on large training datasets with fine-grained annotations. For gigapixel Whole Slide Images (WSI), such annotations are labor-intensive and costly. Clinical data are typically stored in isolated data silos, and sharing procedures raise privacy concerns. Federated Learning (FL) enables a global model to be trained from data distributed across multiple medical centers without transmitting local data. However, in conventional FL, substantial heterogeneity across centers reduces the performance and stability of the global model.  Methods  A privacy-preserving FL method is proposed for gigapixel WSI in computational pathology. Weakly supervised attention-based Multiple Instance Learning (MIL) is integrated with differential privacy to support training when only slide-level labels are available. Within each client, a multi-scale attention-based MIL method is used to conduct local training on histopathology WSIs, reducing the need for costly pixel-level annotation through a weakly supervised setting. During the federated update, local differential privacy is applied to limit the risk of sensitive information leakage. Random noise drawn from a Gaussian or Laplace distribution is added to model parameters after each client’s local training. Furthermore, a federated adaptive reweighting strategy is introduced to address the heterogeneity of pathological images across clients by dynamically balancing the influence of local data quantity and quality on each client’s aggregation weight.  Results and Discussions  The proposed FL framework is evaluated on two clinical diagnostic tasks: Non-small Cell Lung Cancer (NSCLC) histologic subtyping and Breast Invasive Carcinoma (BRCA) histologic subtyping. As shown in Table 1, Table 2, and Fig. 4, the proposed FL method (Ours with DP and Ours w/o DP) achieves higher accuracy and stronger generalization than localized models and other FL approaches. Its classification performance remains competitive even when compared with the centralized model (Fig. 3). These results indicate that privacy-preserving FL is a feasible and effective strategy for multicenter histopathology images and may reduce the performance degradation typically caused by data heterogeneity across centers. When the magnitude of added noise is controlled within a limited range, stable classification can also be achieved (Table 3). The two main components, the multiscale representation attention network and the federated adaptive reweighting strategy, each contribute to consistent performance improvement (Table 4). In addition, the proposed FL method maintains stable classification performance across different hyperparameter settings (Table 5, Table 6), confirming its robustness.  Conclusions  The proposed FL method addresses two central challenges in multicenter computational pathology: the presence of data silos and concerns over privacy. It also alleviates the performance degradation caused by inter-center data heterogeneity. As balancing model accuracy with privacy protection remains a key challenge, future work will focus on developing methods that preserve privacy while sustaining stable classification performance.
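The two federated components described above, local DP noise on uploaded parameters and quantity/quality-aware reweighted aggregation, can be sketched in numpy. The noise scale `sigma`, the quality scores, and the blending coefficient `alpha` are hypothetical placeholders; the paper's actual calibration of the privacy budget and its exact reweighting rule are not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(42)

def perturb(params: np.ndarray, sigma: float) -> np.ndarray:
    """Local DP step: add Gaussian noise to parameters before upload."""
    return params + rng.normal(0.0, sigma, size=params.shape)

def aggregate(updates, n_samples, quality, alpha=0.5):
    """Reweighted FedAvg: each client's weight blends its data quantity
    with a quality score; weights are normalized to sum to 1."""
    n = np.asarray(n_samples, dtype=float)
    q = np.asarray(quality, dtype=float)
    w = alpha * n / n.sum() + (1.0 - alpha) * q / q.sum()
    w = w / w.sum()
    return sum(wi * ui for wi, ui in zip(w, updates))

true_params = np.ones(10)
clients = [perturb(true_params, sigma=0.05) for _ in range(5)]
global_params = aggregate(clients, n_samples=[100, 400, 250, 50, 200],
                          quality=[0.9, 0.8, 0.95, 0.6, 0.85])
print(np.round(global_params.mean(), 2))
```

The Table 3 observation follows directly from this structure: as `sigma` grows, the averaged noise term eventually dominates the signal, so stability requires keeping the noise magnitude within a limited range.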
Low-Complexity Joint Estimation Algorithm for Carrier Frequency Offset and Sampling Frequency Offset in 5G-NTN Low Earth Orbit Satellite Communications
GONG Xianfeng, LI Ying, LIU Mingyang, ZHAI Shenghua
Available online  , doi: 10.11999/JEIT251086
Abstract:
  Objective   The Doppler effect is a major impairment in Low Earth Orbit (LEO) satellite communications within 5G Non-Terrestrial Networks (5G-NTN). It introduces Carrier Frequency Offset (CFO), Sampling Frequency Offset (SFO), and Inter-Subcarrier Frequency Offset (ISFO) across subcarriers. Although existing estimation algorithms focus mainly on CFO and SFO, the effect of ISFO is insufficiently addressed. ISFO becomes highly detrimental to receiver performance when Orthogonal Frequency-Division Multiplexing (OFDM) systems use a large number of subcarriers and high-order modulation. Moreover, under joint CFO and SFO conditions, conventional Maximum Likelihood Estimation (MLE) methods often require one- or two-dimensional grid searches. This results in high computational cost. To reduce this cost, two joint estimation algorithms for CFO and SFO are proposed.  Methods   The influence of non-ideal factors at the transmitter, receiver, and channel, such as local oscillator offset, SFO in Digital-to-Analog Converters (DACs) and Analog-to-Digital Converters (ADCs), and the Doppler effect, is analyzed. A mathematical model for the received OFDM signal is developed, and the mechanism through which SFO and ISFO distort the phase of frequency-domain subcarriers is derived. Leveraging the pilot structure of 5G-NTN, two joint CFO and SFO estimation algorithms are proposed. (1) Algorithm 1 uses the sequence correlation between the received frequency-domain Demodulation Reference Signal (DMRS) vectors. After phase pre-compensation is applied, the normalized cross-correlation vector is computed. An objective function is constructed from this vector, and its unimodal behavior in the main lobe is used to estimate the parameters through a bisection search. (2) Algorithm 2 treats the estimation parameter as analogous to a CFO in single-carrier systems and adopts an L&R-based autocorrelation method to derive approximate closed-form expressions.  
  Results and Discussions   A computational complexity analysis compares the proposed algorithms with one-dimensional (1D-ML) and two-dimensional (2D-ML) grid-search MLE methods. Numerical results show that Algorithm 1 reduces complexity substantially. The number of complex multiplications, which represent the main computational cost, is 4% of that of the 2D-ML method, 8% of that of Algorithm 2, and 44% of that of the 1D-ML method. Although Algorithm 2 is more computationally demanding, it yields a closed-form estimation expression. The performance of each algorithm is evaluated through the Mean Square Error (MSE) of the estimated parameters. Simulations show that for a subcarrier number of 3072, the 1D-ML algorithm performs slightly better than the others at Signal-to-Noise Ratios (SNRs) below 5 dB. However, because robust modulation schemes such as BPSK and QPSK typically used at low SNRs tolerate larger offsets, the medium-to-high SNR range is of greater practical relevance. In this range, all four algorithms demonstrate comparable estimation performance.  Conclusions  This study addresses the effect of Doppler in 5G-NTN LEO satellite communications by analyzing the mechanism and influence of ISFO and by proposing two joint estimation algorithms for CFO and SFO. First, a mathematical model of the received signal is established considering non-ideal factors such as CFO, SFO, and ISFO. The combined effect of SFO and ISFO on OFDM signals is derived to be equivalent to their linear superposition, which expands the range of the equivalent SFO. Second, the objective function is defined using the cross-correlation vector of two DMRS sequences. By using its unimodal behavior within the main lobe, a bisection search enables fast convergence. Subsequently, the parameter determined by SFO and ISFO is treated as analogous to the CFO in single-carrier systems, allowing an approximate closed-form estimation solution to be obtained through the L&R method.
Finally, complexity analysis and performance simulations show that the proposed algorithms provide significant computational savings and strong estimation performance. These results can support the development of 5G-NTN LEO satellite payloads and terminal products.
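The complexity saving of Algorithm 1 comes from exploiting unimodality: instead of evaluating the objective on a dense grid, each iteration discards a constant fraction of the search interval. A generic interval-halving maximizer for a unimodal function is sketched below; the cosine objective is a stand-in for the DMRS cross-correlation objective, which is not reproduced here.

```python
import math

def ternary_max(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Locate the maximizer of a unimodal function on [lo, hi].

    Each iteration discards a third of the interval, so the cost is
    logarithmic in (hi - lo) / tol -- far cheaper than a dense grid search.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Stand-in objective: a single main lobe centered at the true offset 0.3.
est = ternary_max(lambda x: math.cos(math.pi * (x - 0.3)), -0.5, 1.0)
print(round(est, 6))
```

This illustrates the claimed cost gap: reaching a tolerance of 1e-9 over a unit interval takes on the order of 50 function evaluations, versus roughly a billion grid points for an exhaustive 1D search at the same resolution.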
Mamba-YOWO: An Efficient Spatio-Temporal Representation Framework for Action Detection
MA Li, XIN Jiangbo, WANG Lu, DAI Xinguan, SONG Shuang
Available online  , doi: 10.11999/JEIT251124
Abstract:
  Objective  Spatio-temporal action detection aims to localize and recognize action instances in untrimmed videos, which is crucial for applications like intelligent surveillance and human-computer interaction. Existing methods, particularly those based on 3D CNNs or Transformers, often struggle with balancing computational complexity and modeling long-range temporal dependencies effectively. The YOWO series, while efficient, relies on 3D convolutions with limited receptive fields. The recent Mamba architecture, known for its linear computational complexity and selective state space mechanism, shows great potential for long-sequence modeling. This paper explores the integration of Mamba into the YOWO framework to enhance temporal modeling efficiency and capability while reducing computational burden, addressing a significant gap in applying Mamba specifically to spatio-temporal action detection tasks.  Methods  The proposed Mamba-YOWO framework is built upon the lightweight YOWOv3 architecture. It features a dual-branch heterogeneous design for feature extraction. The 2D branch, based on YOLOv8’s CSPDarknet and PANet, processes keyframes to extract multi-scale spatial features. The core innovation lies in the 3D temporal modeling branch, which replaces traditional 3D convolutions with a hierarchical structure composed of a Stem layer and three Stages (Stage1-Stage3). Stage1 and Stage2 utilize Patch Merging for spatial downsampling and stack Decomposed Bidirectionally Fractal Mamba (DBFM) blocks. The DBFM block employs a bidirectional Mamba structure to capture temporal dependencies from both past-to-future and future-to-past contexts. Crucially, a Spatio-Temporal Interleaved Scan (STIS) strategy is introduced within DBFM, which combines bidirectional temporal scanning with spatial Hilbert quad-directional scanning, effectively serializing video data while preserving spatial locality and temporal coherence.
Stage3 incorporates a 3D average pooling layer to compress features temporally. An Efficient Multi-scale Spatio-Temporal Fusion (EMSTF) module is designed to integrate features from the 2D and 3D branches. It employs group convolution-guided hierarchical interaction for preliminary fusion and a parallel dual-branch structure for refined fusion, generating an adaptive spatio-temporal attention map. Finally, a lightweight detection head with decoupled classification and regression sub-networks produces the final action tubes.  Results and Discussions  Extensive experiments were conducted on UCF101-24 and JHMDB datasets. Compared to the YOWOv3/L baseline on UCF101-24, Mamba-YOWO achieved a Frame-mAP of 90.24% and a Video-mAP@0.5 of 60.32%, representing significant improvements of 2.1% and 6.0%, respectively (Table 1). Notably, this performance gain was achieved while reducing parameters by 7.3% and computational load (GFLOPs) by 5.4%. On JHMDB, Mamba-YOWO attained a Frame-mAP of 83.2% and a Video-mAP@0.5 of 86.7% (Table 2). Ablation studies confirmed the effectiveness of key components: the optimal number of DBFM blocks in Stage2 was found to be 4, beyond which performance degraded, likely due to overfitting (Table 3). The proposed STIS scan strategy outperformed 1D-Scan, Selective 2D-Scan, and Continuous 2D-Scan, demonstrating the benefit of jointly modeling temporal coherence and spatial structure (Table 4). The EMSTF module also proved superior to other fusion methods like CFAM, EAG, and EMA (Table 5), highlighting its enhanced capability for cross-modal feature integration. The performance gains are attributed to the efficient long-range temporal dependency modeling by the Mamba-based branch with linear complexity and the effective multi-scale feature fusion facilitated by the EMSTF module.  Conclusions  This paper presents Mamba-YOWO, an efficient spatio-temporal action detection framework that integrates the Mamba architecture into YOWOv3.
By replacing traditional 3D convolutions with a DBFM-based temporal modeling branch featuring the STIS strategy, the model effectively captures long-range dependencies with linear complexity. The designed EMSTF module further enhances discriminative feature fusion through group convolution and dynamic gating. Experimental results on UCF101-24 and JHMDB datasets demonstrate that Mamba-YOWO achieves superior detection accuracy (e.g., 90.24% Frame-mAP on UCF101-24) while simultaneously reducing model parameters and computational costs. Future work will focus on theoretical exploration of Mamba’s temporal mechanisms, extending its capability to longer video sequences, and enabling lightweight deployment on edge devices.
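The Hilbert scanning inside STIS serializes 2D patches so that consecutive sequence positions are spatially adjacent, which matters because Mamba processes tokens as a 1D sequence. Below is the standard distance-to-coordinate Hilbert mapping, shown for one scan direction only; the full STIS interleaves four spatial directions with bidirectional temporal scanning, which this sketch does not attempt to reproduce.

```python
def hilbert_d2xy(order: int, d: int):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant as needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Serialize a 4x4 patch grid in Hilbert order: every step moves to a
# 4-neighbour, so spatially adjacent patches stay adjacent in the sequence.
order = 2
path = [hilbert_d2xy(order, d) for d in range(4 ** order)]
print(path[:4])
```

A raster (row-major) scan breaks spatial locality at every row boundary; the property tested below (each step is a unit move) is exactly what the Hilbert ordering preserves and a raster scan does not.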
Image Deraining Driven by CLIP Visual Embedding
SUN Jin, CUI Yuntong, TIAN Hongwei, HUANG Changcheng, WANG Jigang
Available online  , doi: 10.11999/JEIT251066
Abstract:
  Objective  Rain streaks introduce visual distortions that degrade image quality, which significantly impairs the performance of downstream vision tasks such as feature extraction and object detection. This work addresses the problem of single-image rain streak removal. Existing methods often rely heavily on restrictive priors or synthetic datasets, resulting in limited robustness and generalization capabilities due to the discrepancy with complex, unstructured real-world scenarios. CLIP demonstrates remarkable zero-shot generalization capabilities through large-scale image-text cross-modal contrastive learning. Motivated by this, we propose FCLIP-UNet, a visual-semantic-driven deraining architecture, for enhanced rain removal and improved generalization in real-world rainy environments.  Methods  FCLIP-UNet adopts the U-Net encoder-decoder architecture and reformulates deraining as pixel-level detail regression guided by high-level semantic features. In the encoding stage, dispensing with textual queries, FCLIP-UNet employs the first four layers of the frozen CLIP-RN50 to extract robust features decoupled from rain distribution, leveraging their semantic representation capacity to suppress diverse rain patterns. To guide image restoration accurately, we adopt a collaborative architecture of ConvNeXt-T and UpDWBlock at the decoding stage. The decoder utilizes ConvNeXt-T, replacing the traditional convolutional modules, to expand the receptive field for capturing global contextual information, and it parses rain streak patterns by leveraging the semantic priors extracted from the encoder. Under the constraint of such semantic priors, UpDWBlock reduces the information loss caused by upsampling and reconstructs fine-grained image details. 
Multi-level skip connections are employed to compensate for the information loss incurred in the encoding stage, and a layer-wise differentiated feature perturbation strategy is embedded to further enhance the model’s robustness and adaptability in complex real-world rainy scenarios.  Results and Discussions  Comprehensive evaluations are conducted on the Rain13K composite dataset by benchmarking the proposed model against ten state-of-the-art deraining algorithms. FCLIP-UNet demonstrates consistently superior performance across all five testing subsets of Rain13K. Notably, FCLIP-UNet outperformed the second-best method on both datasets: on Test100 by 0.32 dB (PSNR) and 0.06 (SSIM); on Test2800 by 0.14 dB and 0.002. On Rain100H and Rain100L, FCLIP-UNet achieved competitive results, with the best SSIM on Rain100H and comparable performance on other metrics (Table 3). To evaluate model generalization, the Rain13K-pretrained FCLIP-UNet was quantitatively evaluated on three datasets with distinct rainfall distribution characteristics: SPA-Data, HQ-RAIN, and MPID (Table 4, Fig. 7). Qualitative and quantitative assessments were also conducted using the real-world NTURain-R dataset (Table 5, Figs. 8–10). Both results consistently demonstrated FCLIP-UNet's robust generalization capability. Ablation experiments on Rain100H validate the proposed encoder architecture and the effectiveness of both the UpDWBlock and LDFPS (Tables 6–8). Furthermore, ablation results demonstrated that employing LDFPS, combined with a 1:1 weighting ratio between L1 loss and perceptual loss, yielded optimal performance of FCLIP-UNet (Tables 9–11).  Conclusions  This work introduces FCLIP-UNet, a deraining network targeting real-world generalization, by leveraging the contrastive language–image pre-training (CLIP) paradigm. The main contributions are threefold.
First, image deraining is reformulated as a pixel-level regression task aimed at reconstructing rain-free images based on high-level semantic features; a frozen CLIP image encoder is employed to extract representations robust to rain-distribution variations, thereby mitigating domain shifts induced by diverse rain models. Second, a decoder integrating ConvNeXt-T with an upsampling depthwise convolution block (UpDWBlock) is designed, and a layer-wise differentiated feature perturbation strategy (LDFPS) is introduced to enhance robustness against unseen rain distributions. Third, a composite loss function is constructed to jointly optimize pixel-wise accuracy and perceptual consistency. Quantitative and qualitative experiments on both synthetic and real-world rainy datasets demonstrate that FCLIP-UNet effectively removes rain streaks while preserving fine image details and exhibits superior deraining performance and strong generalization capability.
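The dB gains quoted above are PSNR differences. For reference, the metric itself is simple to state; the sketch below is the standard definition for images scaled to [0, peak], not code from the paper, and SSIM is omitted for brevity.

```python
import numpy as np

def psnr_db(ref: np.ndarray, out: np.ndarray, peak: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10*log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
clean = rng.random((64, 64, 3))
degraded = np.clip(clean + rng.normal(0, 0.05, clean.shape), 0, 1)
print(round(psnr_db(clean, degraded), 1))
```

Because PSNR is logarithmic in MSE, a 0.32 dB improvement corresponds to roughly a 7% reduction in mean squared error, which is why sub-dB gains are still meaningful on saturated benchmarks.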
ReXNet: A Trustworthy Framework for Space-Air Security Integrating Uncertainty Quantification and Explainability
LIU Zhuang, CHEN Yuran, ZHANG Jiatong, JIANG Yujing, WANG Xuhui
Available online  , doi: 10.11999/JEIT251159
Abstract:
  Objective  The space-air-ground integrated network (SAGIN) has emerged as a new strategic infrastructure for national development, yet its security vulnerabilities are becoming increasingly prominent. Each layer of the SAGIN, namely the physical, network, and application layers, faces distinct security challenges that require targeted solutions. Given the high demand for both predictive accuracy and decision transparency in aerospace scenarios, there is an urgent need for more robust, reliable, and interpretable intelligent techniques to ensure network security and trustworthiness.  Methods  This study proposes a detection framework that deeply integrates Uncertainty Quantification (UQ) and Explainable Artificial Intelligence (XAI). On the front end, the framework employs a Bayesian deep learning approach based on Monte Carlo Dropout, enabling probabilistic modeling of predictions. This allows for the separation and quantification of epistemic uncertainty and aleatoric uncertainty, thereby improving model reliability. On the back end, SHAP and LIME are incorporated to provide clear and trustworthy feature attribution for each model decision, enhancing interpretability and transparency. Moreover, the middle layer of the framework allows flexible substitution of specific deep learning backbones to adapt to various space and aerospace application scenarios.  Results and Discussions  Extensive experiments were conducted on representative space–air security datasets, including UAV swarm fault detection, ADS-B injection attacks, and network fraud detection. The results demonstrate that the proposed framework achieves high-precision anomaly detection while effectively evaluating prediction confidence and identifying unknown samples beyond the model’s knowledge boundaries. Furthermore, the framework provides logically consistent and traceable explanations for model decisions, offering both interpretive depth and operational reliability.
These results confirm that the joint use of UQ and XAI significantly enhances the robustness and trustworthiness of intelligent models in aerospace security applications.  Conclusions  This study systematically enhances the reliability and transparency of anomaly detection models in the space-air domain, marking a paradigm shift in the application of artificial intelligence from solely pursuing high accuracy to emphasizing high trustworthiness. Future work will focus on advancing the framework toward real-world deployment, emphasizing real-time processing, lightweight implementation, and resource-constrained environments such as on-orbit or onboard systems. These efforts aim to enable SAGINs to operate with greater security, autonomy, and efficiency, contributing to the sustainable and intelligent development of future space–air information networks.
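The Monte Carlo Dropout mechanism at the heart of the framework keeps dropout active at inference and treats the spread of repeated stochastic passes as epistemic uncertainty. A toy numpy sketch follows; the network, weights, and dropout rate are illustrative stand-ins, and the aleatoric term (which requires a learned noise output head) is deliberately omitted.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy fixed-weight "network": one ReLU hidden layer, scalar output.
W1 = rng.normal(size=(16, 4))
W2 = rng.normal(size=(1, 16))

def stochastic_forward(x: np.ndarray, p_drop: float = 0.5) -> float:
    """One MC Dropout pass: dropout stays active at inference time."""
    h = np.maximum(W1 @ x, 0.0)
    mask = rng.random(h.shape) >= p_drop
    h = h * mask / (1.0 - p_drop)        # inverted-dropout scaling
    return float(W2 @ h)

def mc_dropout_predict(x: np.ndarray, T: int = 200):
    """Predictive mean and epistemic variance from T stochastic passes."""
    samples = np.array([stochastic_forward(x) for _ in range(T)])
    return samples.mean(), samples.var()

mean, epistemic_var = mc_dropout_predict(np.ones(4))
print(round(mean, 2), epistemic_var > 0.0)
```

High epistemic variance on an input is the signal used to flag "unknown samples beyond the model's knowledge boundaries": the ensemble of dropped-out sub-networks disagrees precisely where training data was scarce.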
PSAQNet: A Perceptual Structure Adaptive Quality Network for Authentic Distortion Oriented No-reference Image Quality Assessment
JIA Huizhen, ZHAO Yuxuan, FU Peng, WANG Tonghan
Available online  , doi: 10.11999/JEIT251220
Abstract:
  Objective  No-reference image quality assessment (NR-IQA) is critical for practical imaging systems, especially when pristine references are unavailable. However, many existing methods face three main challenges: limited robustness under complex distortions, poor generalization when distortion distributions shift (e.g., from synthetic to real-world settings), and insufficient modeling of geometric/structural degradations, such as spatially varying blur, misalignments, and texture–structure coupling. These issues cause models to overfit dataset-specific statistics, leading to poor performance when confronted with diverse scenes and mixed degradations. To address these problems, we propose the Perceptual Structure-Adaptive Quality Network (PSAQNet), which aims to improve both the accuracy and adaptability of NR-IQA under complex distortion conditions.  Methods  PSAQNet is a unified CNN-Transformer framework designed to retain hierarchical perceptual cues while enabling global context reasoning. Instead of relying on late-stage pooling, it progressively enhances distortion evidence throughout the network. The core of PSAQNet includes several key components: the Advanced Distortion Enhanced Module (ADEM), which operates on multi-scale features from a pre-trained backbone and utilizes multi-branch gating along with a distortion-aware adapter to prioritize degradation-related signals while minimizing content-dominant interference. This module dynamically selects feature branches that align with perceptual degradation patterns, making it effective for handling spatially non-uniform or mixed distortions. 
To model geometric degradations, PSAQNet integrates Spatial-guided Convolution (SGC) and Channel-aware Adaptive Kernel Convolution (CA_AK), where SGC enhances spatial sensitivity by guiding convolutional responses with structure-aware cues, focusing on regions where geometric distortion is significant, while CA_AK refines geometric modeling by adjusting receptive behavior and recalibrating channels to preserve distortion-sensitive components. Additionally, PSAQNet incorporates efficient fusion techniques like GroupCBAM, which enables lightweight attention-based fusion of multi-level CNN features, and AttInjector, which selectively injects local distortion cues into global Transformer representations, ensuring that global semantic reasoning is directed by localized degradation evidence without causing redundancy or instability.  Results and Discussions  Comprehensive experiments on six benchmark datasets, including both synthetic and real-world distortions, demonstrate that PSAQNet achieves strong performance and stable agreement with human subjective judgments. PSAQNet outperforms several recent methods, especially on real-world distortion datasets. This indicates that PSAQNet effectively enhances distortion evidence, models geometric degradation, and selectively bridges local distortion cues with global semantic representations. These contributions enable PSAQNet to maintain robustness under distribution shifts and avoid over-reliance on narrow distortion priors. The ablation studies verify the contributions of individual modules. ADEM improves distortion saliency, SGC and CA_AK enhance geometric sensitivity, and GroupCBAM and AttInjector strengthen the synergy between local and global cues. Cross-dataset evaluations confirm PSAQNet's generalization capabilities across various content categories and distortion types. Scalability tests also show that PSAQNet benefits from stronger pre-trained models without compromising its modular design.  
Conclusions  PSAQNet effectively addresses key limitations in NR-IQA by synergizing local distortion enhancement, geometric alignment, and global semantic fusion. Its modular design ensures robustness and generalization across diverse distortion types, providing a practical solution for real-world applications. Future work will explore vision-language pre-training to further enhance cross-scene adaptability.
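The lightweight attention-based fusion the abstract ascribes to GroupCBAM can be illustrated in miniature. The paper does not specify GroupCBAM's internals, so the sketch below assumes a conventional CBAM-style channel gate (sigmoid of pooled per-channel statistics) applied independently to each channel group; the shared MLP of CBAM is collapsed to an identity for brevity, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def group_channel_attention(feats, n_groups=2):
    """CBAM-style channel gating applied per channel group.

    feats: (C, H, W) feature map; channels are split into n_groups and
    each group is re-weighted by a gate built from its own pooled stats.
    """
    C, H, W = feats.shape
    out = np.empty_like(feats)
    for g in np.split(np.arange(C), n_groups):
        avg = feats[g].mean(axis=(1, 2))   # per-channel average pooling
        mx = feats[g].max(axis=(1, 2))     # per-channel max pooling
        gate = sigmoid(avg + mx)           # CBAM's shared MLP omitted here
        out[g] = feats[g] * gate[:, None, None]
    return out
```

Grouping keeps the gating cost linear in the number of channels per group, which is the kind of efficiency argument the abstract makes for multi-level CNN feature fusion.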
Research on the Architecture of Dual-Field Reconfigurable Polynomial Multiplication Unit for Lattice-Based Post-Quantum Cryptography
CHEN Tao, ZHAO Wangpeng, BIE Mengni, LI Wei, NAN Longmei, DU Yiran, FU Qiuxing
Available online  , doi: 10.11999/JEIT250929
Abstract:
  Objective  Polynomial multiplication accounts for over 80% of the computational time in lattice-based cryptography algorithms. The Number Theoretic Transform (NTT) and Fast Fourier Transform (FFT) reduce the computational complexity of polynomial multiplication from quadratic to quasi-linear. However, current mainstream lattice cryptography algorithms such as Kyber, Dilithium, and Falcon exhibit significant differences in their parameter sets and polynomial multiplication implementations. To support multi-parameter polynomial multiplication and enhance resource utilization, this paper proposes a dual-field reconfigurable polynomial multiplication unit architecture.  Methods  The computational network for polynomial multiplication is first extracted from the parameter characteristics of the Kyber, Dilithium, and Falcon algorithms, and the internal dual-field multiplication operations are optimized at the algorithmic level. A dual-field reconfigurable polynomial multiplication unit architecture is then designed for this polynomial multiplication network, and the dual-field reconfigurable multiplication unit is further optimized to enhance computational speed. Finally, a parallelism analysis is conducted to improve the resource utilization of the computational unit architecture: the polynomial multiplication architecture achieves the highest area efficiency when supporting 1-lane 64-bit, 2-lane 32-bit, or 4-lane 16-bit operations.  Results and Discussions  The design was experimentally verified on the Xilinx FPGA XC7V2000TFLG1925. It simultaneously supports one channel of complex-form floating-point operations, or two channels of 17- to 32-bit and four channels of 16-bit internal NTT operations. Operating at a frequency of 169 MHz, it achieves a reduction of over 50% in area-time product.  
Conclusions  The dual-field reconfigurable processing unit architecture proposed in this paper offers advantages in scalability, area efficiency, and core unit performance. Its bit width configuration is more easily adaptable to traditional cryptographic processors, providing a recommended approach for transitioning traditional public-key cryptosystems to post-quantum cryptography.
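The NTT-based pointwise multiplication underlying such designs can be sketched in miniature. The toy parameters below (n = 8, q = 17, root w = 2) are chosen only so that a cyclic NTT exists; they are far smaller than the Kyber, Dilithium, or Falcon parameter sets discussed in the abstract, and the naive O(n^2) transform stands in for the butterfly networks a hardware unit would implement.

```python
# q = 17 admits an 8th root of unity (2 has order 8 mod 17), so length-8
# cyclic convolutions become pointwise products in the NTT domain.
Q, N, W = 17, 8, 2   # toy parameters for illustration only

def ntt(a, w):
    """Naive O(n^2) number-theoretic transform: A[k] = sum_j a[j] * w^(jk) mod Q."""
    return [sum(a[j] * pow(w, j * k, Q) for j in range(N)) % Q for k in range(N)]

def poly_mul_ntt(a, b):
    """Cyclic product a*b mod (x^N - 1, Q): transform, multiply pointwise, invert."""
    A, B = ntt(a, W), ntt(b, W)
    C = [(x * y) % Q for x, y in zip(A, B)]
    w_inv, n_inv = pow(W, -1, Q), pow(N, -1, Q)   # modular inverses (Python 3.8+)
    return [(c * n_inv) % Q for c in ntt(C, w_inv)]

def poly_mul_schoolbook(a, b):
    """Reference quadratic-time cyclic convolution for checking the NTT path."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] = (c[(i + j) % N] + a[i] * b[j]) % Q
    return c
```

Replacing the O(n^2) loops in `ntt` with a butterfly network yields the O(n log n) cost cited above; the dual-field aspect of the paper's unit (sharing datapaths between moduli and bit widths) is a hardware concern not captured by this software sketch.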
Delay Deterministic Routing Algorithm Based on Inter-Controller Cooperation for Multi-Layer LEO Satellite Networks
HUANG Longhui, DING Xiaojin, ZHANG Gengxin
Available online  , doi: 10.11999/JEIT251100
Abstract:
  Objective  The massive scale and large number of satellites in multi-layer Low Earth Orbit (LEO) constellations result in highly dynamic network topologies. Coupled with time-varying traffic load, this leads to temporal fluctuations in satellite network resources (e.g., available link queue sizes and available link bandwidth), making it challenging to establish stable end-to-end transmission paths and guarantee Quality of Service (QoS). To address these issues, this study introduces Software-Defined Networking (SDN) into multi-layer LEO constellations. SDN controllers collect network state information to achieve unified management of network resources: the constellation is partitioned into regions, with a controller deployed in each region so that the controllers jointly manage the entire constellation. Furthermore, a deterministic delay routing algorithm is designed within the SDN controller to compute inter-region transmission paths for traffic, thereby meeting its deterministic delay requirements.  Methods  This paper proposes a deterministic delay routing algorithm for multi-layer LEO constellations based on controller collaboration. Firstly, a regional division strategy and controller deployment scheme are proposed, dividing the satellite network into multiple regions, each managed by an assigned controller. Subsequently, criteria are established for Inter-Satellite Links (ISLs) between satellites within the same layer and across different layers to characterize link communication states. Finally, a Time-Varying Graph (TVG) model is employed to represent the network topology and link resource attributes, including bandwidth, queue size, and link duration. This is combined with a multi-destination Lagrange relaxation method to optimize path selection, ensuring that the chosen paths satisfy both delay and delay jitter constraints. 
Through collaboration between adjacent regional controllers, which exchange state information, the proposed algorithm enables the computation of feasible inter-region paths.  Results and Discussions  To validate the effectiveness of the proposed method, a simulation system for multi-layer LEO constellations was designed, and the algorithm's performance was tested under different data transmission rates. Compared with IUDR, the proposed Delay Deterministic Routing Algorithm based on Inter-Controller Cooperation (DDRA-ICC) significantly enhances network performance by reducing end-to-end delay, delay jitter, and packet loss rate, while improving throughput. Specifically, at a data transmission rate of 3 Mbps, the average end-to-end delay was reduced by 16.0% (Fig. 3(a)), delay jitter by 37.9% (Fig. 3(b)), and packet loss rate by 37.2% (Fig. 3(c)), while throughput increased by approximately 2% (Fig. 3(d)). Regarding signaling overhead, the proposed algorithm achieves a Reduction-Improvement Gain Ratio approximately 111.8% higher than that of IUDR, indicating the superior comprehensive performance of DDRA-ICC. Additionally, the proposed method exhibits lower time complexity for route computation than IUDR.  Conclusions  To solve the problem of deterministic delay for traffic transmission in multi-layer LEO constellations, this study proposed a controller collaboration-based deterministic delay routing algorithm. Performance evaluation under different load scenarios demonstrates that: (1) Compared with IUDR, the proposed algorithm reduces average end-to-end delay, delay jitter, and packet loss rate by 16.0%, 37.9%, and 37.2%, respectively, while increasing average throughput by approximately 2%. (2) While the additional overhead of DDRA-ICC is comparable to that of IUDR, it further reduces the packet loss rate to 2.96% (a reduction of 52.49%) and achieves a Reduction-Improvement Gain Ratio of 1.97. 
This lower packet loss and higher Reduction-Improvement Gain Ratio demonstrate a better balance between overhead and reliability, giving the algorithm a greater advantage in ensuring deterministic traffic transmission. Future work could incorporate more practical factors, such as the impact of satellite node failures on network performance, to further enhance network capabilities.
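The Lagrange-relaxation path selection described in the Methods can be sketched generically: a multiplier lambda folds the delay constraint into the link weight, Dijkstra finds the cheapest path under the relaxed weight, and a bisection on lambda tightens the delay bound. This is a single-destination illustration over a static graph, not the paper's multi-destination, time-varying-graph formulation; the graph, bounds, and multiplier range are all toy assumptions.

```python
import heapq

def dijkstra(adj, src, dst, lam):
    """Shortest path under the relaxed weight cost + lam * delay.
    adj maps node -> list of (neighbour, cost, delay).
    Returns (cost, delay, path) of the selected path, or None."""
    best = {src: (0, 0, [src])}
    pq, seen = [(0.0, src)], set()
    while pq:
        _, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        cu, du, pu = best[u]
        for v, cost, delay in adj.get(u, []):
            wv = (cu + cost) + lam * (du + delay)
            if v not in best or wv < best[v][0] + lam * best[v][1]:
                best[v] = (cu + cost, du + delay, pu + [v])
                heapq.heappush(pq, (wv, v))
    return best.get(dst)

def delay_constrained_path(adj, src, dst, max_delay, iters=30):
    """Bisection on the multiplier: larger lambda penalises delay harder."""
    lo, hi, feasible = 0.0, 1e3, None
    for _ in range(iters):
        lam = (lo + hi) / 2
        res = dijkstra(adj, src, dst, lam)
        if res and res[1] <= max_delay:
            feasible, hi = res, lam   # delay bound met; try cheaper paths
        else:
            lo = lam                  # too slow; increase the delay penalty
    return feasible
```

The paper additionally constrains delay jitter and link duration; those would enter as extra terms in the relaxed weight or as feasibility checks on each candidate path.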
A Complexity-Reduced Active Interference Cancellation Algorithm in f-OFDM
CHEN Hao, WEN Jiangang, ZOU Yuanping, HUA Jingyu, SHENG Bin
Available online  , doi: 10.11999/JEIT251172
Abstract:
  Objective  Owing to spectrum scarcity and diverse communication needs, Sixth-Generation (6G) mobile communication urgently requires a waveform technology with high spectral efficiency, flexible subband configuration, and support for asynchronous communication. Among the candidate waveforms, filtered Orthogonal Frequency Division Multiplexing (f-OFDM) has emerged as a promising solution that satisfies all of these requirements. By applying subband filtering, f-OFDM achieves flexible subband configuration as well as asynchronous transmission. Nevertheless, this filtering mechanism inevitably introduces a certain level of intrinsic interference into the system. Notably, a dominant component of this interference is Inter-subband Interference (ITBI), which is primarily caused by the Out-Of-Band Emission (OOBE) leaking from adjacent subbands. Therefore, subband OOBE suppression plays a crucial role in reducing ITBI and enhancing the performance of f-OFDM systems. Based on the system structure of f-OFDM, this study proposes a Complexity-Reduced Active Interference Cancellation (CRAIC) algorithm to suppress the OOBE of f-OFDM subbands and thereby enhance system performance.  Methods  First, based on the spectral structure of f-OFDM, a subset of data subcarriers within the target subband is exploited to generate Cancellation Carriers (CCs). The CRAIC optimization model for f-OFDM systems is then constructed under a CC power constraint, with the cost function defined on the superposed spectrum of both data subcarriers and CCs at Desired Frequency Points (DFPs). Second, by introducing a real-complex domain transformation and reformulating the optimization model, the complex-domain CRAIC programming problem is converted into a real-domain Second-Order Cone Programming (SOCP) problem, enabling an efficient solution. 
Furthermore, through computer simulations, the impact of key parameters on CRAIC performance is evaluated, including the number of cancellation carriers ($M$), the number of data subcarriers involved in CC generation ($K$), and the number of DFPs ($Q$). Practical recommendations are then provided for the rational configuration of CRAIC parameters in f-OFDM systems.  Results and Discussions  The simulation results demonstrate that, in the edge region of the adjacent subband, the proposed CRAIC algorithm exhibits the steepest PSD roll-off compared with the conventional ZP and Origin schemes. This indicates that CRAIC possesses the strongest ITBI suppression capability in this region and consequently achieves the lowest Bit Error Rate (BER) on the Edge Subcarriers (ESs) of the adjacent subband. Specifically, CRAIC achieves a maximum PSD reduction of 4 dB and 12 dB relative to ZP and Origin, respectively (Fig. 2a). This is attributed to the fact that the right $Q/2$ DFPs largely fall within the edge region of SB2, leading to effective suppression of the spectrum in this area. Consequently, in terms of BER at the edge of SB2, CRAIC achieves a significantly lower BER than Origin, and a visible performance improvement is also observed relative to ZP (Fig. 3a). Furthermore, the impact of the key parameters $M$, $K$, and $Q$ is evaluated through computer simulations. The results indicate that while increasing $M$ continuously improves the OOBE suppression capability (Fig. 4a), it concurrently leads to a gradual degradation in spectral efficiency. 
In contrast to $M$, increasing $K$ or $Q$ has only a marginal effect on CRAIC performance: increments beyond a certain point do not yield sustained improvements (Fig. 5a and Fig. 6a). Based on these analyses, $M=4$, $K=8$, $Q=4$ is considered a typical parameter configuration for the scenario of this paper. Under this typical setting, CRAIC (with $K=8$) achieves significant ES BER gains compared with Origin and ZP (Fig. 8a), while maintaining nearly the same Internal Subcarriers (ISs) BER performance as the two benchmark schemes (Fig. 8b). Even when compared with the full-scale CRAIC ($K=20$), CRAIC with $K=8$ achieves a remarkable 60% reduction in the size of the data subcarrier mapping matrix, while incurring only limited degradation in BER performance (Fig. 8a). This result demonstrates that the proposed algorithm effectively preserves the performance of full-scale AIC while substantially lowering its computational complexity.  Conclusions  An algorithm named CRAIC for filtered OFDM systems is investigated in this paper. The proposed CRAIC optimization model is constructed under a CC power constraint, in which the cost function is defined on the superposed spectrum of a subset of data subcarriers and the CCs at the DFPs. Through the designed real-imaginary domain conversion and model reformulation, the complex-domain optimization problem is converted into a real-domain SOCP problem. Simulation results demonstrate that the CRAIC algorithm significantly reduces the PSD of the target subband, particularly in the transition region of the adjacent subband, leading to notable improvements in edge BER. 
Furthermore, the influence of the key parameters is evaluated. The results indicate that increasing $M$ enlarges the performance gain of CRAIC over ZP, though at the expense of reduced spectral efficiency. A larger $K$ improves the OOBE suppression capability, but with diminishing marginal returns and increased computational complexity. Moreover, simply increasing $Q$ does not yield continuous PSD reduction. In summary, the application of CRAIC in f-OFDM systems enhances isolation between subbands, reduces ITBI, and thereby improves system performance.
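The core idea of active interference cancellation can be illustrated with a toy least-squares version: stack the spectra of the data subcarriers and the cancellation carriers at the desired frequency points, then solve for CC weights that null their sum. The sinc spectrum model, the specific indices, and the crude norm clip (standing in for the paper's SOCP power constraint) are all simplifying assumptions, not the CRAIC formulation itself.

```python
import numpy as np

def sidelobe(f, k):
    """Toy spectrum of subcarrier k at normalised frequency f (sinc model)."""
    return np.sinc(f - k)

def spectrum(idx, x, dfps):
    """Superposed spectrum of weighted subcarriers at the frequency points."""
    return np.array([[sidelobe(f, k) for k in idx] for f in dfps]) @ x

def aic_weights(data_idx, d, cc_idx, dfps, p_max=1.0):
    """Least-squares CC weights nulling the combined spectrum at the DFPs;
    a norm clip stands in for the SOCP power bound used in the paper."""
    S_data = np.array([[sidelobe(f, k) for k in data_idx] for f in dfps])
    S_cc = np.array([[sidelobe(f, k) for k in cc_idx] for f in dfps])
    g, *_ = np.linalg.lstsq(S_cc, -S_data @ d, rcond=None)
    norm = np.linalg.norm(g)
    if norm > np.sqrt(p_max):
        g *= np.sqrt(p_max) / norm   # project back onto the power budget
    return g

# Residual out-of-band power at the DFPs, with and without the CCs.
data_idx = np.arange(12)                    # data subcarriers of the subband
cc_idx = np.array([12, 13])                 # two edge cancellation carriers
dfps = np.array([14.5, 15.5, 16.5, 17.5])   # points in the adjacent guard region
d = np.array([1.0, -1.0] * 6)               # BPSK-like data symbols
g = aic_weights(data_idx, d, cc_idx, dfps, p_max=4.0)
p_orig = np.sum(spectrum(data_idx, d, dfps) ** 2)
p_aic = np.sum((spectrum(data_idx, d, dfps) + spectrum(cc_idx, g, dfps)) ** 2)
```

CRAIC's complexity reduction comes from generating CCs from only $K$ of the data subcarriers and from the real-domain SOCP reformulation; both refinements are omitted from this unconstrained sketch.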
Research on Time Slots Aggregation and Topology Aggregation Model for Unmanned Aerial Vehicle Swarm Overall Time Synchronization
WANG Zhenling, TAO Haihong, WEI Haitao, WANG Zhengyong
Available online  , doi: 10.11999/JEIT251274
Abstract:
  Objective  Unmanned Aerial Vehicle (UAV) swarms are capable of overcoming the technical and performance constraints inherent to individual UAVs, enabling the execution of complex missions that are beyond the reach of single-platform systems. High-precision time synchronization across swarm nodes is a critical foundational requirement for core swarm operations, including resource scheduling, cooperative positioning, and multi-node data fusion. However, existing research on UAV time synchronization is predominantly confined to optimizing the accuracy of fundamental synchronization approaches, and has notable limitations in adapting to topological changes during swarm formation flight and in achieving global synchronization among multiple nodes. As the scale of UAV swarms continues to expand, the connectivity of time comparison links between nodes during formation flight exhibits pronounced time-varying characteristics, posing challenges to continuous, reliable, and precise overall time synchronization. For the scenarios of stable formation flight and formation transformation in different mission phases, the Observation Time Slot Aggregation (OTSA) model and the Time-Varying Topology Aggregation (TVTA) model are introduced to effectively enhance the robustness of global time synchronization among UAV swarm nodes and to improve the Time Synchronization Accuracy (TSA). This research aims to provide an effective solution for the Leader-Following Consistency Time Synchronization (LFCTS) of UAV swarms, and to offer useful references for other time synchronization applications in heterogeneous and distributed systems.  
Methods  Compared with the traditional Quasi Real-time Bidirectional Time Comparison (QRBTC) scheme, the time synchronization method based on the OTSA model makes full use of all synchronization signal transmission and reception link resources within every time slot of the system synchronization period. Based on the "one transmission and multiple receptions" mechanism of all nodes, a Follower Node (FN) can achieve direct synchronization or single-hop indirect synchronization towards the Leader Node (LN) in each time slot according to the OTSA model, thereby obtaining tens of times more clock skew observation samples than the traditional QRBTC scheme. The OTSA method not only enhances the robustness of global time synchronization, but also allows secondary processing of multi-slot synchronization samples to achieve a higher TSA than QRBTC. Building on the LFCTS results within each signal synchronization period, the TVTA model extends the direct comparison and single-hop indirect comparison of the OTSA model to cross-period multi-hop comparison, thereby solving the problem of overall time synchronization instability caused by the time-varying synchronization link relationships during takeoff, assembly, and formation transformation of the UAV swarm.  Results and Discussions  All time comparison link resources across the time slots of the synchronization period are fully utilized in the OTSA method (Figure 2). Through error modeling and simulation analysis, for a UAV swarm of 50 nodes with a time slot allocation of 20 ms, time synchronization based on the OTSA model achieves a single-time-slot TSA of 4.10 to 4.27 ns (Figure 6) and an overall TSA of 2.46 to 2.56 ns within a complete synchronization period, which is superior to the QRBTC scheme under the same conditions (Figure 5(a)). 
The TVTA method fully utilizes cross-period time comparison relationships to construct multi-hop time comparison links (Figures 3 and 4). When the FN obtains the external comparison relationships of other nodes through aggregation processing, it can further apply one-way or two-way Dijkstra's algorithm to obtain the multi-hop comparison link with optimal connectivity, and complete time tracing and comparison to the LN through edge computing. Error analysis indicates that during takeoff, assembly, and the transition between triangle and rhombus formations, time synchronization based on the TVTA model achieves an overall TSA of better than 8.6 ns, providing stronger overall time synchronization capability.  Conclusions  This paper addresses the robustness of time synchronization in UAV swarm formation flight. For stable formation flight and formation transformation scenarios in different mission stages, the OTSA and TVTA models were proposed, the error model was constructed, and the performance was analyzed. The results show that: (1) The OTSA model enhances the robustness of overall time synchronization through direct comparison and single-hop indirect comparison over multiple time slots within a synchronization period, achieving an overall TSA of better than 2.5 ns and outperforming the traditional QRBTC method. (2) The TVTA model achieves overall time synchronization of the UAV swarm through multi-hop relay between nodes; even when time comparison links change, it still achieves a global time synchronization accuracy of better than 8.6 ns. (3) Both methods fully account for the time-varying characteristics of the comparison links between swarm nodes and have been confirmed through small-scale UAV swarm flight tests. 
These two methods ensure robustness and performance, providing the necessary guarantees for close-coordination tasks of the UAV swarm. Subsequent research will further address practical flight verification, adaptation to complex scenarios, and improvement of overall accuracy.
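The bidirectional time comparison that both QRBTC and the OTSA model build on reduces to a familiar two-way exchange of four timestamps; a sketch, assuming a symmetric propagation path so that the delay cancels. The multi-slot aggregation here is plain averaging, a deliberate simplification of the OTSA secondary data processing described in the abstract.

```python
def two_way_offset(t1, t2, t3, t4):
    """Clock offset of node B relative to node A from one bidirectional
    exchange: A sends at t1 (A clock), B receives at t2 and replies at t3
    (B clock), A receives at t4 (A clock). With a symmetric path, the
    propagation delay cancels out of the estimate."""
    return ((t2 - t1) - (t4 - t3)) / 2.0

def multi_slot_offset(exchanges):
    """OTSA-style aggregation (simplified): average the per-slot estimates,
    which reduces the variance of the noise on each timestamp."""
    ests = [two_way_offset(*e) for e in exchanges]
    return sum(ests) / len(ests)
```

The "one transmission and multiple receptions" mechanism multiplies how many such exchanges a follower node observes per synchronization period, which is where the tens-fold increase in clock skew samples comes from.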
Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation for Stereoscopic Image Generation from a Single Image
ZHANG Chunlan, QU Yuwei, NIE Lang, LIN Chunyu
Available online  , doi: 10.11999/JEIT250760
Abstract:
  Objective  The generation of stereo images from a single image typically relies on depth as a prior, often leading to geometric misalignment, occlusion artifacts, and texture blurring. To address these issues, research in recent years has shifted towards end-to-end learning of alignment transformation and rendering within the image or feature domain. By adopting a content-based feature transformation and alignment mechanism, high-quality novel views can be generated without relying on explicit geometric information. However, three key challenges remain: (1) fixed convolutions have limited ability to model large-scale geometric and disparity changes, which hinders effective feature alignment; (2) texture and structural information are coupled in the network representation, lacking hierarchical modeling and dynamic fusion mechanisms, making it difficult to simultaneously preserve fine details and semantic consistency; (3) existing supervision strategies mainly focus on reconstruction errors, with insufficient constraints on the intermediate alignment process, thereby reducing the efficiency of cross-view feature consistency learning. To address these challenges, this paper proposes a multi-scale deformable alignment-aware bidirectional gated feature aggregation network for generating stereoscopic images from a single image.  Methods  Firstly, to overcome the image misalignment and distortion caused by the inability of fixed convolutions to adapt to geometric deformation and disparity changes, a Multi-Scale Deformable Alignment module (MSDA) is introduced, which uses multi-scale deformable convolutions to adaptively adjust sampling positions according to content, aligning source and target features at different scales. 
Secondly, to address texture detail blurring and structural distortion in synthesized images, a feature decoupling strategy is proposed that constrains the network to learn texture in shallow layers and model structure in deep layers. A texture-structure Bidirectional Gated Feature Aggregation module (Bi-GFA) is constructed to achieve dynamic complementarity and efficient fusion of texture and structural information. Finally, to improve the alignment accuracy of cross-view features, a learnable Alignment-Guided loss (LAG) is designed to guide the alignment network to adaptively adjust the offset field at the feature level, enhancing the fidelity and semantic consistency of the synthesized image.  Results and Discussions  This study focuses on scene-level image synthesis from a single image. The quantitative results indicate that the proposed method significantly outperforms all compared methods in terms of PSNR, SSIM, and LPIPS, and maintains stable performance under different dataset sizes and scene complexities, demonstrating strong generalization ability and robustness (Tab. 1 and Tab. 2). Qualitative comparison shows that the generated results are closest to the real images, with excellent overall sharpness and detail fidelity. On the outdoor KITTI dataset, the pixel misalignment of foreground objects is effectively resolved (Fig. 4). On the indoor dataset, facial and hair textures are restored clearly and naturally, and high-frequency areas (such as champagne towers and balloon edges) exhibit clear contours, accurate color reproduction, and no obvious artifacts or blurring. Both global illumination and local structural details are well preserved, resulting in the highest perceptual quality (Fig. 5). The ablation study further validates the effectiveness of the proposed MSDA, Bi-GFA, and LAG modules (Tab. 3).  
  Conclusions  This article proposes a multi-scale deformable alignment-aware bidirectional gated feature aggregation network for stereo image generation, addressing the strong dependence on ground-truth depth, significant geometric misalignment and distortion, texture blurring, and structural distortion in stereo image generation from a monocular view. Specifically, the Multi-Scale Deformable Alignment module (MSDA) effectively enhances the flexibility and accuracy of cross-view feature alignment; the texture-structure Bidirectional Gated Feature Aggregation module (Bi-GFA) achieves complementary fusion of texture details and structural information; and the learnable alignment-guided loss further optimizes the offset field estimation, thereby improving the fidelity and semantic consistency of the synthesized image. The experimental results show that the proposed method outperforms existing advanced methods in terms of structural reconstruction, texture clarity, and viewpoint consistency, and has strong generalization ability and robustness. In future work, we will explore the impact of different depth estimation methods on overall system performance, and investigate more efficient network architectures and model compression strategies to reduce computational costs and achieve real-time stereo image generation.
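The content-adaptive sampling at the heart of deformable alignment can be sketched with plain bilinear interpolation at offset positions. In MSDA the offsets are predicted by the network from the feature content; here they are supplied by hand, on a single-channel array, purely for illustration.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample a 2D array at fractional (y, x) via bilinear interpolation."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_sample(img, base_yx, offsets):
    """One deformable-convolution tap set: sample at base + learned offsets.
    A real deformable layer would then weight these samples by its kernel."""
    return np.array([bilinear(img, base_yx[0] + dy, base_yx[1] + dx)
                     for dy, dx in offsets])
```

Because the offsets are continuous, the sampling grid can follow disparity and geometric deformation that a fixed 3x3 kernel cannot, which is the limitation of fixed convolutions the abstract identifies.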
Spherical Geometry-Guided and Frequency-Enhanced Segment Anything Model for 360° Salient Object Detection
CHEN Xiaolei, SHEN Yujie, ZHONG Zhihua
Available online  , doi: 10.11999/JEIT251254
Abstract:
  Objective  With the rapid development of VR and AR technologies and the increasing demand for omnidirectional visual applications, accurate salient object detection in complex 360° scenes has become critical for system stability and intelligent decision-making. The Segment Anything Model (SAM) demonstrates strong transferability across 2D vision tasks; however, it is primarily designed for planar images and lacks explicit modeling of spherical geometry, which limits its direct applicability to 360° salient object detection (360° SOD). To address this challenge, this work explores integrating SAM’s generalization capability with spherical-aware multi-scale geometric modeling to advance 360° SOD. Specifically, a Multi-Cognitive Adapter (MCA), Spherical Geometry-Guided Attention (SGGA), and Spatial-Frequency Joint Perception Module (SFJPM) are introduced to enhance multi-scale structural representation, alleviate projection-induced geometric distortions and boundary discontinuities, and strengthen joint global–local feature modeling.  Methods  The proposed 360° SOD framework is built upon SAM and consists of an image encoder and a mask decoder. During encoding, spherical geometry modeling is incorporated into patch embedding by mapping image patches onto a unit sphere and explicitly modeling spatial relationships between patch centers, injecting geometric priors into the attention mechanism. This design enhances sensitivity to non-uniform geometric characteristics and mitigates information loss caused by omnidirectional projection distortions. The encoder adopts a partial freezing strategy and is organized into four stages, each containing three encoder blocks. Each block integrates MCA for multi-scale contextual fusion and SGGA to model long-range dependencies in spherical space. 
Multi-level features are concatenated along the channel dimension to form a unified representation, which is further enhanced by the SFJPM to jointly capture spatial structures and frequency-domain global information. The fused features are then fed into the SAM mask decoder, where saliency maps are optimized under ground-truth supervision to achieve accurate localization and boundary refinement.  Results and Discussions  Experiments are conducted using the PyTorch framework on an RTX 3090 GPU with an input resolution of 512×512. Evaluations on two public datasets (360-SOD and 360-SSOD) against 14 state-of-the-art methods demonstrate that the proposed approach consistently achieves superior performance across six evaluation metrics. On the 360-SOD dataset, the model attains an MAE of 0.0152 and a maximum F-measure of 0.8492, outperforming representative methods such as MDSAM and DPNet. Qualitative results show that the proposed method produces saliency maps highly consistent with ground-truth annotations, effectively handling challenging scenarios including projection distortion, boundary discontinuity, multi-object scenes, and complex backgrounds. Ablation studies further confirm that MCA, SGGA, and SFJPM contribute independently while complementing each other to improve detection performance.  Conclusions  This paper proposes a novel SAM-based framework for 360° salient object detection that jointly addresses multi-scale representation, spherical distortion awareness, and spatial-frequency feature modeling. The MCA enables efficient multi-scale feature fusion, SGGA explicitly compensates for ERP-induced geometric distortions, and SFJPM enhances long-range dependency modeling. Extensive experiments validate the effectiveness and feasibility of introducing SAM into 360° SOD. Future work will extend this framework to omnidirectional video and multi-modal scenarios to further improve spatiotemporal modeling and scene understanding.
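The spherical geometry modeling in the patch embedding, which maps ERP patch centers onto the unit sphere and reasons over their true spatial relationships, can be sketched as follows. The abstract does not give SGGA's exact attention prior, so great-circle (angular) distance between patch centers is used here as a natural stand-in.

```python
import numpy as np

def erp_to_sphere(u, v, W, H):
    """Map an equirectangular (ERP) pixel (u, v) in [0, W) x [0, H) to a
    unit 3-vector: u spans longitude, v spans latitude pole to pole."""
    lon = (u / W) * 2 * np.pi - np.pi      # longitude in [-pi, pi)
    lat = np.pi / 2 - (v / H) * np.pi      # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def great_circle(p, q):
    """Angular distance between two unit vectors; unlike pixel distance on
    the ERP plane, it is unaffected by the projection's polar stretching."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
```

Two patches that appear far apart at the top of an ERP image can be nearly adjacent on the sphere; injecting distances computed this way into attention is what lets the encoder compensate for projection-induced distortion.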
Hierarchical Fusion Multi-Instance Learning for Weakly Supervised Pathological Image Classification
CHEN Xiaohe, ZHANG Jiaang, LI Lingzhi, LI Guixiu, OU Zirong, BAO Yuehua, LIU Xinxin, YU Qiuchen, MA Yuhan, ZHAO Keyu, BAI Hua
Available online  , doi: 10.11999/JEIT250726
Abstract:
  Objective  As the mortality rates of cancer in China continue to rise, the significance of pathological image classification in cancer diagnosis is increasingly recognized. Pathological images are characterized by a multi-level structure. However, most existing methods primarily focus on the highest resolution of pathological images or employ simple feature concatenation strategies to fuse multi-scale information, failing to effectively utilize the multi-level information inherent in these images. Furthermore, existing methods typically employ random pseudo-bag division strategies to address the challenge of high-resolution pathological images. However, due to the sparsity of cancerous regions in positive slides, such random sampling often results in incorrect pseudo-labels and low signal-to-noise ratios, thereby posing additional challenges to classification accuracy. To address these issues, this study proposes a Hierarchical Fusion Multi-Instance Learning (HFMIL) method, integrating multi-level feature fusion with a pseudo-bag division strategy based on an attention evaluation function. It is designed to enhance the accuracy and interpretability of pathological image classification, thereby providing a more effective tool for clinical diagnosis.  Methods  A weakly supervised classification method based on a multi-level model is proposed in this study to leverage the multi-level characteristics of pathological images, thereby enhancing the performance of cancer pathological image classification. The proposed method is composed of three core steps. Initially, multi-level feature extraction is performed. Blank areas are removed from pathological images, and low-resolution images are segmented into image patches and then mapped to their corresponding high-resolution patches through index mapping. Semantic features are extracted to capture multi-level information, including low-resolution tissue structures and high-resolution cellular details. 
Subsequently, a pseudo-bag division method based on an attention evaluation function is employed. Classification scores for each image patch are computed through class activation mapping (CAM) to assess the importance of low-resolution features. Patches are ranked by scores, and potential positive features are selected to form pseudo-bags, while low-scoring features are discarded to ensure that pseudo-bags contain information relevant to pseudo-labels. Corresponding high-resolution pseudo-bags are then generated via feature index mapping, effectively addressing the issues of incorrect pseudo-labels and low signal-to-noise ratios. Finally, a two-stage classification model is developed. In the first stage, low-resolution pseudo-bags are aggregated using a gated attention mechanism for preliminary classification. In the second stage, a cross-attention mechanism is employed to fuse the most contributory low-resolution features with their corresponding high-resolution counterparts. The fused features are then concatenated with the aggregated high-resolution pseudo-bags to form a comprehensive image-level feature representation, which is input into a classifier for final prediction. Model training is conducted using a two-stage loss function, combining cross-entropy losses from low-resolution classification and overall classification to ensure effective integration of multi-level information. The method is experimentally validated on three pathological image datasets, demonstrating its effectiveness in weakly supervised pathological image classification tasks.  Results and Discussions  The proposed method is compared with several state-of-the-art weakly supervised classification methods, including ABMIL, CLAM, TransMIL, and DTFD. Evaluations are conducted on three pathological image datasets: the publicly available Camelyon16 and TCGA-LUNG datasets, and a private skin cancer dataset, NBU-Skin. 
Experimental results indicate that the proposed method achieves significant performance improvements on the test sets. On the Camelyon16 dataset, a classification accuracy of 88.3% and an AUC value of 0.979 are obtained (Table 2). On the TCGA-LUNG dataset, a classification accuracy of 86.0% and an AUC value of 0.931 are obtained (Table 2), surpassing comparative methods. On the NBU-Skin dataset, a classification accuracy of 90.5% and an AUC value of 0.976 are achieved for multi-classification tasks (Table 2). To further validate the effectiveness of the proposed approach, ablation studies are conducted to assess the necessity of the multi-level feature fusion and pseudo-bag division modules. The results demonstrate that the combination of these modules enhances classification performance. For instance, on the skin cancer dataset, removing the pseudo-bag division module reduces classification accuracy from 93.8% to 90.7%, and subsequently removing the multi-level feature fusion module further reduces it to 80.0% (Table 3). These findings collectively confirm the effectiveness of each component in the method.  Conclusions  A weakly supervised pathological image classification algorithm is proposed in this study, integrating multi-level feature fusion and an attention-based pseudo-bag division method. This approach effectively leverages multi-level information within pathological images and mitigates challenges related to incorrect pseudo-labels and low signal-to-noise ratios. Experimental results demonstrate that the proposed method outperforms existing approaches in terms of classification accuracy and AUC across three pathological image datasets. The primary contributions include: (1) A multi-level feature extraction and fusion strategy.
Unlike existing strategies that primarily focus on the highest resolution or employ simple feature concatenation, this method deeply fuses feature information across different levels via a Cross-Attention Mechanism, effectively utilizing multi-scale information. (2) A pseudo-bag division method based on an attention evaluation function. By scoring features to identify potential positive regions and restructuring training samples via pseudo-bag division, this method not only maximizes the correctness of pseudo-labels through a top-k mechanism but also improves the signal-to-noise ratio by discarding low-scoring background noise. (3) Superior performance over all comparative models. The accuracy of weakly supervised pathological image classification is significantly improved, providing new insights for computer-aided cancer diagnosis. The following future research directions are proposed: (1) optimization of cross-level attention mechanisms; (2) extension of the framework to other medical imaging tasks, such as prognosis prediction or lesion segmentation; (3) designing more efficient feature extraction and fusion methods, and exploring their applications in other disease types and tasks, to better meet clinical needs.
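The attention-based pseudo-bag division described above can be illustrated with a minimal sketch (not the authors' implementation; the scoring inputs, the value of k, and the round-robin grouping are assumptions for illustration): patches are ranked by their CAM-style scores, low-scoring background patches are discarded, and the top-k survivors are spread across pseudo-bags.

```python
import numpy as np

def divide_pseudo_bags(scores, features, k=4, n_bags=2):
    """Keep the k highest-scoring patches and spread them over n_bags pseudo-bags.

    scores:   (N,) CAM/attention scores per patch (higher = more likely positive).
    features: (N, D) patch feature vectors.
    Discarding low-scoring patches raises the signal-to-noise ratio and makes
    the positive pseudo-labels assigned to pseudo-bags more likely to be correct.
    """
    top_idx = np.argsort(scores)[::-1][:k]                     # rank, keep top-k
    bag_indices = [top_idx[i::n_bags] for i in range(n_bags)]  # round-robin split
    return [features[idx] for idx in bag_indices], bag_indices
```

The returned indices would then be reused, via the low-to-high-resolution index mapping, to gather the corresponding high-resolution pseudo-bags.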
UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicles
YANG Miaoyan, FANG Xuming
Available online, doi: 10.11999/JEIT250743
Abstract:
  Objective  In the Internet of Vehicles (IoV), utilizing Unmanned Aerial Vehicles (UAVs) to handle the tidal fluctuations of edge-computing demand has become a key technology in the 6G field in recent years. However, when deep reinforcement learning (DRL) is used to optimize system latency, the action space dimension grows exponentially with the number of vehicles, leading to training difficulties and slow convergence. Therefore, this paper proposes a two-layer hybrid solution for UAV-assisted mobile edge computing (MEC) based on DRL, termed Hybrid Hierarchical Deep Reinforcement Learning (HHDRL).  Methods  The proposed HHDRL algorithm employs a two-layer architecture to hierarchically solve complex optimization problems. The upper layer employs an agent based on proximal policy optimization (PPO) combined with a multi-head actor network to manage the user offloading policy and the UAV control policy. The N heads in this network handle offloading decisions for the N users (local processing, or offloading to associated CAPs or the UAV). A UAV flight control head is responsible for selecting from a set of discrete acceleration actions to reflect actual control constraints. The lower layer employs a computationally efficient greedy algorithm to prioritize resources based on task characteristics. This hybrid hierarchical approach avoids the high computational cost of resource allocation schemes based solely on DRL.  Results and Discussions  The performance of the proposed HHDRL scheme was verified through numerical simulations. The parameters used in the simulation include parameters related to the specific Rician fading channel, parameters related to the UAV flight energy consumption model, and system parameters (e.g., mission data size of 9-18 Mbits and mission complexity of 2000-3000 cycles/bit).
Figure 3 shows a training convergence comparison between the HHDRL scheme and the original DRL algorithm, demonstrating that HHDRL consistently converges faster than the DRL scheme, despite achieving slightly lower final rewards compared to the pure DRL approach. Figure 4 illustrates the impact of the HHDRL architecture on user delay fairness; the comparison reveals that the introduction of the HHDRL framework does not compromise the user fairness performance inherent to the DRL method. The performance evaluation in Figure 5 shows that the proposed scheme reduces system latency by approximately 71%-91% compared to a random baseline, and 1%-12% compared to the original DRL algorithm. Figure 6 shows a training time analysis for different numbers of users. Across different numbers of users, the HHDRL scheme consistently has shorter training times than the DRL scheme. Furthermore, as the number of users increases, the HHDRL scheme's training time increases more slowly. This is attributed to the hybrid hierarchical algorithm network architecture, which simplifies the DRL output action space. When the upper-layer PPO is replaced with another DRL algorithm, the scheme still outperforms the random baseline and achieves performance comparable to the non-hybrid-hierarchical approach. This demonstrates the effectiveness and universality of the hybrid hierarchical architecture in achieving significant training acceleration while maintaining performance. The system parameter sensitivity analysis in Figure 8 shows that computational resources have the most significant impact on latency performance, compared to user transmission power and system bandwidth. This is because computational latency typically accounts for a larger proportion than communication latency in task processing. Figure 9 shows the results of the UAV trajectory optimization.
Figure 9(a) shows the change in the UAV's velocity over time, demonstrating that discrete acceleration control reflects actual control accuracy and response delay considerations rather than idealized instantaneous velocity changes. Figure 9(b) shows the X-coordinates of the UAV and user over time, illustrating that the UAV adaptively adjusts its position to match the changing user distribution while maintaining flight stability.  Conclusions  This paper proposes a HHDRL algorithm that integrates DRL with a greedy algorithm in a hierarchical framework to address the difficulty of training UAV-assisted MEC systems in IoV. Simulation results confirm that: (1) Compared with the DRL method, the proposed method significantly accelerates the training convergence speed and shortens the training time. (2) The system latency performance of the proposed algorithm is almost comparable to that of the pure DRL method, while significantly outperforming the heuristic baseline and random baseline algorithms. (3) The HHDRL framework effectively manages the joint optimization of user task offloading, computing-node resource allocation, and UAV trajectories under practical operational constraints. Future work will extend the framework to multi-UAV collaboration and more complex environments.
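The role of the lower-layer greedy allocator can be sketched as follows (an illustrative assumption, not the paper's exact rule): once the upper-layer agent has decided which tasks run on a given computing node, a closed-form split of that node's CPU budget in proportion to the square root of each task's workload minimizes the summed computation delay under the budget constraint.

```python
import numpy as np

def greedy_allocate(cycles, total_freq):
    """Split a node's CPU budget across the tasks offloaded to it.

    Allocating f_i proportional to sqrt(c_i) is the closed-form minimizer of
    the summed computation delay sum(c_i / f_i) subject to sum(f_i) = total_freq
    (from the Lagrangian condition c_i / f_i**2 = const).
    """
    w = np.sqrt(np.asarray(cycles, dtype=float))
    return total_freq * w / w.sum()
```

Because this per-node rule is deterministic, the DRL agent only outputs discrete offloading and flight-control actions, which is what keeps the action space from growing with the resource-allocation granularity.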
A Causality-Guided KAN Attention Framework for Brain Tumor Classification
FAN Yawen, WANG Xiang, YUE Zhen, YU Xiaofan
Available online, doi: 10.11999/JEIT250865
Abstract:
  Objective  In recent years, Convolutional Neural Network (CNN)-based Computer-Aided Diagnosis (CAD) systems have advanced brain tumor classification. However, classification performance remains limited due to feature confusion and inadequate modeling of high-order interactions. To address these challenges, this study proposes an innovative framework that integrates causal feature guidance with a KAN attention mechanism. A novel metric, the Confusion Balance Index (CBI), is introduced to quantify real label distribution within clusters. Furthermore, a causal intervention mechanism is designed to explicitly incorporate confused samples, enhancing the model’s ability to distinguish causal variables from confounding factors. In addition, a KAN attention module based on spline functions is constructed to accurately model high-order feature interactions, thereby strengthening the model’s focus on critical lesion regions and discriminative features. This dual-path optimization approach, combining causal modeling with nonlinear interaction enhancement, improves classification robustness and overcomes the limitations of traditional architectures in capturing complex pathological feature relationships.  Methods  This study employs a pre-trained CLIP model for feature extraction, leveraging its representation capabilities to obtain semantically rich visual features. Subsequently, based on K-means clustering, the Confusion Balance Index (CBI) is introduced to identify confusing factor images, and a causal intervention mechanism is implemented to explicitly incorporate these confused samples into the training set. Based on this, a causal-enhanced loss function is designed to optimize the model’s ability to discriminate causal variables from confounding factors. Furthermore, to address insufficient high-order feature modeling, a Kolmogorov-Arnold Network (KAN)-based attention mechanism is further integrated. 
This module, built on spline functions, constructs flexible nonlinear attention representations to finely model high-order feature interactions. By fusing this module with the backbone network, the model achieves enhanced discriminative performance and generalization capabilities.  Results and Discussions  The proposed method significantly outperforms existing approaches across three datasets. On DS1, the model achieves 99.92% accuracy, 99.98% specificity, and 99.92% precision, surpassing benchmarks such as RanMerFormer (+0.15%) and SAlexNet (+0.23%), with improvements exceeding 2% over traditional CNN methods (95%–97%). Although Swin Transformers achieve 98.08% accuracy, their markedly lower precision (91.75%) highlights the superior robustness of our method in reducing false detections. On DS2, the model attains 98.86% accuracy and 98.80% precision, outperforming the second-best RanMerFormer. On a more challenging in-house dataset, the model maintains 90.91% accuracy and 95.45% specificity, demonstrating generalization to complex scenarios. Performance gains are attributed to the KAN attention mechanism's high-order feature interaction modeling and the causal reasoning module's confounding factor decoupling. These components enhance focus on critical lesion regions and improve decision-making stability in complex scenarios. Experimental results validate the framework's superiority in brain tumor classification, offering reliable support for clinical precision diagnostics.  Conclusions  Experimental results substantiate that the proposed framework confers marked improvements in brain tumor classification, with the synergistic interaction between the causal intervention mechanism and the KAN attention module serving as the principal factor driving performance gains. Notably, these enhancements are achieved with negligible increases in model parameters and inference latency, thereby ensuring both efficiency and practicality.
This study delineates a novel methodological paradigm for medical image classification and underscores its prospective utility in few-shot learning scenarios and clinical decision support systems.
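One plausible form of the Confusion Balance Index is sketched below (hypothetical; the abstract does not disclose the exact formula): the normalized entropy of the true-label distribution inside a K-means cluster, which is 0 for a pure cluster and 1 for a maximally mixed one, flagging clusters whose samples should enter the causal-intervention training set.

```python
import numpy as np

def confusion_balance_index(labels):
    """Hypothetical CBI: normalized entropy of a cluster's true-label histogram.

    0.0 -> the cluster contains a single class (no confusion);
    1.0 -> labels are uniformly mixed (maximal confusion).
    """
    _, counts = np.unique(labels, return_counts=True)
    if len(counts) == 1:
        return 0.0
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(p)))
```

Clusters whose index exceeds a threshold would be treated as containing confounded samples and explicitly re-weighted by the causal-enhanced loss.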
A Study on Lightweight Method of TCM Structured Large Model Based on Memory-Constrained Pruning
LU Jiafa, TANG Kai, ZHANG Guoming, YU Xiaofan, GU Wenqi, LI Zhuo
Available online, doi: 10.11999/JEIT250909
Abstract:
  Objective  The structuring of Traditional Chinese Medicine (TCM) electronic medical records (EMRs) is essential for enabling knowledge discovery, clinical decision support, and intelligent diagnosis. However, two significant barriers exist: (1) TCM EMRs are primarily unstructured free text and often paired with tongue images, which complicates automated processing; and (2) grassroots hospitals typically face limited GPU resources, preventing deployment of large-scale pretrained models. This study aims to resolve these challenges by proposing a lightweight multimodal model based on memory-constrained pruning. The approach is designed to retain near–state-of-the-art accuracy while dramatically reducing memory consumption and computation cost, thereby ensuring practical applicability in resource-limited healthcare settings.  Methods  A three-stage architecture is established, consisting of an encoder, a multimodal fusion module, and a decoder. For textual inputs, a distilled TinyBERT encoder is combined with a BiLSTM-CRF decoder to extract 23 categories of TCM clinical entities, including symptoms, syndromes, prescriptions, and herbs. For visual inputs, a ResNet-50 encoder processes tongue diagnosis images. A novel memory-constrained pruning strategy is introduced: an LSTM decision network observes convolutional feature maps and adaptively prunes redundant channels while retaining crucial diagnostic features. To expand pruning flexibility, gradient re-parameterization and dynamic channel grouping are employed, with stability ensured through a reinforcement-learning controller. In parallel, INT8 mixed-precision quantization, gradient accumulation, and dynamic batch pruning (DBP) are adopted to reduce memory usage. Finally, a TCM terminology–enhanced lexicon is incorporated into the encoder embeddings to address recognition of rare entities. The entire system is trained end-to-end on paired EMR–tongue datasets (Fig. 
1), ensuring joint optimization of multimodal information flow.  Results and Discussions  Experiments are conducted on 10,500 de-identified EMRs paired with tongue images, collected from 21 tertiary hospitals. On an RTX 3060 GPU, the proposed model achieves an F1-score of 91.7%, with peak GPU memory reduced to 3.8 GB and inference speed improved to 22 records per second (Table 1). Compared with BERT-Large, memory consumption decreases by 75% and throughput increases 2.7×, while accuracy remains comparable. Ablation studies confirm the contributions of each module: the adaptive attention gating mechanism raises overall F1 by 2.8% (Table 2); DBP reduces memory usage by 40–62% with minimal accuracy loss and significantly improves performance on EMRs exceeding 5,000 characters (Fig. 2); and the terminology-enhanced lexicon boosts recognition of rare entities such as “blood stasis” by 6.2%. Moreover, structured EMR fields enable association rule mining, where the confidence of syndrome–symptom relationships increases by 18% (Algorithm 1). These findings highlight three main insights: (1) multimodal fusion with lightweight design yields clinical benefits beyond unimodal models; (2) memory-constrained pruning offers stable channel reduction under strict hardware limits, outperforming traditional magnitude-based pruning; and (3) pruning, quantization, and dynamic batching exhibit strong synergy when co-designed, rather than used independently. Collectively, these results demonstrate the feasibility of deploying high-performing TCM EMR structuring systems in real-world environments with limited computational capacity.  Conclusions  This work proposes and validates a lightweight multimodal framework for structuring TCM EMRs. By introducing memory-constrained pruning combined with quantization and dynamic batch pruning, the method significantly compresses the visual encoder while maintaining fusion accuracy between text and images. 
The approach delivers near–state-of-the-art performance with drastically reduced hardware requirements, enabling deployment in regional hospitals and clinics. Beyond immediate efficiency gains, the structured multimodal outputs enrich TCM knowledge graphs and improve the reliability of downstream tasks such as syndrome classification and treatment recommendation. The study thus provides both theoretical and practical contributions: it bridges the gap between powerful pretrained models and the limited hardware of grassroots medical institutions, and establishes a scalable paradigm for lightweight multimodal NLP in medical informatics. Future directions include incorporating additional modalities such as pulse-wave signals, extending pruning strategies with graph neural networks, and exploring adaptive cross-modal attention mechanisms to further enhance clinical applicability.
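The INT8 mixed-precision step can be illustrated with a generic symmetric quantizer (a common scheme, not necessarily the exact one used in the paper): each FP32 weight tensor is stored as 8-bit integers plus a single scale factor, cutting its memory footprint to roughly a quarter.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~ q * scale, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an FP32 approximation; per-weight error is at most scale / 2."""
    return q.astype(np.float32) * scale
```

In the framework described above, this quantization is applied on top of the channel pruning, so the two savings compound rather than overlap.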
An Optimized Multi-Layer Equivalent Source Method for Spatial Continuation of Magnetic Anomalies in the Geomagnetic Background
GUAN Yu, ZHANG Huiqiang
Available online, doi: 10.11999/JEIT250958
Abstract:
  Objective  Spatial continuation of magnetic anomalies is a pivotal technique in potential field data processing, serving as a prerequisite for geological interpretation and geomagnetic navigation. However, existing methods face inherent limitations: frequency-domain methods suffer from severe ill-posedness and high-frequency noise amplification during downward continuation, while traditional single-layer equivalent source methods often struggle to simultaneously fit multi-scale anomalies caused by sources at varying depths. Although the Multi-layer Equivalent Source (MES) model offers a solution for depth resolution, its application is hindered by the subjectivity in structural parameter setting and the instability of large-scale inversion, leading to the loss of high-frequency structural details. To overcome these bottlenecks, this study proposes an optimized MES method designed for high-precision continuation in complex geological environments. The method establishes an objective parameterization framework by combining Radially Averaged Power Spectrum (RAPS) analysis with Variational Mode Decomposition (VMD) to accurately separate sources. Furthermore, it introduces a collaborative inversion scheme based on the Fungal Growth Optimizer (FGO) and Preconditioned Conjugate Gradient (PCG) to adaptively optimize regularization parameters, thereby effectively suppressing ill-posedness and enhancing the robustness and fidelity of signal reconstruction under noisy conditions.  Methods  To achieve high-precision spatial continuation of magnetic anomalies, this study establishes a systematic four-step technical framework. (1) Model Construction: A multi-layer equivalent source (MES) model is constructed using a layered strategy, where uniformly magnetized rectangular prisms are selected as the fundamental source units to accurately represent subsurface field sources. 
(2) Parameter Configuration: To objectively determine model parameters, a hybrid approach integrating Radially Averaged Power Spectrum (RAPS) analysis and Variational Mode Decomposition (VMD) is proposed. RAPS is utilized to estimate the average depths of source layers by analyzing the slope variations in the logarithmic power spectrum. Subsequently, VMD decomposes the original magnetic signal into intrinsic mode functions corresponding to different depths, allowing for the precise calculation of layer thickness based on the ratio of the Mean Total Horizontal Derivative (MTHD). (3) Collaborative Inversion: A robust inversion strategy is implemented by introducing the Fungal Growth Optimizer (FGO) into the Preconditioned Conjugate Gradient (PCG) method. Tikhonov regularization is employed to construct the objective function to mitigate the ill-posedness of the linear system. FGO adaptively searches for optimal hyperparameters—including the regularization parameter, step size scaling factor, and preconditioner weights—thereby balancing solution stability with convergence efficiency. (4) Comprehensive Validation: The method's effectiveness is rigorously verified through three stages: first, a theoretical model comprising five prisms is established to validate the reliability of the proposed method by benchmarking its continuation performance against single-layer, double-layer equivalent source models, and frequency-domain methods; second, the global EMAG2 magnetic anomaly model is used to test robustness under 5% Gaussian noise, ensuring stability in downward continuation; finally, real measured data from the Australian magnetic anomaly grid are applied. Two distinct sub-regions—a complex tectonic zone (Area A) and a gentle sedimentary basin (Area B)—are selected for downward continuation experiments (2000 m to 0 m), using quantitative indicators (RMSE, GOF) to demonstrate the method's universality across different geological textures.  
Results and Discussions  The performance of the proposed method is validated through three progressive stages: (1) Theoretical Model Verification: The radially averaged logarithmic power spectrum (Fig. 3) and VMD analysis (Fig. 4) successfully identified three equivalent source layers, confirming the objectivity of the parameter configuration framework. The FGO-optimized inversion strategy accelerated convergence by approximately 5-6 times and reduced the residual norm by 13% compared to the traditional Conjugate Gradient (CG) method (Fig. 7). In the 100 m upward continuation (Fig. 8, Table 4) and downward continuation (Fig. 9, Table 5) experiments, the proposed method achieved the lowest RMSE and highest GOF, effectively overcoming the ill-posedness of frequency-domain methods and the large fitting errors of single/double-layer models. (2) Robustness Analysis: Using the EMAG2 data (Fig. 10), the method demonstrated exceptional anti-noise capabilities. Even with 5% Gaussian noise added to the 1000 m observation data, the downward continuation results remained stable without significant artifacts. Quantitative evaluation (Table 6) shows an RMSE of 7.36 nT and a GOF of 82.65%, verifying the method's robustness in low signal-to-noise ratio environments. (3) Generalization Verification: In the application to real Australian magnetic anomaly grid data, two distinct geological regions were analyzed (Fig. 11, Fig. 12). For Area B (Sedimentary Basin), characterized by smooth gradients, the method achieved high-fidelity reconstruction with a GOF of 84.28% and an RMSE of 29.06 nT. For Area A (Complex Tectonic Zone), despite the exponential decay of high-frequency signals, the method effectively recovered main structural features (GOF = 76.14%), although localized residuals occurred in high-gradient zones due to the physical limits of field transformation (Table 8). These results confirm the method's universality across diverse geological textures.  
Conclusions  This study proposes a robust spatial continuation method for magnetic anomalies based on an optimized multi-layer equivalent source (MES) framework. By integrating Radially Averaged Power Spectrum (RAPS) analysis with Variational Mode Decomposition (VMD), the method establishes an objective parameterization scheme, effectively reducing the subjectivity in model construction. Furthermore, the introduction of the Fungal Growth Optimizer (FGO) into the inversion algorithm significantly enhances convergence speed and stability, successfully mitigating the ill-posedness inherent in downward continuation. Experimental results indicate that: (1) The method exhibits exceptional robustness, maintaining high signal fidelity even under 5% Gaussian noise interference, as verified by the EMAG2 model tests; (2) The method demonstrates excellent geological universality. In applications to real Australian aeromagnetic grid data, it achieves high-precision reconstruction in deep sedimentary basins (Area B) and effectively recovers main structural features in complex tectonic zones (Area A), outperforming traditional single-layer and frequency-domain methods. However, the method currently faces challenges regarding high memory consumption due to the storage of large-scale dense kernel matrices. Future research will focus on implementing matrix compression techniques or exploring matrix-free inversion strategies to further enhance computational efficiency for large-scale geomagnetic data processing.
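The RAPS-based depth estimation can be sketched as follows (a standard Spector–Grant construction; the grid size, bin count, and fitting band below are illustrative assumptions, not the paper's settings): for an ensemble of sources at mean depth h, ln P(k) falls off roughly as −2hk with radial wavenumber k, so a layer's average depth is read off as −slope/2 of the log-spectrum over the corresponding wavenumber band.

```python
import numpy as np

def raps(grid, spacing=1.0, n_bins=20):
    """Radially averaged power spectrum of a 2-D anomaly grid (k in rad/length)."""
    P = np.abs(np.fft.fft2(grid)) ** 2
    fy = np.fft.fftfreq(grid.shape[0], spacing)
    fx = np.fft.fftfreq(grid.shape[1], spacing)
    k = 2.0 * np.pi * np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    edges = np.linspace(0.0, k.max() * (1.0 + 1e-9), n_bins + 1)
    idx = np.clip(np.digitize(k.ravel(), edges) - 1, 0, n_bins - 1)
    hits = np.maximum(np.bincount(idx, minlength=n_bins), 1)
    Pr = np.bincount(idx, weights=P.ravel(), minlength=n_bins) / hits
    return 0.5 * (edges[:-1] + edges[1:]), Pr

def depth_from_slope(kc, Pr, k_lo, k_hi):
    """Spector-Grant relation ln P(k) ~ const - 2*h*k  =>  h = -slope / 2."""
    m = (kc >= k_lo) & (kc <= k_hi) & (Pr > 0)
    slope = np.polyfit(kc[m], np.log(Pr[m]), 1)[0]
    return -0.5 * slope
```

In the proposed framework each straight-line segment of the spectrum yields one equivalent-source layer depth, which VMD then refines into layer thicknesses.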
Towards Privacy-Preserving and Lightweight Modulation Recognition for Short-Wave Signals under Channel Shifts
YAO Yizhou, DENG Wen, LI Baoguo
Available online, doi: 10.11999/JEIT251017
Abstract:
  Objective  Existing short-wave signal modulation recognition methods based on the supervised learning paradigm typically assume that training data (source domain) and test data (target domain) follow identical distributions. However, short-wave channels are susceptible to ionospheric variations, leading to significant distribution discrepancies across domains, which consequently causes model performance degradation. Furthermore, deployment on the edge side of unmanned platforms is constrained by limited device resources, scarce labeled samples, and data privacy requirements. To address these challenges, a lightweight recognition method based on source-model transfer is proposed in this paper, enabling privacy-preserving model adaptation without the need to access source domain data.  Methods  A multi-modal source-model transfer framework (M-SMOT) is developed, which utilizes information maximization loss and self-supervised pseudo-labeling techniques to facilitate model adaptation without revisiting source domain data. This approach achieves effective cross-channel recognition of short-wave modulation signals while reducing computational resource consumption and preserving data privacy. Additionally, multi-modal information—comprising in-phase/quadrature (I/Q) components, amplitude-phase (AP) characteristics, and spectral features—is fused to leverage complementary feature representations, thereby enhancing the robustness of the recognition network against complex channel variations.  Results and Discussions  Experimental results demonstrate that the recognition performance of the proposed method consistently surpasses that of the Source-Only baseline across six cross-channel scenarios, with improvements ranging from 0.31% to 10.81% (Table 1). In terms of few-shot adaptation, average recognition accuracies are maintained at 98.3% and 96% relative to the full-sample baseline, even when target domain training samples are reduced to 10% and 1%, respectively (Fig. 
12). Ablation studies verify the necessity and effectiveness of the self-supervised pseudo-labeling module (Fig. 16) and the multi-modal fusion strategy (Fig. 17), confirming that both components contribute to the overall performance. Furthermore, the lightweight advantages are quantified: the method requires zero storage for source data, exhibits a peak memory consumption of only 6.00 MB, and achieves convergence within a single fine-tuning epoch (Table 2). These findings validate the capability of the proposed mechanism to mitigate domain discrepancies and protect privacy under resource-constrained conditions.  Conclusions  The M-SMOT method successfully integrates data privacy protection, source model adaptation, few-shot generalization, and low resource consumption. Consequently, it provides a practical solution for cross-channel modulation recognition in short-wave communications, demonstrating significant potential for deployment on resource-limited edge devices.
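The information-maximization objective at the heart of the adaptation stage can be sketched as follows (a standard SHOT-style formulation; the exact weighting used in M-SMOT may differ): per-sample prediction entropy is minimized so target-domain outputs become confident, while the entropy of the batch-mean prediction is maximized so the model does not collapse onto a single modulation class.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def info_max_loss(logits, eps=1e-8):
    """Confidence term minus diversity term; lower is better during adaptation."""
    p = softmax(logits)
    ent = -(p * np.log(p + eps)).sum(axis=1).mean()  # want confident predictions
    p_bar = p.mean(axis=0)                           # batch-mean class distribution
    div = -(p_bar * np.log(p_bar + eps)).sum()       # want diverse class usage
    return float(ent - div)
```

Because this loss needs only the model's own outputs on unlabeled target samples, the source-domain signal data never have to leave their owner, which is what provides the privacy guarantee.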
Indoor Visible Light Positioning Based on CNN–MLP Multi-Feature Fusion under Random Receiver Tilt Conditions
JIA Kejun, WANG Jian, MAO Lifei, YOU Wei, HUANG Ziyang, PENG Duo
Available online, doi: 10.11999/JEIT251021
Abstract:
  Objective  Traditional visible light positioning (VLP) methods based on received signal strength (RSS) suffer from instability when the receiver experiences orientation perturbations, which disrupt the correspondence between optical power and spatial position, making reliable three-dimensional (3D) positioning difficult to achieve. Existing approaches typically rely on inertial measurement units (IMUs) to obtain orientation information; however, sensor fusion increases system complexity and hardware cost and introduces cumulative errors. To address these issues, this paper proposes a positioning method that fuses cosine-of-incidence-angle estimation based on a photodiode (PD) array with RSS information, enabling high-accuracy 3D indoor positioning under receiver orientation perturbations.  Methods  In the proposed fusion-based positioning method, a multi-PD array structure is first adopted, and a local coordinate system (LCS) is established at the array center. Constraint equations are then constructed based on the differences in received optical power among PDs in the array. A Gauss–Newton iterative algorithm is employed to estimate the incident light direction vector. By exploiting the orthogonal rotation invariance between the LCS and the global coordinate system (GCS), the cosine of the incident angle is estimated without the need for orientation sensors. Subsequently, a serial CNN–MLP fusion network is constructed, in which the estimated incident-angle cosine is introduced as an additional positioning feature on top of RSS-based localization. The network jointly models the RSS and incident-angle cosine information received by the PD array and maps them to 3D spatial coordinates. Finally, training samples are generated using Latin hypercube sampling (LHS) to uniformly sample spatial positions and orientation dimensions, thereby improving the representativeness of the training dataset.  
Results and Discussions  Simulation experiments are conducted in a 4 m × 4 m × 2.5 m indoor environment. First, the effects of different numbers of PDs and tilt angles on the accuracy of incident-angle cosine estimation and spatial coverage are evaluated (Fig. 6), and the cumulative distribution functions (CDFs) of positioning errors under different array configurations are compared (Fig. 7). The results show that a 3-PD array with a tilt angle of 40° achieves the best balance among cost, coverage, and positioning accuracy. Next, positioning performance under different receiver tilt angles is analyzed. When the tilt angle is small, more than 70% of positioning errors are below 5 cm; even when the receiver is tilted up to 55°, the average error remains within 11.7 cm (Fig. 8). Error component comparisons indicate that the error along the Z-axis is significantly smaller than those along the X and Y axes (Fig. 9). Further tests are conducted at a height of 0.0 m covered by the training data and at an unseen height of 0.6 m not included in the training set (Fig. 10). The results demonstrate that the proposed model does not exhibit strong dependence on a specific height plane and maintains stable 3D positioning performance at unseen heights. Finally, the proposed method is compared with related positioning schemes. It outperforms existing methods in terms of CDF convergence speed, RMSE, and standard deviation (Fig. 11), achieving an average error reduction of approximately 2.5 cm and an RMSE reduction of 31.58% compared with Ref. [12].  Conclusions  This paper estimates the cosine of the incident angle at the receiver by exploiting differences in the optical power received by different PDs in an array and introduces this cosine value as a joint positioning feature into conventional RSS-based localization, thereby alleviating the instability of position mapping caused by relying solely on RSS under random receiver perturbations. 
By further combining the spatial feature extraction capability of CNNs with the nonlinear modeling strength of MLPs, the proposed method effectively maps positioning features to 3D spatial coordinates. The approach reduces reliance on orientation sensors such as IMUs while overcoming the susceptibility of traditional geometric positioning methods to noise and high-dimensional nonlinear features. Under varying heights and receiver orientations, the proposed algorithm demonstrates significant advantages in both positioning accuracy and stability.
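The Latin hypercube sampling step used to generate training samples in the Methods can be sketched as follows; the room bounds match the 4 m × 4 m × 2.5 m simulation space, while the two orientation-angle ranges are illustrative assumptions.

```python
import random

def latin_hypercube(n, bounds, seed=0):
    """LHS: one sample per stratum in each dimension, strata randomly paired
    across dimensions, so every dimension is uniformly covered."""
    rng = random.Random(seed)
    cols = []
    for lo, hi in bounds:
        idx = list(range(n))
        rng.shuffle(idx)  # random pairing of strata across dimensions
        cols.append([lo + (hi - lo) * (i + rng.random()) / n for i in idx])
    return list(zip(*cols))

# x, y, z over the room plus two perturbation angles (angle ranges assumed).
samples = latin_hypercube(100, [(0, 4), (0, 4), (0, 2.5), (-55, 55), (-55, 55)])
print(len(samples), samples[0])
```

Compared with plain uniform random sampling, each of the five dimensions here is guaranteed to have exactly one sample per 1/100 stratum, which is what improves the representativeness of the training set.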
Evaluation of Domestic Large Language Models as Educational Tools for Cancer Patients
ZHANG Junli, XU Weiran, WANG Zhao
Available online  , doi: 10.11999/JEIT251056
Abstract:
  Objective  With the rapid increase in cancer incidence and mortality worldwide, patient education has become a critical strategy for reducing the disease burden and improving patient outcomes. However, traditional education methods, such as paper-based materials or face-to-face consultations, are limited by time, space, and personalization constraints. The emergence of large language models (LLMs) has opened new opportunities for delivering intelligent, scalable, and personalized health education. Although domestic LLMs, such as Doubao, Kimi, and DeepSeek, have been widely applied in general scenarios, their utility in oncology education remains underexplored. This study aimed to systematically evaluate the performance of three domestic LLMs in cancer patient education across multiple dimensions, providing empirical evidence for their potential clinical application and optimization.  Methods  Frequently asked patient education questions were collected through group discussions with oncology nurses from a tertiary hospital. Nineteen oncology nurses with ≥1 year of clinical experience participated in item selection, and the ten most common questions were chosen, covering domains such as diet, nutrition, treatment, adverse drug reactions, and prognosis. Each question was independently input into Doubao (Pro, ByteDance, May 2024), Kimi (V1.1, Moonshot AI, Nov 2023), and DeepSeek (R1, DeepSeek AI, Jan 2025) under “new chat” conditions to avoid contextual interference. Responses were standardized to remove model identifiers and randomly coded. Quality evaluation followed a blinded design. Thirteen inpatients with cancer assessed responses for readability and effectiveness, while six senior oncologists rated responses for accuracy, comprehensiveness, and professionalism. A self-designed five-point Likert scale was used for each dimension. Statistical analyses were conducted using GraphPad Prism 9.5.1. 
One-way ANOVA with Bonferroni correction was applied for dimensional comparisons, while Welch’s ANOVA and Games-Howell post hoc tests were used for overall score analysis. Results were visualized with tables and radar plots.  Results and Discussions  Overall, the three models achieved mean total scores of 4.05±0.687 (Doubao), 4.17±0.791 (Kimi), and 4.19±0.640 (DeepSeek). Welch’s ANOVA showed significant overall differences (F=5.537, P=0.004). Games-Howell analysis revealed that Doubao performed significantly worse than Kimi and DeepSeek (P=0.005 and 0.042, respectively), while Kimi and DeepSeek did not differ significantly (P=0.975). From the patient perspective, Kimi outperformed its peers, achieving the highest scores in readability (4.615±0.534) and effectiveness (4.476±0.560), with statistically significant differences (P<0.05). Patients rated Kimi’s responses to lifestyle-related queries, such as managing nausea or loss of appetite during chemotherapy, as particularly clear and actionable. From the expert perspective, DeepSeek demonstrated superiority in accuracy (4.117±0.846), comprehensiveness (4.100±0.681), and professionalism (3.917±0.645), with significant advantages over Kimi (P<0.01) and moderate superiority over Doubao (P<0.05). DeepSeek was favored for handling technical and evidence-based questions, such as drug metabolism or integrative therapy evaluation. The divergence between patient and expert assessments highlighted a mismatch: the “most understandable” responses (Kimi) were not always the “most professional” (DeepSeek). This complementarity suggests that future research should explore layered output formats or dual verification mechanisms. Such approaches would balance readability with professional rigor, minimizing the risks of misinformation while improving accessibility. Despite promising findings, limitations exist. 
This single-center study involved a relatively small sample size, and only patients with lung and breast cancer were included. The evaluation simulated static Q&A interactions rather than dynamic multi-turn dialogues, which are more representative of real-world consultations. Additionally, technical enhancements such as retrieval-augmented generation (RAG), fine-tuning with oncology-specific corpora, and multi-agent collaboration were not implemented. Future studies should expand to multi-center designs, diverse cancer populations, and advanced LLM optimization methods.  Conclusions  Domestic LLMs demonstrated significant potential as tools for cancer patient education. Kimi excelled in communication and patient-centered knowledge translation, while DeepSeek showed strength in professional accuracy and comprehensiveness. Doubao, although moderate across all dimensions, lagged behind in overall performance. The results indicate that LLMs can complement traditional health education by bridging the gap between patient comprehension and clinical expertise.
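The Welch's ANOVA used for the overall score comparison is a short, self-contained computation; the pure-Python sketch below implements the standard Welch statistic (it is not the GraphPad Prism implementation, and the Likert scores used in the demo are made up, not the study data).

```python
from statistics import fmean, variance

def welch_anova(groups):
    """Welch's heteroscedastic one-way ANOVA: returns (F, df1, df2)."""
    k = len(groups)
    w = [len(g) / variance(g) for g in groups]          # weights n_i / s_i^2
    W = sum(w)
    means = [fmean(g) for g in groups]
    grand = sum(wi * m for wi, m in zip(w, means)) / W  # weighted grand mean
    A = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means)) / (k - 1)
    h = sum((1 - wi / W) ** 2 / (len(g) - 1) for wi, g in zip(w, groups))
    F = A / (1 + 2 * (k - 2) * h / (k * k - 1))
    return F, k - 1, (k * k - 1) / (3 * h)

# Hypothetical 5-point Likert ratings for three models (illustrative only).
doubao = [4, 4, 3, 4, 5, 4, 3, 4]
kimi = [5, 4, 4, 5, 4, 5, 4, 4]
deepseek = [4, 5, 4, 4, 5, 5, 4, 4]
F, df1, df2 = welch_anova([doubao, kimi, deepseek])
print(round(F, 3), df1, round(df2, 1))
```

Unlike classical one-way ANOVA, the group variances enter the weights directly, so unequal variances across the three models do not bias the F statistic; the P value would then come from the F distribution with (df1, df2) degrees of freedom.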
Small Object Detection Algorithm for UAV Aerial Images in Complex Environments
LIU Jie, LIU Shuhao, TIAN Ming, CUI Zhigang
Available online  , doi: 10.11999/JEIT251126
Abstract:
  Objective  Small object detection plays a critical role in practical applications such as UAV (Unmanned Aerial Vehicle) inspection and intelligent transportation systems, where precise perception of diminutive targets is essential for operational reliability and safety. It enables the automated identification and tracking of challenging targets. However, the limited pixel size of small objects, coupled with their tendency to be obscured or integrated with complex backgrounds, results in strong background noise, leading to poor performance and elevated false-negative rates in existing detection models. To address this issue and achieve high-performance, high-precision detection of small objects in complex environments, this study proposes HAR-DETR, an enhancement over the RT-DETR baseline model, aimed at improving the detection accuracy for small objects.  Methods  HAR-DETR is proposed for small object detection in aerial images, incorporating three key improvements: Aggregated Attention, RFF-FPN (Recalibrated Feature Fusion Network-FPN), and a high-resolution detection branch. In the backbone network, Aggregated Attention enhances the model's ability to focus on relevant features of small objects. By expanding the receptive field, the model captures more detailed edge and texture information, thereby enabling more effective extraction of multi-scale features of the targets. During the feature fusion phase, RFF-FPN selectively integrates high-level and low-level features, allowing the network to retain critical spatial information and context. This facilitates the refinement of the edges and contours of small objects, improving the accuracy of localization and recognition, especially when object details may be obscured by background clutter or varying lighting conditions. 
The high-resolution detection head places greater emphasis on the edge features of small objects, providing enhanced small object perception capabilities, and further improving the model's robustness and precision.  Results and Discussions  A comparative analysis is conducted with several widely used object detection models, including YOLOv5, YOLOv8, and YOLOv10, to evaluate the performance of the model in small object detection using precision, recall, and mAP metrics. Experimental results show that the HAR-DETR model outperforms other comparative models in terms of precision, recall, and mAP on the VisDrone2019 dataset (Table 1). The mAP50 and mAP50-95 are improved by 3.8% and 3.2%, respectively, compared to the baseline model (Table 2). This demonstrates that the HAR-DETR model offers superior performance in detecting small objects in aerial images under complex environments. Heatmaps generated using Grad-CAM are utilized for comparative analysis of the proposed improvements, showing better detection results for all improvements compared to the baseline model (Fig. 6). In the generalization performance experiment, the VisDrone2019 validation set and RSOD dataset are used under identical training conditions. The experimental results indicate that HAR-DETR exhibits strong generalization ability across heterogeneous tasks (Tables 3 and 4).  Conclusions  This paper addresses the issues of false positives and false negatives in small object detection within aerial images captured in complex environments by utilizing the HAR-DETR model. Aggregated Attention is introduced in the backbone feature extraction phase to expand the receptive field and enhance global feature extraction capabilities. In the feature fusion phase, the RFF-FPN structure is proposed to enrich the feature representations. Additionally, a high-resolution detection head is introduced to make the model more sensitive to the edge textures of small objects. 
The model is evaluated using the Visdrone2019 and RSOD datasets, and the results demonstrate the following: (1) The proposed method improves the small object detection metrics, mAP50 and mAP50-95, by 3.8% and 3.2%, respectively, compared to the baseline model, achieving 51.2% and 32.1%, and mitigating the issues of false negatives and false positives; (2) In comparison with other mainstream object detection models, HAR-DETR exhibits the best performance in small object detection, thereby fully validating the effectiveness of the model; (3) The HAR-DETR model achieves high accuracy in cross-dataset training, demonstrating its excellent generalization performance. These results indicate that HAR-DETR possesses stronger semantic expression and spatial awareness capabilities, making it adaptable to various aerial perspectives and target distribution patterns, thus providing a more versatile solution for UAV visual perception systems in complex environments.
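The mAP50 and mAP50-95 metrics reported above rest on IoU-thresholded matching of predictions to ground truth. A minimal sketch of that matching step (greedy, score-ordered, one-to-one, at IoU ≥ 0.5), using made-up boxes rather than VisDrone data:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def precision_recall_at_iou(preds, gts, thr=0.5):
    """preds: (score, box) pairs; match each prediction (highest score first)
    to the best unmatched ground-truth box, counting a TP if IoU >= thr."""
    matched, tp = set(), 0
    for _, box in sorted(preds, key=lambda p: -p[0]):
        best, best_i = 0.0, None
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(box, g)
            if v > best:
                best, best_i = v, i
        if best >= thr:
            matched.add(best_i)
            tp += 1
    return tp / len(preds), tp / len(gts)

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (21, 21, 31, 31)), (0.3, (50, 50, 60, 60))]
print(precision_recall_at_iou(preds, gts))
```

mAP50 averages the area under the precision–recall curve built from this matching over all classes; mAP50-95 additionally averages over IoU thresholds from 0.5 to 0.95, which is why small-object localization errors penalize it more heavily.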
SCUNet-Based Decoding Algorithm for Rayleigh Fading Channels Integrating Feature Extraction and Recovery Mechanisms
WANG Leijun, WANG Kuan, XIE Jinfa, PENG Xidong, LI Jiawen, CHEN Rongjun
Available online  , doi: 10.11999/JEIT251138
Abstract:
  Objective  This study addresses the limitations of conventional deep neural network (DNN) decoding algorithms in Rayleigh fading channels, such as constrained performance, insufficient generalization capability, and weak resistance to fading. To tackle these issues, a feature extraction and recovery decoding algorithm based on the SCUNet architecture, termed SCUNetDec, is proposed. In the 6G communication era, wireless channels are characterized by high dynamics and complexity, making it difficult for traditional decoding methods to meet the strict requirements for high reliability, low latency, and strong robustness. Therefore, exploring intelligent decoding mechanisms with adaptive feature learning capabilities holds significant theoretical and practical importance. By integrating a multi-dimensional feature extraction and recovery mechanism and incorporating a noise-level map to enhance the network’s perception of channel states, SCUNetDec effectively learns channel characteristics, mitigates fading effects, and significantly improves decoding performance. This research not only provides a new approach to the design of intelligent decoding in complex channel environments but also lays a key technical foundation for building efficient and intelligent 6G communication systems.  Methods  In the construction of the research methodology, the proposed SCUNetDec network deeply integrates three core mechanisms—data preprocessing, feature extraction and recovery, and noise-level mapping—to achieve efficient and robust signal representation learning and decoding performance enhancement in Rayleigh fading channel environments. First, in the data preprocessing stage, dimensionality expansion operations are employed to map the original one-dimensional received signal into a two-dimensional feature map, enhancing the discernibility of the signal structure and providing a spatial correlation foundation for subsequent deep feature extraction. 
Second, a feature extraction and recovery module is constructed: the extraction module combines multi-layer convolutional layers with attention mechanisms to effectively capture essential channel features in the signal, while the recovery module utilizes deconvolutional layers and residual connections to suppress irrelevant interference introduced during the dimensionality transformation process, thereby improving signal reconstruction quality and decoding accuracy. Furthermore, a noise-level map mechanism is incorporated into the network. By embedding SNR-aware information that aligns with the feature maps, the model can dynamically adapt to changes in channel conditions and adjust its decoding strategy and feature extraction intensity accordingly. The synergistic interaction of these three mechanisms significantly enhances the noise robustness, generalization capability, and decoding stability of SCUNetDec in Rayleigh fading channels, providing a systematic solution for intelligent decoding in complex 6G wireless environments.  Results and Discussions  The SCUNetDec decoding algorithm, built upon the SCUNet architecture, significantly enhances signal learning and decoding capabilities in Rayleigh fading channels by integrating a feature extraction-recovery module and a noise-level map. Its performance was evaluated through simulations under various coding schemes. For (7,4) Hamming code, SCUNetDec outperformed conventional DNN decoding and closely approached Maximum Likelihood (ML) performance. Specifically, at a BER of $10^{-4}$, the performance gap to ML decoding was about 1.5 dB, and at a FER of $10^{-3}$, the gap was approximately 2.0 dB (Fig. 4). This shows that SCUNetDec can capture complex relationships within signals, thereby effectively learning the latent associations between information and parity-check nodes. 
For (2,1,3) Convolutional code, SCUNetDec's performance at BER = $10^{-3}$ was close to the Viterbi algorithm, with a marginal gap of only about 2.0 dB. In contrast, DNN decoding performance degraded significantly at high SNRs, demonstrating SCUNetDec's superior decoding capability and robustness (Fig. 5). For Polar codes with a rate of 0.5, SCUNetDec exhibited strong learning and generalization capabilities. It achieved a gain of approximately 4.0 dB over Successive Cancellation (SC) decoding at BER = $10^{-4}$ and maintained an advantage of about 1.0 dB at FER = $10^{-3}$, whereas SC decoding only showed a slight advantage in the low SNR region (Fig. 6). The comparison results of decoding time indicate that the SCUNetDec decoder can reduce decoding time compared to traditional decoding algorithms (Table 2). The ablation experiments demonstrate that combining the designed feature extraction and recovery modules with SCUNet leads to better decoding performance (Fig. 7). In summary, comprehensive analysis confirms that SCUNetDec delivers outstanding and robust decoding performance across multiple coding schemes and varying signal-to-noise ratio conditions.  Conclusions  To address the limited decoding performance of DNNs in Rayleigh fading channels, this paper proposes a decoding method named SCUNetDec based on the SCUNet network. The method enhances SCUNet by designing signal feature extraction and signal recovery modules. Simulations and ablation studies on Hamming codes, convolutional codes, and Polar codes demonstrate that the proposed modules exhibit strong generalization capability and effectiveness, making them suitable for various coding schemes. Compared with traditional DNN models, SCUNetDec shows superior decoding performance in Rayleigh fading channels, approaching that of conventional optimal decoding algorithms while significantly reducing decoding time. 
These results indicate that the SCUNetDec decoding algorithm possesses certain performance advantages and practical application potential in complex channel environments. Future work will focus on algorithm fusion and engineering implementation. On one hand, we aim to deepen the co-design of neural networks and traditional algorithms to achieve an optimal trade-off between performance and complexity via dynamic parameter optimization, while further exploring intelligent decoding schemes for long codes. On the other hand, research will be conducted on joint modulation-decoding modeling and end-to-end architectures to enhance the model's adaptability and practical value under high-order modulation and complex channel environments.
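The flat Rayleigh fading channel underlying the simulations can be reproduced in a few lines. The Monte-Carlo sketch below estimates the BER of uncoded BPSK with perfect channel knowledge, a much simpler setting than the coded SCUNetDec experiments, and compares it against the closed-form value $P_b = \tfrac{1}{2}\big(1-\sqrt{\bar\gamma/(1+\bar\gamma)}\big)$.

```python
import math
import random

def rayleigh_bpsk_ber(snr_db, n_bits, seed=1):
    """Monte-Carlo BER of uncoded BPSK over flat Rayleigh fading,
    coherent detection with perfect channel state information."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    sigma = math.sqrt(1 / (2 * snr))      # noise std per real dimension
    errors = 0
    for _ in range(n_bits):
        bit = rng.randint(0, 1)
        x = 1.0 - 2.0 * bit               # BPSK: 0 -> +1, 1 -> -1
        # Rayleigh envelope with E[h^2] = 1 (two unit Gaussians, scaled).
        h = math.hypot(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
        y = h * x + sigma * rng.gauss(0, 1)
        if (y < 0) != (bit == 1):         # sign detector (h > 0, CSI known)
            errors += 1
    return errors / n_bits

theory = lambda g: 0.5 * (1 - math.sqrt(g / (1 + g)))
ber = rayleigh_bpsk_ber(10.0, 20000)
print(ber, round(theory(10), 5))
```

The slow, SNR-proportional decay of this curve (rather than the waterfall of an AWGN channel) is exactly the fading penalty that the coded, learned decoders in the paper are designed to combat.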
Detection and Parameter Estimation of Quadratic Frequency Modulated Signal Based on Non-uniform Quadrilinear Autocorrelation Function
YANG Yuchao, FANG Gang
Available online  , doi: 10.11999/JEIT250723
Abstract:
  Objective  Polynomial Phase Signal (PPS) analysis has attracted broad attention because many radar, sonar, and seismic signals are modeled as PPS of different orders. A first-order PPS can be focused into a frequency bin through the Fourier transform to estimate the center frequency. For higher order PPS, such as a Quadratic Frequency Modulated (QFM) signal, non-coherent characteristics limit the effectiveness of the Fourier transform for energy integration. Existing time–frequency distribution methods, such as the short-time Fourier transform and the Wigner-Ville distribution, do not resolve the conflicts between auto-terms and cross-terms or between time- and frequency-domain resolution. In addition, current algorithms face difficulties in balancing computational complexity and detection performance, which results in reduced parameter estimation accuracy. This study proposes a QFM detection method based on a non-uniform quadrilinear autocorrelation function to provide balanced performance for QFM parameter estimation with controlled computational cost.  Methods  A time–frequency distribution method for QFM detection and parameter estimation is presented. The method applies non-uniform sampling and maps a one-dimensional signal into a two-dimensional time domain through a fourth-order autocorrelation function. A non-uniform fast Fourier transform is used to resolve the time variable and concentrate the energy into a vertical line in the two-dimensional plane. Then, FFT is performed along this line to focus the signal into a peak, from which the chirp rate and quadratic chirp rate are estimated. Finally, dechirp processing compensates the high-order phase terms of the original signal, and the center frequency estimate is obtained through a final FFT.  Results and Discussions  Theoretical analysis and simulation results show that the method balances computational complexity and detection performance. 
Under low signal-to-noise ratio conditions, it distinguishes targets effectively and produces accurate parameter estimates (Fig. 1). For multicomponent signals with large amplitude differences, it enables stepwise detection and estimation (Fig. 2). Comparative experiments with state-of-the-art algorithms show that the method is quasi-optimal in estimation accuracy and integration gain (Figs. 3–6). Compared with the ML estimator, it offers markedly higher computational efficiency.  Conclusions  A QFM detection and parameter estimation method based on non-uniform quadrilinear autocorrelation functions is proposed. The method maps the QFM signal into a two-dimensional time domain through a new autocorrelation kernel and achieves coherent integration through scaling and FFT. Mathematical analysis and simulation results show that, relative to the ML method, it sacrifices part of the detection performance but substantially reduces computational complexity. When computational efficiency is similar, it outperforms other classical methods in detection and parameter estimation accuracy. The method provides a balanced solution for QFM signal detection and parameter estimation.
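The final dechirp step of the Methods can be illustrated directly: multiplying the QFM signal by the conjugate of the estimated high-order phase leaves a single tone whose spectral peak is the center frequency. A naive DFT stands in for the FFT here, and the sampling rate and QFM parameters are made up for the sketch.

```python
import cmath
import math

def dechirp_and_estimate_f0(x, fs, a2, a3):
    """Cancel the estimated quadratic/cubic phase terms (chirp rate a2,
    quadratic chirp rate a3), then locate the DFT peak -> center frequency."""
    N = len(x)
    y = [xi * cmath.exp(-2j * math.pi * (a2 * (n / fs) ** 2 + a3 * (n / fs) ** 3))
         for n, xi in enumerate(x)]
    # Naive DFT (O(N^2), fine for small N; a real system would use an FFT).
    mags = [abs(sum(yn * cmath.exp(-2j * math.pi * k * n / N)
                    for n, yn in enumerate(y)))
            for k in range(N)]
    k = max(range(N), key=mags.__getitem__)
    return k * fs / N

fs, N = 256.0, 256
f0, a2, a3 = 40.0, 30.0, 12.0   # hypothetical QFM parameters
x = [cmath.exp(2j * math.pi * (f0 * (n / fs) + a2 * (n / fs) ** 2
                               + a3 * (n / fs) ** 3)) for n in range(N)]
print(dechirp_and_estimate_f0(x, fs, a2, a3))
```

With the phase terms fully compensated, all of the signal energy integrates coherently into one frequency bin, which is the "focusing into a peak" described in the Methods.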
Component Placement Algorithm Considering Reagent Type Differences in Cell Reuse for FPVA Biochips
XU Yanbo, ZHU Yuhan, HUANG Xing, LIU Genggeng
Available online  , doi: 10.11999/JEIT250731
Abstract:
  Objective  Fully Programmable Valve Array (FPVA) biochips, a recent type of flow-based microfluidic biochip, offer high flexibility and programmability, which enables them to meet different and complex experimental needs. Component placement is a critical stage in FPVA architectural synthesis because it affects several performance metrics, including assay completion time, total fluid-transport length, and cross-contamination. Cell reuse, an essential feature of FPVA programmability, requires special consideration during placement. However, existing studies have largely ignored the effect of reagent type differences in cell reuse on these metrics.  Methods  This study presents a component placement algorithm for FPVA biochips that accounts for reagent type differences during cell reuse. The algorithm first introduces a cell reuse complexity metric that quantifies reuse complexity by considering the effects of reagent-type differences and component overlap on cross-contamination. It then integrates constraints, including placement-area limits and non-overlapping conditions for concurrent components, to ensure valid placement. The reward function is optimized to minimize reuse complexity and reduce the distance between components that use the same reagent type. The goal is to lower cross-contamination, total fluid-transport length, and assay completion time.  Results and Discussions  The algorithm is evaluated on benchmark FPVA instances with different chip sizes and functional requirements and compared with related methods. It reduces cell reuse complexity by 34.2%, assay completion time by 2.8%, and total fluid-transport length by 9.2% on average (Table 2). It also reduces the reagent-aware distance metric by 29.9% on average (Fig. 6). The learning agent’s decision trajectories show clear spatial structure, which reflects global placement awareness.  
Conclusions  This study is the first to investigate FPVA component placement with attention to reagent type differences in cell reuse. The main contributions are as follows: (1) a cell reuse complexity metric is proposed to assess reuse intensity in placement, (2) the FPVA placement problem is modeled as a Markov decision process to enable the use of double deep Q-networks for safe and efficient placement policy learning, and (3) compared with existing work, the model improves FPVA biochemical assay performance and reliability.
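The placement-area limits and non-overlap conditions that constrain the placement policy reduce to simple geometric checks; a sketch with hypothetical axis-aligned components (this is only the validity test an MDP action would be filtered through, not the paper's DDQN agent):

```python
def rects_overlap(a, b):
    # Component as (x, y, w, h) on the chip grid; touching edges is allowed.
    return not (a[0] + a[2] <= b[0] or b[0] + b[2] <= a[0]
                or a[1] + a[3] <= b[1] or b[1] + b[3] <= a[1])

def valid_placement(rects, W, H):
    """Placement-area limit plus pairwise non-overlap for components that
    are active concurrently on a W x H valve array."""
    for r in rects:
        if r[0] < 0 or r[1] < 0 or r[0] + r[2] > W or r[1] + r[3] > H:
            return False
    return all(not rects_overlap(rects[i], rects[j])
               for i in range(len(rects)) for j in range(i + 1, len(rects)))

# Two 2x2 components side by side on a 4x4 array: valid.
print(valid_placement([(0, 0, 2, 2), (2, 0, 2, 2)], 4, 4))
```

In a reinforcement-learning formulation, actions producing an invalid placement are either masked out or penalized; the reward shaping over reuse complexity and same-reagent distance then operates only on valid states.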
Lightweight Dual Convolutional Finger Vein Recognition Network Based on Attention Mechanism
ZHAO Bingyan, LIANG Yihuai, ZHANG Zhongxia, ZHANG Wenzheng
Available online  , doi: 10.11999/JEIT250380
Abstract:
  Objective  Finger vein recognition is an emerging biometric authentication technology valued for its physiological uniqueness and advantages in in vivo detection. However, mainstream deep learning recognition frameworks still face two challenges. High-precision recognition often depends on complex network structures, which increase parameter counts and hinder deployment in memory-limited embedded devices and edge scenarios with constrained computing resources. Model compression can reduce computational cost but often weakens feature representation, creating a conflict between recognition accuracy and efficiency. To address these issues, a lightweight dual convolutional model integrated with an attention mechanism is proposed. A parallel heterogeneous convolution module and an attention guidance mechanism are designed to extract diverse image features and improve recognition accuracy while preserving a lightweight network structure.  Methods  The proposed architecture adopts a three-level collaborative mechanism comprising feature extraction, dynamic calibration, and decision fusion. A dual convolutional feature extraction module is constructed using normalized ROI images. This module combines heterogeneous convolution kernels. Rectangular convolution branches with different shapes capture venous topological structures and diameter orientations, whereas square convolution branches employ stacked square kernels to extract local texture details and background intensity distributions. These branches operate in parallel with reduced channel numbers and generate complementary responses through kernel shape diversity. This design reduces parameter scale while improving feature discrimination. A parallel dual attention mechanism is then applied to achieve two-dimensional calibration through joint optimization of channel attention and spatial attention. 
Channel attention adaptively assigns weights to enhance discriminative venous texture features, whereas spatial attention constructs pixel-level dependency models that focus on effective discriminative regions. A parallel concatenation fusion strategy preserves structural information without introducing additional parameters and improves sensitivity to critical features. Finally, a three-level progressive feature optimization structure is implemented. A convolutional compression module with stride 2 nests multi-scale receptive fields and progressively refines primary features during dimensionality reduction. Two fully connected layers then perform feature space transformation. The first layer applies ReLU activation to form sparse representations, and the final layer applies Softmax for probability calibration. This structure balances shallow underfitting and deep overfitting while maintaining efficient forward inference.  Results and Discussions  The effectiveness and robustness of the proposed network are evaluated on three public datasets, namely USM, HKPU, and SDUMLA. Recognition accuracy is assessed using the Acc metric. Experimental results (Table 1) show strong recognition performance. Feature visualization heatmaps (Fig. 4, Fig. 6) confirm that the network extracts complete and discriminative venous features. Training visualizations (Fig. 7, Fig. 8) show stable loss and accuracy trends, achieving 100% classification performance and demonstrating training reliability and robustness. Quantitative comparisons (Tables 2 and 3) indicate that the proposed method effectively addresses the trade-off between model complexity and classification performance and achieves superior results across all three datasets. Ablation studies (Table 4) further verify the effectiveness of the proposed modules and show significant improvements in finger vein recognition performance.  
  Conclusions  A lightweight dual convolutional neural network with an attention mechanism is proposed. The network consists of three core modules: a dual convolutional feature extraction module, a parallel dual-attention module, and a feature optimization classification module. During feature extraction, long-range venous features and background information are jointly encoded through a low-channel parallel design, which substantially reduces parameter counts while improving inter-individual discrimination. The attention module efficiently captures critical venous features without the parameter expansion commonly observed in conventional attention mechanisms. The feature optimization classification module applies progressive feature recalibration, which reduces underfitting and overfitting during stacked dimensionality reduction. Experimental results show recognition accuracies of 99.70%, 98.33%, and 98.27% on the USM, HKPU, and SDUMLA datasets, corresponding to an average improvement of 2.05% over existing state-of-the-art methods. Compared with representative lightweight finger vein recognition approaches, the proposed method reduces parameter scale by 11.35% to 60.19%, achieving a balance between lightweight design and performance improvement.
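The channel-attention half of the dual-attention module follows the familiar squeeze-and-reweight pattern. The sketch below only illustrates that pattern on plain nested lists: a global-average "squeeze" per channel followed by a softmax reweighting, where the softmax stands in for the learned gating network that a real implementation would train.

```python
import math

def channel_attention(fmap):
    """fmap: list of C channels, each an HxW grid of activations.
    Squeeze each channel to its global mean, turn the means into
    normalized weights, and rescale every channel by its weight."""
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
             for ch in fmap]
    e = [math.exp(m - max(means)) for m in means]   # stable softmax
    w = [x / sum(e) for x in e]
    return [[[v * wi for v in row] for row in ch]
            for ch, wi in zip(fmap, w)]

# Two 2x2 channels; the second carries the stronger (vein-like) response.
out = channel_attention([[[1.0, 1.0], [1.0, 1.0]],
                         [[3.0, 3.0], [3.0, 3.0]]])
print(out[0][0][0], out[1][0][0])
```

The point of the design is that the reweighting adds almost no parameters (here, none): discriminative channels are amplified relative to background channels without enlarging the backbone.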
Research on a Miniaturized Wide Stopband Folded Substrate Integrated Waveguide Filter
KE Rongjie, WANG Hongbin, CHENG Yujian
Available online  , doi: 10.11999/JEIT250869
Abstract:
To meet the requirements of 5G/6G communication systems for miniaturization, high integration, and a wide stopband, this paper proposes a fourth-order bandpass filter based on an eighth-mode Folded Substrate Integrated Waveguide (FSIW) using High-Temperature Co-Fired Ceramic (HTCC) technology. The design combines the miniaturization characteristics of FSIW with the three-dimensional integration capability of HTCC. Size reduction is achieved through an eighth-mode FSIW cavity structure with dimensions of 0.29λg × 0.29λg, where λg denotes the waveguide wavelength at the center operating frequency (f0). Metal vias suppress high-order mode coupling, a bent microstrip line introduces transmission zeros, and an L-shaped stub improves the high-frequency response. Three controllable transmission zeros are generated in the upper stopband, achieving 20 dB@3.73f0. Measurements show a center frequency of 6.4 GHz. Although slight frequency deviation and insertion loss are observed, the design provides clear advantages in miniaturization, stopband width, and the number of transmission zeros compared with reported work, indicating potential for high-density integrated communication systems.  Objective  The rapid development of 5G/6G communication systems increases the demand for Radio Frequency (RF) microwave devices that provide miniaturization, high integration, and wide stopband performance. As core components of RF transceiver front-ends, bandpass filters transmit useful signals and suppress interference. Conventional Substrate Integrated Waveguide (SIW) filters often show large size, limited stopband extension, and insufficient control of transmission zeros, which restrict their use in high-density integrated systems. To address these challenges, this paper presents a miniaturized wide stopband fourth-order bandpass filter based on an eighth-mode FSIW structure and HTCC technology to achieve compact size and broad stopband performance.  
Methods  The filter integrates the miniaturization capability of FSIW with the three-dimensional integration characteristics of HTCC. First, an eighth-mode FSIW cavity is developed by modifying a quarter-mode FSIW cavity. A square patch is replaced with a triangular patch (eighth-mode cavity I), followed by slot etching in the triangular patch (eighth-mode cavity II). Second, a fourth-order bandpass filter is constructed by symmetrically designing two triangular metal patches for each cavity type and stacking them vertically. A common metal layer (fifth layer) containing coupling windows enables coupling between the upper and lower cavities. Three techniques are used to optimize performance: metal vias to suppress high-order mode coupling, bent microstrip lines to generate transmission zeros, and an L-shaped stub to enhance high-frequency response. Parameter scanning of key dimensions (d2, s4, s6) verifies the controllability of transmission zeros. The filter is fabricated using HTCC on an Al2O3 substrate with relative permittivity 9.8 and loss tangent 0.000 2.  Results and Discussions  Measurements show a center frequency of 6.4 GHz. Although fabrication and assembly deviations cause slight frequency shift and additional insertion loss, the filter demonstrates strong performance compared with reported designs (Table 2). The size of 0.29λg × 0.29λg is smaller than that of most SIW filters. The upper stopband extends to 20 dB@3.73f0, outperforming filters of comparable size. Three controllable transmission zeros appear in the upper stopband, and parameter scanning confirms their tunability (Fig. 13).  Conclusions  A miniaturized wide stopband fourth-order bandpass filter based on an eighth-mode FSIW structure is presented. The eighth-mode cavity combined with HTCC technology achieves a compact footprint of 0.29λg × 0.29λg, meeting the integration requirements of 5G/6G systems. 
The use of metal vias, bent microstrip lines, and L-shaped stubs generates a wide stopband of 20 dB@3.73f0 and three tunable transmission zeros, strengthening interference suppression. Adjustable parameters enable flexible tuning of transmission zero frequencies without affecting the passband, improving the adaptability of the design to different interference conditions. These advances address key challenges in miniaturization, stopband extension, and design flexibility of SIW filters, offering a practical solution for RF front-ends in next-generation high-density integrated communication systems.
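The reported footprint can be sanity-checked numerically. As a rough sketch, λg is approximated here by the wavelength in the dielectric, λ0/√εr; the exact SIW guided wavelength also depends on the cavity width, which is not reproduced, so the constants and the approximation are illustrative only.

```python
# Rough footprint estimate for the 0.29*lambda_g x 0.29*lambda_g cavity.
# Assumption: lambda_g ~ lambda_0 / sqrt(eps_r) (dielectric-filled TEM
# approximation); the true SIW guided wavelength is geometry-dependent.
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def footprint_mm(f0_hz, eps_r, scale=0.29):
    lambda_g = C0 / (f0_hz * math.sqrt(eps_r))  # wavelength in the dielectric, m
    return scale * lambda_g * 1e3               # cavity side length, mm

side = footprint_mm(6.4e9, 9.8)  # f0 = 6.4 GHz, eps_r = 9.8 from the abstract
```

Under this approximation the 0.29λg side comes out at roughly 4.3 mm, consistent with a millimeter-scale HTCC cavity.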
Radio Map Enabled Path Planning for Multiple Cellular-Connected Unmanned Aerial Vehicles
ZHOU Decheng, WANG Wei, SHAO Xiang, CHEN Mei, XIAO Jianghao
Available online  , doi: 10.11999/JEIT250821
Abstract:
  Objective  In collaborative operation scenarios of cellular-connected Unmanned Aerial Vehicles (UAVs), conflict avoidance strategies often cause unbalanced service quality. Traditional schemes focus on reducing total task completion time but do not ensure service fairness. To address this issue, a radio map-assisted cooperative path planning scheme is proposed. The objective is to minimize the maximum weighted sum of task completion time and communication disconnection time across all UAVs to improve service fairness in multi-UAV scenarios.  Methods  A Signal-to-Interference-plus-Noise Ratio (SINR) map is constructed to assess communication quality. The two-dimensional airspace is discretized into grids, and link gain maps are generated through ray tracing and Axis-Aligned Bounding Box detection to determine Line-of-Sight (LoS) or Non-Line-of-Sight (NLoS) conditions. The SINR map is produced by selecting, for each grid, the base station with the highest expected SINR. To solve the optimization problem, an Improved Conflict-Based Search (ICBS) algorithm with a hierarchical structure is developed. At the high-level stage, proximity conflicts are managed to maintain safety distances, and the cost function is reformulated to emphasize fairness by minimizing the maximum weighted time. The low-level stage applies a bidirectional A* algorithm for single-UAV path planning, using parallel search to improve efficiency while meeting the constraints set by the high-level stage.  Results and Discussions  The proposed scheme is evaluated through simulations across different scenarios. Building heights and positions are shown, where base station locations are marked by red stars and building heights are represented with color gradients from light to dark to indicate increasing height (Fig. 2). The wireless propagation characteristics between UAVs and ground base stations are demonstrated by the SINR map at an altitude of 60 m (Fig. 
3), which shows significant SINR degradation in areas affected by building blockage and co-channel interference, resulting in communication blind zones. Trajectory planning results for four UAVs at an altitude of 60 m with a SINR threshold of 2 dB show that all UAVs avoid signal blind zones and complete tasks without collision risks under the proposed scheme (Fig. 4). The trade-off between task completion time and disconnection time is controlled by the weight coefficient (Fig. 5). The maximum weighted time increases monotonically as the weight coefficient increases, whereas the maximum disconnection time decreases. The bidirectional A* algorithm achieves higher computational efficiency than Dijkstra’s and traditional A* algorithms while maintaining optimal solution quality (Table 1). All three algorithms yield identical weighted times, confirming the optimality of the bidirectional A* approach, and its runtime is reduced significantly due to parallel search. Compared with three benchmark schemes, the proposed scheme achieves the lowest maximum weighted time for different SINR thresholds (Fig. 6). Performance analysis at different UAV altitudes shows that the proposed scheme maintains stable maximum weighted time below 75 m, while sharp increases appear above 75 m due to intensified interference from non-serving base stations (Fig. 7). The scalability analysis further shows clear improvements over benchmark schemes, especially when conflicts occur more frequently (Fig. 8).  Conclusions  To address fairness in cellular-connected multi-UAV systems, a radio map-assisted path planning scheme is proposed to minimize the maximum weighted time. Based on a discretized SINR map, an ICBS algorithm is developed. At the high-level stage, proximity conflicts and a reformulated cost function ensure safety and fairness, and at the low-level stage, a bidirectional A* algorithm increases search efficiency. 
Simulation results show that the proposed scheme lowers the maximum weighted time compared with benchmark schemes and improves fairness and overall multi-UAV collaboration performance.
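The SINR-map construction step (best serving base station per grid cell) can be sketched as follows; the per-cell link gains, transmit power, and noise floor are illustrative values, not taken from the paper.

```python
# Minimal best-server SINR map over a discretized grid. Assumes each cell
# stores one link gain (dB) per base station; the serving station for a cell
# is the one giving the highest SINR, as in the map-construction step.
import numpy as np

def sinr_map_db(gain_db, tx_dbm=30.0, noise_dbm=-90.0):
    """gain_db: (B, H, W) link gains in dB; returns (H, W) best-server SINR in dB."""
    rx_mw = 10 ** ((tx_dbm + gain_db) / 10)     # received power per station, mW
    noise_mw = 10 ** (noise_dbm / 10)
    interference = rx_mw.sum(axis=0) - rx_mw    # all non-serving stations
    sinr = rx_mw / (interference + noise_mw)
    return 10 * np.log10(sinr.max(axis=0))      # pick the best serving station
```

Cells whose best-server SINR falls below the threshold (2 dB in the simulations) would then be treated as blind zones by the path planner.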
Dynamic Wavelet Multi-Directional Perception and Geometry Axis-Solution Guided 3D CT Fracture Image Segmentation
ZHANG Yinhui, LIU Kai, HE Zifen, ZHANG Jinkai, CHEN Guangchen, MA Zhijian
Available online  , doi: 10.11999/JEIT250732
Abstract:
  Objective  Accurate segmentation of fracture surfaces in three-dimensional computed tomography (3D CT) images is essential for orthopedic surgical planning, particularly for determining nail insertion angles perpendicular to fracture planes. However, existing approaches present three major limitations: limited capture of deep global volumetric context, directional texture ambiguity in low-contrast fracture regions, and insufficient decoding of geometric features. To address these limitations, a Dynamic Wavelet Multi-Directional Perception and Geometry Axis-Solution Guided Network (DWAG-Net) is proposed to improve segmentation accuracy for complex tibial fractures and to provide reliable 3D digital guidance for preoperative planning.  Methods  The proposed architecture extends 3D nnU-Netv2 through three core components. First, a Dynamic Multi-View Aggregation (DMVA) module adaptively fuses tri-planar views (axial, sagittal, and coronal) with full-volume features using learnable parameter interpolation with an optimized kernel size of 2×2×2 and a channel-wise Hadamard product, thereby strengthening global context representation. Second, a Wavelet Direction Perception Enhancement (WDPE) module applies a 3D Symlets discrete wavelet transform to decompose inputs into eight subbands, followed by direction-specific enhancement. Adaptive convolutional kernels (e.g., [5, 3, 3] for depth-dominant fractures) reinforce texture information in high-frequency subbands, whereas cross-subband fusion integrates complementary features. Third, a Geometry Axis-Solution Guided (GASG) module is embedded in the decoder to maintain anatomical consistency by constructing axis-level affinity maps along depth, height, and width that combine geometric similarity with spatial distance decay, and by refining boundary delineation using rotational positional encoding and multi-axis attention. 
The network is trained on the YN-TFS dataset, which contains 110 tibial fracture CT scans with spatial resolutions ranging from 0.39 to 1.00 mm. Stochastic gradient descent is used with a learning rate of 0.01 and a momentum of 0.99. A class-weighted loss function with weights of 0.5 for background, 1 for bone, and 5 for fracture is adopted to address severe pixel imbalance.  Results and Discussions  DWAG-Net achieves state-of-the-art performance, with a mean Dice score of 71.20% (Table 1), exceeding that of nnU-Netv2 by 5.06%. For fracture surfaces, the Dice score reaches 69.48%, corresponding to an improvement of 7.12%. Boundary accuracy improves significantly, with a mean 95th percentile Hausdorff distance (HD95) of 1.38 mm and a fracture surface HD95 of 1.54 mm, representing a reduction of 3.70 mm. Ablation studies (Table 2) confirm the contribution of each component. DMVA increases the Dice score by 2.40% through adaptive multi-view fusion. WDPE reduces directional ambiguity and yields a 5.84% gain in fracture surface Dice. GASG provides an additional 1.20% improvement by enforcing geometric consistency. Optimal performance is obtained with a DMVA kernel size of 2×2×2, the use of Symlets wavelets, and sequential axis processing in the order of depth, height, and width. Qualitative comparisons indicate that DWAG-Net preserves fracture continuity in cases where U-Mamba and nnWNet fail, and reduces over-segmentation relative to nnFormer and UNETR++ (Fig. 4).  Conclusions  DWAG-Net establishes a state-of-the-art framework for 3D fracture segmentation by integrating multi-directional wavelet perception with geometry-guided decoding. The coordinated use of DMVA, directional texture enhancement, and geometry axis-solution guidance achieves clinically relevant precision, with a Dice score of 71.20% and an HD95 of 1.38 mm. These results support accurate data-driven surgical planning. 
Future work will focus on refining loss design to further mitigate severe class imbalance.
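The eight-subband split used by the WDPE module can be illustrated with a one-level 3D discrete wavelet transform. A Haar wavelet is used below for brevity; the paper uses Symlets, so this is an illustrative stand-in, not the network's actual transform.

```python
# One-level 3D DWT into eight subbands (LLL ... HHH), mirroring the WDPE
# module's subband decomposition. Haar filters stand in for Symlets here.
import numpy as np

def dwt3_haar(x):
    """x: (D, H, W) array with even side lengths; returns dict of 8 subbands."""
    def lo(a, ax):  # low-pass: scaled pairwise sums along axis ax
        e = a.take(range(0, a.shape[ax], 2), axis=ax)
        o = a.take(range(1, a.shape[ax], 2), axis=ax)
        return (e + o) / np.sqrt(2)
    def hi(a, ax):  # high-pass: scaled pairwise differences along axis ax
        e = a.take(range(0, a.shape[ax], 2), axis=ax)
        o = a.take(range(1, a.shape[ax], 2), axis=ax)
        return (e - o) / np.sqrt(2)
    bands = {}
    for nd, fd in (("L", lo), ("H", hi)):        # depth axis
        for nh, fh in (("L", lo), ("H", hi)):    # height axis
            for nw, fw in (("L", lo), ("H", hi)):  # width axis
                bands[nd + nh + nw] = fw(fh(fd(x, 0), 1), 2)
    return bands
```

The high-frequency subbands (any key containing "H") are the ones that carry the directional texture the module enhances; because the transform is orthonormal, the subbands jointly preserve the input's energy.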
Inverse Design of a Silicon-Based Compact Polarization Splitter-Rotator
HUI Zhanqiang, ZHANG Xinglong, HAN Dongdong, LI Tiantian, GONG Jiamin
Available online  , doi: 10.11999/JEIT250858
Abstract:
  Objective  The Polarization Splitter-Rotator (PSR) is a key device used to control the polarization state of light in Photonic Integrated Circuits (PICs). Device size has become a major constraint on integration density in PICs. Traditional design methods are time-consuming and tend to yield larger device footprints. Inverse design, by contrast, determines structural parameters through optimization algorithms according to target performance and enables compact devices to be obtained while maintaining functionality. This strategy is now applied to wavelength and mode division multiplexers, all-optical logic gates, power splitters, and other integrated photonic components. The objective of this work is to use inverse design to address size limitations in silicon-based PSRs by combining the Momentum Optimization algorithm with the Adjoint Method. This combined approach improves the integration level of PICs and provides a feasible pathway for the miniaturization of other photonic devices.  Methods  The design region is defined on a 220 nm Silicon-on-Insulator (SOI) wafer and is discretized into 25×50 cylindrical elements. Each element has a 50 nm radius, a 150 nm height, and an initial relative permittivity of 6.55. The adjoint method is used to obtain gradient information across the design region, and this gradient is processed with the Momentum Optimization algorithm. The relative permittivity of each element is then updated according to the processed gradient. During optimization, the momentum factor is dynamically adjusted with the iteration number to accelerate convergence, and a linear bias is applied to guide the permittivity toward the values of silicon and air as the iterations progress. After optimization, the elements are binarized based on their final permittivity: values below 6.55 are assigned to air, whereas values above 6.55 are assigned to silicon. This results in a structure containing irregularly distributed air holes. 
To compensate for performance loss introduced during binarization, the etching depth of air holes with pre-binarization permittivity between 3 and 6.55 is optimized. Adjacent air holes are merged to reduce fabrication errors. The final device consists of air holes with five radii, among which three larger-radius types are selected for further refinement. Their etching radii and depths are optimized to recover remaining performance loss. Device performance is evaluated through numerical analysis. Calculated parameters include Insertion Loss (IL), Crosstalk (CT), Polarization Extinction Ratio (PER), and bandwidth. Tolerance analysis is also conducted to assess robustness under fabrication variations.  Results and Discussions   A compact PSR is designed on a 220 nm SOI wafer with dimensions of 5 μm in length and 2.5 μm in width. During optimization, the momentum factor in the Momentum Optimization algorithm is dynamically adjusted. A larger momentum factor is applied in the early stage to accelerate escape from local maxima or plateau regions, whereas a smaller momentum factor is used in later iterations to increase the weight of the current gradient. Compared with other optimization strategies, this algorithm requires only 20%~33% of the iteration count needed by alternative methods to reach a Figure of Merit (FOM) of 1.7, which improves optimization efficiency. Numerical analysis shows that the device achieves stable performance across the 1 520~1 575 nm wavelength range. The IL remains low (TM0 < 1 dB, TE0 < 0.68 dB), and the CT is effectively suppressed (TM0 < –23 dB, TE0 < –25.2 dB). The PER is high (TM0 > 17 dB, TE0 > 28.5 dB). Tolerance analysis indicates strong robustness to fabrication variations. Within the 1 520~1 540 nm range, performance remains stable under etching depth offsets of ±9 nm and etching radius offsets of ±5 nm, demonstrating reliable manufacturability.  
Conclusions   Numerical analysis demonstrates that combining the adjoint method with the Momentum Optimization algorithm is a feasible strategy for designing an integrated PSR. The design principle relies on controlling light propagation through adjustments to the relative permittivity, which determine the distribution and placement of air holes to achieve polarization splitting and rotation. Compared with traditional design approaches, inverse design uses the design region more efficiently and enables a more compact device structure. The proposed PSR is markedly smaller and shows enhanced fabrication tolerance. It is suitable for future large-scale PICs and provides useful guidance for the miniaturization of other photonic devices.
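The momentum-based permittivity update with an iteration-dependent momentum factor and a linear bias toward the binary materials can be sketched as follows. The schedule, learning rate, and bias strength are assumptions for illustration; the paper's exact values are not reproduced.

```python
# Sketch of the momentum-accelerated adjoint update: a momentum factor that
# shrinks with iteration count, plus a growing linear bias pushing each
# element toward air (eps = 1.0) or silicon (eps ~ 12.1). All constants and
# the schedule itself are illustrative assumptions.
import numpy as np

EPS_AIR, EPS_SI, EPS_MID = 1.0, 12.1, 6.55  # 6.55: binarization threshold

def momentum_step(eps, grad, vel, it, n_it, lr=0.05):
    beta = 0.9 * (1.0 - it / n_it)   # large early (escape plateaus), small late
    vel = beta * vel + grad          # momentum-accumulated adjoint gradient
    eps = eps + lr * vel
    push = 0.1 * (it / n_it)         # linear bias, growing with iterations
    eps = np.where(eps >= EPS_MID,
                   eps + push * (EPS_SI - eps),   # drift toward silicon
                   eps + push * (EPS_AIR - eps))  # drift toward air
    return np.clip(eps, EPS_AIR, EPS_SI), vel
```

Iterating this update drives elements with consistently positive gradients toward silicon and negative ones toward air, so the final binarization step discards little information.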
Research on UAV Swarm Radiation Source Localization Method Based on Dynamic Formation Optimization
WU Sujie, WU Binbin, YANG Ning, WANG Heng, GUO Daoxing, GU Chuan
Available online  , doi: 10.11999/JEIT251023
Abstract:
In dense and structurally complex urban environments, Unmanned Aerial Vehicle (UAV) swarm radiation source localization is affected by signal attenuation, multipath propagation, and building obstructions. To address these limitations, a dynamic formation-optimization method for UAV swarms is proposed. By improving the geometric configuration of the swarm, the method reduces path loss and interference, which strengthens localization accuracy. Received signal strength is used to evaluate signal quality in real time and supports adaptive formation adjustments that improve propagation conditions. Geometric dilution of precision and root mean square error metrics are integrated to refine swarm geometry and improve distance-estimation reliability. Simulation results show that the proposed method converges faster and improves localization accuracy in complex urban environments, reducing errors by more than 80 percent. The method adapts to environmental variation and demonstrates strong robustness and practical value.  Objective  UAV swarm localization and formation control in urban environments are affected by obstacles, signal attenuation, and rapid variation in the surroundings that reduce the reliability of conventional methods. This study proposes a radiation source localization approach that integrates the Received Signal Strength Indicator (RSSI) with dynamic formation adjustment to improve localization accuracy and strengthen system robustness in complex urban scenarios.  Methods  The method uses RSSI measurements to estimate the distance to the radiation source and adjusts UAV swarm formation in real time to reduce localization errors. These adjustments are based on feedback that reflects relative positions, signal strength, and environmental variation. Localization accuracy is strengthened through a multi-sensor fusion strategy that integrates GPS, IMU, and depth-camera data. 
A data-quality assessment mechanism evaluates signal reliability and triggers formation adaptation when the signal drops below a predefined threshold. This optimization process reduces positioning errors and improves system robustness.  Results and Discussions  Simulation experiments in a ROS-based environment were conducted to evaluate the UAV swarm localization method under urban obstacles and multipath conditions. The swarm began in a hexagonal formation and adjusted its geometry according to environmental variation and localization confidence (Figs. 3 and 4). As shown in Fig. 5, localization errors fluctuated during initialization but converged to below 1 m after 150 s. Formation comparisons (Fig. 6) showed that symmetric structures such as hexagonal and triangular formations maintained errors below 0.5 m, whereas asymmetric formations (T and Y shape) produced deviations up to 4.9 m. Further comparisons (Fig. 7) showed that traditional RSSI saturated near 15 m, direction of arrival fluctuated between 5 and 14 m, and time difference of arrival failed due to synchronization problems. The proposed method achieved sub-meter accuracy within 60 s and remained robust throughout the mission. These findings indicate that combining RSSI-based distance estimation with dynamic formation adjustment improves localization accuracy, convergence speed, and adaptability under complex environmental conditions.  Conclusions  This study addresses UAV swarm localization in complex urban environments by integrating RSSI-based distance estimation, dynamic formation adjustment, and multi-sensor fusion. 
ROS-based simulations show that: (1) localization errors converge rapidly to sub-meter levels, reaching below 1 m within 150 s under non-line-of-sight conditions; (2) symmetric formations such as hexagonal and triangular configurations outperform asymmetric ones and reduce errors by up to 67 percent compared with fixed Y-shaped formations; and (3) relative to traditional RSSI, direction of arrival, and time difference of arrival approaches, the proposed method shows faster convergence, higher stability, and stronger robustness.
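The RSSI-to-distance step underlying the method is the standard log-distance path-loss inversion. The reference power at 1 m and the path-loss exponent below are illustrative placeholders; the paper calibrates such parameters for urban propagation.

```python
# Log-distance path-loss inversion for RSSI-based ranging.
# Assumed model: RSSI = p0 - 10 * n * log10(d / d0), with illustrative
# reference power p0 (dBm at d0 = 1 m) and path-loss exponent n.
def rssi_to_distance(rssi_dbm, p0_dbm=-40.0, n=2.7, d0_m=1.0):
    """Invert the log-distance model to obtain range d in metres."""
    return d0_m * 10 ** ((p0_dbm - rssi_dbm) / (10 * n))
```

Because every 10·n dB of extra attenuation corresponds to one decade of distance, small RSSI fluctuations at long range translate into large ranging errors, which is exactly what the formation-adjustment feedback is meant to counteract.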
Adversarial Attacks on 3D Target Recognition Driven by Gradient Adaptive Adjustment
LIU Weiquan, SHEN Xiaoying, LIU Dunqiang, SUN Yanwen, CAI Guorong, ZANG Yu, SHEN Siqi, WANG Cheng
Available online  , doi: 10.11999/JEIT251264
Abstract:
  Objective   Robust environmental perception is essential for intelligent driving systems. Light Detection and Ranging (LiDAR) provides high-resolution 3D point cloud data and serves as a core information source for object detection and recognition. However, deep learning models for 3D point cloud recognition show notable vulnerability to adversarial attacks. Small, imperceptible perturbations can cause severe classification errors and threaten system safety. Existing attack methods have improved the Attack Success Rate (ASR), but the perturbations they generate often lack concealment, create outliers, and show poor imperceptibility because they do not adequately preserve the geometric structure of point clouds. This reduces their suitability for realistic security evaluation of optoelectronic perception systems. Developing an attack method that maintains a high success rate while preserving geometric consistency and imperceptibility is therefore critical. This study addresses this need by proposing a framework that incorporates point cloud geometry into perturbation generation.  Methods   A Gradient Adaptive Adjustment (GAA) adversarial attack method for 3D point cloud recognition is proposed. The framework (Fig. 2) includes three coordinated modules. The 3D Point Cloud Salient Region Extraction module evaluates decision-level vulnerability using Shapley value analysis to identify and rank point subsets with the strongest influence on classifier output. Perturbations are then concentrated in these sensitive regions. A Curvature-Weighted Gradient Mechanism integrates local geometric priors. For each point in the salient region, a local covariance matrix is computed from its k-nearest neighbors. Principal component analysis generates eigenvalues and eigenvectors, which are used to compute a curvature measure. A Gaussian kernel function produces curvature-dependent weights that are applied to backpropagated gradients. 
This suppresses perturbations in high-curvature areas and encourages them in low-curvature regions to preserve local shape morphology. A Principal Curvature Direction Constrained Optimization module further refines the perturbation direction. The weighted gradient is projected onto the principal curvature directions, and the projection components are fused using coefficients derived from the corresponding eigenvalues. This aligns the perturbation with natural geometric trends and avoids unnatural deformation. An Adaptive Optimization Algorithm then minimizes a multi-objective loss balancing attack success, geometric similarity (via Chamfer Distance and Hausdorff Distance), and perturbation sparsity. The adversarial point cloud is iteratively updated based on the saliency map, curvature-weighted gradients, and principal direction constraints.  Results and Discussions   Experiments on ModelNet40, ShapeNetPart, and KITTI were conducted using PointNet, DGCNN, and PointConv. The GAA method showed strong performance. On ModelNet40 with PointNet, it achieved a 97.69% ASR with an average of 28 perturbed points, outperforming ten baselines such as AL-Adv (92.92% ASR, 40 points) and Kim et al. (89.38% ASR, 36 points) (Table 1). It also produced lower geometric distortion, as indicated by smaller Chamfer Distance and Hausdorff Distance values. Visual results (Fig. 4) show that GAA produces fewer outliers and more natural adversarial point clouds compared with methods such as AL-Adv. The method generalized well across architectures, reaching 99.78% ASR on DGCNN and 96.91% on PointConv (Table 2), with similar performance on ShapeNetPart (Table 3). Ablation experiments on the number of salient regions (K) showed consistent improvements in ASR and reduced geometric distortion as K increased from 1 to 6 (Table 4, Fig. 5), confirming the advantage of targeting multiple critical regions. Tests on the KITTI dataset demonstrated strong performance in real-world, noisy environments. 
The method maintained high ASRs, such as 99.33% on PointNet, with limited perturbations (Table 5). An ablation study on K indicated that K=4 offers an effective balance between success rate and perturbation cost for PointNet (Table 6).  Conclusions   This study presents a GAA method for adversarial attacks on 3D point cloud recognition. By combining a Shapley value-based saliency analyzer, a curvature-weighted gradient mechanism, and a principal curvature direction constraint, the method generates adversarial examples that achieve high attack success while preserving geometric consistency. Experiments show that GAA minimizes perceptual distortion and perturbs fewer points across datasets and models. The method provides a practical tool for vulnerability analysis and supports the development of more robust and secure optoelectronic perception systems for intelligent driving. Future work will examine robustness under adverse conditions and assess physical-world implications.
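The curvature measure that drives the gradient weighting can be sketched in pure NumPy: for each point, the eigenvalues of the covariance of its k nearest neighbours give the surface-variation ratio λmin/(λ1+λ2+λ3). The choice of k and the Gaussian kernel weighting of the full mechanism are omitted for brevity.

```python
# Surface-variation curvature per point from local PCA, as used to weight
# backpropagated gradients: flat regions score near zero, corners score high.
import numpy as np

def curvature(points, k=8):
    """points: (N, 3) array; returns (N,) curvature estimates."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]      # k nearest neighbours (incl. self)
    curv = np.empty(len(points))
    for i, nb in enumerate(idx):
        cov = np.cov(points[nb].T)           # 3x3 local covariance matrix
        lam = np.linalg.eigvalsh(cov)        # eigenvalues, ascending
        curv[i] = lam[0] / max(lam.sum(), 1e-12)
    return curv
```

A Gaussian kernel over these values would then down-weight gradients at high-curvature points, concentrating perturbations where they deform the shape least.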
Conditional Generative Adversarial Networks-based Channel Estimation for ISAC-RIS System
LIU Yu, ZHENG Zelin, LIU Gang
Available online  , doi: 10.11999/JEIT251168
Abstract:
  Objective  In Reconfigurable Intelligent Surface (RIS)-assisted Integrated Sensing And Communication (ISAC) systems, accurate channel estimation is crucial to ensure reliable operation. Although traditional deep learning methods can partially address the channel estimation problem, their generalization ability and estimation accuracy remain limited in complex multi-user channel environments. To tackle these challenges, this paper proposes a two-stage channel estimation method based on a Conditional Generative Adversarial Network (CGAN) for RIS-assisted multi-user ISAC systems, aiming to enhance both the accuracy and stability of channel estimation.  Methods  This paper proposes a two-stage channel estimation method based on CGAN for estimating the Sensing And Communication (SAC) channels in RIS-assisted multi-user ISAC systems. By adjusting the switching states of the RIS, the overall estimation problem is decomposed into subproblems, enabling sequential estimation of the direct and reflected channels. Within the proposed CGAN framework, the adversarial training between the generator and discriminator allows the model not only to learn the mapping relationship between the observed signals and the true channels but also to optimize the output according to the discriminator’s feedback, thereby effectively improving both training efficiency and estimation accuracy.  Results and Discussions  Extensive simulation experiments were conducted to verify the effectiveness of the proposed method. First, the estimation performance of the SAC channel under different Signal-to-Noise Ratio (SNR) conditions was compared. The results demonstrate that the proposed CGAN-based method achieves significantly better Normalized Mean Square Error (NMSE) performance than the Least Squares (LS) benchmark and traditional models such as FNN and ELM (Fig. 4). Then, the impact of increasing the number of antennas and RIS elements on SAC channel estimation performance was investigated. Compared with the LS benchmark, the proposed CGAN method consistently maintains superior performance under various SNR conditions (Figs. 5 and 6).  
Conclusions  This paper investigates the channel estimation problem in RIS-assisted multi-user ISAC systems and proposes a two-stage channel estimation method based on CGAN. By adjusting the switching states of the RIS and employing adversarial training between the generator and discriminator networks, the proposed method achieves accurate estimation of the SAC channel. Simulation results demonstrate that, under various SNR conditions and channel dimensions, the CGAN-based estimation method exhibits strong generalization capability and significantly outperforms the benchmark schemes in estimation accuracy. Therefore, it shows great potential as an effective solution for enhancing system stability and efficiency.
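For reference, the LS benchmark against which the CGAN is compared has a closed form for pilot-based estimation: with Y = HP + N and a full-rank pilot matrix P, the estimate is H_LS = YP^H(PP^H)^(-1). The dimensions below are illustrative, not the paper's configuration.

```python
# Least-squares channel estimation from pilots, the classical benchmark:
# Y (Nr x T) are received pilot observations, P (Nt x T) the known pilots.
import numpy as np

def ls_channel_estimate(Y, P):
    """Return H_LS = Y P^H (P P^H)^{-1}; exact when noise is absent."""
    return Y @ P.conj().T @ np.linalg.inv(P @ P.conj().T)
```

In noise, the LS error floor scales with the inverse SNR, which is the gap a learned estimator such as the proposed CGAN aims to close.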
Physical Layer Key Generation Method for Integrated Sensing and Communication Systems
LIU Kexin, HUANG Kaizhi, PEI Xinglong, JIN Liang, CHEN Yajun
Available online  , doi: 10.11999/JEIT251034
Abstract:
  Objective  Integrated Sensing And Communication (ISAC) has become a central technology in Sixth-Generation (6G) wireless networks, enabling simultaneous data transmission and environmental sensing. However, the characteristics of ISAC systems, including highly directional sensing signals and the risk of sensitive information leakage to malicious sensing targets, create specific security challenges. Physical layer security provides lightweight methods to enhance confidentiality. In secure transmission, approaches such as artificial noise injection and beamforming can partially improve secrecy, although they may reduce sensing accuracy or communication efficiency. Their effect also depends on the quality advantage of legitimate channels over eavesdropping channels. For Physical Layer Key Generation (PLKG), existing work has only demonstrated basic feasibility. Most current schemes adopt a radar-centric design, which limits compatibility with communication protocols and restricts key generation rates. This paper proposes a PLKG method tailored for ISAC systems. It aims to maximize the Sum Key Generation Rate (SKGR) under sensing accuracy constraints through a Twin Delayed Deep Deterministic policy gradient (TD3)-based joint communication and sensing beamforming algorithm, thereby improving the security performance of ISAC systems.  Methods  A Multiple-Input Multiple-Output (MIMO) ISAC system is considered, where a base station (Alice) equipped with multiple antennas communicates with single-antenna users (Bobs) and senses a malicious target (Eve). The system operates under a Time-Division Duplex (TDD) protocol to leverage channel reciprocity. A PLKG protocol designed for ISAC systems is developed, including channel estimation, joint communication and sensing beamforming, and key generation. The SKGR is derived in closed form, and sensing accuracy is evaluated using the Cramér-Rao Bound (CRB). 
To maximize the SKGR under CRB constraints, a non-convex optimization problem for the joint design of communication and sensing beamforming matrices is formulated. Given its NP-hardness, an algorithm based on TD3 is proposed. TD3 employs dual critic networks to reduce overestimation, delayed policy updates to enhance stability, and target policy smoothing to improve robustness. The state includes channel state information, the actions correspond to beamforming matrices, and the reward function combines SKGR, CRB, and power constraints.  Results and Discussions  Simulation results confirm the effectiveness of the proposed design. The TD3-based algorithm achieves a stable SKGR of 18.5 bits/channel use after training (Fig. 4), outperforming benchmark schemes such as Deep Deterministic Policy Gradient (DDPG), greedy search, and random algorithms. The SKGR increases monotonically with transmit power because of reduced noise interference (Fig. 5). Increasing the number of antennas also improves SKGR, although the gain diminishes as power per antenna decreases. The scheme maintains stable SKGR across different distances to the eavesdropper (Fig. 6), demonstrating the robustness of PLKG against eavesdropping attacks. The proposed algorithm manages the complex optimization problem effectively and adapts to dynamic system conditions, offering a practical approach for secure ISAC systems.  Conclusions  This paper presents a PLKG method for ISAC systems. The proposed protocol generates consistent keys between the base station and communication users. The SKGR maximization problem with sensing constraints is solved using a TD3-based algorithm that jointly optimizes communication and sensing beamforming matrices. Simulation results show that the method outperforms benchmark schemes, with significant gains in SKGR and adaptability to system conditions. The study establishes a basis for integrating PLKG into ISAC to strengthen security without reducing sensing performance. 
Future work will examine real-time implementation and scalability in large networks.
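The reward shaping described above, which combines SKGR with CRB and power constraints, can be illustrated with a minimal sketch; the penalty weights, CRB threshold, and power budget below are illustrative assumptions, not values from the paper:

```python
def td3_reward(skgr, crb, tx_power, crb_max=0.1, p_max=1.0,
               lam_crb=10.0, lam_pow=10.0):
    """Reward for the TD3 agent: the SKGR is rewarded directly, while
    violations of the CRB sensing-accuracy bound and the transmit-power
    budget incur soft-constraint penalties."""
    crb_penalty = lam_crb * max(0.0, crb - crb_max)
    pow_penalty = lam_pow * max(0.0, tx_power - p_max)
    return skgr - crb_penalty - pow_penalty
```

When both constraints are satisfied, the reward reduces to the SKGR itself, so the agent's unconstrained optimum coincides with the constrained maximum.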
Communication, Computation, and Caching Resource Collaboration for Heterogeneous AIGC Service Provisioning
WU Mengru, GAO Yu, ZHAO Bo, XU Bo, SUN Hao, GUO Lei
Available online  , doi: 10.11999/JEIT251300
Abstract:
  Objective  In the Artificial Intelligence of Things (AIoT), Edge Servers (ESs) can provide intelligent content generation services to AIoT devices by utilizing their cached AI-Generated Content (AIGC) models. However, the limited computing resources and caching capacity of ESs make it difficult to support the large-scale caching demands of heterogeneous AIGC services. To address this issue, this paper proposes a communication, computation, and caching resource collaboration scheme that leverages a combined cloud-edge and edge-edge collaborative framework. This scheme focuses on three representative AIGC services, namely lightweight, computation-intensive, and preprocessing-based AIGC services. Furthermore, the proposed approach aims to minimize the total AIGC service latency by jointly optimizing transmit power, computing resource allocation, model caching strategies, and offloading decisions.  Methods  This paper investigates communication, computation, and caching resource collaboration for supporting heterogeneous AIGC services. First, an AIGC service-oriented AIoT system model is proposed to incorporate both cloud-edge and edge-edge collaboration. Subsequently, an optimization problem is formulated with the objective of minimizing the total latency of AIGC services by jointly optimizing transmit power, computing resource allocation, model caching strategies, and offloading decisions. Since the formulated problem is non-convex, an Alternating Optimization (AO) algorithm is proposed, which decomposes the problem into three subproblems that are solved using the Successive Convex Approximation (SCA) method, Karush-Kuhn-Tucker (KKT) conditions, and an improved Harris Hawks Optimization (HHO) algorithm, respectively.
Results and Discussions  In the simulations, the proposed joint optimization scheme is compared with three baselines: Particle Swarm Optimization (PSO), fixed resource allocation, and random offloading and caching. First, the convergence of the proposed AO algorithm is verified (Fig. 2). The results demonstrate that the algorithm achieves rapid convergence within a limited number of iterations across different sub-problems. Second, increasing the transmission bandwidth leads to a significant reduction in the total AIGC service latency (Fig. 3), because each device can occupy more bandwidth resources to send tasks and, similarly, the ES can allocate more bandwidth to send generated content in the downlink. Furthermore, the total AIGC service latency decreases with the ES’s storage capacity for all the schemes (Fig. 4), because a larger storage capacity allows the ES to store more AIGC models, thus reducing the transmission delay between the ES and the cloud server. Additionally, as the required floating point operations per bit increase, the total AIGC service latency exhibits a significant upward trend across all schemes (Fig. 5); however, the proposed scheme mitigates this increase more effectively than the baselines, demonstrating its robustness in handling computationally demanding AIGC tasks. Finally, the total AIGC service latency for all schemes decreases as the Base Station (BS)’s maximum transmit power increases (Fig. 6). This trend arises because a higher maximum transmit power strengthens the downlink signal-to-noise ratio and thus the downlink transmission rate, reducing the total AIGC service latency. In conclusion, these simulation results confirm that the proposed scheme significantly reduces the total AIGC service latency compared with the baselines.
Conclusions  This paper investigates communication, computation, and caching resource collaboration for supporting heterogeneous AIGC services. Our objective is to minimize the total latency of AIGC services by jointly optimizing the transmit power of AIoT devices and base stations, computing resource allocation, AIGC model deployment, and service offloading decisions, subject to computation and caching resource constraints. Since the formulated problem is a mixed-integer non-linear programming problem, an efficient AO algorithm is designed. This algorithm decomposes the original optimization problem into three sub-problems, which are solved via the SCA algorithm, KKT conditions, and the HHO algorithm, respectively. Simulation results demonstrate that the proposed algorithm can reduce the total AIGC service latency compared to baselines.
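The alternating optimization strategy described above can be summarized as a generic block-coordinate loop; the block solvers passed in are placeholders standing in for the SCA, KKT, and HHO sub-solvers of the paper:

```python
def alternating_optimization(latency_fn, solvers, x0, tol=1e-6, max_iter=50):
    """Generic AO loop: cycle through block solvers, each updating one
    block of variables with the others held fixed, until the objective
    (here, total latency) stops improving."""
    x = dict(x0)
    prev = latency_fn(x)
    cur = prev
    for _ in range(max_iter):
        for solve_block in solvers:
            x = solve_block(x)          # e.g. an SCA / KKT / HHO step
        cur = latency_fn(x)
        if prev - cur < tol:            # no further improvement
            break
        prev = cur
    return x, cur
```

On a toy separable objective, each block solver can set its own variable optimally, and the loop converges in one pass.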
CaRS-Align: Channel Relation Spectra Alignment for Cross-Modal Vehicle Re-identification
SA Baihui, ZHUANG Jingyi, ZHENG Jinjie, ZHU Jianqing
Available online  , doi: 10.11999/JEIT250917
Abstract:
  Objective  Visible and infrared images are two commonly used modalities in intelligent transportation scenarios and play a key role in vehicle re-identification. However, differences in imaging mechanisms and spectral responses lead to inconsistent visual characteristics between these modalities, which limits cross-modal vehicle re-identification. To address this problem, this paper proposes a Channel Relation Spectra Alignment (CaRS-Align) method that uses channel relation spectra, rather than channel-wise features, as the alignment target. This strategy reduces interference caused by imaging style differences at the relational-structure level. Within each modality, a channel relation spectrum is constructed to capture stable and semantically coordinated channel-to-channel relationships through correlation modeling. At the cross-modal level, the correlation between the corresponding channel relation spectra of the two modalities is maximized to achieve consistent alignment of relational structures. Experiments on the public MSVR310 and RGBN300 datasets show that CaRS-Align outperforms existing state-of-the-art methods. For example, on MSVR310, under infrared-to-visible retrieval, CaRS-Align achieves a Rank-1 accuracy of 64.35%, which is 2.58 percentage points higher than the best existing method.
Methods  CaRS-Align adopts a hierarchical optimization paradigm: (1) for each modality, a channel–channel relation spectrum is constructed by mining inter-channel dependencies, yielding a semantically coordinated relation matrix that preserves the organizational structure of semantic cues; (2) cross-modal consistency is achieved by maximizing the correlation between the relation spectra of the two modalities, enabling progressive optimization from intra-modal construction to cross-modal alignment; and (3) relation spectrum alignment is integrated with standard classification and retrieval objectives commonly used in re-identification to supervise backbone training for the vehicle re-identification model.  Results and Discussions  Compared with several state-of-the-art cross-modal re-identification methods on the RGBN300 and MSVR310 datasets, CaRS-Align demonstrates strong performance and achieves best or second-best results across both retrieval modes. As shown in (Table 1), on RGBN300 it attains 75.09% Rank-1 accuracy and 55.45% mean Average Precision (mAP) in the infrared-to-visible mode, and 76.60% Rank-1 accuracy and 56.12% mAP in the visible-to-infrared mode. As shown in (Table 2), similar advantages are observed on MSVR310, with 64.54% Rank-1 accuracy and 41.25% mAP in the visible-to-infrared mode, and 64.35% Rank-1 accuracy and 40.99% mAP in the infrared-to-visible mode. (Fig. 4) presents Top-10 retrieval results, where CaRS-Align reduces identity mismatches in both directions. (Fig. 5) illustrates feature distance distributions, showing substantial overlap between intra-class and inter-class distances without CaRS-Align (Fig. 5(a)), whereas clearer separation is observed with CaRS-Align (Fig. 5(b)), confirming improved feature discrimination. These results indicate that modeling channel-level relational structures improves both retrieval modes, increases adaptability to modality shifts, and effectively reduces mismatches caused by cross-modal differences.
Conclusions  This paper proposes a visible–infrared cross-modal vehicle re-identification method based on CaRS-Align. Within each modality, a channel relation spectrum is constructed to preserve semantic co-occurrence structures. A CaRS-Align function is then designed to maximize the correlation between modalities, thereby achieving consistent alignment and improving cross-modal performance. Experiments on the MSVR310 and RGBN300 datasets demonstrate that CaRS-Align outperforms existing state-of-the-art methods in key metrics, including Rank-1 accuracy and mAP.
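The core idea, aligning channel-to-channel relational structure rather than raw channel features, can be sketched as follows; the feature shapes and the use of plain Pearson correlation are illustrative simplifications of the paper's learned modules:

```python
import numpy as np

def channel_relation_spectrum(feat):
    """feat: (C, N) channel features for one modality; returns the
    C x C channel-to-channel correlation matrix (the relation spectrum)."""
    f = feat - feat.mean(axis=1, keepdims=True)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    return f @ f.T

def cars_align_loss(feat_vis, feat_ir):
    """Negative Pearson correlation between the flattened relation spectra
    of the two modalities; minimizing it aligns relational structure
    instead of matching channels directly."""
    rv = channel_relation_spectrum(feat_vis).ravel()
    ri = channel_relation_spectrum(feat_ir).ravel()
    rv = rv - rv.mean()
    ri = ri - ri.mean()
    corr = (rv @ ri) / (np.linalg.norm(rv) * np.linalg.norm(ri) + 1e-8)
    return -corr
```

If the two modalities have identical relational structure, the loss approaches its minimum of -1 even when the raw channel responses differ.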
Routing and Resource Scheduling Algorithm Driven by Mixture of Experts in Large-scale Heterogeneous Local Power Communication Network
JING Chuanfang, ZHU Xiaorong
Available online  , doi: 10.11999/JEIT251176
Abstract:
  Objective  Emerging power services, such as distributed energy consumption, impose more stringent requirements on the performance of Large-scale Heterogeneous Local Power Communication Networks (LHLPCNs). Given the limited communication resources and rising service demands, providing on-demand services and enhancing network capacity while guaranteeing Quality of Service (QoS) presents a major challenge for LHLPCNs. Conventional routing and resource scheduling algorithms based on optimization or heuristics depend on precise mathematical models and parameters. As network scales and optimization variables increase, these algorithms become computationally expensive, hindering their effective adaptation to the growing variety of power application scenarios. Recent advances in Mixture of Experts (MoE) frameworks offer a promising solution: by employing an ensemble of AI models as specialized experts, MoE greatly reduces the need to train individual task-specific models. Motivated by these challenges and the potential of MoE, this paper proposes an MoE-based routing and resource scheduling algorithm (RASMoE) tailored for LHLPCNs integrating High Power Line Carrier (HPLC) and Radio Frequency (RF). RASMoE can efficiently meet the personalized QoS requirements of diverse services and accommodate more power services within limited resources.  Methods  Firstly, considering the multi-modal links, channels and data modulation methods, the optimization problem of minimizing the difference between QoS supply and demand in LHLPCNs is established, which conforms to a 0-1 integer linear programming model. Then, to solve this NP-hard problem, a novel MoE framework comprising expert networks and gated networks is designed. This framework is capable of meeting the personalized demands of diverse services in terms of data transmission rate, delay and reliability, while achieving faster convergence.
The expert networks, which include both shared and QoS-specific experts, are responsible for generating the optimal next hop and computing the efficient allocation strategies of links, channels and data modulation modes between node pairs. Meanwhile, the gated networks dynamically combine and reuse these experts to efficiently accommodate both known and unforeseen service types. Finally, extensive comparative experiments validate the effectiveness of the proposed algorithm. Compared with multiple baselines, RASMoE shows better performance in terms of resource utilization, delay, and reliability.  Results and Discussions  The difference between the performance supply and demand of five algorithms under varying service numbers is compared (Fig. 3). Simulation results show that RASMoE consistently exhibits the smallest performance supply-demand differences across all scenarios. This advantage stems from its gating network, which dynamically combines QoS-specific experts to precisely match resource allocation with service requirements. Given that control and computing-intensive services have strict delay requirements, the average End-to-End (E2E) latency of these two service types under different service numbers is compared (Fig. 4). It can be observed that the proposed algorithm achieves the lowest average E2E latency. This is because its expert networks, enhanced by Graph Attention Networks (GATs), efficiently extract node load states and interact with the network environment in real time via a Multi-Armed Bandit (MAB) mechanism. This enables RASMoE to learn adaptive resource allocation strategies. Moreover, the average reliability of the E2E paths achieved by the five algorithms for different numbers of control, compute-intensive, and acquisition services is illustrated (Fig. 5).  Conclusions  This paper proposes an MoE-driven routing and resource scheduling algorithm for LHLPCNs. The proposed framework comprises two core components: expert networks and a gating network.
The expert networks include shared experts based on GATs and service QoS-specific experts based on MAB. The former are responsible for E2E path selection by analyzing node characteristics, while the latter focus on adaptively allocating and scheduling links, channels, and modulation schemes according to distinct QoS requirements and link conditions. The gating network dynamically orchestrates and reuses these expert models to efficiently serve services with single or multiple QoS demands, including previously unseen service types. Theoretical analysis validates that the proposed method enhances the resource utilization of LHLPCNs, with its advantages being particularly pronounced in multi-service scenarios characterized by diverse QoS requirements. Future work will explore the integration of the MoE framework with domain-specific models (e.g., for power load forecasting) and predictive analytics, aiming to optimize the integration and utilization of renewable energy sources, such as wind and solar power.
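The gating principle, dynamically weighting a pool of shared and QoS-specific experts, can be sketched as follows; the score shapes and gate inputs are illustrative placeholders, not the paper's trained networks:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of gate logits."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_combine(expert_outputs, gate_logits):
    """Weight per-expert decisions (e.g. next-hop or allocation scores)
    by a softmax gate, so services with different QoS demands can reuse
    the same expert pool with different mixtures."""
    w = softmax(np.asarray(gate_logits, dtype=float))   # (E,) gate weights
    outs = np.asarray(expert_outputs, dtype=float)      # (E, A) expert scores
    return w @ outs                                     # (A,) combined scores
```

A gate that strongly prefers one expert effectively routes the service to that expert, while intermediate logits blend several experts for multi-QoS services.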
Vision-Guided and Force-Controlled Method for Robotic Screw Assembly
ZHANG Chunyun, MENG Xintong, TAO Tao, ZHOU Huaidong
Available online  , doi: 10.11999/JEIT251193
Abstract:
  Objective  With the rapid development of intelligent manufacturing and industrial automation, robots have been increasingly applied to high-precision assembly tasks, especially in screw assembly. However, existing assembly systems still face multiple challenges. First, the pose of assembly objects is often uncertain, making initial localization difficult. Second, small features such as threaded holes are blurred and hard to identify accurately. Third, traditional vision-based open-loop control may lead to assembly deviation or jamming. To address these issues, this study proposes a vision–force cooperative method for robotic screw assembly. The method builds a closed-loop assembly system that covers both coarse positioning and fine alignment. A semantic-enhanced 6D pose estimation algorithm and a lightweight hole detection model are used to improve perception accuracy. Force feedback control is then applied to adjust the end-effector posture dynamically. The proposed approach improves the accuracy and stability of screw assembly.  Methods  The proposed screw assembly method is built on a vision–force cooperative strategy, forming a closed-loop process. In the visual perception stage, a semantic-enhanced 6D pose estimation algorithm is applied to handle disturbances and pose uncertainty in complex industrial environments. During initial pose estimation, Grounding DINO and SAM2 jointly generate pixel-level masks to provide semantic priors for the FoundationPose module. In the continuous tracking stage, semantic constraints from Grounding DINO are used for translational correction. For detecting small threaded holes, an improved lightweight hole detection algorithm based on NanoDet is designed. It uses MobileNetV3 as the backbone and adds a CircleRefine module in the detection head to regress the hole center precisely. In the assembly positioning stage, a hierarchical vision-guided strategy is used. 
The global camera conducts coarse positioning to provide overall guidance. The hand-eye camera then performs local correction based on hole detection results. In the closed-loop assembly stage, force feedback is applied for posture adjustment, achieving precise alignment between the screw and the threaded hole.  Results and Discussions  The proposed method is experimentally validated in robotic screw assembly scenarios. The improved 6D pose estimation algorithm reduces the average position error by 18% and the orientation error by 11.7% compared with the baseline (Table 1). The tracking success rate in dynamic sequences increases from 72% to 85% (Table 2). For threaded hole detection, the lightweight algorithm based on the improved NanoDet is evaluated using a dataset collected from assembly scenarios. The algorithm achieves 98.3% precision, 99.2% recall and 98.7% mAP on the test set (Table 3). The model size is only 11.7 MB and the computation cost is 2.9 GFLOPS, lower than those of most benchmark models while maintaining high accuracy. A circular branch is introduced to fit hole edges (Fig. 8), providing an accurate center estimate for visual guidance. Under various inclination angles (Fig. 10), the assembly success rate stays above 91.6% (Table 4). For screws of different sizes (M4, M6 and M8), the success rate remains higher than 90% (Table 5). Under small external disturbances (Fig. 12), the success rates reach 93.3%, 90% and 83.3% for translational, rotational and mixed disturbances, respectively (Table 6). Force-feedback comparison experiments show that the assembly success rate is 66.7% under visual guidance alone. When force feedback is added, the success rate increases to 96.7% (Table 7). The system demonstrates stable performance throughout a screw assembly cycle, achieving an average total cycle time of 9.53 seconds (Table 8), thereby meeting industrial assembly requirements.
Conclusions  This study proposes a vision and force control cooperative method to address several challenges in robotic screw assembly. The approach improves target localization accuracy through a semantics-enhanced 6D pose estimation algorithm and a lightweight threaded-hole detection network. By integrating a hierarchical vision-guided strategy with force-feedback control, precise alignment between the screw and the threaded hole is achieved. Experimental results demonstrate that the proposed method ensures reliable assembly under various conditions, providing a feasible solution for intelligent robotic assembly. Future work will focus on adaptive force control, multimodal perception fusion and intelligent task planning to further enhance the system’s generalization and self-optimization capabilities in complex industrial environments.
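The force-feedback adjustment stage can be illustrated with a simple compliance loop; the sensing and motion interfaces, gain, and tolerance below are hypothetical placeholders rather than the paper's controller:

```python
def force_guided_align(read_force, move, k=0.001, f_tol=0.5, max_steps=100):
    """Simple compliance loop: read the lateral contact force (fx, fy) at
    the tool, translate proportionally against it, and stop once the
    residual force falls within tolerance. Returns (success, steps)."""
    for step in range(max_steps):
        fx, fy = read_force()
        if (fx ** 2 + fy ** 2) ** 0.5 < f_tol:
            return True, step
        move(-k * fx, -k * fy)   # comply: move against the contact force
    return False, max_steps
```

With a spring-like contact model (force proportional to misalignment), the residual offset shrinks geometrically and the loop terminates within a few steps.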
AutoPenGPT: Drift-Resistant Penetration Testing Driven by Search-Space Convergence and Dependency Modeling
HUANG Weigang, FU Lirong, LIU Peiyu, DU Linkang, YE Tong, XIA Yifan, WANG Wenhai
Available online  , doi: 10.11999/JEIT250873
Abstract:
  Objective  Industrial Control Systems (ICS) are widely deployed in critical sectors and often contain long-standing vulnerabilities due to strict availability requirements and limited patching opportunities. The increasing exposure of external management and access infrastructure has expanded the attack surface and allows adversaries to pivot from boundary components into fragile production networks. Continuous penetration testing of these components is essential but remains costly and difficult to scale when carried out manually. Recent work examines Large Language Models (LLMs) for automated penetration testing; however, existing systems often experience strategy drift and intention drift, which produce incoherent testing behaviors and ineffective exploitation chains.  Methods  This study proposes AutoPenGPT, a multi-agent framework for automated Web security testing. AutoPenGPT uses an adaptive exploration-space convergence mechanism that predicts likely vulnerability types from target semantics and constrains LLM-driven testing through a dynamically updated payload knowledge base. To reduce intention drift in multi-step exploitation, a dependency-driven strategy module rewrites historical feedback, models step dependencies, and generates coherent, executable strategies in a closed-loop workflow. A semi-structured prompt embedding scheme is also developed to support heterogeneous penetration testing tasks while preserving semantic integrity.  Results and Discussions  AutoPenGPT is evaluated on Capture-the-Flag (CTF) benchmarks and real-world ICS and Web platforms. On CTF datasets, it achieves 97.62% vulnerability-type detection accuracy and an 80.95% requirement completion rate, exceeding state-of-the-art tools by a wide margin. In real-world deployments, it reaches approximately 70% requirement completion and identifies six previously undisclosed vulnerabilities, demonstrating practical effectiveness.  Conclusions   The contributions are threefold. 
(1) Strategy drift and intention drift in LLM-driven penetration testing are examined and addressed through adaptive exploration and dependency-aware strategy mechanisms that stabilize long-horizon testing behaviors. (2) AutoPenGPT is designed and implemented as a multi-agent penetration testing system that integrates semantic vulnerability prediction, closed-loop strategy generation, and semi-structured prompt embedding. (3) Extensive evaluation on CTF and real-world ICS and Web platforms confirms the effectiveness and practicality of the system, including the discovery of previously unknown vulnerabilities.
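The exploration-space convergence idea, ranking payload families by semantic overlap with the target and discarding the rest, can be sketched as follows; the knowledge-base layout and tag scheme are illustrative assumptions, not the system's actual format:

```python
def converge_search_space(target_tags, payload_kb, top_k=3):
    """Rank payload families by overlap between their tags and the
    target's semantic tags, keeping only the top-k families to shrink
    the space the LLM agent explores."""
    scored = sorted(
        payload_kb.items(),
        key=lambda kv: -len(set(kv[1]['tags']) & set(target_tags)),
    )
    return [name for name, _ in scored[:top_k]]
```

Constraining generation to the retained families is one way to curb strategy drift: the agent cannot wander into payload classes unrelated to the target's predicted vulnerability types.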
Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning in Wireless Networks
LIU Jingyuan, MA Ke, XU Runchen, CHANG Zheng
Available online  , doi: 10.11999/JEIT251221
Abstract:
  Objective  Multimodal Federated Learning (MFL) uses complementary information from multiple modalities, yet in wireless edge networks it is restricted by limited energy and frequent missing modalities because many clients store only images or only reports. This study presents Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning (CREEMFL), which applies selective completion and joint communication–computation optimization to reduce training energy under latency and wireless constraints.  Methods  CREEMFL completes part of the incomplete samples by querying a public multimodal subset, and processes the remaining samples through zero padding. Each selected user downloads the global model, performs image-to-text or text-to-image retrieval, conducts local multimodal training, and uploads model updates for aggregation. An energy–delay model couples local computation and wireless communication and treats the required number of global rounds as a function of retrieval ratios. Based on this model, an energy minimization problem is formulated and solved using a two-layer algorithm with an outer search over retrieval ratios and an inner optimization of transmission time, Central Processing Unit (CPU) frequency, and transmit power.  Results and Discussions  Simulations on a single-cell wireless MFL system show that increasing the ratio of completing text from images improves test accuracy and reduces total energy. In contrast, a large ratio of completing images from text provides limited accuracy gain but increases energy consumption (Fig. 3, Fig. 4). Compared with four representative baselines, CREEMFL achieves shorter completion time and lower total energy across a wide range of maximum average transmit powers (Fig. 5, Fig. 6). For CREEMFL, increased system bandwidth further reduces completion time and energy consumption (Fig. 7, Fig. 8). 
Under different user modality compositions, CREEMFL also attains higher test accuracy than local training, zero padding, and cross-modal retrieval without energy optimization (Fig. 9).  Conclusions  CREEMFL integrates selective cross-modal retrieval and joint communication–computation optimization for energy-efficient MFL. By treating retrieval ratios as variables and modeling their effect on global convergence rounds, it captures the coupling between per-round costs and global training progress. Simulations verify that CREEMFL reduces training completion time and total energy while preserving classification accuracy in resource-constrained wireless edge networks.
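The two-layer structure, an outer search over retrieval ratios whose effect on the required number of global rounds multiplies the (inner-optimized) per-round energy, can be sketched as follows; the cost models are toy placeholders, not the paper's derived expressions:

```python
def minimize_energy(round_energy, rounds_needed, ratios):
    """Outer grid search over (image-to-text, text-to-image) retrieval
    ratios; for each pair, total energy = required global rounds times
    per-round energy (assumed already optimized by the inner layer).
    Returns (best_total, best_i2t_ratio, best_t2i_ratio)."""
    best = None
    for r_i2t in ratios:
        for r_t2i in ratios:
            total = rounds_needed(r_i2t, r_t2i) * round_energy(r_i2t, r_t2i)
            if best is None or total < best[0]:
                best = (total, r_i2t, r_t2i)
    return best
```

With a toy model in which image-to-text completion cuts the required rounds while text-to-image completion mainly adds per-round energy, the search mirrors the qualitative trend reported above: favor image-to-text completion.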
Battery Pack Multi-fault Diagnosis Algorithm Based on Dual-Perspective Spectral Attention Fusion
LIU Mingjun, GU Shenyu, YIN Jingde, ZHANG Yifan, DONG Zhekang, JI Xiaoyue
Available online  , doi: 10.11999/JEIT251156
Abstract:
  Objective  With the rapid growth of electric vehicles and their widespread deployment, battery pack faults have become more frequent, creating an urgent need for efficient fault diagnosis methods. Although deep learning-based approaches have achieved notable progress, existing studies remain limited in addressing multiple fault types, such as Internal Short Circuit (ISC), sensor noise, sensor drift, and State-Of-Charge (SOC) inconsistency, and in modeling the coupling relationships among these faults. To address these limitations, a multi-fault diagnosis algorithm for battery packs based on dual-perspective spectral attention is proposed. A dual-perspective tokenization module is designed to extract spatiotemporal features from battery data, whereas a spectral attention mechanism addresses non-stationary time-series characteristics and captures long-term dependencies, thereby improving diagnostic performance.   Methods  To improve spatiotemporal feature extraction and fault diagnosis performance, a dual-perspective spectral attention fusion algorithm for battery pack multi-fault diagnosis is proposed. The overall architecture consists of four core modules (Fig. 3): a dual-perspective tokenization module, a spectral attention module, a feature fusion module, and an output module. The dual-perspective tokenization module applies positional encoding to jointly model temporal and spatial dimensions, enabling comprehensive spatiotemporal feature representation. When combined with the spectral attention mechanism, the capability of the model to handle non-stationary characteristics is strengthened, leading to improved diagnostic performance. In addition, to address the lack of comprehensive publicly available datasets for battery pack fault diagnosis, a new dataset is constructed, covering ISC, sensor noise, sensor drift, and SOC inconsistency faults. 
The dataset includes three operating conditions, FUDS, UDDS, and US06, which alleviates data scarcity in this research field.  Results and Discussions  Experimental results indicate that the proposed method improves average precision, recall, F1 score, and accuracy by 10.98%, 12.64%, 13.84%, and 13.45%, respectively, compared with existing optimal fault diagnosis methods. Comparison experiments under different operating conditions (Table 6) support this conclusion. Conventional convolutional neural network methods perform well in local feature extraction; however, fixed-size convolution kernels are not well suited to time features with varying frequencies, which limits long-term temporal dependency modeling and global feature capture. Recurrent neural network-based methods show reduced computational efficiency when large-scale datasets are processed. Transformer-based models face constraints in spatial feature extraction and in representing temporal variations. By contrast, the proposed algorithm addresses these limitations through an integrated architectural design. Ablation experiments demonstrate the contribution of each module to overall performance (Table 7), and the complete framework improves average F1 score and accuracy by 9.30% and 9.26%, respectively, compared with ablation variants. Robustness analysis under simulated noise conditions (Table 8) shows that the proposed method achieves accuracy improvements ranging from 49.95% to 124.34% over baseline methods at noise levels from –2 dB to –8 dB, indicating strong noise resistance.  Conclusions  A multi-fault diagnosis algorithm for battery packs is presented that integrates dual-perspective tokenization and spectral attention to combine spatiotemporal and spectral information. The dual-perspective tokenization module performs tokenization and positional encoding along temporal and spatial axes, which improves spatiotemporal representation. 
The spectral attention mechanism strengthens modeling of non-stationary signals and long-term dependencies. Experiments under FUDS, UDDS, and US06 driving cycles show that the proposed method outperforms existing multi-fault diagnosis approaches, with average gains of 13.84% in F1 score and 13.45% in accuracy. Ablation studies confirm that both modules contribute substantially and that their combination enables effective handling of complex time-series features. Under high-noise conditions (–2 dB, –4 dB, –6 dB, and –8 dB), the method also shows improved robustness, with accuracy gains of 49.95%, 90.39%, 112.01%, and 124.34%, respectively, compared with baseline methods. Several limitations remain. First, the data are mainly derived from laboratory simulations, and further validation under real-world operating conditions is required. Second, the effect of fault severity on battery management system hierarchical decision making has not been fully addressed, and future work will focus on establishing a fault severity grading strategy. Third, physical interpretability requires further improvement, and subsequent studies will explore the integration of equivalent circuit models or electrochemical mechanism models to balance diagnostic accuracy and interpretability.
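Attention in the frequency domain can be illustrated with a toy version that weights FFT bins by a softmax over their magnitudes; the paper's spectral attention module is learned, and this fixed-weight sketch only conveys the mechanism:

```python
import numpy as np

def spectral_attention(x):
    """Toy spectral attention over a 1-D signal: transform to the
    frequency domain, softmax-weight each bin by its magnitude
    (emphasizing dominant components), then transform back."""
    X = np.fft.rfft(x)
    mag = np.abs(X)
    w = np.exp(mag - mag.max())   # numerically stable softmax weights
    w = w / w.sum()
    return np.fft.irfft(w * X, n=len(x))
```

Operating on frequency bins rather than time steps is one way to cope with non-stationary battery signals, since a global frequency view is insensitive to where in the window a pattern occurs.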
Intelligent Analysis Technologies for Encrypted Traffic: Current Status, Advances, and Challenges
GONG Bi, LIU Jian, TANG Xiaomei, YU Meiting, GONG Hang, HUANG Meigen
Available online  , doi: 10.11999/JEIT250416
Abstract:
  Significance   Encrypted traffic enables the secure and reliable transmission of data yet poses notable challenges to network security, such as the covert propagation of malicious attacks, diminished effectiveness of security protection tools, and increased network resource overhead. In this context, encrypted traffic analysis technologies become particularly important. Traditional methods based on port filtering and deep packet inspection are inadequate to address the increasingly complex network environment. Intelligent analysis technologies for encrypted traffic integrate multiple cutting-edge technologies, including feature engineering, deep learning, Transformer architecture, federated learning, multimodal feature fusion, and generative models. These technologies solve problems in network security management from multiple aspects, playing a crucial role in efficiently identifying hidden attacks, optimizing network resource allocation, balancing system security and user privacy protection, enhancing network security defenses, and improving user experience.  Progress   Intelligent analysis technologies for encrypted traffic provide new ideas and methods for addressing network security challenges. (1) Feature engineering: (a) Statistical features: Starting from basic statistical features of encrypted traffic packets, such as packet size, quantity, arrival time, and rate, feature selection techniques are used for screening, enabling the processed data to well reflect the internal features of encrypted traffic. (b) Behavioral features: Through observation and analysis of network traffic, features such as access frequencies and protocol usage habits are parsed to determine behavior patterns. (2) Deep learning methods: (a) Convolutional Neural Network (CNN): Its convolutional and pooling layers automatically extract local features from encrypted traffic data, effectively capturing key information. 
For example, an improved multi-scale CNN achieves a classification accuracy of 86.77% on the ISCXVPN2016 dataset. (b) Recurrent Neural Network (RNN): It is adept at processing time-series data, learning long-term dependencies through its memory units to analyze temporal features like connection duration and traffic trends. (c) Graph Neural Network (GNN): Suitable for data with complex relational structures, it models the graph structure of encrypted traffic to uncover latent relationships between nodes. (d) Transformer architecture: With capabilities for parallel computing and processing long sequences, it uses the attention mechanism to capture long-distance dependencies in traffic data. For instance, a traffic Transformer method incorporating masked autoencoders improves accuracy to 98.07% on the ISCXVPN2016 dataset. (3) Other cutting-edge methods: (a) Federated learning: It enables multiple participants to jointly construct a global model by exchanging sub-model parameters without sharing original traffic data, thus protecting privacy and improving model performance. Validated cases show that the performance gap relative to centralized learning can be narrowed to 0.8%. (b) Multimodal feature fusion: This method extracts features from traffic data of different modalities and fuses them into a unified representation to construct a comprehensive analysis architecture. It enhances model efficacy by integrating heterogeneous features, successfully increasing accuracy and F1-score for multi-task classification to 93.75% and 91.95%, respectively. (c) Generative model-driven approaches: Utilizing methods like Generative Adversarial Networks (GAN) and diffusion models, they learn the distribution of real traffic data to generate high-quality synthetic samples, alleviating data scarcity and class imbalance.
For example, traffic generated by diffusion model-based methods shows significantly improved similarity to real traffic in key features like packet size and inter-arrival time, by up to 43.4% and 39.02% compared to baseline models.  Conclusions  This paper explains the necessity of intelligent encrypted traffic analysis technologies and systematically summarizes key technologies and related research, providing theoretical and technical support for the field. However, challenges remain: (1) Coping with network complexity: The heterogeneity and dynamic nature of modern networks lead to diverse encryption algorithms and inconsistent traffic structures, making it difficult for traditional rules to adapt. Simultaneously, network adjustments and user behavior changes cause dynamic evolution of traffic features, increasing analysis difficulty. (2) Insufficient model robustness: Encrypted traffic features are highly environment-dependent, causing accuracy degradation after migration. Models are also sensitive to non-ideal inputs and vulnerable to adversarial example attacks, which compromise model judgments. (3) Conflict between privacy protection and data compliance: Encrypted traffic carries sensitive information, and traditional analysis risks exposing original features. Directly collected metadata can still be associated with user identities, complicating compliance with anonymization regulations.  Prospects   Future work can focus on: (1) Enhancing dynamic adaptability: Constructing a full-link adaptive mechanism that integrates multi-dimensional information to achieve dynamic context awareness; introducing incremental learning frameworks to respond in real time to feature changes; and combining algorithms like genetic algorithms and reinforcement learning to dynamically adapt detection strategies. 
(2) Improving anti-attack capability: Building a comprehensive protection system encompassing adversarial sample detection, model defense, and attack traceability, including designing monitoring modules and employing adversarial training. (3) Strengthening privacy protection and compliance: Introducing differential privacy by adding controllable noise during feature extraction or to model parameters, and adopting homomorphic encryption technology to support analytical tasks directly on ciphertexts. (4) Promoting synergy between reverse engineering and Explainable AI (XAI): Utilizing reverse engineering to deeply analyze protocol structures as precise inputs for XAI, and leveraging XAI methods to enhance model transparency, forming a closed-loop optimization between reverse analysis and model interpretation.
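The flow-level statistical features named in the survey (packet size, quantity, arrival time, and rate) can be sketched in a few lines of Python. The `flow_features` helper and the specific feature set below are illustrative assumptions, not taken from any surveyed method:

```python
import statistics

def flow_features(packet_sizes, arrival_times):
    """Basic statistics over one flow (sizes in bytes, times in seconds)."""
    duration = arrival_times[-1] - arrival_times[0]
    # Inter-arrival times between consecutive packets.
    iats = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return {
        "pkt_count": len(packet_sizes),
        "mean_size": statistics.mean(packet_sizes),
        "std_size": statistics.pstdev(packet_sizes),
        "mean_iat": statistics.mean(iats),
        "byte_rate": sum(packet_sizes) / duration if duration > 0 else 0.0,
    }

feats = flow_features([1500, 60, 1500, 40], [0.00, 0.01, 0.03, 0.06])
```

Vectors of such per-flow statistics are what feature selection then screens before classification.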
Two-Channel Joint Coding Detection for Cyber-Physical Systems Against Integrity Attacks
MO Xiaolei, ZENG Weixin, FU Jiawei, DOU Keqin, WANG Yanwei, SUN Ximing, LIN Sida, SUI Tianju
Available online  , doi: 10.11999/JEIT250729
Abstract:
  Objective  Cyber-Physical Systems (CPS) are widely applied across infrastructure, aviation, energy, healthcare, manufacturing, and transportation, as computing, control, and sensing technologies advance. Due to the real-time interaction between information and physical processes, such systems are exposed to security risks during data exchange. Attacks on CPS can be grouped into availability, integrity, and reliability attacks based on information security properties. Integrity attacks manipulate data streams to disrupt the consistency between system inputs and outputs. Compared with the other two types, integrity attacks are more difficult to detect because of their covert and dynamic nature. Existing detection strategies generally modify control signals, sensing signals, or system models. Although these approaches can detect specific categories of attacks, they may reduce control performance and increase model complexity and response delay.  Methods  A joint additive and multiplicative coding detection scheme for the two-channel structure of control and output is proposed. Three representative integrity attacks are tested, including a control-channel bias attack, an output-channel replay attack, and a two-channel covert attack. These attacks remain stealthy by partially or fully obtaining system information and manipulating data so the residual-based χ2 detector output stays below the detection threshold. The proposed method introduces paired additive watermarking signals with positive and negative patterns, together with paired multiplicative coding and decoding matrices on both channels. These additional unknown signals and parameters introduce information uncertainty to the attacker and cause the residual statistics to deviate from the expected values constructed using known system information. The watermarking pairs and matrix pairs operate through different mechanisms. One uses opposite-sign injection, while the other uses a mutually inverse transformation. 
Therefore, normal control performance is maintained when no attack is present. The time-varying structure also prevents attackers from reconstructing or bypassing the detection mechanism.  Results and Discussions  Simulation experiments on an aerial vehicle trajectory model are conducted to assess both the influence of integrity attacks on flight paths and the effectiveness of the proposed detection scheme. The trajectory is modeled using Newton’s equations of motion, and attitude dynamics and rotational motion are omitted to focus on positional behavior. Detection performance with and without the proposed method is compared under the three attack scenarios (Fig. 2, Fig. 3, Fig. 4). The results show that the proposed scheme enables effective identification of all attack types and maintains stable system behavior, demonstrating its practical applicability and improvement over existing approaches.  Conclusions  This study addresses the detection of integrity attacks in CPS. Three representative attack types (bias, replay, and covert attacks) are modeled, and the conditions required for their successful execution are analyzed. A detection approach combining additive watermarking and multiplicative encoding matrices is proposed and shown to detect all three attack types. The design uses paired positive–negative additive watermarks and paired encoding and decoding matrices to ensure accurate detection while maintaining normal control performance. A time-varying configuration is adopted to prevent attackers from reconstructing or bypassing the detection elements. Using an aerial vehicle trajectory simulation, the proposed approach is demonstrated to be effective and applicable to cyber-physical system security enhancement.
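The residual-based χ2 test at the center of this detection problem can be illustrated with scalar residuals. The numbers below are synthetic, and the ±5 watermark amplitude and 80.0 threshold are hypothetical choices for illustration, not parameters from the paper:

```python
def chi2_stat(residuals, sigma=1.0):
    """Chi-square statistic of a window of scalar residuals with noise std sigma."""
    return sum((r / sigma) ** 2 for r in residuals)

def alarm(residuals, threshold, sigma=1.0):
    """Raise an alarm when the residual statistic exceeds the detection threshold."""
    return chi2_stat(residuals, sigma) > threshold

# Nominal residuals: small zero-mean values (a deterministic stand-in for noise).
nominal = [0.5 if k % 2 == 0 else -0.5 for k in range(50)]

# Under a replay attack the plant output no longer reflects the freshly injected
# paired +/- watermark, so the watermark reappears in the residual.
watermark = [5.0 if k % 2 == 0 else -5.0 for k in range(50)]
attacked = [n + w for n, w in zip(nominal, watermark)]

threshold = 80.0  # roughly the 99.5% quantile of chi-square with 50 degrees of freedom
```

With these values the nominal window stays well below the threshold while the attacked window far exceeds it, which is the deviation the paired watermark is designed to force.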
One-pass Architectural Synthesis for Continuous-Flow Microfluidic Biochips Based on Deep Reinforcement Learning
LIU Genggeng, JIAO Xinyue, PAN Youlin, HUANG Xing
Available online  , doi: 10.11999/JEIT251058
Abstract:
Continuous-Flow Microfluidic Biochips (CFMBs) are widely applied in biomedical research because of miniaturization, high reliability, and low sample consumption. As integration density increases, design complexity significantly rises. Conventional stepwise design methods treat binding, scheduling, layout, and routing as separate stages, with limited information exchange across stages, which leads to reduced solution quality and extended design cycles. To address this limitation, a one-pass architectural synthesis method for CFMBs is proposed based on Deep Reinforcement Learning (DRL). Graph Convolutional Neural Networks (GCNs) are used to extract state features, capturing structural characteristics of operations and their relationships. Proximal Policy Optimization (PPO), combined with the A* algorithm and list scheduling, ensures rational layout and routing while providing accurate information for operation scheduling. A multiobjective reward function is constructed by normalizing and weighting biochemical reaction time, total channel length, and valve count, enabling efficient exploration of the decision space through policy gradient updates. Experimental results show that the proposed method achieves a 2.1% reduction in biochemical reaction time, a 21.3% reduction in total channel length, and a 65.0% reduction in valve count on benchmark test cases, while maintaining feasibility for larger-scale chips.  Objective  CFMBs have gained sustained attention in biomedical applications because of miniaturization, high reliability, and low sample consumption. With increasing integration density, design complexity escalates substantially. Traditional stepwise design methods often yield suboptimal solutions, extended design cycles, and feasibility limitations for large-scale chips. To address these challenges, a one-pass architectural synthesis framework is proposed that integrates DRL to achieve coordinated optimization of binding, scheduling, layout, and routing.  
Methods  All CFMB design tasks are integrated into a unified optimization framework formulated as a Markov decision process. The state space includes device binding information, device locations, operation priorities, and related parameters, whereas the action space adjusts device placement, operation-to-device binding, and operation priority. High-dimensional state features are extracted using GCNs. PPO is applied to iteratively update policies. The reward function accounts for biochemical reaction time, total flow-channel length, and the number of additional valves. These metrics are evaluated using the A* algorithm and list scheduling, normalized, and weighted to balance trade-offs among objectives.  Results and Discussions  Based on the current state and candidate actions, architectural solutions are generated iteratively through PPO-guided policy updates combined with the A* algorithm and list scheduling. The defined reward function enables the generation of CFMB architectures with improved overall quality. Experimental results show an average reduction of 2.1% in biochemical reaction time, an average reduction of 21.3% in total flow-channel length, with a maximum reduction of 57.1% in the ProteinSplit benchmark, and an average reduction of 65.0% in additional valve count compared with existing methods. These improvements reduce manufacturing cost and operational risk.  Conclusions  A one-pass architectural synthesis method for CFMBs based on DRL is proposed to address flow-layer design challenges. By applying GCN-based state feature extraction and PPO-based policy optimization, the multiobjective design problem is transformed into a sequential decision-making process that enables joint optimization of binding, scheduling, layout, and routing. 
Experimental results obtained from multiple benchmark test cases confirm improved performance in biochemical reaction completion time, total channel length, and valve count, while preserving scalability for larger chip designs.
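A normalized, weighted multiobjective reward of the kind described above can be written as a weighted sum of baseline-normalized costs. The baseline-division normalization, the weights, and the numeric values below are assumptions for illustration; the paper's exact scheme may differ:

```python
def reward(metrics, baselines, weights):
    """Negative weighted sum of baseline-normalized costs, so lower cost gives higher reward.

    metrics/baselines order: (reaction time, total channel length, valve count).
    """
    return -sum(w * (m / b) for m, b, w in zip(metrics, baselines, weights))

W = (0.4, 0.3, 0.3)            # hypothetical objective weights
BASE = (100.0, 1000.0, 100.0)  # hypothetical baseline values used for normalization

r_good = reward((90.0, 800.0, 35.0), BASE, W)    # architecture improving on the baseline
r_bad = reward((110.0, 1200.0, 120.0), BASE, W)  # architecture worse than the baseline
```

Policy gradient updates then push the agent toward architectures with higher reward, i.e., jointly lower reaction time, channel length, and valve count.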
Multi-Scale Region of Interest Feature Fusion for Palmprint Recognition
MA Yuxuan, ZHANG Feifei, LI Guanghui, TANG Xin, DONG Zhengyang
Available online  , doi: 10.11999/JEIT250940
Abstract:
  Objective  Accurate localization of the Region Of Interest (ROI) is a prerequisite for high-precision palmprint recognition. In contactless and uncontrolled application scenarios, complex background illumination and diverse hand postures frequently cause ROI localization offsets. Most existing deep learning-based recognition methods rely on a single fixed-size ROI as input. Although some approaches adopt multi-scale convolution kernels, fusion at the ROI level is not performed, which makes these methods highly sensitive to localization errors. Therefore, small deviations in ROI extraction often result in severe performance degradation, which restricts practical deployment. To overcome this limitation, a Multi-scale ROI Feature Fusion Mechanism is proposed, and a corresponding model, termed ROI3Net, is designed. The objective is to construct a recognition system that is inherently robust to localization errors by integrating complementary information from multiple ROI scales. This strategy reinforces shared intrinsic texture features while suppressing scale-specific noise introduced by positioning inaccuracies.  Methods  The proposed ROI3Net adopts a dual-branch architecture consisting of a Feature Extraction Network and a lightweight Weight Prediction Network (Fig. 4). The Feature Extraction Network employs a sequence of Multi-Scale Residual Blocks (MSRBs) to process ROIs at three progressive scales (1.00×, 1.25×, and 1.50×) in parallel. Within each MSRB, dense connections are applied to promote feature reuse and reduce information loss (Eq. 3). Convolutional Block Attention Modules (CBAMs) are incorporated to adaptively refine features in both the channel and spatial dimensions. The Weight Prediction Network is implemented as an end-to-end lightweight module. 
It takes raw ROI images as input and processes them using a serialized convolutional structure (Conv2d-BN-GELU-MaxPool), followed by a Multi-Layer Perceptron (MLP) head, to predict a dynamic weight vector for each scale. This subnetwork is optimized for efficiency, containing 2.38 million parameters, which accounts for approximately 6.2% of the total model parameters, and requiring 103.2 MFLOPs, which corresponds to approximately 2.1% of the total computational cost. The final feature representation is obtained through a weighted summation of multi-scale features (Eq. 1 and Eq. 2), which mathematically maximizes the information entropy of the fused feature vector.  Results and Discussions  Experiments are conducted on six public palmprint datasets: IITD, MPD, NTU-CP, REST, CASIA, and BMPD. Under ideal conditions with accurate ROI localization, ROI3Net demonstrates superior performance compared with state-of-the-art single-scale models. For instance, a Rank-1 accuracy of 99.90% is achieved on the NTU-CP dataset, and a Rank-1 accuracy of 90.17% is achieved on the challenging REST dataset (Table 1). Model robustness is further evaluated by introducing a random 10% localization offset. Under this condition, conventional models exhibit substantial performance degradation. For example, the Equal Error Rate (EER) of the CO3Net model on NTU-CP increases from 2.54% to 15.66%. In contrast, ROI3Net maintains stable performance, with the EER increasing only from 1.96% to 5.01% (Fig. 7, Table 2). The effect of affine transformations, including rotation (±30°) and scaling (0.85~1.15×), is also analyzed. Rotation causes feature distortion because standard convolution operations lack rotation invariance, whereas the proposed multi-scale mechanism effectively compensates for translation errors by expanding the receptive field (Table 3). 
Generalization experiments further confirm that embedding this mechanism into existing models, including CCNet, CO3Net, and RLANN, significantly improves robustness (Table 6). In terms of efficiency, although the theoretical computational load increases by approximately 150%, the actual GPU inference time increases by only about 20% (6.48 ms) because the multi-scale branches are processed independently and in parallel (Table 7).  Conclusions  A Multi-scale ROI Feature Fusion Mechanism is presented to reduce the sensitivity of palmprint recognition systems to localization errors. By employing a lightweight Weight Prediction Network to adaptively fuse features extracted from different ROI scales, the proposed ROI3Net effectively combines fine-grained texture details with global semantic information. Experimental results confirm that this approach significantly improves robustness to translation errors by recovering truncated texture information, whereas the efficient design of the Weight Prediction Network limits computational overhead. The proposed mechanism also exhibits strong generalization ability when integrated into different backbone networks. This study provides a practical and resilient solution for palmprint recognition in unconstrained environments. Future work will explore non-linear fusion strategies, such as graph neural networks, to further exploit cross-scale feature interactions.
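The weighted summation behind this fusion mechanism reduces to a convex combination of per-scale feature vectors. The softmax weighting and the toy two-dimensional vectors below are illustrative stand-ins for the Weight Prediction Network's learned outputs, not the paper's actual Eq. 1 and Eq. 2:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features_per_scale, logits):
    """Convex combination of feature vectors from the 1.00x/1.25x/1.50x ROI scales."""
    weights = softmax(logits)
    dim = len(features_per_scale[0])
    return [sum(w * f[i] for w, f in zip(weights, features_per_scale)) for i in range(dim)]

# Equal logits -> equal weights (1/3 each) across the three scales.
fused = fuse([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
```

In the trained model the logits are predicted per input image, so a scale whose ROI is less corrupted by localization error can receive a larger weight.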
Joint Mask and Multi-Frequency Dual Attention GAN Network for CT-to-DWI Image Synthesis in Acute Ischemic Stroke
ZHANG Zehua, ZHAO Ning, WANG Shuai, WANG Xuan, ZHENG Qiang
Available online  , doi: 10.11999/JEIT250643
Abstract:
  Objective  In the clinical management of Acute Ischemic Stroke (AIS), Computed Tomography (CT) and Diffusion-Weighted Imaging (DWI) serve complementary roles at different stages. CT is widely applied for initial evaluation due to its rapid acquisition and accessibility, but it has limited sensitivity in detecting early ischemic changes, which can result in diagnostic uncertainty. In contrast, DWI demonstrates high sensitivity to early ischemic lesions, enabling visualization of diffusion-restricted regions soon after symptom onset. However, DWI acquisition requires a longer time, is susceptible to motion artifacts, and depends on scanner availability and patient cooperation, thereby reducing its clinical accessibility. The limited availability of multimodal imaging data remains a major challenge for timely and accurate AIS diagnosis. Therefore, developing a method capable of rapidly and accurately generating DWI images from CT scans has important clinical significance for improving diagnostic precision and guiding treatment planning. Existing medical image translation approaches primarily rely on statistical image features and overlook anatomical structures, which leads to blurred lesion regions and reduced structural fidelity.  Methods  This study proposes a Joint Mask and Multi-Frequency Dual Attention Generative Adversarial Network (JMMDA-GAN) for CT-to-DWI image synthesis to assist in the diagnosis and treatment of ischemic stroke. The approach incorporates anatomical priors from brain masks and adaptive multi-frequency feature fusion to improve image translation accuracy. JMMDA-GAN comprises three principal modules: a mask-guided feature fusion module, a multi-frequency attention encoder, and an adaptive fusion weighting module. The mask-guided feature fusion module integrates CT images with anatomical masks through convolution, embedding spatial priors to enhance feature representation and texture detail within brain regions and ischemic lesions. 
The multi-frequency attention encoder applies Discrete Wavelet Transform (DWT) to decompose images into low-frequency global components and high-frequency edge components. A dual-path attention mechanism facilitates cross-scale feature fusion, reducing high-frequency information loss and improving structural detail reconstruction. The adaptive fusion weighting module combines convolutional neural networks and attention mechanisms to dynamically learn the relative importance of input features. By assigning adaptive weights to multi-scale features, the module selectively enhances informative regions and suppresses redundant or noisy information. This process enables effective integration of low- and high-frequency features, thereby improving both global contextual consistency and local structural precision.  Results and Discussions  Extensive experiments were performed on two independent clinical datasets collected from different hospitals to assess the effectiveness of the proposed method. JMMDA-GAN achieved Mean Squared Error (MSE) values of 0.0097 and 0.0059 on Clinical Dataset 1 and Clinical Dataset 2, respectively, outperforming state-of-the-art models and reducing MSE by 35.8% and 35.2% compared with ARGAN. The proposed network reached Peak Signal-to-Noise Ratio (PSNR) values of 26.75 and 28.12, showing improvements of 30.7% and 7.9% over the best existing methods. For Structural Similarity Index (SSIM), JMMDA-GAN achieved 0.753 and 0.844, indicating superior structural preservation and perceptual quality. Visual analysis further demonstrates that JMMDA-GAN restores lesion morphology and fine texture features with higher fidelity, producing sharper lesion boundaries and improved structural consistency compared with other methods. Cross-center generalization and multi-center mixed experiments confirm that the model maintains stable performance across institutions, highlighting its robustness and adaptability in clinical settings. 
Parameter sensitivity analysis shows that the combination of Haar wavelet and four attention heads achieves an optimal balance between global structural retention and local detail reconstruction. Moreover, superpixel-based gray-level correlation experiments demonstrate that JMMDA-GAN surpasses existing models in both local consistency and global image quality, confirming its capacity to generate realistic and diagnostically reliable DWI images from CT inputs.  Conclusions  This study proposes a novel JMMDA-GAN designed to enhance lesion and texture detail generation by incorporating anatomical structural information. The method achieves this through three principal modules. (1) The mask-guided feature fusion module effectively integrates anatomical structure information, with particular optimization of the lesion region. The mask-guided network focuses on critical lesion features, ensuring accurate restoration of lesion morphology and boundaries. By combining mask and image data, the method preserves the overall anatomical structure while enhancing lesion areas, preventing boundary blurring and texture loss commonly observed in traditional approaches, thereby improving diagnostic reliability. (2) The multi-frequency feature fusion module jointly optimizes low- and high-frequency features to enhance image detail. This integration preserves global structural integrity while refining local features, producing visually realistic and high-fidelity images. (3) The adaptive fusion weighting module dynamically adjusts the learning strategy for frequency-domain features according to image content, enabling the network to manage texture variations and complex anatomical structures effectively, thereby improving overall image quality. Through the coordinated function of these modules, the proposed method enhances image realism and diagnostic precision. 
Experimental results demonstrate that JMMDA-GAN outperforms existing advanced models across multiple clinical datasets, highlighting its potential to support clinicians in the diagnosis and management of AIS.
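The low/high-frequency split performed by the multi-frequency attention encoder can be illustrated with a one-level 1-D Haar DWT; for 2-D images the same averaging and differencing is applied along rows and then columns. This is a textbook sketch, not the network's actual implementation:

```python
def haar_dwt_1d(signal):
    """One-level orthonormal Haar DWT of an even-length sequence.

    Returns (low, high): pairwise averages carry the low-frequency global
    component, pairwise differences carry the high-frequency edge component.
    """
    s = 2 ** 0.5
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high

low, high = haar_dwt_1d([4.0, 4.0, 2.0, 0.0])  # a flat pair, then an edge
```

Flat regions produce zero high-frequency coefficients, which is why the high-frequency path concentrates on lesion boundaries and texture edges.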
Modeling, Detection, and Defense Theories and Methods for Cyber-Physical Fusion Attacks in Smart Grid
WANG Wenting, TIAN Boyan, WU Fazong, HE Yunpeng, WANG Xin, YANG Ming, FENG Dongqin
Available online  , doi: 10.11999/JEIT250659
Abstract:
  Significance   Smart Grid (SG), the core of modern power systems, enables efficient energy management and dynamic regulation through cyber–physical integration. However, its high interconnectivity makes it a prime target for cyberattacks, including False Data Injection Attacks (FDIAs) and Denial-of-Service (DoS) attacks. These threats jeopardize the stability of power grids and may trigger severe consequences such as large-scale blackouts. Therefore, advancing research on the modeling, detection, and defense of cyber–physical attacks is essential to ensure the safe and reliable operation of SGs.  Progress   Significant progress has been achieved in cyber–physical security research for SGs. In attack modeling, discrete linear time-invariant system models effectively capture diverse attack patterns. Detection technologies are advancing rapidly, with physical-based methods (e.g., physical watermarking and moving target defense) complementing intelligent algorithms (e.g., deep learning and reinforcement learning). Defense systems are also being strengthened: lightweight encryption and blockchain technologies are applied to prevention, security-optimized Phasor Measurement Unit (PMU) deployment enhances equipment protection, and response mechanisms are being continuously refined.  Conclusions  Current research still requires improvement in attack modeling accuracy and real-time detection algorithms. Future work should focus on developing collaborative protection mechanisms between the cyber and physical layers, designing solutions that balance security with cost-effectiveness, and validating defense effectiveness through high-fidelity simulation platforms. This study establishes a systematic theoretical framework and technical roadmap for SG security, providing essential insights for safeguarding critical infrastructure.  
Prospects   Future research should advance in several directions: (1) deepening synergistic defense mechanisms between the information and physical layers; (2) prioritizing the development of cost-effective security solutions; (3) constructing high-fidelity information–physical simulation platforms to support research; and (4) exploring the application of emerging technologies such as digital twins and interpretable Artificial Intelligence (AI).
Tri-Frequency Wearable Antenna Loaded with Artificial Magnetic Conductors
JIN Bin, ZHANG Jialin, DU Chengzhu, CHU Jun
Available online  , doi: 10.11999/JEIT251050
Abstract:
A tri-band wearable antenna based on an Artificial Magnetic Conductor (AMC) is designed for on-body wireless applications. The design objective is to achieve multi-band operation with enhanced radiation characteristics and reduced electromagnetic exposure under wearable conditions. The antenna adopts a tri-frequency monopole with a trident structure, while the AMC unit employs a three-layer square-ring configuration. Both the antenna and the AMC are fabricated on a semiflexible Rogers 4003 substrate. A 4 × 5 AMC array is positioned on the back of the antenna, forming an integrated structure that improves radiation directionality and suppresses backward radiation. The integrated antenna exhibits measured operating bandwidths of 2.38~2.52 GHz, 3.30~3.86 GHz, and 5.54~7.86 GHz. These frequency ranges cover the ISM band (2.400~2.4835 GHz), the 5G n78 band (3.30~3.80 GHz), and the 5G/WiFi 5.8 GHz band (5.725~5.875 GHz). The measured gains at 2.4 GHz, 3.5 GHz, and 5.8 GHz show improvements of 5.3 dB, 4.6 dB, and 2.2 dB, respectively, compared with the unloaded antenna. The front-to-back ratio improves by 19.8 dB, 16.7 dB, and 12.4 dB relative to the antenna without the AMC. The AMC reflector effectively reduces the Specific Absorption Rate (SAR), with the maximum value maintained below 0.025 W/kg/g, which is lower than the limits specified by the U.S. Federal Communications Commission and the European Telecommunications Standards Institute. Antenna performance is further evaluated when attached to the human chest, back, and thigh, and the measured results indicate stable operation, supporting safe and flexible wearable applications.
An EEG Emotion Recognition Model Integrating Memory and Self-attention Mechanisms
LIU Shanrui, BI Yingzhou, HUO Leigang, GAN Qiujing, ZHOU Shuheng
Available online  , doi: 10.11999/JEIT250737
Abstract:
  Objective  ElectroEncephaloGraphy (EEG) is a noninvasive technique for recording neural signals and provides rich emotional and cognitive information for brain science research and affective computing. Although Transformer-based models demonstrate strong global modeling capability in EEG emotion recognition, their multi-head self-attention mechanisms do not reflect the characteristics of brain-generated signals that exhibit a forgetting effect. In human cognition, emotional or cognitive states from distant time points gradually decay, whereas existing Transformer-based approaches emphasize temporal relevance only and neglect this forgetting behavior. This limitation reduces recognition performance. Therefore, a model is designed to account for both temporal relevance and the intrinsic forgetting effect of brain activity.  Methods  A novel EEG emotion recognition model, termed Memory Self-Attention (MSA), is proposed by embedding a memory-based forgetting mechanism into the standard self-attention framework. The MSA mechanism integrates global semantic modeling with a biologically inspired memory decay component. For each attention head, a memory forgetting score is learned through two independent linear decay curves to represent natural attenuation over time. These scores are combined with conventional attention weights so that temporal relationships are adjusted by distance-aware forgetting behavior. This design improves performance with a negligible increase in model parameters and computational cost. An Aggregated Convolutional Neural Network (ACNN) is first applied to extract spatiotemporal features across EEG channels. The MSA module then captures global dependencies and memory-aware interactions. The refined representations are finally passed to a classification head to generate predictions.  Results and Discussions  The proposed model is evaluated on several benchmark EEG emotion recognition datasets. 
On the DEAP binary classification task, classification accuracies of 98.87% for valence and 98.30% for arousal are achieved. On the SEED three-class task, an accuracy of 97.64% is obtained, and on the SEED-IV four-class task, the accuracy reaches 95.90%. These results (Figs. 3–5, Tables 3–5) exceed those of most mainstream methods, indicating the effectiveness and robustness of the proposed approach across different datasets and emotion classification settings.  Conclusions  An effective and biologically informed method for EEG-based emotion recognition is presented by incorporating a memory forgetting mechanism into a Transformer architecture. The proposed MSA model captures both temporal correlations and forgetting characteristics of brain signals, providing a lightweight and accurate solution for multi-class emotion recognition. Experimental results confirm its strong performance and generalizability.
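The idea of modulating attention scores with a distance-dependent forgetting curve can be sketched for a single query position. The linear decay form and its slope below are illustrative stand-ins for the two learned decay curves described in the paper:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def memory_attention_row(scores, decay=0.2):
    """Attention weights for the latest query, attenuated by temporal distance.

    scores[j] is the raw attention score to time step j; older steps are
    multiplied by a smaller forgetting factor before the softmax.
    """
    n = len(scores)
    forget = [max(0.0, 1.0 - decay * (n - 1 - j)) for j in range(n)]
    return softmax([s * f for s, f in zip(scores, forget)])

# With equal raw scores, more recent steps receive strictly larger weights.
w = memory_attention_row([1.0, 1.0, 1.0, 1.0])
```

Plain softmax attention would assign equal weights here; the forgetting factor is what biases the model toward recent emotional states while still attending globally.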
High-Efficiency Side-Channel Analysis: From Collaborative Denoising to Adaptive B-Spline Dimension Reduction
LUO Yuling, XU Haiyang, OUYANG Xue, FU Qiang, QIN Sheng, LIU Junxiu
Available online  , doi: 10.11999/JEIT251047
Abstract:
  Objective  The performance of side-channel attacks is often constrained by the low signal-to-noise ratio of raw power traces, the masking of local leakage by redundant high-dimensional data, and the reliance on empirically chosen preprocessing parameters. Existing studies typically optimize individual stages, such as denoising or dimensionality reduction, in isolation, lack a unified framework, and fail to balance signal-to-noise ratio enhancement with the preservation of local leakage features. A unified analysis framework is therefore proposed to integrate denoising, adaptive parameter selection, and dimensionality reduction while preserving local leakage characteristics. Through coordinated optimization of these components, both the efficiency and robustness of side-channel attacks are improved.  Methods  Based on the similarity of power traces corresponding to identical plaintexts and the local approximation properties of B-splines, a side-channel analysis method combining collaborative denoising and Adaptive B-Spline Dimension Reduction (ABDR) is presented. First, a Collaborative Denoising Framework (CDF) is constructed, in which high-quality traces are selected using a plaintext-mean template, and targeted denoising is performed via singular value decomposition guided by a singular-value template. Second, a Neighbourhood Asymmetry Clustering (NAC) method is applied to adaptively determine key thresholds within the CDF. Finally, an ABDR algorithm is proposed, which allocates knots non-uniformly according to the variance distribution of power traces, thereby enabling efficient data compression while preserving critical local leakage features.  Results and Discussions  Experiments conducted on two datasets based on 8-bit AVR (OSR2560) and 32-bit ARM Cortex-M4 (OSR407) architectures demonstrate that the CDF significantly enhances the signal-to-noise ratio, with improvements of 60% on OSR2560 (Fig. 2) and 150% on OSR407 (Fig. 4). 
The number of power traces required for successful key recovery is reduced from 3 000/2 400 to 1 200/1 500 for the two datasets, respectively (Figs. 3 and 5). Through adaptive threshold selection in the CDF, NAC achieves faster and more stable guessing-entropy convergence than fixed-threshold and K-means-based strategies, which enhances overall robustness (Fig. 6). The ABDR algorithm places knots densely in high-variance leakage regions and sparsely in low-variance regions. While maintaining a high attack success rate, it reduces the data dimensionality from 5 000 and 5 500 to 1 000 and 500, respectively, corresponding to a compression rate of approximately 80%. At the optimal dimensionality (Fig. 7), the correlation coefficients of the correct key reach 0.186 0 on OSR2560 and 0.360 5 on OSR407, both exceeding those obtained using other dimensionality reduction methods. These results indicate superior local information retention and attack efficiency (Tables 3 and 4).  Conclusions  The results confirm that the proposed CDF substantially improves the signal-to-noise ratio of power traces, while NAC enables adaptive parameter selection and enhances robustness. Through accurate local modeling, ABDR effectively alleviates the trade-off between high-dimensional data reduction and the preservation of critical leakage information. Comprehensive experimental validation shows that the integrated framework addresses key challenges in side-channel analysis, including low signal-to-noise ratio, redundancy-induced information masking, and dependence on empirical parameters, and provides a practical and scalable solution for real-world attack scenarios.
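The variance-driven knot allocation at the heart of ABDR can be sketched in a few lines; this is a hypothetical re-implementation of the idea only (the function name, the inverse-CDF placement rule, and all parameters are assumptions, not the authors' code):

```python
import numpy as np

def allocate_knots(traces, n_knots):
    """Place B-spline knots non-uniformly: dense where the across-trace
    variance (a proxy for leakage) is high, sparse elsewhere.
    Illustrative sketch of the variance-driven allocation idea."""
    var = traces.var(axis=0)                     # per-sample variance
    cdf = np.cumsum(var) / var.sum()             # cumulative variance profile
    # invert the variance CDF at equally spaced quantiles -> knot positions
    quantiles = np.linspace(0.0, 1.0, n_knots)
    knots = np.searchsorted(cdf, quantiles, side="left")
    return np.clip(knots, 0, traces.shape[1] - 1)

rng = np.random.default_rng(0)
traces = rng.normal(size=(200, 1000))
traces[:, 400:450] += rng.normal(scale=5.0, size=(200, 50))  # leaky window
knots = allocate_knots(traces, 20)
```

On the synthetic traces above, the knots cluster inside the high-variance window, mirroring the dense-where-leaky placement the abstract describes.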
Research on Proximal Policy Optimization for Autonomous Long-Distance Rapid Rendezvous of Spacecraft
LIN Zheng, HU Haiying, DI Peng, ZHU Yongsheng, ZHOU Meijiang
Available online  , doi: 10.11999/JEIT250844
Abstract:
  Objective   With increasing demands from deep-space exploration, on-orbit servicing, and space debris removal missions, autonomous long-distance rapid rendezvous capabilities are required for future space operations. Traditional trajectory planning approaches based on analytical methods or heuristic optimization show limitations when complex dynamics, strong disturbances, and uncertainties are present, which makes it difficult to balance efficiency and robustness. Deep Reinforcement Learning (DRL) combines the approximation capability of deep neural networks with reinforcement learning-based decision-making, which supports adaptive learning and real-time decisions in high-dimensional continuous state and action spaces. In particular, Proximal Policy Optimization (PPO) is a representative policy gradient method because of its training stability, sample efficiency, and ease of implementation. Integration of DRL with PPO for spacecraft long-distance rapid rendezvous is therefore expected to overcome the limits of conventional methods and provide an intelligent, efficient, and robust solution for autonomous guidance in complex orbital environments.   Methods   A spacecraft orbital dynamics model is established by incorporating J2 perturbation, together with uncertainties arising from position and velocity measurement errors and actuator deviations during on-orbit operations. The long-distance rapid rendezvous problem is formulated as a Markov Decision Process, in which the state space includes position, velocity, and relative distance, and the action space is defined by impulse duration and direction. Fuel consumption and terminal position and velocity constraints are integrated into the model. On this basis, a DRL framework based on PPO is constructed. The policy network outputs maneuver command distributions, whereas the value network estimates state values to improve training stability. 
To address convergence difficulties caused by sparse rewards, an enhanced dense reward function is designed by combining a position potential function with a velocity guidance function. This design guides the agent toward the target while enabling gradual deceleration and improved fuel efficiency. The optimal maneuver strategy is obtained through simulation-based training, and robustness is evaluated under different uncertainty conditions.   Results and Discussions   Based on the proposed DRL framework, comprehensive simulations are conducted to assess effectiveness and robustness. In Case 1, three reward structures are examined: sparse reward, traditional dense reward, and an improved dense reward that integrates a relative position potential function with a velocity guidance term. The results show that reward design strongly affects convergence behavior and policy stability. Under sparse rewards, insufficient process feedback limits exploration of feasible actions. Traditional dense rewards provide continuous feedback and enable gradual convergence, but terminal velocity deviations are not fully corrected at later stages, which leads to suboptimal convergence and incomplete satisfaction of terminal constraints. In contrast, the improved dense reward guides the agent toward favorable behaviors from early training stages while penalizing undesirable actions at each step, which accelerates convergence and improves robustness. The velocity guidance term allows anticipatory adjustments during mid-to-late approach phases rather than delaying corrections to the terminal stage, resulting in improved fuel efficiency. Simulation results show that the maneuvering spacecraft performs 10 impulsive maneuvers, achieving a terminal relative distance of 21.326 km, a relative velocity of 0.005 0 km/s, and a total fuel consumption of 111.2123 kg. To evaluate robustness under realistic uncertainties, 1,000 Monte Carlo simulations are performed.
As summarized in Table 6, the mission success rate reaches 63.40%, and fuel consumption in all trials remains within acceptable bounds. In Case 2, PPO performance is compared with that of Deep Deterministic Policy Gradient (DDPG) for a multi-impulse fast-approach rendezvous mission. PPO results show five impulsive maneuvers, a terminal separation of 2.281 8 km, a relative velocity of 0.003 8 km/s, and a total fuel consumption of 4.148 6 kg. DDPG results show a fuel consumption of 4.322 5 kg, a final separation of 4.273 1 km, and a relative velocity of 0.002 0 km/s. Both methods satisfy mission requirements with comparable fuel use. However, DDPG requires a training time of 9 h 23 min, whereas PPO converges within 6 h 4 min, indicating lower computational cost. Overall, the improved PPO framework provides better learning efficiency, policy stability, and robustness.  Conclusions   The problem of autonomous long-distance rapid rendezvous under J2 perturbation and uncertainties is investigated, and a PPO-based trajectory optimization method is proposed. The results demonstrate that feasible maneuver trajectories satisfying terminal constraints can be generated under limited fuel and transfer time, with improved convergence speed, fuel efficiency, and robustness. The main contributions include: (1) development of an orbital dynamics framework that incorporates J2 perturbation and uncertainty modeling, with formulation of the rendezvous problem as a Markov Decision Process; (2) design of an enhanced dense reward function that combines position potential and velocity guidance, which improves training stability and convergence efficiency; and (3) simulation-based validation of PPO robustness in complex orbital environments. Future work will address sensor noise, environmental disturbances, and multi-spacecraft cooperative rendezvous in more complex mission scenarios to further improve practical applicability and generalization.
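The enhanced dense reward described in the Methods (a position potential combined with a velocity guidance term) can be illustrated with a minimal sketch; the functional forms and weights here are assumptions for illustration, not the paper's reward:

```python
import numpy as np

def dense_reward(rel_pos, rel_vel, w_pos=1e-3, w_vel=0.1):
    """Illustrative dense reward: a position potential that rewards closing
    the range, plus a velocity-guidance term that rewards velocity pointed
    at the target. Weights and forms are hypothetical, not from the paper."""
    dist = np.linalg.norm(rel_pos)
    potential = -w_pos * dist                      # closer -> larger reward
    if dist > 0 and np.linalg.norm(rel_vel) > 0:
        # cosine between velocity and the line of sight to the target
        align = np.dot(-rel_pos, rel_vel) / (dist * np.linalg.norm(rel_vel))
    else:
        align = 0.0
    guidance = w_vel * align                       # in [-w_vel, w_vel]
    return potential + guidance

# moving toward the target scores higher than moving away at equal range
toward = dense_reward(np.array([100.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0]))
away = dense_reward(np.array([100.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
```

Because the guidance term rewards anticipatory alignment at every step, it provides exactly the kind of continuous process feedback that the abstract credits for faster convergence than sparse rewards.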
Auxiliary Screening for Hypertrophic Cardiomyopathy With Heart Failure with Preserved Ejection Fraction Utilizing Smartphone-Acquired Heart Sound Analysis
DONG Xianpeng, MENG Xiangbin, ZHANG Kuo, FANG Guanchen, GAI Weihao, WANG Wenyao, WANG Jingjia, GAO Jun, PAN Junjun, TANG Zhenchao, SONG Zhen
Available online  , doi: 10.11999/JEIT250830
Abstract:
  Objective  Heart Failure with preserved Ejection Fraction (HFpEF) is highly prevalent among patients with Hypertrophic CardioMyopathy (HCM), and early identification is critical for improving disease management. However, early screening for HFpEF remains challenging because symptoms are non-specific, diagnostic procedures are complex, and follow-up costs are high. Smartphones, owing to their wide accessibility, low cost, and portability, provide a feasible means to support heart sound-based screening. In this study, smartphone-acquired heart sounds from patients with HCM are used to develop and train an ensemble learning classification model for early detection and dynamic self-monitoring of HFpEF in the HCM population.  Methods  The proposed HFpEF screening framework consists of three components: preprocessing, feature extraction, and model training and fusion based on ensemble learning (Fig. 1). During preprocessing, smartphone-acquired heart sounds are subjected to bandpass filtering and wavelet denoising to improve signal quality, followed by segmentation into individual cardiac cycles. For feature extraction, Mel-Frequency Cepstral Coefficients (MFCCs) and Short-Time Fourier Transform (STFT) time-frequency spectra are calculated (Fig. 3). For classification, a stacking ensemble strategy is applied. Base learners, including a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN), are trained, and their predicted probabilities are combined to construct a new feature space. A Logistic Regression (LR) meta-learner is then trained on this feature space to identify HFpEF in patients with HCM.  Results and Discussions  The classification performance of the three models is evaluated using the same patient-level independent test set. The SVM base learner achieves an Area Under the Curve (AUC) of 0.800, with an accuracy of 0.766, sensitivity of 0.659, and specificity of 0.865 (Table 5). 
The CNN base learner attains an AUC of 0.850, with an accuracy of 0.789, sensitivity of 0.622, and specificity of 0.944 (Table 5). By comparison, the ensemble-based LR classifier demonstrates superior performance, reaching an AUC of 0.900, with an accuracy of 0.813, sensitivity of 0.768, and specificity of 0.854 (Table 5). Relative to the base learners, the ensemble model exhibits a significant overall performance improvement after probability-based feature fusion (Fig. 5). Compared with existing clinical HFpEF risk scores, the proposed method shows higher predictive performance and stronger dynamic monitoring capability, supporting its suitability for risk stratification and follow-up warning in home settings. Compared with professional heart sound acquisition devices, the smartphone-acquired approach provides greater accessibility and cost efficiency, supporting its application in auxiliary HFpEF screening for high-risk HCM populations.  Conclusions  The challenges of clinical HFpEF screening in patients with HCM are addressed by proposing a smartphone-acquired heart sound analysis approach combined with an ensemble learning prediction model, resulting in an accessible and easily implemented auxiliary screening pipeline. The effectiveness of smartphone-based heart sound analysis for initial HFpEF screening in patients with HCM is validated, demonstrating its feasibility as an economical auxiliary tool for early HFpEF detection. This approach provides a non-invasive, convenient, and efficient screening strategy for patients with HCM complicated by HFpEF.
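The stacking step, in which base-learner probabilities form a new feature space for a logistic-regression meta-learner, can be sketched as follows; the toy probability streams and the hand-rolled training loop are illustrative assumptions (the paper's base learners are a trained SVM and CNN):

```python
import numpy as np

def fit_meta_lr(p_svm, p_cnn, y, lr=0.5, epochs=500):
    """Minimal stacking meta-learner: logistic regression trained on the
    base learners' predicted probabilities. Sketch of the ensemble idea."""
    X = np.column_stack([p_svm, p_cnn, np.ones_like(p_svm)])  # + bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid
        w -= lr * X.T @ (p - y) / len(y)          # gradient step
    return w

def predict_meta(w, p_svm, p_cnn):
    X = np.column_stack([p_svm, p_cnn, np.ones_like(p_svm)])
    return (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)

# toy data: two noisy base-learner probability streams for binary labels
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 400)
p_svm = np.clip(y + rng.normal(0, 0.35, 400), 0, 1)
p_cnn = np.clip(y + rng.normal(0, 0.30, 400), 0, 1)
w = fit_meta_lr(p_svm, p_cnn, y)
acc = (predict_meta(w, p_svm, p_cnn) == y).mean()
```

The meta-learner weighs the two probability streams against each other, which is how probability-based feature fusion can outperform either base learner alone.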
A Review of Research on Voiceprint Fault Diagnosis of Transformers
GONG Wenjie, LIN Guosong, WEI Xiaoguang
Available online  , doi: 10.11999/JEIT251076
Abstract:
  Significance   Voiceprint fault diagnosis of transformers has become an active research area for ensuring the safe and reliable operation of power systems. Traditional monitoring methods, such as dissolved gas analysis, infrared temperature measurement, and online partial discharge monitoring, exhibit limited real-time capability and rely heavily on expert experience. These limitations hinder effective detection of early-stage faults. Voiceprint fault diagnosis captures operational voiceprint signals from transformers and enables non-contact monitoring for early anomaly warning. This approach offers advantages in real-time performance, sensitivity, and fault coverage. This review systematically traces the technological evolution from traditional signal analysis to deep learning and compares the advantages, limitations, and application scenarios of different models across multiple dimensions. Key challenges are identified, including limited robustness to noise and imbalanced datasets. Potential research directions are proposed, including integration of physical mechanisms with data-driven methods and improvement of diagnostic transparency and interpretability. These analyses provide theoretical support and practical guidance for promoting the transition of voiceprint fault diagnosis from laboratory research to engineering applications.  Progress   Research on voiceprint fault diagnosis of transformers has progressed from traditional signal analysis to an intelligent recognition paradigm based on deep learning, reflecting a clear technological evolution. A bibliometric analysis of 188 papers from the CNKI and Web of Science databases shows that annual publications remained at 1–10 papers between 1997 and 2020, corresponding to an exploratory stage. Studies during this period focused mainly on fundamental voiceprint signal processing methods, including acoustic wave detection, wavelet transform, and Empirical Mode Decomposition (EMD). 
After 2020, Variational Mode Decomposition (VMD), Mel spectrum, and Mel Frequency Cepstral Coefficient (MFCC) were gradually applied to voiceprint feature extraction. Since 2021, publication output has increased rapidly and reached a historical peak in 2023. This growth was driven by advances in image and speech processing technologies. Early studies emphasized time-domain and frequency-domain analysis of voiceprint signals. Recent research increasingly converts voiceprint signals into two-dimensional time–frequency spectrogram representations. Model architectures have evolved from single-channel feature inputs with single-model outputs to complex frameworks with multi-channel feature extraction and multi-model fusion. Classical machine learning models, including Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Random Forest (RF), and Back Propagation Neural Network (BPNN), form the foundation of voiceprint fault diagnosis but are limited in handling high-dimensional features. Deep learning models, such as Convolutional Neural Network (CNN), Residual Neural Network (ResNet), Recurrent Neural Network (RNN), and Transformer, demonstrate advantages in automatic feature extraction and complex pattern recognition, although they require substantial computational resources.  Conclusions  This review summarizes the technological development of transformer voiceprint fault diagnosis from machine learning to deep learning. Although deep learning methods achieve high recognition accuracy for complex voiceprint signals, five major challenges remain. These challenges include limited robustness to noise in non-stationary environments, severe data imbalance caused by scarce fault samples, the black-box nature of deep learning models, fragmented evaluation systems resulting from inconsistent data acquisition standards, and insufficient cross-modal fusion of multi-source data.
Sensitivity to environmental noise limits diagnostic performance under varying operating conditions. Data imbalance reduces recognition accuracy for rare fault types. Limited interpretability restricts fault mechanism analysis and diagnostic credibility. Inconsistent sensor placement and sampling parameters lead to poor comparability across datasets. Single-modal voiceprint analysis restricts effective utilization of complementary information from other data sources. Addressing these challenges is essential for advancing voiceprint fault diagnosis from laboratory validation to field deployment.  Prospects   Future research should focus on five directions. First, noise-robust voiceprint feature extraction methods based on physical mechanisms should be developed to address non-stationary interference in complex operating environments. Second, the lack of real-world fault data should be alleviated by constructing electromagnetic field–structural mechanics–acoustic coupling models of transformers to generate high-fidelity voiceprint fault samples, while unsupervised clustering methods should be applied to improve annotation efficiency and quality. Third, explainable deep learning architectures for voiceprint fault diagnosis that incorporate physical mechanisms should be designed. Attention mechanisms combined with SHapley Additive exPlanations, Grad-CAM, and physical equations can support process-level and post hoc interpretation of diagnostic results. Fourth, industry-wide collaboration is required to establish standardized voiceprint data acquisition protocols, benchmark datasets, and unified evaluation systems. Fifth, cross-modal fusion models based on multi-channel and multi-feature analysis should be developed to enable integrated transformer fault diagnosis through comprehensive utilization of multi-source information.
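As a concrete reference for the Mel-spectrum features that many of the surveyed methods rely on, the standard triangular Mel filterbank construction is sketched below; the sampling rate and filter count are illustrative and not tied to any particular surveyed study:

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular Mel filterbank of the kind used to turn a transformer's
    acoustic spectrum into Mel-spectrum / MFCC features. Standard textbook
    construction; parameters are illustrative."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # filter centers equally spaced on the Mel scale, mapped to FFT bins
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                     # rising edge
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                     # falling edge
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

fb = mel_filterbank()
```

Applying such a filterbank to a power spectrogram (then taking logs and a DCT) yields the MFCC features mentioned above; the perceptually motivated Mel spacing concentrates resolution at low frequencies, where transformer hum and its harmonics lie.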
Multimodal Pedestrian Trajectory Prediction with Multi-Scale Spatio-Temporal Group Modeling and Diffusion
KONG Xiangyan, GAO YuLong, WANG Gang
Available online  , doi: 10.11999/JEIT250900
Abstract:
  Objective  With the rapid advancement of autonomous driving and social robotics, accurate pedestrian trajectory prediction has become pivotal for ensuring system safety and enhancing interaction efficiency. Existing group-based modeling approaches predominantly focus on local spatial interaction, often overlooking latent grouping characteristics across the temporal dimension. To address these challenges, this research proposes a multi-scale spatiotemporal feature construction method that achieves the decoupling of trajectory shape from absolute spatiotemporal coordinates, enabling the model to accurately capture the latent group associations over different time intervals. Simultaneously, a spatiotemporal interaction three-element encoding mechanism is introduced to deeply extract the dynamic relationships between individuals and groups. By integrating the reverse process length mechanism of diffusion models, the proposed approach incrementally mitigates prediction uncertainty. This research not only offers an intelligent solution for multi-modal trajectory prediction in complex, crowded environments but also provides robust theoretical support for improving the accuracy and robustness of long-range trajectory forecasting.  Methods  The proposed algorithm performs deep modeling of pedestrian trajectories through multi-scale spatiotemporal group modeling. The system is designed across three key dimensions: group construction, interaction modeling, and trajectory generation. First, to address the limitations of traditional methods that focus on local spatiotemporal relationships while overlooking cross-dimensional latent characteristics, a multi-scale trajectory grouping model is designed. Its core innovation lies in extracting trajectory offsets to represent trajectory shapes, successfully decoupling motion features from absolute positions.
This enables the model to accurately capture latent group associations among agents following similar paths over different periods. Second, an encoding method based on the spatiotemporal interaction three-element representation is proposed. By defining neural interaction strength, interaction categories, and category functions, this method deeply analyzes the complex associations between agents and groups. This not only captures fine-grained individual interactions but also effectively reveals the global dynamic evolution of collective behavior. Finally, a diffusion model is introduced for multimodal prediction. Through the reverse process length mechanism of the diffusion model, the model converges progressively, effectively eliminating uncertainty during the prediction process and transforming a fuzzy prediction space into clear and plausible future trajectories.  Results and Discussions  In this study, the proposed model was evaluated against 11 state-of-the-art baseline algorithms using the NBA dataset (Table 1). Experimental results indicate that this model achieves a significant advantage in minADE20. Notably, it demonstrates a substantial performance leap over GroupNet+CVAE in long-term prediction tasks, with minADE20 and minFDE20 improvements of 0.18 and 0.36, respectively, at the 4-second prediction horizon. Although the model slightly underperforms compared to MID in long-term trends (likely due to the frequent and intense shifts in group dynamics within NBA scenarios), it exhibits exceptional precision in instantaneous prediction. This provides strong empirical evidence for the effectiveness of the multi-scale grouping strategy, based on historical trajectories, in capturing complex dynamic interactions. On the ETH/UCY datasets (Table 2), the MSGD method achieved consistent performance gains across all five sub-scenarios.
Particularly in the pedestrian-dense and interaction-heavy UNIV scene, the proposed method surpassed all baseline models by leveraging the advantages of multi-scale modeling. While MSGD is slightly behind PPT in terms of long-distance endpoint constraints, it maintains a lead in minADE20. Furthermore, it outperforms Trajectron++ in velocity smoothness and directional coherence (std dev: 0.7012) (Table 3). These results suggest that while fitting the geometric shape of trajectories, the method generates naturally smooth paths that align more closely with the physical laws of human motion. Ablation studies systematically verified the independent contributions of the diffusion model, spatiotemporal feature extraction, and multi-scale grouping modules to the overall accuracy (Table 4). Grouping sensitivity analysis on the NBA dataset revealed that a full-court grouping strategy (group size of 11) significantly enhances long-term stability, resulting in a further reduction of minFDE20 by 0.026–0.03 at the 4-second horizon (Table 5). Simultaneously, configurations with group sizes of 5 or 2 validate the significance of team formations and “one-on-one” local offensive/defensive dynamics in trajectory prediction (Table 6). Additionally, sensitivity analysis of diffusion steps and training epochs revealed a “complementary” relationship: moderately increasing the number of steps (e.g., 30–40) refines the denoising process and significantly improves accuracy, whereas excessive iterations may lead to overfitting (Table 7). Finally, qualitative visualization intuitively demonstrates that the multimodal trajectories generated by MSGD have a high degree of overlap with ground-truth data (Fig. 2).  
Conclusions  This study proposes a novel trajectory prediction algorithm that enhances performance primarily in two aspects: (1) It effectively captures pedestrian interactions by extracting spatiotemporal features; (2) It strengthens the modeling of collective behavior by grouping pedestrians across multiple scales. Experimental results demonstrate that the algorithm achieves state-of-the-art (SOTA) performance on both the NBA and ETH/UCY datasets. Furthermore, ablation studies verify the effectiveness of each constituent module. Despite its superior performance and adaptability, the proposed algorithm has two primary limitations: first, the current model does not account for explicit environmental information (such as maps or obstacles); second, the diffusion model involves high computational overhead during inference. Future work will focus on improvements and research in these two directions.
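The shape–position decoupling via trajectory offsets described in the Methods can be illustrated in a few lines; this is a hypothetical sketch of the idea, not the MSGD implementation:

```python
import numpy as np

def shape_features(traj):
    """Represent a trajectory by its per-step offsets, decoupling shape
    from absolute position: two agents on parallel paths get identical
    features. Illustrative sketch of the decoupling idea only."""
    return np.diff(traj, axis=0)      # (T-1, 2) displacement vectors

a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])
b = a + np.array([5.0, -3.0])         # same path, shifted in space
assert np.allclose(shape_features(a), shape_features(b))
```

Because the offset sequence is invariant to translation, agents that move along similar paths in different regions of the scene, or at different times, can still be grouped by shape similarity.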
Hybrid PUF Tag Generation Technology for Battery Anti-counterfeiting
HE Zhangqing, LUO Siyu, ZHANG Junming, ZHANG Yin, WAN Meilin
Available online  , doi: 10.11999/JEIT250967
Abstract:
  Objective  With the global transition towards a low-carbon economy, power batteries have become crucial energy storage carriers. The traceability and security of their entire life cycle are foundational to industrial governance. In 2023, the Global Battery Alliance (GBA) introduced the "Battery Passport" system, requiring each battery to have a unique, tamper-proof, and verifiable digital identity. However, traditional digital tag solutions, such as QR codes and RFID, rely on pre-written static storage, making them vulnerable to physical cloning, data extraction, and environmental degradation. To address these issues, this paper proposes a battery anti-counterfeiting tag generation technology based on a hybrid Physical Unclonable Function (PUF). The technology leverages a triple physical coupling mechanism among the battery, PCB, and IC to generate a unique battery ID, ensuring strong physical binding and anti-counterfeiting capabilities at the system level.  Methods  The proposed battery anti-counterfeiting tag consists of four core modules: an off-chip RC battery fingerprint extraction circuit, an on-chip Arbiter PUF module, an on-chip delay compensation module, and a reliability enhancement module. The off-chip RC circuit utilizes the physical coupling between the battery negative tab and the PCB's copper-clad area to form a capacitor structure, which introduces inherent manufacturing variations as an entropy source. The on-chip Arbiter PUF converts manufacturing deviations into a unique digital signature. To mitigate systemic biases caused by asymmetrical routing and off-circuit delays, a programmable delay compensation module with coarse and fine-tuning units is integrated. A reliability enhancement module is also embedded to automatically filter out unreliable response bits by monitoring delay deviations, thereby improving the reliability of the generated responses without complex error-correcting codes.  
  Results and Discussions  The proposed structure was implemented and tested using a Spartan-6 FPGA, a custom PCB, and 100 Ah blade batteries. Experimental results demonstrate excellent performance: the randomness of the tag reached 48.85%, and the uniqueness averaged 49.15% under normal conditions (Fig. 11). The stability (RA) was as high as 99.98% at room temperature and normal voltage, and remained above 98% even under extreme conditions (100 °C, 1.05 V) (Fig. 12). To evaluate anti-desoldering capability, three physical tampering scenarios were tested: battery replacement, PCB replacement, and IC replacement. The average response change rates were 14.86%, 24.58%, and 41.66%, respectively (Fig. 13), confirming the strong physical binding among the battery, PCB, and chip. These results validate that the proposed triple physical coupling mechanism effectively resists counterfeiting and tampering.  Conclusions  This paper presents a battery anti-counterfeiting tag generation technology based on a triple physical coupling mechanism. By binding the battery tab, PCB, and chip into a unified physical structure and extracting unique fingerprints from manufacturing variations, the proposed method achieves high randomness, uniqueness, and stability. The tag is highly sensitive to physical tampering, providing a reliable foundation for battery authentication throughout its life cycle. Future work will focus on validating the structure with more advanced chip fabrication processes and different PCB manufacturers, as well as further optimizing the design for broader application.
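The Arbiter PUF stage can be illustrated with the textbook additive delay model, in which per-stage delay differences accumulate and an arbiter outputs the sign of the total; this sketch models the abstract idea only, not the paper's circuit or its RC coupling:

```python
import numpy as np

def arbiter_puf_response(challenges, delays):
    """Textbook additive delay model of an Arbiter PUF: each stage
    contributes a delay difference whose sign flips with the remaining
    challenge bits; the arbiter outputs the sign of the accumulated
    difference. Illustrative model, not the authors' circuit."""
    c = 1 - 2 * challenges                           # bits {0,1} -> {+1,-1}
    # parity feature vector: phi_i = product of c_j for j >= i
    phi = np.cumprod(c[:, ::-1], axis=1)[:, ::-1]
    return (phi @ delays > 0).astype(int)

rng = np.random.default_rng(7)
delays = rng.normal(size=64)                          # manufacturing variation
chal = rng.integers(0, 2, size=(1000, 64))
resp = arbiter_puf_response(chal, delays)
# a different "device" (different delays) answers differently on many CRPs
resp2 = arbiter_puf_response(chal, rng.normal(size=64))
uniqueness = (resp != resp2).mean()
```

In this model the roughly 50% inter-device response difference is what the uniqueness metric in the abstract measures; the paper's delay compensation and bit-filtering modules then address the bias and noise that a real circuit adds on top.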
Resilient Average Consensus for Second-Order Multi-Agent Systems: Algorithms and Application
FANG Chongrong, HUAN Yuehui, ZHENG Wenzhe, BAO Xianchen, LI Zheng
Available online  , doi: 10.11999/JEIT251155
Abstract:
  Objective  Multi-agent systems (MASs) are pivotal for collaborative tasks in dynamic environments, with consensus algorithms serving as a cornerstone for applications like formation control. However, MASs are vulnerable to misbehaviors (e.g., malicious attacks or accidental faults), which can disrupt consensus and compromise system performance. While resilient consensus methods exist for first-order systems, they are inadequate for second-order MASs, where agents’ dynamics involve both position and velocity. This work addresses the gap by developing a resilient average consensus framework for second-order MASs that ensures accurate collaboration under misbehaviors. The primary challenges include distributed error detection and compensating two-dimensional state errors (position and velocity) using one-dimensional acceleration inputs.  Methods  The study first derives sufficient conditions for second-order average consensus under misbehaviors, leveraging graph theory and Lyapunov stability analysis. The system is modeled as an undirected graph $ \mathcal{G}=(\mathcal{V},\mathcal{E}) $, where agents follow double-integrator dynamics. Two algorithms are proposed. (1) Finite Input-Errors Detection-Compensation (FIDC): for finite control input errors, Detection Strategies 1 and 2 use two-hop communication information to identify discrepancies in neighbors’ states or control inputs, and Compensation Scheme 1 designs input sequences to satisfy the consensus conditions (Corollary 1). (2) Infinite Attack Detection-Compensation (IADC): for infinite errors in control input, velocity, and position, the detection strategies are extended to identify falsified data; Compensation Schemes 2 and 3 mitigate errors, while an exponentially decaying error bound isolates persistent attackers. The algorithms are distributed and require no global knowledge.  Results and Discussions  Simulations on a 10-agent network validate the algorithms’ efficacy.
Under FIDC, agents achieve exact average consensus despite finite input errors from malicious and faulty agents (Fig. 5). IADC ensures consensus among normal agents after isolating malicious ones exceeding the error bound (Fig. 6). Experimental evaluations on a multi-robot platform demonstrate resilience against real-world faults (e.g., actuator failures) and attacks (e.g., false data injection). In fault scenarios, FIDC reduces formation center deviation from 180 mm to 34 mm (Fig. 8). For attacks, IADC isolates malicious robots, allowing normal agents to converge correctly (Fig. 9). Discussions on relaxing Assumption 1 (non-adjacent misbehaving agents) reveal that Detection Strategy 3 and majority voting can handle certain connected malicious topologies (Fig. 3, Fig. 4), though complex cases require further study.  Conclusions  This work proposes a novel resilient average consensus framework for second-order MASs. Theoretically, sufficient conditions ensure consensus under misbehaviors, while FIDC and IADC algorithms enable distributed detection, compensation, and isolation of errors. Simulations and physical experiments confirm that the methods achieve accurate average consensus against both finite and infinite errors. Future work will explore extensions to directed networks, time-varying topologies, and higher-dimensional systems.
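For reference, the nominal second-order (double-integrator) consensus protocol that the resilient framework builds on can be simulated in a few lines; the gains and topology are illustrative assumptions, and the FIDC/IADC detection and compensation layers are omitted:

```python
import numpy as np

def simulate_consensus(x, v, A, kp=1.0, kv=2.0, dt=0.01, steps=4000):
    """Nominal second-order consensus: each agent's acceleration is a
    weighted position disagreement plus a damping velocity disagreement
    with its neighbors. Misbehavior handling is omitted; gains are
    hypothetical."""
    n = len(x)
    for _ in range(steps):
        u = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if A[i, j]:
                    u[i] += kp * (x[j] - x[i]) + kv * (v[j] - v[i])
        x = x + dt * v       # explicit Euler integration of positions
        v = v + dt * u       # ... and velocities
    return x, v

A = np.ones((4, 4)) - np.eye(4)            # complete graph on 4 agents
x0 = np.array([0.0, 2.0, 5.0, 9.0])
v0 = np.zeros(4)
x, v = simulate_consensus(x0.copy(), v0.copy(), A)
```

With zero initial velocities and symmetric coupling, positions converge to the initial average (4.0 here) and velocities to zero; a single misbehaving input would shift this limit, which is precisely the error the paper's compensation schemes correct.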
Multi-UAV RF Signals CNN|Triplet-DNN Heterogeneous Network Feature Extraction and Type Recognition
ZHAO Shen, LI Guangxuan, ZHOU Xiancheng, HUANG Wendi, YANG Lingling, GAO Liping
Available online  , doi: 10.11999/JEIT250757
Abstract:
  Objective  To address the detection requirements for multiple types of unmanned aerial vehicles (UAVs) operating simultaneously, the pivotal strategy involves extracting model-specific information features from the radio frequency (RF) time-frequency spectrum. Consequently, an innovative CNN|Triplet-DNN heterogeneous network architecture has been developed to optimize feature extraction and classification methodologies. This solution effectively resolves the challenge of identifying individual models within the coexisting signals of multiple UAVs, thereby laying the groundwork for efficient management and control of multiple UAVs in complex operational environments.  Methods  The CNN|Triplet-DNN heterogeneous network architecture adopts a parallel-branch structure that integrates convolutional neural network (CNN) and Triplet Convolutional Neural Network (Triplet-CNN) components. Specifically, Branch 1 employs a lightweight CNN architecture to extract global features from RF time-frequency diagrams while minimizing computational complexity. Branch 2 incorporates an enhanced center loss function to improve the discriminative capability of global features, thereby effectively resolving the ambiguity in feature boundaries of time-frequency diagrams under complex scenarios. Branch 3, built on the Triplet-CNN framework, utilizes Triplet Loss to simultaneously capture both local and global features of RF time-frequency diagrams. The complementary features from each branch are subsequently integrated and processed via a DNN fully connected layer combined with the Softmax activation function, generating probability distributions for drone signal classification. This approach significantly enhances the performance of aircraft type recognition and classification.  
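The Triplet Loss used by Branch 3 can be stated compactly: it pulls embeddings of the same drone type together and pushes different types apart by at least a margin. The margin value and toy embeddings below are illustrative assumptions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: penalize when the anchor-positive distance
    is not smaller than the anchor-negative distance by `margin`.
    Margin is illustrative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])        # anchor embedding
p = np.array([0.1, 0.0])        # same drone type, close
n = np.array([3.0, 0.0])        # different drone type, far
assert triplet_loss(a, p, n) == 0.0   # margin already satisfied
assert triplet_loss(a, n, p) > 0.0    # violated when roles swap
```

Training on such triplets shapes an embedding space in which both local and global time-frequency features of the same model cluster together, complementing the global features from the CNN branch.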
Results and Discussions  RF signals from the open-source DroneRFa dataset were superimposed to simulate multi-drone coexistence signals, while real-world drone signals were collected through controlled flight experiments to construct a comprehensive drone signal database. (1) Based on the single-drone RF time-frequency diagrams from the open-source dataset, ablation experiments (Fig. 7) were conducted on the three-branch structure of the CNN|Triplet-DNN model to demonstrate the soundness and rationale of its design, and each model was trained. (2) The simulated multi-drone coexistence signal dataset was employed for identification tasks to evaluate the recognition performance of each model under multi-drone coexistence scenarios. Experimental results (Fig. 10) demonstrate that the recognition accuracy for four or fewer drone types ranges from 83% to 100%, thereby validating the efficacy of the CNN|Triplet-DNN model. (3) Each model was trained using the flight dataset and then applied to identify actual multi-drone coexistence signals. The CNN|Triplet-DNN model achieved recognition accuracies of 86%, 57%, and 73% for two, three, and four drone types, respectively (Fig. 14). Comparative analysis with the CNN, Triplet-CNN, and Transformer models reveals that the CNN|Triplet-DNN exhibits superior generalizability. Notably, all models experienced performance degradation on real-world data compared with the open-source dataset, primarily because drones dynamically adjust their communication frequency bands, which adversely affects multi-drone recognition performance.  Conclusions  To tackle the challenge of coexistence identification for RF signals emitted by multiple UAVs, a novel heterogeneous network architecture integrating CNN|Triplet-DNN is proposed. This model, leveraging a three-branch structural framework and the backpropagation algorithm, demonstrates superior capability in extracting discriminative features of aircraft models. 
The incorporation of DNN significantly enhances the model's generalization capacity. The efficacy and practical applicability of the proposed approach have been validated through comprehensive experiments utilizing open-source datasets and real-world flight scenarios. Future research directions will focus on dataset expansion, model optimization for dynamic communication frequency band adaptation, and enhancement of recognition performance in complex coexistence environments.
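As a minimal illustration of the Triplet Loss used by Branch 3, the sketch below uses toy two-dimensional embeddings standing in for features extracted from RF time-frequency diagrams (the vectors and margin are hypothetical, not values from the paper): the loss pulls an anchor embedding toward a same-type (positive) sample and pushes it away from a different-type (negative) sample by at least a margin.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-form triplet loss: max(0, d(a,p) - d(a,n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings standing in for features of RF time-frequency diagrams.
anchor   = [0.0, 0.0]
positive = [0.1, 0.0]   # same UAV type: close to the anchor
negative = [3.0, 4.0]   # different UAV type: far from the anchor

loss = triplet_loss(anchor, positive, negative)
# d(a,p) = 0.1 and d(a,n) = 5.0, so the hinge is inactive and the loss is 0.
```

When the negative sample sits inside the margin, the loss becomes positive and gradients push the embeddings apart, which is what sharpens the feature boundaries between UAV types.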
Neighboring Mutual-Coupling Channel Model and Tunable-Impedance Optimization Method for Reconfigurable-Intelligent-Surface Aided Communications
WU Wei, WANG Wennai
Available online, doi: 10.11999/JEIT251109
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RIS) attract increasing attention due to their ability to controllably manipulate electromagnetic wave propagation. A typical RIS consists of a dense array of Reflecting Elements (REs) with inter-element spacing no greater than half a wavelength, under which electromagnetic mutual coupling inevitably occurs between adjacent REs. This effect becomes more pronounced when the element spacing is smaller than half a wavelength and can significantly affect the performance and efficiency of RIS-assisted systems. Accurate modeling of mutual coupling is therefore essential for RIS optimization. However, existing mutual-coupling-aware channel models usually suffer from high computational complexity because of the large dimensionality of the mutual-impedance matrix, which restricts their practical use. To address this limitation, a simplified mutual-coupling-aware channel model based on a sparse neighboring mutual-coupling matrix is proposed, together with an efficient optimization method for configuring RIS tunable impedances.  Methods  First, a simplified mutual-coupling-aware channel model is established through two main steps. (1) A neighboring mutual-coupling matrix is constructed by exploiting the exponential decay of mutual impedance with inter-element distance. (2) A closed-form approximation of the mutual impedance between the transmitter or receiver and the REs is derived under far-field conditions. By taking advantage of the rapid attenuation of mutual impedance as spacing increases, only eight or three mutual-coupling parameters, together with one self-impedance parameter, are retained. These parameters are arranged into a neighboring mutual-coupling matrix using predefined support matrices. 
To further reduce computational burden, the distance term in the mutual-impedance expression is approximated by a central value under far-field assumptions, which allows the original integral formulation to be simplified into a compact analytical expression. Based on the resulting channel model, an efficient optimization method for RIS tunable impedances is developed. Through impedance decomposition, a closed-form expression for the optimal tunable-impedance matrix is derived, enabling low-complexity RIS configuration with computational cost independent of the number of REs.  Results and Discussions  The accuracy and computational efficiency of the proposed simplified models, as well as the effectiveness of the proposed impedance optimization method, are validated through numerical simulations. First, the two simplified models are evaluated against a reference model. The first simplified model accounts for mutual coupling among elements separated by at most one intermediate unit, whereas the second model considers only immediately adjacent elements. Results indicate that channel gain increases as element spacing decreases, with faster growth observed at smaller spacings (Fig. 4). The modeling error between the simplified models and the reference model remains below 0.1 when the spacing does not exceed λ/4, but increases noticeably at larger spacings. Error curves further show that the modeling errors of both simplified models become negligible when the spacing is below λ/4, indicating that the second model can be adopted to further reduce complexity (Fig. 6). Second, the computational complexity of the proposed models is compared with that of the reference model. When the number of REs exceeds four, the complexity of computing the mutual-coupling matrix in the reference model exceeds that of the proposed neighboring mutual-coupling model. 
As the number of REs increases, the complexity of the reference model grows rapidly, whereas that of the proposed model remains constant (Fig. 5). Finally, the proposed impedance optimization method is compared with two benchmark methods (Fig. 7, Fig. 8). When the element spacing is no greater than λ/4, the channel gain achieved by the proposed method approaches that of the benchmark method. As the spacing increases beyond this range, a clear performance gap emerges. In all cases, the proposed method yields higher channel gain than the coherent phase-shift optimization method.  Conclusions  The integration of a large number of densely arranged REs in an RIS introduces notable mutual coupling effects, which can substantially influence system performance and therefore must be considered in channel modeling and impedance optimization. A simplified mutual-coupling-aware channel model based on a neighboring mutual-coupling matrix has been proposed, together with an efficient tunable-impedance optimization method. By combining the neighboring mutual-coupling matrix with a simplified mutual-impedance expression derived under far-field assumptions, a low-complexity channel model is obtained. Based on this model, a closed-form solution for the optimal RIS tunable impedances is derived using impedance decomposition. Simulation results confirm that the proposed channel model and optimization method maintain satisfactory accuracy and effectiveness when the element spacing does not exceed λ/4. The proposed framework provides practical theoretical support and useful design guidance for analyzing and optimizing RIS-assisted systems under mutual coupling effects.
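The idea of retaining only near-neighbor mutual-coupling parameters can be sketched as follows. This toy example uses a one-dimensional array and illustrative impedance values, not the paper's planar-grid model with support matrices: a coupling parameter is kept only for small element separations, which is exactly what makes the matrix sparse and the complexity independent of array size.

```python
def neighboring_coupling_matrix(n, couplings):
    """Build an n x n mutual-impedance matrix keeping only near-neighbor terms.

    couplings[0] is the self-impedance; couplings[k] (k >= 1) is the mutual
    impedance between elements k positions apart.  Separations beyond
    len(couplings) - 1 are treated as zero, because mutual impedance decays
    rapidly with inter-element distance.
    """
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sep = abs(i - j)
            if sep < len(couplings):
                z[i][j] = couplings[sep]
    return z

# One self-impedance and two neighbor couplings (values are illustrative only).
Z = neighboring_coupling_matrix(5, [50.0, -8.0, 1.5])
# Each row keeps couplings up to two elements away; farther entries are zero.
```

Only three parameters describe the whole matrix here, mirroring how the proposed model retains a fixed, small set of coupling parameters regardless of the number of REs.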
A Review of Joint EEG-fMRI Methods for Visual Evoked Response Studies
WEI Zhiwei, XIAO Xiaolin, XU Minpeng, MING Dong
Available online, doi: 10.11999/JEIT250781
Abstract:
  Significance   The study of visual evoked responses (VERs) using non-invasive neuroimaging techniques is a cornerstone of neuroscience, providing critical insights into the mechanisms of human visual information processing. Among the available modalities, electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) are paramount. EEG captures neural electrical activity with millisecond-level temporal resolution but is fundamentally limited by its poor spatial localization capabilities. Conversely, fMRI provides millimeter-level spatial precision by measuring the blood-oxygen-level-dependent (BOLD) signal, yet its temporal resolution is inherently constrained by the sluggish nature of hemodynamic responses. This intrinsic trade-off between temporal and spatial resolution significantly hampers the ability of any single modality to fully elucidate complex visual processes such as attentional modulation, motion perception, and multi-sensory integration. To overcome this bottleneck, the joint application of EEG and fMRI has emerged as a powerful multimodal approach. By synchronously acquiring both datasets, this integrated technique synergistically combines the distinct strengths of each modality, offering a comprehensive spatiotemporal perspective on the complex dynamics of visual neural networks. Despite its growing adoption, existing literature often lacks a focused, systematic review that specifically details the core methodologies, illustrates key applications, and outlines the persistent challenges and future trends of joint EEG-fMRI in VER research. This review aims to fill this gap by providing a comprehensive and structured overview of the field, serving as a foundational reference for researchers seeking to leverage this advanced technique to explore the visual system.  Progress   This review first elaborates on the foundational technologies that enable joint EEG-fMRI studies, starting with the synchronous acquisition of data. 
This is addressed through MR-compatible EEG systems and dedicated synchronization hardware. The core of the review then systematically analyzes data fusion methodologies, which are categorized into asymmetric and symmetric approaches. Asymmetric fusion uses one modality to constrain the analysis of the other, exemplified by EEG-informed fMRI analysis, which uses single-trial EEG features to model fMRI data, and fMRI-informed EEG source imaging, which uses fMRI activation maps as spatial priors to enhance source localization accuracy. In contrast, symmetric fusion treats both modalities equally, with data-driven techniques like joint independent component analysis (joint ICA) being widely adopted to reveal shared underlying neural sources without strong biophysical assumptions. The application of these methodologies has yielded significant breakthroughs across multiple domains. In visual mechanism analysis, the technique has been instrumental in dissecting the complex feedforward and feedback dynamics of cortical areas involved in vision. In clinical diagnosis and evaluation, joint EEG-fMRI provides objective neurophysiological biomarkers for visual disorders like amblyopia and epilepsy by identifying distinct patterns of cortical activation deficits and network dysfunctions. In the field of brain-computer interfaces (BCIs), the fusion of multimodal features has significantly improved the accuracy and robustness of decoding visual intentions.  Conclusions  This review critically examines the joint EEG-fMRI landscape for VER studies, systematically classifying the key data acquisition and fusion methodologies and highlighting their representative applications. The analysis reveals that the choice of an optimal fusion strategy—be it asymmetric or symmetric, data-driven or model-driven—is highly dependent on the specific research question, available data quality, and underlying assumptions. 
While the technique has proven useful in advancing basic neuroscience, clinical diagnostics, and BCI development, its broader adoption is still hindered by persistent challenges. At the system level, hardware-induced artifacts, particularly the severe electromagnetic interference in ultra-high-field MRI environments, remain a major technical obstacle that compromises data quality. At the algorithmic level, the inherent mismatch in spatiotemporal scales between the fast, transient EEG signals and the slow, delayed BOLD response continues to pose a core fusion challenge. This is further complicated by high inter-subject variability in neural responses, which limits the generalizability of analytical models and decoding algorithms across individuals. These limitations underscore the need for continued innovation in both hardware engineering and computational methods to unlock the full potential of this powerful multimodal technique.  Prospects   Looking ahead, the research landscape for joint EEG-fMRI methods in VER studies is poised for significant evolution, constituting a long-term and complex process. With the integration of emerging technologies such as artificial intelligence, the methodological frameworks in this domain will evolve toward greater intelligence and automation. System-level trends point toward the development of next-generation hardware, including ultra-high-field MRI systems combined with artifact-immune EEG sensors and real-time artifact correction algorithms. Furthermore, the establishment of open-access, multi-center EEG-fMRI databases (following standards like BIDS) and standardized analysis pipelines will be crucial for improving the reproducibility and comparability of research findings, fostering a collaborative ecosystem. Algorithm-level trends are increasingly centered on the integration of artificial intelligence and deep learning. 
End-to-end neural network architectures, such as those incorporating spatiotemporal attention mechanisms, hold the promise of learning the complex, non-linear transformations between EEG and fMRI data directly, thus overcoming the limitations of traditional linear models. Moreover, leveraging transfer learning and personalized modeling frameworks can address the challenge of inter-subject variability, leading to the development of adaptive and robust models for visual decoding and clinical applications. Concurrently, as clinical and BCI applications accelerate, the critical challenge of balancing model complexity with interpretive clarity and computational efficiency warrants in-depth investigation. Ultimately, these synergistic advancements in hardware and algorithms will deepen our understanding of the visual system’s computational principles, refine the diagnosis and treatment of visual disorders, and propel the development of more intuitive and powerful brain-computer interfaces.
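The EEG-informed fMRI analysis described in this review can be sketched in a few lines: a single-trial EEG feature series is convolved with a canonical double-gamma hemodynamic response function (HRF) to form an fMRI regressor. The HRF parameters below follow the common SPM-style canonical shape and are assumptions for illustration, not values taken from the reviewed studies.

```python
import math

def hrf(t, a1=6.0, a2=16.0, ratio=1/6):
    """Canonical double-gamma hemodynamic response function (unit dispersion)."""
    if t <= 0:
        return 0.0
    g = lambda a: t ** (a - 1) * math.exp(-t) / math.gamma(a)
    return g(a1) - ratio * g(a2)   # positive peak minus a small undershoot

def eeg_informed_regressor(feature_series, tr=1.0, length=32):
    """Convolve a single-trial EEG feature series with the HRF
    to form a BOLD-scale regressor for the fMRI general linear model."""
    kernel = [hrf(i * tr) for i in range(length)]
    n = len(feature_series)
    return [sum(feature_series[j] * kernel[i - j]
                for j in range(max(0, i - length + 1), i + 1))
            for i in range(n)]

# A single EEG event (e.g. a trial-wise amplitude feature) at scan index 2.
features = [0.0] * 20
features[2] = 1.0
reg = eeg_informed_regressor(features)
# The regressor peaks roughly 5 s after the event, matching the sluggish BOLD response.
```

This is the asymmetric direction of fusion: the millisecond-scale EEG feature modulates a second-scale regressor, bridging the spatiotemporal mismatch the review highlights.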
Security Protection for Vessel Positioning in Smart Waterway Systems Based on Extended Kalman Filter–Based Dynamic Encoding
TANG Fengjian, YAN Xia, SUN Zeyi, ZHU Zhaowei, YANG Wen
Available online, doi: 10.11999/JEIT250846
Abstract:
  Objective  With the rapid development of intelligent shipping systems, vessel positioning data face severe privacy leakage risks during wireless transmission. Traditional privacy-preserving methods, such as differential privacy and homomorphic encryption, suffer from data distortion, high computational overhead, or reliance on costly communication links, making it difficult to achieve both data integrity and efficient protection. This study addresses the characteristics of vessel positioning systems and proposes a dynamic encoding scheme enhanced by time-varying perturbations. By integrating the Extended Kalman Filter (EKF) and introducing unstable temporal perturbations during encoding, the scheme uses receiver-side acknowledgments (ACK feedback) to achieve reference-time synchronization and independently generates synchronized perturbations through a shared random seed. Theoretical analysis and simulations show that the proposed method achieves nearly zero precision loss in state estimation for legitimate receivers, whereas decoding errors of eavesdroppers grow exponentially after a single packet loss, effectively countering both single- and multi-channel eavesdropping attacks. The shared-seed synchronization mechanism avoids complex key management and reduces communication and computational costs, making the scheme suitable for resource-constrained maritime wireless sensor networks.  Methods  The proposed dynamic encoding scheme introduces a time-varying perturbation term into the encoding process. The perturbation is governed by an unstable matrix to induce exponential error growth for eavesdroppers. The encoded signal is constructed from the difference between the current state estimate and a time-scaled reference state, combined with the perturbation term. A shared random seed between legitimate parties enables deterministic and synchronized generation of the perturbation sequence without online key exchange. 
At the legitimate receiver, the perturbation is canceled during decoding, enabling accurate state recovery. Local state estimation at each sensor node is performed using EKF, and the overall communication process is reinforced by acknowledgment-based synchronization to maintain consistency between the sender and receiver.  Results and Discussions  Simulations are conducted in a wireless sensor network with four sensors tracking vessel states, including position, velocity, and heading. The results indicate that legitimate receivers achieve nearly zero estimation error (Fig. 3), whereas eavesdroppers exhibit exponentially increasing errors after a single packet loss (Fig. 4). The error growth rate depends on the instability of the perturbation matrix, confirming the theoretical divergence. In multi-channel scenarios, independent perturbation sequences for each channel prevent cross-channel correlation attacks (Fig. 5). The scheme maintains low communication and computational overhead, making it practical for maritime environments. Furthermore, the method shows strong robustness to packet loss and channel variations, satisfying SOLAS requirements for data integrity and reliability.  Conclusions  A dynamic encoding scheme with time-varying perturbations is proposed for privacy-preserving vessel state estimation. By integrating EKF with an unstable perturbation mechanism, the method ensures high estimation precision for legitimate users and exponential error growth for eavesdroppers. The main contributions are as follows: (1) an encoding framework that achieves zero precision loss for legitimate receivers; (2) a lightweight synchronization mechanism based on shared random seeds, which removes complex key management; and (3) theoretical guarantees of exponential error divergence for eavesdroppers under single- or multi-channel attacks. 
The scheme is robust to packet loss and channel asynchrony, complies with SOLAS data integrity requirements, and is suitable for resource-limited maritime networks. Future work will extend the method to nonlinear vessel dynamics, adaptive perturbation optimization, and validation in real maritime communication environments.
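The core behavior of the dynamic encoding scheme, near-zero residual for a seed-synchronized receiver and geometric error growth after a single packet loss, can be sketched with scalar quantities. Everything here (the instability factor, the seed, and the stand-in states) is illustrative, not the paper's EKF-based formulation.

```python
import random

def perturbations(seed, n, a=1.5):
    """Deterministically generate n perturbation samples from a shared seed.

    The recursion p_{k+1} = a * p_k + w_k with |a| > 1 is deliberately
    unstable, so any index misalignment makes the residual grow geometrically.
    """
    rng = random.Random(seed)
    p, seq = rng.random(), []
    for _ in range(n):
        seq.append(p)
        p = a * p + rng.random()
    return seq

STEPS, SEED = 30, 42
states = [float(k) for k in range(STEPS)]        # stand-in for EKF state estimates
perts = perturbations(SEED, STEPS)
encoded = [x + p for x, p in zip(states, perts)]

# Legitimate receiver: same seed, perfectly synchronized perturbations,
# so the states are recovered up to floating-point rounding.
legit = [y - p for y, p in zip(encoded, perturbations(SEED, STEPS))]

# Eavesdropper: even one that somehow learned the perturbation sequence
# lags by one index after losing a single packet.
eave = [encoded[k] - perts[k - 1] for k in range(1, STEPS)]
eave_err = [abs(e - states[k + 1]) for k, e in enumerate(eave)]
# eave_err grows geometrically with k because the perturbations themselves do.
```

The shared seed replaces online key exchange: both parties regenerate the identical unstable sequence locally, which is what keeps the communication and computational overhead low.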
Unsupervised 3D Medical Image Segmentation With Sparse Radiation Measurement
YU Xiaofan, ZOU Lanlan, GU Wenqi, CAI Jun, KANG Bin, DING Kang
Available online, doi: 10.11999/JEIT250841
Abstract:
  Objective  Three-dimensional medical image segmentation is a central task in medical image analysis. Compared with two-dimensional imaging, it captures organ and lesion morphology more completely and provides detailed structural information, supporting early disease screening, personalized surgical planning, and treatment assessment. With advances in artificial intelligence, three-dimensional segmentation is viewed as a key technique for diagnostic support, precision therapy, and intraoperative navigation. However, methods such as SwinUNETR-v2 and UNETR++ depend on extensive voxel-level annotations, which create high annotation costs and restrict clinical use. High-quality segmentation also often requires multi-view projections to recover full volumetric information, increasing radiation exposure and patient burden. Segmentation under sparse radiation measurements is therefore an important challenge. Neural Attenuation Fields (NAF) have recently been introduced for low-dose reconstruction by recovering linear attenuation coefficient fields from sparse views, yet their suitability for three-dimensional segmentation remains insufficiently examined. To address this limitation, a unified framework termed NA-SAM3D is proposed, integrating NAF-based reconstruction with interactive segmentation to enable unsupervised three-dimensional segmentation under sparse-view conditions, reduce annotation dependence, and improve boundary perception.  Methods  The framework is designed in two stages. In the first stage, sparse-view reconstruction is performed with NAF to generate a continuous three-dimensional attenuation coefficient tensor from sparse X-ray projections. Ray sampling and positional encoding are applied to arbitrary three-dimensional points, and the encoded features are forwarded to a Multi-Layer Perceptron (MLP) to predict linear attenuation coefficients that serve as input for segmentation. In the second stage, interactive segmentation is performed. 
A three-dimensional image encoder extracts high-dimensional features from the attenuation coefficient tensor, and clinician-provided point prompts specify regions of interest. These prompts are embedded into semantic features by an interactive user module and fused with image features to guide the mask decoder in producing initial masks. Because point prompts provide only local positional cues, boundary ambiguity and mask expansion may occur. To address these issues, a Density-Guided Module (DGM) is introduced at the decoder output stage. NAF-derived attenuation coefficients are transformed into a density-aware attention map, which is fused with the initial masks to strengthen tissue-boundary perception and improve segmentation accuracy in complex anatomical regions.  Results and Discussions  NA-SAM3D is evaluated on a self-constructed colorectal cancer dataset comprising 299 patient cases (collected in collaboration with Nanjing Hospital of Traditional Chinese Medicine) and on two public benchmarks: the Lung CT Segmentation Challenge (LCTSC) and the Liver Tumor Segmentation Challenge (LiTS). The results show that NA-SAM3D achieves overall better performance than mainstream unsupervised three-dimensional segmentation methods based on full radiation observation (SAM-MED series) and reaches accuracy comparable to, or in some cases higher than, the fully supervised SwinUNETR-v2. Compared with SAM-MED3D, NA-SAM3D increases the Dice on the LCTSC dataset by more than 3%, while HD95 and ASD decrease by 5.29 mm and 1.32 mm, respectively, indicating improved boundary localization and surface consistency. Compared with the sparse-field-based method SA3D, NA-SAM3D achieves higher Dice scores on all three datasets (Table 1). Compared with the fully supervised SwinUNETR-v2, NA-SAM3D reduces HD95 by 1.28 mm, and the average Dice is only 0.3% lower. 
Compared with SA3D, NA-SAM3D increases the average Dice by about 6.6% and reduces HD95 by about 11 mm, further confirming its capacity to restore structural details and boundary information under sparse-view conditions (Table 2). Although the overall performance remains slightly lower than that of the fully supervised UNETR++ model, NA-SAM3D still shows strong competitiveness and good generalization under label-free inference. Qualitative analysis shows that in complex pelvic and intestinal regions, NA-SAM3D produces clearer boundaries and higher contour consistency (Fig. 3). On public datasets, segmentation of the lung and liver also shows superior boundary localization and contour integrity (Fig. 4). Three-dimensional visualization further confirms that in colorectal, lung, and liver regions, NA-SAM3D achieves stronger structural continuity and boundary preservation than SAM-MED2D and SAM-MED3D (Fig. 5). The DGM further enhances boundary sensitivity, increasing Dice and mIoU by 1.20% and 3.31% on the self-constructed dataset, and by 4.49 and 2.39 percentage points on the LiTS dataset (Fig. 6).  Conclusions  An unsupervised three-dimensional medical image segmentation framework, NA-SAM3D, is presented, integrating NAF-based reconstruction with interactive segmentation to achieve high-precision segmentation under sparse radiation measurements. The DGM effectively uses attenuation coefficient priors to enhance boundary recognition in complex lesion regions. Experimental results show that the framework approaches the performance of fully supervised methods under unsupervised inference and yields an average Dice improvement of 2.0%, indicating strong practical value and clinical potential for low-dose imaging and complex anatomical segmentation. Future work will refine the model for additional anatomical regions and assess its practical use in preoperative planning.
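The ray-sampling-plus-positional-encoding step of the first stage can be illustrated with a common NeRF-style formulation (the exact encoding used by NAF may differ; the frequency count below is an assumption):

```python
import math

def positional_encoding(point, num_freqs=4):
    """NeRF-style positional encoding of a 3-D point: for each coordinate x
    and frequency 2^k, emit sin(2^k * pi * x) and cos(2^k * pi * x)."""
    enc = []
    for x in point:
        for k in range(num_freqs):
            f = (2 ** k) * math.pi
            enc.extend([math.sin(f * x), math.cos(f * x)])
    return enc

# Encode one point sampled along an X-ray; the encoded features feed the MLP
# that predicts the linear attenuation coefficient at that location.
features = positional_encoding((0.25, -0.5, 0.75))
# 3 coordinates x 4 frequencies x (sin, cos) = 24 features.
```

Lifting raw coordinates into this higher-dimensional sinusoidal basis is what lets a small MLP represent the sharp attenuation changes at tissue boundaries from only sparse views.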
Design of Dynamic Resource Awareness and Task Offloading Schemes in Multi-Access Edge Computing Networks
ZHANG Bingxue, LI Xisheng, YOU Jia
Available online, doi: 10.11999/JEIT250640
Abstract:
  Objective  With the development of the industrial Internet of Things and the widespread use of multi-mode terminal equipment, multi-access edge computing has become a key technology for supporting low-delay, energy-efficient industrial applications. The task offloading mechanism of edge computing is the core means of handling the numerous and complex task processing requirements of multi-mode terminals. In a multi-access edge computing system, the network selection of end users has a great impact on the offloading mechanism and resource allocation. However, existing network selection mechanisms focus on the user's selection decision and ignore the impact of the user's task execution and of task data offloading, transmission, and processing on network performance. Existing research on task offloading mechanisms focuses on offloading delay, energy consumption optimization, and resource allocation, ignoring the impact of collaborative computing across multi-access heterogeneous networks on resource costs and the dynamic resource balance between heterogeneous networks. To meet these challenges, this paper considers the impact of users' diverse needs and the differentiated capabilities of heterogeneous resource providers on offloading decisions in a complex computing environment, jointly optimizing the user's task execution cost and the rational allocation of dynamic resources in multi-access heterogeneous networks, so as to reduce system operation cost, improve quality of service, and utilize heterogeneous resources efficiently and cooperatively.  Methods  Based on the multi-access edge computing network model, this paper establishes a cost model covering task execution time, energy consumption, and communication resource consumption for each network available for end-user task selection. 
Based on auction theory, a cost-benefit model of computing-task valuation and bidding is established for the interaction between users and edge servers, and the objective optimization problem is formulated according to combinatorial two-way auction theory. A dynamic resource sensing and task offloading algorithm based on the auction mechanism is then proposed. Through two-way broadcast of the tasks to be admitted and the resources they require, network selection and dynamic resource allocation are carried out. Only when its available resources satisfy the user's resource constraints can a server submit a valid bid. Edge servers with valid bids compete for the opportunity to execute the user's task until the user obtains the optimal bid and the corresponding server, completing the auction matching process.  Results and Discussions  The auction-based dynamic resource allocation and task offloading algorithm accounts for heterogeneous network status and resource usage, and selects the task offloading location according to the resource allocation. With the simulation parameters configured, an edge computing model of cooperating heterogeneous wireless networks is constructed, and the impact of network size on task offloading cost and offloaded data volume is analyzed. The simulation results show that the proposed algorithm reduces the system cost by at least 5% compared with the benchmark algorithms (Fig. 3), with the advantage more pronounced as the number of end users grows. Changes in the number of servers in the heterogeneous networks noticeably affect users' choice of network for task offloading (Fig. 4, 5, 6). The proposed algorithm also improves the amount of offloaded task data by 10% over the benchmark algorithms (Fig. 7, 8). 
Finally, the impact of the communication resource cost parameter on the user's choice of the 5G public network for task offloading is studied. The larger this cost parameter, the less data end users choose to offload through the 5G public network (Fig. 9).  Conclusions  To meet the complex data processing requirements of multi-mode terminals, this paper constructs a cooperative multi-access edge computing network architecture for multi-mode terminals. Flexible and intelligent selection of the wireless communication network by multi-mode terminals provides more resources for end-user task offloading. A server bidding and user target bidding model is established based on the auction model, and an auction-based dynamic resource awareness and task offloading algorithm is proposed to handle multi-mode terminal task offloading, network selection, and resource allocation. The algorithm first dynamically selects the offloading network and allocates computing and communication resources according to the admitted tasks, and then selects the task offloading location with the minimum execution cost through the bidding competition among edge servers. The results show that the proposed algorithm effectively reduces the system cost compared with the benchmark algorithms, increases the amount of data offloaded from end-user tasks to multiple edge servers, makes full use of edge computing resources, and improves system energy efficiency and operational efficiency.
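The auction matching step can be sketched as follows: a server may bid only if its available resources satisfy the user's constraints, and among valid bids the cheapest wins. The field names, resource types, and flat bid values are illustrative, not the paper's cost model.

```python
def auction_match(task, servers):
    """Match a task to the edge server with the lowest valid bid.

    A server may bid only if its available CPU and bandwidth satisfy the
    task's resource constraints; among valid bids, the cheapest wins.
    """
    valid = [s for s in servers
             if s["cpu"] >= task["cpu"] and s["bw"] >= task["bw"]]
    if not valid:
        return None                      # no server can host the task
    return min(valid, key=lambda s: s["bid"])

task = {"cpu": 2.0, "bw": 10.0}
servers = [
    {"name": "5G-public",  "cpu": 4.0, "bw": 50.0, "bid": 8.0},
    {"name": "wifi-edge",  "cpu": 1.0, "bw": 20.0, "bid": 3.0},  # fails the CPU check
    {"name": "5G-private", "cpu": 8.0, "bw": 30.0, "bid": 6.5},
]
winner = auction_match(task, servers)
# "wifi-edge" cannot bid (insufficient CPU), so "5G-private" wins with the lowest valid bid.
```

Filtering on resource constraints before comparing bids is what keeps the dynamic resource allocation feasible: a cheap bid from an overloaded server is never considered.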
Research on Low Leakage Current Voltage Sampling Method for Multi-cell Series Battery Packs
GUO Zhongjie, GAO Yuyang, DONG Jianfeng, BAI Ruokai
Available online, doi: 10.11999/JEIT250733
Abstract:
  Objective  The battery voltage sampling circuit is one of the key components of the Battery Management Integrated Circuit (BMIC). It is responsible for real-time monitoring of the battery's voltage status, and its performance directly determines the safety of the series battery pack. The traditional resistive voltage sampling circuit suffers from channel leakage current, which degrades battery voltage consistency and sampling accuracy. Meanwhile, the level-shifting circuit in the high-voltage domain includes high-voltage operational amplifiers, and its large number of high-voltage MOSFETs results in additional area overhead.  Methods  This paper proposes a low-leakage-current battery voltage sampling circuit for 14-series lithium batteries. Building on the traditional resistive voltage sampling circuit, an operational-amplifier-isolated active drive technique reduces the channel leakage current to the pA level. Different voltage conversion methods are adopted for the different voltage domains of the series battery pack. The first cell is isolated using a unity-gain buffer, and its voltage is then converted through resistive voltage division. Cells 2 through 13 adopt operational-amplifier-isolated active driving to follow the voltage across each cell synchronously, and the followed voltage is then converted into a ground-referenced voltage by a level-shifting circuit. The voltage sampling of the highest cell draws power from the entire series battery pack and does not affect the consistency of the pack, so the highest cell uses the level-shifting circuit directly for voltage conversion.  Results and Discussions  This paper presents a detailed design and complete performance verification of the circuit based on a 0.35 μm high-voltage BCD process. 
The overall layout area of the designed battery voltage sampling circuit is 3105 μm × 638 μm (Fig. 10). The verification results show that, across process corners and temperatures, the operational-amplifier-isolated active drive technique designed in this paper limits the maximum channel leakage current to only 48.9 pA, whereas the minimum channel leakage current of the traditional voltage sampling circuit is 1.169×10⁶ pA (Fig. 12, Fig. 13). The impact of the sampling process on battery inconsistency is reduced from 18.56% to 2.122 ppm (Fig. 14). In addition, under comprehensive PVT verification conditions, the maximum measurement error of the designed battery voltage sampling circuit is 0.9 mV (Fig. 15, Fig. 16, Fig. 17).  Conclusions  This paper proposes an operational-amplifier-isolated active drive technique to mitigate the channel leakage current that degrades battery voltage consistency and sampling accuracy in traditional resistive voltage sampling circuits. With the designed battery voltage sampling circuit, the maximum channel leakage current is 48.9 pA, the battery voltage inconsistency is 2.122 ppm, and the maximum measurement error is 1.25 mV, achieving extremely low channel leakage current while ensuring sampling accuracy. The proposed low-leakage-current battery voltage sampling circuit can be applied to 14-series lithium battery management chips.
Federated Semi-Supervised Image Segmentation with Dynamic Client Selection
LIU Zhenbing, LI Huanlan, WANG Baoyuan, LU Haoxiang, PAN Xipeng
Available online  , doi: 10.11999/JEIT250834
Abstract:
  Objective  Multicenter validation is an inevitable trend in clinical research, yet strict privacy regulations, heterogeneous cross-institutional data distributions and scarce pixel-level annotations limit the applicability of conventional centralized medical image segmentation models. This study aims to develop a federated semi-supervised framework that jointly exploits labeled and unlabeled prostate MRI data from multiple hospitals, explicitly considering dynamic client participation and non-independent and identically distributed (Non-IID) data, to improve segmentation accuracy and robustness under real-world constraints.  Methods  A cross-silo federated semi-supervised learning paradigm is adopted, in which clients with pixel-wise annotations act as labeled clients and those without annotations act as unlabeled clients. Each client maintains a local student network for prostate segmentation. On unlabeled clients, a teacher network with the same architecture is updated by the exponential moving average of student parameters and generates perturbed pseudo-labels to supervise the student through a hybrid consistency loss that combines Dice and binary cross-entropy terms. To mitigate the negative influence of heterogeneous and low-quality updates, a performance-driven dynamic client selection and aggregation strategy is introduced. At each communication round, clients are evaluated on their local validation sets, and only those whose Dice scores exceed a threshold are retained; then a Top-K subset is aggregated with normalized contribution weights derived from validation Dice, with bounds to avoid gradient vanishing and single-client dominance. For unlabeled clients, a penalty factor is applied to down-weight unreliable pseudo-labeled updates. As the segmentation backbone, a Multi-scale Feature Fusion U-Net (MFF-UNet) is constructed. 
Starting from a standard encoder–decoder U-Net, an FPN-like pyramid is inserted into the encoder, where multi-level feature maps are channel-aligned by 1×1 convolutions, fused in a top–down pathway by upsampling and element-wise addition, and refined using 3×3 convolutions. The decoder progressively upsamples these fused features and combines them with encoder features via skip connections, enabling joint modelling of global semantics and fine-grained boundaries. The framework is evaluated on T2-weighted prostate MRI from six centers: three labeled clients and three unlabeled clients. All 3D volumes are resampled, sliced into 2D axial images, resized and augmented. Dice coefficient and 95th percentile Hausdorff distance (HD95) are used as evaluation metrics.  Results and Discussions  On the six-center dataset, the proposed method achieves average Dice scores of 0.8405 on labeled clients and 0.7868 on unlabeled clients, with corresponding HD95 values of 8.04 and 8.67 pixels, respectively. These results are consistently superior to or on par with several representative federated semi-supervised or mixed-supervision methods, and the improvements are most pronounced on distribution-shifted unlabeled centers. Qualitative visualization shows that the proposed method produces more complete and smoother prostate contours with fewer false positives in challenging low-contrast or small-volume cases, compared with the baselines. Attention heatmaps extracted from the final decoder layer demonstrate that U-Net suffers from attention drift, SegMamba displays diffuse responses and nnU-Net exhibits weak activations for small lesions, whereas MFF-UNet focuses more precisely on the prostate region with stable high responses, indicating enhanced discriminative capability and interpretability.  
Conclusions  A federated semi-supervised prostate MRI segmentation framework that integrates teacher–student consistency learning, multi-scale feature fusion and performance-driven dynamic client selection is presented. The method preserves patient privacy by keeping data local, alleviates annotation scarcity by exploiting unlabeled clients and explicitly addresses client heterogeneity through reliability-aware aggregation. Experiments on a six-center dataset demonstrate that the proposed framework achieves competitive or superior overlap and boundary accuracy compared with state-of-the-art federated semi-supervised methods, particularly on distribution-shifted unlabeled centers. The framework is model-agnostic and can be extended to other organs, imaging modalities and cross-institutional segmentation tasks under stringent privacy and regulatory constraints.
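The performance-driven selection and aggregation described above (Dice threshold, Top-K retention, bounded normalized weights, and a penalty for unlabeled clients) can be sketched as follows. All numeric defaults (threshold, K, weight bounds, penalty factor) and the function name are illustrative assumptions, not values from the paper:

```python
import numpy as np

def select_and_aggregate(client_params, val_dice, threshold=0.7, top_k=3,
                         w_min=0.05, w_max=0.6, unlabeled_mask=None, penalty=0.5):
    """Sketch of performance-driven dynamic client selection/aggregation.

    client_params: list of 1-D parameter vectors (np.ndarray), one per client.
    val_dice: per-client validation Dice scores.
    """
    val_dice = np.asarray(val_dice, dtype=float)
    # 1. Keep only clients whose validation Dice exceeds the threshold.
    kept = np.where(val_dice > threshold)[0]
    if kept.size == 0:
        raise ValueError("no client passed the threshold")
    # 2. Retain the Top-K clients by Dice among those kept.
    kept = kept[np.argsort(val_dice[kept])[::-1][:top_k]]
    # 3. Down-weight unlabeled clients with a penalty factor.
    scores = val_dice[kept].copy()
    if unlabeled_mask is not None:
        scores[np.asarray(unlabeled_mask)[kept]] *= penalty
    # 4. Normalize contributions, then clip to avoid vanishing or dominant weights.
    w = scores / scores.sum()
    w = np.clip(w, w_min, w_max)
    w /= w.sum()
    # 5. Weighted average of the selected clients' parameters.
    stacked = np.stack([client_params[i] for i in kept])
    return kept, w, (w[:, None] * stacked).sum(axis=0)
```

The clip-then-renormalize step is one simple way to realize the paper's "bounds to avoid gradient vanishing and single-client dominance"; the exact bounding rule used by the authors may differ.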
Progress in Modeling Cardiac Myocyte Calcium Cycling and Investigating Arrhythmia Mechanisms: A Study Focused on the Ryanodine Receptor
GAO Ying, ZHANG Yucheng, WANG Wenyao, SU Xuanyi, SONG Zhen
Available online  , doi: 10.11999/JEIT250957
Abstract:
  Significance   The ryanodine receptor (RyR) is an essential regulator of cardiac intracellular calcium homeostasis, controlling the release of Ca²⁺ from the sarcoplasmic reticulum (SR). Its functional abnormalities, such as overactivation or impaired activity, are critical mechanisms underlying early and delayed afterdepolarizations, significantly increasing the risk of arrhythmias. The dynamic coupling between electrical activity and calcium cycling in cardiomyocytes involves highly dynamic and spatially organized processes that are challenging to fully capture experimentally. Conventional experimental techniques, such as animal models and pharmacological studies, are limited by high costs and difficulties in controlling variables. As a result, developing mathematical models and computer simulations of the RyR has become a crucial approach for investigating RyR function regulation under physiological and pathological conditions, as well as its arrhythmogenic mechanisms. This review provides a systematic overview of RyR biology and modeling. It begins by synthesizing RyR structural features and fundamental functional properties to establish a mechanistic basis for gating and regulation. Next, it evaluates contemporary and emerging modeling techniques, outlining the merits and limitations of various computational approaches. The review then summarizes the integration of RyR models into cardiac Ca²⁺ cycling frameworks and their applications across cardiomyocyte subtypes. Furthermore, the review covers arrhythmogenic mechanisms arising from RyR dysfunction and examines targeted drug therapies designed to normalize channel activity. Finally, it highlights artificial intelligence and cardiac digital twins as emerging paradigms for advancing RyR modeling and therapeutic applications.  Progress   The accumulation of RyR structural data has driven continuous innovation in modeling strategies. 
Early models often used phenomenological strategies that were practical but mechanistically limited. Markov models now represent the dominant computational framework for simulating RyR gating behavior, enabling detailed replication of calcium sparks and other key events through discrete state transitions. A key advantage of deterministic integration over other numerical methods for solving Markov models is its superior computational efficiency and remarkable flexibility in adapting to diverse cardiomyocyte types. However, it ignores the stochastic nature of RyR opening and fails to reproduce stochastic fluctuations in intracellular calcium concentration, potentially leading to discrepancies between simulations and physiological reality. In contrast, stochastic Markov models can capture these random behaviors, which are critical for investigating arrhythmogenic phenomena like calcium waves. However, they necessitate substantial experimental data and considerable computational resources, consequently hindering their broader-scale application. The development of artificial intelligence methods, including the use of deep neural networks to compress Markov models into single equations, has substantially improved computational efficiency. Meanwhile, structural biology advances have clarified the conformational dynamics of RyRs and subunit cooperativity in gating, especially in diastolic calcium leak, prompting more detailed models like those incorporating subunit interactions or molecular dynamics. Additionally, various RyR models have been successfully integrated into cardiac action potential frameworks, serving as powerful tools for investigating arrhythmogenic mechanisms like delayed afterdepolarizations (DADs) and early afterdepolarizations (EADs). These models not only enhance the understanding of electrical disturbances caused by RyR dysfunction but also provide a valuable platform for antiarrhythmic drug screening and mechanistic research.  
Conclusion  Several RyR models have been developed that accurately simulate essential physiological processes such as calcium sparks, enabling broad application in cardiomyocyte calcium dynamics studies. However, current modeling efforts face considerable challenges: (1) Lack of a unified modeling framework. There is still no unified RyR model capable of accurately simulating calcium dynamics across the wide spectrum of physiological and pathological conditions. To select an appropriate model for intracellular calcium handling, the specific effects of different models must be carefully evaluated. (2) Computational burden restricts multiscale integration. While multiscale models are essential to bridge arrhythmic mechanisms from cellular calcium dynamics to tissue-level propagation by incorporating heterogeneity, their high computational cost presents a formidable barrier to scaling for clinically relevant applications. (3) Underdeveloped pacemaker cell models. Existing research focuses largely on ventricular and atrial myocytes, while pacemaker cell models are relatively underdeveloped and often employ “common pool” approximations that fail to capture spatial calcium gradients. Future research should therefore prioritize the development of detailed pacemaker cell models that represent calcium release unit (CRU) networks and incorporate realistic RyR dynamics. While still in early stages of development for RyR modeling, emerging approaches such as artificial intelligence and cardiac digital twins offer substantial potential to advance both mechanistic understanding and applications in precision medicine.  Prospects   The future of RyR research will increasingly rely on combining multidisciplinary advances across structural biology, biophysics, and computational science. 
Integrative efforts are essential to bridge molecular-scale conformational changes of RyR to organ-level cardiac function, which will enable the creation of scalable and clinically actionable models that not only deepen mechanistic insight but also accelerate translational innovation in precision cardiology. Emerging tools like AI and cardiac digital twins offer a pathway toward clinically relevant, multi-scale cardiac models that incorporate patient-specific electrophysiology and calcium handling. Such models could profoundly improve our understanding of arrhythmia mechanisms and heart failure pathophysiology, while also serving as predictive platforms for mechanism-based personalized antiarrhythmic therapy development.
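As background for the stochastic Markov gating discussed above, the following is a minimal Monte Carlo sketch of a two-state (closed/open) channel cluster whose opening rate rises with subspace Ca²⁺ via a Hill function. The two-state scheme, all rate constants, and the Hill parameters are illustrative assumptions only, far simpler than published multi-state RyR Markov models:

```python
import numpy as np

def simulate_ryr_gating(ca, n_channels=100, t_end=0.5, dt=1e-4,
                        k_open_max=3000.0, ca_half=15.0, hill=2.0,
                        k_close=500.0, seed=0):
    """Two-state (closed <-> open) stochastic gating sketch.

    ca: subspace Ca2+ concentration (uM, assumed units); rates in 1/s.
    Returns the time-averaged open probability of the channel cluster.
    """
    rng = np.random.default_rng(seed)
    # Ca-dependent opening rate via a Hill function (illustrative).
    k_open = k_open_max * ca**hill / (ca**hill + ca_half**hill)
    open_state = np.zeros(n_channels, dtype=bool)
    n_steps = int(t_end / dt)
    open_frac = 0.0
    for _ in range(n_steps):
        u = rng.random(n_channels)
        # Closed channels open with prob k_open*dt; open ones close with k_close*dt.
        opening = ~open_state & (u < k_open * dt)
        closing = open_state & (u < k_close * dt)
        open_state = (open_state | opening) & ~closing
        open_frac += open_state.mean()
    return open_frac / n_steps
```

This fixed-step scheme illustrates the stochastic-fluctuation behavior the review contrasts with deterministic integration; the steady-state open probability approaches k_open/(k_open + k_close), while individual runs fluctuate around it.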
A Neural Network-Based Robust Direction Finding Algorithm for Mixed Circular and Non-Circular Signals Under Array Imperfections
YU Qi, YIN Jiexin, LIU Zhengwu, WANG Ding
Available online  , doi: 10.11999/JEIT250884
Abstract:
  Objective   Direction Of Arrival (DOA) estimation is affected by low Signal-to-Noise Ratios (SNR), the coexistence of Circular Signals (CSs) and Non-Circular Signals (NCSs), and multiple forms of array imperfections. Conventional subspace-based estimators exhibit model mismatch in such environments and show reduced accuracy. Although neural-network methods provide data-driven alternatives, the effective use of the distinctive statistical properties of NCSs and the maintenance of robustness against diverse array errors remain insufficiently addressed. The objective is to design a DOA estimation algorithm that operates reliably for mixed CSs and NCSs in the presence of array imperfections and provides improved estimation accuracy in challenging operating conditions.  Methods   A robust DOA estimation algorithm is proposed based on an improved Vision Transformer (ViT) model. A six-channel image-like input is first constructed by fusing features derived from the covariance matrix and pseudo-covariance matrix of the received signal. These channels include the real component, imaginary component, magnitude, phase, magnitude ratio reflecting the NCS characteristic, and the phase of the pseudo-covariance matrix. A gradient-masking mechanism is introduced to adaptively fuse core and auxiliary features. The ViT architecture is then modified: the standard patch-embedding module is replaced with a convolutional layer to extract local information, and a dual-class-token attention mechanism, placed at the sequence head and tail, is designed to enhance feature representation. A standard Transformer encoder is used for deep feature learning, and DOA estimation is performed through a multi-label classification head.  Results and Discussions   Extensive simulations are carried out to assess the proposed algorithm (6C-ViT) against MUSIC, NC-MUSIC, a Convolutional Neural Network (6C-CNN), a Residual Network (6C-ResNet), and a MultiLayer Perceptron (6C-MLP). 
Performance is evaluated using Root Mean Square Error (RMSE) and angular estimation error under different operating conditions. Under single-source scenarios with low SNR and no array errors, 6C-ViT achieves near-zero RMSE across most angles and shows minor edge deviations (Fig. 2). It maintains the lowest RMSE across the SNR range from –20 dB to 15 dB (Fig. 3), indicating good generalization to unseen SNR levels. In dual-source scenarios containing mixed CSs and NCSs under array errors, 6C-ViT shows clear advantages. Its estimation errors fluctuate slightly around zero, whereas competing techniques present larger errors and pronounced instabilities, especially near array edges (Fig. 4). Its RMSE decreases steadily as SNR increases and reaches below 0.1° at high SNR, while traditional approaches saturate around 0.4° (Fig. 5). Robust behavior is further observed across different numbers of signal sources (K = 1, 2, 3) and snapshot counts (100 to 2,000). 6C-ViT preserves high accuracy and stability under these variations, whereas other methods show marked degradation or instability, most evident at low snapshot counts or with multiple sources (Fig. 6). When evaluated using unknown modulation types, including UQPSK with a non-circularity rate of 0.6 and 64QAM, under array errors, 6C-ViT continues to produce the lowest RMSE across most angles (Fig. 7), demonstrating strong generalization capability. Ablation studies (Fig. 8) confirm the contributions of the six-channel input, the gradient masking module, the convolutional embedding, and the dual-class-token mechanism. The complete configuration yields the highest accuracy and the most stable performance.  Conclusions   Strong robustness is demonstrated in complex scenarios that contain mixed CSs and NCSs, multiple array imperfections, low SNR, and closely spaced sources. 
By fusing multi-dimensional features of the received signal and using an enhanced Transformer architecture, the algorithm attains higher estimation accuracy and improved generalization across different signal types, error conditions, snapshot counts, and noise levels compared with subspace- and neural-network-based baselines. The method provides a reliable DOA estimation solution for demanding practical environments.
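The six-channel image-like input described in the Methods above (real and imaginary parts, magnitude and phase of the covariance matrix, plus a non-circularity magnitude ratio and the pseudo-covariance phase) can be sketched as follows; the exact channel ordering, normalization, and the small eps guard are assumptions for illustration:

```python
import numpy as np

def six_channel_features(X):
    """Build a six-channel feature tensor from array snapshots (a sketch).

    X: complex array of shape (M antennas, N snapshots).
    Returns a real array of shape (6, M, M).
    """
    N = X.shape[1]
    R = X @ X.conj().T / N   # sample covariance matrix
    C = X @ X.T / N          # sample pseudo-covariance (non-circularity) matrix
    eps = 1e-12              # guard against division by zero
    return np.stack([
        R.real,                        # 1: real part of covariance
        R.imag,                        # 2: imaginary part of covariance
        np.abs(R),                     # 3: covariance magnitude
        np.angle(R),                   # 4: covariance phase
        np.abs(C) / (np.abs(R) + eps), # 5: magnitude ratio (NCS strength)
        np.angle(C),                   # 6: pseudo-covariance phase
    ])
```

For a circular signal the pseudo-covariance vanishes and channel 5 is near zero, while for a maximally non-circular (e.g., real-valued BPSK-like) signal it approaches one, which is how the ratio channel encodes the NCS characteristic.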
Dynamic State Estimation of Distribution Network by Integrating High-degree Cubature Kalman Filter and Long Short-Term Memory Under False Data Injection Attack
XU Daxing, SU Lei, HAN Heqiao, WANG Hailun, ZHANG Heng, CHEN Bo
Available online  , doi: 10.11999/JEIT250805
Abstract:
  Objective  Dynamic state estimation of distribution networks is presented as a core technique for maintaining secure and stable operation in cyber-physical power systems. Its practical performance is limited by strong system nonlinearity, high-dimensional state characteristics, and the threat posed by False Data Injection Attack (FDIA). A method that integrates High-degree Cubature Kalman Filter (HCKF) with Long Short-Term Memory network (LSTM) is proposed. HCKF is applied to enhance estimation precision in nonlinear high-dimensional scenarios. The estimation outputs from HCKF and Weighted Least Squares (WLS) are combined for rapid FDIA identification using residual-based analysis. The LSTM model is then employed to reconstruct measurement data of compromised nodes and refine state estimation results. The approach is validated on the IEEE 33-bus distribution system, demonstrating reliable accuracy enhancement and effective attack resilience.  Methods   The strong nonlinearity of distribution networks limits the estimation accuracy of dynamic methods based on the Cubature Kalman Filter (CKF). A hybrid measurement state estimation model that combines data from Phasor Measurement Unit (PMU) and Supervisory Control And Data Acquisition (SCADA) is established. HCKF is applied to enhance estimation performance in nonlinear, high-dimensional scenarios by generating higher-order cubature points. Under FDIA, the estimation outputs from WLS and HCKF are jointly assessed, allowing rapid intrusion detection through residual evaluation and state consistency checking. Once an attack is identified, an LSTM model performs time-series prediction to reconstruct the measurement data of compromised nodes. The reconstructed data replace abnormal values, enabling correction of the final state estimation.  
Results and Discussions  Experiments on the IEEE 33-bus distribution system show that without FDIA, HCKF achieves higher estimation accuracy for voltage magnitude and phase angle than CKF. The Average voltage Relative Error (ARE) of voltage magnitude decreases by 57.9%, and the corresponding phase-angle error decreases by 28.9%, confirming the superiority of the method for strongly nonlinear and high-dimensional state estimation. Under FDIA, residual-based detection effectively identifies cyber attacks and avoids false alarms and missed detections. The prediction error of LSTM for the measurement data of compromised nodes and their associated branches remains on the order of 10⁻⁶, indicating high reconstruction fidelity. The combined HCKF and LSTM maintains stable state tracking after intrusion, and its performance exceeds that of WLS and adaptive Unscented Kalman Filter.  Conclusions  The dynamic state estimation method that integrates HCKF and LSTM enhances adaptability to strong nonlinearity and high-dimensional characteristics of distribution networks. Rapid and accurate FDIA identification is achieved through residual evaluation, and LSTM reconstructs the measurement data of compromised nodes with high reliability. The method maintains high estimation accuracy under normal operation and preserves stability and precision under cyber intrusion. It offers technical support for secure and stable operation of distribution networks in the presence of malicious attacks.
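The residual evaluation used for FDIA identification above can be sketched with a weighted residual-sum-of-squares (chi-square-style) test plus a per-measurement flagging rule. The function names, the decision threshold, and the 3-sigma node-flagging rule are illustrative assumptions, not the paper's exact detector:

```python
import numpy as np

def residual_fdia_check(z, z_pred, sigma, threshold):
    """Residual-based bad-data / FDIA check (illustrative sketch).

    z: measurement vector; z_pred: measurements predicted from the state
    estimate (e.g., from WLS or HCKF); sigma: per-measurement noise std;
    threshold: chi-square-like decision level (an assumed tuning value).
    Returns (attack_flag, J) where J is the weighted residual sum of squares.
    """
    r = (np.asarray(z) - np.asarray(z_pred)) / np.asarray(sigma)
    J = float(r @ r)
    return J > threshold, J

def flag_compromised(z, z_pred, sigma, k=3.0):
    """Indices of measurements whose normalized residual exceeds k sigma."""
    r = np.abs((np.asarray(z) - np.asarray(z_pred)) / np.asarray(sigma))
    return np.where(r > k)[0]
```

Once an attack is flagged, the indices returned by the second function would identify which measurements to replace with LSTM-reconstructed values, mirroring the correction step described in the Methods.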
Design of a CNN Accelerator Based on Systolic Array Collaboration with Inter-Layer Fusion
LU Di, WANG Zhen Fa
Available online  , doi: 10.11999/JEIT250867
Abstract:
  Objective  With the rapid deployment of deep learning in edge computing, the demand for efficient Convolutional Neural Network (CNN) accelerators has become increasingly urgent. Although traditional CPUs and GPUs provide strong computational power, they suffer from high power consumption, large latency, and limited scalability in real-time embedded scenarios. FPGA-based accelerators, owing to their reconfigurability and parallelism, present a promising alternative. However, existing implementations often face challenges such as low resource utilization, memory access bottlenecks, and difficulties in balancing throughput with energy efficiency. To address these issues, this paper proposes a systolic array–based CNN accelerator with layer-fusion optimization, combined with an enhanced memory hierarchy and computation scheduling strategy. By designing hardware-oriented convolution mapping methods and employing lightweight quantization schemes, the proposed accelerator achieves improved computational efficiency and reduced resource consumption while meeting real-time inference requirements, making it suitable for complex application scenarios such as intelligent surveillance and autonomous driving.  Methods  This paper addresses the critical challenges commonly observed in FPGA-based CNN accelerators, including data transfer bottlenecks, insufficient resource utilization, and low processing unit efficiency. We propose a hybrid CNN accelerator architecture based on systolic array–assisted layer fusion, in which computation-intensive adjacent layers are deeply bound and executed consecutively within the same systolic array. This design reduces frequent off-chip memory access of intermediate results, decreases data transfer overhead and power consumption, and improves both computation speed and overall energy efficiency. 
A dynamically reconfigurable systolic array method is further developed to provide hardware-level adaptability for multi-dimensional matrix multiplications, thereby avoiding the resource waste of deploying dedicated hardware for different computation scales, reducing overall FPGA logic resource consumption, and enhancing adaptability and flexibility of hardware resources. In addition, a streaming systolic array computation scheme is introduced through carefully orchestrated computation flow and control logic, ensuring that processing elements within the systolic array remain in a high-efficiency working state. Data continuously flows through the computation engine in a highly pipelined and parallelized manner, improving the utilization of internal processing units, reducing idle cycles, and ultimately enhancing overall throughput.  Results and Discussions  To explore the optimal quantization precision of neural network models, experiments were conducted on the MNIST dataset using two representative architectures, VGG16 and ResNet50, under fixed-point quantization with 12-bit, 10-bit, 8-bit, and 6-bit precision. The results, as shown in Table 1, indicate that when the quantization bit width falls below 8 bits, model inference accuracy drops significantly, suggesting that excessively low precision severely compromises the representational capacity of the model. On the proposed accelerator architecture, VGG16, ResNet50, and YOLOv8n achieved peak computational performances of 390.25 GOPS, 360.27 GOPS, and 348.08 GOPS, respectively. To comprehensively evaluate the performance advantages of the proposed accelerator, comparisons were made with FPGA accelerator designs reported in existing literature, as summarized in Table 4. Table 5 further presents a comparison of the proposed accelerator with conventional CPU and GPU platforms in terms of performance and energy efficiency. 
During the acceleration of VGG16, ResNet50, and YOLOv8n, the proposed accelerator achieved computational throughput that was 1.76×, 3.99×, and 2.61× higher than that of the corresponding CPU platforms, demonstrating significant performance improvements unattainable by general-purpose processors. Moreover, in terms of energy efficiency, the proposed accelerator achieved improvements of 3.1× (VGG16), 2.64× (ResNet50), and 2.96× (YOLOv8n) compared with GPU platforms, highlighting its superior energy utilization efficiency.  Conclusions  This paper proposes a systolic array–assisted layer-fusion CNN accelerator architecture. First, a theoretical analysis of the accelerator’s computational density is conducted, demonstrating the performance advantages of the proposed design. Second, to address the design challenge arising from the variability in local convolution window sizes of the second layer, a novel dynamically reconfigurable systolic array method is introduced. Furthermore, to enhance the overall computational efficiency, a streaming systolic array scheme is developed, in which data continuously flows through the computation engine in a highly pipelined and parallelized manner. This design reduces idle cycles within the systolic array and improves the overall throughput of the accelerator. Experimental results show that the proposed accelerator achieves high throughput with minimal loss in inference accuracy. Specifically, peak performance levels of 390.25 GOPS, 360.27 GOPS, and 348.08 GOPS were attained for VGG16, ResNet50, and YOLOv8n, respectively. Compared with traditional CPU and GPU platforms, the proposed design exhibits superior energy efficiency, demonstrating that the accelerator architecture is particularly well-suited for resource-constrained and energy-sensitive application scenarios such as edge computing.
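The streaming systolic-array dataflow underlying the accelerator above can be illustrated with a cycle-by-cycle simulation of an output-stationary array computing a matrix product, where operands enter skewed by one cycle per row/column and each processing element accumulates one output. This is a generic textbook sketch, not the paper's reconfigurable or layer-fused design:

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle sketch of an output-stationary systolic array for A @ B.

    A: (M, K), B: (K, N). PE (i, j) accumulates output element (i, j) while
    A-rows stream in from the left and B-columns from the top, each skewed
    by one cycle per row/column index.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))
    # Last operand reaches PE (M-1, N-1) at cycle K + M + N - 3.
    total_cycles = K + M + N - 2
    for t in range(total_cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j  # skewed schedule: operand index seen by PE(i, j)
                if 0 <= k < K:
                    acc[i, j] += A[i, k] * B[k, j]
    return acc
```

The cycle count K + M + N - 2 shows why keeping data streaming through the array (rather than draining and refilling it between layers) raises PE utilization, which is the intuition behind the paper's layer-fusion and streaming schemes.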
A Multi-scale Spatiotemporal Correlation Attention and State Space Modeling-based Approach for Precipitation Nowcasting
ZHENG Hui, CHEN Fu, HE Shuping, QIU Xuexing, ZHU Hongfang, WANG Shaohua
Available online  , doi: 10.11999/JEIT250786
Abstract:
  Objective  Precipitation nowcasting, as one of the most representative tasks in the field of meteorological forecasting, uses radar echoes or precipitation sequences to predict precipitation distribution in the next 0-2 hours. It provides scientific and technological support for disaster warning and key decision-making, and maximizes the protection of people's lives and property. Current mainstream methods generally have problems such as loss of local details, inadequate representation of conditional information, and insufficient adaptability to complex areas. Therefore, this paper proposes a PredUMamba model based on the diffusion model. In this model, on the one hand, a Mamba block based on an adaptive zigzag scanning mechanism is introduced, which not only fully mines the key local detail information but also effectively reduces the computational complexity. On the other hand, a multi-scale spatio-temporal correlation attention model is designed to enhance the interaction ability of spatio-temporal hierarchical features while achieving a comprehensive representation of conditional information. More importantly, a radar echo dataset tailored for precipitation nowcasting in complex regions was constructed, specifically a radar dataset from the southern Anhui mountainous area, to validate the model's ability to accurately predict sudden, extreme rainfall events in complex areas. This research provides a new intelligent solution and theoretical support for precipitation nowcasting.  Methods  The PredUMamba model proposed in this paper adopts a two-stage diffusion model network. In the first stage, a frame-by-frame Variational Auto Encoder (VAE) is trained to map precipitation data in pixel space to a low-dimensional latent space. In the second stage, a diffusion network is constructed on the latent space after VAE encoding. 
In the diffusion network, this paper proposes an adaptive zigzag Mamba module that adopts a spatio-temporally alternating adaptive zigzag scanning strategy: sequential scanning is performed within the rows of a data block and turn-back scanning between rows, effectively capturing the detailed features of the precipitation field while maintaining low computational complexity. In addition, this paper designs a multi-scale spatio-temporal correlation attention module on both temporal and spatial scales. On the temporal scale, adaptive convolution kernels and convolution layers containing attention mechanisms capture local and global information. On the spatial scale, a lightweight correlation attention aggregates spatial information, enhancing the ability to mine historical conditional information. Finally, this paper constructs a radar dataset of the southern Anhui mountainous area for precipitation nowcasting in complex terrain, which helps verify the adaptability of the PredUMamba model and other models in the field to such areas.  Results and Discussions  In the PredUMamba model, the adaptive zigzag Mamba module and the multi-scale spatio-temporal correlation attention module strengthen the mining of the data's intrinsic joint spatio-temporal structure, so the characteristics of the conditional information are captured more accurately and the predictions better match observed precipitation. Experimental results show that the PredUMamba model achieves the best performance on all indicators on the southern Anhui mountainous area and Shanghai radar datasets. On the SEVIR dataset, FVD, CSI_pool4, and CSI_pool16 are all superior to other methods, and CSI and CRPS also achieve very competitive results. In addition, visualized prediction results show that PredUMamba's predictions do not blur over time (Fig. 
4), indicating higher stability. The model also shows significant advantages in detail generation and in capturing the overall motion trend, generating edge details aligned with real precipitation while maintaining accurate motion-pattern predictions.  Conclusions  This paper proposes an innovative PredUMamba model based on a diffusion network architecture. The model significantly improves performance by introducing a Mamba module with an adaptive zigzag scanning mechanism and a multi-scale spatio-temporal correlation attention module. The adaptive zigzag scanning Mamba module effectively captures the fine-grained spatio-temporal characteristics of precipitation data through a scanning strategy that alternates between time and space, while reducing computational complexity. The multi-scale spatio-temporal correlation attention module enhances the mining of historical conditional information through a dual-branch network in the temporal dimension and a lightweight correlation attention mechanism in the spatial dimension, realizing a joint representation of local and global features. To verify the applicability of the model in complex terrain, this paper also constructs a radar dataset of the southern Anhui mountainous area. The dataset covers precipitation under various terrain conditions and provides important support for extreme precipitation prediction in complex terrain. In addition, comparative experiments are conducted on the constructed dataset and several public datasets in the field. The results show that PredUMamba achieves the best results on all indicators on the southern Anhui mountainous area and Shanghai radar datasets. On the SEVIR dataset, FVD, CSI_pool4, and CSI_pool16 all outperform other methods, and CRPS and CSI also achieve very competitive results. 
However, this study is designed around a purely data-driven forecasting method; future work will focus on incorporating physical constraint information to improve the interpretability of the model and to further improve prediction accuracy for small- and medium-scale convective systems.
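The within-row sequential, between-row turn-back scan described above can be sketched as a simple index ordering. This is a minimal illustration of the scan order only; the adaptive and spatio-temporally alternating aspects of the module are not modeled here.

```python
def zigzag_scan_order(rows, cols):
    """Boustrophedon ordering: left-to-right on even rows, right-to-left on
    odd rows, so consecutive indices in the sequence stay spatially adjacent."""
    order = []
    for r in range(rows):
        cols_iter = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cols_iter)
    return order

def flatten_zigzag(block):
    """Flatten a 2-D list along the zigzag order, giving the 1-D input
    sequence a state-space (Mamba-style) scan would consume."""
    return [block[r][c] for r, c in zigzag_scan_order(len(block), len(block[0]))]
```

Because the turn-back keeps neighboring cells adjacent in the flattened sequence, a sequential state-space model sees smoother transitions than a plain row-major raster scan would give.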
Complete Coverage Path Planning Algorithm Based on Rulkov-like Chaotic Mapping
LIU Sicong, HE Ming, LI Chunbiao, HAN Wei, LIU Chengzhuo, XIA Hengyu
Available online  , doi: 10.11999/JEIT250887
Abstract:
  Objective  This study proposes a Complete Coverage Path Planning (CCPP) algorithm based on a Sine-constrained Rulkov-Like Hyper-Chaotic (SRHC) mapping. The work addresses key challenges in robotic path planning, focusing on improving coverage efficiency, path unpredictability, and obstacle adaptability for mobile robots in complex environments such as disaster rescue, firefighting, and unknown-terrain exploration. Traditional methods often exhibit predictable movement patterns, fall into local optima, and backtrack inefficiently, which motivates an approach that uses chaotic dynamics to strengthen exploration capability.  Methods  The SRHC-CCPP algorithm integrates three components: 1. SRHC mapping: A hyper-chaotic system with nonlinear coupling (Eq. 1) generates highly unpredictable trajectories. Lyapunov exponent analysis (Fig. 3a, 3b), phase-space diagrams (Fig. 1), and parameter-sensitivity studies (Table 1) confirm chaotic behavior under conditions such as a=0.01 and b=1.3. 2. Memory-driven exploration: A dynamic visitation grid prioritizes uncovered regions and reduces redundancy (Algorithm 1). 3. Obstacle avoidance: Collision detection combined with normal-vector reflection reduces oscillations in cluttered environments (Fig. 4). Simulations employ a Mecanum-wheel robot model (Eq. 2) to provide omnidirectional mobility.  Results and Discussions  1. Efficiency: SRHC-CCPP achieved faster coverage and improved uniformity in both obstacle-free and obstructed scenarios (Figs. 5, 6). The chaotic driver increased path diversity by 37% compared with rule-based methods. 2. Robustness: The algorithm demonstrated initial-value sensitivity and adaptability to environmental noise (Table 2). 3. Scalability: Its low computational overhead supported deployment in large-scale grids (>10^4 cells).  Conclusions  The SRHC-CCPP algorithm advances robotic path planning by: 1. Merging hyper-chaotic unpredictability with memory-guided efficiency, which reduces repetitive loops. 2.
Offering real-time obstacle negotiation through adaptive reflection mechanics. 3. Providing a versatile framework suited to applications that require high coverage reliability and dynamic responsiveness. Future work may examine multi-agent extensions and three-dimensional environments.
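For orientation, the classical Rulkov map below shows the kind of discrete-time chaotic driver the abstract builds on. The paper's SRHC variant (Eq. 1, with parameters such as a=0.01 and b=1.3) adds a sine constraint and nonlinear coupling whose exact form is not given here, so this is a sketch of the base map only.

```python
def rulkov_step(x, y, alpha=4.1, mu=0.001, sigma=-1.0):
    """One iteration of the classical fast-slow Rulkov map. The paper's
    sine-constrained hyper-chaotic variant is not reproduced here."""
    x_next = alpha / (1.0 + x * x) + y
    y_next = y - mu * (x - sigma)
    return x_next, y_next

def chaotic_sequence(n, x0=0.1, y0=-2.8):
    """Generate n fast-variable samples; in a chaos-based CCPP scheme such a
    bounded but unpredictable sequence can drive the robot's next heading."""
    xs, x, y = [], x0, y0
    for _ in range(n):
        x, y = rulkov_step(x, y)
        xs.append(x)
    return xs
```

The fast variable stays bounded while remaining sensitive to initial conditions, which is what gives the planner coverage without predictable cycling.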
Speaker Verification Based on Tide-Ripple Convolution Neural Network
CHEN Chen, YI Zhixin, LI Dongyuan, CHEN Deyun
Available online  , doi: 10.11999/JEIT250713
Abstract:
  Objective  State-of-the-art speaker verification models typically rely on fixed receptive fields, which limits their ability to represent multi-scale acoustic patterns while increasing parameter counts and computational loads. Speech contains layered temporal–spectral structures, yet the use of dynamic receptive fields to characterize these structures is still not well explored. The design principles for effective dynamic receptive field mechanisms also remain unclear.  Methods  Inspired by the non-linear coupling behavior of tidal surges, a Tide-Ripple Convolution (TR-Conv) layer is proposed to form a more effective receptive field. TR-Conv constructs primary and auxiliary receptive fields within a window by applying power-of-two interpolation. It then employs a scan-pooling mechanism to capture salient information outside the window and an operator mechanism to perceive fine-grained variations within it. The fusion of these components produces a variable receptive field that is multi-scale and dynamic. A Tide-Ripple Convolutional Neural Network (TR-CNN) is developed to validate this design. To mitigate label noise in training datasets, a total loss function is introduced by combining a NoneTarget with Dynamic Normalization (NTDN) loss and a weighted Sub-center AAM Loss variant, improving model robustness and performance.  Results and Discussions  The TR-CNN is evaluated on the VoxCeleb1-O/E/H benchmarks. The results show that TR-CNN achieves a competitive balance of accuracy, computation, and parameter efficiency (Table 1). Compared with the strong ECAPA-TDNN baseline, the TR-CNN (C=512, n=1) model attains relative EER reductions of 4.95%, 4.03%, and 6.03%, and MinDCF reductions of 31.55%, 17.14%, and 17.42% across the three test sets, while requiring 32.7% fewer parameters and 23.5% less computation (Table 2). The optimal TR-CNN (C=1024, n=1) model further improves performance, achieving EERs of 0.85%, 1.10%, and 2.05%. 
Robustness is strengthened by the proposed total loss function, which yields consistent improvements in EER and MinDCF during fine-tuning (Table 3). Additional evaluations, including ablation studies (Tables 5 and 6), component analyses (Fig. 3 and Table 4), and t-SNE visualizations (Fig. 4), confirm the effectiveness and robustness of each module in the TR-CNN architecture.  Conclusions  This research proposes a simple and effective TR-Conv layer built on the T-RRF mechanism. Experimental results show that TR-Conv forms a more expressive and effective receptive field, reducing parameter count and computational cost while exceeding conventional one-dimensional convolution in speech feature modeling. It also exhibits strong lightweight characteristics and scalability. Furthermore, a total loss function combining the NTDN loss and a Sub-center AAM loss variant is proposed to enhance the discriminability and robustness of speaker embeddings, particularly under label noise. TR-Conv shows potential as a general-purpose module for integration into deeper and more complex network architectures.
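A toy 1-D sketch of the primary/auxiliary receptive-field idea is given below, under stated assumptions: the contiguous window and power-of-two offsets loosely mimic TR-Conv's power-of-two construction, while the scan-pooling, operator, and fusion mechanisms of the actual layer are omitted.

```python
def tr_receptive_field(x, center, k=2):
    """Toy sketch: the primary field is a contiguous window of 2k+1 samples
    around `center`; the auxiliary field samples at power-of-two offsets
    (1, 2, 4, ...) from the center, within bounds. Illustrative only."""
    n = len(x)
    primary = x[max(0, center - k): center + k + 1]
    aux, p = [], 0
    while 2 ** p < n:
        for idx in (center - 2 ** p, center + 2 ** p):
            if 0 <= idx < n:
                aux.append(x[idx])
        p += 1
    return primary, aux
```

Combining a dense local window with exponentially spaced samples is one simple way to get a receptive field that covers both fine-grained and long-range context at low cost.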
Architecture and Operational Dynamics for Enabling Symbiosis and Evolution of Network Modalities
ZHANG Huifeng, HU Yuxiang, ZHU Jun, ZOU Tao, HUANGFU Wei, LONG Keping
Available online  , doi: 10.11999/JEIT250949
Abstract:
  Objective  The paradigm shift toward polymorphic networks enables dynamic deployment of diverse network modalities on shared infrastructure but introduces two fundamental challenges. First, symbiosis complexity arises from the absence of formal mechanisms to orchestrate coexistence conditions, intermodal collaboration, and resource efficiency gains among heterogeneous network modalities, which results in inefficient resource use and performance degradation. Second, evolutionary uncertainty stems from the lack of lifecycle-oriented frameworks to govern triggering conditions (e.g., abrupt traffic surges), optimization objectives (service-level agreement compliance and energy efficiency), and transition paths (e.g., seamless migration from IPv6 to GEO-based modalities) during network modality evolution, which constrains adaptive responses to vertical industry demands such as vehicular networks and smart manufacturing. This study aims to establish a theoretical and architectural foundation to address these gaps by proposing a three-plane architecture that supports dynamic coexistence and evolution of polymorphic networks with deterministic service-level agreement guarantees.  Methods  The architecture decouples network operation into four domains: (1) The business domain dynamically clusters services using machine learning according to quality-of-service requirements. (2) The modal domain generates specialized network modalities through software-defined interfaces. (3) The function domain enables baseline capability pooling by atomizing network functions into reusable components. (4) The resource domain supports fine-grained resource scheduling through elementization techniques. The core innovation lies in three synergistic planes: (1) The evolutionary decision plane applies predictive analytics for adaptive selection and optimization of network modalities. (2) The intelligent generation plane orchestrates modality deployment with global resource awareness. 
(3) The symbiosis platform plane dynamically composes baseline capabilities to support modality coexistence.  Results and Discussions  The proposed architecture advances beyond conventional approaches by avoiding virtualization overhead through native deployment of network modalities directly on polymorphic network elements. Resource elementization and capability pooling jointly support efficient cross-modality resource sharing. Closed-loop interactions among the decision, generation, and symbiosis planes enable autonomous network evolution that adapts to time-varying service demands under unified control objectives.  Conclusions  A theoretically grounded framework is presented to support dynamic symbiosis of heterogeneous network modalities on shared infrastructure through business-driven decision mechanisms and autonomous evolution. The architecture provides a scalable foundation for future systems that integrate artificial intelligence. Future work will extend this paradigm to integrated 6G satellite-terrestrial scenarios, where spatial-temporal resource complementarity is expected to play a central role.
A Reliable Service Chain Option for Global Migration of Intelligent Twins in Vehicular Metaverses
QIU Xianyi, WEN Jinbo, KANG Jiawen, ZHANG Tao, CAI Chengjun, LIU Jiqiang, XIAO Ming
Available online  , doi: 10.11999/JEIT250612
Abstract:
  Objective   As an emerging paradigm that integrates metaverses with intelligent transportation systems, vehicular metaverses are becoming a driving force in the transformation of the automotive industry. Within this context, intelligent twins act as digital counterparts of vehicles, covering their entire lifecycle and managing vehicular applications to provide immersive services. However, seamless migration of intelligent twins across RoadSide Units (RSUs) faces challenges such as excessive transmission delays and data leakage, particularly under cybersecurity threats like Distributed Denial of Service (DDoS) attacks. To address these issues, this paper proposes a globally optimized scheme for secure and dynamic intelligent twin migration based on RSU chains. The proposed approach mitigates transmission latency and enhances network security, ensuring that intelligent twins can be migrated reliably and securely through RSU chains even in the presence of multiple types of DDoS attacks.  Methods   A set of reliable RSU chains is first constructed using a communication interruption–free mechanism, which enables the rational deployment of intelligent twins for seamless RSU connectivity. This mechanism ensures continuous communication by dynamically reconfiguring RSU chains according to real-time network conditions and vehicle mobility. The secure migration of intelligent twins along these RSU chains is then formulated as a Partially Observable Markov Decision Process (POMDP). The POMDP framework incorporates dynamic network state variables, including RSU load, available bandwidth, computational capacity, and attack type. These variables are continuously monitored to support decision-making. Migration efficiency and security are evaluated based on total migration delay and the number of DDoS attacks encountered; these metrics serve as reward functions for optimization. 
Deep Reinforcement Learning (DRL) agents iteratively learn from their interactions with the environment, refining RSU chain selection strategies to maximize both security and efficiency. Through this algorithm, the proposed scheme mitigates excessive transmission delays caused by network attacks in vehicular metaverses, ensuring reliable and secure intelligent twin migration even under diverse DDoS attack scenarios.  Results and Discussions   The proposed secure dynamic intelligent twin migration scheme employs a Multi-Agent DRL (MADRL) framework to select efficient and secure RSU chains within the POMDP. By defining a suitable reward function, the efficiency and security of intelligent twin migration are evaluated under varying RSU chain lengths and different attack scenarios. Simulation results confirm that the scheme enhances migration security in vehicular metaverses. Shorter RSU chains yield lower migration delays than longer ones, owing to reduced handovers and lower communication overhead (Fig. 2). Additionally, the total reward reaches its maximum when the RSU chain length is 6 (Fig. 3). The Multi-Agent Deep Q-Network (MADQN) approach exhibits strong defense capabilities against DDoS attacks. Under direct attacks, MADQN achieves final rewards that are 65.3% and 51.8% higher than those obtained by random and greedy strategies, respectively. Against indirect attacks, MADQN improves performance by 9.3%. Under hybrid attack conditions, MADQN increases the final reward by 29% and 30.9% compared with the random and greedy strategies, respectively (Fig. 4), demonstrating the effectiveness of the DRL-based defense strategy in handling complex and dynamic threats. Experimental comparisons with other DRL algorithms, including PPO, A2C, and QR-DQN, further highlight the superiority of MADQN under direct, indirect, and hybrid DDoS attacks (Figs. 5-7).
Overall, the proposed scheme ensures reliable and efficient intelligent twin migration across RSUs even under diverse security threats, thereby supporting high-quality interactions in vehicular metaverses.  Conclusions   This study addresses the challenge of secure and efficient global migration of intelligent twins in vehicular metaverses by integrating RSU chains with a POMDP-based optimization framework. Using the MADQN algorithm, the proposed scheme improves both the efficiency and security of intelligent twin migration under diverse network conditions and attack scenarios. Simulation results confirm significant gains in performance. Along identical driving routes, shorter RSU chains provide higher migration efficiency and stronger defense capabilities. Under various types of DDoS attacks, MADQN consistently outperforms baseline strategies, achieving higher final rewards than random and greedy approaches across all scenarios. Compared with other DRL algorithms, MADQN increases the final reward by up to 50.1%, demonstrating superior adaptability in complex attack environments. Future work will focus on enhancing the communication security of RSU chains, including the development of authentication mechanisms to ensure that only authorized vehicles can access RSU edge communication networks.
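The evaluation criterion described above — total migration delay plus the number of DDoS attacks encountered — can be sketched as a reward function. The weights below are hypothetical, and the greedy chooser is only a baseline; the paper learns this choice with MADQN under partial observability.

```python
def migration_reward(hop_delays_ms, attacks_encountered,
                     delay_weight=1.0, attack_penalty=50.0):
    """Illustrative reward for one candidate RSU-chain migration: penalize
    total delay and each DDoS attack encountered (weights are hypothetical)."""
    return -(delay_weight * sum(hop_delays_ms)
             + attack_penalty * attacks_encountered)

def greedy_chain_choice(candidates):
    """Greedy baseline: pick the candidate chain with the highest reward."""
    return max(candidates,
               key=lambda c: migration_reward(c["delays"], c["attacks"]))
```

With a sufficiently large attack penalty, a slightly slower but attack-free chain is preferred over a fast chain that passes through a compromised RSU, which matches the trade-off the reward function is meant to encode.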
Flexible Network Modal Packet Processing Pipeline Construction Mechanism for Cloud-Network Convergence Environment
ZHU Jun, XU Qi, ZHANG Fujun, WANG Yongjie, ZOU Tao, LONG Keping
Available online  , doi: 10.11999/JEIT250806
Abstract:
  Objective  With the deep integration of information network technologies and vertical application domains, the demand for cloud–network convergence infrastructure becomes increasingly significant, and the boundaries between cloud computing and network technologies are gradually fading. The advancement of cloud–network convergence technologies gives rise to diverse network service requirements, creating new challenges for the flexible processing of multimodal network packets. The device-level network modal packet processing flexible pipeline construction mechanism is essential for realizing an integrated environment that supports multiple network technologies. This mechanism establishes a flexible protocol packet processing pipeline architecture that customizes a sequence of operations such as packet parsing, packet editing, and packet forwarding according to different network modalities and service demands. By enabling dynamic configuration and adjustment of the processing flow, the proposed design enhances network adaptability and meets both functional and performance requirements across heterogeneous transmission scenarios.  Methods  Constructing a device-level flexible pipeline faces two primary challenges: (1) it must flexibly process diverse network modal packet protocols across polymorphic network element devices. This requires coordination of heterogeneous resources to enable rapid identification, accurate parsing, and correct handling of packets in various formats; (2) the pipeline construction must remain flexible, offering a mechanism to dynamically generate and configure pipeline structures that can adjust not only the number of stages but also the specific functions of each stage. To address these challenges, this study proposes a polymorphic network element abstraction model that integrates heterogeneous resources. 
The model adopts a hyper-converged hardware architecture that combines high-performance switching ASIC chips with more programmable but less computationally powerful FPGA and CPU devices. The coordinated operation of hardware and software ensures unified and flexible support for custom network protocols. Building upon the abstraction model, a protocol packet flexible processing compilation mechanism is designed to construct a configurable pipeline architecture that meets diverse network service transmission requirements. This mechanism adopts a three-stage compilation structure consisting of front-end, middle-end, and back-end processes. To address the mismatch between heterogeneous resources and the differentiated demands of network modalities, a flexible pipeline technology based on Intermediate Representation (IR) slicing is further proposed. This technology decomposes and reconstructs the integrated IR of multiple network modalities into several IR subsets according to specific optimization methods, preserving original functionality and semantics. By applying the IR slicing algorithm, the mechanism decomposes and maps the hybrid processing logic of multimodal networks onto heterogeneous hardware resources, including ASICs, FPGAs, and CPUs. This process enables flexible customization of network modal processing pipelines and supports adaptive pipeline construction for different transmission scenarios.  Results and Discussions  To demonstrate the construction effectiveness of the proposed flexible pipeline, a prototype verification system for polymorphic network elements is developed. As shown in Fig. 6, the system is equipped with Centec CTC8180 switch chips, multiple domestic FPGA chips, and domestic multi-core CPU chips. On this polymorphic network element prototype platform, protocol processing pipelines for IPv4, GEO, and MF network modalities are constructed, compiled, and deployed. As illustrated in Fig.
7, packet capture tests verify that different network modalities operate through distinct packet processing pipelines. To further validate the core mechanism of network modal flexible pipeline construction, the IR code size before and after slicing is compared across the three network modalities and allocation strategies described in Section 6.2. The integrated P4 code for the three modalities, after front-end compilation, produces an unsliced intermediate code containing 32,717 lines. During middle-end compilation, slicing is performed according to the modal allocation scheme, generating IR subsets for ASIC, CPU, and FPGA with code sizes of 23,164, 23,282, and 22,772 lines, respectively. The performance of multimodal protocol packet processing is then assessed, focusing on the effects of different traffic allocation strategies on network protocol processing performance. As shown in Fig. 9, the average packet processing delay in Scheme 1 is significantly higher than in the other schemes, reaching 4.237 milliseconds. In contrast, the average forwarding delays in Schemes 2, 3, and 4 decrease to 54.16 microseconds, 32.63 microseconds, and 15.48 microseconds, respectively. These results demonstrate that adjusting the traffic allocation strategy, particularly the distribution of CPU resources for GEO and MF modalities, effectively mitigates processing bottlenecks and markedly improves the efficiency of multimodal network communication.  Conclusions  Experimental evaluations verify the superiority of the proposed flexible pipeline in construction effectiveness and functional capability. The results indicate that the method effectively addresses complex network environments and diverse service demands, demonstrating stable and high performance. Future work focuses on further optimizing the architecture and expanding its applicability to provide more robust and flexible technical support for protocol packet processing in hyper-converged cloud–network environments.
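The partitioning step at the heart of IR slicing can be sketched as follows. This is a toy model: the op and field names are hypothetical, and the real mechanism slices compiler IR according to optimization methods, not simple records.

```python
def slice_ir(ir_ops, allocation):
    """Toy sketch of IR slicing: partition a combined multi-modality IR into
    per-target subsets according to a modality-to-target allocation scheme,
    preserving op order within each subset so semantics are retained."""
    subsets = {}
    for op in ir_ops:
        subsets.setdefault(allocation[op["modality"]], []).append(op)
    return subsets
```

Changing only the allocation mapping re-targets a modality's processing logic (e.g., moving GEO from CPU to FPGA) without touching the IR itself, which is what allows the pipeline to be reconstructed per transmission scenario.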
Ultra-Low-Power IM3 Backscatter Passive Sensing System for IoT Applications
HUANG Ruiyang, WU Pengde
Available online  , doi: 10.11999/JEIT250787
Abstract:
  Objective  With advances in wireless communication and electronic manufacturing, the Internet of Things (IoT) continues to expand across healthcare, agriculture, logistics, and other sectors. The rapid increase in IoT devices creates significant energy challenges, as billions of units generate substantial cumulative consumption, and battery-powered nodes require recurrent charging that raises operating costs and contributes to electronic waste. Energy-efficient strategies are therefore needed to support sustainable IoT deployment. Current approaches focus on improving energy availability and lowering device power demand. Energy Harvesting (EH) technology enables the collection and storage of solar, thermal, kinetic, and Radio Frequency (RF) energy for Ambient IoT (AmIoT) applications. However, conventional IoT devices, particularly those containing active RF components, often require high power, and limited EH efficiency can constrain real-time sensing transmission. To address these constraints, this work proposes a Third-Order Intermodulation (IM3) backscatter passive sensing system that enables direct analog sensing transmission while maintaining RF EH efficiency.  Methods  The IM3 signal is a nonlinear distortion product generated when two fundamental tones at f1 and f2 pass through nonlinear devices such as transistors and diodes, producing components at 2f1-f2 and 2f2-f1. The central contribution of this work is the establishment of a controllable functional relationship between sensor information and IM3 signal frequencies, enabling information encoding through IM3 frequency characteristics. The regulatory element is an embedded impedance module designed as a parallel resonant tank composed of resistors, inductors, and capacitors and integrated into the rectifier circuit.
Adjusting the tank’s resonant frequency regulates the conversion efficiency from the fundamental tones to IM3 components: when the resonant frequency approaches a target IM3 frequency, a high-impedance load is produced, lowering the conversion efficiency of that specific IM3 component while leaving other IM3 components unchanged. Sensor information modulates the resonant frequency by generating a DC voltage applied to a voltage-controlled varactor. By mapping sensor information to impedance states, impedance states to IM3 conversion efficiency, and IM3 frequency features back to sensor information, passive sensing is achieved.  Results and Discussions  A rectifying transmitter operating in the UHF 900 MHz band is designed and fabricated (Fig. 8). One signal source is fixed at 910.5 MHz, and the other scans 917~920 MHz, generating IM3 components in the 923.5~929.5 MHz range. Both sources provide an output power of 0 dBm, and the transmitted sensor information is expressed as a DC voltage. Experimental measurements show a power trough in the backscattered IM3 spectrum; as the DC voltage varies from 0 to 5 V, the trough position shifts accordingly (Fig. 9), with more than 10 dB attenuation across the range, giving adequate resolution determined by the varactor diode’s capacitance ratio. The embedded impedance module shows minimal effect on RF-to-DC efficiency (Fig. 10): at a fixed DC voltage, efficiency decreases by approximately 5 percentage points at the modulation frequency, independent of input power, and under fixed input power, different sampled voltages cause about 5 percentage points of efficiency reduction at different frequencies. These results confirm that the rectifier circuit maintains stable efficiency and meets low-power data transmission requirements.  Conclusions  This paper proposes a passive sensing system based on backscattered IM3 signals that enables simultaneous efficient RF EH and sensing readout.
The regulation mechanism between the difference-frequency embedded impedance module and backscattered IM3 intensity is demonstrated. Driven by sensing information, the module links the sensed quantity to IM3 intensity to realize passive readout. Experimental results show that the embedded impedance reduces the target-frequency IM3 component by more than 10 dB, and the RF-to-DC efficiency decreases by only 5 percentage points during readout. Tests in a microwave anechoic chamber indicate that the error between the IM3-derived bias voltage and the measured value remains within 5%, confirming stable operation. The system addresses the energy-information transmission constraint and supports battery-free communication for passive sensor nodes. It extends device lifespan and reduces maintenance costs in Ultra-Low-Power scenarios such as wireless sensor networks and implantable medical devices, offering strong engineering relevance.
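The band plan reported above follows directly from the two-tone IM3 arithmetic: with f1 fixed at 910.5 MHz and f2 swept over 917~920 MHz, the upper product 2f2-f1 sweeps exactly the 923.5~929.5 MHz range quoted in the abstract.

```python
def im3_frequencies(f1, f2):
    """Third-order intermodulation products of two tones at f1 and f2 (same
    unit in, same unit out): (2*f1 - f2, 2*f2 - f1)."""
    return (2 * f1 - f2, 2 * f2 - f1)

# Paper's band plan: f1 = 910.5 MHz fixed, f2 swept 917-920 MHz, so the
# upper IM3 product 2*f2 - f1 spans 923.5-929.5 MHz.
```

Checking the endpoints of the sweep against the quoted range is a quick consistency test for any two-tone IM3 setup.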
MCL-PhishNet: A Multi-Modal Contrastive Learning Network for Phishing URL Detection
DONG Qingwei, FU Xueting, ZHANG Benkui
Available online  , doi: 10.11999/JEIT250758
Abstract:
  Objective  The growing complexity and rapid evolution of phishing attacks present challenges to traditional detection methods, including feature redundancy, multi-modal mismatch, and limited robustness to adversarial samples.  Methods  MCL-PhishNet is proposed as a Multi-Modal Contrastive Learning framework that achieves precise phishing URL detection through a hierarchical syntactic encoder, bidirectional cross-modal attention mechanisms, and curriculum contrastive learning strategies. In this framework, multi-scale residual convolutions and Transformers jointly model local grammatical patterns and global dependency relationships of URLs, whereas a 17-dimensional statistical feature set improves robustness to adversarial samples. The dynamic contrastive learning mechanism optimizes the feature-space distribution through online spectral-clustering-based semantic subspace partitioning and boundary-margin constraints.  Results and Discussions  This study demonstrates consistent performance across different datasets (EBUU17 accuracy 99.41%, PhishStorm 99.41%, Kaggle 99.30%), validating the generalization capability of MCL-PhishNet. The three datasets differ significantly in sample distribution, attack types, and feature dimensions, yet the method in this study maintains stable high performance, indicating that the multimodal contrastive learning framework has strong cross-scenario adaptability. Compared to methods optimized for specific datasets, this approach avoids overfitting to particular dataset distributions through end-to-end learning and an adaptive feature fusion mechanism.  Conclusions  This paper addresses the core challenges in phishing URL detection, such as the difficulty of dynamic syntax pattern modeling, multimodal feature mismatches, and insufficient adversarial robustness, and proposes a multimodal contrastive learning framework, MCL-PhishNet. 
Through a collaborative mechanism of hierarchical syntax encoding, dynamic semantic distillation, and curriculum optimization, it achieves 99.41% accuracy and a 99.65% F1 score on datasets such as EBUU17 and PhishStorm, improving on existing state-of-the-art methods by 0.27%~3.76%. Experiments show that this approach effectively captures local variation patterns in URLs (such as the numeric-substitution attack in ‘payp41-log1n.com’) through a residual convolution–Transformer collaborative architecture and reduces the false detection rate of path-sensitive parameters to 0.07% via a bidirectional cross-modal attention mechanism. However, the proposed framework has relatively high complexity. Although the hierarchical encoding module of MCL-PhishNet (including multi-scale CNNs, Transformers, and gated networks) improves detection accuracy, it also increases the number of model parameters. Moreover, the current model is trained primarily on English-based public datasets, resulting in significantly reduced detection accuracy for non-Latin characters (such as Cyrillic domain confusions) and regional phishing strategies (such as ‘fake’ URLs targeting local payment platforms).
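To make the statistical-feature branch concrete, here are a few illustrative URL statistics of the kind used in such feature sets. The paper's exact 17 dimensions are not specified here, so these particular features and names are assumptions for illustration.

```python
import re
from urllib.parse import urlparse

def url_stat_features(url):
    """A handful of simple URL statistics (illustrative, not the paper's
    actual 17-dimensional feature set)."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc
    return {
        "url_len": len(url),
        "host_len": len(host),
        "num_digits": sum(ch.isdigit() for ch in url),   # catches 'payp41'
        "num_hyphens": url.count("-"),
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
    }
```

Features like digit counts and hyphen counts are exactly what flags character-substitution domains such as ‘payp41-log1n.com’, complementing the learned syntactic encoder.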
Research on Collaborative Reasoning Framework and Algorithms of Cloud-Edge Large Models for Intelligent Auxiliary Diagnosis Systems
HE Qian, ZHU Lei, LI Gong, YOU Zhengpeng, YUAN Lei, JIA Fei
Available online  , doi: 10.11999/JEIT250828
Abstract:
  Objective  The deployment of Large Language Models (LLMs) in intelligent auxiliary diagnosis is constrained by limited computing resources for local hospital deployment and by privacy risks related to the transmission and storage of medical data in cloud environments. Low-parameter local LLMs show 20%–30% lower accuracy in medical knowledge question answering and 15%–25% reduced medical knowledge coverage compared with full-parameter cloud LLMs, whereas cloud-based systems face inherent data security concerns. To address these issues, a cloud-edge LLM collaborative reasoning framework and related algorithms are proposed for intelligent auxiliary diagnosis systems. The objective is to design a cloud-edge collaborative reasoning agent equipped with intelligent routing and dynamic semantic desensitization to enable adaptive task allocation between the edge (hospital side) and cloud (regional cloud). The framework is intended to achieve a balance among diagnostic accuracy, data privacy protection, and resource efficiency, providing a practical technical path for the development of medical artificial intelligence systems.  Methods  The proposed framework adopts a layered architectural design composed of a four-tier progressive architecture on the edge side and a four-tier service-oriented architecture on the cloud side (Fig. 1). The edge side consists of resource, data, model, and application layers, with the model layer hosting lightweight medical LLMs and the cloud-edge collaborative agent. The cloud side comprises AI IaaS, AI PaaS, AI MaaS, and AI SaaS layers, functioning as a center for computing power and advanced models. The collaborative reasoning process follows a structured workflow (Fig. 2), beginning with user input parsed by the agent to extract key clinical features, followed by reasoning node decision-making.
Two core technologies support the agent: 1) Intelligent routing: This mechanism defaults to edge-side processing and dynamically selects the reasoning path (edge or cloud) through a dual-driven weight update strategy. It integrates semantic feature similarity computed through Chinese word segmentation and pre-trained medical language models and incorporates historical decision data, with an exponential moving average used to update feature libraries for adaptive optimization. 2) Dynamic semantic desensitization: Employing a three-stage architecture (sensitive entity recognition, semantic correlation analysis, and hierarchical desensitization decision-making), this technology identifies sensitive entities through a domain-enhanced Named Entity Recognition (NER) model, calculates entity sensitivity and desensitization priority, and applies a semantic similarity constraint to prevent excessive desensitization. Three desensitization strategies (complete deletion, general replacement, partial masking) are used based on entity sensitivity. Experimental validation is conducted with two open-source Chinese medical knowledge graphs (CMeKG and CPubMedKG) containing more than 2.7 million medical entities. The experimental environment (Fig. 3) deploys a qwen3:1.7b model on the edge and the Jiutian LLM on the cloud, with a 5,000-sample evaluation dataset divided into entity-level, relation-level, and subgraph-level questions. Performance is assessed with three metrics: answer accuracy, average token consumption, and average response time.  Results and Discussions  Experimental results show that the proposed framework achieves strong performance across the main evaluation dimensions. For answer accuracy, the intelligent routing mechanism attains 72.44% on CMeKG (Fig. 4) and 66.20% on CPubMedKG (Fig. 5), which are higher than the edge-side LLM alone (60.73% and 54.18%) and close to the cloud LLM (72.68% and 66.49%). 
These results indicate that the framework maintains diagnostic consistency with cloud-based systems while taking advantage of edge-side capabilities. For resource use, the intelligent routing model reduces average token consumption to 61.27, representing 45.63% of the cloud LLM’s token usage (131.68) (Fig. 6), which supports substantial cost reduction. For response time, the edge-side LLM shows latency greater than 6 s because of limited computing power, whereas the cloud LLM reaches 0.44 s latency through dedicated line access (8% of the 5.46 s latency under internet access). The intelligent routing model produces average latency values between those of the edge and cloud LLMs under both access modes (Fig. 7), consistent with expected trade-offs. The framework also shows applicability across common medical scenarios (Table 1), including outpatient triage, chronic disease management, medical image analysis, intensive care, and health consultation, by combining local real-time processing with cloud-based deep reasoning. Limitations appear in emergency rescue settings with weak network conditions because of latency constraints and in rare disease diagnosis because of limited edge-side training samples and potential loss of specific features during desensitization. Overall, the results verify that the cloud-edge collaborative reasoning mechanism reduces computing resource overhead while preserving consistency in diagnostic results.  Conclusions  This study constructs a cloud-edge LLM collaborative reasoning framework for intelligent auxiliary diagnosis systems, addressing the challenges of limited local computing power and cloud data privacy risks. Through the integration of intelligent routing, prompt engineering adaptation, and dynamic semantic desensitization, the framework achieves balanced optimization of diagnostic accuracy, data security, and resource economy. 
Experimental validation shows that its accuracy is comparable to cloud-only LLMs while resource consumption is substantially reduced, providing a feasible technical path for medical intelligence development. Future work focuses on three directions: intelligent on-demand scheduling of computing and network resources to mitigate latency caused by edge-side computing constraints; collaborative deployment of localized LLMs with Retrieval-Augmented Generation (RAG) to raise edge-side standalone accuracy above 90%; and expansion of diagnostic evaluation indicators to form a three-dimensional scenario–node–indicator system incorporating sensitivity, specificity, and AUC for clinical-oriented assessment.
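The intelligent routing idea described above (edge-first processing, semantic similarity against a feature library, and an exponential moving average update) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `embed` function is a toy stand-in for a pre-trained medical language model, and the threshold and smoothing factor are assumed values.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a pre-trained medical encoder (assumption)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class Router:
    def __init__(self, alpha: float = 0.1, threshold: float = 0.5):
        self.alpha = alpha          # EMA smoothing factor (assumed)
        self.threshold = threshold  # similarity needed for edge handling (assumed)
        self.edge_profile = None    # feature-library centroid of edge-solved cases

    def route(self, query: str) -> str:
        """Default to the edge; escalate to the cloud on low similarity."""
        if self.edge_profile is None:
            return "edge"
        sim = float(embed(query) @ self.edge_profile)
        return "edge" if sim >= self.threshold else "cloud"

    def record_success(self, query: str) -> None:
        """Exponential moving average update of the edge feature library."""
        q = embed(query)
        if self.edge_profile is None:
            self.edge_profile = q
        else:
            self.edge_profile = (1 - self.alpha) * self.edge_profile + self.alpha * q
            self.edge_profile /= np.linalg.norm(self.edge_profile)

r = Router()
assert r.route("chest pain, ECG normal") == "edge"  # cold start defaults to edge
r.record_success("chest pain, ECG normal")
```

In a real deployment the historical-decision component would also weight the similarity score, which is omitted here for brevity.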
Crosstalk-Free Frequency-Spin Multiplexed Multifunctional Device Realized by Nested Meta-Atoms
ZHANG Ming, DONG Peng, TAO En, YANG Lin, HAN Qi, HE Yuhang, HOU Weimin, LI Kang
Available online  , doi: 10.11999/JEIT251202
Abstract:
  Objective  To address the challenges of high manufacturing costs and signal crosstalk in existing multi-dimensional multiplexed metasurfaces, this study proposes a crosstalk-free, frequency-spin multiplexed single-layer metasurface based on nested bi-spectral meta-atoms. By physically superimposing two C-shaped split-ring resonators targeting the Ku-band (12.5 GHz) and K-band (22 GHz), the design achieves four fully independent information channels (two frequencies and two spin states) without relying on spatial division or multi-layer stacking. The objective is to demonstrate independent, high-performance vortex beam generation and holographic imaging, offering a simplified, low-cost solution for advanced 6G communication and sensing systems.  Methods  The metasurface employs a reflective metal-dielectric-metal structure where each unit cell nests an Inner C-shaped Split-Ring Resonator (ICSRR) within an Outer C-shaped Split-Ring Resonator (OCSRR). Through parameter sweeps using CST Microwave Studio, specific structures were selected to ensure high cross-polarization conversion at target frequencies while maintaining negligible response at non-target bands. Independent spin multiplexing is realized by combining transmission phase and geometric phase via controlled resonator rotation. Two prototypes were fabricated using PCB technology: MS1 for generating focused vortex beams (l = +1, +2, +3, +4) and MS2 for holographic imaging (“H”, “B”, “K”, “D”). Performance was validated via near-field scanning measurements under oblique incidence using a vector network analyzer.  Results and Discussions  Simulations and experimental measurements confirm the excellent frequency selectivity and spin decoupling of the nested design. The OCSRR and ICSRR dictate responses at 12.5 GHz and 22 GHz respectively, behaving as a linear superposition with minimal crosstalk. MS1 successfully generated four focused vortex beams with distinct topological charges, achieving an average mode purity of 88.25%. 
MS2 reconstructed four independent, clear holographic images with high channel isolation. The close agreement between measured results and simulations verifies the device's robustness and the effectiveness of the crosstalk-free design strategy under practical illumination conditions.  Conclusions  This work demonstrates a reliable method for constructing crosstalk-free frequency-spin multiplexed metasurfaces using nested meta-atoms. By enabling simultaneous, independent manipulation of electromagnetic waves across four channels on a single layer, the proposed approach significantly reduces design complexity and fabrication costs. The successful realization of multi-channel vortex beams and holography highlights the potential of this technology for integrated, multi-functional applications in next-generation wireless communications and optical systems.
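The spin decoupling exploited here rests on the geometric (Pancharatnam-Berry) phase: rotating a half-wave-plate-like resonator by an angle theta imprints a phase of ±2·theta on the two circular-polarization (spin) states. The following is a minimal numerical check of that mechanism using idealized Jones matrices; it is illustrative only and does not model the actual nested meta-atom geometry.

```python
import numpy as np

def rotated_element(theta: float) -> np.ndarray:
    """Jones matrix of an ideal anisotropic (half-wave) element rotated by theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    D = np.diag([1.0, -1.0])  # ideal half-wave response (assumption)
    return R @ D @ R.T        # R(theta) D R(-theta)

e_plus = np.array([1, 1j]) / np.sqrt(2)    # one circular (spin) state
e_minus = np.array([1, -1j]) / np.sqrt(2)  # opposite spin state

theta = 0.3
M = rotated_element(theta)
# Phase of the cross-polarized output component for each incident spin:
phase_plus = np.angle(np.vdot(e_minus, M @ e_plus))
phase_minus = np.angle(np.vdot(e_plus, M @ e_minus))
assert abs(phase_plus - 2 * theta) < 1e-9   # +2*theta for one spin
assert abs(phase_minus + 2 * theta) < 1e-9  # -2*theta for the other
```

Because the two spins pick up opposite geometric phases while the transmission phase is spin-independent, combining both phase types allows fully independent phase profiles per spin channel.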
Minimax Robust Kalman Filtering under Multistep Random Measurement Delays and Packet Dropouts
YANG Chunshan, ZHAO Ying, LIU Zheng, QIU Yuan, JING Benqin
Available online  , doi: 10.11999/JEIT250741
Abstract:
  Objective  Networked Control Systems (NCSs) provide advantages such as flexible installation, convenient maintenance, and reduced cost, but they also present challenges arising from random measurement delays and packet dropouts caused by communication network unreliability and limited bandwidth. Moreover, system noise variance may fluctuate significantly under strong electromagnetic interference. In NCSs, time delays are random and uncertain. When a set of Bernoulli-distributed random variables is used to describe multistep random measurement delays and packet dropouts, the fictitious noise method in existing studies introduces autocorrelation among different components, which complicates the computation of fictitious noise variances and makes it difficult to establish robustness. This study presents a solution for minimax robust Kalman filtering in systems characterized by uncertain noise variance, multistep random measurement delays, and packet dropouts.  Methods  The main challenges lie in model transformation and robustness verification. When a set of Bernoulli-distributed random variables is employed to represent multistep random measurement delays and packet dropouts, a series of strategies are applied to address the minimax robust Kalman filtering problem. First, a new model transformation method is proposed based on the flexibility of the Hadamard product in multidimensional data processing, after which a robust time-varying Kalman estimator is designed in a unified framework following the minimax robust filtering principle. Second, the robustness proof is established using matrix elementary transformation, strictly diagonally dominant matrices, the Geršgorin circle theorem, and the Hadamard product theorem within the framework of the generalized Lyapunov equation method. 
Additionally, by converting the Hadamard product into a matrix product through matrix factorization, a sufficient condition for the existence of a steady-state estimator is derived, and the robust steady-state Kalman estimator is subsequently designed.  Results and Discussions  The proposed minimax robust Kalman filter extends the robust Kalman filtering framework and provides new theoretical support for addressing the robust fusion filtering problem in complex NCSs. The curves (Fig. 5) present the actual accuracy ${\text{tr}}\bar{\mathbf{P}}^{l}(N)$, $l = a,b,c,d$, as a function of $0.1 \le \alpha_0, \alpha_1, \alpha_2 \le 1$. It is observed that situation (1) achieves the highest robust accuracy, followed by situations (2) and (3), whereas situation (4) exhibits poorer accuracy. This difference arises because the estimators in situation (1) receive measurements with one-step random delay, whereas situation (4) experiences a higher packet loss rate. The curves (Fig. 5) confirm the validity and effectiveness of the proposed method. Another simulation is conducted for a mass-spring-damper system. The comparison between the proposed approach and the optimal robust filtering method (Table 2, Fig. 7) indicates that although the proposed method ensures that the actual prediction error variance attains the minimum upper bound, its actual accuracy is slightly lower than the optimal prediction accuracy.  Conclusions  The minimax robust Kalman filtering problem is investigated for systems characterized by uncertain noise variance, multistep random measurement delays, and packet dropouts. 
The system noise variance is uncertain but bounded by known conservative upper limits, and a set of Bernoulli-distributed random variables with known probabilities is used to represent the multistep random measurement delays and packet dropouts between the sensor and the estimator. The Hadamard product is used to enhance the model transformation method, followed by the design of a minimax robust time-varying Kalman estimator. Robustness is demonstrated through matrix elementary transformation, the Geršgorin circle theorem, the Hadamard product theorem, matrix factorization, and the Lyapunov equation method. A sufficient condition is established for the time-varying generalized Lyapunov equation to possess a unique steady-state positive semidefinite solution, based on which a robust steady-state estimator is constructed. The convergence between the time-varying and steady-state estimators is also proven. Two simulation examples verify the effectiveness of the proposed approach. The presented methods overcome the limitations of existing techniques and provide theoretical support for solving the robust fusion filtering problem in complex NCSs.
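The kind of algebraic step this abstract relies on can be checked numerically. The snippet below is an illustrative sketch, not the paper's derivation: it verifies the standard factorization that turns a Hadamard product with a rank-one term into ordinary matrix products, and the Schur product theorem (the Hadamard product of positive semidefinite matrices is positive semidefinite), which is the type of fact that underpins upper-bound robustness proofs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
b = rng.normal(size=4)

# Factorization identity: A o (b b^T) = diag(b) @ A @ diag(b)
hadamard = A * np.outer(b, b)            # elementwise (Hadamard) product
factored = np.diag(b) @ A @ np.diag(b)   # equivalent ordinary matrix product
assert np.allclose(hadamard, factored)

# Schur product theorem: Hadamard product of PSD matrices stays PSD
P = A @ A.T           # PSD matrix
Q = np.outer(b, b)    # rank-1 PSD matrix
eigs = np.linalg.eigvalsh(P * Q)
assert eigs.min() > -1e-10
```

Converting the Hadamard product into a plain matrix product in this way is what allows standard Lyapunov-equation machinery to be applied to the steady-state analysis.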
Power Grid Data Recovery Method Driven by Temporal Composite Diffusion Networks
YAN Yandong, LI Chenxi, LI Shijie, YANG Yang, GE Yuhao, HUANG Yu
Available online  , doi: 10.11999/JEIT250435
Abstract:
  Objective  Smart grid construction drives modern power systems, and distribution networks serve as the key interface between the main grid and end users. Their stability, power quality, and efficiency depend on accurate data management and analysis. Distribution networks generate large volumes of multi-source heterogeneous data that contain user consumption records, real-time meteorology, equipment status, and marketing information. These data streams often become incomplete during collection or transmission due to noise, sensor failures, equipment aging, or adverse weather. Missing data reduces the reliability of real-time monitoring and affects essential tasks such as load forecasting, fault diagnosis, health assessment, and operational decision making. Conventional approaches such as mean or regression imputation lack the capacity to maintain temporal dependencies. Generative models such as Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs) do not represent the complex statistical characteristics of grid data with sufficient accuracy. This study proposes a diffusion model based data recovery method for distribution networks. The method is designed to reconstruct missing data, preserve semantic and statistical integrity, and enhance data utility to support smart grid stability and efficiency.  Methods  This paper proposes a power grid data augmentation method based on diffusion models. The core of the method is that input Gaussian noise is mapped to the target distribution space of the missing data so that the recovered data follows its original distribution characteristics. To reduce semantic discrepancy between the reconstructed data and the actual data, the method uses time series sequence embeddings as conditional information. This conditional input guides and improves the diffusion generation process so that the imputation remains consistent with the surrounding temporal context.  
Results and Discussions  Experimental results show that the proposed diffusion model based data augmentation method achieves higher accuracy in recovering missing power grid data than conventional approaches. The performance demonstrates that the method improves the completeness and reliability of datasets that support analytical tasks and operational decision making in smart grids.  Conclusions  This study proposes and validates a diffusion model based data augmentation method designed to address data missingness in power distribution networks. Traditional restoration methods and generative models have difficulty capturing the temporal dependencies and complex distribution characteristics of grid data. The method presented here uses temporal sequence information as conditional guidance, which enables accurate imputation of missing values and preserves the semantic integrity and statistical consistency of the original data. By improving the accuracy of distribution network data recovery, the method provides a reliable approach for strengthening data quality and supports the stability and efficiency of smart grid operations.
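The diffusion-based recovery described above can be illustrated with the standard closed-form forward (noising) process and a conditioning mask that keeps observed values fixed. This is a conceptual sketch under assumed settings (linear noise schedule, toy daily load curve, masked-block missingness); the paper's actual network and schedule are not reproduced.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (assumption)
alphas_bar = np.cumprod(1.0 - betas)     # cumulative product \bar{alpha}_t

def q_sample(x0, t, rng):
    """Closed-form forward step: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(1)
load_window = np.sin(np.linspace(0, 2 * np.pi, 96))  # toy 15-min load curve
mask = np.ones(96, bool)
mask[40:56] = False                                   # simulated missing block

x_noisy = q_sample(load_window, T - 1, rng)
# At t = T-1 almost all signal is destroyed; a conditional reverse network
# would denoise only the masked block, guided by the observed context:
conditioned = np.where(mask, load_window, x_noisy)    # keep observed values
assert alphas_bar[-1] < 1e-3                          # signal ~ fully noised
assert np.allclose(conditioned[mask], load_window[mask])
```

The conditional guidance in the paper plays the role of the observed context here: the reverse process is steered so the imputed block remains statistically consistent with the surrounding time series.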
Research on Directional Modulation Multi-carrier Waveform Design for Integrated Sensing and Communication
HUANG Gaojian, ZHANG Shengzhuang, DING Yuan, LIAO Kefei, JIN Shuanggen, LI Xingwang, OUYANG Shan
Available online  , doi: 10.11999/JEIT250680
Abstract:
  Objective  With the concurrent evolution of wireless communication and radar technologies, spectrum congestion has become increasingly severe. Integrated Sensing and Communication (ISAC) has emerged as an effective approach that unifies sensing and communication functionalities to achieve efficient spectrum and hardware sharing. Orthogonal Frequency Division Multiplexing (OFDM) signals are regarded as a key candidate waveform due to their high flexibility. However, estimating target azimuth angles and suppressing interference from non-target directions remain computationally demanding, and confidential information transmitted in these directions is vulnerable to eavesdropping. To address these challenges, the combination of Directional Modulation (DM) and OFDM, termed OFDM-DM, provides a promising solution. This approach enables secure communication toward the desired direction, suppresses interference in other directions, and reduces radar signal processing complexity. The potential of OFDM-DM for interference suppression and secure waveform design is investigated in this study.  Methods  As a physical-layer security technique, DM is used to preserve signal integrity in the intended direction while deliberately distorting signals in other directions. Based on this principle, an OFDM-DM ISAC waveform is developed to enable secure communication toward the target direction while simultaneously estimating distance, velocity, and azimuth angle. The proposed waveform has two main advantages: the Bit Error Rate (BER) at the radar receiver is employed for simple and adjustable azimuth estimation, and interference from non-target directions is suppressed without additional computational cost. The waveform maintains the OFDM constellation in the target direction while distorting constellation points elsewhere, which reduces correlation with the original signal and enhances target detection through time-domain correlation. 
Moreover, because element-wise complex division in the Two-Dimensional Fast Fourier Transform (2-D FFT) depends on signal integrity, phase distortion in signals from non-target directions disrupts phase relationships and further diminishes the positional information of interference sources.  Results and Discussions  In the OFDM-DM ISAC system, the transmitted signal retains its communication structure within the target beam, whereas constellation distortion occurs in other directions. Therefore, the BER at the radar receiver exhibits a pronounced main lobe in the target direction, enabling accurate azimuth estimation (Fig. 5). In the time-domain correlation algorithm, the target distance is precisely determined, while correlation in non-target directions deteriorates markedly due to DM, thereby achieving effective interference suppression (Fig. 6). Additionally, during 2-D FFT processing, signal distortion disrupts the linear phase relationship among modulation symbols in non-target directions, causing conventional two-dimensional spectral estimation to fail and further suppressing positional information of interference sources (Fig. 7). Additional simulations yield one-dimensional range and velocity profiles (Fig. 8). The results demonstrate that the OFDM-DM ISAC waveform provides structural flexibility, physical-layer security, and low computational complexity, making it particularly suitable for environments requiring high security or operating under strong interference conditions.  Conclusions  This study proposes an OFDM-DM ISAC waveform and systematically analyzes its advantages in both sensing and communication. The proposed waveform inherently suppresses interference from non-target directions, eliminating target ambiguity commonly encountered in traditional ISAC systems and thereby enhancing sensing accuracy. 
Owing to the spatial selectivity of DM, only legitimate directions can correctly demodulate information, whereas unintended directions fail to recover valid data, achieving intrinsic physical-layer security. Compared with existing methods, the proposed waveform simultaneously attains secure communication and interference suppression without additional computational burden, offering a lightweight and high-performance solution suitable for resource-constrained platforms. Therefore, the OFDM-DM ISAC waveform enables high-precision sensing while maintaining communication security and hardware feasibility, providing new insights for multi-carrier ISAC waveform design.
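The directional-modulation effect on communication quality can be demonstrated with a toy QPSK experiment: symbols stay intact in the target direction but acquire direction-dependent random phases elsewhere, driving the non-target BER toward 0.5. This sketch is illustrative only; it models neither the OFDM structure nor the array geometry of the proposed waveform.

```python
import numpy as np

rng = np.random.default_rng(7)
bits = rng.integers(0, 2, size=2000)
# Gray-mapped QPSK, 2 bits per symbol: bit 0 -> +1, bit 1 -> -1 per axis
sym = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)

def demod(r):
    """Hard-decision QPSK demodulation back to a bit stream."""
    out = np.empty(r.size * 2, int)
    out[0::2] = (r.real < 0).astype(int)
    out[1::2] = (r.imag < 0).astype(int)
    return out

target_rx = sym                                    # intended direction: clean
# DM-style distortion: per-symbol random phase in non-target directions
eaves_rx = sym * np.exp(1j * rng.uniform(0, 2 * np.pi, sym.size))

ber_target = np.mean(demod(target_rx) != bits)
ber_eaves = np.mean(demod(eaves_rx) != bits)
assert ber_target == 0.0
assert 0.4 < ber_eaves < 0.6   # near-random for non-target directions
```

The same per-symbol phase distortion is what breaks the linear phase relationships that 2-D FFT range-Doppler processing depends on, which is why interference from non-target directions is suppressed without extra computation.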
Optimized Design of Non-Transparent Bridge for Heterogeneous Interconnects in Hyper-converged Infrastructure
ZHENG Rui, SHEN Jianliang, LV Ping, DONG Chunlei, SHAO Yu, ZHU Zhengbin
Available online  , doi: 10.11999/JEIT250272
Abstract:
  Objective  The integration of heterogeneous computing resource clusters into modern Hyper-Converged Infrastructure (HCI) systems imposes stricter performance requirements in latency, bandwidth, throughput, and cross-domain transmission stability. Traditional HCI systems primarily rely on the Ethernet TCP/IP protocol, which exhibits inherent limitations, including low bandwidth efficiency, high latency, and limited throughput. Existing PCIe Switch products typically employ Non-Transparent Bridges (NTBs) for conventional dual-system connections or intra-server communication; however, they do not meet the performance demands of heterogeneous cross-domain transmission within HCI environments. To address this limitation, a novel Dual-Mode Non-Transparent Bridge Architecture (D-MNTBA) is proposed to support dual transmission modes. D-MNTBA combines a fast transmission mode via a bypass mechanism with a stable transmission mode derived from the Traditional Data Path Architecture (TDPA), thereby aligning with the data characteristics and cross-domain streaming demands of HCI systems. Hardware-level enhancements in address and ID translation schemes enable D-MNTBA to support more complex mappings while minimizing translation latency. These improvements increase system stability and effectively support the cross-domain transmission of heterogeneous data in HCI systems.  Methods  To overcome the limitations of traditional single-pass architectures and the bypass optimizations of the TDPA, the proposed D-MNTBA incorporates both a fast transmission path and a stable transmission path. This dual-mode design enables the NTB to leverage the data characteristics of HCI systems for message-based streaming, thereby reducing dependence on intermediate protocols and data format conversions. 
The stable transmission mode ensures reliable message delivery, while the fast transmission mode, enhanced through hardware-level optimizations in address and ID translation, supports time-critical cross-domain communication. This combination improves overall transmission performance by reducing both latency and system overhead. To meet the low-latency demands of the bypass transmission path, the architecture implements hardware-level enhancements to the address and ID conversion modules. The address translation module is expanded with a larger lookup table, allowing for more complex and flexible mapping schemes. This enhancement enables efficient utilization of non-contiguous and fragmented address spaces without compromising performance. Simultaneously, the ID conversion module is optimized through multiple conversion strategies and streamlined logic, significantly reducing the time required for ID translation.  Results and Discussions  Address translation in the proposed D-MNTBA is validated through emulation within a constructed HCI environment. The simulation log for indirect address translation shows no errors or deadlocks, and successful hits are observed on BAR2/3. During dual-host disk access, packet header addresses and payload content remain consistent, with no packet loss detected (Fig. 14), indicating that indirect address translation is accurately executed under D-MNTBA. ID conversion performance is evaluated by comparing the proposed architecture with the TDPA implemented in the PEX8748 chip. The switch based on D-MNTBA exhibits significantly shorter ID conversion times. A maximum reduction of approximately 34.9% is recorded, with an ID conversion time of 71 ns for a 512-byte payload (Fig. 15). These findings suggest that the ID function mapping method adopted in D-MNTBA effectively reduces conversion latency and enhances system performance. Throughput stability is assessed under sustained heavy traffic with payloads ranging from 256 to 2048 bytes. 
The maximum throughputs of D-MNTBA, the Ethernet card, and PEX8748 are measured at 1.36 GB/s, 0.97 GB/s, and 0.9 GB/s, respectively (Fig. 16). Compared with PEX8748 and the Ethernet architecture, D-MNTBA improves throughput by approximately 51.1% and 40.2%, respectively, and shows the slowest degradation trend, reflecting superior stability in heterogeneous cross-domain transmission. Bandwidth comparison reveals that D-MNTBA outperforms TDPA and the Ethernet card, with bandwidth improvements of approximately 27.1% and 19.0%, respectively (Fig. 17). These results highlight the significant enhancement in cross-domain transmission performance achieved by the proposed architecture in heterogeneous environments.  Conclusions  This study proposes the Dual-Mode Non-Transparent Bridge Architecture (D-MNTBA) to address the challenges of heterogeneous interconnection in HCI systems. By integrating a fast transmission path enabled by a bypass architecture with the stable transmission path of the TDPA, D-MNTBA accommodates the specific data characteristics of cross-domain transmission in heterogeneous environments and enables efficient message routing. D-MNTBA enhances transmission stability while improving system-wide performance, offering robust support for time-critical cross-domain transmission in HCI. It also reduces latency and overhead, thereby improving overall transmission efficiency. Compared with existing transmission schemes, D-MNTBA achieves notable gains in performance, making it a suitable solution for the demands of heterogeneous domain interconnects in HCI systems. However, the architectural enhancements, particularly the bypass design and associated optimizations, increase logic resource utilization and power consumption. Future work should focus on refining hardware design, layout, and wiring strategies to reduce logic complexity and resource consumption without compromising performance.
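The lookup-table-based indirect address translation discussed above can be sketched in a few lines. This is a simplified software model under assumed window layouts, not the D-MNTBA register layout: each entry maps a local window behind BAR2/3 to a base address in the peer domain, so non-contiguous, fragmented address spaces translate independently.

```python
# (local window base, window size, remote base) entries; values are illustrative
LUT = [
    (0x0000_0000, 0x1000, 0x8000_0000),
    (0x0000_2000, 0x1000, 0x9000_0000),  # fragmented windows map independently
]

def translate(addr: int) -> int:
    """Map an inbound BAR offset into the peer domain's address space."""
    for base, size, remote in LUT:
        if base <= addr < base + size:
            return remote + (addr - base)
    # A real NTB would signal an unsupported request here
    raise ValueError(f"no LUT hit for 0x{addr:08x}")

assert translate(0x0000_0010) == 0x8000_0010
assert translate(0x0000_2FFF) == 0x9000_0FFF
```

Enlarging the table (as the architecture does in hardware) adds mapping flexibility; the hardware point of the design is that the match-and-add above happens in combinational logic rather than in a software loop.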
Considering Workload Uncertainty in Policy Gradient-based Hyper-heuristic Scheduling for Software Projects
SHEN Xiaoning, SHI Jiangyi, MA Yanzhao, CHEN Wenyan, SHE Juan
Available online  , doi: 10.11999/JEIT250769
Abstract:
  Objective  The Software Project Scheduling Problem (SPSP) is essential for allocating resources and arranging tasks in software development, and it affects economic efficiency and competitiveness. Deterministic assumptions used in traditional models overlook common fluctuations in task effort caused by requirement changes or estimation deviation. These assumptions often reduce feasibility and weaken scheduling stability in dynamic development settings. This study develops a multi-objective model that integrates task effort uncertainty and represents it using asymmetric triangular interval type-2 fuzzy numbers to reflect real development conditions. The aim is to improve decision quality under uncertainty by designing an optimization method that shortens project duration and increases employee satisfaction, thereby strengthening robustness and adaptability in software project scheduling.  Methods  A Policy Gradient-based Hyper-Heuristic Algorithm (PGHHA) is developed to solve the formulated model. The framework contains a High-Level Strategy (HLS) and a set of Low-Level Heuristics (LLHs). The High-Level Strategy applies an Actor-Critic reinforcement learning structure. The Actor network selects appropriate LLHs based on real-time evolutionary indicators, including population convergence and diversity, and the Critic network evaluates the actions selected by the Actor. Eight LLHs are constructed by combining two global search operators, the matrix crossover operator and the Jaya operator with random jitter, with two local mining strategies, duration-based search and satisfaction-based search. Each LLH is configured with two neighborhood depths (V1=5 and V2=20), determined through Taguchi orthogonal experiments. Each candidate solution is encoded as a real-valued task-employee effort matrix. Constraints including skill coverage, maximum dedication, and maximum participant limits are applied during optimization. 
A prioritized experience replay mechanism is introduced to reuse historical trajectories, which accelerates convergence and improves network updating efficiency.  Results and Discussions  Experimental evaluation is performed on twelve synthetic cases and three real software projects. The algorithm is assessed against six representative methods to validate the proposed strategies. HyperVolume Ratio (HVR) and Inverted Generational Distance (IGD) are used as performance indicators, and statistical significance is examined using Wilcoxon rank-sum tests with a 0.05 threshold. The findings show that the PGHHA achieves better convergence and diversity than all comparison methods in most cases. The quantitative improvements are reflected in the summarized values (Table 5, Table 6). The visual distribution of Pareto fronts (Fig. 4, Fig. 5) shows that the obtained solutions lie below those of alternative algorithms and display more uniform coverage, indicating higher convergence precision and improved spread. The computational cost increases because of neural network training and the experience replay mechanism, as shown in Fig. 6. However, the improvement in solution quality is acceptable considering the longer planning period of software development. Modeling effort uncertainty with asymmetric triangular interval type-2 fuzzy numbers enhances system stability. The adaptive heuristic selection driven by the Actor-Critic mechanism and the prioritized experience replay strengthens performance under dynamic and uncertain conditions. Collectively, the evidence indicates that the PGHHA provides more reliable support for software project scheduling, maintaining diversity while optimizing conflicting objectives under uncertain workload environments.  Conclusions  A multi-objective software project scheduling model is developed in this study, where task effort uncertainty is represented using asymmetric triangular interval type-2 fuzzy numbers. 
A PGHHA is designed to solve the model. The algorithm applies an Actor-Critic reinforcement learning structure as the high-level strategy to adaptively select LLHs according to the evolutionary state. A prioritized experience replay mechanism is incorporated to enhance learning efficiency and accelerate convergence. Tests on synthetic and real cases show that: (1) The proposed algorithm delivers stronger convergence and diversity under uncertainty than six representative algorithms; (2) The combination of global search operators and local mining strategies maintains a suitable balance between exploration and exploitation; (3) The use of type-2 fuzzy representation offers a more stable characterization of effort uncertainty than type-1 fuzzy numbers. The current work focuses on a single-project context. Future work will extend the model to multi-project environments with shared resources and inter-project dependencies. Additional research will examine adaptive reward strategies and lightweight network designs to reduce computational demand while preserving solution quality.
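The core of the high-level strategy, a policy that selects among low-level heuristics and is reinforced when its choice improves the population, can be sketched with a minimal REINFORCE-style update. Everything here is an illustrative assumption: a linear softmax policy replaces the paper's Actor-Critic networks, the two state features stand in for the convergence and diversity indicators, and rewards are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_llh, n_feat = 8, 2            # 8 LLHs; state = (convergence, diversity)
W = np.zeros((n_feat, n_llh))   # linear policy parameters (assumption)

def policy(state: np.ndarray) -> np.ndarray:
    """Softmax probabilities over the low-level heuristics."""
    z = state @ W
    p = np.exp(z - z.max())
    return p / p.sum()

def update(state: np.ndarray, action: int, reward: float, lr: float = 0.2):
    """Policy gradient: d log pi(a|s)/dW = outer(state, one_hot(a) - pi)."""
    global W
    p = policy(state)
    grad = -np.outer(state, p)
    grad[:, action] += state
    W = W + lr * reward * grad

state = np.array([0.5, 0.5])    # fixed toy evolutionary-state features

# A positively rewarded choice must become more probable:
p_before = policy(state)[3]
update(state, action=3, reward=1.0)
assert policy(state)[3] > p_before

# If one heuristic consistently improves the population, the policy
# concentrates on it:
for _ in range(500):
    a = int(rng.choice(n_llh, p=policy(state)))
    update(state, a, reward=1.0 if a == 2 else -0.2)
assert int(np.argmax(policy(state))) == 2
```

The actual algorithm additionally uses a Critic to estimate the baseline and prioritized experience replay to reuse past (state, action, reward) trajectories, which this sketch omits.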
An Implicit Certificate-Based Lightweight Authentication Scheme for Power Industrial Internet of Things
WANG Sheng, ZHANG Linghao, TENG Yufei, LIU Hongli, HAO Junyang, WU Wenjuan
Available online  , doi: 10.11999/JEIT250457
Abstract:
  Objective  The rapid development of the Internet of Things, cloud computing, and edge computing drives the evolution of the Power Industrial Internet of Things (PIIoT) into core infrastructure for smart power systems. In this architecture, terminal devices collect operational data and send it to edge gateways for preliminary processing before transmission to cloud platforms for further analysis and control. This structure improves efficiency, reliability, and security in power systems. However, the integration of traditional industrial systems with open networks introduces cybersecurity risks. Resource-constrained devices in PIIoT are exposed to threats that may lead to data leakage, privacy exposure, or disruption of power services. Existing authentication mechanisms either impose high computational and communication overhead or lack sufficient protection, such as forward secrecy or resistance to replay and man-in-the-middle attacks. This study focuses on designing a lightweight and secure authentication method suitable for the PIIoT environment. The method is intended to meet the operational needs of power terminal devices with limited computing capability while ensuring strong security protection.  Methods  A secure and lightweight identity authentication scheme is designed to address these challenges. Implicit certificate technology is applied during device identity registration, embedding public key authentication information into the signature rather than transmitting a complete certificate during communication. Compared with explicit certificates, implicit certificates are shorter and allow faster verification, reducing transmission and validation overhead. Based on this design, a lightweight authentication protocol is constructed using only hash functions, XOR operations, and elliptic curve point multiplication. 
This protocol supports secure mutual authentication and session key agreement while remaining suitable for resource-constrained power terminal devices. A formal analysis is then performed to evaluate security performance. The results show that the scheme achieves secure mutual authentication, protects session key confidentiality, ensures forward secrecy, and resists replay and man-in-the-middle attacks. Finally, experimental comparisons with advanced authentication protocols are conducted. The results indicate that the proposed scheme requires significantly lower computational and communication overhead, supporting its feasibility for practical deployment.  Results and Discussions  The proposed scheme is evaluated through simulation and numerical comparison with existing methods. The implementation is performed on a virtual machine configured with 8 GB RAM, an Intel i7-12700H processor, and Ubuntu 22.04, using the Miracl-Python cryptographic library. The security level is set to 128 bits, with the ed25519 elliptic curve, SHA-256 hash function, and AES-128 symmetric encryption. Table 1 summarizes the performance of the cryptographic primitives. As shown in Table 2, the proposed scheme achieves the lowest computational cost, requiring three elliptic curve point multiplications on the device side and five on the gateway side. These values are substantially lower than those of traditional certificate-based authentication, which may require up to 14 and 12 operations, respectively. Compared with other representative authentication approaches, the proposed method further reduces the computational burden on devices, improving suitability for resource-limited environments. Table 3 shows that communication overhead is also minimized, with the smallest total message size (3 456 bits) and three communication rounds, attributed to the implicit certificate mechanism. As shown in Fig. 5, the authentication process exhibits the shortest execution time among all evaluated schemes. 
The runtime is 47.72 ms on devices and 82.88 ms on gateways, indicating lightweight performance and suitability for deployment in Industrial Internet of Things applications.  Conclusions  A lightweight and secure identity authentication scheme based on implicit certificates is presented for resource-constrained terminal devices in the PIIoT. Through the integration of a low-overhead authentication protocol and efficient certificate processing, the scheme maintains a balance between security and performance. It enables secure mutual authentication, protects session key confidentiality, and ensures forward secrecy while keeping computational and communication overhead minimal. Security analysis and experimental evaluation confirm that the scheme provides stronger protection and higher efficiency compared with existing approaches. It offers a practical and scalable solution for enhancing the security architecture of modern power systems.
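The computational-cost comparison above can be made concrete with a small accounting sketch. The elliptic-curve point multiplication (ECPM) counts (3 device-side and 5 gateway-side for the proposed scheme, up to 14 and 12 for traditional certificate-based authentication) come from the abstract; the per-operation timing below is a hypothetical placeholder, not a measured value from the paper.

```python
# Hedged sketch: per-side ECPM accounting for the two schemes.
# Operation counts are from the abstract; ECPM_MS is hypothetical.

ECPM_MS = 2.0  # hypothetical cost of one ECPM, in milliseconds

SCHEMES = {
    # name: (device-side ECPMs, gateway-side ECPMs)
    "proposed (implicit certificate)": (3, 5),
    "traditional certificate-based": (14, 12),
}

def ecpm_cost_ms(device_ops, gateway_ops, per_op_ms=ECPM_MS):
    """Total ECPM time if device and gateway operations run sequentially."""
    return (device_ops + gateway_ops) * per_op_ms

for name, (dev, gw) in SCHEMES.items():
    print(f"{name}: {ecpm_cost_ms(dev, gw):.1f} ms in ECPM operations")
```

Whatever the actual per-operation cost, the proposed scheme's total scales with 8 ECPMs rather than 26, which is the source of the reported overhead reduction.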
Unsupervised Anomaly Detection of Hydro-Turbine Generator Acoustics by Integrating Pre-Trained Audio Large Model and Density Estimation
WU Ting, WEN Shulin, YAN Zhaoli, FU Gaoyuan, LI Linfeng, LIU Xudu, CHENG Xiaobin, YANG Jun
Available online  , doi: 10.11999/JEIT250934
Abstract:
  Objective  Hydro-Turbine Generator Units (HTGUs) require reliable early fault detection to maintain operational safety and reduce maintenance cost. Acoustic signals provide a non-intrusive and sensitive monitoring approach, but their use is limited by complex structural acoustics, strong background noise, and the scarcity of abnormal data. An unsupervised acoustic anomaly detection framework is presented, in which a large-scale pretrained audio model is integrated with density-based k-nearest neighbors estimation. This framework is designed to detect anomalies using only normal data and to maintain robustness and strong generalization across different operational conditions of HTGUs.  Methods  The framework performs unsupervised acoustic anomaly detection for HTGUs using only normal data. Time-domain signals are preprocessed with Z-score normalization and Fbank features, and random masking is applied to enhance robustness and generalization. A large-scale pretrained BEATs model is used as the feature encoder, and an Attentive Statistical Pooling module aggregates frame-level representations into discriminative segment-level embeddings by emphasizing informative frames. To improve class separability, an ArcFace loss replaces the conventional classification layer during training, and a warm-up learning rate strategy is adopted to ensure stable convergence. During inference, density-based k-nearest neighbors estimation is applied to the learned embeddings to detect acoustic anomalies.  Results and Discussions  The effectiveness of the proposed unsupervised acoustic anomaly detection framework for HTGUs is examined using data collected from eight real-world machines. As shown in Fig. 7 and Table 2, large-scale pretrained audio representations show superior capability compared with traditional features in distinguishing abnormal sounds. 
With the FED-KE algorithm, the framework attains high accuracy across six metrics, with Hmean reaching 98.7% in the wind tunnel and exceeding 99.9% in the slip-ring environment, indicating strong robustness under complex industrial conditions. As shown in Table 4, ablation studies confirm the complementary effects of feature enhancement, ASP-based representation refinement, and density-based k-NN inference. The framework requires only normal data for training, reducing dependence on scarce fault labels and enhancing practical applicability. Remaining challenges include computational cost introduced by the pretrained model and the absence of multimodal fusion, which will be addressed in future work.  Conclusions  An unsupervised acoustic anomaly detection framework is proposed for HTGUs, addressing the scarcity of fault samples and the complexity of industrial acoustic environments. A pretrained large-scale audio foundation model is adopted and fine-tuned with turbine-specific strategies to improve the modeling of normal operational acoustics. During inference, a density-estimation-based k-NN mechanism is applied to detect abnormal patterns using only normal data. Experiments conducted on real-world hydropower station recordings show high detection accuracy and strong generalization across different operating conditions, exceeding conventional supervised approaches. The framework introduces foundation-model-based audio representation learning into the hydro-turbine domain, provides an efficient adaptation strategy tailored to turbine acoustics, and integrates a robust density-based anomaly scoring mechanism. These components jointly reduce dependence on labeled anomalies and support practical deployment for intelligent condition monitoring. 
Future work will examine model compression, such as knowledge distillation, to enable on-device deployment, and explore semi-/self-supervised learning and multimodal fusion to enhance robustness, scalability, and cross-station adaptability.
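The density-based k-NN scoring step at inference time can be sketched in a few lines: the anomaly score of a test embedding is its mean distance to the k nearest embeddings in a bank built from normal data only. This is a generic k-NN density estimate on toy 2-D vectors, assumed for illustration, not the paper's BEATs feature pipeline.

```python
import math

def knn_anomaly_score(embedding, normal_bank, k=3):
    """Mean Euclidean distance to the k nearest normal embeddings;
    larger scores indicate more anomalous inputs."""
    dists = sorted(math.dist(embedding, ref) for ref in normal_bank)
    return sum(dists[:k]) / min(k, len(dists))

# Toy bank of "normal" 2-D embeddings clustered near the origin.
bank = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.1, 0.1)]

normal_score = knn_anomaly_score((0.05, 0.05), bank)    # close to the bank
abnormal_score = knn_anomaly_score((3.0, 3.0), bank)    # far from the bank
```

A threshold on this score, calibrated on held-out normal data, then separates normal from anomalous segments without any labeled faults.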
A Review on Phase Rotation and Beamforming Scheme for Intelligent Reflecting Surface Assisted Wireless Communication Systems
XING Zhitong, LI Yun, WU Guangfu, XIA Shichao
Available online  , doi: 10.11999/JEIT250790
Abstract:
  Objective  Since the large-scale commercial deployment of 5G networks in 2020 and the continued development of 6G technology, modern communication systems need to function under increasingly complex channel conditions. These include ultra-high-density urban environments and remote areas such as oceanic regions, deserts, and forests. To meet these challenges, low-energy solutions capable of dynamically adjusting and reconfiguring wireless channels are required. Such solutions would improve transmission performance by lowering latency, increasing data rates, and strengthening signal reception, and would support more efficient deployment in demanding environments. The Intelligent Reflecting Surface (IRS) has gained attention as a promising approach for reshaping channel conditions. Unlike traditional active relays, an IRS operates passively and adds minimal energy consumption. When integrated with communication architectures such as Single Input Single Output (SISO), Multiple Input Single Output (MISO), and Multiple Input Multiple Output (MIMO), an IRS can improve transmission efficiency, reduce power consumption, and enhance adaptability in complex scenarios. This paper reviews IRS-assisted communication systems, with emphasis on signal transmission models, beamforming methods, and phase-shift optimization strategies.  Methods  This review examines IRS technology in modern communication systems by analyzing signal transmission models across three fundamental configurations. The discussion begins with IRS-assisted SISO systems, in which IRS control of incident signals through reflection and phase shifting improves single-antenna communication by mitigating traditional propagation constraints. The analysis then extends to MISO and MIMO architectures, where the relationship between IRS phase adjustments and MIMO precoding is assessed to determine strategies that support high spectral efficiency. 
Based on these transmission models, this review surveys joint optimization and precoding methods tailored for IRS-enhanced MIMO systems. These algorithms can be grouped into four categories that meet different operational requirements. The first aims to minimize power consumption by reducing total energy use while maintaining acceptable communication quality, which is important for energy-sensitive applications such as IoT systems and green communication scenarios. The second seeks to maximize energy efficiency by optimizing the ratio of achievable data rate to power consumption rather than lowering energy use alone, thereby improving performance per unit of energy. The third focuses on maximizing the sum rate by increasing aggregated throughput across users to strengthen overall system capacity in high-density 5G and 6G environments. The fourth prioritizes fairness-aware rate maximization by applying resource allocation methods that ensure equitable bandwidth distribution among users while sustaining high Quality of Service (QoS). Together, these optimization approaches provide a framework for advancing IRS-assisted MIMO systems and allow engineers and researchers to balance performance, energy efficiency, and user fairness according to specific application needs in next-generation wireless networks.  Results and Discussions  This review shows that IRS-assisted communication systems provide important capabilities for next-generation wireless networks through four major advantages. First, IRS strengthens system performance by reconfiguring propagation environments and improving signal strength and coverage in non-line-of-sight conditions, including urban canyons, indoor environments, and remote regions, while also maintaining reliable connectivity in high-mobility cases such as vehicular communication. Second, the technology supports high energy efficiency because of its passive operation, which adds minimal power overhead yet improves spectral efficiency. 
This characteristic is valuable for sustainable large-scale IoT deployments and green 6G systems that may incorporate energy-harvesting designs. Third, IRS shows strong adaptability when integrated with different communication architectures, including SISO for basic signal enhancement, MISO for improved beamforming, and MIMO for spatial multiplexing, enabling use across environments ranging from ultra-dense urban networks to remote or airborne communication platforms. Finally, recent progress in beamforming and phase-shift optimization strengthens system performance through coherent signal combining, interference suppression in multi-user settings, and low-latency operation for time-critical applications. Machine learning methods such as deep reinforcement learning are also being investigated for real-time optimization. Together, these capabilities position IRS as a key technology for future 6G networks with the potential to support smart radio environments and broad-area connectivity, although further study is required to address challenges in channel estimation, scalability, and standardization.  Conclusions  This review highlights the potential of IRS technology in next-generation wireless communication systems. By enabling dynamic channel reconfiguration with minimal energy overhead, IRS strengthens the performance of SISO, MISO, and MIMO systems and supports reliable operation in complex propagation environments. The surveyed signal transmission models and optimization methods form a technical basis for continued development of IRS-assisted communication frameworks. As research and industry move toward 6G, IRS is expected to support ultra-reliable, low-latency, and energy-efficient global connectivity. Future studies should address practical deployment challenges such as hardware design, real-time signal processing, and progress toward standardization.
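The coherent-combining advantage surveyed above follows the textbook phase-alignment rule, which can be sketched for a simplified IRS-assisted SISO link. The channel values below are illustrative random draws, and the direct channel is assumed real and positive so that aligning every reflected path to phase zero is optimal; a real system would estimate these channels.

```python
import cmath
import random

def aligned_phases(g, h):
    """Coherent combining for IRS-SISO: set theta_i = -(arg g_i + arg h_i)
    so every reflected path arrives with phase zero."""
    return [-(cmath.phase(gi) + cmath.phase(hi)) for gi, hi in zip(g, h)]

def received_amplitude(h_d, g, h, thetas):
    """|h_d + sum_i g_i * exp(j*theta_i) * h_i| for direct channel h_d,
    BS->IRS gains g_i, and IRS->user gains h_i."""
    reflected = sum(gi * cmath.exp(1j * t) * hi
                    for gi, t, hi in zip(g, thetas, h))
    return abs(h_d + reflected)

random.seed(0)
g = [0.3 * cmath.exp(1j * random.uniform(0, 6.28)) for _ in range(8)]
h = [0.3 * cmath.exp(1j * random.uniform(0, 6.28)) for _ in range(8)]
h_d = 1.0  # direct BS->user channel, assumed real and positive

best = received_amplitude(h_d, g, h, aligned_phases(g, h))
zero = received_amplitude(h_d, g, h, [0.0] * 8)
```

With all reflected paths aligned, each element contributes its full |g_i||h_i| gain, so the aligned amplitude reaches the triangle-inequality maximum that unoptimized phases cannot exceed.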
Defeating Voice Conversion Forgery by Active Defense with Diffusion Reconstruction
TIAN Haoyuan, CHEN Yuxuan, CHEN Beijing, FU Zhangjie
Available online  , doi: 10.11999/JEIT250709
Abstract:
  Objective  Deep voice generation technology can produce perceptually realistic speech. Although it enriches entertainment and everyday applications, it is also exploited for voice forgery, creating risks to personal privacy and social security. Existing active defense techniques serve as a major line of protection against such forgery, yet their performance remains limited in balancing defensive strength with the imperceptibility of defensive speech examples, and in maintaining robustness.  Methods  An active defense method against voice conversion forgery is proposed on the basis of diffusion reconstruction. The diffusion vocoder PriorGrad is used as the generator, and the gradual denoising process is guided by the diffusion prior of the target speech so that the protected speech is reconstructed and defensive speech examples are obtained directly. A multi-scale auditory perceptual loss is further introduced to suppress perturbation amplitudes in frequency bands sensitive to the human auditory system, which improves the imperceptibility of the defensive examples.  Results and Discussions  Defense experiments conducted on four leading voice conversion models show that the proposed method maintains the imperceptibility of defensive speech examples and, when speaker verification accuracy is used as the evaluation metric, improves defense ability by about 32% on average in white-box scenarios and about 16% in black-box scenarios compared with the second-best method, achieving a stronger balance between defense ability and imperceptibility (Table 2). In robustness experiments, the proposed method yields an average improvement of about 29% in white-box scenarios and about 18% in black-box scenarios under three compression attacks (Table 3), and an average improvement of about 35% in the white-box scenario and about 17% in the black-box scenario under Gaussian filtering attack (Table 4). 
Ablation experiments further show that the use of multi-scale auditory perceptual loss improves defense ability by 5% to 10% compared with the use of single-scale auditory perceptual loss (Table 5).  Conclusions  An active defense method against voice conversion forgery based on diffusion reconstruction is proposed. Defensive speech examples are reconstructed directly through a diffusion vocoder so that the generated audio better approximates the distribution of the original target speech, and a multi-scale auditory perceptual loss is integrated to improve the imperceptibility of the defensive speech. Experimental results show that the proposed method achieves stronger defense performance than existing approaches in both white-box and black-box scenarios and remains robust under compression coding and smoothing filtering. Although the method demonstrates clear advantages in defense performance and robustness, its computational efficiency requires further improvement. Future work is directed toward diffusion generators that operate with a single time step or fewer time steps to enhance computational efficiency while maintaining defense performance.
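The multi-scale idea behind the perceptual loss can be illustrated with a generic multi-resolution spectral L1 distance: compare magnitude spectra of the two signals at several frame sizes and average. This sketch uses a naive DFT on toy signals and omits the auditory-sensitivity weighting that the paper's loss applies per frequency band.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive one-sided DFT magnitude spectrum of a short frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def multi_scale_spectral_loss(x, y, frame_sizes=(8, 16, 32)):
    """Average L1 distance between magnitude spectra of x and y,
    computed over non-overlapping frames at several frame sizes."""
    total, count = 0.0, 0
    for n in frame_sizes:
        for start in range(0, len(x) - n + 1, n):
            mx = dft_magnitudes(x[start:start + n])
            my = dft_magnitudes(y[start:start + n])
            total += sum(abs(a - b) for a, b in zip(mx, my))
            count += len(mx)
    return total / max(count, 1)

clean = [math.sin(0.4 * t) for t in range(64)]
perturbed = [s + 0.2 * math.sin(2.5 * t) for t, s in enumerate(clean)]
```

Evaluating the loss at several resolutions penalizes perturbations that a single-scale spectrogram would miss, which is consistent with the reported 5% to 10% gain of the multi-scale variant over the single-scale one.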
A Polymorphic Network Backend Compiler for Domestic Switching Chips
TU Huaqing, WANG Yuanhong, XU Qi, ZHU Jun, ZOU Tao, LONG Keping
Available online  , doi: 10.11999/JEIT250132
Abstract:
  Objective  The P4 language and programmable switching chips offer a feasible approach for deploying polymorphic networks. However, polymorphic network packets written in P4 cannot be directly executed on the domestically produced TsingMa.MX programmable switching chip developed by Centec, which necessitates the design of a specialized compiler to translate and deploy the P4 language on this chip. Existing backend compilers are mainly designed and optimized for software-programmable switches such as BMv2, FPGAs, and Intel Tofino series chips, rendering them unsuitable for compiling polymorphic network programs for the TsingMa.MX chip. To resolve this limitation, a backend compiler named p4c-TsingMa is proposed for the TsingMa.MX switching chip. This compiler enables the translation of high-level network programming languages into executable formats for the TsingMa.MX chip, thereby supporting the concurrent parsing and forwarding of multiple network modal packets.  Methods  p4c-TsingMa first employs a preorder traversal approach to extract key information, including protocol types, protocol fields, and actions, from the Intermediate Representation (IR). It then performs instruction translation to generate corresponding control commands for the TsingMa.MX chip. Additionally, p4c-TsingMa adopts a User Defined Field (UDF) entry merging method to consolidate matching instructions from different network modalities into a unified lookup table. This design enables the extraction of multiple modal matching entries in a single operation, thereby enhancing chip resource utilization.  Results and Discussions  The p4c-TsingMa compiler is implemented in C++, mapping network modal programs written in the P4 language into configuration instructions for the TsingMa.MX switching chip. A polymorphic network packet testing environment (Fig. 7) is established, where multiple types of network data packets are simultaneously transmitted to the same switch port. 
According to the configured flow tables, the chip successfully identifies polymorphic network data packets and forwards them to their corresponding ports (Fig. 8). Additionally, the table entry merging algorithm improves register resource utilization by 37.5% to 75%, enabling the chip to process more than two types of modal data packets concurrently.  Conclusions  A polymorphic network backend compiler, p4c-TsingMa, is designed specifically for domestic switching chips. By utilizing the FlexParser and FlexEdit functions of the TsingMa chip, the compiler translates polymorphic network programs into executable commands for the TsingMa.MX chip, enabling the chip to parse and modify polymorphic data packets. Experimental results demonstrate that p4c-TsingMa achieves high compilation efficiency and improves register resource utilization by 37.5% to 75%.
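The two compiler steps described above, preorder extraction from the IR and UDF entry merging, can be sketched on a toy IR. The nested-dict node shape and field names here are hypothetical stand-ins, not the real p4c IR classes.

```python
def preorder_extract(node, kinds=("protocol", "field", "action")):
    """Preorder traversal of a toy IR tree (nested dicts), collecting
    (kind, name) pairs for the node kinds the backend translates."""
    found = []
    if isinstance(node, dict):
        if node.get("kind") in kinds:
            found.append((node["kind"], node.get("name")))
        for child in node.get("children", []):
            found.extend(preorder_extract(child, kinds))
    return found

def merge_udf_entries(per_modality_fields):
    """UDF-style entry merging: union the match fields of all modalities
    into one lookup-table layout so a single extraction serves them all."""
    merged = []
    for fields in per_modality_fields:
        for f in fields:
            if f not in merged:
                merged.append(f)
    return merged

# Hypothetical IR fragment for one network modality.
ir = {
    "kind": "protocol", "name": "geo",
    "children": [
        {"kind": "field", "name": "dst_geo_addr", "children": []},
        {"kind": "action", "name": "forward", "children": []},
    ],
}

entries = preorder_extract(ir)
table = merge_udf_entries([["eth.type", "geo.dst"], ["eth.type", "id.dst"]])
```

Merging shared fields once rather than per modality is what lets the chip extract matching entries for several modalities in a single lookup, the source of the reported register-utilization gain.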
The Storage and Calculation of Biological-like Neural Networks for Locally Active Memristor Circuits
LI Fupeng, WANG Guangyi, LIU Jingbiao, YING Jiajie
Available online  , doi: 10.11999/JEIT250631
Abstract:
  Objective  Binary computing systems have reached bottlenecks in terms of power consumption, operation speed, and storage capacity. In contrast, the biological nervous system seems to have unlimited capacity. It demonstrates remarkable low-power computing and dynamic storage capability, attributed to the working mechanism of neurons transmitting neural signals through the directional secretion of neurotransmitters. After analyzing the Hodgkin–Huxley model of the squid giant axon, Professor Leon Chua proposed that synapses could be composed of locally passive memristors and neurons could be made up of locally active memristors, both exhibiting electrical characteristics similar to those of nerve fibers. Since the first experimental claim of the memristor, locally active memristive devices have been identified in research on devices with layered structures, and circuits constructed from these devices display various types of neuromorphic dynamics under different excitations. However, no single two-terminal device capable of achieving multi-state storage has yet been reported. Locally active memristors are advantageous for generating biologically inspired neural signals, as their models can produce neural morphological signals based on spike pulses. The generation of such signals is achieved through the amplification and computation of external stimuli, a process realized using capacitance-controlled memristor oscillators. 
When a memristor operates in the locally active domain, the output voltage of its third-order circuit undergoes a period-doubling bifurcation as the capacitance in the circuit changes regularly, forming a multi-state mapping between capacitance values and oscillating voltages. In this work, a locally active memristor-based third-order circuit is used as a unit to generate neuromorphic signals, thereby forming a biologically inspired neural operation unit. An operation network can be constructed from such units, providing a framework for storage and computation in biological-like neural networks.  Methods  The mathematical model of the Chua Corsage Memristor proposed by Leon Chua is selected for analysis. The characteristics of the locally active domain are examined, and an appropriate operating point together with external components is determined to establish a third-order memristor chaotic circuit. Circuit simulation and analysis are performed on this circuit. When the memristor operates in the locally active domain, the oscillator formed by its third-order circuit simultaneously performs signal amplification, computation, and storage. In this configuration, the third-order circuit is regarded as a nerve cell, and the variable capacitors are treated as synapses. The electrical signal and the dielectric capacitor operate in succession, enabling the third-order oscillation circuit of the memristor to behave like a neuron, where alternating electric fields and capacitive dynamics mimic neurotransmitter-mediated processes to form a brain-like computing and storage system. 
The secretion of biological neurotransmitters is characterized by a threshold, with the membrane threshold voltage regulating the release of neurotransmitters to the postsynaptic membrane and thereby transmitting neural signals. Analogously, the step peak value of the oscillation circuit functions as the trigger voltage for the transfer of the capacitive charge.  Results and Discussions  This study utilizes the third-order circuit of a locally active memristor to generate stable voltage oscillations exhibiting period-doubling bifurcation as the external capacitance varies. Changes in capacitance lead to different forms of electrical signals being sequentially output at the terminals of the memristor, with the voltage amplitude of these signals exhibiting stable periodic variation. This establishes a stable multi-state mapping relationship between capacitance values and output voltage signals, thereby forming the basis of a storage and computing unit and, subsequently, a storage and computing network. At present, it remains necessary to develop a structure that allows the dielectric to transfer and adjust the capacitance value of the next stage under the control of a modulated voltage threshold, analogous to the function of neurotransmitter secretion in biological systems. These results demonstrate the feasibility of using the third-order oscillation circuit of a memristor as a storage and computing unit and highlight the potential for constructing a storage and computing architecture based on capacitance variation.  
Conclusions  When the Chua Corsage Memristor operates in its locally active domain, its third-order circuit, powered solely by a voltage-stabilized source, generates stable period-doubling bifurcation oscillations as the external capacitance varies. The sequentially output oscillating signals display stable voltage amplitudes and periods, with threshold characteristics. Variations in capacitance induce different forms of electrical signals at the terminals of the memristor, with voltage amplitudes changing periodically and stably. This establishes a stable multi-state mapping relationship between capacitance values and output voltages, thereby forming a storage and computing unit and, by extension, a storage and computing network. A structure that enables dielectric transfer and capacitance adjustment of the next stage under the control of a modulated voltage threshold, similar to the function of neurotransmitter secretion, still needs to be developed. The findings demonstrate the feasibility of using the third-order oscillation circuit of a memristor as a storage and computing unit and describe a potential storage and computing architecture based on capacitance variation.
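The capacitance-to-amplitude multi-state mapping described above can be sketched as a read/write storage cell: writing selects a capacitance, the circuit settles to the corresponding stable oscillation amplitude, and reading maps a measured amplitude back to the nearest stored state. All numeric values below are hypothetical placeholders, not measurements from the paper.

```python
# Hedged sketch of a multi-state storage cell built on the
# capacitance -> oscillation-amplitude mapping. Values are hypothetical.

STATE_TABLE = {
    # capacitance (nF, hypothetical): stable oscillation amplitude (V, hypothetical)
    1.0: 0.4,
    2.0: 0.8,
    4.0: 1.6,
    8.0: 3.2,
}

def write_state(capacitance_nF):
    """'Writing' selects a capacitance; the cell settles to its amplitude."""
    return STATE_TABLE[capacitance_nF]

def read_state(measured_amplitude_V):
    """'Reading' maps a measured amplitude to the nearest stored state."""
    return min(STATE_TABLE,
               key=lambda c: abs(STATE_TABLE[c] - measured_amplitude_V))
```

Because the amplitudes are well separated, readout tolerates measurement noise, which is what makes the mapping usable as a multi-state (rather than binary) memory element.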
Adaptive Cache Deployment Based on Congestion Awareness and Content Value in LEO Satellite Networks
LIU Zhongyu, XIE Yaqin, ZHANG Yu, ZHU Jianyue
Available online  , doi: 10.11999/JEIT250670
Abstract:
  Objective  Low Earth Orbit (LEO) satellite networks are central to future space–air–ground integrated systems, offering global coverage and low-latency communication. However, their high-speed mobility leads to rapidly changing topologies, and strict onboard cache constraints hinder efficient content delivery. Existing caching strategies often overlook real-time network congestion and content attributes (e.g., freshness), which leads to inefficient resource use and degraded Quality of Service (QoS). To address these limitations, we propose an adaptive cache placement strategy based on congestion awareness. The strategy dynamically couples real-time network conditions, including link congestion and latency, with a content value assessment model that incorporates both popularity and freshness. This integrated approach enhances cache hit rates, reduces backhaul load, and improves user QoS in highly dynamic LEO satellite environments, enabling efficient content delivery even under fluctuating traffic demands and resource constraints.  Methods  The proposed strategy combines a dual-threshold congestion detection mechanism with a multi-dimensional content valuation model. It proceeds in three steps. First, satellite nodes monitor link congestion in real time using dual latency thresholds and relay congestion status to downstream nodes through data packets. Second, a two-dimensional content value model is constructed that integrates popularity and freshness. Popularity is updated dynamically using an Exponential Weighted Moving Average (EWMA), which balances historical and recent request patterns to capture temporal variations in demand. Freshness is evaluated according to the remaining data lifetime, ensuring that expired or near-expired content is deprioritized to maintain cache efficiency and relevance. Third, caching thresholds are adaptively adjusted according to congestion level, and a hop count control factor is introduced to guide caching decisions. 
This coordinated mechanism enables the system to prioritize high-value content while mitigating congestion, thereby improving overall responsiveness and user QoS.  Results and Discussions  Simulations conducted on ndnSIM demonstrate the superiority of the proposed strategy over PaCC (Popularity-Aware Closeness-based Caching), LCE (Leave Copy Everywhere), LCD (Leave Copy Down), and Prob (probability-based caching with probability = 0.5). The key findings are as follows. (1) Cache hit rate. The proposed strategy consistently outperforms conventional methods. As shown in Fig. 8, the cache hit rate rises markedly with increasing cache capacity and Zipf parameter, exceeding those of LCE, LCD, and Prob. Specifically, the proposed strategy achieves improvements of 43.7% over LCE, 25.3% over LCD, 17.6% over Prob, and 9.5% over PaCC. Under high content concentration (i.e., larger Zipf parameters), the improvement reaches 29.1% compared with LCE, highlighting the strong capability of the strategy in promoting high-value content distribution. (2) Average routing hop ratio. The proposed strategy also reduces routing hops compared with the baselines. As shown in Fig. 9, the average hop ratio decreases as cache capacity and Zipf parameter increase. Relative to PaCC, the proposed strategy lowers the average hop ratio by 2.24%, indicating that content is cached closer to users, thereby shortening request paths and improving routing efficiency. (3) Average request latency. The proposed strategy achieves consistently lower latency than all baseline methods. As summarized in Table 2 and Fig. 10, the reduction is more pronounced under larger cache capacities and higher Zipf parameters. For instance, with a cache capacity of 100 MB, latency decreases by approximately 2.9%, 5.8%, 9.0%, and 10.3% compared with PaCC, Prob, LCD, and LCE, respectively. When the Zipf parameter is 1.0, latency reductions reach 2.7%, 5.7%, 7.2%, and 8.8% relative to PaCC, Prob, LCD, and LCE, respectively. 
Concretely, under a cache capacity of 100 MB and Zipf parameter of 1.0, the average request latency of the proposed strategy is 212.37 ms, compared with 236.67 ms (LCE), 233.45 ms (LCD), 225.42 ms (Prob), and 218.62 ms (PaCC).  Conclusions  This paper presents a congestion-aware adaptive caching placement strategy for LEO satellite networks. By combining real-time congestion monitoring with multi-dimensional content valuation that considers both dynamic popularity and freshness, the strategy achieves balanced improvements in caching efficiency and network stability. Simulation results show that the proposed method markedly enhances cache hit rates, reduces average routing hops, and lowers request latency compared with existing schemes such as PaCC, Prob, LCD, and LCE. These benefits hold across different cache sizes and request distributions, particularly under resource-constrained or highly dynamic conditions, confirming the strategy’s adaptability to LEO environments. The main innovations include a closed-loop feedback mechanism for congestion status, dynamic adjustment of caching thresholds, and hop-aware content placement, which together improve resource utilization and user QoS. This work provides a lightweight and robust foundation for high-performance content delivery in satellite–terrestrial integrated networks. Future extensions will incorporate service-type differentiation (e.g., delay-sensitive vs. bandwidth-intensive services), and orbital prediction to proactively optimize cache migration and updates, further enhancing efficiency and adaptability in 6G-enabled LEO networks.
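The three-step mechanism above can be sketched directly: an EWMA popularity update, a freshness term from remaining lifetime, and a caching decision whose threshold rises with congestion. The specific weights and threshold values are illustrative assumptions, not the paper's tuned parameters.

```python
def ewma_popularity(prev, observed, alpha=0.3):
    """EWMA update balancing historical and recent request rates."""
    return alpha * observed + (1 - alpha) * prev

def freshness(remaining_lifetime, total_lifetime):
    """Fraction of data lifetime remaining, clipped to [0, 1]."""
    return max(0.0, min(1.0, remaining_lifetime / total_lifetime))

def content_value(popularity, fresh):
    """Two-dimensional content value combining popularity and freshness."""
    return popularity * fresh

def should_cache(value, congestion_level, base_threshold=0.5):
    """Raise the caching bar as congestion grows, so only high-value
    content is admitted on congested paths."""
    return value >= base_threshold * (1 + congestion_level)
```

Under this rule a popular, fresh item is cached even on a moderately congested node, while stale or low-demand content is admitted only when the path is uncongested.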
Performance Optimization of UAV-RIS-assisted Communication Networks Under No-Fly Zone Constraints
XU Junjie, LI Bin, YANG Jingsong
Available online  , doi: 10.11999/JEIT250681
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RIS) mounted on Unmanned Aerial Vehicles (UAVs) are considered an effective approach to enhance wireless communication coverage and adaptability in complex or constrained environments. However, two major challenges remain in practical deployment. The existence of No-Fly Zones (NFZs), such as airports, government facilities, and high-rise areas, restricts the UAV flight trajectory and may result in communication blind spots. In addition, the continuous attitude variation of UAVs during flight causes dynamic misalignment between the RIS and the desired reflection direction, which reduces signal strength and system throughput. To address these challenges, a UAV-RIS-assisted communication framework is proposed that simultaneously considers NFZ avoidance and UAV attitude adjustment. In this framework, a quadrotor UAV equipped with a bottom-mounted RIS operates in an environment containing multiple polygonal NFZs and a group of Ground Users (GUs). The aim is to jointly optimize the UAV trajectory, RIS phase shift, UAV attitude (represented by Euler angles), and Base Station (BS) beamforming to maximize the system sum rate while ensuring complete obstacle avoidance and stable, high-quality service for GUs located both inside and outside NFZs.  Methods  To achieve this objective, a multi-variable coupled non-convex optimization problem is formulated, jointly capturing UAV trajectory, RIS configuration, UAV attitude, and BS beamforming under NFZ constraints. The RIS phase shifts are dynamically adjusted according to the UAV orientation to maintain beam alignment, and UAV motion follows quadrotor dynamics while avoiding polygonal NFZs. Because of the high dimensionality and non-convexity of the problem, conventional optimization approaches are computationally intensive and lack real-time adaptability. 
To address this issue, the problem is reformulated as a Markov Decision Process (MDP), which enables policy learning through deep reinforcement learning. The Soft Actor-Critic (SAC) algorithm is employed, leveraging entropy regularization to improve exploration efficiency and convergence stability. The UAV-RIS agent interacts iteratively with the environment, updating actor-critic networks to determine UAV position, RIS phase configuration, and BS beamforming. Through continuous learning, the proposed framework achieves higher throughput and reliable NFZ avoidance, outperforming existing benchmarks.  Results and Discussions  As shown in (Fig. 3), the proposed SAC algorithm achieves higher communication rates than PPO, DDPG, and TD3 during training, benefiting from entropy-regularized exploration that prevents premature convergence. Although DDPG converges faster, it exhibits instability and inferior long-term performance. As illustrated in (Fig. 4), the UAV trajectories under different conditions demonstrate the proposed algorithm’s capability to achieve complete obstacle avoidance while maintaining reliable communication. Regardless of variations in initial UAV positions, BS locations, or NFZ configurations, the UAV consistently avoids all NFZs and dynamically adjusts its trajectory to serve users located both inside and outside restricted zones, indicating strong adaptability and scalability of the proposed model. As shown in (Fig. 5), increasing the number of BS antennas enhances system performance. The proposed framework significantly outperforms fixed phase shift, random phase shift, and non-RIS schemes because of improved beamforming flexibility.  Conclusions  This paper investigates a UAV-RIS-assisted wireless communication system in which a quadrotor UAV carries an RIS to enhance signal reflection and ensure NFZ avoidance. 
Unlike conventional approaches that emphasize avoidance alone, a path integral-based method is proposed to generate obstacle-free trajectories while maintaining reliable service for GUs both inside and outside NFZs. To improve generality, NFZs are represented as prismatic obstacles with regular n-sided polygonal cross-sections. The system jointly optimizes UAV trajectory, RIS phase shifts, UAV attitude, and BS beamforming. A DRL framework based on the SAC algorithm is developed to enhance system efficiency. Simulation results demonstrate that the proposed approach achieves reliable NFZ avoidance and a maximized sum rate, outperforming benchmarks in communication performance, scalability, and stability.
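The NFZ constraint described above reduces, at each trajectory step, to testing whether a candidate waypoint lies inside any polygonal NFZ cross-section. The following is a minimal sketch of such a feasibility check; the paper's path integral-based trajectory generation is not reproduced, and all geometry and names are illustrative assumptions.

```python
# Illustrative check of the polygonal No-Fly Zone (NFZ) constraint: before a
# candidate UAV waypoint is accepted, verify it lies outside every NFZ polygon.
# Standard ray-casting point-in-polygon test; geometry here is hypothetical.

def point_in_polygon(x, y, polygon):
    """Return True if (x, y) is inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray cast to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def waypoint_feasible(x, y, nfzs):
    """A waypoint is feasible only if it avoids every NFZ cross-section."""
    return not any(point_in_polygon(x, y, poly) for poly in nfzs)

# Two square NFZ cross-sections (prismatic obstacles viewed from above).
nfzs = [[(0, 0), (4, 0), (4, 4), (0, 4)],
        [(10, 10), (12, 10), (12, 12), (10, 12)]]
```

In a learning loop, such a test would either mask infeasible actions or trigger a penalty in the MDP reward, so the SAC policy learns to keep the trajectory outside every NFZ.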
Comparison of DeepSeek-V3.1 and ChatGPT-5 in Multidisciplinary Team Decision-making for Colorectal Liver Metastases
ZHANG Yangzi, XU Ting, GAO Zhaoya, SI Zhenduo, XU Weiran
Available online  , doi: 10.11999/JEIT250849
Abstract:
  Objective   ColoRectal Cancer (CRC) is the third most commonly diagnosed malignancy worldwide. Approximately 25% to 50% of patients with CRC develop liver metastases during the course of their disease, which increases the disease burden. Although the MultiDisciplinary Team (MDT) model improves survival in ColoRectal Liver Metastases (CRLM), its broader implementation is limited by delayed knowledge updates and regional differences in medical standards. Large Language Models (LLMs) can integrate multimodal data, clinical guidelines, and recent research findings, and can generate structured diagnostic and therapeutic recommendations. These features suggest potential to support MDT-based care. However, the actual effectiveness of LLMs in MDT decision-making for CRLM has not been systematically evaluated. This study assesses the performance of DeepSeek-V3.1 and ChatGPT-5 in supporting MDT decisions for CRLM and examines the consistency of their recommendations with MDT expert consensus. The findings provide evidence-based guidance and identify directions for optimizing LLM applications in clinical practice.  Methods   Six representative virtual CRLM cases are designed to capture key clinical dimensions, including colorectal tumor recurrence risk, resectability of liver metastases, genetic mutation profiles (e.g., KRAS/BRAF mutations, HER2 amplification status, and microsatellite instability), and patient functional status. Using a structured prompt strategy, MDT treatment recommendations are generated separately by the DeepSeek-V3.1 and ChatGPT-5 models. Independent evaluations are conducted by four MDT specialists from gastrointestinal oncology, gastrointestinal surgery, hepatobiliary surgery, and radiation oncology. The model outputs are scored using a 5-point Likert scale across seven dimensions: accuracy, comprehensiveness, frontier relevance, clarity, individualization, hallucination risk, and ethical safety.
Statistical analysis is performed to compare the performance of DeepSeek-V3.1 and ChatGPT-5 across individual cases, evaluation dimensions, and clinical disciplines.  Results and Discussions   Both LLMs, DeepSeek-V3.1 and ChatGPT-5, show robust performance across all six virtual CRLM cases, with an average overall score of ≥ 4.0 on a 5-point scale. This performance indicates that clinically acceptable decision support is provided within a complex MDT framework. DeepSeek-V3.1 shows superior overall performance compared with ChatGPT-5 (4.27±0.77 vs. 4.08±0.86, P=0.03). Case-by-case analysis shows that DeepSeek-V3.1 performs significantly better in Cases 1, 4, and 6 (P=0.04, P<0.01, and P=0.01, respectively), whereas ChatGPT-5 receives higher scores in Case 2 (P<0.01). No significant differences are observed in Cases 3 and 5 (P=0.12 and P=1.00, respectively), suggesting complementary strengths across clinical scenarios (Table 3). In the multidimensional assessment, both models receive high scores (range: 4.12 to 4.87) in clarity, individualization, hallucination risk, and ethical safety, confirming that readable, patient-tailored, reliable, and ethically sound recommendations are generated. Improvements are still needed in accuracy, comprehensiveness, and frontier relevance (Fig. 1). DeepSeek-V3.1 shows a significant advantage in frontier relevance (3.90±0.65 vs. 3.24±0.72, P=0.03) and ethical safety (4.87±0.34 vs. 4.58±0.65, P=0.03) (Table 4), indicating more effective incorporation of recent evidence and more consistent delivery of ethically robust guidance. For the case with concomitant BRAF V600E and KRAS G12D mutations, DeepSeek-V3.1 accurately references a phase III randomized controlled study published in the New England Journal of Medicine in 2025 and recommends a triple regimen consisting of a BRAF inhibitor + EGFR monoclonal antibody + FOLFOX.
By contrast, ChatGPT-5 follows conventional recommendations for RAS/BRAF mutant populations (FOLFOXIRI plus bevacizumab) without integrating recent evidence on targeted combination therapy. This difference shows the effect of timely knowledge updates on the clinical value of LLM-generated recommendations. For MSI-H CRLM, ChatGPT-5’s recommendation of “postoperative immunotherapy” is not supported by phase III evidence or existing guidelines. Direct use of such recommendations may lead to overtreatment or ineffective therapy, representing a clear ethical concern and illustrating hallucination risks in LLMs. Discipline-specific analysis shows notable variation. In radiation oncology, DeepSeek-V3.1 provides significantly more precise guidance on treatment timing, dosage, and techniques than ChatGPT-5 (4.55±0.67 vs. 3.38±0.91, P<0.01), demonstrating closer alignment with clinical guidelines. In contrast, ChatGPT-5 performs better in gastrointestinal surgery (4.48±0.67 vs. 4.17±0.85, P=0.02), with experts rating its recommendations on surgical timing and resectability as more concise and accurate. No significant differences are identified in gastrointestinal oncology and hepatobiliary surgery (P=0.89 and P=0.14, respectively), indicating comparable performance in these areas (Table 5). These findings show a performance bias across medical sub-specialties, demonstrating that LLM effectiveness depends on the distribution and quality of training data.  Conclusions   Both DeepSeek-V3.1 and ChatGPT-5 demonstrated strong capabilities in providing reliable recommendations for CRLM-MDT decision-making. Specifically, DeepSeek-V3.1 showed notable advantages in integrating cutting-edge knowledge, ensuring ethical safety, and performing in the field of radiation oncology, whereas ChatGPT-5 excelled in gastrointestinal surgery, reflecting a complementary strength between the two models.
This study confirms the feasibility of leveraging LLMs as “MDT collaborators”, offering a readily applicable and robust technical solution to bridge regional disparities in clinical expertise and enhance the efficiency of decision-making. However, model hallucination and insufficient evidence grading remain key limitations. Moving forward, mechanisms such as real-world clinical validation, evidence traceability, and reinforcement learning from human feedback are expected to further advance LLMs into more powerful auxiliary tools for CRLM-MDT decision support.
A Deception Jamming Discrimination Algorithm Based on Phase Fluctuation for Airborne Distributed Radar System
LV Zhuoyu, YANG Chao, SUO Chengyu, WEN Cai
Available online  , doi: 10.11999/JEIT240787
Abstract:
  Objective   Deception jamming in airborne distributed radar systems presents a crucial challenge, as false echoes generated by Digital Radio Frequency Memory (DRFM) devices tend to mimic true target returns in amplitude, delay, and Doppler characteristics. These similarities complicate target recognition and subsequently degrade tracking accuracy. To address this problem, attention is directed to phase fluctuation signatures, which differ inherently between authentic scattering responses and synthesized interference replicas. Leveraging this distinction is proposed as a means of improving discrimination reliability under complex electromagnetic confrontation conditions.  Methods   A signal-level fusion discrimination algorithm is proposed based on phase fluctuation variance. Five categories of synchronization errors that affect the phase of received echoes are analyzed and corrected, including filter mismatch, node position errors, and equivalent amplitude-phase deviations. Precise matched filters are constructed through a fine-grid iterative search to eliminate residual phase distortion caused by limited sampling resolution. Node position errors are estimated using a DRFM-based calibration array, and equivalent amplitude-phase deviations are corrected through an eigendecomposition-based procedure. After calibration, phase vectors associated with target returns are extracted, and the variance of these vectors is taken as the discrimination criterion. Authentic targets present large phase fluctuations due to complex scattering, whereas DRFM-generated replicas exhibit only small variations.  Results and Discussions   Simulation results show that the proposed method achieves reliable discrimination under typical airborne distributed radar conditions. When the signal-to-noise ratio is 25 dB and the jamming-to-noise ratio is 3 dB, the misjudgment rate for false targets approaches zero when more than five receiving nodes are used (Fig.10, Fig.11). 
The method remains robust even when only a few false targets are present and performs better than previously reported approaches, where discrimination fails in single- or dual-false-target scenarios (Fig.14). High recognition stability is maintained across different jamming-to-noise ratios and receiver quantities (Fig.13). The importance of system-level error correction is confirmed, as discrimination accuracy declines significantly when synchronization errors are not compensated (Fig.12).  Conclusions   A phase-fluctuation-based discrimination algorithm for airborne distributed radar systems is presented. By correcting system-level errors and exploiting the distinct fluctuation behavior of phase signatures from real and false echoes, the method achieves reliable deception-jamming discrimination in complex electromagnetic environments. Simulations indicate stable performance under varying numbers of false targets, demonstrating good applicability for distributed configurations. Future work will aim to enhance robustness under stronger environmental noise and clutter.
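The discrimination criterion above (phase-fluctuation variance after error calibration) can be sketched in a few lines: a true target's complex scattering produces phases that fluctuate strongly across the distributed receiving nodes, whereas a DRFM replica replays one stored waveform and yields a nearly constant phase. The threshold and the simulated phase data below are illustrative assumptions, not the paper's parameters.

```python
import random

def phase_variance(phases):
    """Sample variance of the phase vector extracted across receiving nodes."""
    m = sum(phases) / len(phases)
    return sum((p - m) ** 2 for p in phases) / (len(phases) - 1)

def is_false_target(phases, threshold=0.05):
    """DRFM replicas replay one stored waveform, so the calibrated phases seen
    at the distributed nodes fluctuate far less than true scattering returns."""
    return phase_variance(phases) < threshold

random.seed(0)
# True target: complex multi-scatterer response; phase varies node to node.
true_phases = [random.uniform(-3.14, 3.14) for _ in range(8)]
# False target: DRFM replica; nearly identical phase at every node.
false_phases = [0.7 + random.gauss(0.0, 0.01) for _ in range(8)]
```

This toy test presumes the five categories of synchronization error have already been corrected; without that calibration step, residual system phase errors would inflate the variance of false targets and erode the margin, which is what Fig. 12 of the paper reports.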
Performance Analysis for Self-Sustainable Intelligent Metasurface Based Reliable and Secure Communication Strategies
QU Yayun, CAO Kunrui, WANG Ji, XU Yongjun, CHEN Jingyu, DING Haiyang, JIN Liang
Available online  , doi: 10.11999/JEIT250637
Abstract:
  Objective  The Reconfigurable Intelligent Surface (RIS) is generally powered through a wired connection, and its power cable functions as a “tail” that restricts RIS maneuverability during outdoor deployment. A Self-Sustainable Intelligent Metasurface (SIM) that integrates RIS with energy harvesting is examined, and an amplified SIM architecture is presented. The reliability and security of SIM communication are analyzed, and the analysis provides a basis for its rational deployment in practical design.  Methods   The static wireless-powered and dynamic wireless-powered SIM communication strategies are proposed to address the energy and information outage challenges faced by SIM. The communication mechanism of the un-amplified SIM and amplified SIM (U-SIM and A-SIM) under these two strategies is examined. New integrated performance metrics of energy and information, termed joint outage probability and joint intercept probability, are proposed to evaluate the strategies from the perspectives of communication reliability and communication security.  Results and Discussions   The simulations evaluate the effect of several critical parameters on the communication reliability and security of each strategy. The results indicate that: (1) Compared to alternative schemes, at low base station transmit power, A-SIM achieves optimal reliability under the dynamic wireless-powered strategy and optimal security under the static wireless-powered strategy (Figs. 2 and 3). (2) Under the same strategy type, increasing the number of elements at SIM generally enhances reliability but reduces security. With a large number of elements, U-SIM maintains higher reliability than A-SIM, while A-SIM achieves higher security than U-SIM (Figs. 4 and 5). (3) An optimal amplification factor maximizes communication reliability for SIM systems (Fig. 6).
Conclusions   The results show that the dynamic wireless-powered strategy can mitigate the reduction in the reliability of SIM communication caused by insufficient energy. Although the amplified noise of A-SIM decreases reliability, it can improve security. Under the same static or dynamic strategies, as the number of elements at SIM increases, A-SIM provides better security, whereas U-SIM provides better reliability.
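The joint outage probability introduced above couples an energy condition with an information condition: the link is in outage when the SIM harvests too little energy or the achievable rate misses its target. A Monte Carlo sketch of that metric under a deliberately simplified Rayleigh-fading, linear energy-harvesting model follows; all parameters (conversion efficiency, thresholds, channel model) are illustrative assumptions, not the paper's system model.

```python
import math
import random

def joint_outage_probability(trials=20000, p_tx=1.0, eta=0.6,
                             e_min=0.05, r_min=1.0, seed=1):
    """Monte Carlo estimate of a joint outage probability: an outage is counted
    when EITHER the SIM harvests less energy than it needs OR the end-to-end
    information rate falls below the target rate."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        g1 = rng.expovariate(1.0)               # BS -> SIM channel power gain
        g2 = rng.expovariate(1.0)               # SIM -> user channel power gain
        harvested = eta * p_tx * g1             # linear energy-harvesting model
        rate = math.log2(1.0 + p_tx * g1 * g2)  # cascaded reflected link
        if harvested < e_min or rate < r_min:
            outages += 1
    return outages / trials
```

Even in this toy model, raising the base station transmit power lowers the joint outage probability, since both the energy and information conditions relax together, consistent with the low-transmit-power regime discussed in the results.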
Energy Consumption Optimization of Cooperative NOMA Secure Offload for Mobile Edge Computing
CHEN Jian, MA Tianrui, YANG Long, LÜ Lu, XU Yongjun
Available online  , doi: 10.11999/JEIT250606
Abstract:
  Objective  Mobile Edge Computing (MEC) is used to strengthen the computational capability and response speed of mobile devices by shifting computing and caching functions to the network edge. Non-Orthogonal Multiple Access (NOMA) further supports high spectral efficiency and large-scale connectivity. Because wireless channels are broadcast, the MEC offload transmission process is exposed to potential eavesdropping. To address this risk, physical-layer security is integrated into a NOMA-MEC system to safeguard secure offloading. Existing studies mainly optimize performance metrics such as energy use, latency, and throughput, or improve security through NOMA-based co-channel interference and cooperative interference. However, the combined effect of performance and security has not been fully examined. To reduce the energy required for secure offloading, a cooperative NOMA secure offload scheme is designed. The distinctive feature of the proposed scheme is that cooperative nodes provide forwarding and computational assistance at the same time. Through joint local computation between users and cooperative nodes, the scheme strengthens security in the offload process while reducing system energy consumption.  Methods  The joint design of computational and communication resource allocation for the nodes is examined by dividing the offloading procedure into two stages: NOMA offloading and cooperative offloading. Offloading strategies for different nodes in each stage are considered, and an optimization problem is formulated to minimize the weighted total system energy consumption under secrecy outage constraints. To handle the coupled multi-variable and non-convex structure, secrecy transmission rate constraints and secrecy outage probability constraints, originally expressed in probabilistic form, are first transformed. The main optimization problem is then separated into two subproblems: slot and task allocation, and power allocation. 
For the non-convex power allocation subproblem, the non-convex constraints are replaced with bilinear substitutions, and sequential convex approximations are applied. An alternating iterative resource allocation algorithm is ultimately proposed, allowing the load, power, and slot assignment between users and cooperative nodes to be adjusted according to channel conditions so that energy consumption is minimized while security requirements are satisfied.  Results and Discussions  Theoretical analysis and simulation results show that the proposed scheme converges quickly and maintains low computational complexity. Relative to existing NOMA full-offloading schemes, assisted computing schemes, and NOMA cooperative interference schemes, the proposed offloading design reduces system energy consumption and supports a higher load under identical secrecy constraints. The scheme also demonstrates strong robustness, as its performance is less affected by weak channel conditions or increased eavesdropping capability.  Conclusions  The study shows that system energy consumption and security constraints are closely coupled. In the MEC offloading process, communication, computation, and security are not independent. Performance and security can be improved at the same time through the effective use of cooperative nodes. When cooperative nodes are present, NOMA and forwarding cooperation can reduce the effects of weak channel conditions or high eavesdropping risks on secure and reliable transmission. Cooperative nodes can also share users’ local computational load to strengthen overall system performance. Joint local computation between users and cooperative nodes further reduces the security risks associated with long-distance wireless transmission. Thus, secure offloading in MEC is not only a physical-layer security issue in wireless transmission but also reflects the coupled relationship between communication and computation that is specific to MEC. 
By making full use of idle resources in the network, cooperative communication and computation among idle nodes can enhance system security while maintaining performance.
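The coupling between secrecy and energy described above rests on two standard quantities: the secrecy rate (the positive part of the legitimate-link rate minus the eavesdropper-link rate) and the transmission energy of an offload. The helpers below are a minimal sketch of those two building blocks only; the function names, the dB parameterization, and the example values are illustrative, and the paper's full alternating optimization is not reproduced.

```python
import math

def secrecy_rate(snr_user_db, snr_eve_db, bandwidth_hz=1.0):
    """Achievable secrecy rate (bit/s/Hz when bandwidth_hz=1): the positive
    part of the legitimate-link rate minus the eavesdropper-link rate."""
    snr_u = 10 ** (snr_user_db / 10)   # convert dB to linear SNR
    snr_e = 10 ** (snr_eve_db / 10)
    return bandwidth_hz * max(0.0, math.log2(1 + snr_u) - math.log2(1 + snr_e))

def offload_energy(bits, rate_bps, tx_power_w):
    """Transmission energy (J) to offload a task of `bits` at `rate_bps`
    with transmit power `tx_power_w`: power times airtime."""
    return tx_power_w * bits / rate_bps
```

These two expressions make the trade-off concrete: tightening the secrecy constraint lowers the usable rate, which lengthens the offload airtime and raises energy, and this is precisely the coupling the cooperative nodes are introduced to relax.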
High Area-efficiency Radix-4 Number Theoretic Transform Hardware Architecture with Conflict-free Memory Access Optimization for Lattice-based Cryptography
ZHENG Jiwen, ZHAO Shilei, ZHANG Ziyue, LIU Zhiwei, YU Bin, HUANG Hai
Available online  , doi: 10.11999/JEIT250687
Abstract:
  Objective  The advancement of Post-Quantum Cryptography (PQC) standardization increases the demand for efficient Number Theoretic Transform (NTT) hardware modules. Existing high-radix NTT studies primarily optimize in-place computation and configurability, yet the performance is constrained by complex memory access behavior and a lack of designs tailored to the parameter characteristics of lattice-based schemes. To address these limitations, a high area-efficiency radix-4 NTT design with a Constant-Geometry (CG) structure is proposed. The modular multiplication unit is optimized through an analysis of common modulus properties and the integration of multi-level operations, while memory allocation and address-generation strategies are refined to reduce capacity requirements and improve data-access efficiency. The design supports out-of-place storage and achieves conflict-free memory access, providing an effective hardware solution for radix-4 CG NTT implementation.  Methods  At the algorithmic level, the proposed radix-4 CG NTT/INTT employs a low-complexity design and removes the bit-reversal step to reduce multiplication count and computation cycles, with a redesigned twiddle-factor access scheme. For the modular multiplication step, which is the most time-consuming stage in the radix-4 butterfly, the critical path is shortened by integrating the multiplication with the first-stage K−RED reduction and simplifying the correction logic. To support three parameter configurations, a scalable modular-multiplication method is developed through an analysis of the shared properties of the moduli. At the architectural level, two coefficients are concatenated and stored at the same memory address. A data-decomposition and reorganization scheme is designed to coordinate memory interaction with the dual-butterfly units efficiently. 
To achieve conflict-free memory access, a cyclic memory-reuse strategy is employed, and read and write address-generation schemes using sequential and stepped access patterns are designed, which reduces required memory capacity and lowers control-logic complexity.  Results and Discussions  Experimental results on Field Programmable Gate Arrays demonstrate that the proposed NTT architecture achieves high operating frequency and low resource consumption under three parameter configurations, together with notable improvement in the Area–Time Product (ATP) compared with existing designs (Table 1). For the configuration with 256 terms and a modulus of 7 681, the design uses 2 397 slices, 4 BRAMs, and 16 DSPs, achieves an operating frequency of 363 MHz, and yields at least a 56.4% improvement in ATP. For the configuration with 256 terms and a modulus of 8 380 417, it uses 3 760 slices, 6 BRAMs, and 16 DSPs, achieves an operating frequency of 338 MHz, and yields at least a 69.8% improvement in ATP. For the configuration with 1 024 terms and a modulus of 12 289, it uses 2 379 slices, 4 BRAMs, and 16 DSPs, achieves an operating frequency of 357 MHz, and yields at least a 50.3% improvement in ATP.  Conclusions  A high area-efficiency radix-4 NTT hardware architecture for lattice-based PQC is proposed. The use of a low-complexity radix-4 CG NTT/INTT and the removal of the bit-reversal step reduce latency. Through an analysis of shared characteristics among three moduli and the merging of partial computations, a scalable modular-multiplication architecture based on K²−RED reduction is designed. The challenges of increased storage requirements and complex address-generation logic are addressed by reusing memory efficiently and designing sequential and stepped address-generation schemes. Experimental results show that the proposed design increases operating frequency and reduces resource consumption, yielding lower ATP under all three parameter configurations. 
As the present work focuses on a dual-butterfly architecture, future research may examine higher-parallelism designs to meet broader performance requirements.
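The transform that the hardware above accelerates can be stated compactly in software. The sketch below is a naive O(n²) reference NTT over Z_q with q = 7681 and n = 256 (one of the three parameter configurations named in the abstract); it models only the arithmetic (a forward transform followed by the inverse recovers the input) and none of the radix-4 CG dataflow, K-RED reduction, or memory-access scheme of the proposed design.

```python
import random

# Naive reference Number Theoretic Transform over Z_q, q = 7681, n = 256.
Q, N = 7681, 256

def _find_primitive_root(q):
    # q - 1 = 7680 = 2^9 * 3 * 5; g is primitive iff g^((q-1)/p) != 1
    # for every prime factor p of q - 1.
    for g in range(2, q):
        if all(pow(g, (q - 1) // p, q) != 1 for p in (2, 3, 5)):
            return g
    raise ValueError("no primitive root found")

# A primitive N-th root of unity modulo Q.
OMEGA = pow(_find_primitive_root(Q), (Q - 1) // N, Q)

def ntt(a, omega=OMEGA):
    """Forward NTT: X[k] = sum_i a[i] * omega^(i*k) mod Q."""
    return [sum(x * pow(omega, i * k, Q) for i, x in enumerate(a)) % Q
            for k in range(len(a))]

def intt(a):
    """Inverse NTT: forward transform with omega^-1, scaled by n^-1 mod Q."""
    n_inv = pow(len(a), Q - 2, Q)        # n^-1 via Fermat's little theorem
    omega_inv = pow(OMEGA, Q - 2, Q)     # omega^-1 mod Q
    return [(x * n_inv) % Q for x in ntt(a, omega_inv)]

random.seed(42)
poly = [random.randrange(Q) for _ in range(N)]
```

A fast implementation replaces the double loop with log-depth butterfly stages; the paper's radix-4 CG variant additionally fixes the memory-access geometry at every stage, which is what enables the conflict-free dual-butterfly addressing.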
LLM-based Data Compliance Checking for Internet of Things Scenarios
LI Chaohao, WANG Haoran, ZHOU Shaopeng, YAN Haonan, ZHANG Feng, LU Tianyang, XI Ning, WANG Bin
Available online  , doi: 10.11999/JEIT250704
Abstract:
  Objective  The implementation of regulations such as the Data Security Law of the People’s Republic of China, the Personal Information Protection Law of the People’s Republic of China, and the European Union General Data Protection Regulation (GDPR) has established data compliance checking as a central mechanism for regulating data processing activities, ensuring data security, and protecting the legitimate rights and interests of individuals and organizations. However, the characteristics of the Internet of Things (IoT), defined by large numbers of heterogeneous devices and the dynamic, extensive, and variable nature of transmitted data, increase the difficulty of compliance checking. Logs and traffic data generated by IoT devices are long, unstructured, and often ambiguous, which results in a high false-positive rate when traditional rule-matching methods are applied. In addition, the dynamic business environments and user-defined compliance requirements further increase the complexity of rule design, maintenance, and decision-making.  Methods  A large language model-driven data compliance checking method for IoT scenarios is proposed to address the identified challenges. In the first stage, a fast regular expression matching algorithm is employed to efficiently screen potential non-compliant data based on a comprehensive rule database. This process produces structured preliminary checking results that include the original non-compliant content and the corresponding violation type. The rule database incorporates current legislation and regulations, standard requirements, enterprise norms, and customized business requirements, and it maintains flexibility and expandability. By relying on the efficiency of regular expression matching and generating structured preliminary results, this stage addresses the difficulty of reviewing large volumes of long IoT text data and enhances the accuracy of the subsequent large language model review. 
In the second stage, a Large Language Model (LLM) is employed to evaluate the precision of the initial detection results. For different categories of violations, the LLM adaptively selects different prompt words to perform differentiated classification detection.  Results and Discussions  Data are collected from 52 IoT devices operating in a real environment, including log and traffic data (Table 2). A compliance-checking rule library for IoT devices is established in accordance with the Cybersecurity Law, the Data Security Law, other relevant regulations, and internal enterprise information-security requirements. Based on this library, the collected data undergo a first-stage rule-matching process, yielding a false-positive rate of 64.3% and identifying 55 080 potential non-compliant data points. Three aspects are examined: benchmark models, prompt schemes, and role prompts. In the benchmark model comparison, eight mainstream large language models are used to evaluate detection performance (Table 5), including Qwen2.5-32B-Instruct, DeepSeek-R1-70B, and DeepSeek-R1-0528 with different parameter configurations. After review and testing by the large language model, the initial false-positive rate is reduced to 6.9%, which demonstrates a substantial improvement in the quality of compliance checking. The model’s own error rate remains below 0.01%. The prompt-engineering assessment shows that prompt design exerts a strong effect on review accuracy (Table 6). When general prompts are applied, the final false-positive rate remains high at 59%. When only chain-of-thought prompts or concise sample prompts are used, the false-positive rate is reduced to approximately 12% and 6%, respectively, and the model’s own error rate decreases to about 30% and 13%. Combining these strategies further reduces the error rate of the small-sample prompt approach to 0.01%. The effect of system-role prompt words on review accuracy is also evaluated (Table 7). 
Simple role prompts yield higher accuracy and F1 scores than the absence of role prompts, whereas detailed role prompts provide a clearer overall advantage than simple role prompts. Ablation experiments (Table 8) further examine the contribution of rule classification and prompt engineering to compliance checking. Knowledge supplementation is applied to reduce interference and misjudgment among rules, lower prompt redundancy, and decrease the false-alarm rate during large language model review.  Conclusions  A large language model-driven data compliance checking method for IoT scenarios is presented. The method is designed to address the challenge of assessing compliance in large-scale unstructured device data. Its feasibility is verified through rationality analysis experiments, and the results indicate that false-positive rates are effectively reduced during compliance checking. The initial rule-based method yields a false-positive rate of 64.3%, which is reduced to 6.9% after review by the large language model. Additionally, the error introduced by the model itself is maintained below 0.01%.
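The two-stage pipeline above can be outlined in a few lines: a rule library of regular expressions screens raw device text into structured preliminary findings (violation type plus matched content), which a second-stage LLM review then prunes. In the sketch below the rule names and patterns are hypothetical, and the LLM stage is a stub, since the paper's prompts and models (e.g. Qwen2.5-32B-Instruct) are not reproduced here.

```python
import re

# Stage 1: fast regular-expression screening against a small illustrative
# rule base (real systems would load rules derived from law, standards,
# and enterprise norms).
RULES = {
    "plaintext_password": re.compile(r"password\s*=\s*\S+", re.IGNORECASE),
    "mainland_phone":     re.compile(r"\b1[3-9]\d{9}\b"),
    "ip_address":         re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def screen(log_line):
    """Structured preliminary findings: (violation type, matched content)."""
    return [(name, m.group(0))
            for name, rx in RULES.items()
            for m in rx.finditer(log_line)]

def llm_review(findings):
    """Placeholder for the stage-2 LLM review that prunes false positives;
    a real system would re-classify each finding with a category-specific
    prompt and keep only confirmed violations."""
    return findings

hits = llm_review(screen("device 192.168.1.7 login password=admin123 ok"))
```

The division of labor mirrors the reported numbers: the cheap regex stage tolerates a high false-positive rate (64.3% in the paper) to avoid missing violations in long unstructured text, and the LLM stage spends its cost only on the flagged residue, driving false positives down to 6.9%.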
An Overview on Integrated Sensing and Communication for the Low Altitude Economy
ZHU Zhengyu, WEN Xinping, LI Xingwang, WEI Zhiqing, ZHANG Peichang, LIU Fan, FENG Zhiyong
Available online  , doi: 10.11999/JEIT250747
Abstract:
The Low-altitude Internet of Things (IoT) is developing rapidly, and the Low Altitude Economy is treated as a national strategic emerging industry. Integrated Sensing and Communication (ISAC) for the Low Altitude Economy is expected to support complex tasks in challenging environments and provides a foundation for improved security, flexibility, and multi-application scenarios for drones. This paper presents an overview of ISAC for the Low Altitude Economy. The theoretical foundations of ISAC and the Low Altitude Economy are summarized, and the advantages of applying ISAC to the Low Altitude Economy are discussed. Potential applications of key 6G technologies, such as covert communication and Millimeter-Wave (mm-wave) systems in ISAC for the Low Altitude Economy, are examined. The key technical challenges of ISAC for the Low Altitude Economy in future development are also summarized.  Significance   The integration of UAVs with ISAC technology is expected to provide considerable advantages in future development. When ISAC is applied, the overall system payload can be reduced, which improves UAV maneuverability and operational freedom. This integration offers technical support for versatile UAV applications. With ISAC, low-altitude network systems can conduct complex tasks in challenging environments. UAV platforms equipped with a single function do not achieve the combined improvement in communication and sensing that ISAC enables. ISAC-equipped drones are therefore expected to be used more widely in aerial photography, agriculture, surveying, remote sensing, and telecommunications. This development will advance related theoretical and technical frameworks and broaden the application scope of ISAC.  Progress  ISAC networks for the low-altitude economy offer efficient and flexible solutions for military reconnaissance, emergency disaster relief, and smart city management. The open aerial environment and dynamic deployment requirements create several challenges. 
Limited stealth increases exposure to hostile interception, and complex terrains introduce signal obstruction. High bandwidth and low latency are also required. Academic and industrial communities have investigated technologies such as covert communication, intelligent reflecting surfaces, and mm-wave communication to enhance the reliability and intelligence of ISAC in low-altitude operational scenarios.  Conclusions  This paper presents an overview of current applications, critical technologies, and ongoing challenges associated with ISAC in low-altitude environments. It examines the integration of emerging 6G technologies, including covert communication, Reconfigurable Intelligent Surfaces (RIS), and mm-wave communication within ISAC frameworks. Given the dynamic and complex characteristics of low-altitude operations, recent advances in UAV swarm power control algorithms and covert trajectory optimization based on deep reinforcement learning are summarized. Key unresolved challenges are also identified, such as spatiotemporal synchronization, multi-UAV resource allocation, and privacy preservation, which provide reference directions for future research.  Prospects   ISAC technology provides precise and reliable support for drone logistics, urban air mobility, and large-scale environmental monitoring in the low-altitude economy. Large-scale deployment of ISAC systems in complex and dynamic low-altitude environments remains challenging. Major obstacles include limited coordination and resource allocation within UAV swarms, spatiotemporal synchronization across heterogeneous devices, competing requirements between sensing and communication functions, and rising concerns regarding privacy and security in open airspace. These issues restrict the high-quality development of the low-altitude economy.
Finite-time Adaptive Sliding Mode Control of Servo Motors Considering Frictional Nonlinearity and Unknown Loads
ZHANG Tianyu, GUO Qinxia, YANG Tingkai, GUO Xiangji, MING Ming
Available online  , doi: 10.11999/JEIT250521
Abstract:
  Objective  Ultra-fast laser processing with an infinite field of view requires servo motor systems with superior tracking accuracy and robustness. However, such systems are highly nonlinear and affected by coupled unknown load disturbances and complex friction, which constrain the performance of conventional controllers. Although Sliding Mode Control (SMC) exhibits inherent robustness, traditional SMC and observer designs cannot achieve accurate finite-time disturbance compensation under strong nonlinearities, thus limiting high-speed and high-precision trajectory tracking. To address this limitation, a novel finite-time adaptive SMC approach is proposed to ensure rapid and precise angular position tracking within a finite time, satisfying the stringent synchronization requirements of advanced laser processing systems.  Methods  A control strategy is developed by integrating an adaptive disturbance observer fused with a Radial Basis Function Neural Network (RBFNN) and finite-time SMC. First, the unknown load disturbance and complex frictional nonlinear dynamics are combined into a unified "lumped disturbance" term, improving model generality and the ability to represent real operating conditions. Second, a finite-time adaptive disturbance observer is constructed to estimate this lumped disturbance. The observer utilizes the universal approximation capability of the RBFNN to learn and approximate the dynamic characteristics of unknown disturbances online. Simultaneously, a finite-time adaptive law based on the error norm is introduced to update the neural network weights in real time, ensuring rapid and accurate finite-time estimation of the lumped disturbance while reducing dependence on precise model parameters. Based on this design, a finite-time sliding mode controller is developed. 
The controller uses the observer’s disturbance estimation as a feedforward compensation term, incorporates a carefully formulated finite-time sliding surface and equivalent control law, and introduces a saturation function to suppress control input chattering. A suitable Lyapunov function is then constructed, and finite-time stability theory is rigorously applied to prove the practical finite-time convergence of both the adaptive observer and the closed-loop control system, guaranteeing that the system tracking error converges to a bounded neighborhood near the origin within finite time.  Results and Discussions  To verify the effectiveness and superiority of the proposed control strategy, a typical Permanent Magnet Synchronous Motor (PMSM) servo system model is constructed in the MATLAB environment, and a simulation scenario with desired trajectories of varying frequencies is established. The proposed method is comprehensively compared with the widely used Proportional–Integral (PI) control and the advanced method reported in reference [7]. Simulation results demonstrate the following: 1. Tracking performance: Under various reference trajectories, the proposed controller enables the system to accurately follow the target trajectory with a tracking error substantially smaller than that of the PI controller. Compared with the method in reference [7], it achieves smoother responses and smaller residual errors, effectively eliminating the chattering observed in some operating conditions of the latter. 2. Disturbance rejection and robustness: The adaptive disturbance observer based on the RBFNN rapidly and effectively learns and compensates for the lumped disturbance composed of unknown load variations and frictional nonlinearities. Even in the presence of these disturbances, the proposed controller maintains high-precision trajectory tracking, demonstrating strong disturbance rejection and robustness to system parameter variations. 3. 
Control input characteristics: Compared with the reference methods, the control signal of the proposed approach quickly stabilizes after the initial transient phase, effectively suppressing chattering caused by high-frequency switching. The amplitude range of the control input remains reasonable, facilitating practical actuator implementation. 4. Comprehensive evaluation: Based on multiple error performance indices, including Integral Squared Error (ISE), Integral Absolute Error (IAE), Integral of Time-weighted Absolute Error (ITAE), and Integral of Time-weighted Squared Error (ITSE), the proposed controller consistently outperforms both PI control and the method in reference [7]. It demonstrates comprehensive advantages in suppressing transient errors rapidly and reducing overall error accumulation. The method also improves steady-state accuracy and achieves a balanced response speed with effective noise attenuation. 5. Observer performance: The RBFNN weight norm estimation converges rapidly and stabilizes at a low level after initial adaptation, confirming the effectiveness of the proposed adaptive law and the learning efficiency of the observer.  Conclusions  A finite-time sliding mode control strategy with an adaptive disturbance observer is proposed for servo systems used in ultra-fast laser processing. The method models unknown load disturbances and frictional nonlinearities as a lumped disturbance term. An adaptive observer, integrating an RBF neural network with a finite-time mechanism, accurately estimates this disturbance for real-time compensation. Based on the observer, a finite-time SMC law is formulated, and the practical finite-time stability of the closed-loop system is theoretically proven. Simulations conducted on a permanent magnet synchronous motor platform confirm that the proposed approach achieves superior tracking accuracy, robustness, and control smoothness compared with conventional PI and existing advanced methods. 
This work offers an effective solution for achieving high-precision control in nonlinear systems subject to strong disturbances.
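The observer-plus-sliding-mode structure described above can be sketched numerically. The following single-axis simulation is a minimal illustration, not the paper's implementation: the unit-inertia plant, RBF centers, gains, and the simple friction-plus-load disturbance are all illustrative assumptions.

```python
import math

def rbf_features(x, centers, sigma=1.0):
    """Gaussian RBF activations for a scalar input (here: motor velocity)."""
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]

def sat(s, eps=0.05):
    """Boundary-layer saturation replacing sign() to suppress chattering."""
    return max(-1.0, min(1.0, s / eps))

def simulate(T=5.0, dt=1e-3):
    """Track theta_ref = sin(t) on a hypothetical unit-inertia servo axis."""
    lam, k, gamma = 5.0, 3.0, 50.0            # surface slope, switching gain, adaptation rate
    centers = [-2.0, -1.0, 0.0, 1.0, 2.0]      # RBF centers spanning the velocity range
    w = [0.0] * len(centers)                   # adaptive RBFNN weights
    th, thd, t = 0.0, 0.0, 0.0                 # angle, velocity, time
    while t < T:
        ref, refd, refdd = math.sin(t), math.cos(t), -math.sin(t)
        e, ed = th - ref, thd - refd
        s = ed + lam * e                       # sliding surface
        phi = rbf_features(thd, centers)
        d_hat = sum(wi * p for wi, p in zip(w, phi))   # observer estimate of lumped disturbance
        u = refdd - lam * ed + d_hat - k * sat(s)      # equivalent term + feedforward + switching
        d = 0.5 * math.copysign(1.0, thd) + 0.3        # unknown friction + constant load
        w = [wi - gamma * p * s * dt for wi, p in zip(w, phi)]  # adaptive law driven by s
        thd += (u - d) * dt                    # unit-inertia dynamics: th'' = u - d
        th += thd * dt
        t += dt
    return abs(th - math.sin(T))               # final tracking error
```

The weight update `w_dot = -gamma * phi * s` follows from the standard Lyapunov argument for adaptive approximators, and the saturation function bounds the residual error to a small boundary layer rather than chattering around zero.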
A Survey of Lightweight Techniques for Segment Anything Model
LUO Yichang, QI Xiyu, ZHANG Borui, SHI Hanru, ZHAO Yan, WANG Lei, LIU Shixiong
Available online  , doi: 10.11999/JEIT250894
Abstract:
  Objective  The Segment Anything Model (SAM) demonstrates strong zero-shot generalization in image segmentation and sets a new direction for visual foundation models. The original SAM, especially the ViT-Huge version with about 637 million parameters, requires high computational resources and substantial memory. This restricts deployment in resource-limited settings such as mobile devices, embedded systems, and real-time tasks. Growing demand for efficient and deployable vision models has encouraged research on lightweight variants of SAM. Existing reviews describe applications of SAM, yet a structured summary of lightweight strategies across model compression, architectural redesign, and knowledge distillation is still absent. This review addresses this need by providing a systematic analysis of current SAM lightweight research, classifying major techniques, assessing performance, and identifying challenges and future research directions for efficient visual foundation models.  Methods  This review examines recent studies on SAM lightweight methods published in leading conferences and journals. The techniques are grouped into three categories based on their technical focus. The first category, Model Compression and Acceleration, covers knowledge distillation, network pruning, and quantization. The second category, Efficient Architecture Design, replaces the ViT backbone with lightweight structures or adjusts attention mechanisms. The third category, Efficient Feature Extraction and Fusion, refines the interaction between the image encoder and prompt encoder. A comparative assessment is conducted for representative studies, considering model size, computational cost, inference speed, and segmentation accuracy on standard benchmarks (Table 3).  Results and Discussions  The reviewed models achieve clear gains in inference speed and parameter efficiency. 
MobileSAM reduces the model to 9.6 M parameters, and Lite-SAM reaches up to 16× acceleration while maintaining suitable segmentation accuracy. Approaches based on knowledge distillation and hybrid design support generalization across domains such as medical imaging, video segmentation, and embedded tasks. Although a trade-off between accuracy and speed persists, the choice of lightweight strategy depends on the intended application. Challenges remain in prompt design, multi-scale feature fusion, and deployment on low-power hardware platforms.  Conclusions  This review provides an overview of the rapidly developing field of SAM lightweight research. The development of efficient SAM models is a multifaceted challenge that requires a combination of compression, architectural innovation, and optimization strategies. Current studies show that real-time performance on edge devices can be achieved with a small reduction in accuracy. Although progress is evident, challenges remain in handling complex scenarios, reducing the cost of distillation data, and establishing unified evaluation benchmarks. Future research is expected to emphasize more generalizable lightweight architectures, explore data-free or few-shot distillation approaches, and develop standardized evaluation protocols that consider both accuracy and efficiency.
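As a toy illustration of the feature-level knowledge distillation used by MobileSAM-style methods, the sketch below fits a tiny linear "student" to reproduce the embeddings of a frozen "teacher" by minimizing an MSE on features. The two-dimensional embeddings and the linear teacher are purely hypothetical stand-ins for SAM's ViT-H image encoder and a lightweight backbone.

```python
import random

def teacher_embed(x):
    """Frozen 'teacher' (hypothetical stand-in for a heavy image encoder)."""
    return [2.0 * x[0] - x[1], 0.5 * x[0] + x[1]]

def student_embed(x, W):
    """Tiny linear 'student' standing in for a lightweight backbone."""
    return [W[0][0] * x[0] + W[0][1] * x[1],
            W[1][0] * x[0] + W[1][1] * x[1]]

def distill(steps=2000, lr=0.05, seed=0):
    """Fit the student to the teacher's embeddings with a feature-level MSE."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        t, s = teacher_embed(x), student_embed(x, W)
        for i in range(2):                 # SGD step: dL/dW[i][j] = 2 * (s_i - t_i) * x_j
            for j in range(2):
                W[i][j] -= lr * 2.0 * (s[i] - t[i]) * x[j]
    probe = [0.5, -0.25]                   # report distillation loss on a fixed probe input
    t, s = teacher_embed(probe), student_embed(probe, W)
    return sum((a - b) ** 2 for a, b in zip(t, s))
```

The same decoupled recipe scales up: only the encoder is replaced and trained against cached teacher embeddings, so the prompt encoder and mask decoder need not be retrained.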
Key Technologies for Low-Altitude Internet Networks: Architecture, Security, and Optimization
WANG Yuntao, SU Zhou, GAO Yuan, BA Jianle
Available online  , doi: 10.11999/JEIT250947
Abstract:
Low-Altitude Intelligent Networks (LAINs) function as a core infrastructure for the emerging low-altitude digital economy by connecting humans, machines, and physical objects through the integration of manned and unmanned aircraft with ground networks and facilities. This paper provides a comprehensive review of recent research on LAINs from four perspectives: network architecture, resource optimization, security threats and protection, and large model-enabled applications. First, existing standards, general architecture, key characteristics, and networking modes of LAINs are investigated. Second, critical issues related to airspace resource management, spectrum allocation, computing resource scheduling, and energy optimization are discussed. Third, existing and emerging security threats across sensing, network, application, and system layers are assessed, and multi-layer defense strategies in LAINs are reviewed. Furthermore, the integration of large model technologies with LAINs is analyzed, highlighting their potential in task optimization and security enhancement. Future research directions are discussed to provide theoretical foundations and technical guidance for the development of efficient, secure, and intelligent LAINs.  Significance   LAINs support the low-altitude economy by enabling the integration of manned and unmanned aircraft with ground communication, computing, and control networks. By providing real-time connectivity and collaborative intelligence across heterogeneous platforms, LAINs support applications such as precision agriculture, public safety, low-altitude logistics, and emergency response. However, LAINs continue to face challenges created by dynamic airspace conditions, heterogeneous platforms, and strict real-time operational requirements. 
The development of large models also presents opportunities for intelligent resource coordination, proactive defense, and adaptive network management, which signals a shift in the design and operation of low-altitude networks.  Progress  Recent studies on LAINs have reported progress in network architecture, resource optimization, security protection, and large model integration. Architecturally, hierarchical and modular designs are proposed to integrate sensing, communication, and computing resources across air, ground, and satellite networks, which enables scalable and interoperable operations. In system optimization research, attention is given to airspace resource management, spectrum allocation, computing offloading, and energy-efficient scheduling through distributed optimization and AI-driven orchestration methods. In security research, multi-layer defense frameworks are developed to address sensing-layer spoofing, network-layer intrusions, and application-layer attacks through cross-layer threat intelligence and proactive defense mechanisms. Large Language Models (LLMs), Vision-Language Models (VLMs), and Multimodal LLMs (MLLMs) also support intelligent task planning, anomaly detection, and autonomous decision-making in complex low-altitude environments, which enhances the resilience and operational efficiency of LAINs.  Conclusions  This survey provides a comprehensive review of the architecture, security mechanisms, optimization techniques, and large model applications in LAINs. The challenges in multi-dimensional resource coordination, cross-layer security protection, and real-time system adaptation are identified, and existing or potential approaches to address these challenges are analyzed. By synthesizing recent research on architectural design, system optimization, and security defense, this work offers a unified perspective for researchers and practitioners aiming to build secure, efficient, and scalable LAIN systems. 
The findings emphasize the need for integrated solutions that combine algorithmic intelligence, system engineering, and architectural innovation to meet future low-altitude network demands.  Prospects  Future research on LAINs is expected to advance the integration of architecture design, intelligent optimization, security defense, and privacy preservation technologies to meet the demands of rapidly evolving low-altitude ecosystems. Key directions include developing knowledge-driven architectures for cross-domain semantic fusion, service-oriented network slicing, and distributed autonomous decision-making. Furthermore, research should also focus on proactive cross-layer security mechanisms supported by large models and intelligent agents, efficient model deployment through AI-hardware co-design and hierarchical computing architectures, and improved multimodal perception and adaptive decision-making to strengthen system resilience and scalability. In addition, establishing standardized benchmarks, open-source frameworks, and realistic testbeds is essential to accelerate innovation and ensure secure, reliable, and intelligent deployment of LAIN systems in real-world environments.
A Learning-Based Security Control Method for Cyber-Physical Systems Based on False Data Detection
MIAO Jinzhao, LIU Jinliang, SUN Le, ZHA Lijuan, TIAN Engang
Available online  , doi: 10.11999/JEIT250537
Abstract:
  Objective  Cyber-Physical Systems (CPS) constitute the backbone of critical infrastructures and industrial applications, but the tight coupling of cyber and physical components renders them highly susceptible to cyberattacks. False data injection attacks are particularly dangerous because they compromise sensor integrity, mislead controllers, and can trigger severe system failures. Existing control strategies often assume reliable sensor data and lack resilience under adversarial conditions. Furthermore, most conventional approaches decouple attack detection from control adaptation, leading to delayed or ineffective responses to dynamic threats. To overcome these limitations, this study develops a unified secure learning control framework that integrates real-time attack detection with adaptive control policy learning. By enabling the dynamic identification and mitigation of false data injection attacks, the proposed method enhances both stability and performance of CPS under uncertain and adversarial environments.  Methods  To address false data injection attacks in CPS, this study proposes an integrated secure control framework that combines attack detection, state estimation, and adaptive control strategy learning. A sensor grouping-based security assessment index is first developed to detect anomalous sensor data in real time without requiring prior knowledge of attacks. Next, a multi-source sensor fusion estimation method is introduced to reconstruct the system’s true state, thereby improving accuracy and robustness under adversarial disturbances. Finally, an adaptive learning control algorithm is designed, in which dynamic weight updating via gradient descent approximates the optimal control policy online. This unified framework enhances both steady-state performance and resilience of CPS against sophisticated attack scenarios. 
Its effectiveness and security performance are validated through simulation studies under diverse false data injection attack settings.  Results and Discussions  Simulation results confirm the effectiveness of the proposed secure adaptive learning control framework under multiple false data injection attacks in CPS. As shown in Fig. 1, system states rapidly converge to steady values and maintain stability despite sensor attacks. Fig. 2 demonstrates that the fused state estimator tracks the true system state with greater accuracy than individual local estimators. In Fig. 3, the compensated observation outputs align closely with the original, uncorrupted measurements, indicating precise attack estimation. Fig. 4 shows that detection indicators for sensor groups 2–5 increase sharply during attack intervals, while unaffected sensors remain near zero, verifying timely and accurate detection. Fig. 5 further confirms that the estimated attack signals closely match the true injected values. Finally, Fig. 6 compares different control strategies, showing that the proposed method achieves faster stabilization and smaller state deviations. Together, these results demonstrate robust control, accurate state estimation, and real-time detection under unknown attack conditions.  Conclusions  This study addresses secure perception and control in CPS under false data injection attacks by developing an integrated adaptive learning control framework that unifies detection, estimation, and control. A sensor-level anomaly detection mechanism is introduced to identify and localize malicious data, substantially enhancing attack detection capability. The fusion-based state estimation method further improves reconstruction accuracy of true system states, even when observations are compromised. At the control level, an adaptive learning controller with online weight adjustment enables real-time approximation of the optimal control policy without requiring prior knowledge of the attack model. 
Future research will extend the proposed framework to broader application scenarios and evaluate its resilience under diverse attack environments.
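The detect-then-fuse idea in this framework can be reduced to a few lines for a scalar state. The sketch below is an illustrative simplification, not the paper's design: the normalized-innovation threshold, noise model, and equal-weight fusion are all assumptions.

```python
def detect_and_fuse(measurements, predicted, threshold=3.0, noise_std=0.1):
    """Per-group attack detection index plus fusion over sensors judged clean.

    measurements : one reading of the same scalar state per sensor group
    predicted    : model-based prediction of that state
    """
    flags, trusted = [], []
    for z in measurements:
        # normalized innovation test: large residuals indicate injected false data
        attacked = abs(z - predicted) / noise_std > threshold
        flags.append(attacked)
        if not attacked:
            trusted.append(z)
    # fuse only trusted readings; fall back to the prediction if all are flagged
    fused = sum(trusted) / len(trusted) if trusted else predicted
    return flags, fused
```

With a prediction of 1.0, the readings `[1.05, 0.95, 3.2, 1.1]` flag only the third sensor group and fuse the remaining three, so a single compromised group cannot bias the state estimate fed to the controller.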
A Two-Stage Framework for CAN Bus Attack Detection by Fusing Temporal and Deep Features
TAN Mingming, ZHANG Heng, WANG Xin, LI Ming, ZHANG Jian, YANG Ming
Available online  , doi: 10.11999/JEIT250651
Abstract:
  Objective  The Controller Area Network (CAN), the de facto standard for in-vehicle communication, is inherently vulnerable to cyberattacks. Existing Intrusion Detection Systems (IDSs) face a fundamental trade-off: achieving fine-grained classification of diverse attack types often requires computationally intensive models that exceed the resource limitations of on-board Electronic Control Units (ECUs). To address this problem, this study proposes a two-stage attack detection framework for the CAN bus that fuses temporal and deep features. The framework is designed to achieve both high classification accuracy and computational efficiency, thereby reconciling the tension between detection performance and practical deployability.  Methods  The proposed framework adopts a “detect-then-classify” strategy and incorporates two key innovations. (1) Stage 1: Temporal Feature-Aware Anomaly Detection. Two custom features are designed to quantify anomalies: Payload Data Entropy (PDE), which measures content randomness, and ID Frequency Mean Deviation (IFMD), which captures behavioral deviations. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network that exploits contextual temporal information to achieve high-recall anomaly detection. (2) Stage 2: Deep Feature-Based Fine-Grained Classification. Triggered only for samples flagged as anomalous, this stage employs a lightweight one-dimensional ParC1D-Net. The core ParC1D Block (Fig. 4) integrates depthwise separable one-dimensional convolution, Squeeze-and-Excitation (SE) attention, and a Feed-Forward Network (FFN), enabling efficient feature extraction with minimal parameters. Stage 1 is optimized using BCEWithLogitsLoss, whereas Stage 2 is trained with Cross-Entropy Loss.  Results and Discussions  The efficacy of the proposed framework is evaluated on public datasets. (1) State-of-the-art performance. 
On the Car-Hacking dataset (Table 5), an accuracy and F1-score of 99.99% are achieved, exceeding advanced baselines. On the more challenging Challenge dataset (Table 6), superior accuracy (99.90%) and a competitive F1-score (99.70%) are also obtained. (2) Feature contribution analysis. Ablation studies (Tables 7 and 8) confirm the critical role of the proposed features. Removal of the IFMD feature results in the largest performance reduction, highlighting the importance of behavioral modeling. A synergistic effect is observed when PDE and IFMD are applied together. (3) Spatiotemporal efficiency. The complete model remains lightweight at only 0.39 MB. Latency tests (Table 9) demonstrate real-time capability, with average detection times of 0.62 ms on a GPU and 0.93 ms on a simulated CPU (batch size = 1). A system-level analysis (Section 3.5.4) further shows that the two-stage framework is approximately 1.65 times more efficient than a single-stage model in a realistic sparse-attack scenario.  Conclusions  This study establishes the two-stage framework as an effective and practical solution for CAN bus intrusion detection. By decoupling detection from classification, the framework resolves the trade-off between accuracy and on-board deployability. Its strong performance, combined with a minimal computational footprint, indicates its potential for securing real-world vehicular systems. Future research could extend the framework and explore hardware-specific optimizations.
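The two Stage-1 features can be illustrated with short helper functions. The exact definitions of PDE and IFMD in the paper may differ; treat these as plausible sketches of entropy-of-payload and frequency-deviation features.

```python
import math
from collections import Counter

def payload_entropy(payload):
    """PDE-style feature: Shannon entropy (bits) of the CAN payload bytes.
    Constant padding gives 0; fuzzing/random payloads push it toward 3 bits
    for an 8-byte frame with all-distinct bytes."""
    counts, n = Counter(payload), len(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def id_freq_mean_deviation(window_ids, baseline_freq):
    """IFMD-style feature: mean absolute deviation of observed per-ID
    frequencies in a message window from a learned baseline profile.
    Flooding or spoofing a single ID inflates this value."""
    n, counts = len(window_ids), Counter(window_ids)
    ids = set(counts) | set(baseline_freq)
    return sum(abs(counts.get(i, 0) / n - baseline_freq.get(i, 0.0))
               for i in ids) / len(ids)
```

In the framework these per-window scalars, not raw frames, are what the BiLSTM consumes, which is one reason Stage 1 stays cheap enough for an ECU.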
A One-dimensional 5G Millimeter-wave Wide-angle Scanning Array Antenna Using AMC Structure
MA Zhangang, ZHANG Qing, FENG Sirun, ZHAO Luyu
Available online  , doi: 10.11999/JEIT250719
Abstract:
  Objective  With the rapid advancement of 5G millimeter-wave technology, antennas are required to achieve high gain, wide beam coverage, and compact size, particularly in environments characterized by strong propagation loss and blockage. Conventional millimeter-wave arrays often face difficulties in reconciling wide-angle scanning with high gain and broadband operation due to element coupling and narrow beamwidths. To overcome these challenges, this study proposes a one-dimensional linear array antenna incorporating an Artificial Magnetic Conductor (AMC) structure. The AMC’s in-phase reflection is exploited to improve bandwidth and gain while enabling wide-angle scanning of ±80° at 26 GHz. By adopting a 0.4-wavelength element spacing and stacked topology, the design provides an effective solution for 5G millimeter-wave terminals where spatial constraints and performance trade-offs are critical. The findings highlight the potential of AMC-based arrays to advance antenna technology for future high-speed, low-latency 5G applications by combining broadband operation, high directivity, and broad coverage within compact form factors.  Methods  This study develops a high-performance single-polarized one-dimensional linear millimeter-wave array antenna through a multi-layered structural design integrated with AMC technology. The design process begins with theoretical analysis of the pattern multiplication principle and array factor characteristics, which identify 0.4-wavelength element spacing as an optimal balance between wide-angle scanning and directivity. A stacked three-layer antenna unit is then constructed, consisting of square patch radiators on the top layer, a cross-shaped coupling feed structure in the middle layer, and an AMC-loaded substrate at the bottom. The AMC provides in-phase reflection in the 21–30 GHz band, enhancing bandwidth and suppressing surface wave coupling. 
Full-wave simulations (HFSS) are performed to optimize AMC dimensions, feed networks, and array layout, confirming a bandwidth of 23.7–28 GHz, a peak gain of 13.9 dBi, and a scanning capability of ±80°. A prototype is fabricated using printed circuit board technology and evaluated with a vector network analyzer and anechoic chamber measurements. Experimental results agree closely with simulations, demonstrating an operational bandwidth of 23.3–27.7 GHz, isolation better than −15 dB, and scanning coverage up to ±80°. These results indicate that the synergistic interaction between AMC-modulated radiation fields and the array coupling mechanism enables a favorable balance among wide bandwidth, high gain, and wide-angle scanning.  Results and Discussions  The influence of the array factor on directional performance is analyzed, and the maximum array factor is observed when the element spacing is between 0.4λ and 0.46λ (Fig. 2). The in-phase reflection of the AMC structure in the 21–30 GHz range significantly enhances antenna characteristics, broadening the bandwidth by 50% compared with designs without AMC and increasing the gain at 26 GHz by 1.5 dBi (Fig. 10, Fig. 13). The operational bandwidth of 23.3–27.7 GHz is confirmed by measurements (Fig. 17a). When the element spacing is optimized to 4.6 mm (0.4λ) and the coupling radiation mechanisms are adjusted, the H-plane Half-Power Beamwidth (HPBW) of the array elements is extended to 180° (Fig. 8, Fig. 9), with a further gain improvement of 0.6 dBi at the scanning edges (Fig. 11b). The three-layer stacked structure (comprising the radiation, isolation, and AMC layers) achieves isolation better than −15 dB (Fig. 17a). Experimental validation demonstrates wide-angle scanning capability up to ±80°, showing close agreement between simulated and measured results (Fig. 11, Fig. 17b). 
The proposed antenna is therefore established as a compact, high-performance solution for 5G millimeter-wave terminals, offering wide bandwidth, high gain, and broad scanning coverage.  Conclusions  A one-dimensional linear wide-angle scanning array antenna based on an AMC structure is presented for 5G millimeter-wave applications. Through theoretical analysis, simulation optimization, and experimental validation, balanced improvement in broadband operation, high gain, and wide-angle scanning is achieved. Pattern multiplication theory and array factor analysis are applied to determine 0.4-wavelength element spacing as the optimal compromise between scanning angle and directivity. A stacked three-layer configuration is adopted, and the AMC’s in-phase reflection extends the bandwidth to 23.7–28.5 GHz, representing a 50% increase. Simulation and measurement confirm ±80° scanning at 26 GHz with a peak gain of 13.8 dBi, which is 1.3 dBi higher than that of non-AMC designs. The close consistency between experimental and simulated results verifies the feasibility of the design, providing a compact and high-performance solution for millimeter-wave antennas in mobile communication and vehicular systems. Future research is expected to explore dual-polarization integration and adaptation to complex environments.
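The pattern-multiplication reasoning behind the 0.4-wavelength spacing can be checked in a few lines. The 8-element count and isotropic elements below are assumptions for illustration, not the paper's configuration.

```python
import cmath
import math

def array_factor_db(theta_deg, n_elem=8, spacing_wl=0.4, scan_deg=0.0):
    """Normalized array factor (dB) of a uniform linear array; by pattern
    multiplication, the total pattern is the element pattern times this factor.

    spacing_wl is the element spacing in wavelengths; scan_deg is the
    phase-steered main-beam direction."""
    psi = 2 * math.pi * spacing_wl * (
        math.sin(math.radians(theta_deg)) - math.sin(math.radians(scan_deg)))
    af = abs(sum(cmath.exp(1j * n * psi) for n in range(n_elem)))
    return 20 * math.log10(max(af, 1e-12) / n_elem)  # 0 dB at the steered angle
```

Sweeping `theta_deg` with `scan_deg=80` shows why the design keeps d = 0.4λ: the factor peaks (0 dB) at the steered angle while staying well suppressed elsewhere, whereas a wider spacing such as 0.7λ lets a grating lobe enter visible space at the same scan angle.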
Integrating Representation Learning and Knowledge Graph Reasoning for Diabetes and Complications Prediction
WANG Yuao, HUANG Yeqi, LI Qingyuan, LIU Yun, JING Shenqi, SHAN Tao, GUO Yongan
Available online  , doi: 10.11999/JEIT250798
Abstract:
  Objective  Diabetes mellitus and its complications are recognized as major global health challenges, causing severe morbidity, high healthcare costs, and reduced quality of life. Accurate joint prediction of these conditions is essential for early intervention but is hindered by data heterogeneity, sparsity, and complex inter-entity relationships. To address these challenges, a Representation Learning Enhanced Knowledge Graph-based Multi-Disease Prediction (REKG-MDP) model is proposed. Electronic Health Records (EHRs) are integrated with supplementary medical knowledge to construct a comprehensive Medical Knowledge Graph (MKG), and higher-order semantic reasoning combined with relation-aware representation learning is applied to capture complex dependencies and improve predictive accuracy across multiple diabetes-related conditions.  Methods  The REKG-MDP framework consists of three modules. First, an MKG is constructed by integrating structured EHR data from the MIMIC-IV dataset with external disease knowledge. Patient-side features include demographics, laboratory indices, and medical history, whereas disease-side attributes cover comorbidities, susceptible populations, etiological factors, and diagnostic criteria. This integration mitigates data sparsity and enriches semantic representation. Second, a relation-aware embedding module captures four relational patterns: symmetric, antisymmetric, inverse, and compositional. These patterns are used to optimize entity and relation embeddings for semantic reasoning. Third, a Hierarchical Attention-based Graph Convolutional Network (HA-GCN) aggregates multi-hop neighborhood information. Dynamic attention weights capture both local and global dependencies, and a bidirectional mechanism enhances the modeling of patient–disease interactions.  
  Results and Discussions  Experiments demonstrate that REKG-MDP consistently outperforms four baselines: two machine learning models (DCKD-RF and bSES-AC-RUN-FKNN) and two graph-based models (KGRec and PyRec). Compared with the strongest baseline, REKG-MDP achieves average improvements in P, F1, and NDCG of 19.39%, 19.67%, and 19.39% for single-disease prediction (n = 1); 16.71%, 21.83%, and 23.53% for n = 3; and 22.01%, 20.34%, and 20.88% for n = 5 (Table 4). Ablation studies confirm the contribution of each module. Removing relation-pattern modeling reduces performance metrics by approximately 12%, removing hierarchical attention decreases them by 5–6%, and excluding disease-side knowledge produces the largest decline of up to 20% (Fig. 5). Sensitivity analysis indicates that increasing the embedding dimension from 32 to 128 enhances performance by more than 11%, whereas excessive dimensionality (256) leads to over-smoothing (Fig. 6). Adjusting the β parameter strengthens sample discrimination, improving P, F1, and NDCG by 9.28%, 27.9%, and 8.08%, respectively (Fig. 7).  Conclusions  REKG-MDP integrates representation learning with knowledge graph reasoning to enable multi-disease prediction. The main contributions are as follows: (1) integrating heterogeneous EHR data with disease knowledge mitigates data sparsity and enhances semantic representation; (2) modeling diverse relational patterns and applying hierarchical attention improves the capture of higher-order dependencies; and (3) extensive experiments confirm the model’s superiority over state-of-the-art baselines, with ablation and sensitivity analyses validating the contribution of each module. Remaining challenges include managing extremely sparse data and ensuring generalization across broader populations. 
Future research will extend REKG-MDP to model temporal disease progression and additional chronic conditions.
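The four relational patterns handled by the relation-aware embedding module (symmetric, antisymmetric, inverse, compositional) can all be represented by rotation-style embeddings in the spirit of RotatE; the sketch below is an illustration of that idea, not the paper's exact formulation. Symmetry corresponds to phases of 0 or π, inversion to negated phases, and composition to phase addition.

```python
import cmath
import math

def rotate(head, relation_phases):
    """Apply a relation as an element-wise rotation of a complex embedding."""
    return [h * cmath.exp(1j * p) for h, p in zip(head, relation_phases)]

def score(head, relation_phases, tail):
    """Negative L1 distance: closer to 0 means the triple is more plausible."""
    pred = rotate(head, relation_phases)
    return -sum(abs(p - t) for p, t in zip(pred, tail))
```

For example, if relation r3 is the composition of r1 and r2, its phase vector is simply the element-wise sum of theirs, so `(h, r3, t)` scores near zero exactly when rotating h by r1 then r2 lands on t; this is the kind of algebraic structure the embedding module exploits for higher-order reasoning over patient-disease paths.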
Wave-MambaCT: Low-dose CT Artifact Suppression Method Based on Wavelet Mamba
CUI Xueying, WANG Yuhang, LIU Bin, SHANGGUAN Hong, ZHANG Xiong
Available online  , doi: 10.11999/JEIT250489
Abstract:
  Objective  Low-Dose Computed Tomography (LDCT) reduces patient radiation exposure but introduces substantial noise and artifacts into reconstructed images. Convolutional Neural Network (CNN)-based denoising approaches are limited by local receptive fields, which restrict their ability to capture long-range dependencies. Transformer-based methods alleviate this limitation but incur quadratic computational complexity relative to image size. In contrast, State Space Model (SSM)–based Mamba frameworks achieve linear complexity for long-range interactions. However, existing Mamba-based methods often suffer from information loss and insufficient noise suppression. To address these limitations, we propose the Wave-MambaCT model.  Methods  The proposed Wave-MambaCT model adopts a multi-scale framework that integrates Discrete Wavelet Transform (DWT) with a Mamba module based on the SSM. First, DWT performs a two-level decomposition of the LDCT image, decoupling noise from Low-Frequency (LF) content. This design directs denoising primarily toward the High-Frequency (HF) components, facilitating noise suppression while preserving structural information. Second, a residual module combined with a Spatial-Channel Mamba (SCM) module extracts both local and global features from LF and HF bands at different scales. The noise-free LF features are then used to correct and enhance the corresponding HF features through an attention-based Cross-Frequency Mamba (CFM) module. Finally, inverse wavelet transform is applied in stages to progressively reconstruct the image. To further improve denoising performance and network stability, multiple loss functions are employed, including L1 loss, wavelet-domain LF loss, and adversarial loss for HF components.  
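The wavelet-domain idea described above, decomposing the image so that noise concentrates in the HF bands and denoising only those bands, can be illustrated with a minimal 1-D sketch. The paper operates on 2-D images with learned Mamba modules; the Haar transform, the `soft_threshold` rule, and the threshold value below are simplifying assumptions, not the authors' implementation.

```python
import math

def haar_dwt(signal):
    """One level of the 1-D Haar DWT: split into Low-Frequency (LF)
    approximation and High-Frequency (HF) detail coefficients."""
    s = 1 / math.sqrt(2)
    lf = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    hf = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return lf, hf

def haar_idwt(lf, hf):
    """Inverse one-level Haar transform."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(lf, hf):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def soft_threshold(coeffs, t):
    """Shrink HF coefficients toward zero; noise concentrates in HF."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(signal, t=0.5, levels=2):
    """Two-level decomposition: denoise the HF bands at each level
    while leaving the LF content intact, then reconstruct in stages."""
    lf, hf_bands = list(signal), []
    for _ in range(levels):
        lf, hf = haar_dwt(lf)
        hf_bands.append(soft_threshold(hf, t))
    for hf in reversed(hf_bands):
        lf = haar_idwt(lf, hf)
    return lf
```

With the threshold set to zero the pipeline is an exact identity, which confirms that the staged inverse transform loses no LF information; a positive threshold suppresses only HF detail.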
  Results and Discussions  Extensive experiments on the simulated Mayo Clinic datasets, the real Piglet datasets, and the hospital clinical dataset DeepLesion show that Wave-MambaCT provides superior denoising performance and generalization. On the Mayo dataset, a PSNR of 31.6528 is achieved, which is higher than that of the suboptimal method DenoMamba (PSNR 31.4219), while MSE is reduced to 0.00074 and SSIM and VIF are improved to 0.8851 and 0.4629, respectively (Table 1). Visual results (Figs. 4–6) demonstrate that edges and fine details such as abdominal textures and lesion contours are preserved, with minimal blurring or residual artifacts compared with competing methods. Computational efficiency analysis (Table 2) indicates that Wave-MambaCT maintains low FLOPs (17.2135 G) and parameters (5.3913 M). FLOPs are lower than those of all networks except RED-CNN, and the parameter count is higher only than those of RED-CNN and CTformer. During training, 4.12 minutes per epoch are required, longer only than RED-CNN. During testing, 0.1463 seconds are required per image, which is at a medium level among the compared methods. Generalization tests on the Piglet datasets (Figs. 7, 8, Tables 3, 4) and DeepLesion (Fig. 9) further confirm the robustness and generalization capacity of Wave-MambaCT. In the proposed design, HF sub-bands are grouped, and noise-free LF information is used to correct and guide their recovery. This strategy is based on two considerations. First, it reduces network complexity and parameter count. Second, although the sub-bands correspond to HF information in different orientations, they are correlated and complementary as components of the same image. Joint processing enhances the representation of HF content, whereas processing them separately would require a multi-branch architecture, inevitably increasing complexity and parameters. 
Future work will explore approaches to reduce complexity and parameters when processing HF sub-bands individually, while strengthening their correlations to improve recovery. For structural simplicity, SCM is applied to both HF and LF feature extraction. However, redundancy exists when extracting LF features, and future studies will explore the use of different Mamba modules for HF and LF features to further optimize computational efficiency.  Conclusions  Wave-MambaCT integrates DWT for multi-scale decomposition, a residual module for local feature extraction, and an SCM module for efficient global dependency modeling to address the denoising challenges of LDCT images. By decoupling noise from LF content through DWT, the model enables targeted noise removal in the HF domain, facilitating effective noise suppression. The designed RSCM, composed of residual blocks and SCM modules, captures fine-grained textures and long-range interactions, enhancing the extraction of both local and global information. In parallel, the Cross-band Enhancement Module (CEM) employs noise-free LF features to refine HF components through attention-based CFM, ensuring structural consistency across scales. Ablation studies (Table 5) confirm the essential contributions of both SCM and CEM modules to maintaining high performance. Importantly, the model’s staged denoising strategy achieves a favorable balance between noise reduction and structural preservation, yielding robustness to varying radiation doses and complex noise distributions.
Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Strategy
ZHANG Ruifeng, YANG Rongni
Available online  , doi: 10.11999/JEIT250746
Abstract:
  Objective   The open network architecture of cyber-physical systems (CPSs) enables remarkable flexibility and scalability, but it also renders CPSs highly vulnerable to cyber-attacks. In particular, denial-of-service (DoS) attacks have emerged as one of the predominant threats, as they can cause packet loss and reduce system performance by directly jamming channels. On the other hand, CPSs under dormant and active DoS attacks can be regarded as dual-mode switched systems with stable and unstable subsystems, respectively. Therefore, it is worth exploring how to utilize the switched system theory to design a secure control approach with high degrees of freedom and low conservatism. However, due to the influence of complex environments such as attacks and noises, it is difficult to model practical CPSs exactly. Although Q-learning-based control methods demonstrate potential for handling unknown CPSs, a significant research gap exists for switched systems with unstable modes, particularly in establishing an evaluable stability criterion. Therefore, applying switched system theory to design a learning-based control algorithm and an evaluable security criterion for unknown CPSs under DoS attacks remains to be investigated.   Methods   An online mode-dependent switching-Q-learning strategy is presented to study the data-driven evaluable criterion and secure control for unknown CPSs under DoS attacks. Initially, the CPSs under dormant and active DoS attacks are transformed into switched systems with stable and unstable subsystems, respectively. Subsequently, the optimal control problem of the value function is addressed for the model-based switched systems by designing a new generalized switching algebraic Riccati equation (GSARE) and obtaining the corresponding mode-dependent optimal security controller. Furthermore, the existence and uniqueness of the GSARE’s solution are proved. 
In what follows, with the help of model-based results, a data-driven optimal security control law is proposed by developing a novel online mode-dependent switching-Q-learning control algorithm. Finally, through utilizing the learned control gain and parameter matrices from the above algorithm, a data-driven evaluable security criterion with the attack frequency and duration is established based on the switching constraints and subsystem constraints.   Results and Discussions   In order to verify the efficiency and advantage of the proposed methods, comparative experiments of the wheeled robot are displayed in this work. Firstly, the model-based result (Theorem 1) and the data-driven result (Algorithm 1) are compared as follows: From the iterative process curves of control gain and parameter matrices (Fig. 2 and Fig. 3), it can be observed that the optimal control gain and parameter matrices under threshold errors can all be successfully obtained from both the model-based GSARE and the data-driven algorithm. Meanwhile, the tracking errors of CPSs can converge to 0 by utilizing the above data-driven controller (Fig. 5), which ensures the exponential stability of CPSs and verifies the efficiency of our proposed switching-Q-learning algorithm. Secondly, it is evident from the learning process curves (Fig. 4) that although the initial value of the learned control gain is not stabilizable, the optimal control gain can still be successfully learned to stabilize the system from Algorithm 1. This result significantly reduces conservatism compared to existing Q-learning approaches, which take stabilizable initial control gains as the learning premise. 
Thirdly, the data-driven evaluable security criterion in Theorem 2 of this work is compared with existing criteria as follows: While the switching parameters learned from Algorithm 1 do not satisfy the popular switching constraint used to obtain the mode dwell-time, by utilizing the evaluable security criterion proposed in this paper, the attack frequency and duration are obtained based on the new switching constraints and subsystem constraints. Furthermore, it is seen from the comparison of the evaluable security criteria (Table 1) that our proposed evaluable security criterion is less conservative than the existing evaluable criteria. Finally, the learned optimal controller and the obtained DoS attack constraints are applied to the tracking control experiment of a wheeled robot under DoS attacks, and the result is compared with existing results via Q-learning controllers. It is evident from the tracking trajectory comparisons of the robot (Fig. 6 and Fig. 7) that the robot achieves significantly faster and more accurate trajectory tracking with the help of our proposed switching-Q-learning controller. Therefore, the efficiency and advantage of the proposed algorithm and criterion in this work are verified.   Conclusions   Based on the learning strategy and the switched system theory, this study presents an online mode-dependent switching-Q-learning control algorithm and the corresponding evaluable security criterion for the unknown CPSs under DoS attacks. The detailed results are provided as follows: (1) By representing the unknown CPSs under dormant and active DoS attacks as unknown switched systems with stable and unstable subsystems, respectively, the security problem of CPSs under DoS attacks is transformed into a stabilization problem of the switched systems, which offers high design freedom and low conservatism. (2) A novel online mode-dependent switching-Q-learning control algorithm is developed for unknown switched systems with unstable modes. 
The comparative experiments show that the proposed switching-Q-learning algorithm effectively increases the design freedom of controllers and decreases conservatism over existing Q-learning algorithms. (3) A new data-driven evaluable security criterion with the attack frequency and duration is established based on the switching constraints and subsystem constraints. It is evident from the comparative criteria that the proposed criterion demonstrates significantly reduced conservatism over existing evaluable criteria via single subsystem constraints and traditional mode dwell-time constraints.
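The mode-dependent flavor of the switching-Q-learning strategy can be conveyed with a toy tabular sketch. The paper works with continuous switched linear systems via a GSARE, not tabular Q-learning; the discrete states, reward, and update form below are illustrative assumptions only. The key structural point carried over is that each mode (dormant vs. active DoS attack) keeps its own Q-function, and the bootstrap target is read from the Q-function of the mode active at the next instant.

```python
def mode_q_update(Q, mode, s, a, r, next_mode, s_next, actions,
                  alpha=0.1, gamma=0.95):
    """One Q-learning step in a dual-mode switched system: Q is a dict
    mapping each mode to its own Q-table, and the bootstrap target is
    taken from the table of the *next* active mode, mirroring the
    mode-dependent switching structure."""
    best_next = max(Q[next_mode].get((s_next, b), 0.0) for b in actions)
    old = Q[mode].get((s, a), 0.0)
    Q[mode][(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[mode][(s, a)]
```

A transition observed while the attack is dormant (mode 0) that switches into an active-attack interval (mode 1) would update `Q[0]` using a target from `Q[1]`, so knowledge about the unstable mode propagates back into the stable mode's policy.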
Entropy Quantum Collaborative Planning Method for Emergency Path of Unmanned Aerial Vehicles Driven by Survival Probability
WANG Enliang, ZHANG Zhen, SUN Zhixin
Available online  , doi: 10.11999/JEIT250694
Abstract:
  Objective  Natural disaster emergency rescue places stringent requirements on the timeliness and safety of Unmanned Aerial Vehicle (UAV) path planning. Conventional optimization objectives, such as minimizing total distance, often fail to reflect the critical time-sensitive priority of maximizing the survival probability of trapped victims. Moreover, existing algorithms struggle with the complex constraints of disaster environments, including no-fly zones, caution zones, and dynamic obstacles. To address these challenges, this paper proposes an Entropy-Enhanced Quantum Ripple Synergy Algorithm (E2QRSA). The primary goals are to establish a survival probability maximization model that incorporates time decay characteristics and to design a robust optimization algorithm capable of efficiently handling complex spatiotemporal constraints in dynamic disaster scenarios.  Methods  E2QRSA enhances the Quantum Ripple Optimization framework through four key innovations: (1) information entropy–based quantum state initialization, which guides population generation toward high-entropy regions; (2) multi-ripple collaborative interference, which promotes beneficial feature propagation through constructive superposition; (3) entropy-driven parameter control, which dynamically adjusts ripple propagation according to search entropy rates; and (4) quantum entanglement, which enables information sharing among elite individuals. The model employs a survival probability objective function that accounts for time-sensitive decay, base conditions, and mission success probability, subject to constraints including no-fly zones, warning zones, and dynamic obstacles.  Results and Discussions  Simulation experiments are conducted in medium- and large-scale typhoon disaster scenarios. The proposed E2QRSA achieves the highest survival probabilities of 0.847 and 0.762, respectively (Table 1), exceeding comparison algorithms such as SEWOA and PSO by 4.2–16.0%. 
Although the paths generated by E2QRSA are not the shortest, they are the most effective in maximizing survival chances. The ablation study (Table 3) confirms the contribution of each component, with the removal of multi-ripple interference causing the largest performance decrease (9.97%). The dynamic coupling between search entropy and ripple parameters (Fig. 2) is validated, demonstrating the effectiveness of the adaptive control mechanism. The entanglement effect (Fig. 4) is shown to maintain population diversity. In terms of constraint satisfaction, E2QRSA-planned paths consume only 85.2% of the total available energy (Table 5), ensuring a safe return, and all static and dynamic obstacles are successfully avoided, as visually verified in the 3D path plots (Figs. 6 and 7).  Conclusions  E2QRSA effectively addresses the challenge of UAV path planning for disaster relief by integrating adaptive entropy control with quantum-inspired mechanisms. The survival probability objective captures the essential requirements of disaster scenarios more accurately than conventional distance minimization. Experimental validation demonstrates that E2QRSA achieves superior solution quality and faster convergence, providing a robust technical basis for strengthening emergency response capabilities.
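The survival-probability objective, as opposed to distance minimization, can be sketched directly. The exponential decay form, the time constant `tau`, and the multiplicative base-condition and mission-success factors below are hypothetical stand-ins for the paper's more detailed model; the point is only that the objective rewards early arrival, not short paths per se.

```python
import math

def survival_probability(arrival_time, tau=60.0, base_factor=1.0,
                         mission_success=1.0):
    """Illustrative time-decaying survival model: probability decays
    exponentially with the UAV's arrival time (minutes), scaled by
    base-condition and mission-success factors (forms assumed here)."""
    return base_factor * mission_success * math.exp(-arrival_time / tau)

def path_objective(arrival_times, **kw):
    """Objective to maximize: total expected survivors over all trapped
    victims. A shorter path matters only insofar as it reduces the
    arrival time at each victim's location."""
    return sum(survival_probability(t, **kw) for t in arrival_times)
```

Under this objective, a slightly longer route that reaches a critical victim sooner can score higher than the globally shortest tour, which matches the reported observation that E2QRSA's paths are not the shortest yet maximize survival chances.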
A Method for Named Entity Recognition in Military Intelligence Domain Using Large Language Models
LI Yongbin, LIU Lian, ZHENG Jie
Available online  , doi: 10.11999/JEIT250764
Abstract:
  Objective  Named Entity Recognition (NER) is a fundamental task in information extraction within specialized domains, particularly military intelligence. It plays a critical role in situation assessment, threat analysis, and decision support. However, conventional NER models face major challenges. First, the scarcity of high-quality annotated data in the military intelligence domain is a persistent limitation. Due to the sensitivity and confidentiality of military information, acquiring large-scale, accurately labeled datasets is extremely difficult, which severely restricts the training performance and generalization ability of supervised learning–based NER models. Second, military intelligence requires handling complex and diverse information extraction tasks. The entities to be recognized often possess domain-specific meanings, ambiguous boundaries, and complex relationships, making it difficult for traditional models with fixed architectures to adapt flexibly to such complexity or achieve accurate extraction. This study aims to address these limitations by developing a more effective NER method tailored to the military intelligence domain, leveraging Large Language Models (LLMs) to enhance recognition accuracy and efficiency in this specialized field.  Methods  To achieve the above objective, this study focuses on the military intelligence domain and proposes a NER method based on LLMs. The central concept is to harness the strong semantic reasoning capabilities of LLMs, which enable deep contextual understanding of military texts, accurate interpretation of complex domain-specific extraction requirements, and autonomous execution of extraction tasks without heavy reliance on large annotated datasets. To ensure that general-purpose LLMs can rapidly adapt to the specialized needs of military intelligence, two key strategies are employed. First, instruction fine-tuning is applied. 
Domain-specific instruction datasets are constructed to include diverse entity types, extraction rules, and representative examples relevant to military intelligence. Through fine-tuning with these datasets, the LLMs acquire a more precise understanding of the characteristics and requirements of NER in this field, thereby improving their ability to follow targeted extraction instructions. Second, Retrieval-Augmented Generation (RAG) is incorporated. A domain knowledge base is developed containing expert knowledge such as entity dictionaries, military terminology, and historical extraction cases. During the NER process, the LLM retrieves relevant knowledge from this base in real time to support entity recognition. This strategy compensates for the limited domain-specific knowledge of general LLMs and enhances recognition accuracy, particularly for rare or complex entities.  Results and Discussions  Experimental results indicate that the proposed LLM–based NER method, which integrates instruction fine-tuning and RAG, achieves strong performance in military intelligence NER tasks. Compared with conventional NER models, it demonstrates higher precision, recall, and F1-score, particularly in recognizing complex entities and managing scenarios with limited annotated data. The effectiveness of this method arises from several key factors. The powerful semantic reasoning capability of LLMs enables a deeper understanding of contextual nuances and ambiguous expressions in military texts, thereby reducing missed and false recognitions commonly caused by rigid pattern-matching approaches. Instruction fine-tuning allows the model to better align with domain-specific extraction requirements, ensuring that the recognition results correspond more closely to the practical needs of military intelligence analysis. 
Furthermore, the incorporation of RAG provides real-time access to domain expert knowledge, markedly enhancing the recognition of entities that are highly specialized or morphologically variable within military contexts. This integration effectively mitigates the limitations of traditional models that lack sufficient domain knowledge.  Conclusions  This study proposes an LLM-based NER method for the military intelligence domain, effectively addressing the challenges of limited annotated data and complex extraction requirements encountered by traditional models. By combining instruction fine-tuning and RAG, general-purpose LLMs can be rapidly adapted to the specialized demands of military intelligence, enabling the construction of an efficient domain-specific expert system at relatively low cost. The proposed method provides an effective and scalable solution for NER tasks in military intelligence scenarios, enhancing both the efficiency and accuracy of information extraction in this field. It offers not only practical value for military intelligence analysis and decision support but also methodological insight for NER research in other specialized domains facing similar data and complexity constraints, such as aerospace and national security. Future research will focus on optimizing instruction fine-tuning strategies, expanding the domain knowledge base, and reducing computational cost to further improve model performance and applicability.
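The retrieval-augmented extraction loop can be outlined with a deliberately simple sketch. Word-overlap ranking stands in for a real embedding-based retriever, the knowledge-base entries and prompt template are hypothetical, and the actual LLM call is omitted; only the retrieve-then-prompt shape of the pipeline is illustrated.

```python
def retrieve(query, knowledge_base, k=2):
    """Toy retrieval step: rank knowledge-base entries by word overlap
    with the input text (a production system would use dense
    embeddings over entity dictionaries and historical cases)."""
    q = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return scored[:k]

def build_ner_prompt(text, knowledge_base, entity_types):
    """Assemble an instruction-style NER prompt augmented with the
    retrieved domain knowledge, in the spirit of the RAG pipeline
    described above; the template wording is an assumption."""
    context = "\n".join(retrieve(text, knowledge_base))
    return (f"Extract entities of types {entity_types} from the text.\n"
            f"Domain knowledge:\n{context}\n"
            f"Text: {text}\nAnswer as JSON.")
```

The assembled prompt is what an instruction-tuned model would receive, so rare or morphologically variable entity names reach the model alongside the matching dictionary entries rather than relying on the model's parametric knowledge alone.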
Secrecy Rate Maximization Algorithm for IRS Assisted UAV-RSMA Systems
WANG Zhengqiang, KONG Weidong, WAN Xiaoyu, FAN Zifu, DUO Bin
Available online  , doi: 10.11999/JEIT250452
Abstract:
  Objective  Under the stringent requirements of Sixth-Generation (6G) mobile communication networks for spectral efficiency, energy efficiency, low latency, and wide coverage, Unmanned Aerial Vehicle (UAV) communication has emerged as a key solution for 6G and beyond, leveraging its Line-of-Sight propagation advantages and flexible deployment capabilities. Functioning as aerial base stations, UAVs significantly enhance network performance by improving spectral efficiency and connection reliability, demonstrating irreplaceable value in critical scenarios such as emergency communications, remote area coverage, and maritime operations. However, UAV communication systems face dual challenges in high-mobility environments: severe multi-user interference in dense access scenarios that substantially degrades system performance, alongside critical physical-layer security threats resulting from the broadcast nature and spatial openness of wireless channels that enable malicious interception of transmitted signals. Rate-Splitting Multiple Access (RSMA) mitigates these challenges by decomposing user messages into common and private streams, thereby providing a flexible interference management mechanism that balances decoding complexity with spectral efficiency. This makes RSMA especially suitable for high-density user access scenarios. In parallel, Intelligent Reflecting Surfaces (IRS) have emerged as a promising technology to dynamically reconfigure wireless propagation through programmable electromagnetic unit arrays. IRS improves the quality of legitimate links while reducing the capacity of eavesdropping links, thereby enhancing physical-layer security in UAV communications. It is noteworthy that while existing research has predominantly centered on conventional multiple access schemes, the application potential of RSMA technology in IRS-assisted UAV communication systems remains relatively unexplored. 
Against this background, this paper investigates secure transmission strategies in IRS-assisted UAV-RSMA systems.  Methods  This paper investigates the effect of eavesdroppers on the security performance of UAV communication systems and proposes an IRS-assisted RSMA-based UAV communication model. The system comprises a multi-antenna UAV base station, an IRS mounted on a building, multiple single-antenna legitimate users, and multiple single-antenna eavesdroppers. The optimization problem is formulated to maximize the system secrecy rate by jointly optimizing precoding vectors, common secrecy rate allocation, IRS phase shifts, and UAV positioning. The problem is highly non-convex due to the strong coupling among these variables, rendering direct solutions intractable. To overcome this challenge, a two-layer optimization framework is developed. In the inner layer, with UAV position fixed, an alternating optimization strategy divides the problem into two subproblems: (1) joint optimization of precoding vectors and common secrecy rate allocation and (2) optimization of IRS phase shifts. Non-convex constraints are transformed into convex forms using techniques such as Successive Convex Approximation (SCA), relaxation variables, first-order Taylor expansion, and Semidefinite Relaxation (SDR). In the outer layer, the Particle Swarm Optimization (PSO) algorithm determines the UAV deployment position based on the optimized inner-layer variables.  Results and Discussions  Simulation results show that the proposed algorithm outperforms RSMA without IRS, NOMA with IRS, and NOMA without IRS in terms of secrecy rate. (Fig. 2) illustrates that the secrecy rate increases with the number of iterations and converges under different UAV maximum transmit power levels and antenna configurations. (Fig. 3) demonstrates that increasing UAV transmit power significantly enhances the secrecy rate for both the proposed and benchmark schemes. 
This improvement arises because higher transmit power strengthens the signal received by legitimate users, increasing their achievable rates and enhancing system secrecy performance. (Fig. 4) indicates that the secrecy rate grows with the number of UAV antennas. This improvement is due to expanded signal coverage and greater spatial degrees of freedom, which amplify effective signal strength in legitimate user channels. (Fig. 5) shows that both the proposed scheme and NOMA with IRS achieve higher secrecy rate as the number of IRS reflecting elements increases. The additional elements provide greater spatial degrees of freedom, improving channel gains for legitimate users and strengthening resistance to eavesdropping. In contrast, benchmark schemes operating without IRS assistance exhibit no performance improvement and maintain constant secrecy rate. This result highlights the critical role of the IRS in enabling secure communications. Finally, (Fig. 6) demonstrates the optimal UAV position when $P_{\max} = 30$ dBm. Deploying the UAV near the center of legitimate users and adjacent to the IRS minimizes the average distance to users, thereby reducing path loss and fully exploiting IRS passive beamforming. This placement strengthens legitimate signals while suppressing the eavesdropping link, leading to enhanced secrecy performance.  Conclusions  This study addresses secure communication scenarios with multiple eavesdroppers by proposing an IRS-assisted secure resource allocation algorithm for UAV-enabled RSMA systems. An optimization problem is formulated to maximize the system secrecy rate under multiple constraints, including UAV transmit power, by jointly optimizing precoding vectors, common rate allocation, IRS configurations, and UAV positioning. Due to the non-convex nature of the problem, a hierarchical optimization framework is developed to decompose it into two subproblems. 
These are effectively solved using techniques such as SCA, SDR, Gaussian randomization, and PSO. Simulation results confirm that the proposed algorithm achieves substantial secrecy rate gains over three benchmark schemes, thereby validating its effectiveness.
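The secrecy-rate quantity being maximized can be written down directly. The per-link definition below is the standard physical-layer secrecy rate (legitimate rate minus eavesdropper rate, floored at zero); the worst-case-eavesdropper aggregation across users is an illustrative simplification of the full multi-stream RSMA formulation with common and private rates.

```python
import math

def achievable_rate(snr):
    """Shannon rate log2(1 + SNR) in bit/s/Hz for a given linear SNR."""
    return math.log2(1.0 + snr)

def secrecy_rate(snr_user, snr_eve):
    """Per-link secrecy rate: the positive part of the legitimate
    user's rate minus the eavesdropper's rate."""
    return max(achievable_rate(snr_user) - achievable_rate(snr_eve), 0.0)

def system_secrecy_rate(user_snrs, eve_snrs):
    """Worst-case aggregation (a simplification): every user's rate is
    measured against the strongest eavesdropping channel, which is the
    conservative stance taken with multiple eavesdroppers."""
    worst = max(eve_snrs)
    return sum(secrecy_rate(s, worst) for s in user_snrs)
```

This makes the roles of the optimization variables concrete: precoding and IRS phase shifts raise `snr_user` while depressing `snr_eve`, and UAV placement shifts both through path loss, so every joint variable acts through the same objective.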
BIRD1445: Large-scale Multimodal Bird Dataset for Ecological Monitoring
WANG Hongchang, XIAN Fengyu, XIE Zihui, DONG Miaomiao, JIAN Haifang
Available online  , doi: 10.11999/JEIT250647
Abstract:
  Objective  With the rapid advancement of Artificial Intelligence (AI) and growing demands in ecological monitoring, high-quality multimodal datasets have become essential for training and deploying AI models in specialized domains. Existing bird datasets, however, face notable limitations, including challenges in field data acquisition, high costs of expert annotation, limited representation of rare species, and reliance on single-modal data. To overcome these constraints, this study proposes an efficient framework for constructing large-scale multimodal datasets tailored to ecological monitoring. By integrating heterogeneous data sources, employing intelligent semi-automatic annotation pipelines, and adopting multi-model collaborative validation based on heterogeneous attention fusion, the proposed approach markedly reduces the cost of expert annotation while maintaining high data quality and extensive modality coverage. This work offers a scalable and intelligent strategy for dataset development in professional settings and provides a robust data foundation for advancing AI applications in ecological conservation and biodiversity monitoring.  Methods  The proposed multimodal dataset construction framework integrates multi-source heterogeneous data acquisition, intelligent semi-automatic annotation, and multi-model collaborative verification to enable efficient large-scale dataset development. The data acquisition system comprises distributed sensing networks deployed across natural reserves, incorporating high-definition intelligent cameras, custom-built acoustic monitoring devices, and infrared imaging systems, supplemented by standardized public data to enhance species coverage and modality diversity. 
The intelligent annotation pipeline is built upon four core automated tools: (1) spatial localization annotation leverages object detection algorithms to generate bounding boxes; (2) fine-grained classification employs Vision Transformer models for hierarchical species identification; (3) pixel-level segmentation combines detection outputs with SegGPT models to produce instance-level masks; and (4) multimodal semantic annotation uses Qwen large language models to generate structured textual descriptions. To ensure annotation quality and minimize manual verification costs, a multi-scale attention fusion verification mechanism is introduced. This mechanism integrates seven heterogeneous deep learning models, each with different feature perception capacities across local detail, mid-level semantic, and global contextual scales. A global weighted voting module dynamically assigns fusion weights based on model performance, while a prior knowledge-guided fine-grained decision module applies category-specific accuracy metrics and Top-K model selection to enhance verification precision and computational efficiency.  Results and Discussions  The proposed multi-scale attention fusion verification method dynamically assesses data quality based on heterogeneous model predictions, forming the basis for automated annotation validation. Through optimized weight allocation and category-specific verification strategies, the collaborative verification framework evaluates the effect of different model combinations on annotation accuracy. Experimental results demonstrate that the optimal verification strategy—achieved by integrating seven specialized models—outperforms all baseline configurations across evaluation metrics. Specifically, the method attains a Top-1 accuracy of 95.39% on the CUB-200-2011 dataset, exceeding the best-performing single-model baseline, which achieves 91.79%, thereby yielding a 3.60% improvement in recognition precision. 
The constructed BIRD1445 dataset, comprising 3.54 million samples spanning 1,445 bird species and four modalities, outperforms existing datasets in terms of coverage, quality, and annotation accuracy. It serves as a robust benchmark for fine-grained classification, density estimation, and multimodal learning tasks in ecological monitoring.  Conclusions  This study addresses the challenge of constructing large-scale multimodal datasets for ecological monitoring by integrating multi-source data acquisition, intelligent semi-automatic annotation, and multi-model collaborative verification. The proposed approach advances beyond traditional manual annotation workflows by incorporating automated labeling pipelines and heterogeneous attention fusion mechanisms as the core quality control strategy. Comprehensive evaluations on benchmark datasets and real-world scenarios demonstrate the effectiveness of the method: (1) the verification strategy improves annotation accuracy by 3.60% compared to single-model baselines on the CUB-200-2011 dataset; (2) optimal trade-offs between precision and computational efficiency are achieved using Top-K = 3 model selection, based on performance–complexity alignment; and (3) in large-scale annotation scenarios, the system ensures high reliability across 1,445 species categories. Despite its effectiveness, the current approach primarily targets species with sufficient data. Future work should address the representation of rare and endangered species by incorporating advanced data augmentation and few-shot learning techniques to mitigate the limitations posed by long-tail distributions.
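The two verification mechanisms described above, global weighted voting and prior-knowledge-guided Top-K model selection, can be sketched as follows. The data layout (flat label predictions, a per-model per-class accuracy table) and the scoring rules are assumptions for illustration; the real system fuses seven heterogeneous deep models at multiple feature scales.

```python
def weighted_vote(predictions, weights):
    """Global weighted voting across heterogeneous models: each model's
    predicted label contributes its performance-based fusion weight,
    and the label with the largest accumulated weight wins."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

def top_k_models(per_class_accuracy, label, k=3):
    """Prior-knowledge-guided selection: keep the k models with the
    best historical accuracy on the tentatively predicted class, so
    fine-grained decisions consult only the most reliable experts."""
    ranked = sorted(per_class_accuracy,
                    key=lambda m: m[1].get(label, 0.0), reverse=True)
    return [m[0] for m in ranked[:k]]
```

In this sketch a first vote over all models yields a tentative label, after which the Top-K subset for that label re-decides, which is one plausible reading of how category-specific accuracy metrics reduce both error and computation.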
Optimal Federated Average Fusion of Gaussian Mixture–Probability Hypothesis Density Filters
XUE Yu, XU Lei
Available online  , doi: 10.11999/JEIT250759
Abstract:
  Objective  To realize optimal decentralized fusion tracking of uncertain targets, this study proposes a federated average fusion algorithm for Gaussian Mixture–Probability Hypothesis Density (GM-PHD) filters, designed with a hierarchical structure. Each sensor node operates a local GM-PHD filter to extract multi-target state estimates from sensor measurements. The fusion node performs three key tasks: (1) maintaining a master filter that predicts the fusion result from the previous iteration; (2) associating and merging the GM-PHDs of all filters; and (3) distributing the fused result and several parameters to each filter. The association step decomposes multi-target density fusion into four categories of single-target estimate fusion. We derive the optimal single-target estimate fusion both in the absence and presence of missed detections. Information assignment applies the covariance upper-bounding theory to eliminate correlation among all filters, enabling the proposed algorithm to achieve the accuracy of Bayesian fusion. Simulation results show that the federated fusion algorithm achieves optimal tracking accuracy and consistently outperforms the conventional Arithmetic Average (AA) fusion method. Moreover, the relative reliability of each filter can be flexibly adjusted.  Methods  The multi-sensor multi-target density fusion is decomposed into multiple groups of single-target component merging through the association operation. Federated filtering is employed as the merging strategy, which achieves the Bayesian optimum owing to its inherent decorrelation capability. Section 3 rigorously extends this approach to scenarios with missed detections. To satisfy federated filtering’s requirement for prior estimates, a master filter is designed to compute the predicted multi-target density, thereby establishing a hierarchical architecture for the proposed algorithm. 
In addition, auxiliary measures are incorporated to compensate for the observed underestimation of cardinality.  Results and Discussions  Components belonging to the same target are associated using a modified Mahalanobis distance (Fig. 3). The precise association and the single-target decorrelation capability together ensure the theoretical optimality of the proposed algorithm, as illustrated in Fig. 2. Compared with conventional density fusion, the Optimal Sub-Pattern Assignment (OSPA) error is reduced by 8.17% (Fig. 4). The advantage of adopting a small average factor for the master filter is demonstrated in Figs. 5 and 6. The effectiveness of the measures for achieving cardinality consensus is also validated (Fig. 7). Another competitive strength of the algorithm lies in the flexibility of adjusting the average factors (Fig. 8). Furthermore, the algorithm consistently outperforms AA fusion across all missed detection probabilities (Fig. 9).  Conclusions  This paper achieves theoretically optimal multi-target density fusion by employing federated filtering as the merging method for single-target components. The proposed algorithm inherits the decorrelation capability and single-target optimality of federated filtering. A hierarchical fusion architecture is designed to satisfy the requirement for prior estimates. Extensive simulations demonstrate that: (1) the algorithm can accurately associate filtered components belonging to the same target, thereby extending single-target optimality to multi-target fusion tracking; (2) the algorithm supports flexible adjustment of average factors, with smaller values for the master filter consistently preferred; and (3) the superiority of the algorithm persists even under sensor malfunctions and high missed detection rates. Nonetheless, this study is limited to GM-PHD filters with overlapping Fields Of View (FOVs). Future work will investigate its applicability to other filter types and spatially non-overlapping FOVs.
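The covariance upper-bounding merge at the core of federated filtering can be sketched for a single pair of associated Gaussian components. This is a minimal illustration of the decorrelated information-fusion rule with average factors, under the assumption that each filter supplies a mean and covariance; it is not the full GM-PHD association and fusion pipeline.

```python
import numpy as np

def federated_fuse(estimates, covariances, factors):
    """Information-form fusion of decorrelated single-target estimates.

    Each local covariance P_i is conceptually inflated by 1/beta_i
    (covariance upper-bounding), which removes cross-correlation; the
    inflated estimates are then merged by standard information fusion:
        P^{-1} = sum_i beta_i P_i^{-1},  x = P sum_i beta_i P_i^{-1} x_i
    The average factors beta_i are assumed to sum to 1.
    """
    info = sum(b * np.linalg.inv(P) for b, P in zip(factors, covariances))
    P = np.linalg.inv(info)
    x = P @ sum(b * np.linalg.inv(Pi) @ xi
                for b, Pi, xi in zip(factors, covariances, estimates))
    return x, P
```

Shrinking the master filter's factor, as the simulations recommend, simply down-weights its (predicted) term in these sums.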
Recent Advances of Programmable Schedulers
ZHAO Yazhu, GUO Zehua, DOU Songshi, FU Xiaoyang
Available online  , doi: 10.11999/JEIT250657
Abstract:
  Objective  In recent years, diversified user demands, dynamic application scenarios, and massive data transmissions have imposed increasingly stringent requirements on modern networks. Network schedulers play a critical role in ensuring efficient and reliable data delivery, enhancing overall performance and stability, and directly shaping user-perceived service quality. Traditional scheduling algorithms, however, rely largely on fixed hardware, with scheduling logic hardwired during chip design. These designs are inflexible, provide coarse and static scheduling granularity, and offer limited capability to represent complex policies. Therefore, they hinder rapid deployment, increase upgrade costs, and fail to meet the evolving requirements of heterogeneous and large-scale network environments. Programmable schedulers, in contrast, leverage flexible hardware architectures to support diverse strategies without hardware replacement. Scheduling granularity can be dynamically adjusted at the flow, queue, or packet level to meet varied application requirements with precision. Furthermore, they enable the deployment of customized logic through data plane programming languages, allowing rapid iteration and online updates. These capabilities significantly reduce maintenance costs while improving adaptability. The combination of high flexibility, cost-effectiveness, and engineering practicality positions programmable schedulers as a superior alternative to traditional designs. Therefore, the design and optimization of high-performance programmable schedulers have become a central focus of current research, particularly for data center networks and industrial Internet applications, where efficient, flexible, and controllable traffic scheduling is essential.  Methods  The primary objective of current research is to design universal, high-performance programmable schedulers. 
Achieving simultaneous improvements across multiple performance metrics, however, remains a major challenge. Hardware-based schedulers deliver high performance and stability but incur substantial costs and typically support only a limited range of scheduling algorithms, restricting their applicability in large-scale and heterogeneous network environments. In contrast, software-based schedulers provide flexibility in expressing diverse algorithms but suffer from inherent performance constraints. To integrate the high performance of hardware with the flexibility of software, recent designs of programmable schedulers commonly adopt First-In First-Out (FIFO) or Push-In First-Out (PIFO) queue architectures. These approaches emphasize two key performance metrics: scheduling accuracy and programmability. Scheduling accuracy is critical, as modern applications such as real-time communications, online gaming, telemedicine, and autonomous driving demand strict guarantees on packet timing and ordering. Even minor errors may result in increased latency, reduced throughput, or connection interruptions, compromising user experience and service reliability. Programmability, by contrast, enables network devices to adapt to diverse scenarios, supporting rapid deployment of new algorithms and flexible responses to application-specific requirements. Improvements in both accuracy and programmability are therefore essential for developing efficient, reliable, and adaptable network systems, forming the basis for future high-performance deployments.  Results and Discussions  The overall packet scheduling process is illustrated in Fig. 1, where scheduling is composed of scheduling algorithms and schedulers. At the ingress or egress pipelines of end hosts or network devices, scheduling algorithms assign a Rank value to each packet, determining the transmission order based on relative differences in Rank.
Upon arrival at the traffic manager, the scheduler sorts and forwards packets according to their Rank values. Through the joint operation of algorithms and schedulers, packet scheduling is executed while meeting quality-of-service requirements. A comparative analysis of the fundamental principles of FIFO and PIFO scheduling mechanisms (Fig. 2) highlights their differences in queue ordering and disorder control. At present, most studies on programmable schedulers build upon these two foundational architectures (Fig. 3), with extensions and optimizations primarily aimed at improving scheduling accuracy and programmability. Specific strategies include admission control, refinement of scheduling algorithms, egress control, and advancements in data structures and queue mechanisms. On this basis, the current research progress on programmable schedulers is reviewed and systematically analyzed. Existing studies are compared along three key dimensions: structural characteristics, expressive capability, and approximation accuracy (Table 1).  Conclusions  Programmable schedulers, as a key technology for next-generation networks, enable flexible traffic management and open new possibilities for efficient packet scheduling. This review has summarized recent progress in the design of programmable schedulers across diverse application scenarios. The background and significance of programmable schedulers within the broader packet scheduling process were first clarified. An analysis of domestic and international literature shows that most current studies focus on FIFO-based and PIFO-based architectures to improve scheduling accuracy and programmability. The design approaches of these two architectures were examined, the main technical methods for enhancing performance were summarized, and their structural characteristics, expressive capabilities, and approximation accuracy were compared, highlighting respective advantages and limitations. 
Potential improvements in existing research were also identified, and future development directions were discussed. Nevertheless, the design of a universal, high-performance programmable scheduler remains a critical challenge. Achieving optimal performance across multiple metrics while ensuring high-quality network services will require continued joint efforts from both academia and industry.
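The PIFO abstraction underlying many of the surveyed designs admits a compact software sketch: packets are pushed into an arbitrary position determined by their Rank and always dequeued from the head (smallest Rank first, FIFO among ties). Hardware PIFO implementations differ substantially, so this is purely conceptual.

```python
import heapq
import itertools

class PIFO:
    """Minimal Push-In First-Out queue: push anywhere by Rank,
    always pop the packet with the smallest Rank (FIFO among ties)."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-break: arrival order

    def push(self, rank, packet):
        heapq.heappush(self._heap, (rank, next(self._arrival), packet))

    def pop(self):
        rank, _, packet = heapq.heappop(self._heap)
        return packet
```

A FIFO queue is the degenerate case in which every packet carries the same Rank, which is why FIFO-based designs need extra admission and egress control to approximate richer scheduling algorithms.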
Research on ECG Pathological Signal Classification Empowered by Diffusion Generative Data
GE Beining, CHEN Nuo, JIN Peng, SU Xin, LU Xiaochun
Available online  , doi: 10.11999/JEIT250404
Abstract:
  Objective  ElectroCardioGram (ECG) signals are key indicators of human health. However, their complex composition and diverse features make visual recognition prone to errors. This study proposes a classification algorithm for ECG pathological signals based on data generation. A Diffusion Generative Network (DGN), also known as a diffusion model, progressively adds noise to real ECG signals until they approach a noise distribution, thereby facilitating model processing. To improve generation speed and reduce memory usage, a Knowledge Distillation-Diffusion Generative Network (KD-DGN) is proposed, which demonstrates superior memory efficiency and generation performance compared with the traditional DGN. This work compares the memory usage, generation efficiency, and classification accuracy of DGN and KD-DGN, and analyzes the characteristics of the generated data after lightweight processing. In addition, the classification effects of the original MIT-BIH dataset and an extended dataset (MIT-BIH-PLUS) are evaluated. Experimental results show that convolutional networks extract richer feature information from the extended dataset generated by DGN, leading to improved recognition performance of ECG pathological signals.  Methods  The generative network-based ECG signal generation algorithm is designed to enhance the performance of convolutional networks in ECG signal classification. The process begins with a Gaussian noise-based image perturbation algorithm, which obscures the original ECG data by introducing controlled randomness. This step simulates real-world variability, enabling the model to learn more robust representations. A diffusion generative algorithm is then applied to reconstruct and reproduce the data, generating synthetic ECG signals that preserve the essential characteristics of the original categories despite the added noise. 
This reconstruction ensures that the underlying features of ECG signals are retained, allowing the convolutional network to extract more informative features during classification. To improve efficiency, the approach incorporates knowledge distillation. A teacher-student framework is adopted in which a lightweight student model is trained from the original, more complex teacher ECG data generation model. This strategy reduces computational requirements and accelerates the data generation process, improving suitability for practical applications. Finally, two comparative experiments are designed to validate the effectiveness and accuracy of the proposed method. These experiments evaluate classification performance against existing approaches and provide quantitative evidence of its advantages in ECG signal processing.  Results and Discussions  The data generation algorithm yields ECG signals with a Signal-to-Noise Ratio (SNR) comparable to that of the original data, while presenting more discernible signal features. The student model constructed through knowledge distillation produces ECG samples with the same SNR as those generated by the teacher model, but with substantially reduced complexity. Specifically, the student model achieves a 50% reduction in size, 37.5% lower memory usage, and a 57% shorter runtime compared with the teacher model (Fig. 6). When the convolutional network is trained with data generated by the KD-DGN, its classification performance improves across all metrics compared with a convolutional network trained without KD-DGN. Precision reaches 95.7%, and the misidentification rate is reduced to approximately 3% (Fig. 9).  Conclusions  The DGN provides an effective data generation strategy for addressing the scarcity of ECG datasets. By supplying additional synthetic data, it enables convolutional networks to extract more diverse class-specific features, thereby improving recognition performance and reducing misidentification rates. 
Optimizing DGN with knowledge distillation further enhances efficiency, while maintaining SNR equivalence with the original DGN. This optimization reduces computational cost, conserves machine resources, and supports simultaneous task execution. Moreover, it enables the generation of new data without loss of quality, allowing convolutional networks to learn from larger datasets at lower cost. Overall, the proposed approach markedly improves the classification performance of convolutional networks on ECG signals. Future work will focus on further algorithmic optimization for real-world applications.
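The progressive noising that the DGN applies to ECG traces follows the standard closed-form forward diffusion step. The sketch below assumes a generic DDPM-style noise schedule (the betas); the paper's exact schedule and network are not specified here.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng=None):
    """Closed-form forward diffusion step of a DDPM-style model:
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    which progressively drowns the clean ECG trace x0 in Gaussian
    noise until it approaches a pure noise distribution."""
    rng = rng or np.random.default_rng(0)
    alphas = 1.0 - np.asarray(betas, dtype=float)
    alpha_bar = np.cumprod(alphas)[t]           # cumulative signal retention
    eps = rng.standard_normal(np.shape(x0))
    return np.sqrt(alpha_bar) * np.asarray(x0) + np.sqrt(1.0 - alpha_bar) * eps
```

Generation runs this process in reverse with a learned denoiser; knowledge distillation then trains a smaller student denoiser to match the teacher's outputs, which is where the reported memory and runtime savings come from.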
Cross Modal Hashing of Medical Image Semantic Mining for Large Language Model
LIU Qinghai, WU Qianlin, LUO Jia, TANG Lun, XU Liming
Available online  , doi: 10.11999/JEIT250529
Abstract:
  Objective  A novel cross-modal hashing framework driven by Large Language Models (LLMs) is proposed to address the semantic misalignment between medical images and their corresponding textual reports. The objective is to enhance cross-modal semantic representation and improve retrieval accuracy by effectively mining and matching semantic associations between modalities.  Methods  The generative capacity of LLMs is first leveraged to produce high-quality textual descriptions of medical images. These descriptions are integrated with diagnostic reports and structured clinical data using a dual-stream semantic enhancement module, designed to reinforce inter-modality alignment and improve semantic comprehension. A structural similarity-guided hashing scheme is then developed to encode both visual and textual features into a unified Hamming space, ensuring semantic consistency and enabling efficient retrieval. To further enhance semantic alignment, a prompt-driven attention template is introduced to fuse image and text features through fine-tuned LLMs. Finally, a contrastive loss function with hard negative mining is employed to improve representation discrimination and retrieval accuracy.  Results and Discussions  Experiments are conducted on a multimodal medical dataset to compare the proposed method with existing cross-modal hashing baselines. The results indicate that the proposed method significantly outperforms baseline models in terms of precision and Mean Average Precision (MAP) (Table 3; Table 4). On average, a 7.21% improvement in retrieval accuracy and a 7.72% increase in MAP are achieved across multiple data scales, confirming the effectiveness of the LLM-driven semantic mining and hashing approach.  Conclusions  The LLM-driven framework effectively mines and aligns semantic associations between medical images and their textual reports, and the structural similarity-guided hashing preserves these semantics in a unified Hamming space. Combined with prompt-driven feature fusion and hard-negative contrastive training, the approach yields consistent gains in precision and MAP over existing cross-modal hashing baselines across multiple data scales, demonstrating the practicality of LLM-empowered semantic mining for medical cross-modal retrieval.
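The retrieval side of any such Hamming-space scheme reduces to two generic steps: binarize real-valued embeddings into hash codes, then rank database items by Hamming distance. The sketch below shows this standard learning-to-hash machinery only; it is not the paper's specific structural similarity-guided scheme.

```python
import numpy as np

def to_hash(features):
    """Binarize real-valued embeddings into +-1 hash codes by sign
    thresholding, the usual final step of learning-to-hash pipelines."""
    return np.where(np.asarray(features) >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code.
    For +-1 codes, d_H = (n_bits - <q, c>) / 2, so one matrix product
    ranks the whole database."""
    db_codes = np.asarray(db_codes)
    dists = (query_code.shape[-1] - db_codes @ query_code) // 2
    return np.argsort(dists, kind="stable"), dists
```

Because both image and report embeddings land in the same Hamming space, the same ranking serves image-to-text and text-to-image queries.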
Breakthrough in Solving NP-Complete Problems Using Electronic Probe Computers
XU Jin, YU Le, YANG Huihui, JI Siyuan, ZHANG Yu, YANG Anqi, LI Quanyou, LI Haisheng, ZHU Enqiang, SHI Xiaolong, WU Pu, SHAO Zehui, LENG Huang, LIU Xiaoqing
Available online  , doi: 10.11999/JEIT250352
Abstract:
This study presents a breakthrough in addressing NP-complete problems using a newly developed Electronic Probe Computer (EPC60). The system employs a hybrid serial–parallel computational model and performs large-scale parallel operations through seven probe operators. In benchmark tests on 3-coloring problems in graphs with 2,000 vertices, EPC60 achieves 100% accuracy, outperforming the mainstream solver Gurobi, which succeeds in only 6% of cases. Computation time is reduced from 15 days to 54 seconds. The system demonstrates high scalability and offers a general-purpose solution for complex optimization problems in areas such as supply chain management, finance, and telecommunications.  Objective   NP-complete problems pose a fundamental challenge in computer science. As problem size increases, the required computational effort grows exponentially, making it infeasible for traditional electronic computers to provide timely solutions. Alternative computational models have been proposed, with biological approaches, particularly DNA computing, demonstrating notable theoretical advances. However, DNA computing systems continue to face major limitations in practical implementation.  Methods  Computational Model: EPC is based on a non-Turing computational model in which data are multidimensional and processed in parallel. Its database comprises four types of graphs, and the probe library includes seven operators, each designed for specific graph operations. By executing parallel probe operations, EPC efficiently addresses NP-complete problems. Structural Features: EPC consists of four subsystems: a conversion system, input system, computation system, and output system.
The conversion system transforms the target problem into a graph coloring problem; the input system allocates tasks to the computation system; the computation system performs parallel operations via probe computation cards; and the output system maps the solution back to the original problem format. EPC60 features a three-tier hierarchical hardware architecture comprising a control layer, optical routing layer, and probe computation layer. The control layer manages data conversion, format transformation, and task scheduling. The optical routing layer supports high-throughput data transmission, while the probe computation layer conducts large-scale parallel operations using probe computation cards.  Results and Discussions  EPC60 successfully solved 100 instances of the 3-coloring problem for graphs with 2,000 vertices, achieving a 100% success rate. In comparison, the mainstream solver Gurobi succeeded in only 6% of cases. Additionally, EPC60 rapidly solved two 3-coloring problems for graphs with 1,500 and 2,000 vertices, which Gurobi failed to resolve after 15 days of continuous computation on a high-performance workstation. Using an open-source dataset, we identified 1,000 3-colorable graphs with 1,000 vertices and 100 3-colorable graphs with 2,000 vertices. These correspond to theoretical complexities of O(1.3289^n) for both cases. The test results are summarized in Table 1. Currently, EPC60 can directly solve 3-coloring problems for graphs with up to n vertices, with theoretical complexity of at least O(1.3289^n). On April 15, 2023, a scientific and technological achievement appraisal meeting organized by the Chinese Institute of Electronics was held at Beijing Technology and Business University. A panel of ten senior experts conducted a comprehensive technical evaluation and Q&A session. The committee reached the following unanimous conclusions: 1. The probe computer represents an original breakthrough in computational models. 2.
The system architecture design demonstrates significant innovation. 3. The technical complexity reaches internationally leading levels. 4. It provides a novel approach to solving NP-complete problems. Experts at the appraisal meeting stated, “This is a major breakthrough in computational science achieved by our country, with not only theoretical value but also broad application prospects.” In cybersecurity, EPC60 has also demonstrated remarkable potential. Supported by the National Key R&D Program of China (2019YFA0706400), Professor Xu Jin’s team developed an automated binary vulnerability mining system based on a function call graph model. Evaluation of the system using the Modbus Slave software showed over 95% vulnerability coverage, far exceeding the 75 vulnerabilities detected by conventional depth-first search algorithms. The system also discovered a previously unknown flaw, the “Unauthorized Access Vulnerability in Changyuan Shenrui PRS-7910 Data Gateway” (CNVD-2020-31406), highlighting EPC60’s efficacy in cybersecurity applications. The high efficiency of EPC60 derives from its unique computational model and hardware architecture. Given that all NP-complete problems can be polynomially reduced to one another, EPC60 provides a general-purpose solution framework. It is therefore expected to be applicable in a wide range of domains, including supply chain management, financial services, telecommunications, energy, and manufacturing.  Conclusions   The successful development of EPC offers a novel approach to solving NP-complete problems. As technological capabilities continue to evolve, EPC is expected to demonstrate strong computational performance across a broader range of application domains. Its distinctive computational model and hardware architecture also provide important insights for the design of next-generation computing systems.
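For reference, the 3-coloring decision problem that EPC60 benchmarks can be stated in a few lines of conventional backtracking code, whose worst-case running time grows exponentially with the vertex count; this blow-up on classical hardware is exactly what the probe architecture targets.

```python
def three_colorable(n, edges):
    """Backtracking test for graph 3-colorability (NP-complete).
    n: number of vertices (labeled 0..n-1); edges: iterable of (u, v)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n

    def assign(v):
        if v == n:                       # every vertex colored consistently
            return True
        for c in range(3):
            if all(color[w] != c for w in adj[v]):
                color[v] = c             # tentatively color v with c
                if assign(v + 1):
                    return True
        color[v] = None                  # backtrack
        return False

    return assign(0)
```

On a triangle this succeeds immediately, while the complete graph K4 is correctly rejected; at 2,000 vertices, however, the search space is what drives the 15-day classical runtimes reported above.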
Personalized Federated Learning Method Based on Coalition Game and Knowledge Distillation
SUN Yanhua, SHI Yahui, LI Meng, YANG Ruizhe, SI Pengbo
Available online  , doi: 10.11999/JEIT221203
Abstract:
To overcome the limitations of Federated Learning (FL) when both the data and models of clients are heterogeneous, and to improve accuracy, a personalized Federated learning algorithm with Coalition game and Knowledge distillation (pFedCK) is proposed. First, each client uploads its soft predictions on a public dataset and downloads the k most correlated soft predictions. Then, the Shapley value from coalition game theory is applied to measure the multi-wise influences among clients and to quantify each client's marginal contribution to the personalized learning performance of others. Finally, each client identifies its optimal coalition, distills the corresponding knowledge into its local model, and trains on its private dataset. The results show that, compared with state-of-the-art algorithms, this approach achieves superior personalized accuracy, with improvements of about 10%.
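The Shapley value quantifies each participant's marginal contribution by averaging over all join orders. The sketch below computes it exactly for small player sets; the utility function `value` is a placeholder for whatever coalition metric a method like the one above uses (e.g. personalized-accuracy gain from distilling a coalition's soft predictions), and exhaustive enumeration is only feasible for a handful of clients.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley value: for every ordering of players, accumulate each
    player's marginal contribution value(S + {p}) - value(S), then average.
    value() maps a frozenset coalition to its utility."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in phi.items()}
```

The values sum to the grand coalition's utility, which makes them a principled way to weight which peers a client should include in its distillation coalition.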
The Range-angle Estimation of Target Based on Time-invariant and Spot Beam Optimization
Wei CHU, Yunqing LIU, Wenyug LIU, Xiaolong LI
Available online  , doi: 10.11999/JEIT210265
Abstract:
The application of Frequency Diverse Array and Multiple Input Multiple Output (FDA-MIMO) radar to the range-angle estimation of targets has attracted increasing attention. The FDA can simultaneously provide transmit beampattern degrees of freedom in both angle and range. However, its performance is degraded by the periodicity and time-varying nature of the beampattern. Therefore, an improved Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) algorithm is proposed to estimate target parameters based on a new waveform synthesis model of the Time Modulation and Range Compensation FDA-MIMO (TMRC-FDA-MIMO) radar. Finally, the proposed method is compared with the identical frequency increment FDA-MIMO radar system, the logarithmically increasing frequency offset FDA-MIMO radar system, and the MUltiple SIgnal Classification (MUSIC) algorithm in terms of the Cramér-Rao lower bound and the root mean square errors of range and angle estimation, and its excellent performance is verified.
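The rotational-invariance idea behind ESPRIT is compact enough to sketch for the classical angle-only case on a uniform linear array: the signal subspace of two shifted subarrays differs by a rotation whose eigenvalue phases encode the arrival angles. This illustrates only that classical step, not the paper's joint range-angle extension for the TMRC-FDA-MIMO waveform model.

```python
import numpy as np

def esprit_doa(X, n_sources, d=0.5):
    """Basic ESPRIT angle estimation for a uniform linear array.
    X: (n_antennas, n_snapshots) complex data matrix;
    d: element spacing in wavelengths. Returns angles in degrees."""
    R = X @ X.conj().T / X.shape[1]          # sample covariance
    _, vecs = np.linalg.eigh(R)              # eigenvalues ascending
    Es = vecs[:, -n_sources:]                # signal subspace
    phi = np.linalg.pinv(Es[:-1]) @ Es[1:]   # rotation between subarrays
    eig = np.linalg.eigvals(phi)             # phases carry sin(theta)
    return np.degrees(np.arcsin(np.angle(eig) / (2 * np.pi * d)))
```

In the FDA-MIMO setting, the range dimension adds a second invariance, which is why the waveform must first be made time-invariant (the TMRC model) for such subspace estimators to apply cleanly.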
Wireless Communication and Internet of Things
Full Field-of-View Optical Calibration with Microradian-Level Accuracy for Space Laser Communication Terminals on Low-Earth-Orbit Constellation Applications
XIE Qingkun, XU Changzhi, BIAN Jingying, ZHENG Xiaosong, ZHANG Bo
Available online  , doi: 10.11999/JEIT250734
Abstract:
  Objective  The Coarse Pointing Assembly (CPA) is a core element in laser communication systems and supports wide-field scanning, active orbit-attitude compensation, and dynamic disturbance isolation. To address multi-source disturbances such as orbital perturbations and attitude maneuvers, a high-precision, high-bandwidth, and fast-response Pointing, Acquisition, and Tracking (PAT) algorithm is required. Establishing a full Field-Of-View (FOV) optical calibration model between the CPA and the detector is essential for suppressing image degradation caused by spatial pointing deviations. Conventional calibration methods often rely on ray tracing to simulate beam offsets and infer calibration relationships, yet they show several limitations. These limitations include high modeling complexity caused by non-coaxial paths, multi-reflective surfaces, and freeform optics; susceptibility to systematic errors generated by assembly tolerances, detector non-uniformity, and thermal drift; and restricted applicability across the full FOV due to spatial anisotropy. A high-precision calibration method that remains effective across the entire FOV is therefore needed to overcome these challenges and ensure stable and reliable laser communication links.  Methods  To achieve precise CPA-detector calibration and address the limitations of traditional approaches, this paper presents a full FOV optical calibration method with microradian-level accuracy. Based on the optical design characteristics of periscope-type laser terminals, an equivalent optical transmission model of the CPA is established and the mechanism of image rotation is examined. Leveraging the structural rigidity of the optical transceiver channel, the optical transmission matrix is simplified to a constant matrix, yielding a full-space calibration model that directly links CPA micro-perturbations to spot displacements. 
By correlating the CPA rotation angles between the calibration target points and the actual operating positions, the calibration task is further reduced to estimating the calibration matrix at the target points. Random micro-perturbations are applied to the CPA to induce corresponding micro-displacements of the detector spot. A calibration equation based on CPA motion and spot displacement is formulated, and the calibration matrix is obtained through least-squares regression. The full-space calibration relationship between the CPA and detector is then derived through matrix operations.  Results and Discussions  Using the proposed calibration method, an experimental platform (Fig. 4) is constructed for calibration and verification with a periscope laser terminal. Accurate measurements of the conjugate motion relationship between the CPA and the CCD detector spot are obtained (Table 1). To evaluate calibration accuracy and full-space applicability, systematic verification is conducted through single-step static pointing and continuous dynamic tracking. In the static pointing verification, the mechanical rotary table is moved to three extreme diagonal positions, and the CPA performs open-loop pointing based on the established CPA-detector calibration relationship. Experimental results show that the spot reaches the intended target position (Fig. 5), with a pointing accuracy below 12 µrad (RMS). In the dynamic tracking experiment, system control parameters are optimized to maintain stable tracking of the platform beam. During low-angular-velocity motion of the rotary table, the laser terminal sustains stable tracking (Fig. 6). The CPA trajectory shows a clear conjugate relationship with the rotary table motion (Fig. 6(a), Fig. 6(b)), and the tracking accuracy in both orthogonal directions is below 5 µrad (Fig. 6(c), Fig. 6(d)). The independence of the optical transmission matrix from the selection of calibration target points is also examined.
By increasing the spatial accessibility of calibration points, the method reduces operational complexity while maintaining calibration precision. Improved spatial distribution of calibration points further enhances calibration efficiency and accuracy.  Conclusions  This paper presents a full FOV optical calibration method with microradian-level accuracy based on single-target micro-perturbation measurement. To satisfy engineering requirements for rapid linking and stable tracking, a full-space optical matrix model for CPA-detector calibration is constructed using matrix optics. Random micro-perturbations applied to the CPA at a single target point generate a generalized transfer equation, from which the calibration matrix is obtained through least-squares estimation. Experimental results show that the model mitigates image rotation, mirroring, and tracking anomalies, suppresses calibration residuals to below 12 µrad across the full FOV, and limits the dynamic tracking error to within 5 µrad per axis. The method eliminates the need for additional hardware and complex alignment procedures, providing a high-precision and low-complexity solution that supports rapid deployment in the mass production of Low-Earth-Orbit (LEO) laser terminals.
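The least-squares step at the heart of the calibration can be sketched generically: stack the applied micro-perturbations and the measured spot displacements, then solve for the linear map between them. This assumes a 2-axis perturbation/response model for illustration; the paper's full model additionally handles image rotation and the transfer between target points.

```python
import numpy as np

def estimate_calibration(delta_angles, delta_spots):
    """Least-squares estimate of the calibration matrix M mapping CPA
    micro-perturbations (dtheta) to detector spot displacements (ds):
        ds = M @ dtheta
    delta_angles, delta_spots: (N, 2) stacks of perturbation/response
    pairs. lstsq solves A @ M^T ~= S in the least-squares sense."""
    A = np.asarray(delta_angles, dtype=float)
    S = np.asarray(delta_spots, dtype=float)
    Mt, *_ = np.linalg.lstsq(A, S, rcond=None)
    return Mt.T
```

With more perturbation pairs than unknowns, measurement noise averages out, which is why random micro-perturbations at a single target point suffice to fix the matrix.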
Special Topic on Converged Cloud and Network Environment
Lightweight Incremental Deployment for Computing-Network Converged AI Services
WANG Qinding, TAN Bin, HUANG Guangping, DUAN Wei, YANG Dong, ZHANG Hongke
Available online  , doi: 10.11999/JEIT250663
Abstract:
  Objective   The rapid expansion of Artificial Intelligence (AI) computing services has heightened the demand for flexible access and efficient utilization of computing resources. Traditional Domain Name System (DNS) and IP-based scheduling mechanisms are constrained in addressing the stringent requirements of low latency and high concurrency, highlighting the need for integrated computing-network resource management. To address these challenges, this study proposes a lightweight deployment framework that enhances network adaptability and resource scheduling efficiency for AI services.  Methods   The AI-oriented Service IDentifier (AISID) is designed to encode service attributes into four dimensions: Object, Function, Method, and Performance. Service requests are decoupled from physical resource locations, enabling dynamic resource matching. AISID is embedded within IPv6 packets (Fig. 5), consisting of a 64-bit prefix for identification and a 64-bit service-specific suffix (Fig. 4). A lightweight incremental deployment scheme is implemented through hierarchical routing, in which stable wide-area routing is managed by ingress gateways, and fine-grained local scheduling is handled by egress gateways (Fig. 6). Ingress and egress gateways are incrementally deployed under the coordination of an intelligent control system to optimize resource allocation. AISID-based paths are encapsulated at ingress gateways using Segment Routing over IPv6 (SRv6), whereas egress gateways select optimal service nodes according to real-time load data using a weighted least-connections strategy (Fig. 8). AISID lifecycle management includes registration, query, migration, and decommissioning phases (Table 2), with global synchronization maintained by the control system. Resource scheduling is dynamically adjusted according to real-time network topology and node utilization metrics (Fig. 7).  
Results and Discussions   Experimental results show marked improvements over traditional DNS/IP architectures. The AISID mechanism reduces service request initiation latency by 61.3% compared to DNS resolution (Fig. 9), as it eliminates the need for round-trip DNS queries. Under 500 concurrent requests, network bandwidth utilization variance decreases by 32.8% (Fig. 10), reflecting the ability of AISID-enabled scheduling to alleviate congestion hotspots. Computing resource variance improves by 12.3% (Fig. 11), demonstrating more balanced workload distribution across service nodes. These improvements arise from AISID’s precise semantic matching in combination with the hierarchical routing strategy, which together enhance resource allocation efficiency while maintaining compatibility with existing IPv6/DNS infrastructure (Fig. 2, Fig. 3). The incremental deployment approach further reduces disruption to legacy networks, confirming the framework’s practicality and viability for real-world deployment.  Conclusions   This study establishes a computing-network convergence framework for AI services based on semantic-driven AISID and lightweight deployment. The key innovations include AISID’s semantic encoding, which enables dynamic resource scheduling and decoupled service access, together with incremental gateway deployment that optimizes routing without requiring major modifications to legacy networks. Experimental validation demonstrates significant improvements in latency reduction, bandwidth efficiency, and balanced resource utilization. Future research will explore AISID’s scalability across heterogeneous domains and its robustness under dynamic network conditions.
Geospatial Identifier Network Modal Design and Scenario Applications for Vehicle-infrastructure Cooperative Networks
PAN Zhongxia, SHEN Congqi, LUO Hanguang, ZHU Jun, ZOU Tao, LONG Keping
Available online, doi: 10.11999/JEIT250807
Abstract:
  Objective  Vehicle-Infrastructure Cooperative Networks (V2X) are open and contain large numbers of nodes with high mobility, frequent topology changes, unstable wireless channels, and varied service requirements. These characteristics pose challenges to efficient data transmission. A flexible network that supports rapid reconfiguration to meet different service requirements is considered essential in Intelligent Transportation Systems (ITS). With the development of programmable network technologies, programmable data-plane techniques are shifting the architecture from rigid designs to adaptive and flexible systems. In this work, a protocol standard based on geospatial information is proposed and combined with a polymorphic network architecture to design a geospatial identifier network modal. In this modal, the traditional three-layer protocol structure is replaced by packet forwarding based on geospatial identifiers. Packets carry geographic location information, and forwarding is executed directly according to this information. Addressing and routing based on geospatial information are more efficient and convenient than traditional IP-based approaches. A vehicle-infrastructure cooperative traffic system based on geospatial identifiers is further designed for intelligent transportation scenarios. This system supports direct geographic forwarding for road safety message dissemination and traffic information exchange. It enhances safety and improves route-planning efficiency within V2X.  Methods  The geospatial identifier network modal is built on a protocol standard that uses geographic location information and a flexible polymorphic network architecture. In this design, the traditional IP addressing mechanism in the three-layer network is replaced by a geospatial identifier protocol, and addressing and routing are executed on programmable polymorphic network elements.
To support end-to-end transmission, a protocol stack for the geospatial identifier network modal is constructed, enabling unified transmission across different network modals. A dynamic geographic routing mechanism is further developed to meet the transmission requirements of the GEO modal. This mechanism functions in a multimodal network controller and uses the relatively stable coverage of roadside base stations to form a two-level mapping: “geographic region-base station/geographic coordinates-terminal.” This mapping supports precise path matching for GEO modal packets and enables flexible, centrally controlled geographic forwarding. To verify the feasibility of the geospatial identifier network modal, a vehicle-infrastructure cooperative intelligent transportation system supporting geospatial identifier addressing is developed. The system is designed to facilitate efficient dissemination of road safety and traffic information. The functional requirements of the system are analyzed, and the service processing flow and overall architecture are designed. Key hardware and software modules are also developed, including the geospatial representation data-plane code, traffic control center services, roadside base stations, and in-vehicle terminals, and their implementation logic is presented.  Results and Discussions  System evaluation is carried out from four aspects: evaluation environment, operational effectiveness, theoretical analysis, and performance testing. A prototype intelligent transportation system is deployed, as shown in Fig. 7 and Fig. 8. The prototype demonstrates correct message transmission based on the geospatial identifier modal. A typical vehicle-to-vehicle communication case is used to assess forwarding efficiency, where an onboard terminal (T3) sends a road-condition alert (M) to another terminal (T2). Sequence-based analysis is applied to compare forwarding performance between the GEO modal and a traditional IP protocol.
Theoretical analysis indicates that the GEO modal provides higher forwarding efficiency, as shown in Fig. 9. Additional performance tests are conducted by adjusting the number of terminals (Fig. 10), background traffic (Fig. 11), and the traffic of the control center (Fig. 12) to observe the transmission behavior of geospatial identifier packets. The results show that the intelligent transportation system maintains stable and efficient transmission performance under varying network conditions. System evaluation confirms its suitability for typical vehicle-infrastructure cooperative communication scenarios, supporting massive connectivity and elastic traffic loads.  Conclusions  By integrating a flexible polymorphic network architecture with a protocol standard based on geographic information, a geospatial identifier network modal is developed and implemented. The modal enables direct packet forwarding based on geospatial location. A prototype vehicle-infrastructure cooperative intelligent transportation system using geospatial identifier addressing is also designed for intelligent transportation scenarios. The system supports applications such as road-safety alerts and traffic information broadcasting, improves vehicle safety, and enhances route-planning efficiency. Experimental evaluation shows that the system maintains stable and efficient performance under typical traffic conditions, including massive connectivity, fluctuating background traffic, and elastic service loads. With the continued development of vehicular networking technologies, the proposed system is expected to support broader intelligent transportation applications and contribute to safer and more efficient mobility systems.
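The two-level “geographic region-base station/geographic coordinates-terminal” mapping described in the Methods can be sketched as a pair of controller tables. The grid granularity (0.01-degree cells), the base-station and terminal identifiers, and the coordinates below are all illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeoID:
    """Geospatial identifier carried by a GEO-modal packet (assumed form)."""
    lat: float
    lon: float

# Level 1: geographic region (here an assumed 0.01-degree grid cell) -> base station.
REGION_TO_BS = {
    (3056, 11425): "BS-7",
    (3057, 11425): "BS-8",
}
# Level 2: base station -> terminals registered under its coverage.
BS_TERMINALS = {
    "BS-7": {GeoID(30.561, 114.252): "T2"},
    "BS-8": {GeoID(30.571, 114.253): "T3"},
}

def grid_cell(g: GeoID) -> tuple:
    """Quantize coordinates to the grid cell used as the region key."""
    return (int(g.lat * 100), int(g.lon * 100))

def resolve(dst: GeoID) -> tuple:
    """Resolve a GEO-modal destination: region -> base station -> terminal."""
    bs = REGION_TO_BS[grid_cell(dst)]
    terminal = BS_TERMINALS[bs][dst]
    return bs, terminal

print(resolve(GeoID(30.561, 114.252)))   # -> ('BS-7', 'T2')
```

Because roadside base-station coverage is relatively stable, only the level-2 table must track terminal mobility, which keeps wide-area routing state small.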
Vision Enabled Multimodal Integrated Sensing and Communications: Key Technologies and Prototype Validation
ZHAO Chuanbin, XU Weihua, LIN Bo, ZHANG Tengyu, FENG Yuan, GAO Feifei
Available online, doi: 10.11999/JEIT250685
Abstract:
  Objective  Integrated Sensing And Communications (ISAC) is regarded as a key enabling technology for Sixth-Generation mobile communications (6G), as it simultaneously senses and monitors information in the physical world while maintaining communication with users. The technology supports emerging scenarios such as low-altitude economy, digital twin systems, and vehicle networking. Current ISAC research primarily concentrates on wireless devices that include base stations and terminals. Visual sensing, which provides strong visibility and detailed environmental information, has long been a major research direction in computer science. This study proposes the integration of visual sensing with wireless-device sensing to construct a multimodal ISAC system. In this system, visual sensing captures environmental information to assist wireless communications, and wireless signals help overcome limitations inherent to visual sensing.  Methods  The study first explores the correlation mechanism between environmental vision and wireless communications. Key algorithms for visual-sensing-assisted wireless communication are then discussed, including beam prediction, occlusion prediction, and resource scheduling and allocation methods for multiple base stations and users. These schemes demonstrate that visual sensing, used as prior information, enhances the communication performance of the multimodal ISAC system. The sensing gains provided by wireless devices combined with visual sensors are subsequently explored. A static-environment reconstruction scheme and a dynamic-target sensing scheme based on wireless-visual fusion are proposed to obtain global information about the physical world. In addition, a “vision-communication” simulation and measurement dataset is constructed, establishing a complete theoretical and technical framework for multimodal ISAC.  
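The idea of visual-sensing-assisted beam prediction can be sketched in a minimal form: a camera detection of the user is mapped to an azimuth, and the closest codebook beam is selected without exhaustive sweeping. The field of view, image width, codebook size, and angle mapping below are illustrative assumptions, not the paper's actual algorithm, which is learning-based.

```python
# Toy geometric beam prediction from a camera detection (assumed parameters).
FOV_DEG = 90.0       # assumed horizontal field of view of the camera
IMAGE_WIDTH = 1920   # image width in pixels
NUM_BEAMS = 64       # assumed mmWave codebook size covering [-45°, +45°]

def pixel_to_azimuth(x_center: float) -> float:
    """Map the bounding-box center column to an azimuth angle in degrees."""
    return (x_center / IMAGE_WIDTH - 0.5) * FOV_DEG

def predict_beam(x_center: float) -> int:
    """Pick the codebook beam whose steering angle is nearest the target."""
    az = pixel_to_azimuth(x_center)
    beam_angles = [(-FOV_DEG / 2) + FOV_DEG * (i + 0.5) / NUM_BEAMS
                   for i in range(NUM_BEAMS)]
    return min(range(NUM_BEAMS), key=lambda i: abs(beam_angles[i] - az))

print(predict_beam(960))   # target at the image center -> a boresight beam
```

The point of using vision as prior information is exactly this shortcut: the beam search space collapses from the full codebook to the immediate neighborhood of the camera-derived angle.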
  Results and Discussions  For visual-sensing-assisted wireless communications, the hardware prototype system constructed in this study is shown in Fig. 6 and Fig. 7, and the corresponding hardware test results are presented in Table 1. The results show that visual sensing assists millimetre-wave communications in performing beam alignment and beam prediction more effectively, thereby improving system communication performance. For wireless-communication-assisted sensing, the hardware prototype system is shown in Fig. 8, and the experimental results are shown in Fig. 9 and Table 2. The static-environment reconstruction obtained through wireless-visual fusion shows improved robustness and higher accuracy. Depth estimation based on visual and communication fusion also presents strong robustness in rainy and snowy weather, with the RMSE reduced by approximately 50% compared with pure visual algorithms. These experimental results indicate that vision-enabled multimodal ISAC systems present strong potential for practical application.  Conclusions  A multimodal ISAC system that integrates visual sensing with wireless-device sensing is proposed. In this system, visual sensing captures environmental information to assist wireless communications, and wireless signals help overcome the inherent limitations of visual sensing. Key algorithms for visual-sensing-assisted wireless communication are examined, including beam prediction, occlusion prediction, and resource scheduling and allocation for multiple base stations and users. The sensing gains brought by wireless devices combined with visual sensors are also analysed. Static-environment reconstruction and dynamic-target sensing schemes based on wireless-visual fusion are proposed to obtain global information about the physical world. A “vision-communication” simulation and measurement dataset is further constructed, forming a coherent theoretical and technical framework for multimodal ISAC.
Experimental results show that vision-enabled multimodal ISAC systems present strong potential for use in 6G networks.
Robust Resource Allocation Algorithm for Active Reconfigurable Intelligent Surface-Assisted Symbiotic Secure Communication Systems
MA Rui, LI Yanan, TIAN Tuanwei, LIU Shuya, DENG Hao, ZHANG Jinlong
Available online, doi: 10.11999/JEIT250811
Abstract:
  Objective  Research on Reconfigurable Intelligent Surface (RIS)-assisted symbiotic radio systems is mainly centered on passive RIS. In practice, passive RIS suffers from a pronounced double-fading effect, which restricts capacity gains in scenarios dominated by strong direct paths. This work examines the use of active RIS, whose amplification capability increases the signal-to-noise ratio of the secondary signal and strengthens the security of the primary signal. Imperfect Successive Interference Cancellation (SIC) is considered, and a penalized Successive Convex Approximation (SCA) algorithm based on alternating optimization is analyzed to enable robust resource allocation.  Methods  The original optimization problem is difficult to address directly because it contains complex and non-convex constraints. An alternating optimization strategy is therefore adopted to decompose the problem into two subproblems: the design of the transmit beamforming vector at the primary transmitter and the design of the reflection coefficient matrix at the active RIS. Variable substitution, equivalent transformation, and a penalty-based SCA method are then applied in an alternating iterative manner. For the beamforming design, the rank-one constraint is first transformed into an equivalent form. The penalty-based SCA method is used to recover the rank-one optimal solution, after which iterative optimization is carried out to obtain the final result. For the reflection coefficient matrix design, the problem is reformulated and auxiliary variables are introduced to avoid feasibility issues. A penalty-based SCA approach is then used to handle the rank-one constraint, and the solution is obtained using the CVX toolbox. Based on these procedures, a penalty-driven robust resource allocation algorithm is established through alternating optimization.  
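The alternating-optimization pattern described above can be illustrated on a toy problem. Below, the best rank-one fit of a matrix is computed by alternating closed-form updates of two variable blocks, standing in structurally for the beamforming-vector and reflection-coefficient subproblems; this is a sketch of the decomposition idea only, not the paper's actual objective, constraints, or penalty terms.

```python
import numpy as np

# Toy alternating optimization: min over (w, v) of ||M - w v^T||_F^2.
# Each subproblem is solved in closed form while the other block is fixed,
# mirroring the beamforming / reflection-matrix split (illustrative only).
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))

w = rng.standard_normal(8)   # block 1 (role of the transmit beamformer)
v = rng.standard_normal(8)   # block 2 (role of the RIS coefficients)

for _ in range(200):         # alternate until the residual stabilizes
    w = M @ v / (v @ v)      # closed-form least-squares update of block 1
    v = M.T @ w / (w @ w)    # closed-form least-squares update of block 2

residual = np.linalg.norm(M - np.outer(w, v))
# Optimal rank-one residual from the SVD (all singular values but the largest).
best = np.sqrt((np.linalg.svd(M, compute_uv=False)[1:] ** 2).sum())
print(residual)
```

Each update can only decrease the objective, so the iteration converges; in the paper the same structure is retained, but each subproblem is itself solved via a penalty-based SCA step rather than in closed form.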
  Results and Discussions  The convergence curves of the proposed algorithm under different numbers of primary transmitter antennas (K) and RIS reflecting elements (N) are shown (Fig. 3). The total system power consumption decreases as the number of iterations increases and converges within a finite number of steps. The relationship between total power consumption and the Signal-to-Interference-and-Noise Ratio (SINR) threshold of the secondary signal is illustrated (Fig. 4). As the SINR threshold increases, the system requires more power to maintain the minimum service quality of the secondary signal, which results in higher total power consumption. In addition, as the imperfect interference cancellation factor decreases, the total power consumption is further reduced. To compare performance, three baseline algorithms are examined (Fig. 5): the passive RIS, the active RIS with random phase shift, and the non-robust algorithm. The total system power consumption under the proposed algorithm remains lower than that of the passive RIS and the active RIS with random phase shift. Although the active RIS consumes additional power, the corresponding reduction in transmit power more than compensates for this consumption, thereby improving overall energy efficiency. When random phase shifts are applied, the active beamforming and amplification capabilities of the RIS cannot be fully utilized. This forces the primary transmitter to compensate alone to meet performance constraints, which increases its power consumption. Furthermore, because imperfect SIC is considered in the proposed algorithm, additional transmit power is required to counter residual interference and satisfy the minimum SINR constraint of the secondary system. Therefore, the total power consumption remains higher than that of the non-robust algorithm. The effect of the secrecy rate threshold of the primary signal on the secure energy efficiency of the primary system under different values of N is shown (Fig. 6).
The results indicate that an optimal secrecy rate threshold exists that maximizes the secure energy efficiency of the primary system. To investigate the effect of active RIS placement on total system power consumption, the node positions are rearranged (Fig. 7). As the active RIS is positioned closer to the receiver, the fading effect weakens and the total system power consumption decreases.  Conclusions  This paper investigates the total power consumption of an active RIS-assisted symbiotic secure communication system under imperfect SIC. To enhance system energy efficiency, a total power minimization problem is formulated with constraints on the quality of service for both primary and secondary signals and on the power and phase shift of the active RIS. To address the non-convexity introduced by uncertain disturbance parameters, variable substitution, equivalent transformation, and a penalty-based SCA method are applied to convert the original formulation into a convex optimization problem. Simulation results confirm the effectiveness of the proposed algorithm and show that it achieves a notable reduction in total system power consumption compared with benchmark schemes.
Service Migration Algorithm for Satellite-terrestrial Edge Computing Networks
FENG Yifan, WU Weihong, SUN Gang, WANG Ying, LUO Long, YU Hongfang
Available online, doi: 10.11999/JEIT250835
Abstract:
  Objective   In highly dynamic Satellite-Terrestrial Edge Computing Networks (STECN), achieving coordinated optimization between user service latency and system migration cost is a central challenge in service migration algorithm design. Existing approaches often fail to maintain stable performance in such environments. To address this, a Multi-Agent Service Migration Optimization (MASMO) algorithm based on multi-agent deep reinforcement learning is proposed to provide an intelligent and forward-looking solution for dynamic service management in STECN.  Methods   The service migration optimization problem is formulated as a Multi-Agent Markov Decision Process (MAMDP), which offers a framework for sequential decision-making under uncertainty. The environment represents the spatiotemporal characteristics of a Low Earth Orbit (LEO) satellite network, where satellite movement and satellite-user visibility define time-varying service availability. Service latency is expressed as the sum of transmission delay and computation delay. Migration cost is modeled as a function of migration distance between satellite nodes to discourage frequent or long-range migrations. A Trajectory-Aware State Enhancement (TASE) method is proposed to incorporate predictable orbital information of LEO satellites into the agent state representation, improving proactive and stable migration actions. Optimization is performed using the recurrent Multi-Agent Proximal Policy Optimization (rMAPPO) algorithm, which is suitable for cooperative multi-agent tasks. The reward function balances the objectives by penalizing high migration cost and rewarding low service latency.  Results and Discussions  Simulations are conducted in dynamic STECN scenarios to compare MASMO with MAPPO, MADDPG, Greedy, and Random strategies. The results consistently confirm the effectiveness of MASMO. As the number of users increases, MASMO shows slower performance degradation. 
With 16 users, it reduces average service latency by 2.90%, 6.78%, 11.01%, and 14.63% compared with MAPPO, MADDPG, Greedy, and Random. It also maintains high cost efficiency, lowering migration cost by up to 14.69% at 12 users (Fig. 3). When satellite resources increase, MASMO consistently leverages the added availability to reduce both latency and migration cost, whereas myopic strategies such as Greedy do not exhibit similar improvements. With 10 satellites, MASMO achieves the lowest service latency and outperforms the next-best method by 7.53% (Fig. 4). These findings show that MASMO achieves an effective balance between transmission latency and migration latency through its forward-looking decision policy.  Conclusions   This study addresses the service migration challenge in STECN through the MASMO algorithm, which integrates the TASE method with rMAPPO. The method improves service latency and reduces migration cost at the same time, demonstrating strong performance advantages. The trajectory-enhanced state representation improves foresight and stability of migration behavior in predictable dynamic environments. This study assumes ideal real-time state perception, and future work should evaluate communication delays and partial observability, as well as investigate scalability in larger satellite constellations with heterogeneous user demands.
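The two ingredients described in the Methods — the TASE state representation and a reward that trades service latency against migration cost — can be sketched as follows. The observation shape, trajectory horizon, and reward weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def tase_state(obs: np.ndarray, sat_trajectory: np.ndarray,
               horizon: int = 3) -> np.ndarray:
    """TASE sketch: augment the raw agent observation with the next `horizon`
    orbital positions, which are predictable for LEO satellites."""
    future = sat_trajectory[:horizon].reshape(-1)   # (horizon, 3) -> flat
    return np.concatenate([obs, future])

def reward(latency_s: float, migration_km: float,
           w_lat: float = 1.0, w_mig: float = 0.01) -> float:
    """Reward sketch: penalize service latency and migration distance, the
    latter serving as a proxy for migration cost (weights are assumptions)."""
    return -(w_lat * latency_s + w_mig * migration_km)

obs = np.zeros(8)                         # assumed raw observation
traj = np.arange(12.0).reshape(4, 3)      # predicted future positions (toy)
print(tase_state(obs, traj).shape)        # (17,)
print(reward(latency_s=0.12, migration_km=500.0))   # ≈ -5.12
```

Feeding the predictable trajectory into the state is what lets the policy act proactively: it can migrate a service before a satellite leaves the user's visibility window instead of reacting after latency degrades.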
Satellite Navigation
Research on GRI Combination Design of eLORAN System
LIU Shiyao, ZHANG Shougang, HUA Yu
Available online, doi: 10.11999/JEIT201066
Abstract:
To solve the problem of Group Repetition Interval (GRI) selection in the construction of enhanced LORAN (eLORAN) supplementary transmission stations, a screening algorithm based on the Cross Interference Rate (CRI) is proposed, mainly from a mathematical point of view. Firstly, the method considers the requirement of conveying second-of-time information and, on this basis, conducts a first screening by comparing the mutual CRI with adjacent Loran-C stations in neighboring countries. Secondly, a second screening is conducted through permutation and pairwise comparison. Finally, the optimal GRI combination scheme is given by considering the requirements of data rate and system specification. In view of the high-precision timing requirements of the new eLORAN system, an optimized selection is then made among multiple optimal combinations. The analysis results show that the average interference rate of the optimal combination obtained by this algorithm is comparable to that between the current navigation chains while also meeting the timing requirements, providing referential suggestions and a theoretical basis for the construction of a high-precision ground-based timing system.
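The screening idea can be illustrated with a simplified proxy metric. Below, each chain's pulse-group epochs are modeled as integer multiples of its GRI (in the customary 10-microsecond units), and candidates are ranked by how often their epochs nearly coincide with those of a neighboring chain. This is an assumed toy model for illustration; the paper's actual CRI computation and the candidate/neighbor GRI values are not reproduced here.

```python
# Toy GRI screening sketch (assumed model, not the paper's exact CRI formula).
def coincidence_rate(gri_a: int, gri_b: int, tol: int = 30,
                     window: int = 100_000_000) -> float:
    """Count epochs of chain B falling within `tol` (10-us units) of an
    epoch of chain A over `window` (10-us units); return events per second."""
    hits = 0
    t = 0
    while t < window:
        r = t % gri_a                    # offset to the previous A epoch
        if min(r, gri_a - r) <= tol:     # near an A epoch on either side
            hits += 1
        t += gri_b
    return hits / (window * 1e-5)        # convert 10-us units to seconds

# Rank hypothetical candidate GRIs against an assumed neighboring chain (9930).
candidates = [5990, 6731, 7430, 8390]
ranked = sorted(candidates, key=lambda g: coincidence_rate(g, 9930))
print(ranked)
```

A pairwise comparison of this kind, applied first against foreign Loran-C chains and then among the surviving candidates, is the core of the two-stage screening the abstract describes.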