Articles in press have been peer-reviewed and accepted. They are not yet assigned to volumes/issues, but are citable by Digital Object Identifier (DOI).
A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents
WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian
 doi: 10.11999/JEIT250818
Abstract:
  Objective   The rapid advancement of Remote Sensing (RS) technology has reshaped Earth observation research, shifting the field from static image analysis to intelligent, goal-oriented cognitive decision-making. Modern RS systems are expected to perceive complex scenes, reason over heterogeneous information, decompose high-level objectives into executable subtasks, and make decisions under uncertainty. These requirements motivate the development of RS agents, which extend perception models to include reasoning, planning, and interaction functions. However, existing RS datasets remain task-centric and fragmented, as they are usually designed for single-purpose supervised learning such as object detection or land-cover classification. They seldom support multimodal reasoning, instruction following, or multi-step decision-making, all of which are essential for agentic workflows. Current RS vision-language datasets also have limited scale, constrained modality coverage, and simplified text annotations, with insufficient use of non-optical data such as Synthetic Aperture Radar (SAR) and infrared imagery. They further lack instruction-driven interactions that reflect real human-agent collaboration. This study constructs a large-scale multimodal image-text instruction dataset tailored for RS agents. The objective is to establish a unified data foundation that supports perception, reasoning, planning, and decision-making. By training models on structured instructions across diverse modalities and task categories, the dataset supports the development and evaluation of next-generation RS foundation models with agentic capability.  Methods   The dataset is built through a systematic and extensible framework that integrates multi-source RS imagery with instruction-oriented textual supervision. A unified input-output paradigm is defined to ensure compatibility across heterogeneous tasks and model architectures. 
This paradigm formalizes interactions between visual inputs and language instructions, allowing models to process image pixels, text descriptions, spatial coordinates, region references, and action-oriented outputs. A standardized instruction schema encodes task objectives, constraints, and expected responses in a consistent format. The construction process includes three stages. (1) Data collection and integration: multimodal RS imagery is aggregated from authoritative sources, covering optical, SAR, and infrared modalities with different spatial resolutions, scene types, and geographic distributions. (2) Instruction generation: a hybrid strategy combines rule-based templates with refinement by Large Language Models (LLMs). Template-based generation ensures task completeness and structural consistency, whereas LLM rewriting improves linguistic diversity and instruction complexity. (3) Task categorization and organization: the dataset is organized into nine core task categories and 21 sub-datasets that span low-level perception, mid-level reasoning, and high-level decision-making. A validation pipeline performs automated syntax and format checks, cross-modal consistency verification, and manual review of representative samples to ensure semantic alignment between images and instructions.  Results and Discussions   The dataset contains more than 2 million multimodal instruction samples, making it one of the largest and most comprehensive instruction resources in the RS domain. The inclusion of optical, SAR, and infrared imagery supports cross-modal learning and reasoning across heterogeneous sensing mechanisms. Compared with existing RS datasets, this dataset emphasizes instruction diversity, task compositionality, and agent-oriented interaction rather than isolated perception tasks. 
Baseline experiments conducted using state-of-the-art multimodal LLMs and RS foundation models show that the dataset supports evaluation across the full spectrum of agentic capabilities, from visual grounding and reasoning to high-level decision-making. The experiments also highlight challenges inherent to RS data, including extreme scale variation, dense object distributions, and long-range spatial dependencies. These challenges indicate important research directions for improving multimodal reasoning and planning in complex RS environments.  Conclusions   This work presents a large-scale multimodal image-text instruction dataset designed for RS agents. By organizing data across nine task categories and 21 sub-datasets, it provides a unified and extensible benchmark for agent-centric RS research. The contributions include: (1) a unified multimodal instruction paradigm for RS agents; (2) a 2-million-sample dataset covering optical, SAR, and infrared modalities; (3) empirical validation demonstrating support for end-to-end agentic workflows from perception to decision-making; and (4) a comprehensive evaluation benchmark based on baseline experiments. Future work will extend the dataset to temporal and video-based RS scenarios, integrate dynamic decision-making processes, and further improve reasoning and planning capability in real-world, time-varying environments.
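For illustration, the unified input-output instruction paradigm described above can be sketched as a single structured sample. The field names and values below are hypothetical assumptions for exposition, not the dataset's actual schema.

```python
# Hypothetical sketch of one multimodal instruction sample for an RS agent
# dataset. All field names (image, modality, task, ...) are illustrative
# assumptions, not the schema used by the paper.
import json

def make_sample(image_path, modality, task, instruction, response):
    """Bundle one image-text instruction pair in a uniform format."""
    assert modality in {"optical", "SAR", "infrared"}
    return {
        "image": image_path,          # pixel input
        "modality": modality,         # sensing mechanism
        "task": task,                 # one of the task categories
        "instruction": instruction,   # natural-language objective + constraints
        "response": response,         # expected output (text, boxes, actions)
    }

sample = make_sample(
    image_path="scene_0001.tif",
    modality="SAR",
    task="visual_grounding",
    instruction="Locate all ships in the harbor and return bounding boxes.",
    response=[{"label": "ship", "box": [120, 48, 164, 90]}],
)
print(json.dumps(sample, indent=2))
```

A uniform container like this is what lets heterogeneous tasks (grounding, captioning, planning) share one training pipeline.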
CaRS-Align: Channel Relation Spectra Alignment for Cross-Modal Vehicle Re-identification
SA Baihui, ZHUANG Jingyi, ZHENG Jinjie, ZHU Jianqing
 doi: 10.11999/JEIT250917
Abstract:
  Objective  Visible and infrared images are two commonly used modalities in intelligent transportation scenarios and play a key role in vehicle re-identification. However, differences in imaging mechanisms and spectral responses lead to inconsistent visual characteristics between these modalities, which limits cross-modal vehicle re-identification. To address this problem, this paper proposes a Channel Relation Spectra Alignment (CaRS-Align) method that uses channel relation spectra, rather than channel-wise features, as the alignment target. This strategy reduces interference caused by imaging style differences at the relational-structure level. Within each modality, a channel relation spectrum is constructed to capture stable and semantically coordinated channel-to-channel relationships through correlation modeling. At the cross-modal level, the correlation between the corresponding channel relation spectra of the two modalities is maximized to achieve consistent alignment of relational structures. Experiments on the public MSVR310 and RGBN300 datasets show that CaRS-Align outperforms existing state-of-the-art methods. For example, on MSVR310, under infrared-to-visible retrieval, CaRS-Align achieves a Rank-1 accuracy of 64.35%, which is 2.58 percentage points higher than the best existing method.
Methods  CaRS-Align adopts a hierarchical optimization paradigm: (1) for each modality, a channel–channel relation spectrum is constructed by mining inter-channel dependencies, yielding a semantically coordinated relation matrix that preserves the organizational structure of semantic cues; (2) cross-modal consistency is achieved by maximizing the correlation between the relation spectra of the two modalities, enabling progressive optimization from intra-modal construction to cross-modal alignment; and (3) relation spectrum alignment is integrated with the standard classification and retrieval objectives commonly used in re-identification to supervise backbone training for the vehicle re-identification model.  Results and Discussions  Compared with several state-of-the-art cross-modal re-identification methods on the RGBN300 and MSVR310 datasets, CaRS-Align demonstrates strong performance and achieves the best or second-best results across both retrieval modes. As shown in (Table 1), on RGBN300 it attains 75.09% Rank-1 accuracy and 55.45% mean Average Precision (mAP) in the infrared-to-visible mode, and 76.60% Rank-1 accuracy and 56.12% mAP in the visible-to-infrared mode. As shown in (Table 2), similar advantages are observed on MSVR310, with 64.54% Rank-1 accuracy and 41.25% mAP in the visible-to-infrared mode, and 64.35% Rank-1 accuracy and 40.99% mAP in the infrared-to-visible mode. (Fig. 4) presents Top-10 retrieval results, where CaRS-Align reduces identity mismatches in both directions. (Fig. 5) illustrates feature distance distributions: substantial overlap between intra-class and inter-class distances appears without CaRS-Align (Fig. 5(a)), whereas clearer separation is observed with CaRS-Align (Fig. 5(b)), confirming improved feature discrimination. These results indicate that modeling channel-level relational structures improves both retrieval modes, increases adaptability to modality shifts, and effectively reduces mismatches caused by cross-modal differences.
Conclusions  This paper proposes a visible–infrared cross-modal vehicle re-identification method based on CaRS-Align. Within each modality, a channel relation spectrum is constructed to preserve semantic co-occurrence structures. A CaRS-Align function is then designed to maximize the correlation between modalities, thereby achieving consistent alignment and improving cross-modal performance. Experiments on the MSVR310 and RGBN300 datasets demonstrate that CaRS-Align outperforms existing state-of-the-art methods in key metrics, including Rank-1 accuracy and mAP.
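The core idea (aligning channel relation spectra rather than raw channel features) can be sketched in a few lines of NumPy. This is an illustrative reconstruction from the abstract, not the authors' code; the exact correlation measure in the paper may differ.

```python
import numpy as np

def relation_spectrum(feat):
    """Channel-channel correlation matrix (C x C) for features shaped (C, N)."""
    f = feat - feat.mean(axis=1, keepdims=True)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    return f @ f.T

def cars_align_loss(feat_vis, feat_ir):
    """Negative Pearson correlation between the two modalities' relation
    spectra; minimizing it maximizes relational-structure alignment."""
    s_v = relation_spectrum(feat_vis).ravel()
    s_i = relation_spectrum(feat_ir).ravel()
    s_v = s_v - s_v.mean()
    s_i = s_i - s_i.mean()
    return -(s_v @ s_i) / (np.linalg.norm(s_v) * np.linalg.norm(s_i) + 1e-8)

rng = np.random.default_rng(0)
vis = rng.standard_normal((8, 64))                      # 8 channels, 64 positions
ir_similar = vis + 0.01 * rng.standard_normal((8, 64))  # near-identical relations
ir_random = rng.standard_normal((8, 64))                # unrelated relations
loss_same = cars_align_loss(vis, ir_similar)            # close to -1
loss_diff = cars_align_loss(vis, ir_random)             # higher (less aligned)
```

The design point is that the C x C relation matrix is invariant to per-channel style shifts that rescale features uniformly, which is why relational structure is a more stable alignment target than the channel features themselves.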
A Review of Causal Feature Learning in Deep Learning Image Classification Models
WANG Xiaodong, JIANG Ling, LI Huihui, WANG Buhong
 doi: 10.11999/JEIT250738
Abstract:
  Significance   The Deep Learning mechanism is constructed based on statistical correlations rather than causal relationships. Consequently, severe challenges in terms of generalization, interpretability, and stability are inevitably faced by such models. In contrast to human cognition, which mainly relies on causal discovery and exploitation, current Deep Learning models are still confined to the bottom of the "Pearl Causal Hierarchy (PCH)". Thus, the integration of causal inference into Deep Learning is highly anticipated. As the most crucial branch of Deep Learning, image classification models (represented by Convolutional Neural Networks, CNNs) exhibit particularly prominent shortcomings, and the introduction of causal inference is urgently required to address the bottleneck. Among various solutions for integrating causal inference into these models, Causal Feature Learning (CFL), a framework that combines unsupervised machine learning and causal inference, exhibits significant advantages. It is confirmed by studies that causal relationships are implicitly embedded in the pixel information of input image data for image classification tasks. According to the proven Causal Coarsening Theorem (CCT), causal knowledge can be acquired from observed image data at minimal experimental cost. In classification tasks, the optimal solution is constituted by the Markov Boundary (MB) of the causal Bayesian network for the class variable. The research endeavor to establish a connection between deep image classification models and causal inference via CFL is strongly supported by these theories. In general, the research significance of CFL has become increasingly prominent, and it is positioned as one of the potential breakthrough directions in the development of next-generation models.  
  Progress   This paper presents a comprehensive survey of CFL in Deep Learning image classification models around three core issues: statistical causal inference theory, correlation analysis methods, and CFL implementations. First, the relevant definitions of CFL technology and its two mainstream statistical implementation frameworks, including causal discovery based on the Structural Causal Model (SCM) and causal effect estimation based on the Rubin Causal Model (RCM), are introduced. Second, correlation analysis methods for Deep Learning image classification models, which lie at the threshold of the PCH, are systematically summarized from three perspectives: forward, backward, and horizontal. Third, building on these auxiliary tools, progress in CFL for image classification is classified into four main aspects: Causal Feature Discovery (CFD), Causal Feature Effect Estimation (CFEE), Causal Representation Learning (CRL), and Spurious Correlation Removal (SCR). CFD is grounded in the SCM framework, aiming to derive confounding-free causal graphs through explicit or implicit causal intervention analyses on image data or models. Under the RCM framework, CFEE leverages observed image data to complete the quantitative evaluation of the causal effects of features, while overcoming the impacts of unknown counterfactual samples and confounding biases. CRL focuses on selecting or extracting high-dimensional features from image data to learn causal relationships and mine low-dimensional cross-image representations. SCR eliminates non-causal features from images and preserves causal ones via diverse methods. In addition, available toolkits, top conference resources, and academic organizations are listed. Furthermore, this paper discusses key technical issues and future research directions.  Conclusions  This review summarizes the technological development of CFL.
In general, considerable progress has been made, but difficulties in different research directions still need to be overcome. The advantage of CFD is that it follows the basic logic of causal theory, with clear and simple structures that are easy to adopt. However, CFD suffers from immature processing methods for high-dimensional image data and insufficient generalization ability. CFEE can effectively distinguish causal features from confounding features. Its evaluation results are closer to real decision-making logic and show strong universality. Common problems of CFEE include the requirement for observable confounding factors, high dependence on causal assumptions, and insufficient computational efficiency. CRL has the advantages of more optional dimensions and the ability to discover causal factors that drive classification while excluding non-causal factors. The core problems to be solved currently include generalization bias, factor coupling, prior dependence, weak evaluation, and high cost. SCR is highly targeted but generalizes poorly. From a macro perspective, the implementation of CFL should not be limited to specific methods. All methods that aim to build causal relationships from micro-variables such as image pixels to causal macro-variables such as global semantics can be included, so CFL remains an open research topic.  Prospects   The goal of causal inference is to go beyond correlation and clarify the causal relationships between variables by designing more rigorous experiments or employing advanced statistical methods. This requires deeper assumptions about feature relationships and more generalizable exploration of underlying causal chains, both of which are highly challenging and will be the main focus of future work in this field.
To address the technical challenges in CFL, this paper proposes that future research can focus on the following directions: (1) unifying construction paradigms and establishing standards for image-based SCMs, so as to improve the standardization and consistency of causal discovery; (2) developing RCM frameworks supported by generative artificial intelligence to address the problem of sample scarcity in causal effect estimation; (3) reforming models with the aim of learning novel image causal representations, thereby fundamentally resolving the inherent deficiencies of CNNs in CFL; and (4) integrating spurious correlation analysis with reinforcement learning, leveraging reinforcement learning to endow Deep Learning image classification models with meta-learning capabilities for causal exploration. It can be expected that, once these key issues in CFL are resolved, the accuracy, generalization, interpretability, and stability of Deep Learning image classification models will improve qualitatively.
Optimized Implementation of Low-Depth Lightweight S-Boxes
FENG Zixi, LIU Yupeng, DOU Guowei, LIU Chengle
 doi: 10.11999/JEIT250690
Abstract:
  Objective  With the rapid development and widespread deployment of the Internet of Things (IoT), embedded systems, and mobile computing devices, ensuring secure communication and data protection on resource-constrained platforms has become a central focus in the field of information security. These devices are typically characterized by severe limitations in terms of computational capability, storage capacity, and energy consumption, which render traditional cryptographic algorithms inefficient or even infeasible in such environments. In response to these constraints, lightweight cryptographic algorithms have been proposed as an effective class of solutions. Their primary objective is to achieve levels of security comparable to those of traditional algorithms while significantly reducing hardware and computational overhead through deliberate algorithmic simplifications and structural optimizations. These algorithms are designed to operate efficiently within tight resource bounds and are especially suitable for applications such as sensor networks, smart cards, RFID systems, and wearable devices. From the perspective of hardware implementation, the design of lightweight cryptographic algorithms must account for multiple performance indicators, including throughput, latency, power efficiency, chip area, and circuit depth. Among these, chip area and depth are considered particularly critical, as they directly influence the physical cost of production and the speed of computation. The Substitution-box (S-Box), as the core nonlinear component responsible for providing confusion in most symmetric encryption schemes, plays a decisive role in determining the security strength and implementation efficiency of the entire cipher. Therefore, exploring efficient methods to realize low-area and low-depth implementations of S-Boxes is of fundamental importance to the design of secure and practical lightweight cryptographic systems.
Methods  In this work, a novel S-Box optimization algorithm based on Boolean satisfiability (SAT) solving is proposed to simultaneously optimize two key hardware metrics: logic area and circuit depth. To this end, a circuit model with depth k and width w is constructed. Under a given area constraint, SAT solving techniques are employed to determine whether the circuit model can implement the target S-Box. By iteratively adjusting circuit depth, width, and area parameters, an optimized implementation scheme of the S-Box is eventually obtained. The method is specifically developed for 4-bit S-Boxes, which are widely adopted in many lightweight block ciphers, and it provides implementations that are highly efficient in both structural compactness and computational depth. This dual optimization approach helps to reduce hardware costs while maintaining low latency, making it especially suitable for scenarios where performance and energy efficiency are both critical. The proposed method begins by transforming the S-Box implementation problem into a formal SAT problem, enabling the use of powerful SAT solvers to exhaustively explore possible logic-level representations. In this transformation, a diverse set of logic gates—including 2-input, 3-input, and 4-input gates—is utilized to construct flexible logic networks. To enforce area and depth constraints, arithmetic operations such as binary addition and comparator logic are encoded into SAT-compatible Boolean constraints, which guide the solver toward low-area and low-depth solutions. To further accelerate the solving process and avoid redundant search paths, symmetry-breaking constraints are introduced. These constraints help eliminate logically equivalent but structurally different representations, thereby significantly reducing the size of the solution space. 
The CaDiCaL SAT solver, known for its speed and efficiency in handling large-scale SAT problems, is employed to compute optimized S-Box implementations that minimize both depth and area. The proposed approach not only generates efficient implementations but also provides a general modeling framework that can be extended to other logic synthesis problems in cryptographic hardware design.  Results and Discussions  To validate the effectiveness of the proposed optimization method, a comprehensive set of experiments was conducted on 4-bit S-Boxes from several representative lightweight block ciphers, including Joltik, Piccolo, Rectangle, Skinny, Lblock, Lac, Midori, and Prøst. The results demonstrate that the method consistently produces high-quality implementations that are competitive or superior in terms of both chip area and circuit depth when compared with existing state-of-the-art results. Specifically, for the S-Boxes of Joltik and Piccolo, as well as for those used in Skinny and Rectangle, the generated implementations match the best known results in both metrics, indicating that the method can successfully reproduce optimal or near-optimal designs. In the cases of Lblock and Lac, although the logic area remains similar to prior results, the circuit depth is significantly reduced, from an initial value of 10 down to 3, which represents a substantial improvement in processing latency and suitability for real-time applications. For the inverse S-Box of the Rectangle cipher, the proposed implementation achieves the same circuit depth as previous designs but reduces the area from 24.33 gate equivalents (GE) to 17.66 GE, yielding a more compact and efficient realization. The optimization results for the Midori S-Box further confirm the effectiveness of the method, where both depth and area are improved: depth is reduced from 4 to 3, and area is brought down from 20.00 GE to 16.33 GE.
For the Prøst cipher’s S-Box, two alternative implementations are presented to illustrate the trade-off between area and depth. The first achieves a depth of 4 with an area of 22.00 GE, matching the best known depth but at a higher area cost, while the second increases the depth to 5 but reduces the area significantly to 13.00 GE. These results demonstrate that the method not only supports flexible optimization under different design constraints but also contributes to a deeper understanding of the complexity and trade-offs involved in S-Box implementation.  Conclusions   This paper presents a SAT-based method for jointly optimizing S-Box hardware implementations in terms of area and circuit depth. By modeling the S-Box realization as a satisfiability problem and exploiting advanced constraint encoding, multi-input logic gates, and symmetry-breaking techniques, the method effectively reduces hardware complexity while maintaining or improving depth performance. Extensive experiments on various 4-bit S-Boxes demonstrate that the proposed approach matches or outperforms existing results, particularly in reducing circuit depth and improving logic compactness. This makes it well suited for lightweight cryptographic systems operating under strict constraints on silicon area, speed, and energy consumption. Despite these advantages, the method still has limitations. While it achieves optimal or near-optimal results for 4-bit S-Boxes, scalability to larger instances such as 5-bit or 8-bit S-Boxes remains challenging due to the exponential growth of the search space and solving time. As model complexity increases, solving becomes computationally expensive and may not converge in practice.
Future work will focus on improving modeling efficiency and solver performance through refined constraint generation, stronger pruning strategies, and heuristic-guided search, with the goal of extending the method to more complex S-boxes and other nonlinear components in lightweight and post-quantum cryptographic systems.
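The inner check that any such search loop relies on (does a candidate gate network realize the target S-Box, and at what depth?) is easy to sketch outside the SAT encoding. The netlist format and the toy complement "S-Box" below are illustrative assumptions, not the paper's encodings or benchmark results.

```python
# Signals 0-3 are the four S-Box input bits (bit i of x); each gate in the
# netlist appends one new signal. This netlist format is hypothetical.
GATES = {
    "XOR": lambda a, b: a ^ b,
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "NOT1": lambda a: 1 - a,   # hypothetical 1-input inverter, one gate level
}

def evaluate(netlist, outputs, x):
    """Run a gate netlist on 4-bit input x; return (4-bit output, depth)."""
    sig = [(x >> i) & 1 for i in range(4)]
    depth = [0, 0, 0, 0]
    for op, ins in netlist:
        sig.append(GATES[op](*(sig[i] for i in ins)))
        depth.append(1 + max(depth[i] for i in ins))
    y = sum(sig[o] << i for i, o in enumerate(outputs))
    return y, max(depth[o] for o in outputs)

def implements(netlist, outputs, sbox):
    """True iff the netlist reproduces the S-Box on all 16 inputs."""
    return all(evaluate(netlist, outputs, x)[0] == sbox[x] for x in range(16))

# Toy target: the bitwise-complement "S-Box", realized by four inverters.
TARGET = [x ^ 0xF for x in range(16)]
NETLIST = [("NOT1", (0,)), ("NOT1", (1,)), ("NOT1", (2,)), ("NOT1", (3,))]
OUTPUTS = [4, 5, 6, 7]                    # output bit i = signal OUTPUTS[i]
```

A SAT-based method encodes exactly this check as Boolean constraints so that the solver, rather than exhaustive enumeration, searches over netlists under the area (gate count) and depth bounds.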
A Clipped NMS List Decoding Algorithm of LDPC Codes for 5G URLLC
ZHANG Xiaojun, SONG Xin, GAO Jian, MI Yonghao, NIU Kai
 doi: 10.11999/JEIT250853
Abstract:
  Objective  As one of the coding schemes in fifth-generation (5G) wireless communication systems, Low-Density Parity-Check (LDPC) codes can achieve performance close to the Shannon limit through iterative decoding. However, in practical wireless transmission environments, the decoding performance of LDPC codes is susceptible to burst interference in wireless channels. The Normalized Min-Sum (NMS) decoding algorithm is highly sensitive to the distribution characteristics of input log-likelihood ratios (LLRs). Burst interference causes LLRs to deviate from the Gaussian distribution, resulting in degraded decoding performance. Meanwhile, 5G LDPC decoders are often equipped with a fixed number of processing units (PEs) according to the maximum lifting size to cover the full code length range. In Ultra-Reliable Low-Latency Communications (URLLC) short-code transmission scenarios, the lifting size is much smaller than the maximum lifting size, leading to long-term idleness of a large number of processing units and insufficient utilization of hardware resources. To address these issues, this paper proposes a Clipped Normalized Min-Sum List (CNMSL) decoding algorithm. By co-designing burst interference smoothing and idle resource reuse, it improves hardware resource utilization while enhancing decoding performance.  Methods  The statistical characteristics of LLRs over AWGN and interference channels are first analyzed, and the degradation in decoding performance caused by burst interference is shown to stem from the increased proportion of saturated LLRs induced by such interference. Next, the dependence of the optimal clipping threshold on the channel noise variance, burst interference variance, and burst probability is verified; the optimal threshold converges to a finite interval, the optimal threshold interval, when channel parameters undergo limited variations. On this basis, the CNMSL decoding algorithm is proposed.
This algorithm constructs a list decoding architecture by reusing idle processing units in 5G LDPC decoders, where each decoding path performs independent and synchronous decoding to generate candidate codewords, and the optimal decoding result is selected via CRC check. Meanwhile, an independent clipper is configured for each path with parameters set according to the optimal threshold interval, thereby effectively suppressing and mitigating the adverse effects of burst interference.  Results and Discussions  Experimental results show that the layered NMS algorithm almost fails to decode over interference channels without a clipping mechanism. With a single clipping threshold, the algorithm works normally, and its BLER first decreases and then increases as the clipping threshold is reduced. Under various channel conditions for both short and long codes, the single-clipping layered NMS algorithm with a clipping threshold of 3.5 achieves a gain of about 1 dB at $BLER = 10^{-2}$ compared with a threshold of 10, and the CNMSL algorithm further yields an additional gain of about 0.5 dB relative to the single-clipping NMS algorithm. In terms of hardware efficiency, when the lifting factor is less than 192, the PE utilization of the CNMSL algorithm is significantly higher than that of the layered NMS algorithm, with more remarkable improvement as the lifting factor decreases; the average PE utilization of the CNMSL algorithm is increased by 69% compared with the layered NMS algorithm.  Conclusions  The CNMSL decoding algorithm is proposed in this paper, aiming to improve the error correction performance of the traditional layered NMS decoding algorithm over interference channels. By reusing idle PEs for list decoding to generate multiple candidate paths, the algorithm incurs no additional hardware overhead.
In addition, an optimal threshold interval is defined to configure the clipper for each decoding path, which limits the proportion of saturated LLRs and makes the input LLRs follow a Gaussian or near-Gaussian distribution. Experimental results show that compared with the layered NMS decoding algorithm with a single clipper, the proposed CNMSL algorithm achieves a gain of approximately 0.5 dB for both short and long codes. Meanwhile, it increases the PE utilization by an average of 69%.
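The two ingredients the abstract specifies, LLR clipping and the normalized min-sum check-node update, can be sketched as follows. The threshold 3.5 comes from the abstract; the normalization factor alpha = 0.75 is an assumed typical value, not taken from the paper.

```python
import numpy as np

def clip_llrs(llr, threshold=3.5):
    """Clipper: limit saturated LLRs caused by burst interference.
    Threshold 3.5 is the value reported in the abstract's experiments."""
    return np.clip(llr, -threshold, threshold)

def nms_check_node(llrs, alpha=0.75):
    """Normalized min-sum check-node update: each outgoing message is
    alpha times the minimum magnitude of the OTHER incoming LLRs, signed
    by the product of their signs. alpha = 0.75 is an assumed typical
    normalization factor, not the paper's value."""
    llrs = np.asarray(llrs, dtype=float)
    out = np.empty_like(llrs)
    for i in range(len(llrs)):
        others = np.delete(llrs, i)
        out[i] = alpha * np.prod(np.sign(others)) * np.min(np.abs(others))
    return out

# Burst interference saturates some LLRs; clipping restores a bounded range.
noisy = np.array([10.0, -0.5, -8.0, 1.2])
clipped = clip_llrs(noisy)                   # -> [ 3.5 -0.5 -3.5  1.2]
messages = nms_check_node([2.0, -1.0, 4.0])  # -> [-0.75  1.5  -0.75]
```

Because the min-sum output magnitude is driven by the smallest incoming magnitude, a few interference-saturated LLRs can dominate sign decisions for many iterations; clipping caps their influence before the update runs.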
Drug Response Prediction Based on Graph Topology Attention Network
XU Peng, XU Hao, BAO Zhenshen, ZHOU Chi, LIU Wenbin
 doi: 10.11999/JEIT251099
Abstract:
  Objective  A core goal in modern cancer research is to understand why patients respond differently to the same therapy. Achieving this requires developing computational tools that combine genetic information and drug properties to forecast treatment outcomes, which is essential for advancing personalized oncology. Although some existing methods have made progress in predicting cancer drug responses, effectively extracting drug features and integrating multi-omics data from cell lines remain challenging. To address these challenges, employing Graph Neural Networks (GNNs) to process drug molecular graphs has become a promising strategy. This research proposes a model that utilizes a graph topology attention network to capture features from drug molecular graphs, while an attention mechanism is applied to integrate multi-omics data.  Methods  In this study, a drug response prediction method based on a Graph Topology Attention Network (GTAT) is proposed. The model integrates topological graph information to predict drug responses in cell lines. The model utilizes drug SMILES strings to generate two distinct drug representations and incorporates multi-omics data for cell line characterization (Fig. 1). For drug feature extraction, SMILES strings are first parsed to construct molecular graphs, which are then processed by the GTAT. This network captures both graph-level topological information and atom-level features of the molecular graph, thereby producing structured molecular representations. Simultaneously, Extended Connectivity Fingerprints are computed from the same SMILES strings and transformed into continuous feature vectors via a Multi-Layer Perceptron (MLP). The graph-based drug representation and the fingerprint-based representation are subsequently concatenated to form a comprehensive drug feature vector. For cell line representation, multi-omics data are processed through omics-specific neural networks.
The resulting features are fused using multi-head self-attention mechanisms, enabling the model to capture contextual interactions across omics modalities and generate an integrated cell line representation. Finally, the drug and cell line features are combined and fed into an MLP classifier to predict drug response outcomes. The proposed model effectively integrates heterogeneous biological data sources and significantly enhances prediction accuracy through multi-modal learning and attention-based feature fusion.  Results and Discussions  The proposed method achieves competitive performance on both GDSC and CCLE benchmark datasets (Table 2). Specifically, on the GDSC dataset, our approach outperforms all competing methods across all four metrics—AUC, AUPR, F1-score, and Accuracy. Notably, it improves the AUPR by approximately 1.92% over the second-best method, MOFGCN, demonstrating its advantage in handling class imbalance. On the CCLE dataset, our method still achieves the best performance in terms of AUC and Accuracy. Although it is marginally lower than GADRP in AUPR and F1-score, the gap is minimal, and our approach exhibits more robust overall discriminative ability (as reflected by AUC). These results collectively validate the effectiveness and strong generalizability of our method in drug sensitivity prediction tasks. The observed variation in AUPR and F1-score performance between datasets can be attributed to inherent differences in sample size and class distribution characteristics. The limited scale of the CCLE dataset, combined with its specific class imbalance (approximately 4:1 ratio of resistant to sensitive samples), may constrain the model's capacity to fully learn the underlying data distribution, particularly for minority classes. 
In contrast, the GDSC dataset exhibits greater heterogeneity and a more pronounced class imbalance (approximately 8:1), which collectively contribute to increased prediction difficulty and consequently lower performance on certain metrics.  Conclusions  Accurately predicting drug response in cell lines remains a central challenge in precision medicine, with significant implications for accelerating drug development and advancing personalized treatment. However, constructing a high-accuracy predictive model capable of effectively integrating multi-source biological information is difficult due to the complexity of drug molecular structures and inherent heterogeneity of cell lines. To address this, a cell line drug response prediction model based on Graph Topology Attention Network is proposed. This model employs the graph topology attention network to extract molecular graph features of drugs, which are then fused with molecular fingerprint features. Meanwhile, multi-omics features of cell lines are integrated using an attention mechanism. Experimental results demonstrate that the proposed model achieves superior performance over existing state-of-the-art benchmarks on the employed dataset. This study provides a new perspective for predicting cell line drug response. Certain limitations are acknowledged, such as the use of only three types of omics features for cell line representation and the influence of sample size on predictive outcomes. The integration of more diverse omics features, the application of pre-trained large-scale models, and the clinical translation for personalized medicine will be the primary focus of future work.
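The attention-based multi-omics fusion step described in the Methods can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding: the embedding size, number of heads, and random projection matrices stand in for the model's learned parameters and omics-specific encoders, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_self_attention(tokens, n_heads=2, seed=0):
    """Fuse modality embeddings (one token per omics type) with
    multi-head self-attention; random matrices stand in for the
    learned Q/K/V projections."""
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    dh = d // n_heads
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    out = np.zeros_like(tokens)
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        attn = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh))  # (n, n) weights
        out[:, s] = attn @ V[:, s]
    return out

# Three omics modalities (e.g. expression, mutation, methylation), each
# already encoded into an 8-dim vector by its own sub-network (toy data).
omics = np.random.default_rng(1).standard_normal((3, 8))
fused = multihead_self_attention(omics)
cell_line_repr = fused.mean(axis=0)  # integrated cell line representation
```

Each modality token attends to the others, so the fused representation captures cross-omics context before being combined with the drug vector.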
Multi-dimensional Spatio-temporal Features Enhancement for Lip reading
MA JinLin, ZHONG YaoWei, MA RuiShi
 doi: 10.11999/JEIT251111
[Abstract](18) [FullText HTML](4) [PDF 1930KB](3)
Abstract:
  Objective  Lip reading is a challenging yet vital frontier in computer vision, dedicated to decoding spoken language solely from visual lip movements. The difficulty arises primarily from inherent ambiguities in the visual speech signal. On one hand, articulatory movements for different visemes can be extremely subtle; for instance, lip displacement differences for confusable pairs such as /p/–/b/ and /m/–/n/ can be as small as 0.3–0.7 mm. These fine-grained spatial variations often lie below the effective resolution limits of conventional 3D convolutional neural networks. On the other hand, the natural co-articulation in speech introduces temporal ambiguity, where mouth shapes transiently blend multiple phonemes, making it difficult to isolate distinct visual units. These challenges are further compounded by real-world variables such as uneven lighting and significant inter-speaker articulation differences. As a result, current lip reading models frequently exhibit limitations in capturing discriminative spatiotemporal features, leading to suboptimal performance, especially for phonemes with minimal visual distinctions. Motivated by these issues, this work aims to develop a robust lip reading framework capable of effectively capturing and leveraging fine-grained spatiotemporal dependencies to improve recognition accuracy under diverse and realistic conditions.  Methods  To address the aforementioned limitations, this study proposes a novel lip reading framework named the Multi-dimensional Spatio-Temporal Enhancement Network (MSTEN), which is systematically designed to enhance spatial and temporal representations through integrated attention mechanisms and advanced residual learning. The framework incorporates three core components that collaboratively model the interdependencies between spatial and temporal features, an aspect often underutilized in conventional architectures. 
The first component, the Self-adjusting Spatio-temporal Attention (SaSTA) module, employs a self-adjusting mechanism operating concurrently across height, width, and temporal dimensions. It generates query, key, and value tensors via 1×1×1 3D convolutions, flattens them across spatial and temporal dimensions, and computes attention weights by multiplying the query with the transposed key, followed by softmax normalization. The resulting attention map is multiplied with the value vector and then combined with the original input via learnable parameters and a residual connection to preserve contextual information, yielding globally enhanced features. The second component, the Three-dimensional Enhanced Residual Block (TE-ResBlock), augments spatiotemporal feature extraction through temporal shift, multi-scale convolution, and channel shuffle. The temporal shift operation moves a quarter of the feature channels along the time axis to fuse adjacent-frame information without additional parameters, while multi-scale convolution uses parallel branches with kernel sizes of 3×3, 3×1, 1×3, and 1×1 to capture diverse receptive fields. Outputs are concatenated and processed via channel shuffle to improve cross-group information flow, with four TE-ResBlocks stacked for progressive feature refinement. The third component, the Multi-dimensional Adaptive Fusion (MDAF) module, deeply integrates spatial, temporal, and channel dimensions through three sub-modules: a Channel Enhancement Module (CEM) that recalibrates features using max pooling, temporal convolution, and sigmoid activation; a Spatial Enhancement Module (SEM) that expands the receptive field via identity mapping, standard and dilated convolution; and an Adaptive Temporal Capture Module (ATCM) that emphasizes dynamic movements using frame difference features and temporal weight maps. MDAF modules are inserted between TE-ResBlock stacks for iterative refinement. 
Finally, features from the MSTEN front-end are fed into a Densely Connected Temporal Convolutional Network (DC-TCN) back-end, which comprises four blocks, each containing three temporal convolutional layers with dense connections, to effectively model long-range phonological dependencies.  Results and Discussions  The proposed framework is comprehensively evaluated on the widely used LRW and GRID datasets. LRW comprises over 500,000 video clips from more than 1,000 speakers; GRID consists of video clips from 34 speakers, each contributing 1,000 utterances, with a total duration of 28 hours. Our model achieves an accuracy of 91.18% on LRW, representing an absolute improvement of 2.82 percentage points over a strong ResNet18 baseline, which underscores its substantial effectiveness. Ablation studies are conducted to dissect the contribution of each key component. The results clearly demonstrate that every proposed module brings a significant performance gain. Specifically, the introduction of the SaSTA module alone leads to an accuracy improvement of 2.09%, highlighting the crucial role of global spatiotemporal attention. The TE-ResBlock contributes a 1.73% increase, confirming its efficacy in multi-scale local feature extraction and inter-frame information fusion. Moreover, the MDAF module further enhances performance by 1.74%, emphasizing the benefit of adaptive multi-dimensional feature fusion, as detailed in Table 2.  Conclusions  This study presents a significant advancement in lipreading via the introduction of the MSTEN front-end network. The work is built upon three core contributions. First, the SaSTA module introduces an innovative mechanism for global context aggregation, effectively performing multi-dimensional feature weighting across height, width, and temporal sequences. 
Second, the TE-ResBlock tackles fundamental challenges in spatio-temporal modeling through a unique combination of temporal displacement, multi-scale convolution, and enhanced channel-wise interaction. Third, the MDAF module facilitates deep and synergistic integration of information from spatial, temporal, and channel dimensions. Together, these components work in concert to achieve state-of-the-art performance, reaching an accuracy of 91.18% on the challenging LRW dataset and 97.82% on the GRID dataset. Ablation studies further validate the individual and collective efficacy of each proposed innovation. Looking forward, future work will explore the extension of this framework to audio-visual speech recognition under noisy conditions, as well as the development of domain adaptation strategies to enhance robustness in low-resolution or resource-constrained scenarios.
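The parameter-free temporal shift used in the TE-ResBlock can be sketched in a few lines of NumPy. This is a generic temporal-shift illustration, not the authors' code; the split of the shifted channels (one eighth forward and one eighth backward, a quarter in total) is an assumption about how the quarter of channels is moved.

```python
import numpy as np

def temporal_shift(x, shift_frac=8):
    """Parameter-free temporal shift on a (T, C, H, W) feature tensor:
    move 1/shift_frac of the channels one step backward in time and
    1/shift_frac one step forward (a quarter of the channels in total
    when shift_frac=8), so each frame mixes in neighbor-frame features."""
    t, c, h, w = x.shape
    fold = c // shift_frac
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # future -> current
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # past -> current
    out[:, 2 * fold:] = x[:, 2 * fold:]              # rest untouched
    return out

# Toy tensor: 2 frames, 8 channels; frame 0 holds 0..7, frame 1 holds 8..15.
x = np.arange(2 * 8 * 1 * 1, dtype=float).reshape(2, 8, 1, 1)
y = temporal_shift(x)
```

After the shift, channel 0 of frame 0 carries frame 1's value while the untouched channels keep their original content, which is exactly the parameter-free inter-frame fusion the block relies on.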
Multi-scale Frequency Adapter and Dual-path Attention for Time Series Forecasting
YANG Zhenzhen, XU Yi, WANG Chengye, YANG Yongpeng
 doi: 10.11999/JEIT251188
[Abstract](9) [FullText HTML](7) [PDF 3339KB](2)
Abstract:
  Objective  With the rapid development of big data technology, time series data has been increasingly applied in areas such as meteorology, power systems, and finance. Nonetheless, mainstream methods for time series forecasting face notable challenges in multi-scale modeling and frequency-domain feature extraction, which prevents the comprehensive capture of crucial dynamic properties and periodic patterns in complex datasets. Traditional statistical approaches, including ARIMA, rely on assumptions of linear relationships, resulting in poor performance when handling nonlinear or high-dimensional time series data. Although deep learning methods, notably those based on convolutional neural networks and Transformers, have improved forecasting accuracy through advanced feature extraction and long-range dependency modeling, limitations remain in the ability to efficiently extract and fuse multi-scale features in both the temporal and frequency domains. These deficiencies lead to instability and suboptimal accuracy, particularly in dynamic and highly variable applications. This paper aims to address these challenges by proposing an intelligent forecasting framework that effectively models multi-scale information and enhances prediction accuracy in diverse scenarios.  Methods  The proposed method introduces a multi-scale frequency adapter and dual-path attention (MFADA) framework for time series forecasting. The framework integrates two key modules: the multi-scale frequency adapter (MFA) and the multi-scale dual-path attention (MDA). The MFA module efficiently captures multi-scale frequency features using adaptive pooling and deep convolutions, which enhances the sensitivity to various frequency components and supports modeling of short-term and long-term dependencies. 
The MDA module applies a multi-scale attention mechanism to strengthen fine-grained modeling across both the temporal and feature dimensions, enabling effective extraction and fusion of comprehensive time and frequency information. The entire framework is designed with computational efficiency in mind to ensure scalability. Experimental validation on 8 public datasets demonstrates superior performance and robustness compared to existing mainstream time series forecasting approaches.  Results and Discussions  Extensive experiments were conducted on 8 publicly available multivariate datasets, including ECL, Weather, ETT (ETTm1, ETTm2, ETTh1, ETTh2), Solar-Energy, and Traffic. The evaluation metrics used were mean absolute error (MAE) and mean squared error (MSE), with additional consideration given to parameter count, FLOPs, and training time for computational efficiency. Experimental comparisons with state-of-the-art models, including Fredformer, Peri-midFormer, iTransformer, TFformer, PatchTST, MSGNet, TimesNet, and TCM, show that the proposed MFADA consistently achieves superior forecasting performance across most datasets and forecasting horizons (Table 1), with the best average MSE and MAE of 0.163 and 0.261 on ECL and a 13.2% and 17.3% decrease versus TimesNet for forecasting length 96. On the periodic ETTm1 dataset, the average MSE reaches 0.377, outperforming MSGNet by 5.3%. Ablation studies (Table 2) demonstrate the importance of both MFA and MDA modules: removing MFA or reverting MDA to standard self-attention increases error rates on ECL, Weather, ETTh1, and ETTh2, indicating the synergistic contribution of the two modules. Complexity analysis (Fig. 2) reveals that MFADA achieves an optimal balance among forecasting accuracy, parameter efficiency, and training time, outperforming Fredformer, MSGNet, and TimesNet. Visualization results for ECL and ETTh2 (Fig. 3, Fig. 
4) confirm the ability of MFADA to track ground truth trends, forecast turning points, and outperform baselines in both global and local prediction fidelity. Notably, MFADA performance lags on the Traffic dataset due to its high spatial correlation, highlighting future directions for spatial structure integration.  Conclusions  This paper proposes MFADA, a novel time series forecasting method integrating multi-scale frequency adaptation and dual-path attention mechanisms. MFADA stands out with four key strengths: (1) The MFA module effectively extracts and merges multi-scale frequency-domain features, emphasizing diverse temporal scales through pyramid pooling and channel gating; (2) The MDA module captures multi-scale dependencies along both temporal and feature dimensions, enabling fine-grained dynamic modeling; (3) The architecture maintains computational efficiency using lightweight convolution and pooling operations; (4) Superior results across 8 datasets and various forecasting lengths demonstrate robust generalization, especially for multivariate and long-term forecasting scenarios. The extensive experiments confirm that MFADA advances the state-of-the-art in accurate and efficient time series forecasting, offering promising perspectives for both academic research and practical deployment. Future work will explore spatial correlation integration to further enhance model applicability.
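The idea behind multi-scale frequency feature extraction can be sketched under loose assumptions: take the magnitude spectrum of a series and pool it at several resolutions. The scales, the plain average pooling, and the use of a raw magnitude spectrum below are illustrative stand-ins for the MFA module's adaptive pooling and convolutional components, not the paper's architecture.

```python
import numpy as np

def multiscale_frequency_features(x, scales=(4, 8, 16)):
    """Pool the magnitude spectrum of a 1-D series into several
    resolutions, yielding coarse-to-fine frequency descriptors."""
    spec = np.abs(np.fft.rfft(x))  # frequency magnitudes, length n//2 + 1
    feats = []
    for s in scales:
        # average-pool the spectrum into s bins (simple adaptive pooling)
        idx = np.array_split(np.arange(spec.size), s)
        feats.append(np.array([spec[i].mean() for i in idx]))
    return np.concatenate(feats)

# Toy series with a dominant period of 24 and a weak period of 6.
t = np.arange(96)
x = np.sin(2 * np.pi * t / 24) + 0.1 * np.sin(2 * np.pi * t / 6)
f = multiscale_frequency_features(x)  # 4 + 8 + 16 = 28 pooled bins
```

The coarse 4-bin view already localizes the dominant low-frequency energy, while the 16-bin view separates the two periodic components.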
Split-architecture Non-contact Optical Seismocardiography Triggering System for Cardiac Magnetic Resonance Imaging
GAO Qiannan, ZHANG Jiayu, ZHU Yingen, WANG Wenjin, JI Jiansong, JI Xiaoyue
 doi: 10.11999/JEIT251098
[Abstract](191) [FullText HTML](140) [PDF 3764KB](23)
Abstract:
  Objective  Cardiac-cycle synchronization is required in Cardiovascular Magnetic Resonance (CMR) to reduce motion artifacts and preserve quantitative accuracy. At high field strengths, the ElectroCardioGram (ECG) trigger is affected by magnetohydrodynamic effects and scanner-generated ElectroMagnetic Interference (EMI). Electrode placement and lead routing add setup burden. Contact-based mechanical sensors still require skin contact, and optical photoplethysmography introduces long physiological delay. A fully contactless and EMI-robust mechanical surrogate is therefore needed. This study develops a split-architecture, non-contact optical SeismoCardioGraphy (SCG) triggering system for CMR and evaluates its availability, beatwise detection performance, and timing characteristics under practical body-coil coverage.  Methods  The split-architecture system consists of a near-magnet optical acquisition unit and a far-magnet computation-and-triggering unit connected by fiber-optic links to minimize conductive pathways near the scanner (Fig. 2). The acquisition unit uses a defocused industrial camera and laser illumination to record speckle-pattern dynamics on the anterior chest without physical contact (Fig. 3). Dense optical flow is computed in a chest region of interest, and the displacement field is projected onto a principal motion direction to form a one-dimensional SCG sequence (Fig. 4). Drift suppression, smoothing, and short-window normalization are applied. Trigger timing is refined with a valley-constrained gradient search within a physiologically bounded window to reduce spurious detections and improve temporal consistency (Fig. 4). A benchmark dataset is acquired from 20 healthy volunteers under three coil configurations: no body coil, an ultra-flexible body coil, and a rigid body coil (Fig. 5, Fig. 6, Table 3). ECG serves as the reference, and CamPPG and radar are recorded for comparison. 
Beatwise precision, recall, and F1 score are computed against ECG R peaks, and availability is reported as the fraction of usable segments under unified quality criteria (Table 4). Backward and forward physiological delays and delay variability are summarized across subjects and coil conditions (Table 5, Table 6). Key windowing and refractory parameters are tested for sensitivity (Table 2). Runtime is measured to assess real-time feasibility, including the cost of dense optical flow and the overhead of one-dimensional processing and triggering (Table 7).  Results and Discussions  Under no-coil and ultra-flexible-coil conditions, the optical SCG trigger achieves high availability (about 97.6%) and strong beatwise performance. F1 reaches about 0.91 under the ultra-flexible coil (Table 4, Table 5). The backward physiological delay remains on the order of several tens of milliseconds, and delay jitter is generally within a few tens of milliseconds (Table 5, Table 6). Under the rigid body coil, performance decreases markedly. Mechanical decoupling between the coil surface and the chest wall weakens and distorts the vibration signature, which blurs AO-related features and increases false triggers (Fig. 1). This effect appears as lower precision and F1 and as a shift toward longer and more variable delays compared with the other conditions (Table 4, Table 6). Compared with CamPPG, which reflects peripheral blood-volume dynamics and typically lags further behind the ECG R peak, the optical SCG surrogate provides a more proximal mechanical marker with reduced trigger phase lag (Fig. 8, Table 5). EMI robustness is supported by representative segments: ECG waveforms show visible distortion under interference, whereas the optical SCG surrogate remains interpretable because acquisition and transmission near the scanner are fully optical and electrically isolated (Fig. 8). 
Parameter analysis supports a moderate processing window and a 0.5 s minimum interbeat interval as a stable choice across subjects (Table 2). Runtime analysis shows that dense optical flow dominates computational cost, whereas one-dimensional processing and triggering add little overhead. Throughput exceeds the acquisition frame rate, supporting real-time triggering (Table 7).  Conclusions  A split-architecture, non-contact optical SCG triggering system is developed and validated under three representative body-coil configurations. Fiber-optic separation between near-magnet acquisition and far-magnet processing improves EMI robustness while maintaining real-time trigger output. High availability, strong beatwise performance, and short physiological delay are demonstrated under no-coil and ultra-flexible-coil conditions (Table 4, Table 5). Rigid-coil coverage exposes a clear limitation caused by reduced mechanical coupling, which motivates further optimization for mechanically decoupled or heavily occluded scenarios (Fig. 1, Table 6).
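The optical-flow-to-SCG step in the Methods (projection of the displacement field onto a principal motion direction, followed by normalization) can be sketched as follows. The SVD-based principal direction and the simple zero-mean/unit-variance normalization are assumptions standing in for the paper's exact drift suppression and short-window normalization.

```python
import numpy as np

def flow_to_scg(flow):
    """Project a dense optical-flow sequence onto its principal motion
    direction to obtain a 1-D SCG-like signal.
    flow: (T, H, W, 2) per-frame displacement field in the chest ROI."""
    t = flow.shape[0]
    mean_flow = flow.reshape(t, -1, 2).mean(axis=1)  # (T, 2) mean displacement
    centered = mean_flow - mean_flow.mean(axis=0)
    # first right-singular vector = principal motion direction (PCA via SVD)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    signal = centered @ vt[0]
    # simple global normalization (stand-in for short-window normalization)
    return (signal - signal.mean()) / (signal.std() + 1e-8)

# Synthetic flow: motion mostly along y with a cardiac-like oscillation.
T = 200
t = np.arange(T)
flow = np.zeros((T, 4, 4, 2))
flow[..., 1] = np.sin(2 * np.pi * t / 50)[:, None, None]
scg = flow_to_scg(flow)
```

On this toy field, the recovered principal direction aligns with the y-axis and the 1-D signal reproduces the injected oscillation up to sign.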
Identification of Novel Protein Drug Targets for Respiratory Diseases by Integrating Human Plasma Proteome with Genome
MA Xinqian, NI Wentao
 doi: 10.11999/JEIT250796
[Abstract](185) [FullText HTML](81) [PDF 2981KB](6)
Abstract:
  Objective  Respiratory diseases are a major cause of global morbidity and mortality and place a heavy socioeconomic burden on healthcare systems. Epidemiological data indicate that Chronic Obstructive Pulmonary Disease (COPD), pneumonia, asthma, lung cancer, and tuberculosis are the five most significant pulmonary diseases worldwide. The COronaVIrus Disease 2019 (COVID-19) pandemic has introduced additional challenges for respiratory health and emphasizes the need for new diagnostic and therapeutic strategies. Integrating proteomics with Genome-Wide Association Studies (GWAS) provides a framework for connecting genetic variation to clinical phenotypes. Genetic variants associated with plasma protein levels, known as protein Quantitative Trait Loci (pQTLs), link the genome to complex respiratory phenotypes. This study evaluates the causal effects of druggable proteins on major respiratory diseases through proteome-wide Mendelian Randomization (MR) and colocalization analyses. The aim is to identify causal associations that can guide biomarker development and drug discovery, and to prioritize candidates for therapeutic repurposing.  Methods  Summary-level data for circulating protein levels are obtained from two large pQTL studies: the deCODE study and the UK Biobank Pharma Proteomics Project (UKB-PPP). Strictly defined cis-pQTLs are selected to ensure robust genetic instruments, yielding 2,918 proteins for downstream analyses. For disease outcomes, large GWAS summary statistics for 27 respiratory phenotypes are collected from previously published studies and international consortia. A two-sample MR design is applied to estimate the effects of plasma proteins on these phenotypes. To reduce confounding driven by Linkage Disequilibrium (LD), Bayesian colocalization analysis is used to assess whether genetic signals for protein levels and respiratory outcomes share a causal variant. 
The Posterior Probability of hypothesis 4 (PP4) serves as the primary metric, and PP4 > 0.8 is considered strong evidence of shared causality. Summary-data-based Mendelian Randomization (SMR) and the HEterogeneity In Dependent Instruments (HEIDI) test are used to validate the causal associations. Bidirectional MR and the Steiger test are applied to evaluate potential reverse causality. Protein-Protein Interaction (PPI) networks are generated through the STRING database to visualize functional connectivity and biological pathways associated with the causal proteins.  Results and Discussions  The causal effects of 2,918 plasma proteins on 27 respiratory phenotypes are evaluated (Fig. 1). A total of 694 protein–trait associations meet the Bonferroni-corrected threshold (P < 1.7×10⁻⁵) when cis-instrumental variables are used (Fig. 2). The MR-Egger intercept test identifies 94 protein–disease associations with evidence of directional pleiotropy, which are excluded. Colocalization analysis indicates that 29 protein–phenotype associations show high-confidence evidence of a shared causal variant (PP4 > 0.8), and 39 show medium-level evidence (0.5 < PP4 < 0.8). SMR validation confirms 26 associations (P < 1.72×10⁻³), and 21 pass the HEIDI test (P > 0.05). The findings provide insights into several respiratory diseases. For COPD, five proteins (NRX3A, NRX3B, ERK-1, COMMD1, and PRSS27) are identified as causal. The association between NRXN3 and COPD suggests a genetic connection between nicotine-addiction pathways and chronic lung decline. For asthma, TEF, CASP8, and IL7R show causal evidence, and the robust association between IL7R and asthma suggests that modulation of T-cell homeostasis may provide a therapeutic opportunity. The FUT3_FUT5 complex is uniquely associated with Idiopathic Pulmonary Fibrosis (IPF). CSF3 and LTBP2 are significantly associated with severe COVID-19. 
For lung cancer, subtype-specific causal proteins are identified, including BTN2A1 for squamous cell lung cancer, BTN1A1 for small cell lung carcinoma, and EHBP1 for lung adenocarcinoma. These findings provide a basis for the development of subtype-specific precision therapies.  Conclusions  This study identifies 29 plasma proteins with high-confidence causal associations across major respiratory diseases. Using MR and colocalization, a comprehensive map of molecular drivers of respiratory conditions is generated. These findings may support precision medicine strategies. However, the findings are limited by the focus on European populations and potential heterogeneity arising from different proteomic platforms. The associations are based on computational analysis, and further validation in independent cohorts and animal models is needed. Additional experimental studies and clinical trials are required to clarify the pathogenic roles and biological mechanisms of the identified proteins to support therapeutic innovation in respiratory medicine.
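The two-sample MR machinery used throughout this study is standard and can be sketched numerically: each cis-pQTL yields a Wald-ratio estimate of the protein's effect on disease, and per-instrument estimates are combined by inverse-variance weighting. The betas and standard errors below are made-up illustrative numbers, not values from the study.

```python
import numpy as np

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-instrument MR estimate: effect of protein level on disease."""
    est = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)  # first-order (delta-method) SE
    return est, se

def ivw(estimates, std_errors):
    """Inverse-variance-weighted meta-analysis across instruments."""
    w = 1.0 / np.asarray(std_errors) ** 2
    est = np.sum(w * np.asarray(estimates)) / w.sum()
    se = np.sqrt(1.0 / w.sum())
    return est, se

# Illustrative numbers: three hypothetical cis-pQTLs for one protein.
bx = np.array([0.30, 0.25, 0.40])      # SNP -> protein level
by = np.array([0.060, 0.045, 0.088])   # SNP -> disease risk (log-odds)
sey = np.array([0.010, 0.012, 0.015])  # SE of the outcome betas

ratios, ses = zip(*(wald_ratio(x, y, s) for x, y, s in zip(bx, by, sey)))
est, se = ivw(ratios, ses)
```

Pooling shrinks the standard error below that of any single instrument, which is why multiple strictly defined cis-pQTLs per protein strengthen the causal inference.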
Spatio-Temporal Constrained Refined Nearest Neighbor Fingerprinting Localization
WANG Yifan, SUN Shunyuan, QIN Ningning
 doi: 10.11999/JEIT250777
[Abstract](151) [FullText HTML](96) [PDF 3219KB](21)
Abstract:
  Objective  Indoor fingerprint-based localization faces three key challenges. First, Dimensionality Reduction (DR), used to reduce storage and computational costs, often disrupts the geometric correlation between signal features and physical space, which reduces mapping accuracy. Second, signal features present temporal variability caused by human movement or environmental changes. During online mapping, this variability introduces bias and distorts similarity between target and reference points in the low-dimensional space. Third, pseudo-neighbor interference persists because environmental noise or imperfect similarity metrics lead to inaccurate neighbor selection and skew position estimates. To address these issues, this study proposes a Spatio-Temporal Constrained Refined Nearest Neighbor (STC-RNL) fingerprinting localization algorithm designed to provide robust, high-accuracy localization under complex interference conditions.  Methods  In the offline phase, a robust DR framework is constructed by integrating two constraints into a MultiDimensional Scaling (MDS) model. A spatial correlation constraint uses physical distances between reference points and assigns stronger associations to proximate locations to preserve alignment between low-dimensional features and the real layout. A temporal consistency constraint clusters multiple temporal signal samples from the same location into a compact region to suppress feature drift. These constraints, combined with the MDS structure-preserving loss, form the optimization objective, from which low-dimensional features and an explicit mapping matrix are obtained. In the online phase, a progressive refinement mechanism is applied. An initial candidate set is selected using a Euclidean distance threshold. 
A hybrid similarity metric is then constructed by enhancing shared-neighbor similarity with a Sigmoid-based strategy, which truncates low and smooths high similarities, and fusing it with Euclidean distance to improve discrimination of true neighbors. Subsequently, an iterative Z-score-based filtering procedure removes reference points that deviate from local group characteristics in feature and coordinate domains. The final position is estimated through a similarity-weighted average over the refined neighbor set, assigning higher weights to more reliable references.  Results and Discussions  The performance of STC-RNL is assessed on a private ITEC dataset and a public SYL dataset. The spatio-temporal constraints enhance the robustness of the mapping matrix under noisy conditions (Table 2). Compared with baseline DR methods, the proposed module reduces mean localization error by at least 6.30% in high-noise scenarios (Fig. 9). In the localization stage, the refined neighbor selection reduces pseudo-neighbor interference. On the ITEC dataset, STC-RNL achieves an average error of 0.959 m, improving performance by 9.61% to 33.68% compared with SSA-XGBoost and SPSO (Table 1). End-to-end comparisons show that STC-RNL reduces the average error by at least 12.42% on ITEC and by at least 7.08% on SYL (Table 2), and its CDF curves demonstrate faster convergence and higher precision, especially within the 1.2 m range (Fig. 10). These results indicate that the algorithm maintains high stability and accuracy with a lower maximum error across datasets.  Conclusions  The STC-RNL algorithm addresses structural distortion and mapping bias found in traditional DR-based localization. By jointly optimizing offline feature embedding with spatio-temporal constraints and online neighbor selection with progressive refinement, the coupling between signal features and physical coordinates is strengthened. 
The main innovation lies in a synergistic framework that ensures only high-confidence neighbors contribute to the final estimate, improving accuracy and robustness in dynamic environments. Experiments show that the model reduces average localization error by 12.42% to 32.80% on ITEC and by 7.08% to 13.67% on SYL relative to baseline algorithms, while achieving faster error convergence. Future research may incorporate nonlinear manifold modeling to further improve performance in heterogeneous access point environments.
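The online stage (candidate selection, outlier filtering, similarity-weighted averaging) can be sketched as follows. The plain Euclidean candidate selection, single-pass Z-score filter, and inverse-distance weights below are simplifications of STC-RNL's hybrid similarity metric and iterative refinement; all thresholds and the toy fingerprint grid are illustrative.

```python
import numpy as np

def weighted_knn_locate(target, ref_feats, ref_coords, k=5, z_thresh=2.0):
    """Locate a target: pick the k nearest references in feature space,
    drop Z-score outliers in the coordinate domain, then return a
    similarity-weighted average of the surviving reference positions."""
    d = np.linalg.norm(ref_feats - target, axis=1)
    idx = np.argsort(d)[:k]                       # initial candidate set
    coords = ref_coords[idx]
    # Z-score filtering: remove candidates far from the group centroid
    dev = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
    z = (dev - dev.mean()) / (dev.std() + 1e-8)
    keep = z < z_thresh
    idx, coords = idx[keep], coords[keep]
    w = 1.0 / (d[idx] + 1e-8)                     # similarity weights
    return (w[:, None] * coords).sum(axis=0) / w.sum()

# Toy setup: 5x5 grid of reference points with noisy "fingerprints".
rng = np.random.default_rng(0)
ref_coords = np.array([[x, y] for x in range(5) for y in range(5)], float)
ref_feats = ref_coords + 0.05 * rng.standard_normal((25, 2))
est = weighted_knn_locate(ref_feats[12] + 0.02, ref_feats, ref_coords)
```

A target near reference point (2, 2) is pulled strongly toward it because the inverse-distance weight of the closest reference dominates the average.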
Construction of Maximum Distance Separable Codes and Near Maximum Distance Separable Codes Based on Cyclic Subgroup of $ \mathbb{F}_{{q}^{2}}^{*} $
DU Xiaoni, XUE Jing, QIAO Xingbin, ZHAO Ziwei
 doi: 10.11999/JEIT251204
[Abstract](159) [FullText HTML](86) [PDF 929KB](24)
Abstract:
  Objective  The demand for higher performance and efficiency in error-correcting codes has increased with the rapid development of modern communication technologies. These codes detect and correct transmission errors. Because of their algebraic structure, straightforward encoding and decoding, and ease of implementation, linear codes are widely used in communication systems. Their parameters follow classical bounds such as the Singleton bound: for a linear code with length $ n $ and dimension $ k $, the minimum distance $ d $ satisfies $ d\leq n-k+1 $. When $ d=n-k+1 $, the code is a Maximum Distance Separable (MDS) code. MDS codes are applied in distributed storage systems and random error channels. If $ d=n-k $, the code is Almost MDS (AMDS); when both a code and its dual are AMDS, the code is Near MDS (NMDS). NMDS codes have geometric properties that are useful in cryptography and combinatorics. Extensive research has focused on constructing structurally simple, high-performance MDS and NMDS codes. This paper constructs several families of MDS and NMDS codes of length $ q+3 $ over the finite field $ {\mathbb{F}}_{{{q}^{2}}} $ of even characteristic using the cyclic subgroup $ {U}_{q+1} $. Several families of optimal Locally Repairable Codes (LRCs) are also obtained. LRCs support efficient failure recovery by accessing a small set of local nodes, which reduces repair overhead and improves system availability in distributed and cloud-storage settings.  Methods  In 2021, Wang et al. constructed NMDS codes of dimension 3 using elliptic curves over $ {\mathbb{F}}_{q} $. In 2023, Heng et al. 
obtained several classes of dimension-4 NMDS codes by appending appropriate column vectors to a base generator matrix. In 2024, Ding et al. presented four classes of dimension-4 NMDS codes, determined the locality of their dual codes, and constructed four classes of distance-optimal and dimension-optimal LRCs. Building on these works, this paper uses the unit circle $ {U}_{q+1} $ in $ {\mathbb{F}}_{{{q}^{2}}} $ and elliptic curves to construct generator matrices. By augmenting these matrices with two additional column vectors, several classes of MDS and NMDS codes of length $ q+3 $ are obtained. The locality of the constructed NMDS codes is also determined, yielding several classes of optimal LRCs.  Results and Discussions  In 2023, Heng et al. constructed generator matrices with second-row entries in $ \mathbb{F}_{q}^{*} $ and with the remaining entries given by nonconsecutive powers of the second-row elements. In 2025, Yin et al. extended this approach by constructing generator matrices using elements of $ {U}_{q+1} $ and obtained infinite families of MDS and NMDS codes. Following this direction, the present study expands these matrices by appending two column vectors whose elements lie in $ {\mathbb{F}}_{{{q}^{2}}} $. The resulting matrices generate several classes of MDS and NMDS codes of length $ q+3 $. Several classes of NMDS codes with identical parameters but different weight distributions are also obtained. Computing the minimum locality of the constructed NMDS codes shows that some are optimal LRCs satisfying the Singleton-like, Cadambe–Mazumdar, Plotkin-like, and Griesmer-like bounds. All constructed MDS codes are Griesmer codes, and the NMDS codes are near-Griesmer codes. 
These results show that the proposed constructions are more general and unified than earlier approaches.  Conclusions  This paper constructs several families of MDS and NMDS codes of length $q+3$ over $\mathbb{F}_{q^2}$ using elements of the unit circle $U_{q+1}$ and oval polynomials, and by appending two additional column vectors with entries in $\mathbb{F}_q$. The minimum locality of the constructed NMDS codes is analyzed, and some of these codes are shown to be optimal LRCs. The framework generalizes earlier constructions, and the resulting codes are optimal or near-optimal with respect to the Griesmer bound.
FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization
WANG Yanlin, GAO Lijiang, YANG Haigang
 doi: 10.11999/JEIT260108
[Abstract](37) [FullText HTML](20) [PDF 2094KB](10)
Abstract:
6-input Look-Up Tables (LUTs) are frequently used in commercial Field-Programmable Gate Arrays (FPGAs) to build programmable logic blocks, yet experiments show that, on average, less than 30% of their capacity is used in mapped circuits, wasting a significant share of programmable resources. In this paper, 6-input LUTs are fractured according to fracturable factors and recombined at different granularities to construct several new Hybrid Basic Logic Elements (HBLEs). Based on the HBLE, several novel Hybrid Programmable Logic Block (HPLB) architectures are proposed, and the Programmable Logic Blocks (PLBs) of Xilinx are replaced with these HPLB architectures. A statistical evaluation algorithm for the mapped netlist is also proposed, and the HPLB architectures are experimentally verified and evaluated. Experimental evaluation of the three enhanced architectures shows that the HPLBs achieve an average area reduction of more than 30% compared with Xilinx's PLBs without adding input ports. The hybrid HPLB architecture constructed with a fracturable factor N=3 produces the best optimization results when both HPLB utilization and area optimization are taken into account. On the MCNC and VTR benchmarks, resource utilization improves by an average of 8.27% and 27.64%, respectively, thereby improving FPGA logic efficiency.  Objective  Modern commercial FPGA architectures employ 6-LUTs as the fundamental building blocks of Basic Logic Elements (BLEs). Experimental results show that only about 30% of the Logic Elements (LEs) in a circuit are ultimately mapped to full 6-LUTs, and when a 6-LUT implements a function with fewer than 6 inputs, more than half of its logic resources are wasted. This inevitably leads to substantial waste of programmable resources. 
According to experimental data, a circuit design mapped to 100 4-LUTs can be mapped to 78 6-LUTs under 6-LUT mapping, with the {6,5,4,3,2}-LUT function distribution being {23,32,17,9,13}. The findings indicate that only around 25% of the 6-LUTs are ultimately mapped to 6-input functions, with the remaining 6-LUTs underutilized. This further illustrates how inefficient technology mapping is for LUTs with a large input size K.  Methods  The fracturable factor N, defined as the number of sub-LUTs that can be obtained from a single LUT, characterizes the fracturable and reconfigurable nature of LUT architectures in FPGAs. Motivated by this, a 6-LUT is decomposed at several granularities according to the fracturable factor to address the problem of low resource utilization described above. Three novel hybrid-granularity fracturable logic element (HBLE) structures are created by connecting and reconfiguring the resulting sub-LUTs with additional input ports and multiplexer modules. How FPGA performance is optimized by these three HBLE topologies is then investigated. The HBLE2 structure consists of one undivided 6-LUT and one fracturable 6-LUT that splits into two 5-LUTs (fracturable factor N=2). The HBLE3 structure consists of one undivided 6-LUT and one fracturable 6-LUT that splits into one 5-LUT and two 4-LUTs (N=3). The HBLE4 structure consists of one undivided 6-LUT and one fracturable 6-LUT that splits into four 4-LUTs (N=4). All three HBLE structures support adder units, allow both latched and direct combinational logic output, and also permit direct latched output that bypasses combinational logic. A Hybrid Programmable Logic Block (HPLB) is a novel structure created by combining several HBLEs. 
The MCNC and VTR circuit sets, the two best-known academic circuit benchmarks (BMs), are chosen for experimental assessment. Each circuit set is mapped to a Xilinx Virtex-7 FPGA, and the types and numbers of LUTs used are then counted from the mapped netlist. After the data are organized with the corresponding greedy algorithms, the minimum number of CLBs needed is obtained. Since each Xilinx CLB contains eight 6-LUTs, the greedy approach computes (total LUT count)/8 to determine the smallest number of CLBs needed after BM mapping. To guarantee comparable conditions, after Xilinx's CLB structure is replaced with the HPLB structure proposed in this work, each structure is likewise sorted with the greedy algorithm, yielding the minimum number of HPLBs needed. During actual packing, routing constraints prevent every LUT in the mapped CLBs from being used; the optimized result obtained after greedy-algorithm restructuring therefore represents the smallest value achievable in a theoretical optimization scenario.  Results and Discussions  When CLB structures are replaced with HPLBs to map the MCNC circuit set, the average number of HPLBs needed for both the HPLB2 and HPLB3 structures drops by about 8%, whereas the HPLB4 structure increases the required number by more than 30% on average. When HPLBs replace CLBs for mapping the VTR circuit set, the required count is smaller: on average, the HPLB2 and HPLB4 counts drop by less than 10%, whereas the HPLB3 count drops by around 30%. This enables SRAM scheduling and full use of the input pins. In contrast, because of resource waste, the uniform CLB structure requires more CLBs when implementing functions with a small LUT input size K. According to post-mapping HPLB counts, the HPLB4 structure performs worse than the HPLB3 structure. 
Analysis of post-mapping area optimization shows that both the MCNC and VTR circuit sets achieve average area reduction ratios above 30%. All three HPLB structures attain area optimization ratios of about 31% on the MCNC test set. The VTR test circuit set shows different optimization effects: HPLB2 yields an average area reduction of 30.63%, whereas HPLB4 yields an average reduction of 51.21%. The HPLB3 structure yields a 45.22% area reduction, with an optimization effect only marginally below that of HPLB4. A closer examination of the area optimization results shows that a higher fracturable factor N brings more noticeable benefits for integrating small-scale LUTs in circuits, producing higher area reduction ratios in the enhanced architectures.  Conclusions  To address low resource utilization in 6-LUTs, this work proposes three HPLB enhancement architectures based on fracturing granularity. These HPLBs replace Xilinx's CLB structure, and an assessment procedure with matching algorithms is established to examine the new structures' advantages in resource utilization. Based on the proportions of different LUT sizes in the post-mapping netlist, evaluation experiments on the MCNC and VTR circuit test suites show that, although HPLB4 achieves significant area optimization, it requires additional HPLBs, which increases interconnect area. Both the HPLB2 and HPLB3 structures obtain average area optimizations above 30%, and HPLB3 achieves a noticeably better HPLB count and area optimization than HPLB2 as the test circuit scale grows. Considering HPLB usage count and area optimization together, the HPLB3 structure therefore provides the most balanced optimization after replacing the CLB structure, greatly improving the utilization of programmable resources.
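The greedy lower bound described above (total mapped LUTs divided by the eight 6-LUTs per Xilinx CLB, rounded up) can be sketched as follows; the LUT distribution used in the example is hypothetical, not the paper's measured data.

```python
import math

def min_clbs(lut_counts: dict[int, int], luts_per_clb: int = 8) -> int:
    """Theoretical lower bound on CLBs after mapping.

    lut_counts maps LUT input size K -> number of mapped K-LUTs.
    Each Xilinx CLB holds eight 6-LUTs, so the greedy bound is
    ceil(total LUT count / 8); real packing needs more because of
    routing constraints.
    """
    total = sum(lut_counts.values())
    return math.ceil(total / luts_per_clb)

# Hypothetical {6,5,4,3,2}-LUT distribution for a mapped netlist:
distribution = {6: 25, 5: 30, 4: 15, 3: 6, 2: 4}   # 80 LUTs total
print(min_clbs(distribution))  # -> 10
```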
Efficient and Verifiable Ciphertext Retrieval Scheme Based on Trusted Execution Environment
WU Axin, FENG Dengguo, ZHANG Min, CHI Jialin, YI Yuling
 doi: 10.11999/JEIT251358
[Abstract](29) [FullText HTML](8) [PDF 1319KB](3)
Abstract:
The ciphertext retrieval mechanism enables retrieval functionality over encrypted data, and Symmetric Searchable Encryption (SSE) is a critical branch of ciphertext retrieval. However, to save computing power, cloud servers may return incorrect or incomplete results. Moreover, attackers can exploit information leaked through search and access patterns to reconstruct keyword details. It is therefore necessary and meaningful to protect the privacy of search and access patterns while achieving result verifiability. Nevertheless, existing verifiable SSE schemes that support search- and access-pattern privacy typically rely on keyword-traversal mechanisms, and their verification mechanisms are inefficient, imposing high computational and communication overheads on users. To address these performance bottlenecks, this paper introduces an efficient and verifiable ciphertext retrieval scheme based on a Trusted Execution Environment (TEE). To improve retrieval efficiency, the scheme combines hardware-level security isolation with oblivious data rearrangement so that the keyword trapdoor size is independent of the size of the keyword dictionary. Meanwhile, the correctness of the returned results is verified by embedding random numbers and blinding polynomial constant terms. These designs yield significant efficiency improvements. First, the scheme ensures that the size of keyword trapdoors depends solely on the number of query keywords rather than the global dictionary size, effectively minimizing communication and computational costs. Second, the scheme requires storing only two random numbers to enable verifiability, substantially reducing the local storage overhead for users. 
Third, techniques such as single-server, single-round result retrieval for data users and symmetric homomorphic encryption further enhance operational efficiency. Additionally, performing confidential computing within the TEE weakens the security assumptions on, and the trust level required of, the TEE. After the security of the proposed scheme is formally proved with simulation-based methods, a comprehensive performance evaluation is conducted. The evaluation results confirm that the scheme is significantly more efficient than other schemes with the same functionalities.
Physical Layer Security Game for Large Language Model-Based Inference in the Maritime Network
CHEN Haoyu, XIAO Liang, XU Xiaoyu, LI Jieling, WANG Zicheng, LIU Huanhuan, CHEN Hongyi
 doi: 10.11999/JEIT251269
[Abstract](28) [FullText HTML](8) [PDF 1448KB](4)
Abstract:
  Objective  The physical-layer security game reveals the interaction between User Equipment (UE) and attackers, and provides performance bounds for anti-jamming transmission and physical-layer authentication schemes based on the equilibria. However, existing game models overlook smart attackers that send jamming or spoofing signals, fail to account for maritime wireless channels affected by evaporation ducts and sea-wave fluctuations, and cannot readily evaluate the performance of Large Language Model (LLM)-based inference, such as vessel traffic monitoring.  Methods  The anti-jamming maritime communication game for LLM inference is formulated, in which the jammer first selects the jamming power and channel to reduce the signal-to-interference-plus-noise ratio at the server at low jamming cost, and the UEs then choose the transmit power, channel, LLM sparsity ratio, and control center to send sensing data (e.g., images, temperature, and humidity) to enhance the inference accuracy with less latency. The physical-layer authentication game for maritime wireless networks with LLM inference is further formulated: the spoofing attacker first selects the number of spoofing packets to degrade authentication accuracy at low cost, and the control center then selects the fast authentication mode based on channel state or the safe authentication mode based on the received signal strength and the packet arrival interval from multiple ambient transmitters, together with the test threshold, to increase accuracy with less cost.  Results and Discussions  Based on the Stackelberg Equilibrium (SE) under an LLM with 7 billion parameters, the performance bounds of the Reinforcement Learning (RL)-based anti-jamming inference scheme are provided to reveal the impact of the evaporation-duct height, wave height, maximum LLM sparsity ratio, and quantization level on inference accuracy and latency. 
In addition, the performance bounds of the RL-based maritime spoofing detection scheme are provided based on the SE of the physical-layer authentication game to show the impact of the maximum number of spoofing packets on the authentication accuracy. Simulations are carried out with five UEs with an antenna height of 3 m offloading images, temperature, and humidity, using transmit power up to 200 mW at 5.8 GHz with a bandwidth of 20 MHz, to five control centers with antenna heights of 6 m. The jammer applies a Deep Q-Network to choose the jamming power, with a maximum transmit power of 200 mW on each 5.8 GHz channel, and the spoofing attacker applies a Deep Q-Network to select the number of spoofing packets, up to 100. The results show that the inference accuracy and latency of the RL-based anti-jamming maritime communication scheme for LLM inference converge to the performance bounds with gaps of less than 0.6% after 2500 time slots. In addition, the RL-based authentication scheme converges after 1000 time slots with a gap of less than 1.6%.  Conclusions  This paper formulates the maritime physical-layer security game for LLM inference, addressing scenarios such as anti-jamming sensing-data transmission and spoofing detection, to investigate how UEs determine transmit power and channel, and how the control center selects authentication modes and test thresholds to enhance the physical-layer security mechanisms. The attacker chooses attack modes and parameters to degrade the inference accuracy, increase latency, and even cause denial of service. Based on the SE and its existence conditions, the performance bound on inference accuracy increases with the maximum transmit power and decreases linearly with the sparsity ratio. Furthermore, the impact of the maximum number of spoofing packets on the inference accuracy is provided. 
Simulation results show that the RL-based maritime physical-layer security schemes converge to the performance bounds, thereby validating the accuracy and effectiveness of the game model.
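To make the leader-follower structure of such games concrete, the toy sketch below computes a Stackelberg equilibrium by enumeration over discretized power levels: the leader (jammer) commits first, anticipating the follower's (UE's) best response. The utility functions, channel gains, and cost weights are illustrative assumptions, not the paper's model.

```python
import math

# Hypothetical discretized power levels (mW); gains and costs are illustrative.
UE_POWERS  = [50, 100, 150, 200]
JAM_POWERS = [0, 50, 100, 150, 200]
H_UE, H_JAM, NOISE = 1.0, 0.8, 1.0   # channel gains and noise power
C_UE, C_JAM = 0.002, 0.01            # per-mW power costs

def sinr(p_ue: float, p_jam: float) -> float:
    """Signal-to-interference-plus-noise ratio at the server."""
    return H_UE * p_ue / (NOISE + H_JAM * p_jam)

def ue_utility(p_ue: float, p_jam: float) -> float:
    # reward grows with SINR (a proxy for inference accuracy) minus energy cost
    return math.log1p(sinr(p_ue, p_jam)) - C_UE * p_ue

def jammer_utility(p_ue: float, p_jam: float) -> float:
    # jammer wants low SINR at the server with little jamming cost
    return -math.log1p(sinr(p_ue, p_jam)) - C_JAM * p_jam

def best_response(p_jam: float) -> float:
    """Follower (UE) best response to an observed jamming power."""
    return max(UE_POWERS, key=lambda p: ue_utility(p, p_jam))

def stackelberg_equilibrium() -> tuple[float, float]:
    """Leader maximizes its utility anticipating the follower's best response."""
    p_jam = max(JAM_POWERS, key=lambda j: jammer_utility(best_response(j), j))
    return best_response(p_jam), p_jam

print(stackelberg_equilibrium())  # -> (200, 100) for these toy parameters
```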
Special Topic on Converged Cloud and Network Environment
Recent Advances of Programmable Schedulers
ZHAO Yazhu, GUO Zehua, DOU Songshi, FU Xiaoyang
2026, 48(2): 459-470.   doi: 10.11999/JEIT250657
[Abstract](305) [FullText HTML](226) [PDF 1259KB](50)
Abstract:
  Objective  In recent years, diversified user demands, dynamic application scenarios, and massive data transmissions have imposed increasingly stringent requirements on modern networks. Network schedulers play a critical role in ensuring efficient and reliable data delivery, enhancing overall performance and stability, and directly shaping user-perceived service quality. Traditional scheduling algorithms, however, rely largely on fixed hardware, with scheduling logic hardwired during chip design. These designs are inflexible, provide only coarse and static scheduling granularity, and offer limited capability to represent complex policies. Therefore, they hinder rapid deployment, increase upgrade costs, and fail to meet the evolving requirements of heterogeneous and large-scale network environments. Programmable schedulers, in contrast, leverage flexible hardware architectures to support diverse strategies without hardware replacement. Scheduling granularity can be dynamically adjusted at the flow, queue, or packet level to meet varied application requirements with precision. Furthermore, they enable the deployment of customized logic through data plane programming languages, allowing rapid iteration and online updates. These capabilities significantly reduce maintenance costs while improving adaptability. The combination of high flexibility, cost-effectiveness, and engineering practicality positions programmable schedulers as a superior alternative to traditional designs. Therefore, the design and optimization of high-performance programmable schedulers have become a central focus of current research, particularly for data center networks and industrial Internet applications, where efficient, flexible, and controllable traffic scheduling is essential.  Methods  The primary objective of current research is to design universal, high-performance programmable schedulers. 
Achieving simultaneous improvements across multiple performance metrics, however, remains a major challenge. Hardware-based schedulers deliver high performance and stability but incur substantial costs and typically support only a limited range of scheduling algorithms, restricting their applicability in large-scale and heterogeneous network environments. In contrast, software-based schedulers provide flexibility in expressing diverse algorithms but suffer from inherent performance constraints. To integrate the high performance of hardware with the flexibility of software, recent designs of programmable schedulers commonly adopt First-In First-Out (FIFO) or Push-In First-Out (PIFO) queue architectures. These approaches emphasize two key performance metrics: scheduling accuracy and programmability. Scheduling accuracy is critical, as modern applications such as real-time communications, online gaming, telemedicine, and autonomous driving demand strict guarantees on packet timing and ordering. Even minor errors may result in increased latency, reduced throughput, or connection interruptions, compromising user experience and service reliability. Programmability, by contrast, enables network devices to adapt to diverse scenarios, supporting rapid deployment of new algorithms and flexible responses to application-specific requirements. Improvements in both accuracy and programmability are therefore essential for developing efficient, reliable, and adaptable network systems, forming the basis for future high-performance deployments.  Results and Discussions  The overall packet scheduling process is illustrated in (Fig. 1), where scheduling is composed of scheduling algorithms and schedulers. At the ingress or egress pipelines of end hosts or network devices, scheduling algorithms assign a Rank value to each packet, determining the transmission order based on relative differences in Rank. 
Upon arrival at the traffic manager, the scheduler sorts and forwards packets according to their Rank values. Through the joint operation of algorithms and schedulers, packet scheduling is executed while meeting quality-of-service requirements. A comparative analysis of the fundamental principles of FIFO and PIFO scheduling mechanisms (Fig. 2) highlights their differences in queue ordering and disorder control. At present, most studies on programmable schedulers build upon these two foundational architectures (Fig. 3), with extensions and optimizations primarily aimed at improving scheduling accuracy and programmability. Specific strategies include admission control, refinement of scheduling algorithms, egress control, and advancements in data structures and queue mechanisms. On this basis, the current research progress on programmable schedulers is reviewed and systematically analyzed. Existing studies are compared along three key dimensions: structural characteristics, expressive capability, and approximation accuracy (Table 1).  Conclusions  Programmable schedulers, as a key technology for next-generation networks, enable flexible traffic management and open new possibilities for efficient packet scheduling. This review has summarized recent progress in the design of programmable schedulers across diverse application scenarios. The background and significance of programmable schedulers within the broader packet scheduling process were first clarified. An analysis of domestic and international literature shows that most current studies focus on FIFO-based and PIFO-based architectures to improve scheduling accuracy and programmability. The design approaches of these two architectures were examined, the main technical methods for enhancing performance were summarized, and their structural characteristics, expressive capabilities, and approximation accuracy were compared, highlighting respective advantages and limitations. 
Potential improvements in existing research were also identified, and future development directions were discussed. Nevertheless, the design of a universal, high-performance programmable scheduler remains a critical challenge. Achieving optimal performance across multiple metrics while ensuring high-quality network services will require continued joint efforts from both academia and industry.
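As a minimal software model of the Rank-based behavior surveyed above (not a hardware scheduler design): a FIFO queue ignores Rank on dequeue, whereas an idealized PIFO pushes each packet into the position given by its Rank, so dequeue always releases the smallest Rank. A binary heap reproduces the idealized PIFO dequeue order.

```python
import heapq
from collections import deque

class FIFOQueue:
    """FIFO: packets leave strictly in arrival order, ignoring Rank."""
    def __init__(self):
        self._q = deque()
    def push(self, rank, pkt):
        self._q.append((rank, pkt))
    def pop(self):
        return self._q.popleft()[1]

class PIFOQueue:
    """Idealized PIFO (Push-In First-Out): each packet is pushed into the
    position given by its Rank, so dequeue always returns the smallest Rank."""
    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker keeps FIFO order among equal Ranks
    def push(self, rank, pkt):
        heapq.heappush(self._heap, (rank, self._seq, pkt))
        self._seq += 1
    def pop(self):
        return heapq.heappop(self._heap)[2]

# Same arrivals, different departure orders:
arrivals = [(3, "A"), (1, "B"), (2, "C")]   # (Rank, packet)
fifo, pifo = FIFOQueue(), PIFOQueue()
for r, p in arrivals:
    fifo.push(r, p)
    pifo.push(r, p)
print([fifo.pop() for _ in range(3)])  # -> ['A', 'B', 'C']
print([pifo.pop() for _ in range(3)])  # -> ['B', 'C', 'A']
```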
An Overview on Integrated Sensing and Communication for Low Altitude Economy
ZHU Zhengyu, WEN Xinping, LI Xingwang, WEI Zhiqing, ZHANG Peichang, LIU Fan, FENG Zhiyong
2026, 48(2): 471-486.   doi: 10.11999/JEIT250747
[Abstract](510) [FullText HTML](270) [PDF 2620KB](113)
Abstract:
The Low-altitude Internet of Things (IoT) is developing rapidly, and the Low Altitude Economy is treated as a national strategic emerging industry. Integrated Sensing and Communication (ISAC) for the Low Altitude Economy is expected to support more complex tasks in complex environments and provides a foundation for improved security, flexibility, and multi-application scenarios for drones. This paper presents an overview of ISAC for the Low Altitude Economy. The theoretical foundations of ISAC and the Low Altitude Economy are summarized, and the advantages of applying ISAC to the Low Altitude Economy are discussed. Potential applications of key 6G technologies, such as covert communication and Millimeter-Wave (mm-wave) systems in ISAC for the Low Altitude Economy, are examined. The key technical challenges of ISAC for the Low Altitude Economy in future development are also summarized.  Significance   The integration of UAVs with ISAC technology is expected to provide considerable advantages in future development. When ISAC is applied, the overall system payload can be reduced, which improves UAV maneuverability and operational freedom. This integration offers technical support for versatile UAV applications. With ISAC, low-altitude network systems can conduct complex tasks in challenging environments. UAV platforms equipped with a single function do not achieve the combined improvement in communication and sensing that ISAC enables. ISAC-equipped drones are therefore expected to be used more widely in aerial photography, agriculture, surveying, remote sensing, and telecommunications. This development will advance related theoretical and technical frameworks and broaden the application scope of ISAC.  Progress  ISAC networks for the low-altitude economy offer efficient and flexible solutions for military reconnaissance, emergency disaster relief, and smart city management. The open aerial environment and dynamic deployment requirements create several challenges. 
Limited stealth increases exposure to hostile interception, and complex terrains introduce signal obstruction. High bandwidth and low latency are also required. Academic and industrial communities have investigated technologies such as covert communication, intelligent reflecting surfaces, and mm-wave communication to enhance the reliability and intelligence of ISAC in low-altitude operational scenarios.  Conclusions  This paper presents an overview of current applications, critical technologies, and ongoing challenges associated with ISAC in low-altitude environments. It examines the integration of emerging 6G technologies, including covert communication, Reconfigurable Intelligent Surfaces (RIS), and mm-wave communication within ISAC frameworks. Given the dynamic and complex characteristics of low-altitude operations, recent advances in UAV swarm power control algorithms and covert trajectory optimization based on deep reinforcement learning are summarized. Key unresolved challenges are also identified, such as spatiotemporal synchronization, multi-UAV resource allocation, and privacy preservation, which provide reference directions for future research.  Prospects   ISAC technology provides precise and reliable support for drone logistics, urban air mobility, and large-scale environmental monitoring in the low-altitude economy. Large-scale deployment of ISAC systems in complex and dynamic low-altitude environments remains challenging. Major obstacles include limited coordination and resource allocation within UAV swarms, spatiotemporal synchronization across heterogeneous devices, competing requirements between sensing and communication functions, and rising concerns regarding privacy and security in open airspace. These issues restrict the high-quality development of the low-altitude economy.
Vision Enabled Multimodal Integrated Sensing and Communications: Key Technologies and Prototype Validation
ZHAO Chuanbin, XU Weihua, LIN Bo, ZHANG Tengyu, FENG Yuan, GAO Feifei
2026, 48(2): 487-498.   doi: 10.11999/JEIT250685
[Abstract](343) [FullText HTML](207) [PDF 9618KB](106)
Abstract:
  Objective  Integrated Sensing And Communications (ISAC) is regarded as a key enabling technology for Sixth-Generation mobile communications (6G), as it simultaneously senses and monitors information in the physical world while maintaining communication with users. The technology supports emerging scenarios such as low-altitude economy, digital twin systems, and vehicle networking. Current ISAC research primarily concentrates on wireless devices that include base stations and terminals. Visual sensing, which provides strong visibility and detailed environmental information, has long been a major research direction in computer science. This study proposes the integration of visual sensing with wireless-device sensing to construct a multimodal ISAC system. In this system, visual sensing captures environmental information to assist wireless communications, and wireless signals help overcome limitations inherent to visual sensing.  Methods  The study first explores the correlation mechanism between environmental vision and wireless communications. Key algorithms for visual-sensing-assisted wireless communication are then discussed, including beam prediction, occlusion prediction, and resource scheduling and allocation methods for multiple base stations and users. These schemes demonstrate that visual sensing, used as prior information, enhances the communication performance of the multimodal ISAC system. The sensing gains provided by wireless devices combined with visual sensors are subsequently explored. A static-environment reconstruction scheme and a dynamic-target sensing scheme based on wireless-visual fusion are proposed to obtain global information about the physical world. In addition, a “vision-communication” simulation and measurement dataset is constructed, establishing a complete theoretical and technical framework for multimodal ISAC.  
Results and Discussions  For visual-sensing-assisted wireless communications, the hardware prototype system constructed in this study is shown in (Fig. 6) and (Fig. 7), and the corresponding hardware test results are presented in (Table 1). The results show that visual sensing assists millimetre-wave communications in performing beam alignment and beam prediction more effectively, thereby improving system communication performance. For wireless-communication-assisted sensing, the hardware prototype system is shown in (Fig. 8), and the experimental results are shown in (Fig. 9) and (Table 2). The static-environment reconstruction obtained through wireless-visual fusion shows improved robustness and higher accuracy. Depth estimation based on visual and communication fusion also presents strong robustness in rainy and snowy weather, with the RMSE reduced by approximately 50% compared with pure visual algorithms. These experimental results indicate that vision-enabled multimodal ISAC systems present strong potential for practical application.  Conclusions  A multimodal ISAC system that integrates visual sensing with wireless-device sensing is proposed. In this system, visual sensing captures environmental information to assist wireless communications, and wireless signals help overcome the inherent limitations of visual sensing. Key algorithms for visual-sensing-assisted wireless communication are examined, including beam prediction, occlusion prediction, and resource scheduling and allocation for multiple base stations and users. The sensing gains brought by wireless devices combined with visual sensors are also analysed. Static-environment reconstruction and dynamic-target sensing schemes based on wireless-visual fusion are proposed to obtain global information about the physical world. A “vision-communication” simulation and measurement dataset is further constructed, forming a coherent theoretical and technical framework for multimodal ISAC. 
Experimental results show that vision-enabled multimodal ISAC systems present strong potential for use in 6G networks.
Service Migration Algorithm for Satellite-terrestrial Edge Computing Networks
FENG Yifan, WU Weihong, SUN Gang, WANG Ying, LUO Long, YU Hongfang
2026, 48(2): 499-511.   doi: 10.11999/JEIT250835
[Abstract](194) [FullText HTML](122) [PDF 3753KB](45)
Abstract:
  Objective   In highly dynamic Satellite-Terrestrial Edge Computing Networks (STECN), achieving coordinated optimization between user service latency and system migration cost is a central challenge in service migration algorithm design. Existing approaches often fail to maintain stable performance in such environments. To address this, a Multi-Agent Service Migration Optimization (MASMO) algorithm based on multi-agent deep reinforcement learning is proposed to provide an intelligent and forward-looking solution for dynamic service management in STECN.  Methods   The service migration optimization problem is formulated as a Multi-Agent Markov Decision Process (MAMDP), which offers a framework for sequential decision-making under uncertainty. The environment represents the spatiotemporal characteristics of a Low Earth Orbit (LEO) satellite network, where satellite movement and satellite-user visibility define time-varying service availability. Service latency is expressed as the sum of transmission delay and computation delay. Migration cost is modeled as a function of migration distance between satellite nodes to discourage frequent or long-range migrations. A Trajectory-Aware State Enhancement (TASE) method is proposed to incorporate predictable orbital information of LEO satellites into the agent state representation, improving proactive and stable migration actions. Optimization is performed using the recurrent Multi-Agent Proximal Policy Optimization (rMAPPO) algorithm, which is suitable for cooperative multi-agent tasks. The reward function balances the objectives by penalizing high migration cost and rewarding low service latency.  Results and Discussions  Simulations are conducted in dynamic STECN scenarios to compare MASMO with MAPPO, MADDPG, Greedy, and Random strategies. The results consistently confirm the effectiveness of MASMO. As the number of users increases, MASMO shows slower performance degradation. 
With 16 users, it reduces average service latency by 2.90%, 6.78%, 11.01%, and 14.63% compared with MAPPO, MADDPG, Greedy, and Random. It also maintains high cost efficiency, lowering migration cost by up to 30.57% at 16 users (Fig. 4). When satellite resources increase, MASMO consistently leverages the added availability to reduce both latency and migration cost, whereas myopic strategies such as Greedy do not exhibit similar improvements. With 10 satellites, MASMO achieves the lowest service latency and outperforms the next-best method by 7.53% (Fig. 5). These findings show that MASMO achieves an effective balance between transmission latency and migration latency through its forward-looking decision policy.  Conclusions   This study addresses the service migration challenge in STECN through the MASMO algorithm, which integrates the TASE method with rMAPPO. The method improves service latency and reduces migration cost at the same time, demonstrating strong performance advantages. The trajectory-enhanced state representation improves foresight and stability of migration behavior in predictable dynamic environments. This study assumes ideal real-time state perception, and future work should evaluate communication delays and partial observability, as well as investigate scalability in larger satellite constellations with heterogeneous user demands.
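The reward structure described above (reward low service latency, penalize distance-dependent migration cost) can be sketched minimally as follows; the linear form, the weights `w_lat` and `w_cost`, and the unit-cost constant are illustrative assumptions, not the paper's actual formulation.

```python
def service_latency(tx_delay, comp_delay):
    """Service latency = transmission delay + computation delay."""
    return tx_delay + comp_delay

def migration_cost(hop_distance, unit_cost=1.0):
    """Cost grows with migration distance between satellite nodes."""
    return unit_cost * hop_distance

def reward(tx_delay, comp_delay, hop_distance, w_lat=1.0, w_cost=0.5):
    """Negative weighted sum: low latency is rewarded, migration is penalized."""
    return -(w_lat * service_latency(tx_delay, comp_delay)
             + w_cost * migration_cost(hop_distance))

# Staying put (distance 0) beats a 3-hop migration at equal latency.
assert reward(2, 1, 0) > reward(2, 1, 3)
```

A reward of this shape is what lets the agents trade migration frequency against latency; the paper's contribution is learning that trade-off with trajectory-enhanced states rather than the reward form itself.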
Lightweight Incremental Deployment for Computing-Network Converged AI Services
WANG Qinding, TAN Bin, HUANG Guangping, DUAN Wei, YANG Dong, ZHANG Hongke
2026, 48(2): 512-521.   doi: 10.11999/JEIT250663
[Abstract](432) [FullText HTML](263) [PDF 3230KB](61)
Abstract:
  Objective   The rapid expansion of Artificial Intelligence (AI) computing services has heightened the demand for flexible access and efficient utilization of computing resources. Traditional Domain Name System (DNS) and IP-based scheduling mechanisms struggle to meet the stringent low-latency and high-concurrency requirements of such services, highlighting the need for integrated computing-network resource management. To address these challenges, this study proposes a lightweight deployment framework that enhances network adaptability and resource scheduling efficiency for AI services.  Methods   The AI-oriented Service IDentifier (AISID) is designed to encode service attributes into four dimensions: Object, Function, Method, and Performance. Service requests are decoupled from physical resource locations, enabling dynamic resource matching. AISID is embedded within IPv6 packets (Fig. 5), consisting of a 64-bit prefix for identification and a 64-bit service-specific suffix (Fig. 4). A lightweight incremental deployment scheme is implemented through hierarchical routing, in which stable wide-area routing is managed by ingress gateways, and fine-grained local scheduling is handled by egress gateways (Fig. 6). Ingress and egress gateways are incrementally deployed under the coordination of an intelligent control system to optimize resource allocation. AISID-based paths are encapsulated at ingress gateways using Segment Routing over IPv6 (SRv6), whereas egress gateways select optimal service nodes according to real-time load data using a weighted least-connections strategy (Fig. 8). AISID lifecycle management includes registration, query, migration, and decommissioning phases (Table 2), with global synchronization maintained by the control system. Resource scheduling is dynamically adjusted according to real-time network topology and node utilization metrics (Fig. 7).
Results and Discussions   Experimental results show marked improvements over traditional DNS/IP architectures. The AISID mechanism reduces service request initiation latency by 61.3% compared to DNS resolution (Fig. 9), as it eliminates the need for round-trip DNS queries. Under 500 concurrent requests, network bandwidth utilization variance decreases by 32.8% (Fig. 10), reflecting the ability of AISID-enabled scheduling to alleviate congestion hotspots. Computing resource variance improves by 12.3% (Fig. 11), demonstrating more balanced workload distribution across service nodes. These improvements arise from AISID’s precise semantic matching in combination with the hierarchical routing strategy, which together enhance resource allocation efficiency while maintaining compatibility with existing IPv6/DNS infrastructure (Fig. 2, Fig. 3). The incremental deployment approach further reduces disruption to legacy networks, confirming the framework’s practicality and viability for real-world deployment.  Conclusions   This study establishes a computing-network convergence framework for AI services based on semantic-driven AISID and lightweight deployment. The key innovations include AISID’s semantic encoding, which enables dynamic resource scheduling and decoupled service access, together with incremental gateway deployment that optimizes routing without requiring major modifications to legacy networks. Experimental validation demonstrates significant improvements in latency reduction, bandwidth efficiency, and balanced resource utilization. Future research will explore AISID’s scalability across heterogeneous domains and its robustness under dynamic network conditions.
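The AISID layout described above, a 64-bit prefix plus a 64-bit suffix encoding the Object, Function, Method, and Performance dimensions, can be sketched as below. The individual field widths, prefix value, and attribute values are assumptions for illustration; the paper's exact bit layout is not reproduced here.

```python
import ipaddress

# Hypothetical bit widths for the four AISID dimensions; they must sum to the
# 64-bit service-specific suffix described in the paper.
FIELDS = (("object", 24), ("function", 16), ("method", 8), ("performance", 16))
assert sum(width for _, width in FIELDS) == 64

def encode_aisid(prefix64, **attrs):
    """Pack attribute fields into the low 64 bits under a 64-bit prefix."""
    suffix = 0
    for name, width in FIELDS:
        value = attrs.get(name, 0)
        assert value < (1 << width), f"{name} overflows {width} bits"
        suffix = (suffix << width) | value
    return ipaddress.IPv6Address((prefix64 << 64) | suffix)

def decode_suffix(addr):
    """Recover the attribute fields from the 64-bit suffix."""
    suffix = int(addr) & ((1 << 64) - 1)
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = suffix & ((1 << width) - 1)
        suffix >>= width
    return out

aid = encode_aisid(0x20010DB8_00000001,
                   object=7, function=2, method=1, performance=100)
assert decode_suffix(aid) == {"object": 7, "function": 2,
                              "method": 1, "performance": 100}
```

Because the identifier is a valid IPv6 address, it can be routed by unmodified IPv6 infrastructure while gateways interpret the suffix semantically, which is the compatibility property the paper emphasizes.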
Flexible Network Modal Packet Processing Pipeline Construction Mechanism for Cloud-Network Convergence Environment
ZHU Jun, XU Qi, ZHANG Fujun, WANG Yongjie, ZOU Tao, LONG Keping
2026, 48(2): 522-533.   doi: 10.11999/JEIT250806
[Abstract](136) [FullText HTML](62) [PDF 3896KB](13)
Abstract:
  Objective  With the deep integration of information network technologies and vertical application domains, demand for cloud-network convergence infrastructure continues to grow, and the boundaries between cloud computing and network technologies are gradually fading. The advancement of cloud-network convergence technologies gives rise to diverse network service requirements, creating new challenges for the flexible processing of multimodal network packets. A device-level mechanism for constructing flexible network modal packet processing pipelines is essential for realizing an integrated environment that supports multiple network technologies. This mechanism establishes a flexible protocol packet processing pipeline architecture that customizes a sequence of operations such as packet parsing, packet editing, and packet forwarding according to different network modalities and service demands. By enabling dynamic configuration and adjustment of the processing flow, the proposed design enhances network adaptability and meets both functional and performance requirements across heterogeneous transmission scenarios.  Methods  Constructing a device-level flexible pipeline faces two primary challenges: (1) it must flexibly process diverse network modal packet protocols across polymorphic network element devices. This requires coordination of heterogeneous resources to enable rapid identification, accurate parsing, and correct handling of packets in various formats; (2) the pipeline construction must remain flexible, offering a mechanism to dynamically generate and configure pipeline structures that can adjust not only the number of stages but also the specific functions of each stage. To address these challenges, this study proposes a polymorphic network element abstraction model that integrates heterogeneous resources.
The model adopts a hyper-converged hardware architecture that combines high-performance switching ASIC chips with more programmable but less computationally powerful FPGA and CPU devices. The coordinated operation of hardware and software ensures unified and flexible support for custom network protocols. Building upon the abstraction model, a protocol packet flexible processing compilation mechanism is designed to construct a configurable pipeline architecture that meets diverse network service transmission requirements. This mechanism adopts a three-stage compilation structure consisting of front-end, mid-end, and back-end processes. In response to adaptation issues between heterogeneous resources and differentiated network modal demands, a flexible pipeline technology based on Intermediate Representation (IR) slicing is further proposed. This technology decomposes and reconstructs the integrated IR of multiple network modalities into several IR subsets according to specific optimization methods, preserving original functionality and semantics. By applying the IR slicing algorithm, the mechanism decomposes and maps the hybrid processing logic of multimodal networks onto heterogeneous hardware resources, including ASICs, FPGAs, and CPUs. This process enables flexible customization of network modal processing pipelines and supports adaptive pipeline construction for different transmission scenarios.  Results and Discussions  To demonstrate the construction effectiveness of the proposed flexible pipeline, a prototype verification system for polymorphic network elements is developed. As shown in Fig. 6, the system is equipped with Centec CTC8180 switch chips, multiple domestic FPGA chips, and domestic multi-core CPU chips. On this polymorphic network element prototype platform, protocol processing pipelines for IPv4, GEO, and MF network modalities are constructed, compiled, and deployed. As illustrated in Fig. 
7, packet capture tests verify that different network modalities operate through distinct packet processing pipelines. To further validate the core mechanism of network modal flexible pipeline construction, the IR code size before and after slicing is compared across the three network modalities and allocation strategies described in Section 6.2. The integrated P4 code for the three modalities, after front-end compilation, produces an unsliced intermediate code containing 32,717 lines. During middle-end compilation, slicing is performed according to the modal allocation scheme, generating IR subsets for ASIC, CPU, and FPGA with code sizes of 23,164, 23,282, and 22,772 lines, respectively. The performance of multimodal protocol packet processing is then assessed, focusing on the effects of different traffic allocation strategies on network protocol processing performance. As shown in Fig. 9, the average packet processing delay in Scheme 1 is significantly higher than in the other schemes, reaching 4.237 milliseconds. In contrast, the average forwarding delays in Schemes 2, 3, and 4 decrease to 54.16 microseconds, 32.63 microseconds, and 15.48 microseconds, respectively. These results demonstrate that adjusting the traffic allocation strategy, particularly the distribution of CPU resources for GEO and MF modalities, effectively mitigates processing bottlenecks and markedly improves the efficiency of multimodal network communication.  Conclusions  Experimental evaluations verify the superiority of the proposed flexible pipeline in construction effectiveness and functional capability. The results indicate that the method effectively addresses complex network environments and diverse service demands, demonstrating stable and high performance. Future work focuses on further optimizing the architecture and expanding its applicability to provide more robust and flexible technical support for protocol packet processing in hyper-converged cloud-network environments.
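The IR-slicing idea, decomposing a merged multimodal intermediate representation into per-target subsets while preserving each modality's processing order, can be illustrated with a toy model. The IR nodes, stage names, and the modality-to-hardware allocation table below are invented for the sketch; only the IPv4/GEO/MF modality names and the ASIC/FPGA/CPU targets come from the paper.

```python
from collections import defaultdict

# A merged IR as an ordered list of (modality, operation) nodes. Real IR
# nodes carry parse/edit/forward logic; these tuples are placeholders.
merged_ir = [
    ("IPv4", "parse"), ("IPv4", "edit"),   ("IPv4", "forward"),
    ("GEO",  "parse"), ("GEO",  "route"),  ("GEO",  "forward"),
    ("MF",   "parse"), ("MF",   "lookup"), ("MF",   "forward"),
]

# Hypothetical allocation scheme: which hardware target runs each modality.
allocation = {"IPv4": "ASIC", "GEO": "CPU", "MF": "FPGA"}

def slice_ir(ir, alloc):
    """Partition the merged IR into per-target subsets, preserving order."""
    subsets = defaultdict(list)
    for modality, op in ir:
        subsets[alloc[modality]].append((modality, op))
    return dict(subsets)

subsets = slice_ir(merged_ir, allocation)
# Semantics preserved: every node lands in exactly one subset, in order.
assert sum(len(s) for s in subsets.values()) == len(merged_ir)
```

The paper's mechanism additionally duplicates shared logic across subsets (which is why the three sliced IR subsets each remain above 22,000 lines), a detail this per-modality partition does not capture.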
Energy Consumption Optimization of Cooperative NOMA Secure Offload for Mobile Edge Computing
CHEN Jian, MA Tianrui, YANG Long, LÜ Lu, XU Yongjun
2026, 48(2): 534-544.   doi: 10.11999/JEIT250606
[Abstract](119) [FullText HTML](50) [PDF 2531KB](17)
Abstract:
  Objective  Mobile Edge Computing (MEC) is used to strengthen the computational capability and response speed of mobile devices by shifting computing and caching functions to the network edge. Non-Orthogonal Multiple Access (NOMA) further supports high spectral efficiency and large-scale connectivity. Because wireless channels are broadcast, the MEC offload transmission process is exposed to potential eavesdropping. To address this risk, physical-layer security is integrated into a NOMA-MEC system to safeguard secure offloading. Existing studies mainly optimize performance metrics such as energy use, latency, and throughput, or improve security through NOMA-based co-channel interference and cooperative interference. However, the combined effect of performance and security has not been fully examined. To reduce the energy required for secure offloading, a cooperative NOMA secure offload scheme is designed. The distinctive feature of the proposed scheme is that cooperative nodes provide forwarding and computational assistance at the same time. Through joint local computation between users and cooperative nodes, the scheme strengthens security in the offload process while reducing system energy consumption.  Methods  The joint design of computational and communication resource allocation for the nodes is examined by dividing the offloading procedure into two stages: NOMA offloading and cooperative offloading. Offloading strategies for different nodes in each stage are considered, and an optimization problem is formulated to minimize the weighted total system energy consumption under secrecy outage constraints. To handle the coupled multi-variable and non-convex structure, secrecy transmission rate constraints and secrecy outage probability constraints, originally expressed in probabilistic form, are first transformed. The main optimization problem is then separated into two subproblems: slot and task allocation, and power allocation. 
For the non-convex power allocation subproblem, the non-convex constraints are replaced with bilinear substitutions, and sequential convex approximations are applied. An alternating iterative resource allocation algorithm is ultimately proposed, allowing the load, power, and slot assignment between users and cooperative nodes to be adjusted according to channel conditions so that energy consumption is minimized while security requirements are satisfied.  Results and Discussions  Theoretical analysis and simulation results show that the proposed scheme converges quickly and maintains low computational complexity. Relative to existing NOMA full-offloading schemes, assisted computing schemes, and NOMA cooperative interference schemes, the proposed offloading design reduces system energy consumption and supports a higher load under identical secrecy constraints. The scheme also demonstrates strong robustness, as its performance is less affected by weak channel conditions or increased eavesdropping capability.  Conclusions  The study shows that system energy consumption and security constraints are closely coupled. In the MEC offloading process, communication, computation, and security are not independent. Performance and security can be improved at the same time through the effective use of cooperative nodes. When cooperative nodes are present, NOMA and forwarding cooperation can reduce the effects of weak channel conditions or high eavesdropping risks on secure and reliable transmission. Cooperative nodes can also share users’ local computational load to strengthen overall system performance. Joint local computation between users and cooperative nodes further reduces the security risks associated with long-distance wireless transmission. Thus, secure offloading in MEC is not only a physical-layer security issue in wireless transmission but also reflects the coupled relationship between communication and computation that is specific to MEC.
By making full use of idle resources in the network, cooperative communication and computation among idle nodes can enhance system security while maintaining performance.
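A minimal sketch of the task-splitting trade-off underlying the scheme: part of a task is computed locally and the rest is offloaded, and the split minimizing total energy is selected. All constants (switched capacitance, CPU frequency, offload rate, transmit power) are placeholder values, and the exhaustive grid search stands in for the paper's convex-approximation and alternating-iteration steps.

```python
KAPPA = 1e-27          # effective switched capacitance (placeholder)
CYCLES_PER_BIT = 1000  # CPU cycles needed per bit of task data
F_LOCAL = 1e9          # local CPU frequency in Hz
RATE = 1e6             # achievable secure offload rate in bit/s
P_TX = 0.5             # transmit power in W

def local_energy(bits):
    """Energy to compute `bits` locally: kappa * f^2 * cycles."""
    return KAPPA * F_LOCAL ** 2 * CYCLES_PER_BIT * bits

def offload_energy(bits):
    """Energy to transmit `bits` at fixed power: p * (bits / rate)."""
    return P_TX * bits / RATE

def best_split(total_bits, steps=100):
    """Grid-search the local/offload split minimizing total energy."""
    candidates = [(local_energy(total_bits - b) + offload_energy(b), b)
                  for b in (total_bits * i // steps for i in range(steps + 1))]
    return min(candidates)

energy, offloaded = best_split(10**6)
assert energy <= local_energy(10**6)   # never worse than computing all locally
```

With linear costs the optimum sits at an extreme of the split; the paper's problem is harder precisely because secrecy-rate constraints, slot allocation, and cooperative-node assistance couple the variables non-convexly.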
Performance Analysis for Self-Sustainable Intelligent Metasurface Based Reliable and Secure Communication Strategies
QU Yayun, CAO Kunrui, WANG Ji, XU Yongjun, CHEN Jingyu, DING Haiyang, JIN Liang
2026, 48(2): 545-555.   doi: 10.11999/JEIT250637
[Abstract](198) [FullText HTML](96) [PDF 5400KB](35)
Abstract:
  Objective  The Reconfigurable Intelligent Surface (RIS) is generally powered by a wired method, and its power cable functions as a “tail” that restricts RIS maneuverability during outdoor deployment. A Self-Sustainable Intelligent Metasurface (SIM) that integrates RIS with energy harvesting is examined, and an amplified SIM architecture is presented. The reliability and security of SIM communication are analyzed, and the analysis provides a basis for its rational deployment in practical design.  Methods   The static wireless-powered and dynamic wireless-powered SIM communication strategies are proposed to address the energy and information outage challenges faced by SIM. The communication mechanism of the un-amplified SIM and amplified SIM (U-SIM and A-SIM) under these two strategies is examined. New integrated performance metrics of energy and information, termed joint outage probability and joint intercept probability, are proposed to evaluate the strategies from the perspectives of communication reliability and communication security.  Results and Discussions   The simulations evaluate the effect of several critical parameters on the communication reliability and security of each strategy. The results indicate that: (1) Compared to alternative schemes, at low base station transmit power, A-SIM achieves optimal reliability under the dynamic wireless-powered strategy and optimal security under the static wireless-powered strategy (Figs. 2 and 3). (2) Under the same strategy type, increasing the number of elements at SIM generally enhances reliability but reduces security. With a large number of elements, U-SIM maintains higher reliability than A-SIM, while A-SIM achieves higher security than U-SIM (Figs. 4 and 5). (3) An optimal amplification factor maximizes communication reliability for SIM systems (Fig. 6).  
Conclusions   The results show that the dynamic wireless-powered strategy can mitigate the reduction in the reliability of SIM communication caused by insufficient energy. Although the amplified noise of A-SIM decreases reliability, it can improve security. Under the same static or dynamic strategies, as the number of elements at SIM increases, A-SIM provides better security, whereas U-SIM provides better reliability.
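The joint outage metric proposed above can be illustrated with a Monte-Carlo sketch in which an outage is counted when either the harvested energy is insufficient or the achievable rate falls below target. The exponential energy-harvest model, Rayleigh-fading channel, thresholds, and means are illustrative assumptions, not the paper's system model.

```python
import math
import random

random.seed(0)

def joint_outage_probability(n=100_000, e_min=0.02, rate_min=1.0,
                             harvest_mean=0.05, snr_mean=10.0):
    """Fraction of slots with an energy OR information outage."""
    outages = 0
    for _ in range(n):
        harvested = random.expovariate(1 / harvest_mean)  # harvested energy
        snr = random.expovariate(1 / snr_mean)            # Rayleigh power gain
        rate = math.log2(1 + snr)                         # achievable rate
        if harvested < e_min or rate < rate_min:
            outages += 1
    return outages / n

p = joint_outage_probability()
assert 0.0 < p < 1.0
```

Increasing the mean harvested energy lowers the joint outage probability, which mirrors the paper's finding that the dynamic wireless-powered strategy mitigates reliability loss caused by insufficient energy.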
Power Grid Data Recovery Method Driven by Temporal Composite Diffusion Networks
YAN Yandong, LI Chenxi, LI Shijie, YANG Yang, GE Yuhao, HUANG Yu
2026, 48(2): 556-566.   doi: 10.11999/JEIT250435
[Abstract](176) [FullText HTML](90) [PDF 1856KB](14)
Abstract:
  Objective  Smart grid construction is driving the modernization of power systems, and distribution networks serve as the key interface between the main grid and end users. Their stability, power quality, and efficiency depend on accurate data management and analysis. Distribution networks generate large volumes of multi-source heterogeneous data that contain user consumption records, real-time meteorology, equipment status, and marketing information. These data streams often become incomplete during collection or transmission due to noise, sensor failures, equipment aging, or adverse weather. Missing data reduces the reliability of real-time monitoring and affects essential tasks such as load forecasting, fault diagnosis, health assessment, and operational decision making. Conventional approaches such as mean or regression imputation lack the capacity to maintain temporal dependencies. Generative models such as Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs) do not represent the complex statistical characteristics of grid data with sufficient accuracy. This study proposes a diffusion model based data recovery method for distribution networks. The method is designed to reconstruct missing data, preserve semantic and statistical integrity, and enhance data utility to support smart grid stability and efficiency.  Methods  This paper proposes a power grid data augmentation method based on diffusion models. The core of the method is that input Gaussian noise is mapped to the target distribution space of the missing data so that the recovered data follows its original distribution characteristics. To reduce semantic discrepancy between the reconstructed data and the actual data, the method uses time-series embeddings as conditional information. This conditional input guides and improves the diffusion generation process so that the imputation remains consistent with the surrounding temporal context.
Results and Discussions  Experimental results show that the proposed diffusion model based data augmentation method achieves higher accuracy in recovering missing power grid data than conventional approaches. The performance demonstrates that the method improves the completeness and reliability of datasets that support analytical tasks and operational decision making in smart grids.  Conclusions  This study proposes and validates a diffusion model based data augmentation method designed to address data missingness in power distribution networks. Traditional restoration methods and generative models have difficulty capturing the temporal dependencies and complex distribution characteristics of grid data. The method presented here uses temporal sequence information as conditional guidance, which enables accurate imputation of missing values and preserves the semantic integrity and statistical consistency of the original data. By improving the accuracy of distribution network data recovery, the method provides a reliable approach for strengthening data quality and supports the stability and efficiency of smart grid operations.
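The conditioning idea, keeping observed time-series values fixed while only the missing positions pass through an iterative noise-then-denoise loop, can be shown structurally. The neighbor-interpolating "denoiser" below is a stand-in for the trained diffusion network, and the annealing schedule and all constants are purely illustrative.

```python
import random

random.seed(1)

def impute(series, mask, steps=50, sigma=1.0):
    """series: floats; mask[i] is True where the value is observed."""
    # Missing entries start as pure Gaussian noise; observed ones stay fixed.
    x = [v if m else random.gauss(0.0, sigma) for v, m in zip(series, mask)]
    for t in range(steps, 0, -1):
        noise_level = sigma * t / steps            # annealed noise schedule
        for i, m in enumerate(mask):
            if m:
                continue                           # conditioning: never altered
            left = next((x[j] for j in range(i - 1, -1, -1) if mask[j]), 0.0)
            right = next((x[j] for j in range(i + 1, len(x)) if mask[j]), left)
            target = 0.5 * (left + right)          # stand-in for learned denoiser
            x[i] += 0.2 * (target - x[i]) + random.gauss(0.0, 0.05 * noise_level)
    return x

raw = [1.0, 2.0, None, None, 5.0, 6.0]
mask = [v is not None for v in raw]
series = [v if v is not None else 0.0 for v in raw]
filled = impute(series, mask)
assert filled[0] == 1.0 and filled[5] == 6.0       # observed values preserved
```

In the real method the hand-written `target` is replaced by a network trained to predict the noise at each step, conditioned on the temporal embeddings; the structural point is that observed entries anchor the reverse process so the imputed values stay consistent with their temporal context.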
Optimized Design of Non-Transparent Bridge for Heterogeneous Interconnects in Hyper-converged Infrastructure
ZHENG Rui, SHEN Jianliang, LÜ Ping, DONG Chunlei, SHAO Yu, ZHU Zhengbin
2026, 48(2): 567-582.   doi: 10.11999/JEIT250272
[Abstract](548) [FullText HTML](361) [PDF 7065KB](22)
Abstract:
  Objective  The integration of heterogeneous computing resource clusters into modern Hyper-Converged Infrastructure (HCI) systems imposes stricter performance requirements in latency, bandwidth, throughput, and cross-domain transmission stability. Traditional HCI systems primarily rely on the Ethernet TCP/IP protocol, which exhibits inherent limitations, including low bandwidth efficiency, high latency, and limited throughput. Existing PCIe Switch products typically employ Non-Transparent Bridges (NTBs) for conventional dual-system connections or intra-server communication; however, they do not meet the performance demands of heterogeneous cross-domain transmission within HCI environments. To address this limitation, a novel Dual-Mode Non-Transparent Bridge Architecture (D-MNTBA) is proposed to support dual transmission modes. D-MNTBA combines a fast transmission mode via a bypass mechanism with a stable transmission mode derived from the Traditional Data Path Architecture (TDPA), thereby aligning with the data characteristics and cross-domain streaming demands of HCI systems. Hardware-level enhancements in address and ID translation schemes enable D-MNTBA to support more complex mappings while minimizing translation latency. These improvements increase system stability and effectively support the cross-domain transmission of heterogeneous data in HCI systems.  Methods  To overcome the limitations of traditional single-pass architectures and the bypass optimizations of the TDPA, the proposed D-MNTBA incorporates both a fast transmission path and a stable transmission path. This dual-mode design enables the NTB to leverage the data characteristics of HCI systems for message-based streaming, thereby reducing dependence on intermediate protocols and data format conversions.
The stable transmission mode ensures reliable message delivery, while the fast transmission mode, enhanced through hardware-level optimizations in address and ID translation, supports latency-critical cross-domain communication. This combination improves overall transmission performance by reducing both latency and system overhead. To meet the low-latency demands of the bypass transmission path, the architecture implements hardware-level enhancements to the address and ID conversion modules. The address translation module is expanded with a larger lookup table, allowing for more complex and flexible mapping schemes. This enhancement enables efficient utilization of non-contiguous and fragmented address spaces without compromising performance. Simultaneously, the ID conversion module is optimized through multiple conversion strategies and streamlined logic, significantly reducing the time required for ID translation.  Results and Discussions  Address translation in the proposed D-MNTBA is validated through emulation within a constructed HCI environment. The simulation log for indirect address translation shows no errors or deadlocks, and successful hits are observed on BAR2/3. During dual-host disk access, packet header addresses and payload content remain consistent, with no packet loss detected (Fig. 14), indicating that indirect address translation is accurately executed under D-MNTBA. ID conversion performance is evaluated by comparing the proposed architecture with the TDPA implemented in the PEX8748 chip. The switch based on D-MNTBA exhibits significantly shorter ID conversion times. A maximum reduction of approximately 34.9% is recorded, with an ID conversion time of 71 ns for a 512-Byte payload (Fig. 15). These findings suggest that the ID function mapping method adopted in D-MNTBA effectively reduces conversion latency and enhances system performance. Throughput stability is assessed under sustained heavy traffic with payloads ranging from 256 to 2,048 Bytes.
The maximum throughputs of D-MNTBA, the Ethernet card, and PEX8748 are measured at 1.36 GB/s, 0.97 GB/s, and 0.9 GB/s, respectively (Fig. 16). Compared to PEX8748 and the Ethernet architecture, D-MNTBA improves throughput by approximately 51.1% and 40.2%, respectively, and shows the slowest degradation trend, reflecting superior stability in heterogeneous cross-domain transmission. Bandwidth comparison reveals that D-MNTBA outperforms TDPA and the Ethernet card, with bandwidth improvements of approximately 27.1% and 19.0%, respectively (Fig. 17). These results highlight the significant enhancement in cross-domain transmission performance achieved by the proposed architecture in heterogeneous environments.  Conclusions  This study proposes the D-MNTBA to address the challenges of heterogeneous interconnection in HCI systems. By integrating a fast transmission path enabled by a bypass architecture with the stable transmission path of the TDPA, D-MNTBA accommodates the specific data characteristics of cross-domain transmission in heterogeneous environments and enables efficient message routing. D-MNTBA enhances transmission stability while improving system-wide performance, offering robust support for latency-critical cross-domain transmission in HCI. It also reduces latency and overhead, thereby improving overall transmission efficiency. Compared with existing transmission schemes, D-MNTBA achieves notable gains in performance, making it a suitable solution for the demands of heterogeneous domain interconnects in HCI systems. However, the architectural enhancements, particularly the bypass design and associated optimizations, increase logic resource utilization and power consumption. Future work should focus on refining hardware design, layout, and wiring strategies to reduce logic complexity and resource consumption without compromising performance.
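The lookup-table address translation that lets an NTB stitch together non-contiguous, fragmented address ranges can be modeled in a few lines. The window bases, sizes, and remote targets below are invented; real NTB windows are configured through BAR registers in hardware.

```python
# Each entry maps a window of the local (BAR-exposed) address space to a base
# address in the remote host's domain.  Values are illustrative only.
TRANSLATION_TABLE = [
    # (local_base, size, remote_base)
    (0x8000_0000, 0x0010_0000, 0x2000_0000),
    (0x8010_0000, 0x0008_0000, 0x7FF0_0000),   # non-contiguous remote window
]

def translate(local_addr):
    """Return the remote-domain address for a local address, or raise."""
    for local_base, size, remote_base in TRANSLATION_TABLE:
        if local_base <= local_addr < local_base + size:
            return remote_base + (local_addr - local_base)
    raise LookupError(f"address {local_addr:#x} misses every NTB window")

assert translate(0x8000_0040) == 0x2000_0040
assert translate(0x8010_0000) == 0x7FF0_0000
```

Enlarging this table, as the D-MNTBA description states, increases the number and flexibility of such mappings; the hardware challenge is doing the range match without adding lookup latency on the fast path.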
Geospatial Identifier Network Modal Design and Scenario Applications for Vehicle-infrastructure Cooperative Networks
PAN Zhongxia, SHEN Congqi, LUO Hanguang, ZHU Jun, ZOU Tao, LONG Keping
2026, 48(2): 583-596.   doi: 10.11999/JEIT250807
[Abstract](138) [FullText HTML](64) [PDF 6457KB](22)
Abstract:
  Objective  Vehicle-infrastructure cooperative networks (V2X) are open and contain large numbers of nodes with high mobility, frequent topology changes, unstable wireless channels, and varied service requirements. These characteristics pose challenges to efficient data transmission. A flexible network that supports rapid reconfiguration to meet different service requirements is considered essential in Intelligent Transportation Systems (ITS). With the development of programmable network technologies, programmable data-plane techniques are shifting the architecture from rigid designs to adaptive and flexible systems. In this work, a protocol standard based on geospatial information is proposed and combined with a polymorphic network architecture to design a geospatial identifier network modal. In this modal, the traditional three-layer protocol structure is replaced by packet forwarding based on geospatial identifiers. Packets carry geographic location information, and forwarding is executed directly according to this information. Addressing and routing based on geospatial information are more efficient and convenient than traditional IP-based approaches. A vehicle-infrastructure cooperative traffic system based on geospatial identifiers is further designed for intelligent transportation scenarios. This system supports direct geographic forwarding for road safety message dissemination and traffic information exchange. It enhances safety and improves route-planning efficiency within V2X.  Methods  The geospatial identifier network modal is built on a protocol standard that uses geographic location information and a flexible polymorphic network architecture. In this design, the traditional IP addressing mechanism in the three-layer network is replaced by a geospatial identifier protocol, and addressing and routing are executed on programmable polymorphic network elements.
To support end-to-end transmission, a protocol stack for the geospatial identifier network modal is constructed, enabling unified transmission across different network modals. A dynamic geographic routing mechanism is further developed to meet the transmission requirements of the GEO modal. This mechanism functions in a multimodal network controller and uses the relatively stable coverage of roadside base stations to form a two-level mapping: “geographic region-base station/geographic coordinates-terminal.” This mapping supports precise path matching for GEO modal packets and enables flexible, centrally controlled geographic forwarding. To verify the feasibility of the geospatial identifier network modal, a vehicle-infrastructure cooperative intelligent transportation system supporting geospatial identifier addressing is developed. The system is designed to facilitate efficient dissemination of road safety and traffic information. The functional requirements of the system are analyzed, and the business processing flow and overall architecture are designed. Key hardware and software modules are also developed, including the geospatial representation data-plane code, traffic control center services, roadside base stations, and in-vehicle terminals, and their implementation logic is presented.  Results and Discussions  System evaluation is carried out from four aspects: evaluation environment, operational effectiveness, theoretical analysis, and performance testing. A prototype intelligent transportation system is deployed, as shown in Fig. 7 and Fig. 8. The prototype demonstrates correct message transmission based on the geospatial identifier modal. A typical vehicle-to-vehicle communication case is used to assess forwarding efficiency, where an onboard terminal (T3) sends a road-condition alert (M) to another terminal (T2). Sequence-based analysis is applied to compare forwarding performance between the GEO modal and a traditional IP protocol.
Theoretical analysis indicates that the GEO modal provides higher forwarding efficiency, as shown in Fig. 9. Additional performance tests are conducted by adjusting the number of terminals (Fig. 10), background traffic (Fig. 11), and the traffic of the control center (Fig. 12) to observe the transmission behavior of geospatial identifier packets. The results show that the intelligent transportation system maintains stable and efficient transmission performance under varying network conditions. System evaluation confirms its suitability for typical vehicle-infrastructure cooperative communication scenarios, supporting massive connectivity and elastic traffic loads.  Conclusions  By integrating a flexible polymorphic network architecture with a protocol standard based on geographic information, a geospatial identifier network modal is developed and implemented. The modal enables direct packet forwarding based on geospatial location. A prototype vehicle-infrastructure cooperative intelligent transportation system using geospatial identifier addressing is also designed for intelligent transportation scenarios. The system supports applications such as road-safety alerts and traffic information broadcasting, improves vehicle safety, and enhances route-planning efficiency. Experimental evaluation shows that the system maintains stable and efficient performance under typical traffic conditions, including massive connectivity, fluctuating background traffic, and elastic service loads. With the continued development of vehicular networking technologies, the proposed system is expected to support broader intelligent transportation applications and contribute to safer and more efficient mobility systems.
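The controller's two-level mapping ("geographic region -> base station" and "geographic coordinates -> terminal") can be sketched as two dictionary lookups. The region bounds, station IDs, and coordinates below are invented; the terminal names T2/T3 echo the communication case above purely for flavor.

```python
# Level 1: which roadside base station serves each geographic region.
region_to_bs = {"region-A": "bs-1", "region-B": "bs-2"}
# Level 2: which terminal sits at which coordinates, per base station.
bs_terminals = {"bs-1": {(30.10, 120.20): "T2"},
                "bs-2": {(30.50, 120.90): "T3"}}
# Rectangular (lat, lon) bounds per region, purely illustrative.
region_bounds = {"region-A": ((30.00, 120.00), (30.25, 120.50)),
                 "region-B": ((30.25, 120.50), (31.00, 121.00))}

def locate_region(lat, lon):
    """Match coordinates to the containing region, if any."""
    for region, ((lat0, lon0), (lat1, lon1)) in region_bounds.items():
        if lat0 <= lat < lat1 and lon0 <= lon < lon1:
            return region
    return None

def route(lat, lon):
    """Resolve a GEO-modal destination: region -> base station -> terminal."""
    bs = region_to_bs[locate_region(lat, lon)]
    return bs, bs_terminals[bs].get((lat, lon))

assert route(30.10, 120.20) == ("bs-1", "T2")
```

The design choice this mirrors is that region-to-station assignments change slowly (stable roadside coverage) while coordinate-to-terminal entries change with vehicle movement, so only the second level needs frequent updates from the controller.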
An Implicit Certificate-Based Lightweight Authentication Scheme for Power Industrial Internet of Things
WANG Sheng, ZHANG Linghao, TENG Yufei, LIU Hongli, HAO Junyang, WU Wenjuan
2026, 48(2): 597-606.   doi: 10.11999/JEIT250457
[Abstract](162) [FullText HTML](65) [PDF 4616KB](24)
Abstract:
  Objective  The rapid development of the Internet of Things, cloud computing, and edge computing drives the evolution of the Power Industrial Internet of Things (PIIoT) into core infrastructure for smart power systems. In this architecture, terminal devices collect operational data and send it to edge gateways for preliminary processing before transmission to cloud platforms for further analysis and control. This structure improves efficiency, reliability, and security in power systems. However, the integration of traditional industrial systems with open networks introduces cybersecurity risks. Resource-constrained devices in PIIoT are exposed to threats that may lead to data leakage, privacy exposure, or disruption of power services. Existing authentication mechanisms either impose high computational and communication overhead or lack sufficient protection, such as forward secrecy or resistance to replay and man-in-the-middle attacks. This study focuses on designing a lightweight and secure authentication method suitable for the PIIoT environment. The method is intended to meet the operational needs of power terminal devices with limited computing capability while ensuring strong security protection.  Methods  A secure and lightweight identity authentication scheme is designed to address these challenges. Implicit certificate technology is applied during device identity registration, embedding public key authentication information into the signature rather than transmitting a complete certificate during communication. Compared with explicit certificates, implicit certificates are shorter and allow faster verification, reducing transmission and validation overhead. Based on this design, a lightweight authentication protocol is constructed using only hash functions, XOR operations, and elliptic curve point multiplication. 
This protocol supports secure mutual authentication and session key agreement while remaining suitable for resource-constrained power terminal devices. A formal analysis is then performed to evaluate security performance. The results show that the scheme achieves secure mutual authentication, protects session key confidentiality, ensures forward secrecy, and resists replay and man-in-the-middle attacks. Finally, experimental comparisons with advanced authentication protocols are conducted. The results indicate that the proposed scheme requires significantly lower computational and communication overhead, supporting its feasibility for practical deployment.  Results and Discussions  The proposed scheme is evaluated through simulation and numerical comparison with existing methods. The implementation is performed on a virtual machine configured with 8 GB RAM, an Intel i7-12700H processor, and Ubuntu 22.04, using the Miracl-Python cryptographic library. The security level is set to 128 bits, with the ed25519 elliptic curve, SHA-256 hash function, and AES-128 symmetric encryption. Table 1 summarizes the performance of the cryptographic primitives. As shown in Table 2, the proposed scheme achieves the lowest computational cost, requiring three elliptic curve point multiplications on the device side and five on the gateway side. These values are substantially lower than those of traditional certificate-based authentication, which may require up to 14 and 12 operations, respectively. Compared with other representative authentication approaches, the proposed method further reduces the computational burden on devices, improving suitability for resource-limited environments. Table 3 shows that communication overhead is also minimized, with the smallest total message size (3 456 bit) and three communication rounds, attributed to the implicit certificate mechanism. As shown in Fig. 5, the authentication process exhibits the shortest execution time among all evaluated schemes. 
The runtime is 47.72 ms on devices and 82.68 ms on gateways, indicating lightweight performance and suitability for deployment in Industrial Internet of Things applications.  Conclusions  A lightweight and secure identity authentication scheme based on implicit certificates is presented for resource-constrained terminal devices in the PIIoT. Through the integration of a low-overhead authentication protocol and efficient certificate processing, the scheme maintains a balance between security and performance. It enables secure mutual authentication, protects session key confidentiality, and ensures forward secrecy while keeping computational and communication overhead minimal. Security analysis and experimental evaluation confirm that the scheme provides stronger protection and higher efficiency compared with existing approaches. It offers a practical and scalable solution for enhancing the security architecture of modern power systems.
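The protocol's restriction to hash functions, XOR operations, and elliptic curve point multiplication can be illustrated with a toy masking round. This sketch invents the message flow, labels, and shared secret for illustration, abstracts the elliptic-curve steps into a stand-in derived secret, and is not the paper's actual protocol.

```python
import hashlib, secrets

def h(*parts: bytes) -> bytes:
    """SHA-256 over concatenated inputs (the scheme's stated hash primitive)."""
    digest = hashlib.sha256()
    for p in parts:
        digest.update(p)
    return digest.digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Toy mutual-authentication round built only from hash and XOR.
# 'shared' stands in for the secret both sides derive from the implicit
# certificate (the elliptic curve point multiplications are abstracted away).
shared = h(b"implicit-cert-derived-secret")
nonce = secrets.token_bytes(32)                # device's fresh challenge
masked = xor(nonce, h(shared, b"mask"))        # device -> gateway
recovered = xor(masked, h(shared, b"mask"))    # gateway unmasks the nonce
proof = h(shared, recovered)                   # gateway -> device
assert proof == h(shared, nonce)               # device authenticates the gateway
```

Because each message is a hash output or an XOR of hash-sized values, per-round cost stays at a few hash evaluations, which is the source of the low device-side overhead such schemes target.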
Architecture and Operational Dynamics for Enabling Symbiosis and Evolution of Network Modalities
ZHANG Huifeng, HU Yuxiang, ZHU Jun, ZOU Tao, HUANGFU Wei, LONG Keping
2026, 48(2): 607-617.   doi: 10.11999/JEIT250949
[Abstract](151) [FullText HTML](89) [PDF 4809KB](20)
Abstract:
  Objective  The paradigm shift toward polymorphic networks enables dynamic deployment of diverse network modalities on shared infrastructure but introduces two fundamental challenges. First, symbiosis complexity arises from the absence of formal mechanisms to orchestrate coexistence conditions, intermodal collaboration, and resource efficiency gains among heterogeneous network modalities, which results in inefficient resource use and performance degradation. Second, evolutionary uncertainty stems from the lack of lifecycle-oriented frameworks to govern triggering conditions (e.g., abrupt traffic surges), optimization objectives (service-level agreement compliance and energy efficiency), and transition paths (e.g., seamless migration from IPv6 to GEO-based modalities) during network modality evolution, which constrains adaptive responses to vertical industry demands such as vehicular networks and smart manufacturing. This study aims to establish a theoretical and architectural foundation to address these gaps by proposing a three-plane architecture that supports dynamic coexistence and evolution of polymorphic networks with deterministic service-level agreement guarantees.  Methods  The architecture decouples network operation into four domains: (1) The business domain dynamically clusters services using machine learning according to quality-of-service requirements. (2) The modal domain generates specialized network modalities through software-defined interfaces. (3) The function domain enables baseline capability pooling by atomizing network functions into reusable components. (4) The resource domain supports fine-grained resource scheduling through elementization techniques. The core innovation lies in three synergistic planes: (1) The evolutionary decision plane applies predictive analytics for adaptive selection and optimization of network modalities. (2) The intelligent generation plane orchestrates modality deployment with global resource awareness. 
(3) The symbiosis platform plane dynamically composes baseline capabilities to support modality coexistence.  Results and Discussions  The proposed architecture advances beyond conventional approaches by avoiding virtualization overhead through native deployment of network modalities directly on polymorphic network elements. Resource elementization and capability pooling jointly support efficient cross-modality resource sharing. Closed-loop interactions among the decision, generation, and symbiosis planes enable autonomous network evolution that adapts to time-varying service demands under unified control objectives.  Conclusions  A theoretically grounded framework is presented to support dynamic symbiosis of heterogeneous network modalities on shared infrastructure through business-driven decision mechanisms and autonomous evolution. The architecture provides a scalable foundation for future systems that integrate artificial intelligence. Future work will extend this paradigm to integrated 6G satellite-terrestrial scenarios, where spatial-temporal resource complementarity is expected to play a central role.
A Deception Jamming Discrimination Algorithm Based on Phase Fluctuation for Airborne Distributed Radar System
LÜ Zhuoyu, YANG Chao, SUO Chengyu, WEN Cai
2026, 48(2): 618-629.   doi: 10.11999/JEIT240787
[Abstract](188) [FullText HTML](118) [PDF 7983KB](24)
Abstract:
  Objective   Deception jamming in airborne distributed radar systems presents a crucial challenge, as false echoes generated by Digital Radio Frequency Memory (DRFM) devices tend to mimic true target returns in amplitude, delay, and Doppler characteristics. These similarities complicate target recognition and subsequently degrade tracking accuracy. To address this problem, attention is directed to phase fluctuation signatures, which differ inherently between authentic scattering responses and synthesized interference replicas. Leveraging this distinction is proposed as a means of improving discrimination reliability under complex electromagnetic confrontation conditions.  Methods   A signal-level fusion discrimination algorithm is proposed based on phase fluctuation variance. Five categories of synchronization errors that affect the phase of received echoes are analyzed and corrected, including filter mismatch, node position errors, and equivalent amplitude-phase deviations. Precise matched filters are constructed through a fine-grid iterative search to eliminate residual phase distortion caused by limited sampling resolution. Node position errors are estimated using a DRFM-based calibration array, and equivalent amplitude-phase deviations are corrected through an eigendecomposition-based procedure. After calibration, phase vectors associated with target returns are extracted, and the variance of these vectors is taken as the discrimination criterion. Authentic targets present large phase fluctuations due to complex scattering, whereas DRFM-generated replicas exhibit only small variations.  Results and Discussions   Simulation results show that the proposed method achieves reliable discrimination under typical airborne distributed radar conditions. When the signal-to-noise ratio is 25 dB and the jamming-to-noise ratio is 3 dB, the misjudgment rate for false targets approaches zero when more than five receiving nodes are used (Fig. 10, Fig. 11). 
The method remains robust even when only a few false targets are present and performs better than previously reported approaches, where discrimination fails in single- or dual-false-target scenarios (Fig. 14). High recognition stability is maintained across different jamming-to-noise ratios and receiver quantities (Fig. 13). The importance of system-level error correction is confirmed, as discrimination accuracy declines significantly when synchronization errors are not compensated (Fig. 12).  Conclusions   A phase-fluctuation-based discrimination algorithm for airborne distributed radar systems is presented. By correcting system-level errors and exploiting the distinct fluctuation behavior of phase signatures from real and false echoes, the method achieves reliable deception-jamming discrimination in complex electromagnetic environments. Simulations indicate stable performance under varying numbers of false targets, demonstrating good applicability for distributed configurations. Future work will aim to enhance robustness under stronger environmental noise and clutter.
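The discrimination criterion, i.e. the variance of the phase vector observed across receiving nodes, can be demonstrated on synthetic echoes. The signal models and the decision threshold below are assumptions for illustration, not the paper's calibrated values.

```python
import cmath, random, statistics

def phase_variance(echoes):
    """Variance of the echo phases observed across receiving nodes."""
    return statistics.pvariance([cmath.phase(e) for e in echoes])

random.seed(0)
nodes = 8
# A real target's complex scattering yields node-dependent phases...
true_echo = [cmath.exp(1j * random.uniform(-cmath.pi, cmath.pi))
             for _ in range(nodes)]
# ...whereas a DRFM-generated replica repeats nearly the same phase everywhere.
false_echo = [cmath.exp(1j * (0.3 + random.gauss(0, 0.01)))
              for _ in range(nodes)]

THRESHOLD = 0.1  # assumed decision threshold, for illustration only

def is_true_target(echoes):
    return phase_variance(echoes) > THRESHOLD
```

After the system-level error corrections described above, the residual phase spread of a replica stays near the noise floor, so a simple variance threshold separates the two populations.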
Robust Resource Allocation Algorithm for Active Reconfigurable Intelligent Surface-Assisted Symbiotic Secure Communication Systems
MA Rui, LI Yanan, TIAN Tuanwei, LIU Shuya, DENG Hao, ZHANG Jinlong
2026, 48(2): 630-639.   doi: 10.11999/JEIT250811
[Abstract](164) [FullText HTML](66) [PDF 3115KB](37)
Abstract:
  Objective  Research on Reconfigurable Intelligent Surface (RIS)-assisted symbiotic radio systems is mainly centered on passive RIS. In practice, passive RIS suffers from a pronounced double-fading effect, which restricts capacity gains in scenarios dominated by strong direct paths. This work examines the use of active RIS, whose amplification capability increases the signal-to-noise ratio of the secondary signal and strengthens the security of the primary signal. Imperfect Successive Interference Cancellation (SIC) is considered, and a penalized Successive Convex Approximation (SCA) algorithm based on alternating optimization is analyzed to enable robust resource allocation.  Methods  The original optimization problem is difficult to address directly because it contains complex and non-convex constraints. An alternating optimization strategy is therefore adopted to decompose the problem into two subproblems: the design of the transmit beamforming vector at the primary transmitter and the design of the reflection coefficient matrix at the active RIS. Variable substitution, equivalent transformation, and a penalty-based SCA method are then applied in an alternating iterative manner. For the beamforming design, the rank-one constraint is first transformed into an equivalent form. The penalty-based SCA method is used to recover the rank-one optimal solution, after which iterative optimization is carried out to obtain the final result. For the reflection coefficient matrix design, the problem is reformulated and auxiliary variables are introduced to avoid feasibility issues. A penalty-based SCA approach is then used to handle the rank-one constraint, and the solution is obtained using the CVX toolbox. Based on these procedures, a penalty-driven robust resource allocation algorithm is established through alternating optimization.  
Results and Discussions  The convergence curves of the proposed algorithm under different numbers of primary transmitter antennas (K) and RIS reflecting elements (N) are shown (Fig. 3). The total system power consumption decreases as the number of iterations increases and converges within a finite number of steps. The relationship between total power consumption and the Signal-to-Interference-and-Noise Ratio (SINR) threshold of the secondary signal is illustrated (Fig. 4). As the SINR threshold increases, the system requires more power to maintain the minimum service quality of the secondary signal, which results in higher total power consumption. In addition, as the imperfect interference cancellation factor decreases, the total power consumption is further reduced. To compare performance, three baseline algorithms are examined (Fig. 5): the passive RIS, the active RIS with random phase shift, and the non-robust algorithm. The total system power consumption under the proposed algorithm remains lower than that of the passive RIS and the active RIS with random phase shift. Although the active RIS consumes additional power, the corresponding reduction in transmit power more than compensates for this consumption, thereby improving overall energy efficiency. When random phase shifts are applied, the active beamforming and amplification capabilities of the RIS cannot be fully utilized. This forces the primary transmitter to compensate alone to meet performance constraints, which increases its power consumption. Furthermore, because imperfect SIC is considered in the proposed algorithm, additional transmit power is required to counter residual interference and satisfy the minimum SINR constraint of the secondary system. Therefore, the total power consumption remains higher than that of the non-robust algorithm. 
The effect of the secrecy rate threshold of the primary signal on the secure energy efficiency of the primary system under different values of N is shown (Fig. 6). The results indicate that an optimal secrecy rate threshold exists that maximizes the secure energy efficiency of the primary system. To investigate the effect of active RIS placement on total system power consumption, the node positions are rearranged (Fig. 7). As the active RIS is positioned closer to the receiver, the fading effect weakens and the total system power consumption decreases.  Conclusions  This paper investigates the total power consumption of an active RIS-assisted symbiotic secure communication system under imperfect SIC. To enhance system energy efficiency, a total power minimization problem is formulated with constraints on the quality of service for both primary and secondary signals and on the power and phase shift of the active RIS. To address the non-convexity introduced by uncertain disturbance parameters, variable substitution, equivalent transformation, and a penalty-based SCA method are applied to convert the original formulation into a convex optimization problem. Simulation results confirm the effectiveness of the proposed algorithm and show that it achieves a notable reduction in total system power consumption compared with benchmark schemes.
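The alternating-optimization structure described in the Methods, fixing one variable block while solving a convex subproblem in the other, can be sketched on a toy biconvex objective. The objective, closed-form subproblem solutions, and iteration count below are invented for illustration; the paper's actual subproblems involve semidefinite relaxations and penalty-based SCA solved with CVX.

```python
def alternating_minimize(f, x0, y0, argmin_x, argmin_y, iters=50):
    """Skeleton of alternating optimization: fix one block, solve the other."""
    x, y = x0, y0
    for _ in range(iters):
        x = argmin_x(y)  # subproblem 1 (cf. transmit beamforming design)
        y = argmin_y(x)  # subproblem 2 (cf. reflection-coefficient design)
    return x, y, f(x, y)

# Toy biconvex objective: convex in x for fixed y and vice versa.
f = lambda x, y: (x * y - 4) ** 2 + x ** 2 + y ** 2
argmin_x = lambda y: 4 * y / (y ** 2 + 1)  # closed-form convex subproblem
argmin_y = lambda x: 4 * x / (x ** 2 + 1)

x_opt, y_opt, f_opt = alternating_minimize(f, 1.0, 1.0, argmin_x, argmin_y)
# Iterates converge to x = y = sqrt(3), with objective value 7.
```

Each outer iteration can only decrease the objective, which is why such schemes converge within a finite number of steps, matching the convergence behavior reported for the proposed algorithm.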
Research on Directional Modulation Multi-carrier Waveform Design for Integrated Sensing and Communication
HUANG Gaojian, ZHANG Shengzhuang, DING Yuan, LIAO Kefei, JIN Shuanggen, LI Xingwang, OUYANG Shan
2026, 48(2): 640-650.   doi: 10.11999/JEIT250680
[Abstract](228) [FullText HTML](101) [PDF 3877KB](38)
Abstract:
  Objective  With the concurrent evolution of wireless communication and radar technologies, spectrum congestion has become increasingly severe. Integrated Sensing and Communication (ISAC) has emerged as an effective approach that unifies sensing and communication functionalities to achieve efficient spectrum and hardware sharing. Orthogonal Frequency Division Multiplexing (OFDM) signals are regarded as a key candidate waveform due to their high flexibility. However, estimating target azimuth angles and suppressing interference from non-target directions remain computationally demanding, and confidential information transmitted in these directions is vulnerable to eavesdropping. To address these challenges, the combination of Directional Modulation (DM) and OFDM, termed OFDM-DM, provides a promising solution. This approach enables secure communication toward the desired direction, suppresses interference in other directions, and reduces radar signal processing complexity. The potential of OFDM-DM for interference suppression and secure waveform design is investigated in this study.  Methods  As a physical-layer security technique, DM is used to preserve signal integrity in the intended direction while deliberately distorting signals in other directions. Based on this principle, an OFDM-DM ISAC waveform is developed to enable secure communication toward the target direction while simultaneously estimating distance, velocity, and azimuth angle. The proposed waveform has two main advantages: the Bit Error Rate (BER) at the radar receiver is employed for simple and adjustable azimuth estimation, and interference from non-target directions is suppressed without additional computational cost. The waveform maintains the OFDM constellation in the target direction while distorting constellation points elsewhere, which reduces correlation with the original signal and enhances target detection through time-domain correlation. 
Moreover, because element-wise complex division in the Two-Dimensional Fast Fourier Transform (2-D FFT) depends on signal integrity, phase distortion in signals from non-target directions disrupts phase relationships and further diminishes the positional information of interference sources.  Results and Discussions  In the OFDM-DM ISAC system, the transmitted signal retains its communication structure within the target beam, whereas constellation distortion occurs in other directions. Therefore, the BER at the radar receiver exhibits a pronounced main lobe in the target direction, enabling accurate azimuth estimation (Fig. 5). In the time-domain correlation algorithm, the target distance is precisely determined, while correlation in non-target directions deteriorates markedly due to DM, thereby achieving effective interference suppression (Fig. 6). Additionally, during 2-D FFT processing, signal distortion disrupts the linear phase relationship among modulation symbols in non-target directions, causing conventional two-dimensional spectral estimation to fail and further suppressing positional information of interference sources (Fig. 7). Additional simulations yield one-dimensional range and velocity profiles (Fig. 8). The results demonstrate that the OFDM-DM ISAC waveform provides structural flexibility, physical-layer security, and low computational complexity, making it particularly suitable for environments requiring high security or operating under strong interference conditions.  Conclusions  This study proposes an OFDM-DM ISAC waveform and systematically analyzes its advantages in both sensing and communication. The proposed waveform inherently suppresses interference from non-target directions, eliminating target ambiguity commonly encountered in traditional ISAC systems and thereby enhancing sensing accuracy. 
Owing to the spatial selectivity of DM, only legitimate directions can correctly demodulate information, whereas unintended directions fail to recover valid data, achieving intrinsic physical-layer security. Compared with existing methods, the proposed waveform simultaneously attains secure communication and interference suppression without additional computational burden, offering a lightweight and high-performance solution suitable for resource-constrained platforms. Therefore, the OFDM-DM ISAC waveform enables high-precision sensing while maintaining communication security and hardware feasibility, providing new insights for multi-carrier ISAC waveform design.
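The claim that element-wise division followed by FFT processing fails for DM-distorted directions can be demonstrated in one dimension. This sketch uses a pure-Python inverse DFT instead of the 2-D FFT, and assumed values for the subcarrier count, delay bin, and QPSK alphabet; it is an illustration of the mechanism, not the paper's processing chain.

```python
import cmath, random

random.seed(1)

def idft(x):
    """Inverse-DFT magnitude scan used as a toy range profile."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(2j * cmath.pi * i * k / n)
                    for k in range(n))) for i in range(n)]

N, delay_bin = 64, 5
tx = [random.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) for _ in range(N)]  # QPSK
# Target-direction echo: each subcarrier acquires a linear delay phase ramp.
rx_target = [s * cmath.exp(-2j * cmath.pi * delay_bin * k / N)
             for k, s in enumerate(tx)]
# Non-target direction: DM scrambles the per-symbol phases.
rx_jammed = [r * cmath.exp(1j * random.uniform(-cmath.pi, cmath.pi))
             for r in rx_target]

def range_profile(rx):
    # Element-wise division strips the payload, leaving only the delay phase.
    return idft([r / t for r, t in zip(rx, tx)])

profile = range_profile(rx_target)
peak = profile.index(max(profile))  # peak lands on delay_bin for the intact echo
```

For the intact echo the division leaves a clean linear phase ramp, so the profile peaks exactly at the delay bin; for the DM-distorted echo the ramp is destroyed and the energy spreads across all bins, suppressing the interferer's positional information.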
Adaptive Cache Deployment Based on Congestion Awareness and Content Value in LEO Satellite Networks
LIU Zhongyu, XIE Yaqin, ZHANG Yu, ZHU Jianyue
2026, 48(2): 651-661.   doi: 10.11999/JEIT250670
[Abstract](195) [FullText HTML](122) [PDF 3794KB](26)
Abstract:
  Objective  Low Earth Orbit (LEO) satellite networks are central to future space-air-ground integrated systems, offering global coverage and low-latency communication. However, their high-speed mobility leads to rapidly changing topologies, and strict onboard cache constraints hinder efficient content delivery. Existing caching strategies often overlook real-time network congestion and content attributes (e.g., freshness), which leads to inefficient resource use and degraded Quality of Service (QoS). To address these limitations, we propose an adaptive cache placement strategy based on congestion awareness. The strategy dynamically couples real-time network conditions, including link congestion and latency, with a content value assessment model that incorporates both popularity and freshness. This integrated approach enhances cache hit rates, reduces backhaul load, and improves user QoS in highly dynamic LEO satellite environments, enabling efficient content delivery even under fluctuating traffic demands and resource constraints.  Methods  The proposed strategy combines a dual-threshold congestion detection mechanism with a multi-dimensional content valuation model. It proceeds in three steps. First, satellite nodes monitor link congestion in real time using dual latency thresholds and relay congestion status to downstream nodes through data packets. Second, a two-dimensional content value model is constructed that integrates popularity and freshness. Popularity is updated dynamically using an Exponential Weighted Moving Average (EWMA), which balances historical and recent request patterns to capture temporal variations in demand. Freshness is evaluated according to the remaining data lifetime, ensuring that expired or near-expired content is deprioritized to maintain cache efficiency and relevance. Third, caching thresholds are adaptively adjusted according to congestion level, and a hop count control factor is introduced to guide caching decisions. 
This coordinated mechanism enables the system to prioritize high-value content while mitigating congestion, thereby improving overall responsiveness and user QoS.  Results and Discussions  Simulations conducted on ndnSIM demonstrate the superiority of the proposed strategy over PaCC (Popularity-Aware Closeness-based Caching), LCE (Leave Copy Everywhere), LCD (Leave Copy Down), and Prob (probability-based caching with probability = 0.5). The key findings are as follows. (1) Cache hit rate. The proposed strategy consistently outperforms conventional methods. As shown in Fig. 8, the cache hit rate rises markedly with increasing cache capacity and Zipf parameter, exceeding those of LCE, LCD, and Prob. Specifically, the proposed strategy achieves improvements of 43.7% over LCE, 25.3% over LCD, 17.6% over Prob, and 9.5% over PaCC. Under high content concentration (i.e., larger Zipf parameters), the improvement reaches 29.1% compared with LCE, highlighting the strong capability of the strategy in promoting high-value content distribution. (2) Average routing hop ratio. The proposed strategy also reduces routing hops compared with the baselines. As shown in Fig. 9, the average hop ratio decreases as cache capacity and Zipf parameter increase. Relative to PaCC, the proposed strategy lowers the average hop ratio by 2.24%, indicating that content is cached closer to users, thereby shortening request paths and improving routing efficiency. (3) Average request latency. The proposed strategy achieves consistently lower latency than all baseline methods. As summarized in Table 2 and Fig. 10, the reduction is more pronounced under larger cache capacities and higher Zipf parameters. For instance, with a cache capacity of 100 MB, latency decreases by approximately 2.9%, 5.8%, 9.0%, and 10.3% compared with PaCC, Prob, LCD, and LCE, respectively. When the Zipf parameter is 1.0, latency reductions reach 2.7%, 5.7%, 7.2%, and 8.8% relative to PaCC, Prob, LCD, and LCE, respectively. 
Concretely, under a cache capacity of 100 MB and Zipf parameter of 1.0, the average request latency of the proposed strategy is 212.37 ms, compared with 236.67 ms (LCE), 233.45 ms (LCD), 225.42 ms (Prob), and 218.62 ms (PaCC).  Conclusions  This paper presents a congestion-aware adaptive caching placement strategy for LEO satellite networks. By combining real-time congestion monitoring with multi-dimensional content valuation that considers both dynamic popularity and freshness, the strategy achieves balanced improvements in caching efficiency and network stability. Simulation results show that the proposed method markedly enhances cache hit rates, reduces average routing hops, and lowers request latency compared with existing schemes such as PaCC, Prob, LCD, and LCE. These benefits hold across different cache sizes and request distributions, particularly under resource-constrained or highly dynamic conditions, confirming the strategy’s adaptability to LEO environments. The main innovations include a closed-loop feedback mechanism for congestion status, dynamic adjustment of caching thresholds, and hop-aware content placement, which together improve resource utilization and user QoS. This work provides a lightweight and robust foundation for high-performance content delivery in satellite-terrestrial integrated networks. Future extensions will incorporate service-type differentiation (e.g., delay-sensitive vs. bandwidth-intensive services), and orbital prediction to proactively optimize cache migration and updates, further enhancing efficiency and adaptability in 6G-enabled LEO networks.
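The EWMA popularity update, freshness score, and congestion-adjusted admission threshold described in the Methods can be sketched as follows. The weights, threshold, and congestion scaling are assumed values for illustration, not the parameters used in the paper's simulations.

```python
ALPHA = 0.7                 # assumed EWMA weight on historical popularity
W_POP, W_FRESH = 0.6, 0.4   # assumed weights of the two value dimensions

def update_popularity(prev_pop, recent_requests):
    """EWMA update balancing historical and recent request intensity."""
    return ALPHA * prev_pop + (1 - ALPHA) * recent_requests

def freshness(remaining_lifetime, total_lifetime):
    """Fraction of the data lifetime left; expired content scores zero."""
    return max(0.0, remaining_lifetime / total_lifetime)

def content_value(pop, fresh):
    return W_POP * pop + W_FRESH * fresh

def should_cache(value, congestion_level, base_threshold=0.5):
    # Congestion raises the admission threshold, so only higher-value
    # content is cached while links are loaded.
    return value >= base_threshold * (1 + congestion_level)

pop = update_popularity(prev_pop=0.4, recent_requests=0.9)
val = content_value(pop, freshness(80, 100))
decision_idle = should_cache(val, congestion_level=0.0)   # admit when idle
decision_busy = should_cache(val, congestion_level=0.5)   # reject when congested
```

The same content can thus be admitted on an uncongested node and rejected on a congested one, which is the adaptive behavior the dual-threshold mechanism feeds back to downstream satellites.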
A Method for Named Entity Recognition in Military Intelligence Domain Using Large Language Models
LI Yongbin, LIU Lian, ZHENG Jie
2026, 48(2): 662-672.   doi: 10.11999/JEIT250764
[Abstract](216) [FullText HTML](209) [PDF 2592KB](50)
Abstract:
  Objective  Named Entity Recognition (NER) is a fundamental task in information extraction within specialized domains, particularly military intelligence. It plays a critical role in situation assessment, threat analysis, and decision support. However, conventional NER models face major challenges. First, the scarcity of high-quality annotated data in the military intelligence domain is a persistent limitation. Due to the sensitivity and confidentiality of military information, acquiring large-scale, accurately labeled datasets is extremely difficult, which severely restricts the training performance and generalization ability of supervised learning-based NER models. Second, military intelligence requires handling complex and diverse information extraction tasks. The entities to be recognized often possess domain-specific meanings, ambiguous boundaries, and complex relationships, making it difficult for traditional models with fixed architectures to adapt flexibly to such complexity or achieve accurate extraction. This study aims to address these limitations by developing a more effective NER method tailored to the military intelligence domain, leveraging Large Language Models (LLMs) to enhance recognition accuracy and efficiency in this specialized field.  Methods  To achieve the above objective, this study focuses on the military intelligence domain and proposes a NER method based on LLMs. The central concept is to harness the strong semantic reasoning capabilities of LLMs, which enable deep contextual understanding of military texts, accurate interpretation of complex domain-specific extraction requirements, and autonomous execution of extraction tasks without heavy reliance on large annotated datasets. To ensure that general-purpose LLMs can rapidly adapt to the specialized needs of military intelligence, two key strategies are employed. First, instruction fine-tuning is applied. 
Domain-specific instruction datasets are constructed to include diverse entity types, extraction rules, and representative examples relevant to military intelligence. Through fine-tuning with these datasets, the LLMs acquire a more precise understanding of the characteristics and requirements of NER in this field, thereby improving their ability to follow targeted extraction instructions. Second, Retrieval-Augmented Generation (RAG) is incorporated. A domain knowledge base is developed containing expert knowledge such as entity dictionaries, military terminology, and historical extraction cases. During the NER process, the LLM retrieves relevant knowledge from this base in real time to support entity recognition. This strategy compensates for the limited domain-specific knowledge of general LLMs and enhances recognition accuracy, particularly for rare or complex entities.  Results and Discussions  Experimental results indicate that the proposed LLM-based NER method, which integrates instruction fine-tuning and RAG, achieves strong performance in military intelligence NER tasks. Compared with conventional NER models, it demonstrates higher precision, recall, and F1-score, particularly in recognizing complex entities and managing scenarios with limited annotated data. The effectiveness of this method arises from several key factors. The powerful semantic reasoning capability of LLMs enables a deeper understanding of contextual nuances and ambiguous expressions in military texts, thereby reducing missed and false recognitions commonly caused by rigid pattern-matching approaches. Instruction fine-tuning allows the model to better align with domain-specific extraction requirements, ensuring that the recognition results correspond more closely to the practical needs of military intelligence analysis. 
Furthermore, the incorporation of RAG provides real-time access to domain expert knowledge, markedly enhancing the recognition of entities that are highly specialized or morphologically variable within military contexts. This integration effectively mitigates the limitations of traditional models that lack sufficient domain knowledge.  Conclusions  This study proposes an LLM-based NER method for the military intelligence domain, effectively addressing the challenges of limited annotated data and complex extraction requirements encountered by traditional models. By combining instruction fine-tuning and RAG, general-purpose LLMs can be rapidly adapted to the specialized demands of military intelligence, enabling the construction of an efficient domain-specific expert system at relatively low cost. The proposed method provides an effective and scalable solution for NER tasks in military intelligence scenarios, enhancing both the efficiency and accuracy of information extraction in this field. It offers not only practical value for military intelligence analysis and decision support but also methodological insight for NER research in other specialized domains facing similar data and complexity constraints, such as aerospace and national security. Future research will focus on optimizing instruction fine-tuning strategies, expanding the domain knowledge base, and reducing computational cost to further improve model performance and applicability.
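The RAG step, retrieving knowledge-base entries relevant to the input text and prepending them to the extraction instruction, can be sketched with a naive term-overlap retriever. The entity types, knowledge-base entries, and prompt wording are invented for illustration; a real system would use dense retrieval over the domain knowledge base and send the prompt to the fine-tuned LLM.

```python
def retrieve(knowledge_base, text, k=2):
    """Naive retrieval: rank knowledge entries by term overlap with the text."""
    def score(entry):
        return sum(term in text for term in entry["terms"])
    ranked = sorted(knowledge_base, key=score, reverse=True)
    return [e for e in ranked if score(e) > 0][:k]

def build_ner_prompt(text, knowledge_base):
    hits = retrieve(knowledge_base, text)
    context = "\n".join(e["note"] for e in hits)
    return (
        "Instruction: extract named entities (EQUIPMENT, UNIT, LOCATION) "
        "from the intelligence text below; output one entity per line.\n"
        f"Domain knowledge:\n{context}\n"
        f"Text: {text}"
    )

# Tiny stand-in knowledge base; the paper's version holds entity
# dictionaries, military terminology, and historical extraction cases.
kb = [
    {"terms": ["F-35"], "note": "F-35: fifth-generation fighter (EQUIPMENT)."},
    {"terms": ["carrier group"], "note": "carrier group: naval formation (UNIT)."},
]
prompt = build_ner_prompt("An F-35 took off toward the strait.", kb)
```

Only entries that overlap the input text are injected, so the prompt stays compact while still supplying the domain knowledge a general-purpose LLM lacks.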
A Reliable Service Chain Option for Global Migration of Intelligent Twins in Vehicular Metaverses
QIU Xianyi, WEN Jinbo, KANG Jiawen, ZHANG Tao, CAI Chengjun, LIU Jiqiang, XIAO Ming
2026, 48(2): 673-685.   doi: 10.11999/JEIT250612
[Abstract](140) [FullText HTML](71) [PDF 2333KB](13)
Abstract:
  Objective   As an emerging paradigm that integrates metaverses with intelligent transportation systems, vehicular metaverses are becoming a driving force in the transformation of the automotive industry. Within this context, intelligent twins act as digital counterparts of vehicles, covering their entire lifecycle and managing vehicular applications to provide immersive services. However, seamless migration of intelligent twins across RoadSide Units (RSUs) faces challenges such as excessive transmission delays and data leakage, particularly under cybersecurity threats like Distributed Denial of Service (DDoS) attacks. To address these issues, this paper proposes a globally optimized scheme for secure and dynamic intelligent twin migration based on RSU chains. The proposed approach mitigates transmission latency and enhances network security, ensuring that intelligent twins can be migrated reliably and securely through RSU chains even in the presence of multiple types of DDoS attacks.  Methods   A set of reliable RSU chains is first constructed using a communication interruption-free mechanism, which enables the rational deployment of intelligent twins for seamless RSU connectivity. This mechanism ensures continuous communication by dynamically reconfiguring RSU chains according to real-time network conditions and vehicle mobility. The secure migration of intelligent twins along these RSU chains is then formulated as a Partially Observable Markov Decision Process (POMDP). The POMDP framework incorporates dynamic network state variables, including RSU load, available bandwidth, computational capacity, and attack type. These variables are continuously monitored to support decision-making. Migration efficiency and security are evaluated based on total migration delay and the number of DDoS attacks encountered; these metrics serve as reward functions for optimization. 
Deep Reinforcement Learning (DRL) agents iteratively learn from their interactions with the environment, refining RSU chain selection strategies to maximize both security and efficiency. Through this algorithm, the proposed scheme mitigates excessive transmission delays caused by network attacks in vehicular metaverses, ensuring reliable and secure intelligent twin migration even under diverse DDoS attack scenarios.  Results and Discussions   The proposed secure dynamic intelligent twin migration scheme employs the Multi-Agent Deep Reinforcement Learning (MADRL) framework to select efficient and secure RSU chains within the POMDP. By defining a suitable reward function, the efficiency and security of intelligent twin migration are evaluated under varying RSU chain lengths and different attack scenarios. Simulation results confirm that the scheme enhances migration security in vehicular metaverses. Shorter RSU chains yield lower migration delays than longer ones, owing to reduced handovers and lower communication overhead (Fig. 2). Additionally, the total reward reaches its maximum when the RSU chain length is 6 (Fig. 3). The Multi-Agent Deep Q-Network (MADQN) approach exhibits strong defense capabilities against DDoS attacks. Under direct attacks, MADQN achieves final rewards that are 65.3% and 51.8% higher than those obtained by random and greedy strategies, respectively. Against indirect attacks, MADQN improves performance by 9.3%. Under hybrid attack conditions, MADQN increases the final reward by 29% and 30.9% compared with the random and greedy strategies, respectively (Fig. 4), demonstrating the effectiveness of the DRL-based defense strategy in handling complex and dynamic threats. Experimental comparisons with other DRL algorithms, including PPO, A2C, and QR-DQN, further highlight the superiority of MADQN under direct, indirect, and hybrid DDoS attacks (Figs. 5–7). 
Overall, the proposed scheme ensures reliable and efficient intelligent twin migration across RSUs even under diverse security threats, thereby supporting high-quality interactions in vehicular metaverses.  Conclusions   This study addresses the challenge of secure and efficient global migration of intelligent twins in vehicular metaverses by integrating RSU chains with a POMDP-based optimization framework. Using the MADQN algorithm, the proposed scheme improves both the efficiency and security of intelligent twin migration under diverse network conditions and attack scenarios. Simulation results confirm significant gains in performance. Along identical driving routes, shorter RSU chains provide higher migration efficiency and stronger defense capabilities. Under various types of DDoS attacks, MADQN consistently outperforms baseline strategies, achieving higher final rewards than random and greedy approaches across all scenarios. Compared with other DRL algorithms, MADQN increases the final reward by up to 50.1%, demonstrating superior adaptability in complex attack environments. Future work will focus on enhancing the communication security of RSU chains, including the development of authentication mechanisms to ensure that only authorized vehicles can access RSU edge communication networks.
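The reward structure described above, which jointly penalizes migration delay and the number of DDoS attacks encountered, can be sketched as follows. The weights and candidate chains are invented for illustration; in the paper a trained MADQN agent learns this selection under partial observability rather than enumerating candidates.

```python
def chain_reward(delays, attacks, w_delay=1.0, w_attack=5.0):
    """Reward for one RSU chain: per-hop delays and per-hop attack
    indicators are penalized, so shorter, safer chains score higher.
    Weights are illustrative assumptions."""
    return -(w_delay * sum(delays) + w_attack * sum(attacks))

def select_chain(chains):
    """Pick the candidate chain with the highest reward; a DRL agent
    would learn this policy instead of evaluating every candidate."""
    return max(chains, key=lambda c: chain_reward(c["delays"], c["attacks"]))

# Two hypothetical chains: a short clean one and a long one with two
# attacked hops, echoing the finding that shorter chains migrate faster.
candidates = [
    {"id": "short", "delays": [2, 3], "attacks": [0, 0]},
    {"id": "long", "delays": [1, 1, 1, 1, 1, 1], "attacks": [0, 1, 0, 0, 1, 0]},
]
best = select_chain(candidates)
```

With these toy numbers the short chain wins, consistent with the observation that fewer handovers reduce both delay and exposure.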
A Polymorphic Network Backend Compiler for Domestic Switching Chips
TU Huaqing, WANG Yuanhong, XU Qi, ZHU Jun, ZOU Tao, LONG Keping
2026, 48(2): 686-696.   doi: 10.11999/JEIT250132
[Abstract](188) [FullText HTML](120) [PDF 5905KB](25)
Abstract:
  Objective  The P4 language and programmable switching chips offer a feasible approach for deploying polymorphic networks. However, polymorphic network programs written in P4 cannot be directly executed on the domestically produced TsingMa.MX programmable switching chip developed by Centec, which necessitates the design of a specialized compiler to translate and deploy the P4 language on this chip. Existing backend compilers are mainly designed and optimized for software-programmable switches such as BMv2, FPGAs, and Intel Tofino series chips, rendering them unsuitable for compiling polymorphic network programs for the TsingMa.MX chip. To resolve this limitation, a backend compiler named p4c-TsingMa is proposed for the TsingMa.MX switching chip. This compiler enables the translation of high-level network programming languages into executable formats for the TsingMa.MX chip, thereby supporting the concurrent parsing and forwarding of multiple network modal packets.  Methods  p4c-TsingMa first employs a preorder traversal approach to extract key information, including protocol types, protocol fields, and actions, from the Intermediate Representation (IR). It then performs instruction translation to generate corresponding control commands for the TsingMa.MX chip. Additionally, p4c-TsingMa adopts a User Defined Field (UDF) entry merging method to consolidate matching instructions from different network modalities into a unified lookup table. This design enables the extraction of multiple modal matching entries in a single operation, thereby enhancing chip resource utilization.  Results and Discussions  The p4c-TsingMa compiler is implemented in C++, mapping network modal programs written in the P4 language into configuration instructions for the TsingMa.MX switching chip. A polymorphic network packet testing environment (Fig. 6) is established, where multiple types of network data packets are simultaneously transmitted to the same switch port. 
According to the configured flow tables, the chip successfully identifies polymorphic network data packets and forwards them to their corresponding ports (Fig. 8). Additionally, the table entry merging algorithm improves register resource utilization by 37.5% to 75%, enabling the chip to process more than two types of modal data packets concurrently.  Conclusions  A polymorphic network backend compiler, p4c-TsingMa, is designed specifically for domestic switching chips. By utilizing the FlexParser and FlexEdit functions of the TsingMa chip, the compiler translates polymorphic network programs into executable commands for the TsingMa.MX chip, enabling the chip to parse and modify polymorphic data packets. Experimental results demonstrate that p4c-TsingMa achieves high compilation efficiency and improves register resource utilization by 37.5% to 75%.
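The UDF entry-merging idea can be sketched in a few lines, assuming each match field is identified by an (offset, length) window in the packet header. The modality names and field layouts below are hypothetical and not taken from the compiler; the point is that windows shared across modalities occupy a single UDF slot instead of one slot per modality, which is the source of the reported register savings.

```python
def merge_udf_entries(modal_entries):
    """modal_entries: {modality: [(offset, length), ...]}.
    Returns the deduplicated slot table and, per modality, the indices
    of its fields in that shared table."""
    slots = []   # unified lookup table of unique (offset, length) windows
    index = {}   # modality -> list of slot indices
    for modality, fields in modal_entries.items():
        index[modality] = []
        for field in fields:
            if field not in slots:
                slots.append(field)          # allocate a new UDF slot
            index[modality].append(slots.index(field))
    return slots, index

# Three hypothetical modalities with overlapping match windows.
entries = {
    "ipv4":     [(0, 4), (12, 4)],
    "geo":      [(0, 4), (8, 8)],
    "identity": [(12, 4)],
}
slots, index = merge_udf_entries(entries)
# Five per-modality fields collapse into three shared UDF slots.
```

The ratio of per-modality fields to shared slots is what drives the 37.5% to 75% register-utilization improvement reported for the real merging algorithm.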
Overviews
A Review on Phase Rotation and Beamforming Scheme for Intelligent Reflecting Surface Assisted Wireless Communication Systems
XING Zhitong, LI Yun, WU Guangfu, XIA Shichao
2026, 48(2): 697-712.   doi: 10.11999/JEIT250790
[Abstract](287) [FullText HTML](144) [PDF 2579KB](47)
Abstract:
  Objective  Since the large-scale commercial deployment of 5G networks in 2020 and the continued development of 6G technology, modern communication systems need to function under increasingly complex channel conditions. These include ultra-high-density urban environments and remote areas such as oceanic regions, deserts, and forests. To meet these challenges, low-energy solutions capable of dynamically adjusting and reconfiguring wireless channels are required. Such solutions would improve transmission performance by lowering latency, increasing data rates, and strengthening signal reception, and would support more efficient deployment in demanding environments. The Intelligent Reflecting Surface (IRS) has gained attention as a promising approach for reshaping channel conditions. Unlike traditional active relays, an IRS operates passively and adds minimal energy consumption. When integrated with communication architectures such as Single Input Single Output (SISO), Multiple Input Single Output (MISO), and Multiple Input Multiple Output (MIMO), an IRS can improve transmission efficiency, reduce power consumption, and enhance adaptability in complex scenarios. This paper reviews IRS-assisted communication systems, with emphasis on signal transmission models, beamforming methods, and phase-shift optimization strategies.  Methods  This review examines IRS technology in modern communication systems by analyzing signal transmission models across three fundamental configurations. The discussion begins with IRS-assisted SISO systems, in which IRS control of incident signals through reflection and phase shifting improves single-antenna communication by mitigating traditional propagation constraints. The analysis then extends to MISO and MIMO architectures, where the relationship between IRS phase adjustments and MIMO precoding is assessed to determine strategies that support high spectral efficiency. 
Based on these transmission models, this review surveys joint optimization and precoding methods tailored for IRS-enhanced MIMO systems. These algorithms can be grouped into four categories that meet different operational requirements. The first aims to minimize power consumption by reducing total energy use while maintaining acceptable communication quality, which is important for energy-sensitive applications such as IoT systems and green communication scenarios. The second seeks to maximize energy efficiency by optimizing the ratio of achievable data rate to power consumption rather than lowering energy use alone, thereby improving performance per unit of energy. The third focuses on maximizing the sum rate by increasing aggregated throughput across users to strengthen overall system capacity in high-density 5G and 6G environments. The fourth prioritizes fairness-aware rate maximization by applying resource allocation methods that ensure equitable bandwidth distribution among users while sustaining high Quality of Service (QoS). Together, these optimization approaches provide a framework for advancing IRS-assisted MIMO systems and allow engineers and researchers to balance performance, energy efficiency, and user fairness according to specific application needs in next-generation wireless networks.  Results and Discussions  This review shows that IRS-assisted communication systems provide important capabilities for next-generation wireless networks through four major advantages. First, IRS strengthens system performance by reconfiguring propagation environments and improving signal strength and coverage in non-line-of-sight conditions, including urban canyons, indoor environments, and remote regions, while also maintaining reliable connectivity in high-mobility cases such as vehicular communication. Second, the technology supports high energy efficiency because of its passive operation, which adds minimal power overhead yet improves spectral efficiency. 
This characteristic is valuable for sustainable large-scale IoT deployments and green 6G systems that may incorporate energy-harvesting designs. Third, IRS shows strong adaptability when integrated with different communication architectures, including SISO for basic signal enhancement, MISO for improved beamforming, and MIMO for spatial multiplexing, enabling use across environments ranging from ultra-dense urban networks to remote or airborne communication platforms. Finally, recent progress in beamforming and phase-shift optimization strengthens system performance through coherent signal combining, interference suppression in multi-user settings, and low-latency operation for time-critical applications. Machine learning methods such as deep reinforcement learning are also being investigated for real-time optimization. Together, these capabilities position IRS as a key technology for future 6G networks with the potential to support smart radio environments and broad-area connectivity, although further study is required to address challenges in channel estimation, scalability, and standardization.  Conclusions  This review highlights the potential of IRS technology in next-generation wireless communication systems. By enabling dynamic channel reconfiguration with minimal energy overhead, IRS strengthens the performance of SISO, MISO, and MIMO systems and supports reliable operation in complex propagation environments. The surveyed signal transmission models and optimization methods form a technical basis for continued development of IRS-assisted communication frameworks. As research and industry move toward 6G, IRS is expected to support ultra-reliable, low-latency, and energy-efficient global connectivity. Future studies should address practical deployment challenges such as hardware design, real-time signal processing, and progress toward standardization.
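The benefit of phase-shift optimization in the simplest IRS-assisted SISO setting can be verified numerically. The sketch below assumes unit-gain cascaded channels (a toy assumption): setting each element's phase to cancel the cascaded channel phase makes all N reflected paths add coherently, so the received amplitude grows linearly with N instead of like a random-phase sum.

```python
import cmath
import random

def received_gain(h, g, phases):
    """|sum_n h_n * e^{j*theta_n} * g_n| for N reflecting elements,
    where h_n and g_n are the BS-IRS and IRS-user channel coefficients."""
    return abs(sum(hn * cmath.exp(1j * p) * gn
                   for hn, p, gn in zip(h, phases, g)))

random.seed(0)
N = 64  # number of IRS elements (illustrative)
h = [cmath.exp(1j * random.uniform(0, 2 * cmath.pi)) for _ in range(N)]
g = [cmath.exp(1j * random.uniform(0, 2 * cmath.pi)) for _ in range(N)]

# Optimal SISO phase: cancel the cascaded phase of each path.
aligned = [-(cmath.phase(hn) + cmath.phase(gn)) for hn, gn in zip(h, g)]
coherent = received_gain(h, g, aligned)       # equals N for unit-gain paths
incoherent = received_gain(h, g, [0.0] * N)   # unoptimized phases
```

This N-fold coherent-combining gain, obtained with purely passive elements, is the basic mechanism behind the performance and energy-efficiency advantages discussed above.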
A Survey of Lightweight Techniques for Segment Anything Model
LUO Yichang, QI Xiyu, ZHANG Borui, SHI Hanru, ZHAO Yan, WANG Lei, LIU Shixiong
2026, 48(2): 713-731.   doi: 10.11999/JEIT250894
[Abstract](535) [FullText HTML](285) [PDF 3802KB](85)
Abstract:
  Objective  The Segment Anything Model (SAM) demonstrates strong zero-shot generalization in image segmentation and sets a new direction for visual foundation models. The original SAM, especially the ViT-Huge version with about 637 million parameters, requires high computational resources and substantial memory. This restricts deployment in resource-limited settings such as mobile devices, embedded systems, and real-time tasks. Growing demand for efficient and deployable vision models has encouraged research on lightweight variants of SAM. Existing reviews describe applications of SAM, yet a structured summary of lightweight strategies across model compression, architectural redesign, and knowledge distillation is still absent. This review addresses this need by providing a systematic analysis of current SAM lightweight research, classifying major techniques, assessing performance, and identifying challenges and future research directions for efficient visual foundation models.  Methods  This review examines recent studies on SAM lightweight methods published in leading conferences and journals. The techniques are grouped into three categories based on their technical focus. The first category, Model Compression and Acceleration, covers knowledge distillation, network pruning, and quantization. The second category, Efficient Architecture Design, replaces the ViT backbone with lightweight structures or adjusts attention mechanisms. The third category, Efficient Feature Extraction and Fusion, refines the interaction between the image encoder and prompt encoder. A comparative assessment is conducted for representative studies, considering model size, computational cost, inference speed, and segmentation accuracy on standard benchmarks (Table 3).  Results and Discussions  The reviewed models achieve clear gains in inference speed and parameter efficiency. 
MobileSAM reduces the model to 9.6 M parameters, and Lite-SAM reaches up to 16× acceleration while maintaining suitable segmentation accuracy. Approaches based on knowledge distillation and hybrid design support generalization across domains such as medical imaging, video segmentation, and embedded tasks. Although accuracy and speed still show a degree of tension, the selection of a lightweight strategy depends on the intended application. Challenges remain in prompt design, multi-scale feature fusion, and deployment on low-power hardware platforms.  Conclusions  This review provides an overview of the rapidly developing field of SAM lightweight research. The development of efficient SAM models is a multifaceted challenge that requires a combination of compression, architectural innovation, and optimization strategies. Current studies show that real-time performance on edge devices can be achieved with a small reduction in accuracy. Although progress is evident, challenges remain in handling complex scenarios, reducing the cost of distillation data, and establishing unified evaluation benchmarks. Future research is expected to emphasize more generalizable lightweight architectures, explore data-free or few-shot distillation approaches, and develop standardized evaluation protocols that consider both accuracy and efficiency.
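The encoder-level knowledge distillation that several lightweight SAM variants rely on can be reduced to a toy sketch. The three-dimensional embeddings and plain gradient steps below are illustrative assumptions, not any specific paper's training loop: a small student is pulled toward a frozen teacher's image embeddings under a mean-squared-error objective.

```python
def mse(a, b):
    """Mean squared error between two flat embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_step(student_embed, teacher_embed, lr=0.1):
    """One gradient-descent step on the MSE loss, moving the student
    embedding toward the teacher's; a real variant backpropagates
    through the lightweight student encoder instead."""
    n = len(student_embed)
    return [s - lr * 2 * (s - t) / n
            for s, t in zip(student_embed, teacher_embed)]

teacher = [1.0, -0.5, 0.25]   # frozen ViT-H embedding (toy values)
student = [0.0, 0.0, 0.0]     # lightweight encoder output (toy values)
for _ in range(200):
    student = distill_step(student, teacher)
```

After training, the student reproduces the teacher's embedding closely, which is why distilled encoders such as those in MobileSAM can keep the original mask decoder largely unchanged.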
Wireless Communication and Internet of Things
Ultra-Low-Power IM3 Backscatter Passive Sensing System for IoT Applications
HUANG Ruiyang, WU Pengde
2026, 48(2): 732-742.   doi: 10.11999/JEIT250787
[Abstract](154) [FullText HTML](86) [PDF 9020KB](19)
Abstract:
  Objective  With advances in wireless communication and electronic manufacturing, the Internet of Things (IoT) continues to expand across healthcare, agriculture, logistics, and other sectors. The rapid increase in IoT devices creates significant energy challenges, as billions of units generate substantial cumulative consumption, and battery-powered nodes require recurrent charging that raises operating costs and contributes to electronic waste. Energy-efficient strategies are therefore needed to support sustainable IoT deployment. Current approaches focus on improving energy availability and lowering device power demand. Energy Harvesting (EH) technology enables the collection and storage of solar, thermal, kinetic, and Radio Frequency (RF) energy for Ambient IoT (AmIoT) applications. However, conventional IoT devices, particularly those containing active RF components, often require high power, and limited EH efficiency can constrain real-time sensing transmission. To address these constraints, this work proposes a Third-Order Intermodulation (IM3) backscatter passive sensing system that enables direct analog sensing transmission while maintaining RF EH efficiency.  Methods  The IM3 signal is a nonlinear distortion product generated when two fundamental tones pass through nonlinear devices such as transistors and diodes, producing components at 2f1 − f2 and 2f2 − f1. The central contribution of this work is the establishment of a controllable functional relationship between sensor information and IM3 signal frequencies, enabling information encoding through IM3 frequency characteristics. The regulatory element is an embedded impedance module designed as a parallel resonant tank composed of resistors, inductors, and capacitors and integrated into the rectifier circuit. 
Adjusting the tank’s resonant frequency regulates the conversion efficiency from the fundamental tones to IM3 components: when the resonant frequency approaches a target IM3 frequency, a high-impedance load is produced, lowering the conversion efficiency of that specific IM3 component while leaving other IM3 components unchanged. Sensor information modulates the resonant frequency by generating a DC voltage applied to a voltage-controlled varactor. By mapping sensor information to impedance states, impedance states to IM3 conversion efficiency, and IM3 frequency features back to sensor information, passive sensing is achieved.  Results and Discussions  A rectifying transmitter operating in the UHF 900 MHz band is designed and fabricated (Fig. 8). One signal source is fixed at 910.5 MHz, and the other scans 917~920 MHz, generating IM3 components in the 923.5~929.5 MHz range. Both sources provide an output power of 0 dBm, and the transmitted sensor information is expressed as a DC voltage. Experimental measurements show a power trough in the backscattered IM3 spectrum; as the DC voltage varies from 0 to 5 V, the trough position shifts accordingly (Fig. 9), with more than 10 dB attenuation across the range, giving adequate resolution determined by the varactor diode’s capacitance ratio. The embedded impedance module shows minimal effect on RF-to-DC efficiency (Fig. 10): at a fixed DC voltage, efficiency decreases by approximately 5 percentage points at the modulation frequency, independent of input power, and under fixed input power, different sampled voltages cause about 5 percentage points of efficiency reduction at different frequencies. These results confirm that the rectifier circuit maintains stable efficiency and meets low-power data transmission requirements.  Conclusions  This paper proposes a passive sensing system based on backscattered IM3 signals that enables simultaneous efficient RF EH and sensing readout. 
The regulation mechanism between the difference-frequency embedded impedance module and backscattered IM3 intensity is demonstrated. Driven by sensing information, the module links the sensed quantity to IM3 intensity to realize passive readout. Experimental results show that the embedded impedance reduces the target-frequency IM3 component by more than 10 dB, and the RF-to-DC efficiency decreases by only 5 percentage points during readout. Tests in a microwave anechoic chamber indicate that the error between the IM3-derived bias voltage and the measured value remains within 5%, confirming stable operation. The system addresses the energy-information transmission constraint and supports battery-free communication for passive sensor nodes. It extends device lifespan and reduces maintenance costs in Ultra-Low-Power scenarios such as wireless sensor networks and implantable medical devices, offering strong engineering relevance.
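The frequency arithmetic reported above can be checked directly: with f1 fixed at 910.5 MHz and f2 swept over 917~920 MHz, the upper third-order product 2f2 − f1 spans exactly the stated 923.5~929.5 MHz band.

```python
def im3_products(f1, f2):
    """Return the two third-order intermodulation frequencies
    (2*f1 - f2, 2*f2 - f1) produced by two fundamental tones."""
    return 2 * f1 - f2, 2 * f2 - f1

f1 = 910.5                            # fixed tone, MHz
low = im3_products(f1, 917.0)[1]      # upper IM3 at the sweep's low end
high = im3_products(f1, 920.0)[1]     # upper IM3 at the sweep's high end
```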
Performance Optimization of UAV-RIS-assisted Communication Networks Under No-Fly Zone Constraints
XU Junjie, LI Bin, YANG Jingsong
2026, 48(2): 743-751.   doi: 10.11999/JEIT250681
[Abstract](247) [FullText HTML](123) [PDF 4107KB](32)
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RIS) mounted on Unmanned Aerial Vehicles (UAVs) are considered an effective approach to enhance wireless communication coverage and adaptability in complex or constrained environments. However, two major challenges remain in practical deployment. The existence of No-Fly Zones (NFZs), such as airports, government facilities, and high-rise areas, restricts the UAV flight trajectory and may result in communication blind spots. In addition, the continuous attitude variation of UAVs during flight causes dynamic misalignment between the RIS and the desired reflection direction, which reduces signal strength and system throughput. To address these challenges, a UAV-RIS-assisted communication framework is proposed that simultaneously considers NFZ avoidance and UAV attitude adjustment. In this framework, a quadrotor UAV equipped with a bottom-mounted RIS operates in an environment containing multiple polygonal NFZs and a group of Ground Users (GUs). The aim is to jointly optimize the UAV trajectory, RIS phase shift, UAV attitude (represented by Euler angles), and Base Station (BS) beamforming to maximize the system sum rate while ensuring complete obstacle avoidance and stable, high-quality service for GUs located both inside and outside NFZs.  Methods  To achieve this objective, a multi-variable coupled non-convex optimization problem is formulated, jointly capturing UAV trajectory, RIS configuration, UAV attitude, and BS beamforming under NFZ constraints. The RIS phase shifts are dynamically adjusted according to the UAV orientation to maintain beam alignment, and UAV motion follows quadrotor dynamics while avoiding polygonal NFZs. Because of the high dimensionality and non-convexity of the problem, conventional optimization approaches are computationally intensive and lack real-time adaptability. 
To address this issue, the problem is reformulated as a Markov Decision Process (MDP), which enables policy learning through deep reinforcement learning. The Soft Actor-Critic (SAC) algorithm is employed, leveraging entropy regularization to improve exploration efficiency and convergence stability. The UAV-RIS agent interacts iteratively with the environment, updating actor-critic networks to determine UAV position, RIS phase configuration, and BS beamforming. Through continuous learning, the proposed framework achieves higher throughput and reliable NFZ avoidance, outperforming existing benchmarks.  Results and Discussions  As shown in (Fig. 3), the proposed SAC algorithm achieves higher communication rates than PPO, DDPG, and TD3 during training, benefiting from entropy-regularized exploration that prevents premature convergence. Although DDPG converges faster, it exhibits instability and inferior long-term performance. As illustrated in (Fig. 4), the UAV trajectories under different conditions demonstrate the proposed algorithm’s capability to achieve complete obstacle avoidance while maintaining reliable communication. Regardless of variations in initial UAV positions, BS locations, or NFZ configurations, the UAV consistently avoids all NFZs and dynamically adjusts its trajectory to serve users located both inside and outside restricted zones, indicating strong adaptability and scalability of the proposed model. As shown in (Fig. 5), increasing the number of BS antennas enhances system performance. The proposed framework significantly outperforms fixed phase shift, random phase shift, and non-RIS schemes because of improved beamforming flexibility.  Conclusions  This paper investigates a UAV-RIS-assisted wireless communication system in which a quadrotor UAV carries an RIS to enhance signal reflection and ensure NFZ avoidance. 
Unlike conventional approaches that emphasize avoidance alone, a path integral-based method is proposed to generate obstacle-free trajectories while maintaining reliable service for GUs both inside and outside NFZs. To improve generality, NFZs are represented as prismatic obstacles with regular n-sided polygonal cross-sections. The system jointly optimizes UAV trajectory, RIS phase shifts, UAV attitude, and BS beamforming. A DRL framework based on the SAC algorithm is developed to enhance system efficiency. Simulation results demonstrate that the proposed approach achieves reliable NFZ avoidance and a maximized sum rate, and outperforms benchmarks in communication performance, scalability, and stability.
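The polygonal NFZ constraint can be sketched with a standard ray-casting containment test; the hexagonal zone, its size, and the test coordinates below are illustrative assumptions, not the paper's scenario. A trajectory planner would reject (or heavily penalize) any candidate waypoint for which the test returns true.

```python
import math

def regular_polygon(cx, cy, radius, n):
    """Vertices of a regular n-gon centered at (cx, cy), modeling the
    cross-section of a prismatic NFZ."""
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def inside(px, py, poly):
    """Ray-casting point-in-polygon test: count crossings of a
    horizontal ray from (px, py); an odd count means inside."""
    hit = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > py) != (y2 > py):
            if px < (x2 - x1) * (py - y1) / (y2 - y1) + x1:
                hit = not hit
    return hit

nfz = regular_polygon(0.0, 0.0, 100.0, 6)  # hypothetical hexagonal NFZ
```

In a learning setup, such a check can feed the reward signal (a large penalty inside the zone) so that the SAC agent learns trajectories that keep clear of every NFZ.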
Minimax Robust Kalman Filtering under Multistep Random Measurement Delays and Packet Dropouts
YANG Chunshan, ZHAO Ying, LIU Zheng, QIU Yuan, JING Benqin
2026, 48(2): 752-761.   doi: 10.11999/JEIT250741
[Abstract](90) [FullText HTML](45) [PDF 3214KB](19)
Abstract:
  Objective  Networked Control Systems (NCSs) provide advantages such as flexible installation, convenient maintenance, and reduced cost, but they also present challenges arising from random measurement delays and packet dropouts caused by communication network unreliability and limited bandwidth. Moreover, system noise variance may fluctuate significantly under strong electromagnetic interference. In NCSs, time delays are random and uncertain. When a set of Bernoulli-distributed random variables is used to describe multistep random measurement delays and packet dropouts, the fictitious noise method in existing studies introduces autocorrelation among different components, which complicates the computation of fictitious noise variances and makes it difficult to establish robustness. This study presents a solution for minimax robust Kalman filtering in systems characterized by uncertain noise variance, multistep random measurement delays, and packet dropouts.  Methods  The main challenges lie in model transformation and robustness verification. When a set of Bernoulli-distributed random variables is employed to represent multistep random measurement delays and packet dropouts, a series of strategies are applied to address the minimax robust Kalman filtering problem. First, a new model transformation method is proposed based on the flexibility of the Hadamard product in multidimensional data processing, after which a robust time-varying Kalman estimator is designed in a unified framework following the minimax robust filtering principle. Second, the robustness proof is established using matrix elementary transformation, strictly diagonally dominant matrices, the Geršgorin circle theorem, and the Hadamard product theorem within the framework of the generalized Lyapunov equation method. 
Additionally, by converting the Hadamard product into a matrix product through matrix factorization, a sufficient condition for the existence of a steady-state estimator is derived, and the robust steady-state Kalman estimator is subsequently designed.  Results and Discussions  The proposed minimax robust Kalman filter extends the robust Kalman filtering framework and provides new theoretical support for addressing the robust fusion filtering problem in complex NCSs. The curves (Fig. 5) present the actual accuracy $\mathrm{tr}\,\bar{\mathbf{P}}^{l}(N)$, $l = a,b,c,d$, as a function of $0.1 \le \alpha_0, \alpha_1, \alpha_2 \le 1$. It is observed that situation (1) achieves the highest robust accuracy, followed by situations (2) and (3), whereas situation (4) exhibits poorer accuracy. This difference arises because the estimators in situation (1) receive measurements with one-step random delay, whereas situation (4) experiences a higher packet loss rate. The curves (Fig. 5) confirm the validity and effectiveness of the proposed method. Another simulation is conducted for a mass-spring-damper system. The comparison between the proposed approach and the optimal robust filtering method (Table 2, Fig. 7) indicates that although the proposed method ensures that the actual prediction error variance attains the minimum upper bound, its actual accuracy is slightly lower than the optimal prediction accuracy.  Conclusions  The minimax robust Kalman filtering problem is investigated for systems characterized by uncertain noise variance, multistep random measurement delays, and packet dropouts. 
The system noise variance is uncertain but bounded by known conservative upper limits, and a set of Bernoulli-distributed random variables with known probabilities is used to represent the multistep random measurement delays and packet dropouts between the sensor and the estimator. The Hadamard product is used to enhance the model transformation method, followed by the design of a minimax robust time-varying Kalman estimator. Robustness is demonstrated through matrix elementary transformation, the Geršgorin circle theorem, the Hadamard product theorem, matrix factorization, and the Lyapunov equation method. A sufficient condition is established for the time-varying generalized Lyapunov equation to possess a unique steady-state positive semidefinite solution, based on which a robust steady-state estimator is constructed. The convergence between the time-varying and steady-state estimators is also proven. Two simulation examples verify the effectiveness of the proposed approach. The presented methods overcome the limitations of existing techniques and provide theoretical support for solving the robust fusion filtering problem in complex NCSs.
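The Geršgorin-disc argument invoked in the robustness proof rests on a simple computation, sketched here for a small example matrix of our own choosing: every eigenvalue lies in a disc centered at a diagonal entry with radius equal to the off-diagonal row sum, so a strictly diagonally dominant matrix is nonsingular because no disc contains zero.

```python
def gershgorin_discs(A):
    """Return one (center, radius) pair per row of a square matrix;
    every eigenvalue lies in the union of these discs."""
    return [(A[i][i], sum(abs(v) for j, v in enumerate(row) if j != i))
            for i, row in enumerate(A)]

def strictly_diagonally_dominant(A):
    """True if |a_ii| exceeds each off-diagonal row sum, in which case
    no Geršgorin disc contains zero and A is nonsingular."""
    return all(abs(center) > radius for center, radius in gershgorin_discs(A))

# Toy example matrix (not from the paper).
A = [[4.0, 1.0, 0.5],
     [1.0, 5.0, 2.0],
     [0.0, 1.0, 3.0]]
```

Arguments of this kind underpin the uniqueness of the positive semidefinite solution of the generalized Lyapunov equation used in the steady-state analysis.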
Full Field-of-View Optical Calibration with Microradian-Level Accuracy for Space Laser Communication Terminals on Low-Earth-Orbit Constellation Applications
XIE Qingkun, XU Changzhi, BIAN Jingying, ZHENG Xiaosong, ZHANG Bo
2026, 48(2): 762-771.   doi: 10.11999/JEIT250734
[Abstract](275) [FullText HTML](188) [PDF 2901KB](45)
Abstract:
  Objective  The Coarse Pointing Assembly (CPA) is a core element in laser communication systems and supports wide-field scanning, active orbit-attitude compensation, and dynamic disturbance isolation. To address multi-source disturbances such as orbital perturbations and attitude maneuvers, a high-precision, high-bandwidth, and fast-response Pointing, Acquisition, and Tracking (PAT) algorithm is required. Establishing a full Field-Of-View (FOV) optical calibration model between the CPA and the detector is essential for suppressing image degradation caused by spatial pointing deviations. Conventional calibration methods often rely on ray tracing to simulate beam offsets and infer calibration relationships, yet they show several limitations. These limitations include high modeling complexity caused by non-coaxial paths, multi-reflective surfaces, and freeform optics; susceptibility to systematic errors generated by assembly tolerances, detector non-uniformity, and thermal drift; and restricted applicability across the full FOV due to spatial anisotropy. A high-precision calibration method that remains effective across the entire FOV is therefore needed to overcome these challenges and ensure stable and reliable laser communication links.  Methods  To achieve precise CPA-detector calibration and address the limitations of traditional approaches, this paper presents a full FOV optical calibration method with microradian-level accuracy. Based on the optical design characteristics of periscope-type laser terminals, an equivalent optical transmission model of the CPA is established and the mechanism of image rotation is examined. Leveraging the structural rigidity of the optical transceiver channel, the optical transmission matrix is simplified to a constant matrix, yielding a full-space calibration model that directly links CPA micro-perturbations to spot displacements. 
By correlating the CPA rotation angles between the calibration target points and the actual operating positions, the calibration task is further reduced to estimating the calibration matrix at the target points. Random micro-perturbations are applied to the CPA to induce corresponding micro-displacements of the detector spot. A calibration equation based on CPA motion and spot displacement is formulated, and the calibration matrix is obtained through least-squares regression. The full-space calibration relationship between the CPA and detector is then derived through matrix operations.  Results and Discussions  Using the proposed calibration method, an experimental platform (Fig. 4) is constructed for calibration and verification with a periscope laser terminal. Accurate measurements of the conjugate motion relationship between the CPA and the CCD detector spot are obtained (Table 1). To evaluate calibration accuracy and full-space applicability, systematic verification is conducted through single-step static pointing and continuous dynamic tracking. In the static pointing verification, the mechanical rotary table is moved to three extreme diagonal positions, and the CPA performs open-loop pointing based on the established CPA-detector calibration relationship. Experimental results show that the spot reaches the intended target position (Fig. 5), with a pointing accuracy below 12 μrad (RMS). In the dynamic tracking experiment, system control parameters are optimized to maintain stable tracking of the platform beam. During low-angular-velocity motion of the rotary table, the laser terminal sustains stable tracking (Fig. 6). The CPA trajectory shows a clear conjugate relationship with the rotary table motion (Fig. 6(a), Fig. 6(b)), and the tracking accuracy in both orthogonal directions is below 5 μrad (Fig. 6(c), Fig. 6(d)). The independence of the optical transmission matrix from the selection of calibration target points is also examined. 
By increasing the spatial accessibility of calibration points, the method reduces operational complexity while maintaining calibration precision. Improved spatial distribution of calibration points further enhances calibration efficiency and accuracy.  Conclusions  This paper presents a full FOV optical calibration method with microradian-level accuracy based on single-target micro-perturbation measurement. To satisfy engineering requirements for rapid linking and stable tracking, a full-space optical matrix model for CPA-detector calibration is constructed using matrix optics. Random micro-perturbations applied to the CPA at a single target point generate a generalized transfer equation, from which the calibration matrix is obtained through least-squares estimation. Experimental results show that the model mitigates image rotation, mirroring, and tracking anomalies, suppresses calibration residuals to below 12 μrad across the full FOV, and limits the dynamic tracking error to within 5 μrad per axis. The method eliminates the need for additional hardware and complex alignment procedures, providing a high-precision and low-complexity solution that supports rapid deployment in the mass production of Low-Earth-Orbit (LEO) laser terminals.
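The least-squares calibration step described above can be sketched numerically. The numbers below are synthetic: the "true" 2x2 calibration matrix, perturbation scale, and noise level are invented for illustration. Random micro-perturbations Δθ and the resulting spot displacements Δs are stacked into a linear system, and the calibration matrix is recovered by regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 calibration matrix mapping CPA micro-rotations
# (azimuth, elevation) to detector spot displacements.
M_true = np.array([[1.8, -0.3],
                   [0.2,  2.1]])

n = 200
dtheta = rng.normal(scale=1e-5, size=(n, 2))   # random micro-perturbations (rad)
noise = rng.normal(scale=1e-8, size=(n, 2))    # detector measurement noise
dspot = dtheta @ M_true.T + noise              # observed spot displacements

# Least squares: solve dtheta @ M^T ~= dspot for the calibration matrix.
M_est_T, *_ = np.linalg.lstsq(dtheta, dspot, rcond=None)
M_est = M_est_T.T
```

With enough perturbation samples, the regression averages out the measurement noise and recovers the matrix to well within the noise floor.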
Radar, Sonar, Navigation and Array Signal Processing
Unsupervised Anomaly Detection of Hydro-Turbine Generator Acoustics by Integrating Pre-Trained Audio Large Model and Density Estimation
WU Ting, WEN Shulin, YAN Zhaoli, FU Gaoyuan, LI Linfeng, LIU Xudu, CHENG Xiaobin, YANG Jun
2026, 48(2): 772-783.   doi: 10.11999/JEIT250934
[Abstract](143) [FullText HTML](78) [PDF 16624KB](21)
Abstract:
  Objective  Hydro-Turbine Generator Units (HTGUs) require reliable early fault detection to maintain operational safety and reduce maintenance cost. Acoustic signals provide a non-intrusive and sensitive monitoring approach, but their use is limited by complex structural acoustics, strong background noise, and the scarcity of abnormal data. An unsupervised acoustic anomaly detection framework is presented, in which a large-scale pretrained audio model is integrated with density-based k-nearest neighbors estimation. This framework is designed to detect anomalies using only normal data and to maintain robustness and strong generalization across different operational conditions of HTGUs.  Methods  The framework performs unsupervised acoustic anomaly detection for HTGUs using only normal data. Time-domain signals are preprocessed with Z-score normalization and Fbank features, and random masking is applied to enhance robustness and generalization. A large-scale pretrained BEATs model is used as the feature encoder, and an Attentive Statistical Pooling module aggregates frame-level representations into discriminative segment-level embeddings by emphasizing informative frames. To improve class separability, an ArcFace loss replaces the conventional classification layer during training, and a warm-up learning rate strategy is adopted to ensure stable convergence. During inference, density-based k-nearest neighbors estimation is applied to the learned embeddings to detect acoustic anomalies.  Results and Discussions  The effectiveness of the proposed unsupervised acoustic anomaly detection framework for HTGUs is examined using data collected from eight real-world machines. As shown in Fig. 7 and Table 2, large-scale pretrained audio representations show superior capability compared with traditional features in distinguishing abnormal sounds. 
With the FED-KE algorithm, the framework attains high accuracy across six metrics, with Hmean reaching 98.7% in the wind tunnel and exceeding 99.9% in the slip-ring environment, indicating strong robustness under complex industrial conditions. As shown in Table 4, ablation studies confirm the complementary effects of feature enhancement, ASP-based representation refinement, and density-based k-NN inference. The framework requires only normal data for training, reducing dependence on scarce fault labels and enhancing practical applicability. Remaining challenges include computational cost introduced by the pretrained model and the absence of multimodal fusion, which will be addressed in future work.  Conclusions  An unsupervised acoustic anomaly detection framework is proposed for HTGUs, addressing the scarcity of fault samples and the complexity of industrial acoustic environments. A pretrained large-scale audio foundation model is adopted and fine-tuned with turbine-specific strategies to improve the modeling of normal operational acoustics. During inference, a density-estimation-based k-NN mechanism is applied to detect abnormal patterns using only normal data. Experiments conducted on real-world hydropower station recordings show high detection accuracy and strong generalization across different operating conditions, exceeding conventional supervised approaches. The framework introduces foundation-model-based audio representation learning into the hydro-turbine domain, provides an efficient adaptation strategy tailored to turbine acoustics, and integrates a robust density-based anomaly scoring mechanism. These components jointly reduce dependence on labeled anomalies and support practical deployment for intelligent condition monitoring. 
Future work will examine model compression, such as knowledge distillation, to enable on-device deployment, and explore semi-/self-supervised learning and multimodal fusion to enhance robustness, scalability, and cross-station adaptability.
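The inference stage described above (density-based k-nearest-neighbor scoring on embeddings learned from normal data only) can be sketched as follows. The embeddings here are synthetic stand-ins, and the value of k is our own choice.

```python
import numpy as np

def knn_anomaly_scores(train_emb, test_emb, k=5):
    """Score each test embedding by its mean Euclidean distance to the
    k nearest training embeddings (all of which are normal); larger
    scores indicate lower local density, i.e. likely anomalies."""
    # Pairwise distances, shape (n_test, n_train)
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, :k].mean(axis=1)

rng = np.random.default_rng(1)
normal_train = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in embeddings
test_normal = rng.normal(0.0, 1.0, size=(20, 8))
test_abnormal = rng.normal(6.0, 1.0, size=(20, 8))   # shifted cluster

scores_n = knn_anomaly_scores(normal_train, test_normal)
scores_a = knn_anomaly_scores(normal_train, test_abnormal)
# A threshold between the two score ranges separates normal from abnormal
```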
A One-Dimensional 5G Millimeter-Wave Wide-Angle Scanning Array Antenna Using an AMC Structure
MA Zhangang, ZHANG Qing, FENG Sirun, ZHAO Luyu
2026, 48(2): 784-793.   doi: 10.11999/JEIT250719
[Abstract](279) [FullText HTML](150) [PDF 7684KB](29)
Abstract:
  Objective  With the rapid advancement of 5G millimeter-wave technology, antennas are required to achieve high gain, wide beam coverage, and compact size, particularly in environments characterized by strong propagation loss and blockage. Conventional millimeter-wave arrays often face difficulties in reconciling wide-angle scanning with high gain and broadband operation due to element coupling and narrow beamwidths. To overcome these challenges, this study proposes a one-dimensional linear array antenna incorporating an Artificial Magnetic Conductor (AMC) structure. The AMC’s in-phase reflection is exploited to improve bandwidth and gain while enabling wide-angle scanning of ±80° at 26 GHz. By adopting a 0.4-wavelength element spacing and stacked topology, the design provides an effective solution for 5G millimeter-wave terminals where spatial constraints and performance trade-offs are critical. The findings highlight the potential of AMC-based arrays to advance antenna technology for future high-speed, low-latency 5G applications by combining broadband operation, high directivity, and broad coverage within compact form factors.  Methods  This study develops a high-performance single-polarized one-dimensional linear millimeter-wave array antenna through a multi-layered structural design integrated with AMC technology. The design process begins with theoretical analysis of the pattern multiplication principle and array factor characteristics, which identify 0.4-wavelength element spacing as an optimal balance between wide-angle scanning and directivity. A stacked three-layer antenna unit is then constructed, consisting of square patch radiators on the top layer, a cross-shaped coupling feed structure in the middle layer, and an AMC-loaded substrate at the bottom. The AMC provides in-phase reflection in the 21~30 GHz band, enhancing bandwidth and suppressing surface wave coupling. 
Full-wave simulations (HFSS) are performed to optimize AMC dimensions, feed networks, and array layout, confirming bandwidth of 23.7~28 GHz, peak gain of 13.9 dBi, and scanning capability of ±80°. A prototype is fabricated using printed circuit board technology and evaluated with a vector network analyzer and anechoic chamber measurements. Experimental results agree closely with simulations, demonstrating an operational bandwidth of 23.3~27.7 GHz, isolation better than −15 dB, and scanning coverage up to ±80°. These results indicate that the synergistic interaction between AMC-modulated radiation fields and the array coupling mechanism enables a favorable balance among wide bandwidth, high gain, and wide-angle scanning.  Results and Discussions  The influence of the array factor on directional performance is analyzed, and the maximum array factor is observed when the element spacing is between 0.4λ and 0.46λ (Fig. 2). The in-phase reflection of the AMC structure in the 21~30 GHz range significantly enhances antenna characteristics, broadening the bandwidth by 50% compared with designs without AMC and increasing the gain at 26 GHz by 1.5 dBi (Fig. 10, Fig. 13). The operational bandwidth of 23.3~27.7 GHz is confirmed by measurements (Fig. 16(a)). When the element spacing is optimized to 4.6 mm (0.4λ) and the coupling radiation mechanisms are adjusted, the H-plane half-power beamwidth (HPBW) of the array elements is extended to 180° (Fig. 8, Fig. 9), with a further gain improvement of 0.6 dBi at the scanning edges (Fig. 11(b)). The three-layer stacked structure (comprising the radiation, isolation, and AMC layers) achieves isolation better than −15 dB (Fig. 16(a)). Experimental validation demonstrates wide-angle scanning capability up to ±80°, showing close agreement between simulated and measured results (Fig. 11, Fig. 16(b)). 
The proposed antenna is therefore established as a compact, high-performance solution for 5G millimeter-wave terminals, offering wide bandwidth, high gain, and broad scanning coverage.  Conclusions  A one-dimensional linear wide-angle scanning array antenna based on an AMC structure is presented for 5G millimeter-wave applications. Through theoretical analysis, simulation optimization, and experimental validation, balanced improvement in broadband operation, high gain, and wide-angle scanning is achieved. Pattern multiplication theory and array factor analysis are applied to determine 0.4-wavelength element spacing as the optimal compromise between scanning angle and directivity. A stacked three-layer configuration is adopted, and the AMC’s in-phase reflection extends the bandwidth to 23.7~28.5 GHz, representing a 50% increase. Simulation and measurement confirm ±80° scanning at 26 GHz with a peak gain of 13.8 dBi, which is 1.3 dBi higher than that of non-AMC designs. The close consistency between experimental and simulated results verifies the feasibility of the design, providing a compact and high-performance solution for millimeter-wave antennas in mobile communication and vehicular systems. Future research is expected to explore dual-polarization integration and adaptation to complex environments.
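The array-factor reasoning above, where element spacing trades scan range against grating-lobe onset, can be reproduced numerically. The sketch below uses an 8-element uniform linear array and our own angle grid; it is a textbook illustration of pattern-multiplication analysis, not the authors' simulation setup.

```python
import numpy as np

def array_factor(n_elem, d_over_lambda, scan_deg, theta_deg):
    """Normalized array factor of an n-element uniform linear array with
    spacing d (in wavelengths), phased to steer the beam to scan_deg."""
    theta = np.radians(theta_deg)
    theta0 = np.radians(scan_deg)
    # Inter-element phase progression relative to the steering phase
    psi = 2.0 * np.pi * d_over_lambda * (np.sin(theta) - np.sin(theta0))
    af = np.abs(np.exp(1j * np.outer(np.arange(n_elem), psi)).sum(axis=0))
    return af / n_elem

theta = np.linspace(-90.0, 90.0, 1801)
# At 0.4-wavelength spacing the beam steers cleanly to 80 degrees...
af_04 = array_factor(8, 0.4, 80.0, theta)
# ...while at 0.5-wavelength spacing a grating lobe rises near -90 degrees
af_05 = array_factor(8, 0.5, 80.0, theta)
```

The comparison shows why sub-half-wavelength spacing is attractive for ±80° scanning: the grating-lobe condition |sin θ0 − λ/d| ≤ 1 is avoided at 0.4λ but nearly met at 0.5λ.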
Image and Intelligent Information Processing
Considering Workload Uncertainty in Policy Gradient-based Hyper-heuristic Scheduling for Software Projects
SHEN Xiaoning, SHI Jiangyi, MA Yanzhao, CHEN Wenyan, SHE Juan
2026, 48(2): 794-805.   doi: 10.11999/JEIT250769
[Abstract](109) [FullText HTML](55) [PDF 3892KB](7)
Abstract:
  Objective  The Software Project Scheduling Problem (SPSP) is essential for allocating resources and arranging tasks in software development, and it affects economic efficiency and competitiveness. Deterministic assumptions used in traditional models overlook common fluctuations in task effort caused by requirement changes or estimation deviation. These assumptions often reduce feasibility and weaken scheduling stability in dynamic development settings. This study develops a multi-objective model that integrates task effort uncertainty and represents it using asymmetric triangular interval type-2 fuzzy numbers to reflect real development conditions. The aim is to improve decision quality under uncertainty by designing an optimization method that shortens project duration and increases employee satisfaction, thereby strengthening robustness and adaptability in software project scheduling.  Methods  A Policy Gradient-based Hyper-Heuristic Algorithm (PGHHA) is developed to solve the formulated model. The framework contains a High-Level Strategy (HLS) and a set of Low-Level Heuristics (LLHs). The High-Level Strategy applies an Actor-Critic reinforcement learning structure. The Actor network selects appropriate LLHs based on real-time evolutionary indicators, including population convergence and diversity, and the Critic network evaluates the actions selected by the Actor. Eight LLHs are constructed by combining two global search operators, the matrix crossover operator and the Jaya operator with random jitter, with two local mining strategies, duration-based search and satisfaction-based search. Each LLH is configured with two neighborhood depths (V1=5 and V2=20), determined through Taguchi orthogonal experiments. Each candidate solution is encoded as a real-valued task-employee effort matrix. Constraints including skill coverage, maximum dedication, and maximum participant limits are applied during optimization. 
A prioritized experience replay mechanism is introduced to reuse historical trajectories, which accelerates convergence and improves network updating efficiency.  Results and Discussions  Experimental evaluation is performed on twelve synthetic cases and three real software projects. The algorithm is assessed against six representative methods to validate the proposed strategies. HyperVolume Ratio (HVR) and Inverted Generational Distance (IGD) are used as performance indicators, and statistical significance is examined using Wilcoxon rank-sum tests with a 0.05 threshold. The findings show that the PGHHA achieves better convergence and diversity than all comparison methods in most cases. The quantitative improvements are reflected in the summarized values (Table 5, Table 6). The visual distribution of Pareto fronts (Fig. 4, Fig. 5) shows that the obtained solutions lie below those of alternative algorithms and display more uniform coverage, indicating higher convergence precision and improved spread. The computational cost increases because of neural network training and the experience replay mechanism, as shown in Fig. 6. However, the improvement in solution quality is acceptable considering the longer planning period of software development. Modeling effort uncertainty with asymmetric triangular interval type-2 fuzzy numbers enhances system stability. The adaptive heuristic selection driven by the Actor-Critic mechanism and the prioritized experience replay strengthens performance under dynamic and uncertain conditions. Collectively, the evidence indicates that the PGHHA provides more reliable support for software project scheduling, maintaining diversity while optimizing conflicting objectives under uncertain workload environments.  Conclusions  A multi-objective software project scheduling model is developed in this study, where task effort uncertainty is represented using asymmetric triangular interval type-2 fuzzy numbers. 
A PGHHA is designed to solve the model. The algorithm applies an Actor-Critic reinforcement learning structure as the high-level strategy to adaptively select LLHs according to the evolutionary state. A prioritized experience replay mechanism is incorporated to enhance learning efficiency and accelerate convergence. Tests on synthetic and real cases show that: (1) the proposed algorithm delivers stronger convergence and diversity under uncertainty than six representative algorithms; (2) the combination of global search operators and local mining strategies maintains a suitable balance between exploration and exploitation; (3) the type-2 fuzzy representation offers a more stable characterization of effort uncertainty than type-1 fuzzy numbers. The current work focuses on a single-project context. Future work will extend the model to multi-project environments with shared resources and inter-project dependencies. Additional research will examine adaptive reward strategies and lightweight network designs to reduce computational demand while preserving solution quality.
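As a highly simplified stand-in for the high-level strategy, a REINFORCE-style preference update over a set of low-level heuristics can be sketched as follows. The paper's PGHHA uses Actor-Critic networks conditioned on convergence and diversity indicators; here the state is dropped, the critic is reduced to a running reward baseline, and the reward values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Hypothetical setting: 8 low-level heuristics; index 3 yields the
# largest expected fitness improvement (all reward values are made up).
n_llh = 8
mean_reward = np.full(n_llh, 0.1)
mean_reward[3] = 1.0

theta = np.zeros(n_llh)   # policy preferences over LLHs
baseline = 0.0            # running baseline, a crude critic stand-in
alpha = 0.1               # learning rate

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(n_llh, p=p)                 # sample an LLH from the policy
    r = mean_reward[a] + rng.normal(0.0, 0.1)  # noisy improvement signal
    baseline += 0.05 * (r - baseline)          # update the baseline
    grad = -p
    grad[a] += 1.0                             # gradient of log pi(a | theta)
    theta += alpha * (r - baseline) * grad     # policy-gradient step

# The learned policy concentrates on the most productive heuristic
```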
Speaker Verification Based on Tide-Ripple Convolution Neural Network
CHEN Chen, YI Zhixin, LI Dongyuan, CHEN Deyun
2026, 48(2): 806-817.   doi: 10.11999/JEIT250713
[Abstract](158) [FullText HTML](69) [PDF 6644KB](18)
Abstract:
  Objective  State-of-the-art speaker verification models typically rely on fixed receptive fields, which limits their ability to represent multi-scale acoustic patterns while increasing parameter counts and computational loads. Speech contains layered temporal–spectral structures, yet the use of dynamic receptive fields to characterize these structures is still not well explored. The design principles for effective dynamic receptive field mechanisms also remain unclear.  Methods  Inspired by the non-linear coupling behavior of tidal surges, a Tide-Ripple Convolution (TR-Conv) layer is proposed to form a more effective receptive field. TR-Conv constructs primary and auxiliary receptive fields within a window by applying power-of-two interpolation. It then employs a scan-pooling mechanism to capture salient information outside the window and an operator mechanism to perceive fine-grained variations within it. The fusion of these components produces a variable receptive field that is multi-scale and dynamic. A Tide-Ripple Convolutional Neural Network (TR-CNN) is developed to validate this design. To mitigate label noise in training datasets, a total loss function is introduced by combining a NoneTarget with Dynamic Normalization (NTDN) loss and a weighted Sub-center AAM Loss variant, improving model robustness and performance.  Results and Discussions  The TR-CNN is evaluated on the VoxCeleb1-O/E/H benchmarks. The results show that TR-CNN achieves a competitive balance of accuracy, computation, and parameter efficiency (Table 1). Compared with the strong ECAPA-TDNN baseline, the TR-CNN (C=512, n=1) model attains relative EER reductions of 4.95%, 4.03%, and 6.03%, and MinDCF reductions of 31.55%, 17.14%, and 17.42% across the three test sets, while requiring 32.7% fewer parameters and 23.5% less computation (Table 2). The optimal TR-CNN (C=1024, n=1) model further improves performance, achieving EERs of 0.85%, 1.10%, and 2.05%. 
Robustness is strengthened by the proposed total loss function, which yields consistent improvements in EER and MinDCF during fine-tuning (Table 3). Additional evaluations, including ablation studies (Tables 5 and 6), component analyses (Fig. 3 and Table 4), and t-SNE visualizations (Fig. 4), confirm the effectiveness and robustness of each module in the TR-CNN architecture.  Conclusions  This research proposes a simple and effective TR-Conv layer built on the T-RRF mechanism. Experimental results show that TR-Conv forms a more expressive and effective receptive field, reducing parameter count and computational cost while exceeding conventional one-dimensional convolution in speech feature modeling. It also exhibits strong lightweight characteristics and scalability. Furthermore, a total loss function combining the NTDN loss and a Sub-center AAM loss variant is proposed to enhance the discriminability and robustness of speaker embeddings, particularly under label noise. TR-Conv shows potential as a general-purpose module for integration into deeper and more complex network architectures.
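The EER figures quoted above are the standard speaker-verification metric. A minimal threshold-sweep computation (our own utility, not the paper's evaluation code) looks like this:

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """EER: the operating point where the False-Acceptance Rate (FAR)
    equals the False-Rejection Rate (FRR), found by sweeping the
    decision threshold over all observed scores."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, 0.0
    for t in thresholds:
        frr = np.mean(target_scores < t)       # genuine trials rejected
        far = np.mean(nontarget_scores >= t)   # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

# Perfectly separated scores give an EER of 0
eer = equal_error_rate(np.array([0.9, 0.8, 0.7]), np.array([0.1, 0.2, 0.3]))
```

Production toolkits interpolate the ROC between thresholds; this discrete sweep is the simplest faithful approximation.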
Defeating Voice Conversion Forgery by Active Defense with Diffusion Reconstruction
TIAN Haoyuan, CHEN Yuxuan, CHEN Beijing, FU Zhangjie
2026, 48(2): 818-828.   doi: 10.11999/JEIT250709
[Abstract](304) [FullText HTML](166) [PDF 3337KB](44)
Abstract:
  Objective  Voice deep generation technology is able to produce speech that is perceptually realistic. Although it enriches entertainment and everyday applications, it is also exploited for voice forgery, creating risks to personal privacy and social security. Existing active defense techniques serve as a major line of protection against such forgery, yet their performance remains limited in balancing defensive strength with the imperceptibility of defensive speech examples, and in maintaining robustness.  Methods  An active defense method against voice conversion forgery is proposed on the basis of diffusion reconstruction. The diffusion vocoder PriorGrad is used as the generator, and the gradual denoising process is guided by the diffusion prior of the target speech so that the protected speech is reconstructed and defensive speech examples are obtained directly. A multi-scale auditory perceptual loss is further introduced to suppress perturbation amplitudes in frequency bands sensitive to the human auditory system, which improves the imperceptibility of the defensive examples.  Results and Discussions  Defense experiments conducted on four leading voice conversion models show that the proposed method maintains the imperceptibility of defensive speech examples and, when speaker verification accuracy is used as the evaluation metric, improves defense ability by about 32% on average in white-box scenarios and about 16% in black-box scenarios compared with the second-best method, achieving a stronger balance between defense ability and imperceptibility (Table 2). In robustness experiments, the proposed method yields an average improvement of about 29% in white-box scenarios and about 18% in black-box scenarios under three compression attacks (Table 3), and an average improvement of about 35% in the white-box scenario and about 17% in the black-box scenario under Gaussian filtering attack (Table 4). 
Ablation experiments further show that the use of multi-scale auditory perceptual loss improves defense ability by 5% to 10% compared with the use of single-scale auditory perceptual loss (Table 5).  Conclusions  An active defense method against voice conversion forgery based on diffusion reconstruction is proposed. Defensive speech examples are reconstructed directly through a diffusion vocoder so that the generated audio better approximates the distribution of the original target speech, and a multi-scale auditory perceptual loss is integrated to improve the imperceptibility of the defensive speech. Experimental results show that the proposed method achieves stronger defense performance than existing approaches in both white-box and black-box scenarios and remains robust under compression coding and smoothing filtering. Although the method demonstrates clear advantages in defense performance and robustness, its computational efficiency requires further improvement. Future work is directed toward diffusion generators that operate with a single time step or fewer time steps to enhance computational efficiency while maintaining defense performance.
MCL-PhishNet: A Multi-Modal Contrastive Learning Network for Phishing URL Detection
DONG Qingwei, FU Xueting, ZHANG Benkui
2026, 48(2): 829-841.   doi: 10.11999/JEIT250758
[Abstract](190) [FullText HTML](130) [PDF 2653KB](20)
Abstract:
  Objective  The growing complexity and rapid evolution of phishing attacks present challenges to traditional detection methods, including feature redundancy, multi-modal mismatch, and limited robustness to adversarial samples.  Methods  MCL-PhishNet is proposed as a Multi-Modal Contrastive Learning framework that achieves precise phishing URL detection through a hierarchical syntactic encoder, bidirectional cross-modal attention mechanisms, and curriculum contrastive learning strategies. In this framework, multi-scale residual convolutions and Transformers jointly model local grammatical patterns and global dependency relationships of URLs, whereas a 17-dimensional statistical feature set improves robustness to adversarial samples. The dynamic contrastive learning mechanism optimizes the feature-space distribution through online spectral-clustering-based semantic subspace partitioning and boundary-margin constraints.  Results and Discussions  This study demonstrates consistent performance across different datasets (EBUU17 accuracy 99.41%, PhishStorm 99.41%, Kaggle 99.30%), validating the generalization capability of MCL-PhishNet. The three datasets differ significantly in sample distribution, attack types, and feature dimensions, yet the method in this study maintains stable high performance, indicating that the multimodal contrastive learning framework has strong cross-scenario adaptability. Compared to methods optimized for specific datasets, this approach avoids overfitting to particular dataset distributions through end-to-end learning and an adaptive feature fusion mechanism.  Conclusions  This paper addresses the core challenges in phishing URL detection, such as the difficulty of dynamic syntax pattern modeling, multimodal feature mismatches, and insufficient adversarial robustness, and proposes a multimodal contrastive learning framework, MCL-PhishNet. 
Through a collaborative mechanism of hierarchical syntax encoding, dynamic semantic distillation, and curriculum optimization, it achieves 99.41% accuracy and a 99.65% F1 score on datasets such as EBUU17 and PhishStorm, improving on existing state-of-the-art methods by 0.27%~3.76%. Experiments show that this approach effectively captures local variation patterns in URLs (such as the numeric substitution attack in ‘payp41-log1n.com’) through a residual convolution-Transformer collaborative architecture and reduces the false detection rate of path-sensitive parameters to 0.07% via a bidirectional cross-modal attention mechanism. However, the proposed framework has relatively high complexity. Although the hierarchical encoding module of MCL-PhishNet (including multi-scale CNNs, Transformers, and gated networks) improves detection accuracy, it also increases the number of model parameters. Moreover, the current model is trained primarily on English-based public datasets, resulting in significantly reduced detection accuracy for non-Latin characters (such as Cyrillic domain confusions) and regional phishing strategies (such as ‘fake’ URLs targeting local payment platforms).
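The paper's 17-dimensional statistical feature set is not enumerated in the abstract; a few representative hand-crafted URL statistics of the kind such detectors typically use might look like the following (feature names and choices are our own):

```python
import re
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """A handful of illustrative URL statistics (a small, hypothetical
    subset of the kind of feature set described in the paper)."""
    # urlparse needs a scheme marker to populate netloc correctly
    parsed = urlparse(url if "//" in url else "//" + url)
    host = parsed.netloc
    return {
        "url_len": len(url),
        "host_len": len(host),
        "n_digits": sum(c.isdigit() for c in url),   # digit substitutions
        "n_hyphens": url.count("-"),                 # lookalike separators
        "n_dots": host.count("."),                   # subdomain depth
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at": "@" in url,                        # userinfo obfuscation
    }

benign = url_features("https://www.paypal.com/signin")
phish = url_features("http://payp41-log1n.com/secure/update")
```

On this pair, the digit and hyphen counts already separate the lookalike domain from the legitimate one, which is the intuition behind combining such statistics with learned representations.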
Complete Coverage Path Planning Algorithm Based on Rulkov-like Chaotic Mapping
LIU Sicong, HE Ming, LI Chunbiao, HAN Wei, LIU Chengzhuo, XIA Hengyu
2026, 48(2): 842-854.   doi: 10.11999/JEIT250887
[Abstract](176) [FullText HTML](92) [PDF 20666KB](30)
Abstract:
  Objective  This study proposes a Complete Coverage Path Planning (CCPP) algorithm based on a sine-constrained Rulkov-Like Hyper-Chaotic (SRHC) mapping. The work addresses key challenges in robotic path planning and focuses on improving coverage efficiency, path unpredictability, and obstacle adaptability for mobile robots in complex environments, including disaster rescue, firefighting, and unknown-terrain exploration. Traditional methods often exhibit predictable movement patterns, fall into local optima, and show inefficient backtracking, which motivates the development of an approach that uses chaotic dynamics to strengthen exploration capability.  Methods  The SRHC-CCPP algorithm integrates three components: (1) SRHC mapping: a hyper-chaotic system with nonlinear coupling (Eq. 1) generates highly unpredictable trajectories. Lyapunov exponent analysis (Fig. 3), phase-space diagrams (Fig. 1), and parameter-sensitivity studies (Table 1) confirm chaotic behavior under conditions such as a=0.01 and b=1.3. (2) Memory-driven exploration: a dynamic visitation grid prioritizes uncovered regions and reduces redundancy (Algorithm 1). (3) Obstacle handling: collision detection combined with normal-vector reflection reduces oscillations in cluttered environments (Fig. 4). Simulations employ a Mecanum-wheel robot model (Eq. 2) to provide omnidirectional mobility.  Results and Discussions  (1) Efficiency: SRHC-CCPP achieved faster coverage and improved uniformity in both obstacle-free and obstructed scenarios (Figs. 8–10). The chaotic driver increased path diversity by 37% compared with rule-based methods. (2) Robustness: the algorithm demonstrated initial-value sensitivity and adaptability to environmental noise (Fig. 5). (3) Scalability: its low computational overhead supported deployment in large-scale grids (>10^4 cells).  
Conclusions  The SRHC-CCPP algorithm advances robotic path planning by: (1) Merging hyper-chaotic unpredictability with memory-guided efficiency, which reduces repetitive loops. (2) Offering real-time obstacle negotiation through adaptive reflection mechanics. (3) Providing a versatile framework suited to applications that require high coverage reliability and dynamic responsiveness. Future work may examine multi-agent extensions and three-dimensional environments.
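The abstract's Eq. 1 is not reproduced above, so the sketch below pairs a classic Rulkov-style map (an assumed stand-in for the sine-constrained SRHC map) with a visitation-memory grid, to illustrate how a chaotic iterate and a least-visited rule can jointly drive coverage. The functions `rulkov_like` and `coverage_walk` and all parameter values are illustrative, not the paper's.

```python
import math

def rulkov_like(x, y, alpha=4.5, mu=0.001, sigma=-1.0):
    """One iterate of a Rulkov-style fast-slow map (assumed stand-in for
    the paper's sine-constrained SRHC map, whose exact form is not given)."""
    x_new = alpha / (1.0 + x * x) + y
    y_new = y - mu * (x - sigma)
    return x_new, y_new

def coverage_walk(rows, cols, steps, x0=0.1, y0=-2.9):
    """Chaos-driven coverage with a visitation-memory grid: among the
    4-neighbourhood, prefer the least-visited cell; the chaotic iterate
    breaks ties so the path stays unpredictable."""
    visits = [[0] * cols for _ in range(rows)]
    r = c = 0
    visits[0][0] = 1
    x, y = x0, y0
    for _ in range(steps):
        x, y = rulkov_like(x, y)
        moves = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= r + dr < rows and 0 <= c + dc < cols]
        best = min(visits[rr][cc] for rr, cc in moves)
        candidates = [(rr, cc) for rr, cc in moves if visits[rr][cc] == best]
        # fold the chaotic state into [0, 1) and use it as a tie-breaker
        u = (math.sin(x) + 1.0) / 2.0
        r, c = candidates[int(u * len(candidates)) % len(candidates)]
        visits[r][c] += 1
    covered = sum(1 for row in visits for v in row if v > 0)
    return covered / (rows * cols)
```

Because the walk is deterministic for a fixed seed state, the covered fraction is reproducible and grows monotonically with the step budget.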
Circuit and System Design
High Area-efficiency Radix-4 Number Theoretic Transform Hardware Architecture with Conflict-free Memory Access Optimization for Lattice-based Cryptography
ZHENG Jiwen, ZHAO Shilei, ZHANG Ziyue, LIU Zhiwei, YU Bin, HUANG Hai
2026, 48(2): 855-865.   doi: 10.11999/JEIT250687
[Abstract](189) [FullText HTML](107) [PDF 8239KB](37)
Abstract:
  Objective  The advancement of Post-Quantum Cryptography (PQC) standardization increases the demand for efficient Number Theoretic Transform (NTT) hardware modules. Existing high-radix NTT studies primarily optimize in-place computation and configurability, yet the performance is constrained by complex memory access behavior and a lack of designs tailored to the parameter characteristics of lattice-based schemes. To address these limitations, a high area-efficiency radix-4 NTT design with a Constant-Geometry (CG) structure is proposed. The modular multiplication unit is optimized through an analysis of common modulus properties and the integration of multi-level operations, while memory allocation and address-generation strategies are refined to reduce capacity requirements and improve data-access efficiency. The design supports out-of-place storage and achieves conflict-free memory access, providing an effective hardware solution for radix-4 CG NTT implementation.  Methods  At the algorithmic level, the proposed radix-4 CG NTT/INTT employs a low-complexity design and removes the bit-reversal step to reduce multiplication count and computation cycles, with a redesigned twiddle-factor access scheme. For the modular multiplication step, which is the most time-consuming stage in the radix-4 butterfly, the critical path is shortened by integrating the multiplication with the first-stage K−RED reduction and simplifying the correction logic. To support three parameter configurations, a scalable modular-multiplication method is developed through an analysis of the shared properties of the moduli. At the architectural level, two coefficients are concatenated and stored at the same memory address. A data-decomposition and reorganization scheme is designed to coordinate memory interaction with the dual-butterfly units efficiently. 
To achieve conflict-free memory access, a cyclic memory-reuse strategy is employed, and read and write address-generation schemes using sequential and stepped access patterns are designed, which reduces required memory capacity and lowers control-logic complexity.  Results and Discussions  Experimental results on Field Programmable Gate Arrays demonstrate that the proposed NTT architecture achieves high operating frequency and low resource consumption under three parameter configurations, together with notable improvement in the Area-Time Product (ATP) compared with existing designs (Table 1). For the configuration with 256 terms and a modulus of 7 681, the design uses 2 397 slices, 4 BRAMs, and 16 DSPs, achieves an operating frequency of 363 MHz, and yields at least a 56.4% improvement in ATP. For the configuration with 256 terms and a modulus of 8 380 417, it uses 3 760 slices, 6 BRAMs, and 16 DSPs, achieves an operating frequency of 338 MHz, and yields at least a 69.8% improvement in ATP. For the configuration with 1 024 terms and a modulus of 12 289, it uses 2 379 slices, 4 BRAMs, and 16 DSPs, achieves an operating frequency of 357 MHz, and yields at least a 50.3% improvement in ATP.  Conclusions  A high area-efficiency radix-4 NTT hardware architecture for lattice-based PQC is proposed. The use of a low-complexity radix-4 CG NTT/INTT and the removal of the bit-reversal step reduce latency. Through an analysis of shared characteristics among three moduli and the merging of partial computations, a scalable modular-multiplication architecture based on K²−RED reduction is designed. The challenges of increased storage requirements and complex address-generation logic are addressed by reusing memory efficiently and designing sequential and stepped address-generation schemes. Experimental results show that the proposed design increases operating frequency and reduces resource consumption, yielding lower ATP under all three parameter configurations. 
As the present work focuses on a dual-butterfly architecture, future research may examine higher-parallelism designs to meet broader performance requirements.
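For orientation, the transform that such architectures accelerate can be written as a few lines of software. The sketch below is a minimal radix-2 reference NTT round-trip for one of the quoted configurations (n = 256, q = 7 681); it does not model the paper's radix-4 constant-geometry pipeline, K²−RED reduction, or conflict-free memory scheme.

```python
def find_root(n, q):
    """Find a primitive n-th root of unity mod prime q (n a power of two, n | q-1)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) != 1:   # order is exactly n, not a proper divisor
            return w
    raise ValueError("no primitive root found")

def ntt(a, w, q):
    """Recursive decimation-in-time NTT (radix-2 software reference only)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = ntt(a[0::2], w * w % q, q)
    odd = ntt(a[1::2], w * w % q, q)
    out = [0] * n
    t = 1
    for i in range(n // 2):
        out[i] = (even[i] + t * odd[i]) % q
        out[i + n // 2] = (even[i] - t * odd[i]) % q
        t = t * w % q
    return out

def intt(a, w, q):
    """Inverse NTT: forward transform with w^-1, then scale by n^-1 mod q."""
    n = len(a)
    n_inv = pow(n, q - 2, q)           # q is prime, so Fermat inversion applies
    res = ntt(a, pow(w, q - 2, q), q)
    return [x * n_inv % q for x in res]
```

Pointwise multiplication in the NTT domain realizes cyclic convolution, which is the operation lattice-based schemes need for polynomial multiplication.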
The Storage and Calculation of Biological-like Neural Networks for Locally Active Memristor Circuits
LI Fupeng, WANG Guangyi, LIU Jingbiao, YING Jiajie
2026, 48(2): 866-872.   doi: 10.11999/JEIT250631
[Abstract](132) [FullText HTML](55) [PDF 5306KB](14)
Abstract:
  Objective  At present, binary computing systems have encountered bottlenecks in power consumption, operation speed, and storage capacity. In contrast, the biological nervous system appears to have virtually unlimited capacity and offers significant advantages in low-power computing and dynamic storage, which is closely related to the mechanism by which neurons transmit neural signals through the directional secretion of neurotransmitters. After analyzing the Hodgkin-Huxley model of the squid giant axon, Professor Leon Chua proposed that synapses could be composed of locally passive memristors and neurons of locally active memristors, as the two types of memristors share similar electrical characteristics with nerve fibers. Since the first experimental realization of a memristor was claimed, locally active memristive devices have been identified in research on devices with layered structures. The circuits constructed from these devices exhibit different types of neuromorphic dynamics under different excitations. However, a single two-terminal device capable of achieving multi-state storage has not yet been reported. Locally active memristors have advantages in generating biologically inspired neural signals, and various forms of locally active memristor models can produce neuromorphic signals based on spike pulses. The generation of neural signals involves the amplification and computation of stimulus signals, and this working mechanism can be realized using capacitance-controlled memristor oscillators. When a memristor operates in the locally active domain, the output voltage of its third-order circuit undergoes a period-doubling bifurcation as the capacitance in the circuit changes regularly, forming a multi-state mapping between capacitance values and oscillating voltages. 
In this paper, the locally active memristor-based third-order circuit is used as a unit to generate neuromorphic signals, thereby forming a biologically inspired neural operation unit, and an operation network can be built from such units.  Methods  The mathematical model of the Chua Corsage Memristor proposed by Leon Chua was selected for analysis. The characteristics of its locally active domain were examined, and an appropriate operating point and external components were chosen to establish a third-order memristor chaotic circuit, on which circuit simulation and analysis were then conducted. When the memristor operates in the locally active domain, the oscillator formed by its third-order circuit can simultaneously perform signal amplification, computation, and storage. In this way, the third-order circuit acts as the nerve cell and the variable capacitors as synapses. This functionality enables the electrical signal and the dielectric capacitor to work in succession, allowing the third-order oscillation circuit of the memristor to function like a neuron, with alternating electrical fields and neurotransmitters forming a brain-like computing and storage system. The secretion of biological neurotransmitters has a threshold characteristic: the membrane threshold voltage controls the secretion of neurotransmitters to the postsynaptic membrane, thereby forming the transmission of neural signals. The step peak value of the oscillation circuit can serve as the trigger voltage for the transfer of the capacitor dielectric.  Results and Discussions  This study utilizes the third-order circuit of a locally active memristor to generate stable voltage oscillations that exhibit period-doubling bifurcation as the external capacitance changes. 
The variation of capacitance in the circuit causes different forms of electrical signals to be serially output at the terminals of the memristor, and the voltage amplitude of these signals changes stably in a periodic manner. This establishes a stable multi-state mapping between the capacitance values and the output voltage signal, thereby forming a storage and computing unit and, subsequently, a storage and computing network. A structure that enables the dielectric to transfer and change the capacitance value of the next stage under the control of the modulated voltage threshold, similar in function to neurotransmitter secretion, still needs to be realized. The feasibility of using the third-order oscillation circuit of the memristor as a storage and computing unit is expounded, and a storage and computing structure based on the change of capacitance value is obtained.  Conclusions  When the Chua Corsage Memristor operates in its locally active domain, its third-order circuit, powered solely by a voltage-stabilized source, generates stable period-doubling bifurcation oscillations as the external capacitance changes. The serially output oscillating signals exhibit stable voltage amplitudes and periods and have threshold characteristics. The change of capacitance in the circuit causes different forms of electrical signals to be serially output at the terminals of the memristor, with voltage amplitudes that change stably in a periodic manner. This yields a stable multi-state mapping between capacitance values and the output voltage signal, forming a storage and computing unit and, subsequently, a storage and computing network. A structure that transfers the dielectric to the next stage under the control of the modulated voltage threshold, analogous to neurotransmitter secretion, remains to be realized. 
The feasibility of using the third-order oscillation circuit of the memristor as a storage and computing unit is demonstrated, and a storage and computing structure based on the variation of capacitance value is described.
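The multi-state storage idea rests on the period-doubling route: each parameter value (here, the external capacitance) selects a distinct stable periodic orbit, so parameter and amplitude set form a multi-state mapping. The Chua Corsage Memristor equations are not reproduced in the abstract, so the sketch below illustrates the generic mechanism with the logistic map as an assumed stand-in, with the parameter r playing the role of the capacitance.

```python
def attractor_period(r, x0=0.5, transient=2000, max_period=64, tol=1e-6):
    """Detect the period of the logistic-map attractor x -> r*x*(1-x).
    The logistic map is a generic stand-in for the capacitance-driven
    period-doubling route described in the abstract, not the circuit model."""
    x = x0
    for _ in range(transient):          # let transients die out
        x = r * x * (1.0 - x)
    orbit = [x]
    for _ in range(max_period):
        x = r * x * (1.0 - x)
        orbit.append(x)
    for p in range(1, max_period + 1):  # smallest recurrence = period
        if abs(orbit[p] - orbit[0]) < tol:
            return p
    return None                          # chaotic or period > max_period
```

Sweeping r reproduces the doubling cascade (period 1, then 2, then 4, ...), so each stable window acts as one distinguishable stored state, in the same way each capacitance value selects one oscillation amplitude pattern in the memristor circuit.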
Dataset Papers
BIRD1445: Large-scale Multimodal Bird Dataset for Ecological Monitoring
WANG Hongchang, XIAN Fengyu, XIE Zihui, DONG Miaomiao, JIAN Haifang
2026, 48(2): 873-888.   doi: 10.11999/JEIT250647
[Abstract](717) [FullText HTML](398) [PDF 11356KB](94)
Abstract:
  Objective  With the rapid advancement of Artificial Intelligence (AI) and growing demands in ecological monitoring, high-quality multimodal datasets have become essential for training and deploying AI models in specialized domains. Existing bird datasets, however, face notable limitations, including challenges in field data acquisition, high costs of expert annotation, limited representation of rare species, and reliance on single-modal data. To overcome these constraints, this study proposes an efficient framework for constructing large-scale multimodal datasets tailored to ecological monitoring. By integrating heterogeneous data sources, employing intelligent semi-automatic annotation pipelines, and adopting multi-model collaborative validation based on heterogeneous attention fusion, the proposed approach markedly reduces the cost of expert annotation while maintaining high data quality and extensive modality coverage. This work offers a scalable and intelligent strategy for dataset development in professional settings and provides a robust data foundation for advancing AI applications in ecological conservation and biodiversity monitoring.  Methods  The proposed multimodal dataset construction framework integrates multi-source heterogeneous data acquisition, intelligent semi-automatic annotation, and multi-model collaborative verification to enable efficient large-scale dataset development. The data acquisition system comprises distributed sensing networks deployed across natural reserves, incorporating high-definition intelligent cameras, custom-built acoustic monitoring devices, and infrared imaging systems, supplemented by standardized public data to enhance species coverage and modality diversity. 
The intelligent annotation pipeline is built upon four core automated tools: (1) spatial localization annotation leverages object detection algorithms to generate bounding boxes; (2) fine-grained classification employs Vision Transformer models for hierarchical species identification; (3) pixel-level segmentation combines detection outputs with SegGPT models to produce instance-level masks; and (4) multimodal semantic annotation uses Qwen large language models to generate structured textual descriptions. To ensure annotation quality and minimize manual verification costs, a multi-scale attention fusion verification mechanism is introduced. This mechanism integrates seven heterogeneous deep learning models, each with different feature perception capacities across local detail, mid-level semantic, and global contextual scales. A global weighted voting module dynamically assigns fusion weights based on model performance, while a prior knowledge-guided fine-grained decision module applies category-specific accuracy metrics and Top-K model selection to enhance verification precision and computational efficiency.  Results and Discussions  The proposed multi-scale attention fusion verification method dynamically assesses data quality based on heterogeneous model predictions, forming the basis for automated annotation validation. Through optimized weight allocation and category-specific verification strategies, the collaborative verification framework evaluates the effect of different model combinations on annotation accuracy. Experimental results demonstrate that the optimal verification strategy—achieved by integrating seven specialized models—outperforms all baseline configurations across evaluation metrics. Specifically, the method attains a Top-1 accuracy of 95.39% on the CUB-200-2011 dataset, exceeding the best-performing single-model baseline, which achieves 91.79%, thereby yielding a 3.60% improvement in recognition precision. 
The constructed BIRD1445 dataset, comprising 3.54 million samples spanning 1 445 bird species and four modalities, outperforms existing datasets in terms of coverage, quality, and annotation accuracy. It serves as a robust benchmark for fine-grained classification, density estimation, and multimodal learning tasks in ecological monitoring.  Conclusions  This study addresses the challenge of constructing large-scale multimodal datasets for ecological monitoring by integrating multi-source data acquisition, intelligent semi-automatic annotation, and multi-model collaborative verification. The proposed approach advances beyond traditional manual annotation workflows by incorporating automated labeling pipelines and heterogeneous attention fusion mechanisms as the core quality control strategy. Comprehensive evaluations on benchmark datasets and real-world scenarios demonstrate the effectiveness of the method: (1) the verification strategy improves annotation accuracy by 3.60% compared to single-model baselines on the CUB-200-2011 dataset; (2) optimal trade-offs between precision and computational efficiency are achieved using Top-K = 3 model selection, based on performance-complexity alignment; and (3) in large-scale annotation scenarios, the system ensures high reliability across 1 445 species categories. Despite its effectiveness, the current approach primarily targets species with sufficient data. Future work should address the representation of rare and endangered species by incorporating advanced data augmentation and few-shot learning techniques to mitigate the limitations posed by long-tail distributions.
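The two-stage verification idea described above (global performance-weighted voting, then prior-guided Top-K re-voting per class) can be sketched as follows. The function name, weight conventions, and data layout are assumptions for illustration, not the paper's exact mechanism.

```python
def verify_label(prob_lists, global_w, class_acc, k=3):
    """Schematic two-stage ensemble check.
    prob_lists: per-model class-probability lists (all the same length);
    global_w:   per-model fusion weights (e.g. overall validation accuracy);
    class_acc:  class_acc[m][c] = model m's accuracy on class c (a prior)."""
    n_classes = len(prob_lists[0])
    # Stage 1: globally weighted soft vote yields a tentative class.
    fused = [sum(w * p[c] for p, w in zip(prob_lists, global_w))
             for c in range(n_classes)]
    tentative = max(range(n_classes), key=fused.__getitem__)
    # Stage 2: keep only the k models that are historically most accurate
    # on the tentative class, and re-vote with their class priors as weights.
    ranked = sorted(range(len(prob_lists)),
                    key=lambda m: class_acc[m][tentative], reverse=True)[:k]
    refined = [sum(class_acc[m][tentative] * prob_lists[m][c] for m in ranked)
               for c in range(n_classes)]
    return max(range(n_classes), key=refined.__getitem__)
```

An annotation is auto-accepted when the refined vote agrees with the pipeline's proposed label; disagreements are routed to human review, which is where the reported reduction in manual verification cost comes from.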