2026 Vol. 48, No. 3

Excellence Action Plan Leading Column
Key Technologies for Low-Altitude Intelligent Networks: Architecture, Security, and Optimization
WANG Yuntao, SU Zhou, GAO Yuan, BA Jianle
2026, 48(3): 889-913. doi: 10.11999/JEIT250947
Abstract:
Low-Altitude Intelligent Networks (LAINs) function as a core infrastructure for the emerging low-altitude digital economy by connecting humans, machines, and physical objects through the integration of manned and unmanned aircraft with ground networks and facilities. This paper provides a comprehensive review of recent research on LAINs from four perspectives: network architecture, resource optimization, security threats and protection, and large model-enabled applications. First, existing standards, general architecture, key characteristics, and networking modes of LAINs are investigated. Second, critical issues related to airspace resource management, spectrum allocation, computing resource scheduling, and energy optimization are discussed. Third, existing and emerging security threats across sensing, network, application, and system layers are assessed, and multi-layer defense strategies in LAINs are reviewed. Furthermore, the integration of large model technologies with LAINs is analyzed, highlighting their potential in task optimization and security enhancement. Finally, future research directions are discussed to provide theoretical foundations and technical guidance for the development of efficient, secure, and intelligent LAINs.

Significance: LAINs support the low-altitude economy by enabling the integration of manned and unmanned aircraft with ground communication, computing, and control networks. By providing real-time connectivity and collaborative intelligence across heterogeneous platforms, LAINs support applications such as precision agriculture, public safety, low-altitude logistics, and emergency response. However, LAINs continue to face challenges created by dynamic airspace conditions, heterogeneous platforms, and strict real-time operational requirements. The development of large models also presents opportunities for intelligent resource coordination, proactive defense, and adaptive network management, which signals a shift in the design and operation of low-altitude networks.

Progress: Recent studies on LAINs have reported progress in network architecture, resource optimization, security protection, and large model integration. Architecturally, hierarchical and modular designs are proposed to integrate sensing, communication, and computing resources across air, ground, and satellite networks, enabling scalable and interoperable operations. In system optimization research, attention is given to airspace resource management, spectrum allocation, computing offloading, and energy-efficient scheduling through distributed optimization and AI-driven orchestration methods. In security research, multi-layer defense frameworks are developed to address sensing-layer spoofing, network-layer intrusions, and application-layer attacks through cross-layer threat intelligence and proactive defense mechanisms. Large Language Models (LLMs), Vision-Language Models (VLMs), and Multimodal LLMs (MLLMs) also support intelligent task planning, anomaly detection, and autonomous decision-making in complex low-altitude environments, which enhances the resilience and operational efficiency of LAINs.

Conclusions: This survey provides a comprehensive review of the architecture, security mechanisms, optimization techniques, and large model applications in LAINs. The challenges in multi-dimensional resource coordination, cross-layer security protection, and real-time system adaptation are identified, and existing or potential approaches to address these challenges are analyzed. By synthesizing recent research on architectural design, system optimization, and security defense, this work offers a unified perspective for researchers and practitioners aiming to build secure, efficient, and scalable LAIN systems. The findings emphasize the need for integrated solutions that combine algorithmic intelligence, system engineering, and architectural innovation to meet future low-altitude network demands.

Prospects: Future research on LAINs is expected to advance the integration of architecture design, intelligent optimization, security defense, and privacy preservation technologies to meet the demands of rapidly evolving low-altitude ecosystems. Key directions include developing knowledge-driven architectures for cross-domain semantic fusion, service-oriented network slicing, and distributed autonomous decision-making. Research should also focus on proactive cross-layer security mechanisms supported by large models and intelligent agents, efficient model deployment through AI-hardware co-design and hierarchical computing architectures, and improved multimodal perception and adaptive decision-making to strengthen system resilience and scalability. In addition, establishing standardized benchmarks, open-source frameworks, and realistic testbeds is essential to accelerate innovation and ensure secure, reliable, and intelligent deployment of LAIN systems in real-world environments.
Special Topic on Smart Healthcare and Engineering Innovation
A Review of Joint EEG-fMRI Methods for Visual Evoked Response Studies
WEI Zhiwei, XIAO Xiaolin, XU Minpeng, MING Dong
2026, 48(3): 914-924. doi: 10.11999/JEIT250781
Abstract:
Significance: The study of Visual Evoked Responses (VERs) using non-invasive neuroimaging is central to understanding human visual information processing. Electroencephalography (EEG) provides millisecond temporal resolution but has limited spatial precision. Functional Magnetic Resonance Imaging (fMRI) offers millimeter spatial resolution based on the blood-oxygen-level-dependent signal, although its temporal resolution is constrained by delayed hemodynamic responses. This trade-off limits the ability of any single modality to characterize complex visual processes such as attentional modulation, motion perception, and multisensory integration. Joint EEG-fMRI acquisition has therefore become an effective multimodal approach. By recording both modalities synchronously, this technique combines their complementary strengths and yields a unified spatiotemporal representation of visual neural dynamics. Despite increasing use, the literature lacks a focused review that summarizes core methods, representative applications, and continuing challenges in joint EEG-fMRI research on VERs. This review addresses that need by providing a structured overview for researchers working on visual system investigation.

Progress: The review first introduces the foundational technologies that support joint EEG-fMRI studies, beginning with synchronous data acquisition using MR-compatible EEG systems and dedicated synchronization hardware. The core data fusion methods are grouped into asymmetric and symmetric approaches. Asymmetric strategies use one modality to constrain analyses of the other: EEG-informed fMRI analysis models fMRI activity using single-trial EEG features, whereas fMRI-informed EEG source imaging uses fMRI activation maps as spatial priors to improve source localization. Symmetric fusion treats both modalities equally; data-driven methods such as joint independent component analysis identify shared neural sources without imposing strong biophysical assumptions. These methods have contributed to advances in several areas. In visual mechanism studies, joint EEG-fMRI has clarified feedforward and feedback interactions in visual cortical networks. In clinical diagnosis and evaluation, it offers objective physiological markers for disorders such as amblyopia and epilepsy by revealing altered activation patterns and network dysfunction. In Brain-Computer Interface (BCI) research, multimodal feature fusion improves the accuracy and robustness of decoding visual intentions.

Conclusions: This review examines joint EEG-fMRI methods for VER studies, classifying major acquisition and fusion strategies and summarizing representative applications. The choice of fusion framework depends on the research objective, data quality, and underlying assumptions. Although joint EEG-fMRI benefits basic neuroscience, clinical diagnosis, and BCI development, several issues limit broader use. System-level obstacles include hardware-induced artifacts, particularly severe electromagnetic interference in ultra-high-field MRI, which degrades EEG data quality. Algorithmic challenges arise from the mismatch in spatiotemporal scales between rapid EEG signals and delayed hemodynamic responses. Inter-subject variability further reduces the generalizability of analytical and decoding models. Continued innovation in hardware engineering and computational methods is required to address these limitations.

Prospects: Future work in joint EEG-fMRI for VER studies is expected to progress gradually and will be shaped by advances in artificial intelligence. System-level developments include next-generation hardware combining ultra-high-field MRI systems with artifact-resilient EEG sensors and real-time correction algorithms. The creation of open, multi-center EEG-fMRI databases built on standardized formats and analysis pipelines, such as BIDS, will improve reproducibility and comparability. Algorithmic progress is likely to focus on artificial intelligence and deep learning. End-to-end neural architectures with spatiotemporal attention mechanisms may learn nonlinear transformations between EEG and fMRI directly, addressing limitations of conventional linear models. Transfer learning and personalized modeling may mitigate inter-subject variability and support adaptive decoding and clinical applications. As clinical and BCI uses expand, balancing model complexity with interpretability and computational efficiency will remain essential. These developments are expected to advance understanding of visual neural computation, improve diagnostic and therapeutic strategies, and support more effective BCI systems.
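The EEG-informed fMRI analysis described above can be illustrated with a minimal numerical sketch: a single-trial EEG feature series is convolved with a canonical hemodynamic response function to form a regressor for the fMRI design matrix. The sketch below assumes an SPM-style double-gamma HRF and a 2 s repetition time; the function names and parameters are illustrative, not those of any specific study reviewed here.

```python
import numpy as np
from math import gamma

def canonical_hrf(t):
    """SPM-style double-gamma hemodynamic response (peak near 5 s,
    small late undershoot); an assumed canonical form."""
    pos = t**5 * np.exp(-t) / gamma(6)
    und = t**15 * np.exp(-t) / gamma(16)
    return pos - und / 6.0

def eeg_informed_regressor(eeg_feature, tr=2.0, hrf_len=32.0):
    """Convolve trial-wise EEG amplitudes with the HRF, sampled at the
    scan repetition time, to build an fMRI design regressor."""
    t = np.arange(0.0, hrf_len, tr)
    h = canonical_hrf(t)
    return np.convolve(eeg_feature, h)[: len(eeg_feature)]

amp = np.zeros(50)
amp[5] = 1.0                      # one strong single-trial EEG response at scan 5
reg = eeg_informed_regressor(amp)
peak_scan = int(np.argmax(reg))   # BOLD prediction peaks a few scans later
```

Regressing voxel-wise BOLD time series against such an EEG-derived regressor then localizes where trial-by-trial EEG fluctuations are expressed hemodynamically.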
Auxiliary Screening for Hypertrophic Cardiomyopathy With Heart Failure with Preserved Ejection Fraction Utilizing Smartphone-Acquired Heart Sound Analysis
DONG Xianpeng, MENG Xiangbin, ZHANG Kuo, FANG Guanchen, GAI Weihao, WANG Wenyao, WANG Jingjia, GAO Jun, PAN Junjun, TANG Zhenchao, SONG Zhen
2026, 48(3): 925-935. doi: 10.11999/JEIT250830
Abstract:
Objective: Heart Failure with preserved Ejection Fraction (HFpEF) is highly prevalent among patients with Hypertrophic CardioMyopathy (HCM), and early identification is critical for improving disease management. However, early screening for HFpEF remains challenging because symptoms are non-specific, diagnostic procedures are complex, and follow-up costs are high. Smartphones, owing to their wide accessibility, low cost, and portability, provide a feasible means to support heart sound-based screening. In this study, smartphone-acquired heart sounds from patients with HCM are used to develop and train an ensemble learning classification model for early detection and dynamic self-monitoring of HFpEF in the HCM population.

Methods: The proposed HFpEF screening framework consists of three components: preprocessing, feature extraction, and model training and fusion based on ensemble learning (Fig. 1). During preprocessing, smartphone-acquired heart sounds are subjected to bandpass filtering and wavelet denoising to improve signal quality, followed by segmentation into individual cardiac cycles. For feature extraction, Mel-Frequency Cepstral Coefficients (MFCCs) and Short-Time Fourier Transform (STFT) time-frequency spectra are calculated (Fig. 3). For classification, a stacking ensemble strategy is applied: base learners, including a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN), are trained, and their predicted probabilities are combined to construct a new feature space. A Logistic Regression (LR) meta-learner is then trained on this feature space to identify HFpEF in patients with HCM.

Results and Discussions: The classification performance of the three models is evaluated on the same patient-level independent test set. The SVM base learner achieves an Area Under the Curve (AUC) of 0.800, with an accuracy of 0.766, sensitivity of 0.659, and specificity of 0.865 (Table 5). The CNN base learner attains an AUC of 0.850, with an accuracy of 0.789, sensitivity of 0.622, and specificity of 0.944 (Table 5). By comparison, the ensemble-based LR classifier demonstrates superior performance, reaching an AUC of 0.900, with an accuracy of 0.813, sensitivity of 0.768, and specificity of 0.854 (Table 5). Relative to the base learners, the ensemble model exhibits a significant overall performance improvement after probability-based feature fusion (Fig. 5). Compared with existing clinical HFpEF risk scores, the proposed method shows higher predictive performance and stronger dynamic monitoring capability, supporting its suitability for risk stratification and follow-up warning in home settings. Compared with professional heart sound acquisition devices, the smartphone-acquired approach provides greater accessibility and cost efficiency, supporting its application in auxiliary HFpEF screening for high-risk HCM populations.

Conclusions: The challenges of clinical HFpEF screening in patients with HCM are addressed by proposing a smartphone-acquired heart sound analysis approach combined with an ensemble learning prediction model, resulting in an accessible and easily implemented auxiliary screening pipeline. The effectiveness of smartphone-based heart sound analysis for initial HFpEF screening in patients with HCM is validated, demonstrating its feasibility as an economical auxiliary tool for early HFpEF detection. This approach provides a non-invasive, convenient, and efficient screening strategy for patients with HCM complicated by HFpEF.
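The stacking step described in the Methods, where base-learner predicted probabilities are fused into a new feature space for an LR meta-learner, can be sketched in a few lines. The toy data, learning rate, and helper names below are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def stack_probabilities(p_svm, p_cnn):
    """Base-learner probabilities become the meta-learner's feature space."""
    return np.column_stack([p_svm, p_cnn])

def fit_logistic_meta(X, y, lr=0.5, steps=2000):
    """Train the logistic-regression meta-learner by plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def predict_meta(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Toy probabilities from two hypothetical base learners on 200 cases
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
p_svm = np.clip(y * 0.6 + rng.normal(0.2, 0.15, 200), 0, 1)
p_cnn = np.clip(y * 0.5 + rng.normal(0.25, 0.15, 200), 0, 1)
X = stack_probabilities(p_svm, p_cnn)
w, b = fit_logistic_meta(X, y)
acc = ((predict_meta(X, w, b) > 0.5) == y).mean()
```

The meta-learner sees only the two probability columns, so it learns how much to trust each base learner rather than re-learning the raw heart-sound features.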
Identification of Novel Protein Drug Targets for Respiratory Diseases by Integrating Human Plasma Proteome with Genome
MA Xinqian, NI Wentao
2026, 48(3): 936-946. doi: 10.11999/JEIT250796
Abstract:
Objective: Respiratory diseases are a major cause of global morbidity and mortality and place a heavy socioeconomic burden on healthcare systems. Epidemiological data indicate that Chronic Obstructive Pulmonary Disease (COPD), pneumonia, asthma, lung cancer, and tuberculosis are the five most significant pulmonary diseases worldwide. The COronaVIrus Disease 2019 (COVID-19) pandemic has introduced additional challenges for respiratory health and emphasizes the need for new diagnostic and therapeutic strategies. Integrating proteomics with Genome-Wide Association Studies (GWAS) provides a framework for connecting genetic variation to clinical phenotypes. Genetic variants associated with plasma protein levels, known as protein Quantitative Trait Loci (pQTLs), link the genome to complex respiratory phenotypes. This study evaluates the causal effects of druggable proteins on major respiratory diseases through proteome-wide Mendelian Randomization (MR) and colocalization analyses. The aim is to identify causal associations that can guide biomarker development and drug discovery, and to prioritize candidates for therapeutic repurposing.

Methods: Summary-level data for circulating protein levels are obtained from two large pQTL studies: the deCODE study and the UK Biobank Pharma Proteomics Project (UKB-PPP). Strictly defined cis-pQTLs are selected to ensure robust genetic instruments, yielding 2,918 proteins for downstream analyses. For disease outcomes, large GWAS summary statistics for 27 respiratory phenotypes are collected from previously published studies and international consortia. A two-sample MR design is applied to estimate the effects of plasma proteins on these phenotypes. To reduce confounding driven by Linkage Disequilibrium (LD), Bayesian colocalization analysis is used to assess whether genetic signals for protein levels and respiratory outcomes share a causal variant. The Posterior Probability of hypothesis 4 (PP4) serves as the primary metric, and PP4 > 0.8 is considered strong evidence of shared causality. Summary-data-based Mendelian Randomization (SMR) and the HEterogeneity In Dependent Instruments (HEIDI) test are used to validate the causal associations. Bidirectional MR and the Steiger test are applied to evaluate potential reverse causality. Protein-Protein Interaction (PPI) networks are generated through the STRING database to visualize functional connectivity and biological pathways associated with the causal proteins.

Results and Discussions: The causal effects of 2,918 plasma proteins on 27 respiratory phenotypes are evaluated (Fig. 1). A total of 694 protein-trait associations meet the Bonferroni-corrected threshold (P < 1.7×10⁻⁵) when cis-instrumental variables are used (Fig. 2). The MR-Egger intercept test identifies 94 protein-disease associations with evidence of directional pleiotropy, which are excluded. Colocalization analysis indicates that 29 protein-phenotype associations show high-confidence evidence of a shared causal variant (PP4 > 0.8), and 39 show medium-level evidence (0.5 < PP4 < 0.8). SMR validation confirms 26 associations (P < 1.72×10⁻³), and 21 pass the HEIDI test (P > 0.05). The findings provide insights into several respiratory diseases. For COPD, five proteins (NRX3A, NRX3B, ERK-1, COMMD1, and PRSS27) are identified as causal; the association between NRXN3 and COPD suggests a genetic connection between nicotine-addiction pathways and chronic lung decline. For asthma, TEF, CASP8, and IL7R show causal evidence, and the robust association between IL7R and asthma suggests that modulation of T-cell homeostasis may provide a therapeutic opportunity. The FUT3_FUT5 complex is uniquely associated with Idiopathic Pulmonary Fibrosis (IPF). CSF3 and LTBP2 are significantly associated with severe COVID-19. For lung cancer, subtype-specific causal proteins are identified, including BTN2A1 for squamous cell lung cancer, BTN1A1 for small cell lung carcinoma, and EHBP1 for lung adenocarcinoma. These findings provide a basis for the development of subtype-specific precision therapies.

Conclusions: This study identifies 29 plasma proteins with high-confidence causal associations across major respiratory diseases. Using MR and colocalization, a comprehensive map of molecular drivers of respiratory conditions is generated. These findings may support precision medicine strategies; however, they are limited by the focus on European populations and potential heterogeneity arising from different proteomic platforms. The associations are based on computational analysis, and further validation in independent cohorts and animal models is needed. Additional experimental studies and clinical trials are required to clarify the pathogenic roles and biological mechanisms of the identified proteins to support therapeutic innovation in respiratory medicine.
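The significance thresholds reported above are consistent with simple Bonferroni correction, dividing the nominal α = 0.05 by the number of tests (2,918 proteins per outcome for the MR screen; 29 colocalized associations carried into SMR validation). A minimal sketch, assuming that is how the thresholds were derived:

```python
def bonferroni(alpha, m):
    """Per-test significance threshold after correcting for m tests."""
    return alpha / m

thr_mr = bonferroni(0.05, 2918)   # matches the reported P < 1.7×10⁻⁵
thr_smr = bonferroni(0.05, 29)    # matches the reported P < 1.72×10⁻³
```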
Unsupervised 3D Medical Image Segmentation With Sparse Radiation Measurement
YU Xiaofan, ZOU Lanlan, GU Wenqi, CAI Jun, KANG Bin, DING Kang
2026, 48(3): 947-959. doi: 10.11999/JEIT250841
Abstract:
Objective: Three-dimensional medical image segmentation is a central task in medical image analysis. Compared with two-dimensional imaging, it captures organ and lesion morphology more completely and provides detailed structural information, supporting early disease screening, personalized surgical planning, and treatment assessment. With advances in artificial intelligence, three-dimensional segmentation is viewed as a key technique for diagnostic support, precision therapy, and intraoperative navigation. However, methods such as Swinunetr-v2 and UNETR++ depend on extensive voxel-level annotations, which creates high annotation costs and restricts clinical use. High-quality segmentation also often requires multi-view projections to recover full volumetric information, increasing radiation exposure and patient burden. Segmentation under sparse radiation measurements is therefore an important challenge. Neural Attenuation Fields (NAF) have recently been introduced for low-dose reconstruction by recovering linear attenuation coefficient fields from sparse views, yet their suitability for three-dimensional segmentation remains insufficiently examined. To address this limitation, a unified framework termed NA-SAM3D is proposed, integrating NAF-based reconstruction with interactive segmentation to enable unsupervised three-dimensional segmentation under sparse-view conditions, reduce annotation dependence, and improve boundary perception.

Methods: The framework is designed in two stages. In the first stage, sparse-view reconstruction is performed with NAF to generate a continuous three-dimensional attenuation coefficient tensor from sparse X-ray projections. Ray sampling and positional encoding are applied to arbitrary three-dimensional points, and the encoded features are forwarded to a Multi-Layer Perceptron (MLP) to predict linear attenuation coefficients that serve as input for segmentation. In the second stage, interactive segmentation is performed. A three-dimensional image encoder extracts high-dimensional features from the attenuation coefficient tensor, and clinician-provided point prompts specify regions of interest. These prompts are embedded into semantic features by an interactive user module and fused with image features to guide the mask decoder in producing initial masks. Because point prompts provide only local positional cues, boundary ambiguity and mask expansion may occur. To address these issues, a Density-Guided Module (DGM) is introduced at the decoder output stage. NAF-derived attenuation coefficients are transformed into a density-aware attention map, which is fused with the initial masks to strengthen tissue-boundary perception and improve segmentation accuracy in complex anatomical regions.

Results and Discussions: NA-SAM3D is evaluated on a self-constructed colorectal cancer dataset comprising 299 patient cases (collected in collaboration with Nanjing Hospital of Traditional Chinese Medicine) and on two public benchmarks: the Lung CT Segmentation Challenge (LCTSC) and the Liver Tumor Segmentation Challenge (LiTS). The results show that NA-SAM3D achieves overall better performance than mainstream unsupervised three-dimensional segmentation methods based on full radiation observation (the SAM-MED series) and reaches accuracy comparable to, or in some cases higher than, the fully supervised Swinunetr-v2. Compared with SAM-MED3D, NA-SAM3D increases the Dice score on the LCTSC dataset by more than 3%, while HD95 and ASD decrease by 5.29 mm and 1.32 mm, respectively, indicating improved boundary localization and surface consistency. Compared with the sparse-field-based method SA3D, NA-SAM3D achieves higher Dice scores on all three datasets (Table 1). Compared with the fully supervised Swinunetr-v2, NA-SAM3D reduces HD95 by 1.28 mm, and the average Dice score is only 0.3% lower. Compared with SA3D, NA-SAM3D increases the average Dice score by about 6.6% and reduces HD95 by about 11 mm, further confirming its capacity to restore structural details and boundary information under sparse-view conditions (Table 2). Although the overall performance remains slightly lower than that of the fully supervised UNETR++ model, NA-SAM3D still shows strong competitiveness and good generalization under label-free inference. Qualitative analysis shows that in complex pelvic and intestinal regions, NA-SAM3D produces clearer boundaries and higher contour consistency (Fig. 3). On the public datasets, segmentation of the lung and liver also shows superior boundary localization and contour integrity (Fig. 4). Three-dimensional visualization further confirms that in colorectal, lung, and liver regions, NA-SAM3D achieves stronger structural continuity and boundary preservation than SAM-MED2D and SAM-MED3D (Fig. 5). The DGM further enhances boundary sensitivity, increasing Dice and mIoU by 1.20% and 3.31% on the self-constructed dataset, and by 4.49 and 2.39 percentage points on the LiTS dataset (Fig. 6).

Conclusions: An unsupervised three-dimensional medical image segmentation framework, NA-SAM3D, is presented, integrating NAF-based reconstruction with interactive segmentation to achieve high-precision segmentation under sparse radiation measurements. The DGM effectively uses attenuation coefficient priors to enhance boundary recognition in complex lesion regions. Experimental results show that the framework approaches the performance of fully supervised methods under unsupervised inference and yields an average Dice improvement of 2.0%, indicating strong practical value and clinical potential for low-dose imaging and complex anatomical segmentation. Future work will refine the model for additional anatomical regions and assess its practical use in preoperative planning.
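The ray sampling and positional encoding step in the first stage maps sampled 3-D points to higher-dimensional features before the attenuation MLP. A frequency-based positional encoding of the kind common in neural field methods is sketched below; the exact encoding used by NAF and NA-SAM3D may differ, and the dimensions are illustrative.

```python
import numpy as np

def positional_encoding(x, n_freqs=6):
    """Map 3-D points to sin/cos features at geometrically spaced
    frequencies (NeRF-style; an assumed form, not NAF's exact encoding)."""
    feats = [x]
    for i in range(n_freqs):
        feats.append(np.sin((2.0 ** i) * np.pi * x))
        feats.append(np.cos((2.0 ** i) * np.pi * x))
    return np.concatenate(feats, axis=-1)

# Four points sampled along X-ray paths through the volume
pts = np.random.default_rng(0).random((4, 3))
feat = positional_encoding(pts)   # 3 raw coords + 3*2*6 sin/cos = 39 features
```

The encoded features, rather than the raw coordinates, are what the MLP consumes to predict per-point linear attenuation coefficients.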
Federated Semi-Supervised Image Segmentation with Dynamic Client Selection
LIU Zhenbing, LI Huanlan, WANG Baoyuan, LU Haoxiang, PAN Xipeng
2026, 48(3): 960-970. doi: 10.11999/JEIT250834
Abstract:
Objective: Multicenter validation is a growing requirement in clinical research, yet strict privacy regulations, heterogeneous cross-institutional data distributions, and scarce pixel-level annotations limit the use of conventional centralized medical image segmentation models. This study develops a federated semi-supervised framework that uses labeled and unlabeled prostate MRI data from multiple hospitals, considers dynamic client participation and Non-Independent and Identically Distributed (Non-IID) data, and aims to improve segmentation accuracy and robustness under real-world constraints.

Methods: A cross-silo Federated Semi-Supervised Learning (FSSL) paradigm is used. Clients with pixel-wise annotations act as labeled clients, and those without annotations act as unlabeled clients. Each client maintains a local student network for prostate segmentation. On unlabeled clients, a teacher network with the same architecture is updated using the exponential moving average of student parameters and generates perturbed pseudo-labels to supervise the student through a hybrid consistency loss that combines Dice and binary cross-entropy terms. To reduce the effect of heterogeneous and low-quality updates, a performance-driven dynamic client selection and aggregation strategy is applied. At each communication round, clients are evaluated on their local validation sets, and only those whose Dice scores exceed a threshold are retained. A top-K subset is then aggregated with normalized contribution weights derived from validation Dice, with bounds to avoid gradient vanishing and single-client dominance. For unlabeled clients, a penalty factor down-weights unreliable pseudo-labeled updates. The segmentation backbone is a Multi-scale Feature Fusion U-Net (MFF-UNet). Starting from a standard encoder-decoder U-Net, an FPN-like pyramid is added to the encoder, where multi-level feature maps are channel-aligned using 1×1 convolutions, fused in a top-down pathway through upsampling and element-wise addition, and refined using 3×3 convolutions. The decoder upsamples these fused features and combines them with encoder features through skip connections, enabling joint modeling of global semantics and fine-grained boundaries. The framework is evaluated on T2-weighted prostate MRI from six centers, comprising three labeled and three unlabeled clients. All 3D volumes are resampled, sliced into 2D axial images, resized, and augmented. The Dice coefficient and 95th percentile Hausdorff distance (HD95) are used as evaluation metrics.

Results and Discussions: On the six-center dataset, the method achieves average Dice scores of 0.8405 on labeled clients and 0.7868 on unlabeled clients, with corresponding HD95 values of 8.04 and 8.67 pixels. These results are superior to or comparable with several representative federated semi-supervised or mixed-supervision methods, with the largest gains on distribution-shifted unlabeled centers. Visualization shows that the method generates more complete and smoother prostate contours with fewer false positives in low-contrast or small-volume cases than the baselines. Attention heatmaps from the final decoder layer indicate that UNet exhibits attention drift, SegMamba produces diffuse responses, and nnU-Net shows weak activations for small lesions, whereas MFF-UNet focuses more precisely on the prostate region with stable high responses, indicating improved discriminative capability and interpretability.

Conclusions: A federated semi-supervised prostate MRI segmentation framework that integrates teacher-student consistency learning, multi-scale feature fusion, and performance-driven dynamic client selection is presented. The method preserves privacy by keeping data local, reduces annotation scarcity by using unlabeled clients, and addresses client heterogeneity through reliability-aware aggregation. Experiments on a six-center dataset show that the framework achieves competitive or superior overlap and boundary accuracy compared with state-of-the-art federated semi-supervised methods, particularly on distribution-shifted unlabeled centers. The framework is model-agnostic and can be applied to other organs, imaging modalities, and cross-institutional segmentation tasks under strict privacy and regulatory constraints.
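Two mechanisms from the Methods, the EMA teacher update and the performance-driven client selection with Dice-derived aggregation weights, can be sketched as follows. The threshold, K, and client names are illustrative assumptions; the paper's weight bounds and the penalty factor for unlabeled clients are omitted for brevity.

```python
def ema_update(teacher, student, alpha=0.99):
    """Teacher parameters track an exponential moving average of the student."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}

def select_and_weight(val_dice, threshold=0.7, top_k=3):
    """Keep clients whose validation Dice exceeds the threshold, take the
    top-K, and normalize their Dice scores into aggregation weights."""
    kept = sorted(((c, d) for c, d in val_dice.items() if d > threshold),
                  key=lambda cd: cd[1], reverse=True)[:top_k]
    total = sum(d for _, d in kept)
    return {c: d / total for c, d in kept}

teacher = ema_update({"w": 1.0}, {"w": 0.0})   # teacher["w"] == 0.99
weights = select_and_weight({"A": 0.84, "B": 0.79, "C": 0.62, "D": 0.75})
# Client C falls below the threshold; A, B, D share normalized weights.
```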
Integrating Representation Learning and Knowledge Graph Reasoning for Diabetes and Complications Prediction
WANG Yuao, HUANG Yeqi, LI Qingyuan, LIU Yun, JING Shenqi, SHAN Tao, GUO Yongan
2026, 48(3): 971-981. doi: 10.11999/JEIT250798
Abstract:
  Objective  Diabetes mellitus and its complications are recognized as major global health challenges, causing severe morbidity, high healthcare costs, and reduced quality of life. Accurate joint prediction of these conditions is essential for early intervention but is hindered by data heterogeneity, sparsity, and complex inter-entity relationships. To address these challenges, a Representation Learning Enhanced Knowledge Graph-based Multi-Disease Prediction (REKG-MDP) model is proposed. Electronic Health Records (EHRs) are integrated with supplementary medical knowledge to construct a comprehensive Medical Knowledge Graph (MKG), and higher-order semantic reasoning combined with relation-aware representation learning is applied to capture complex dependencies and improve predictive accuracy across multiple diabetes-related conditions.  Methods  The REKG-MDP framework consists of three modules. First, an MKG is constructed by integrating structured EHR data from the MIMIC-IV dataset with external disease knowledge. Patient-side features include demographics, laboratory indices, and medical history, whereas disease-side attributes cover comorbidities, susceptible populations, etiological factors, and diagnostic criteria. This integration mitigates data sparsity and enriches semantic representation. Second, a relation-aware embedding module captures four relational patterns: symmetric, antisymmetric, inverse, and compositional. These patterns are used to optimize entity and relation embeddings for semantic reasoning. Third, a Hierarchical Attention-based Graph Convolutional Network (HA-GCN) aggregates multi-hop neighborhood information. Dynamic attention weights capture both local and global dependencies, and a bidirectional mechanism enhances the modeling of patient–disease interactions.  
Results and Discussions  Experiments demonstrate that REKG-MDP consistently outperforms four baselines: two machine learning models (DCKD-RF and bSES-AC-RUN-FKNN) and two graph-based models (KGRec and PyRec). Compared with the strongest baseline, REKG-MDP achieves average improvements in P, F1, and NDCG of 19.39%, 19.67%, and 19.39% for single-disease prediction (\begin{document}$ n=1 $\end{document}); 16.71%, 21.83%, and 23.53% for \begin{document}$ n=3 $\end{document}; and 22.01%, 20.34%, and 20.88% for \begin{document}$ n=5 $\end{document} (Table 4). Ablation studies confirm the contribution of each module. Removing relation-pattern modeling reduces performance metrics by approximately 12%, removing hierarchical attention decreases them by 5~6%, and excluding disease-side knowledge produces the largest decline of up to 20% (Fig. 5). Sensitivity analysis indicates that increasing the embedding dimension from 32 to 128 enhances performance by more than 11%, whereas excessive dimensionality (256) leads to over-smoothing (Fig. 6). Adjusting the \begin{document}$ \beta $\end{document} parameter strengthens sample discrimination, improving P, F1, and NDCG by 9.28%, 27.9%, and 8.08%, respectively (Fig. 7).  Conclusions  REKG-MDP integrates representation learning with knowledge graph reasoning to enable multi-disease prediction. The main contributions are as follows: (1) integrating heterogeneous EHR data with disease knowledge mitigates data sparsity and enhances semantic representation; (2) modeling diverse relational patterns and applying hierarchical attention improves the capture of higher-order dependencies; and (3) extensive experiments confirm the model’s superiority over state-of-the-art baselines, with ablation and sensitivity analyses validating the contribution of each module. Remaining challenges include managing extremely sparse data and ensuring generalization across broader populations. 
Future research will extend REKG-MDP to model temporal disease progression and additional chronic conditions.
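The four relational patterns named in the Methods (symmetric, antisymmetric, inverse, and compositional) are exactly those expressible by rotation-based knowledge graph embeddings. A minimal sketch in that spirit, assuming a RotatE-style complex rotation purely for illustration (the abstract does not specify REKG-MDP's actual scoring function):

```python
import numpy as np

def rotate_score(head, rel_phase, tail):
    """RotatE-style triple score: rotate the head embedding by the relation
    phase in the complex plane and measure distance to the tail. Phase 0 or
    pi gives a symmetric relation, opposite phases give inverse relations,
    and phase addition gives composition."""
    rotated = head * np.exp(1j * rel_phase)
    return -np.linalg.norm(rotated - tail)   # closer to 0 = more plausible

rng = np.random.default_rng(0)
h = rng.standard_normal(8) + 1j * rng.standard_normal(8)
phase = rng.uniform(-np.pi, np.pi, 8)
t_true = h * np.exp(1j * phase)              # tail consistent with (h, r)
t_rand = rng.standard_normal(8) + 1j * rng.standard_normal(8)
```

Training would then optimize entity embeddings and relation phases so that observed patient–disease triples score higher than corrupted ones.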
Wave-MambaCT: Low-dose CT Artifact Suppression Method Based on Wavelet Mamba
CUI Xueying, WANG Yuhang, LIU Bin, SHANGGUAN Hong, ZHANG Xiong
2026, 48(3): 982-993. doi: 10.11999/JEIT250489
Abstract:
  Objective  Low-Dose Computed Tomography (LDCT) reduces patient radiation exposure but introduces substantial noise and artifacts into reconstructed images. Convolutional Neural Network (CNN)-based denoising approaches are limited by local receptive fields, which restrict their ability to capture long-range dependencies. Transformer-based methods alleviate this limitation but incur quadratic computational complexity relative to image size. In contrast, State Space Model (SSM)-based Mamba frameworks achieve linear complexity for long-range interactions. However, existing Mamba-based methods often suffer from information loss and insufficient noise suppression. To address these limitations, we propose the Wave-MambaCT model.  Methods  The proposed Wave-MambaCT model adopts a multi-scale framework that integrates Discrete Wavelet Transform (DWT) with a Mamba module based on the SSM. First, DWT performs a two-level decomposition of the LDCT image, decoupling noise from Low-Frequency (LF) content. This design directs denoising primarily toward the High-Frequency (HF) components, facilitating noise suppression while preserving structural information. Second, a residual module combined with a Spatial-Channel Mamba (SCM) module extracts both local and global features from LF and HF bands at different scales. The noise-free LF features are then used to correct and enhance the corresponding HF features through an attention-based Cross-Frequency Mamba (CFM) module. Finally, inverse wavelet transform is applied in stages to progressively reconstruct the image. To further improve denoising performance and network stability, multiple loss functions are employed, including L1 loss, wavelet-domain LF loss, and adversarial loss for HF components.
Results and Discussions  Extensive experiments on the simulated Mayo Clinic datasets, the real Piglet datasets, and the hospital clinical dataset DeepLesion show that Wave-MambaCT provides superior denoising performance and generalization. On the Mayo dataset, a PSNR of 31.6528 dB is achieved, which is higher than that of the suboptimal method DenoMamba (PSNR 31.4219 dB), while MSE is reduced to 0.00074 and SSIM and VIF are improved to 0.8851 and 0.4629, respectively (Table 1). Visual results (Figs. 4~6) demonstrate that edges and fine details such as abdominal textures and lesion contours are preserved, with minimal blurring or residual artifacts compared with competing methods. Computational efficiency analysis (Table 2) indicates that Wave-MambaCT maintains low FLOPs (17.2135 G) and parameters (5.3913 M). FLOPs are lower than those of all networks except RED-CNN, and the parameter count is higher only than those of RED-CNN and CTformer. During training, 4.12 minutes per epoch are required, longer only than RED-CNN. During testing, 0.1463 seconds are required per image, which is at a medium level among the compared methods. Generalization tests on the Piglet datasets (Figs. 7, 8, Tables 3, 4) and DeepLesion (Fig. 9) further confirm the robustness and generalization capacity of Wave-MambaCT. In the proposed design, HF sub-bands are grouped, and noise-free LF information is used to correct and guide their recovery. This strategy is based on two considerations. First, it reduces network complexity and parameter count. Second, although the sub-bands correspond to HF information in different orientations, they are correlated and complementary as components of the same image. Joint processing enhances the representation of HF content, whereas processing them separately would require a multi-branch architecture, inevitably increasing complexity and parameters.
Future work will explore approaches to reduce complexity and parameters when processing HF sub-bands individually, while strengthening their correlations to improve recovery. For structural simplicity, SCM is applied to both HF and LF feature extraction. However, redundancy exists when extracting LF features, and future studies will explore the use of different Mamba modules for HF and LF features to further optimize computational efficiency.  Conclusions  Wave-MambaCT integrates DWT for multi-scale decomposition, a residual module for local feature extraction, and an SCM module for efficient global dependency modeling to address the denoising challenges of LDCT images. By decoupling noise from LF content through DWT, the model enables targeted noise removal in the HF domain, facilitating effective noise suppression. The designed RSCM, composed of residual blocks and SCM modules, captures fine-grained textures and long-range interactions, enhancing the extraction of both local and global information. In parallel, the Cross-band Enhancement Module (CEM) employs noise-free LF features to refine HF components through attention-based CFM, ensuring structural consistency across scales. Ablation studies (Table 5) confirm the essential contributions of both SCM and CEM modules to maintaining high performance. Importantly, the model’s staged denoising strategy achieves a favorable balance between noise reduction and structural preservation, yielding robustness to varying radiation doses and complex noise distributions.
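The decomposition that Wave-MambaCT builds on can be illustrated with a plain one-level 2D Haar DWT (the model itself applies a two-level transform inside a learned network; this NumPy sketch shows only the transform and its perfect reconstruction):

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT: returns (LL, (LH, HL, HH)) subbands.
    Additive noise lands mostly in the three high-frequency subbands,
    so denoising can target them while LL keeps coarse structure."""
    a = (x[0::2, :] + x[1::2, :]) / 2            # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2            # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2
    lh = (a[:, 0::2] - a[:, 1::2]) / 2
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, (lh, hl, hh)

def haar_idwt2(ll, bands):
    """Inverse of haar_dwt2 (exact reconstruction)."""
    lh, hl, hh = bands
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x
```

For a smooth region the three HF bands are near zero, so noise that lands there can be attenuated without touching the LL content.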
Joint Mask and Multi-Frequency Dual Attention GAN Network for CT-to-DWI Image Synthesis in Acute Ischemic Stroke
ZHANG Zehua, ZHAO Ning, WANG Shuai, WANG Xuan, ZHENG Qiang
2026, 48(3): 994-1004. doi: 10.11999/JEIT250643
Abstract:
  Objective  In the clinical management of Acute Ischemic Stroke (AIS), Computed Tomography (CT) and Diffusion-Weighted Imaging (DWI) serve complementary roles at different stages. CT is widely applied for initial evaluation due to its rapid acquisition and accessibility, but it has limited sensitivity in detecting early ischemic changes, which can result in diagnostic uncertainty. In contrast, DWI demonstrates high sensitivity to early ischemic lesions, enabling visualization of diffusion-restricted regions soon after symptom onset. However, DWI acquisition requires a longer time, is susceptible to motion artifacts, and depends on scanner availability and patient cooperation, thereby reducing its clinical accessibility. The limited availability of multimodal imaging data remains a major challenge for timely and accurate AIS diagnosis. Therefore, developing a method capable of rapidly and accurately generating DWI images from CT scans has important clinical significance for improving diagnostic precision and guiding treatment planning. Existing medical image translation approaches primarily rely on statistical image features and overlook anatomical structures, which leads to blurred lesion regions and reduced structural fidelity.  Methods  This study proposes a Joint Mask and Multi-Frequency Dual Attention Generative Adversarial Network (JMMDA-GAN) for CT-to-DWI image synthesis to assist in the diagnosis and treatment of ischemic stroke. The approach incorporates anatomical priors from brain masks and adaptive multi-frequency feature fusion to improve image translation accuracy. JMMDA-GAN comprises three principal modules: a mask-guided feature fusion module, a multi-frequency attention encoder, and an adaptive fusion weighting module. The mask-guided feature fusion module integrates CT images with anatomical masks through convolution, embedding spatial priors to enhance feature representation and texture detail within brain regions and ischemic lesions. 
The multi-frequency attention encoder applies Discrete Wavelet Transform (DWT) to decompose images into low-frequency global components and high-frequency edge components. A dual-path attention mechanism facilitates cross-scale feature fusion, reducing high-frequency information loss and improving structural detail reconstruction. The adaptive fusion weighting module combines convolutional neural networks and attention mechanisms to dynamically learn the relative importance of input features. By assigning adaptive weights to multi-scale features, the module selectively enhances informative regions and suppresses redundant or noisy information. This process enables effective integration of low- and high-frequency features, thereby improving both global contextual consistency and local structural precision.  Results and Discussions  Extensive experiments were performed on two independent clinical datasets collected from different hospitals to assess the effectiveness of the proposed method. JMMDA-GAN achieved Mean Squared Error (MSE) values of 0.0097 and 0.0059 on Clinical Dataset 1 and Clinical Dataset 2, respectively, exceeding state-of-the-art models by reducing MSE by 35.8% and 35.2% compared with ARGAN. The proposed network reached Peak Signal-to-Noise Ratio (PSNR) values of 26.75 dB and 28.12 dB, showing improvements of 30.7% and 7.9% over the best existing methods. For Structural Similarity Index (SSIM), JMMDA-GAN achieved 0.753 and 0.844, indicating superior structural preservation and perceptual quality. Visual analysis further demonstrates that JMMDA-GAN restores lesion morphology and fine texture features with higher fidelity, producing sharper lesion boundaries and improved structural consistency compared with other methods. Cross-center generalization and multi-center mixed experiments confirm that the model maintains stable performance across institutions, highlighting its robustness and adaptability in clinical settings.
Parameter sensitivity analysis shows that the combination of Haar wavelet and four attention heads achieves an optimal balance between global structural retention and local detail reconstruction. Moreover, superpixel-based gray-level correlation experiments demonstrate that JMMDA-GAN exceeds existing models in both local consistency and global image quality, confirming its capacity to generate realistic and diagnostically reliable DWI images from CT inputs.  Conclusions  This study proposes a novel JMMDA-GAN designed to enhance lesion and texture detail generation by incorporating anatomical structural information. The method achieves this through three principal modules. (1) The mask-guided feature fusion module effectively integrates anatomical structure information, with particular optimization of the lesion region. The mask-guided network focuses on critical lesion features, ensuring accurate restoration of lesion morphology and boundaries. By combining mask and image data, the method preserves the overall anatomical structure while enhancing lesion areas, preventing boundary blurring and texture loss commonly observed in traditional approaches, thereby improving diagnostic reliability. (2) The multi-frequency feature fusion module jointly optimizes low- and high-frequency features to enhance image detail. This integration preserves global structural integrity while refining local features, producing visually realistic and high-fidelity images. (3) The adaptive fusion weighting module dynamically adjusts the learning strategy for frequency-domain features according to image content, enabling the network to manage texture variations and complex anatomical structures effectively, thereby improving overall image quality. Through the coordinated function of these modules, the proposed method enhances image realism and diagnostic precision. 
Experimental results demonstrate that JMMDA-GAN exceeds existing advanced models across multiple clinical datasets, highlighting its potential to support clinicians in the diagnosis and management of AIS.
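The adaptive fusion weighting idea, pooling a descriptor and softmax-weighting each frequency branch so informative branches dominate, can be sketched as follows. This is a hypothetical minimal form; the paper's module also uses convolutional and attention layers whose details the abstract does not give:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fuse(branches, w_gate):
    """Weight K frequency branches per sample and sum them.

    branches: (B, K, C) features from K branches (e.g., LF and HF paths);
    w_gate:   (C, K) learned projection producing one logit per branch.
    Returns the fused (B, C) features and the (B, K) branch weights."""
    pooled = branches.mean(axis=1)                   # (B, C) descriptor
    weights = softmax(pooled @ w_gate, axis=-1)      # (B, K), rows sum to 1
    fused = np.einsum('bk,bkc->bc', weights, branches)
    return fused, weights
```

Because the weights are computed from the input itself, the network can emphasize high-frequency detail around lesion boundaries and low-frequency context elsewhere.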
Dynamic Wavelet Multi-Directional Perception and Geometry Axis-Solution Guided 3D CT Fracture Image Segmentation
ZHANG Yinhui, LIU Kai, HE Zifen, ZHANG Jinkai, CHEN Guangchen, MA Zhijian
2026, 48(3): 1005-1016. doi: 10.11999/JEIT250732
Abstract:
  Objective  Accurate segmentation of fracture surfaces in three-dimensional computed tomography (3D CT) images is essential for orthopedic surgical planning, particularly for determining nail insertion angles perpendicular to fracture planes. However, existing approaches present three major limitations: limited capture of deep global volumetric context, directional texture ambiguity in low-contrast fracture regions, and insufficient decoding of geometric features. To address these limitations, a Dynamic Wavelet Multi-Directional Perception and Geometry Axis-Solution Guided Network (DWAG-Net) is proposed to improve segmentation accuracy for complex tibial fractures and to provide reliable 3D digital guidance for preoperative planning.  Methods  The proposed architecture extends 3D nnU-Netv2 through three core components. First, a Dynamic Multi-View Aggregation (DMVA) module adaptively fuses tri-planar views (axial, sagittal, and coronal) with full-volume features using learnable parameter interpolation with an optimized kernel size of 2×2×2 and a channel-wise Hadamard product, thereby strengthening global context representation. Second, a Wavelet Direction Perception Enhancement (WDPE) module applies a 3D Symlets discrete wavelet transform to decompose inputs into eight subbands, followed by direction-specific enhancement. Adaptive convolutional kernels (e.g., [5, 3, 3] for depth-dominant fractures) reinforce texture information in high-frequency subbands, whereas cross-subband fusion integrates complementary features. Third, a Geometry Axis-Solution Guided (GASG) module is embedded in the decoder to maintain anatomical consistency by constructing axis-level affinity maps along depth, height, and width that combine geometric similarity with spatial distance decay, and by refining boundary delineation using rotational positional encoding and multi-axis attention.
The network is trained on the YN-TFS dataset, which contains 110 tibial fracture CT scans with spatial resolutions ranging from 0.39 to 1.00 mm. Stochastic gradient descent is used with a learning rate of 0.01 and a momentum of 0.99. A class-weighted loss function with weights of 0.5 for background, 1 for bone, and 5 for fracture is adopted to address severe pixel imbalance.  Results and Discussions  DWAG-Net achieves state-of-the-art performance, with a mean Dice score of 71.20% (Table 1), exceeding that of nnU-Netv2 by 5.06%. For fracture surfaces, the Dice score reaches 69.48%, corresponding to an improvement of 7.12%. Boundary accuracy improves significantly, with a mean 95th percentile Hausdorff distance (HD95) of 1.38 mm and a fracture surface HD95 of 1.54 mm, representing a reduction of 3.70 mm. Ablation studies (Table 2) confirm the contribution of each component. DMVA increases the Dice score by 2.40% through adaptive multi-view fusion. WDPE reduces directional ambiguity and yields a 5.84% gain in fracture surface Dice. GASG provides an additional 1.20% improvement by enforcing geometric consistency. Optimal performance is obtained with a DMVA kernel size of 2×2×2, the use of Symlets wavelets, and sequential axis processing in the order of depth, height, and width. Qualitative comparisons indicate that DWAG-Net preserves fracture continuity in cases where U-Mamba and nnWNet fail, and reduces over-segmentation relative to nnFormer and UNETR++ (Fig. 5).  Conclusions  DWAG-Net establishes a state-of-the-art framework for 3D fracture segmentation by integrating multi-directional wavelet perception with geometry-guided decoding. The coordinated use of DMVA, directional texture enhancement, and geometry axis-solution guidance achieves clinically relevant precision, with a Dice score of 71.20% and an HD95 of 1.38 mm. These results support accurate data-driven surgical planning.
Future work will focus on refining loss design to further mitigate severe class imbalance.
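The class-weighted loss described above (weights 0.5 for background, 1 for bone, and 5 for fracture) can be sketched as a weighted cross-entropy; the weight normalization used here is one common convention and an assumption, since the abstract does not specify it:

```python
import numpy as np

# Weights from the paper: background 0.5, bone 1, fracture 5.
CLASS_WEIGHTS = np.array([0.5, 1.0, 5.0])

def weighted_ce(probs, labels, class_weights=CLASS_WEIGHTS):
    """Weighted cross-entropy over flattened voxels.
    probs: (N, C) softmax outputs; labels: (N,) integer class ids."""
    w = class_weights[labels]                                # per-voxel weight
    nll = -np.log(probs[np.arange(len(labels)), labels])     # per-voxel NLL
    return float((w * nll).sum() / w.sum())                  # weight-normalized mean
```

A mistake on a fracture voxel now costs ten times a mistake on a background voxel, which counters the severe pixel imbalance the training section describes.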
An EEG Emotion Recognition Model Integrating Memory and Self-attention Mechanisms
LIU Shanrui, BI Yingzhou, HUO Leigang, GAN Qiujing, ZHOU Shuheng
2026, 48(3): 1017-1026. doi: 10.11999/JEIT250737
Abstract:
  Objective  ElectroEncephaloGraphy (EEG) is a noninvasive technique for recording neural signals and provides rich emotional and cognitive information for brain science research and affective computing. Although Transformer-based models demonstrate strong global modeling capability in EEG emotion recognition, their multi-head self-attention mechanisms do not reflect the characteristics of brain-generated signals that exhibit a forgetting effect. In human cognition, emotional or cognitive states from distant time points gradually decay, whereas existing Transformer-based approaches emphasize temporal relevance only and neglect this forgetting behavior. This limitation reduces recognition performance. Therefore, a model is designed to account for both temporal relevance and the intrinsic forgetting effect of brain activity.  Methods  A novel EEG emotion recognition model, termed Memory Self-Attention (MSA), is proposed by embedding a memory-based forgetting mechanism into the standard self-attention framework. The MSA mechanism integrates global semantic modeling with a biologically inspired memory decay component. For each attention head, a memory forgetting score is learned through two independent linear decay curves to represent natural attenuation over time. These scores are combined with conventional attention weights so that temporal relationships are adjusted by distance-aware forgetting behavior. This design improves performance with a negligible increase in model parameters and computational cost. An Aggregated Convolutional Neural Network (ACNN) is first applied to extract spatiotemporal features across EEG channels. The MSA module then captures global dependencies and memory-aware interactions. The refined representations are finally passed to a classification head to generate predictions.  Results and Discussions  The proposed model is evaluated on several benchmark EEG emotion recognition datasets. 
On the DEAP binary classification task, classification accuracies of 98.87% for valence and 98.30% for arousal are achieved. On the SEED three-class task, an accuracy of 97.64% is obtained, and on the SEED-IV four-class task, the accuracy reaches 95.90%. These results (Figs. 3~5, Tables 3~5) exceed those of most mainstream methods, indicating the effectiveness and robustness of the proposed approach across different datasets and emotion classification settings.  Conclusions  An effective and biologically informed method for EEG-based emotion recognition is presented by incorporating a memory forgetting mechanism into a Transformer architecture. The proposed MSA model captures both temporal correlations and forgetting characteristics of brain signals, providing a lightweight and accurate solution for multi-class emotion recognition. Experimental results confirm its strong performance and generalizability.
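The core MSA idea, attention logits discounted by temporal distance, can be sketched for a single head as follows. A fixed scalar decay stands in for the paper's two learned linear decay curves per head, whose exact form the abstract does not give:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def memory_self_attention(q, k, v, decay):
    """Scaled dot-product attention with a forgetting penalty: the farther
    apart two time steps are, the more their attention logit is reduced.
    q, k, v: (T, d) arrays for one head; decay >= 0 controls forgetting."""
    t, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    dist = np.abs(np.arange(t)[:, None] - np.arange(t)[None, :])
    return softmax(logits - decay * dist) @ v
```

With decay=0 this reduces to ordinary self-attention; as decay grows, each step attends mostly to its temporal neighborhood, mimicking the decay of distant emotional states.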
Research on Collaborative Reasoning Framework and Algorithms of Cloud-Edge Large Models for Intelligent Auxiliary Diagnosis Systems
HE Qian, ZHU Lei, LI Gong, YOU Zhengpeng, YUAN Lei, JIA Fei
2026, 48(3): 1027-1035. doi: 10.11999/JEIT250828
Abstract:
  Objective  The deployment of Large Language Models (LLMs) in intelligent auxiliary diagnosis is constrained by limited computing resources for local hospital deployment and by privacy risks related to the transmission and storage of medical data in cloud environments. Low-parameter local LLMs show 20%~30% lower accuracy in medical knowledge question answering and 15%~25% reduced medical knowledge coverage compared with full-parameter cloud LLMs, whereas cloud-based systems face inherent data security concerns. To address these issues, a cloud-edge LLM collaborative reasoning framework and related algorithms are proposed for intelligent auxiliary diagnosis systems. The objective is to design a cloud-edge collaborative reasoning agent equipped with intelligent routing and dynamic semantic desensitization to enable adaptive task allocation between the edge (hospital side) and cloud (regional cloud). The framework is intended to achieve a balanced result across diagnostic accuracy, data privacy protection, and resource use efficiency, providing a practical technical path for the development of medical artificial intelligence systems.  Methods  The proposed framework adopts a layered architectural design composed of a four-tier progressive architecture on the edge side and a four-tier service-oriented architecture on the cloud side (Fig. 1). The edge side consists of resource, data, model, and application layers, with the model layer hosting lightweight medical LLMs and the cloud-edge collaborative agent. The cloud side comprises AI IaaS, AI PaaS, AI MaaS, and AI SaaS layers, functioning as a center for computing power and advanced models. The collaborative reasoning process follows a structured workflow (Fig. 2), beginning with user input parsed by the agent to extract key clinical features, followed by reasoning node decision-making. 
Two core technologies support the agent: (1) Intelligent routing: This mechanism defaults to edge-side processing and dynamically selects the reasoning path (edge or cloud) through a dual-driven weight update strategy. It integrates semantic feature similarity computed through Chinese word segmentation and pre-trained medical language models and incorporates historical decision data, with an exponential moving average used to update feature libraries for adaptive optimization. (2) Dynamic semantic desensitization: Employing a three-stage architecture (sensitive entity recognition, semantic correlation analysis, and hierarchical desensitization decision-making), this technology identifies sensitive entities through a domain-enhanced Named Entity Recognition (NER) model, calculates entity sensitivity and desensitization priority, and applies a semantic similarity constraint to prevent excessive desensitization. Three desensitization strategies (complete deletion, general replacement, partial masking) are used based on entity sensitivity. Experimental validation is conducted with two open-source Chinese medical knowledge graphs (CMeKG and CPubMedKG) containing more than 2.7 million medical entities. The experimental environment (Fig. 3) deploys a qwen3:1.7b model on the edge and the Jiutian LLM on the cloud, with a 5 000-sample evaluation dataset divided into entity-level, relation-level, and subgraph-level questions. Performance is assessed with three metrics: answer accuracy, average token consumption, and average response time.  Results and Discussions  Experimental results show that the proposed framework achieves strong performance across the main evaluation dimensions. For answer accuracy, the intelligent routing mechanism attains 72.44% on CMeKG (Fig. 4) and 66.20% on CPubMedKG (Fig. 5), which are higher than the edge-side LLM alone (60.73% and 54.18%) and close to the cloud LLM (72.68% and 66.49%). 
These results indicate that the framework maintains diagnostic consistency with cloud-based systems while taking advantage of edge-side capabilities. For resource use, the intelligent routing model reduces average token consumption to 61.27, representing 45.63% of the cloud LLM’s token usage (131.68) (Fig. 6), which supports substantial cost reduction. For response time, the edge-side LLM shows latency greater than 6 s because of limited computing power, whereas the cloud LLM reaches 0.44 s latency through dedicated line access (8% of the 5.46 s latency under internet access). The intelligent routing model produces average latency values between those of the edge and cloud LLMs under both access modes (Fig. 7), consistent with expected trade-offs. The framework also shows applicability across common medical scenarios (Table 1), including outpatient triage, chronic disease management, medical image analysis, intensive care, and health consultation, by combining local real-time processing with cloud-based deep reasoning. Limitations appear in emergency rescue settings with weak network conditions because of latency constraints and in rare disease diagnosis because of limited edge-side training samples and potential loss of specific features during desensitization. Overall, the results verify that the cloud-edge collaborative reasoning mechanism reduces computing resource overhead while preserving consistency in diagnostic results.  Conclusions  This study constructs a cloud-edge LLM collaborative reasoning framework for intelligent auxiliary diagnosis systems, addressing the challenges of limited local computing power and cloud data privacy risks. Through the integration of intelligent routing, prompt engineering adaptation, and dynamic semantic desensitization, the framework achieves balanced optimization of diagnostic accuracy, data security, and resource economy. 
Experimental validation shows that its accuracy is comparable to cloud-only LLMs while resource consumption is substantially reduced, providing a feasible technical path for medical intelligence development. Future work focuses on three directions: intelligent on-demand scheduling of computing and network resources to mitigate latency caused by edge-side computing constraints; collaborative deployment of localized LLMs with Retrieval-Augmented Generation (RAG) to raise edge-side standalone accuracy above 90%; and expansion of diagnostic evaluation indicators to form a three-dimensional scenario-node-indicator system incorporating sensitivity, specificity, and AUC for clinical-oriented assessment.
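The routing and feature-library update can be sketched as follows; the cosine-similarity criterion matches the described semantic-similarity matching, while the threshold and library contents are illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def route(query_vec, edge_lib, cloud_lib, threshold=0.5):
    """Pick the reasoning node by semantic similarity to per-node feature
    libraries, defaulting to edge-side processing as the paper's routing
    does. The threshold value and library layout are assumptions."""
    edge_sim = max(cosine(query_vec, f) for f in edge_lib)
    cloud_sim = max(cosine(query_vec, f) for f in cloud_lib)
    return "edge" if edge_sim >= cloud_sim or edge_sim >= threshold else "cloud"

def ema_update(lib_vec, new_vec, alpha=0.1):
    """Exponential moving average used to refresh feature-library entries
    from historical routing decisions."""
    return (1 - alpha) * lib_vec + alpha * new_vec
```

Queries resembling past edge-handled cases stay local, which is what keeps average token consumption well below the cloud-only baseline.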
Privacy-Preserving Federated Weakly-Supervised Learning for Cancer Subtyping on Histopathology Images
WANG Yumeng, LIU Zhenbing, LIU Zaiyi
2026, 48(3): 1036-1046. doi: 10.11999/JEIT250842
Abstract:
  Objective  Data-driven deep learning methods are widely applied to cancer subtyping, yet their performance depends on large training datasets with fine-grained annotations. For gigapixel Whole Slide Images (WSI), such annotations are labor-intensive and costly. Clinical data are typically stored in isolated data silos, and sharing procedures raise privacy concerns. Federated Learning (FL) enables a global model to be trained from data distributed across multiple medical centers without transmitting local data. However, in conventional FL, substantial heterogeneity across centers reduces the performance and stability of the global model.  Methods  A privacy-preserving FL method is proposed for gigapixel WSI in computational pathology. Weakly supervised attention-based Multiple Instance Learning (MIL) is integrated with differential privacy to support training when only slide-level labels are available. Within each client, a multi-scale attention-based MIL method is used to conduct local training on histopathology WSIs, reducing the need for costly pixel-level annotation through a weakly supervised setting. During the federated update, local differential privacy is applied to limit the risk of sensitive information leakage. Random noise drawn from a Gaussian or Laplace distribution is added to model parameters after each client’s local training. Furthermore, a federated adaptive reweighting strategy is introduced to address the heterogeneity of pathological images across clients by dynamically balancing the influence of local data quantity and quality on each client’s aggregation weight.  Results and Discussions  The proposed FL framework is evaluated on two clinical diagnostic tasks: Non-small Cell Lung Cancer (NSCLC) histologic subtyping and Breast Invasive Carcinoma (BRCA) histologic subtyping. As shown in Table 1, Table 2, and Fig. 4, the proposed FL method (Ours with DP and Ours w/o DP) achieves higher accuracy and stronger generalization than localized models and other FL approaches. Its classification performance remains competitive even when compared with the centralized model (Fig. 3). These results indicate that privacy-preserving FL is a feasible and effective strategy for multicenter histopathology images and may reduce the performance degradation typically caused by data heterogeneity across centers. When the magnitude of added noise is controlled within a limited range, stable classification can also be achieved (Table 3). The two main components, the multiscale representation attention network and the federated adaptive reweighting strategy, each contribute to consistent performance improvement (Table 4). In addition, the proposed FL method maintains stable classification performance across different hyperparameter settings (Table 5, Table 6), confirming its robustness.  Conclusions  The proposed FL method addresses two central challenges in multicenter computational pathology: the presence of data silos and concerns over privacy. It also alleviates the performance degradation caused by inter-center data heterogeneity. As balancing model accuracy with privacy protection remains a key challenge, future work focuses on developing methods that preserve privacy while sustaining stable classification performance.
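The privacy and aggregation steps can be sketched as follows; the noise scale and per-client weights are placeholders, since the abstract gives neither the privacy budget nor the exact quantity/quality weighting:

```python
import numpy as np

def dp_perturb(params, sigma, rng):
    """Local differential privacy step: Gaussian noise added to each
    parameter tensor after a client's local training (the framework also
    allows Laplace noise)."""
    return [p + rng.normal(0.0, sigma, p.shape) for p in params]

def adaptive_aggregate(client_params, weights):
    """Weighted federated averaging of per-client parameter lists;
    `weights` stands in for the adaptive per-client weights derived from
    local data quantity and quality."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return [sum(wi * p for wi, p in zip(w, layer))
            for layer in zip(*client_params)]
```

Each client perturbs its update before sending it, and the server forms the global model as the reweighted average, so no raw slide data ever leaves a center.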
Comparison of DeepSeek-V3.1 and ChatGPT-5 in Multidisciplinary Team Decision-making for Colorectal Liver Metastases
ZHANG Yangzi, XU Ting, GAO Zhaoya, SI Zhenduo, XU Weiran
2026, 48(3): 1047-1055. doi: 10.11999/JEIT250849
Abstract:
  Objective   ColoRectal Cancer (CRC) is the third most commonly diagnosed malignancy worldwide. Approximately 25~50% of patients with CRC develop liver metastases during the course of their disease, which increases the disease burden. Although the MultiDisciplinary Team (MDT) model improves survival in ColoRectal Liver Metastases (CRLM), its broader implementation is limited by delayed knowledge updates and regional differences in medical standards. Large Language Models (LLMs) can integrate multimodal data, clinical guidelines, and recent research findings, and can generate structured diagnostic and therapeutic recommendations. These features suggest potential to support MDT-based care. However, the actual effectiveness of LLMs in MDT decision-making for CRLM has not been systematically evaluated. This study assesses the performance of DeepSeek-V3.1 and ChatGPT-5 in supporting MDT decisions for CRLM and examines the consistency of their recommendations with MDT expert consensus. The findings provide evidence-based guidance and identify directions for optimizing LLM applications in clinical practice.  Methods   Six representative virtual CRLM cases are designed to capture key clinical dimensions, including colorectal tumor recurrence risk, resectability of liver metastases, genetic mutation profiles (e.g., KRAS/BRAF mutations, HER2 amplification status, and microsatellite instability), and patient functional status. Using a structured prompt strategy, MDT treatment recommendations are generated separately by the DeepSeek-V3.1 and ChatGPT-5 models. Independent evaluations are conducted by four MDT specialists from gastrointestinal oncology, gastrointestinal surgery, hepatobiliary surgery, and radiation oncology. The model outputs are scored using a 5-point Likert scale across seven dimensions: accuracy, comprehensiveness, frontier relevance, clarity, individualization, hallucination risk, and ethical safety. 
Statistical analysis is performed to compare the performance of DeepSeek-V3.1 and ChatGPT-5 across individual cases, evaluation dimensions, and clinical disciplines.  Results and Discussions   Both LLMs, DeepSeek-V3.1 and ChatGPT-5, show robust performance across all six virtual CRLM cases, with an average overall score of ≥ 4.0 on a 5-point scale. This performance indicates that clinically acceptable decision support is provided within a complex MDT framework. DeepSeek-V3.1 shows superior overall performance compared with ChatGPT-5 (4.27±0.77 vs. 4.08±0.86, P=0.03). Case-by-case analysis shows that DeepSeek-V3.1 performs significantly better in Cases 1, 4, and 6 (P=0.04, P<0.01, and P=0.01, respectively), whereas ChatGPT-5 receives higher scores in Case 2 (P<0.01). No significant differences are observed in Cases 3 and 5 (P=0.12 and P=1.00, respectively), suggesting complementary strengths across clinical scenarios (Table 3). In the multidimensional assessment, both models receive high scores (range: 4.12~4.87) in clarity, individualization, hallucination risk, and ethical safety, confirming that readable, patient-tailored, reliable, and ethically sound recommendations are generated. Improvements are still needed in accuracy, comprehensiveness, and frontier relevance (Fig. 1). DeepSeek-V3.1 shows a significant advantage in frontier relevance (3.90±0.65 vs. 3.24±0.72, P=0.03) and ethical safety (4.87±0.34 vs. 4.58±0.65, P=0.03) (Table 4), indicating more effective incorporation of recent evidence and more consistent delivery of ethically robust guidance. For the case with concomitant BRAF V600E and KRAS G12D mutations, DeepSeek-V3.1 accurately references a phase III randomized controlled study published in the New England Journal of Medicine in 2025 and recommends a triple regimen consisting of a BRAF inhibitor + EGFR monoclonal antibody + FOLFOX.
By contrast, ChatGPT-5 follows conventional recommendations for RAS/BRAF mutant populations (FOLFOXIRI + bevacizumab) without integrating recent evidence on targeted combination therapy. This difference shows the effect of timely knowledge updates on the clinical value of LLM-generated recommendations. For MSI-H CRLM, ChatGPT-5’s recommendation of “postoperative immunotherapy” is not supported by phase III evidence or existing guidelines. Direct use of such recommendations may lead to overtreatment or ineffective therapy, representing a clear ethical concern and illustrating hallucination risks in LLMs. Discipline-specific analysis shows notable variation. In radiation oncology, DeepSeek-V3.1 provides significantly more precise guidance on treatment timing, dosage, and techniques than ChatGPT-5 (4.55±0.67 vs. 3.38±0.91, P<0.01), demonstrating closer alignment with clinical guidelines. In contrast, ChatGPT-5 performs better in gastrointestinal surgery (4.48±0.67 vs. 4.17±0.85, P=0.02), with experts rating its recommendations on surgical timing and resectability as more concise and accurate. No significant differences are identified in gastrointestinal oncology and hepatobiliary surgery (P=0.89 and P=0.14, respectively), indicating comparable performance in these areas (Table 5). These findings show a performance bias across medical sub-specialties, demonstrating that LLM effectiveness depends on the distribution and quality of training data.  Conclusions   Both DeepSeek-V3.1 and ChatGPT-5 demonstrated strong capabilities in providing reliable recommendations for CRLM-MDT decision-making. Specifically, DeepSeek-V3.1 showed notable advantages in integrating cutting-edge knowledge, ensuring ethical safety, and performing in the field of radiation oncology, whereas ChatGPT-5 excelled in gastrointestinal surgery, reflecting a complementary strength between the two models. 
This study confirms the feasibility of leveraging LLMs as “MDT collaborators”, offering a readily applicable and robust technical solution to bridge regional disparities in clinical expertise and enhance the efficiency of decision-making. However, model hallucination and insufficient evidence grading remain key limitations. Moving forward, mechanisms such as real-world clinical validation, evidence traceability, and reinforcement learning from human feedback are expected to further advance LLMs into more powerful auxiliary tools for CRLM-MDT decision support.
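The paired scoring comparison described in this abstract can be sketched as follows. The Likert ratings below are synthetic stand-ins (the study's raw scores are not given here), and `scipy.stats.wilcoxon` is one reasonable paired test for ordinal scores, not necessarily the test the authors used:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Hypothetical 5-point Likert ratings: 4 experts x 7 dimensions x 6 cases,
# scored for two models on identical (expert, dimension, case) cells
n = 4 * 7 * 6
model_a = np.clip(np.round(rng.normal(4.3, 0.7, n)), 1, 5)
model_b = np.clip(np.round(rng.normal(4.1, 0.8, n)), 1, 5)

# Paired, non-parametric comparison; ties (zero differences) are dropped
stat, p = wilcoxon(model_a, model_b)
summary = {
    "mean_a": round(float(model_a.mean()), 2),
    "mean_b": round(float(model_b.mean()), 2),
    "p_value": round(float(p), 4),
}
print(summary)
```

Reporting means ± SD alongside the paired p-value, as the abstract does, keeps the effect size and its significance visible together.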
A Causality-Guided KAN Attention Framework for Brain Tumor Classification
FAN Yawen, WANG Xiang, YUE Zhen, YU Xiaofan
2026, 48(3): 1056-1066. doi: 10.11999/JEIT250865
Abstract:
  Objective  Convolutional Neural Network (CNN)-based Computer-Aided Diagnosis (CAD) systems have advanced brain tumor classification in recent years. However, performance remains limited by feature confusion and insufficient modeling of high-order interactions. This study proposes a framework that integrates causal feature guidance with a KAN attention mechanism. A Confusion Balance Index (CBI) is developed to quantify real label distribution within clusters. A causal intervention mechanism then incorporates confused samples to strengthen discrimination between causal variables and confounding factors. A spline-based KAN attention module is further constructed to model high-order feature interactions and enhance focus on critical lesion regions and discriminative features. The combined causal modeling and nonlinear interaction enhancement improves robustness and addresses the inability of traditional architectures to capture complex pathological feature relationships.  Methods  A pre-trained CLIP model is used for feature extraction to obtain semantically rich visual representations. K-means clustering and the CBI are applied to identify confusing factor images, after which a causal intervention mechanism incorporates these samples into the training process. A causal-enhanced loss function is then designed to strengthen discrimination between causal variables and confounding factors. To address limited high-order interaction modeling, a Kolmogorov-Arnold Network (KAN)-based attention mechanism is integrated. This spline-based module constructs flexible nonlinear attention representations and refines high-order feature interactions. When fused with the backbone network, it improves discriminative performance and generalization.  Results and Discussions  The proposed method achieves superior performance across three datasets. 
On DS1, the model reaches 99.92% accuracy, 99.98% specificity, and 99.92% precision, outperforming RanMerFormer (+0.15%) and SAlexNet (+0.23%) and exceeding traditional CNNs by more than 2% (95%~97%). Swin Transformers reach 98.08% accuracy but only 91.75% precision, indicating stronger robustness of the proposed model in reducing false detections. On DS2, the method achieves 98.86% accuracy and 98.80% precision, exceeding the next-best RanMerFormer. On a more challenging in-house dataset, it maintains 90.91% accuracy and 95.45% specificity, showing generalization in complex settings. The gains result from the KAN attention mechanism’s ability to model high-order interactions and the causal reasoning module’s decoupling of confounding factors. These components improve focus on lesion regions and stabilize decision-making in complex scenarios. The results demonstrate reliable performance for clinical precision diagnostics.  Conclusions  The findings confirm that the proposed framework improves brain tumor classification. The combined effect of the causal intervention mechanism and the KAN attention module is the primary contributor to performance gains. These improvements require minimal increases in model parameters and inference latency, preserving efficiency and practicality. The study proposes a methodological direction for medical image classification and shows potential utility in few-shot learning and clinical decision support.
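The abstract does not give the CBI formula; the sketch below uses normalized true-label entropy within each K-means cluster as one plausible confusion measure (a hypothetical stand-in), to illustrate how mixed-label clusters could be flagged as confounding samples for the causal intervention step:

```python
import numpy as np

def confusion_balance(labels, clusters):
    """Toy confusion score per cluster: normalized entropy of the true-label
    distribution inside each cluster (1.0 = maximally mixed, 0.0 = pure).
    Illustrative stand-in; the paper's exact CBI definition may differ."""
    n_classes = len(np.unique(labels))
    max_ent = np.log(n_classes)
    scores = {}
    for c in np.unique(clusters):
        _, counts = np.unique(labels[clusters == c], return_counts=True)
        p = counts / counts.sum()
        ent = -(p * np.log(p)).sum()
        scores[int(c)] = float(ent / max_ent) if max_ent > 0 else 0.0
    return scores

labels   = np.array([0, 0, 0, 0, 1, 1, 2, 2])   # true tumor classes
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # K-means assignments
scores = confusion_balance(labels, clusters)
# Cluster 0 is pure; cluster 1 mixes labels 1 and 2 and would be flagged
```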
A Novel Prognostic Model Establishment and Treatment Efficacy Analysis for Primary Pulmonary Non-Hodgkin’s Lymphoma
LI Hui, LI Jiancheng, LIU Feng, WU Di, CHEN Chuanben, LI Jinluan
2026, 48(3): 1067-1076. doi: 10.11999/JEIT250874
Abstract:
  Objective  At present, few studies have examined Primary Pulmonary non-Hodgkin’s Lymphoma (PPL). Most available reports are single-center retrospective studies. Therefore, no widely accepted prognostic index or treatment strategy for PPL has been established. This study aims to develop and validate a novel prognostic index based on the International Prognostic Index (IPI) for PPL using data from the United States cancer population and Chinese multicenter cohorts. The study also compares the therapeutic effects of different treatment approaches to predict clinical prognosis and provide evidence to support treatment decision-making for PPL.  Methods  Clinical data from patients diagnosed with PPL were collected from two sources. The first source was the Surveillance, Epidemiology, and End Results (SEER) database of the United States, covering the period from 2000 to 2019. The second source included patients treated between 2010 and 2021 at three tertiary hospitals in China. Independent prognostic factors were identified using the Cox proportional hazards regression model. A nomogram was constructed to predict Cancer-Specific Survival (CSS). Model performance was evaluated using the Concordance index (C-index) and calibration curves. The nomogram was combined with the IPI to develop a novel prognostic index. Risk stratification was performed, and the 3-year Overall Survival (OS) rate was calculated for each risk group. The Inverse Probability of Treatment Weighting (IPTW) method was applied to reduce confounding factors. Survival analysis was conducted using Kaplan-Meier curves and the log-rank test.  Results and Discussions  A total of 4 313 cases from the SEER database and 107 cases from the Chinese multicenter cohort were included. 
Multivariate Cox regression analysis showed that independent prognostic factors for PPL included age (p<0.001; Hazard Ratio (HR), 1.078; 95% Confidence Interval (CI), 1.072~1.084), Ann Arbor stage (p<0.001), sex (p<0.001; HR, 0.719; 95% CI, 0.624~0.829), primary site (p=0.037), pathological type (p<0.001), B symptoms (p=0.012; HR, 0.944; 95% CI, 0.773~0.997), surgery (p<0.001; HR, 1.453; 95% CI, 1.221~1.728), chemotherapy (p<0.001; HR, 0.742; 95% CI, 0.631~0.872), and marital status (p<0.001). Based on these factors, a nomogram predicting 3-, 5-, and 10-year CSS was established. By integrating the nomogram with the IPI, a prognostic model for PPL was developed with a C-index of 0.932. Using defined risk parameters, a novel prognostic index for PPL was constructed. The risk parameters included age>60 years, Ann Arbor stage III/IV, serum Lactate DeHydrogenase (LDH) level>1 times the normal level, performance status score>2, number of extranodal sites>1, male sex, pathological type other than Mucosa-Associated Lymphoid Tissue (MALT) lymphoma, presence of B symptoms, and absence of cancer treatment. Risk stratification was defined as follows: low-risk group (0~2 risk factors), low-intermediate-risk group (3~4 risk factors), high-intermediate-risk group (5 risk factors), and high-risk group (6~9 risk factors). The corresponding 3-year OS rates were 96.97%, 82.61%, 50.00%, and 11.11%, respectively (p<0.0001). In the analysis of treatment efficacy, both the United States and Chinese datasets showed that chemotherapy significantly reduced CSS in patients with primary pulmonary MALT lymphoma (p<0.001). 
No significant difference was observed between surgery and radiotherapy in patients with either primary pulmonary MALT lymphoma or diffuse large B-cell lymphoma (p>0.05).  Conclusions  This study develops a novel prognostic index for PPL based on data from the United States cancer population and a Chinese multicenter cohort. The model includes age, disease stage, serum LDH level, performance status score, and number of extranodal sites. The index demonstrates strong predictive performance and accuracy. Risk stratification based on this index provides estimated 3-year OS rates for different risk groups. Treatment efficacy analysis indicates that chemotherapy may reduce CSS in patients with primary pulmonary MALT lymphoma. In addition, no significant difference is observed between surgery and radiotherapy in patients with primary pulmonary MALT lymphoma or diffuse large B-cell lymphoma.
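The risk stratification rule defined in this abstract translates directly into a lookup; a minimal sketch:

```python
def ppl_risk_group(n_risk_factors: int) -> str:
    """Map the count of the nine PPL risk factors (age>60 years, Ann Arbor
    stage III/IV, LDH>1x normal, performance status score>2, extranodal
    sites>1, male sex, non-MALT pathology, B symptoms, no cancer treatment)
    to the risk groups defined in the study."""
    if not 0 <= n_risk_factors <= 9:
        raise ValueError("risk factor count must be 0-9")
    if n_risk_factors <= 2:
        return "low"                # reported 3-year OS: 96.97%
    if n_risk_factors <= 4:
        return "low-intermediate"   # 82.61%
    if n_risk_factors == 5:
        return "high-intermediate"  # 50.00%
    return "high"                   # 11.11%
```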
A Study on Lightweight Method of TCM Structured Large Model Based on Memory-Constrained Pruning
LU Jiafa, TANG Kai, ZHANG Guoming, YU Xiaofan, GU Wenqi, LI Zhuo
2026, 48(3): 1077-1086. doi: 10.11999/JEIT250909
Abstract:
  Objective  The structuring of Traditional Chinese Medicine (TCM) Electronic Medical Records (EMRs) is essential for knowledge discovery, clinical decision support, and intelligent diagnosis. However, two major barriers remain. First, TCM EMRs are primarily unstructured free text and often paired with tongue images, which complicates automated processing. Second, grassroots hospitals usually have limited GPU resources, which restricts the deployment of large pretrained models. This study aims to address these challenges by proposing a lightweight multimodal model based on memory-constrained pruning. The method is designed to preserve near-state-of-the-art accuracy while sharply reducing memory consumption and computation cost, ensuring practical use in resource-limited healthcare settings.  Methods  A three-stage architecture is used, comprising an encoder, a multimodal fusion module, and a decoder. For text, a distilled TinyBERT encoder is combined with a BiLSTM-CRF decoder to extract 23 categories of TCM clinical entities, including symptoms, syndromes, prescriptions, and herbs. For images, a ResNet-50 encoder processes tongue diagnosis photographs. A memory-constrained pruning strategy is introduced in which an LSTM decision network observes convolutional feature maps and adaptively prunes redundant channels while retaining key diagnostic information. Gradient reparameterization and dynamic channel grouping improve pruning flexibility, and a reinforcement-learning controller stabilizes training. INT8 mixed-precision quantization, gradient accumulation, and Dynamic Batch Pruning (DBP) further reduce memory usage. A TCM terminology-enhanced lexicon is integrated into the encoder embeddings to improve recognition of rare entities. The system is trained end-to-end on paired EMR-tongue datasets (Fig. 1) to optimize multimodal information flow.  
Results and Discussions  Experiments are performed on 10,500 de-identified EMRs paired with tongue images from 21 tertiary hospitals. On an RTX 3060 GPU, the model achieves an F1-score of 91.7%, reduces peak GPU memory to 3.8 GB, and reaches an inference speed of 22 records per second (Table 1). Compared with BERT-Large, memory consumption decreases by 75%, throughput increases 1.75×, and accuracy remains comparable. Ablation studies confirm the contributions of each component. The adaptive attention gating mechanism increases F1 by 2.8% (Table 3). DBP reduces memory usage by 38.7% with minimal accuracy loss and improves performance on EMRs exceeding 5 000 characters. The terminology-enhanced lexicon improves recognition of rare entities such as “blood stasis” by 6.2%. Structured EMR fields also support association rule mining, and the confidence of syndrome-symptom relationships increases by 18%. These findings highlight three observations: (1) multimodal fusion with lightweight design provides clinical advantages over unimodal models; (2) memory-constrained pruning achieves stable channel reduction under strict hardware limits and outperforms magnitude-based pruning; and (3) pruning, quantization, and dynamic batching show strong synergy when jointly designed. The results support the feasibility of deploying high-performing TCM EMR structuring systems in real-world environments with limited computational capacity.  Conclusions  This study proposes a lightweight multimodal framework for structuring TCM EMRs. Memory-constrained pruning, combined with quantization and DBP, substantially compresses the visual encoder while maintaining text-image fusion accuracy. The approach reaches near-state-of-the-art performance with sharply reduced hardware requirements, enabling deployment in regional hospitals and clinics. 
Beyond efficiency gains, the structured multimodal outputs enhance TCM knowledge graphs and improve downstream tasks such as syndrome classification and treatment recommendation. The framework narrows the gap between powerful pretrained models and limited hardware resources in grassroots institutions and provides a scalable direction for lightweight multimodal NLP in medical informatics. Future work includes integrating modalities such as pulse-wave signals, extending pruning strategies with graph neural networks, and exploring adaptive cross-modal attention to strengthen clinical applicability.
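The masking mechanics behind channel pruning can be sketched as follows. The channel score here is a simple magnitude heuristic; the paper's LSTM decision network and reinforcement-learning controller are beyond an abstract-level sketch:

```python
import numpy as np

def prune_channels(feature_maps, keep_ratio):
    """Toy channel pruning: score each channel by mean absolute activation
    and keep the top `keep_ratio` fraction. Illustrates the masking step
    only; the paper instead learns the keep/drop decision with an LSTM."""
    n_ch = feature_maps.shape[0]
    scores = np.abs(feature_maps).mean(axis=(1, 2))
    k = max(1, int(round(n_ch * keep_ratio)))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros(n_ch, dtype=bool)
    mask[keep] = True
    return feature_maps[mask], mask

rng = np.random.default_rng(1)
fmap = rng.normal(size=(64, 14, 14))       # 64-channel feature map
pruned, mask = prune_channels(fmap, 0.25)  # keep 16 of 64 channels
```

Dropping three quarters of the channels in this way shrinks both the downstream weight tensors and the activation memory, which is the lever the memory-constrained strategy exploits.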
Progress in Modeling Cardiac Myocyte Calcium Cycling and Investigating Arrhythmia Mechanisms: A Study Focused on the Ryanodine Receptor
GAO Ying, ZHANG Yucheng, WANG Wenyao, SU Xuanyi, SONG Zhen
2026, 48(3): 1087-1104. doi: 10.11999/JEIT250957
Abstract:
  Significance   The Ryanodine Receptor (RyR) is a central regulator of intracellular calcium (Ca2+) homeostasis in cardiomyocytes through its control of Ca2+ release from the Sarcoplasmic Reticulum (SR). Abnormal RyR activity, including excessive activation or impaired gating, is a key mechanism underlying Early Afterdepolarizations (EADs) and Delayed Afterdepolarizations (DADs), thereby increasing arrhythmia risk. The coupling between membrane electrophysiology and Ca2+ cycling in cardiomyocytes depends on spatially organized and rapidly evolving processes that are difficult to resolve experimentally. Conventional approaches, including animal models and pharmacological interventions, are constrained by high cost and limited control of experimental variables. Mathematical modeling and computer simulation of the RyR have therefore become essential tools for studying RyR regulation under physiological and pathological conditions and for elucidating arrhythmogenic mechanisms. This review provides an overview of RyR biology and modeling. It first summarizes structural features and core functional properties to establish the mechanistic basis of RyR gating and regulation. It then evaluates current and emerging modeling approaches, outlining their strengths and limitations. The review next describes the integration of RyR models into cardiomyocyte Ca2+ cycling frameworks and their application across different cardiomyocyte subtypes. It further examines arrhythmogenic mechanisms arising from RyR dysfunction and assesses drug strategies designed to stabilize RyR activity. Finally, it highlights artificial intelligence and cardiac digital twins as emerging directions for advancing RyR modeling and therapeutic development.  Progress   The growing availability of RyR structural data has enabled continued refinement of modeling strategies. Early RyR models relied primarily on phenomenological formulations that were computationally practical but limited in mechanistic detail. 
Markov models have become the dominant framework for simulating RyR gating behavior and enable detailed representation of Ca2+ sparks and related events through discrete state transitions. Deterministic integration of Markov models offers high computational efficiency and adaptability across different cardiomyocyte types. However, this approach neglects the stochastic nature of RyR opening and fails to reproduce random fluctuations in intracellular Ca2+ concentration, which can lead to discrepancies between simulations and physiological behavior. Stochastic Markov models capture these random processes and are therefore essential for investigating arrhythmogenic phenomena such as Ca2+ waves. Their application, however, requires extensive experimental data and substantial computational resources, which limits large-scale implementation. Recent artificial intelligence approaches, including deep neural networks that compress Markov models into single governing equations, have improved computational efficiency. Advances in structural biology have further clarified RyR conformational dynamics and subunit cooperativity during gating, particularly in relation to diastolic Ca2+ leak. These insights have motivated more detailed models that incorporate subunit interactions or molecular dynamics. Numerous RyR models have been incorporated into cardiac action potential frameworks and applied to the study of EADs and DADs. These integrated models enhance understanding of electrical disturbances caused by RyR dysfunction and provide a useful platform for drug screening and mechanistic investigation.  Conclusions  Multiple RyR models have been developed that successfully reproduce key physiological processes, including Ca2+ sparks, and are widely applied in studies of cardiomyocyte Ca2+ cycling. Nevertheless, several challenges remain. (1) A unified modeling framework is still lacking. 
No single RyR model can accurately simulate Ca2+ dynamics across the full spectrum of physiological and pathological conditions. Careful evaluation is therefore required when selecting models for intracellular Ca2+ handling. (2) Computational burden limits multiscale integration. Multiscale models are necessary to connect cellular Ca2+ dynamics with tissue-level electrical propagation by incorporating spatial heterogeneity, but their high computational cost restricts application in clinically relevant scenarios. (3) Pacemaker cell models remain underdeveloped. Current research focuses primarily on ventricular and atrial cardiomyocytes, whereas pacemaker cell models are less mature and often rely on common-pool formulations that do not represent spatial Ca2+ gradients. Future studies should prioritize the development of detailed pacemaker cell models that explicitly represent Ca2+ release unit networks and incorporate realistic RyR dynamics. Artificial intelligence and cardiac digital twins, although still at an early stage in RyR modeling, offer substantial potential to advance mechanistic research and support precision-medicine applications.  Prospects   Future RyR research will increasingly depend on integrating advances from structural biology, biophysics, and computational science. Such efforts are required to connect molecular-scale RyR conformational changes with organ-level cardiac function and to enable scalable, clinically actionable models. These models can strengthen mechanistic understanding and accelerate translational progress in precision cardiology. Artificial intelligence and cardiac digital twins provide a pathway toward multi-scale cardiac models that incorporate patient-specific electrophysiology and Ca2+ cycling. These approaches may substantially improve understanding of arrhythmia mechanisms and heart failure pathophysiology and serve as predictive platforms for the development of mechanism-based personalized antiarrhythmic therapies.
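A minimal stochastic Markov sketch of RyR gating, using a two-state (closed/open) scheme with an illustrative Ca2+-dependent opening rate; all rate constants and the Hill-like form are hypothetical, chosen only to show the simulation method, not taken from any published RyR model:

```python
import numpy as np

def simulate_ryr(ca, n_steps=10000, dt=1e-4, k_open=1.5e3, k_close=500.0, seed=0):
    """Two-state stochastic RyR gating sketch. The opening rate rises with
    cytosolic [Ca2+] (illustrative saturating dependence); the closing rate
    is constant. Returns the fraction of time spent in the open state."""
    rng = np.random.default_rng(seed)
    open_rate = k_open * ca**2 / (ca**2 + 1.0)  # crude Ca2+ dependence, 1/s
    state, open_steps = 0, 0
    for _ in range(n_steps):
        if state == 0 and rng.random() < open_rate * dt:
            state = 1                            # closed -> open transition
        elif state == 1 and rng.random() < k_close * dt:
            state = 0                            # open -> closed transition
        open_steps += state
    return open_steps / n_steps

p_low  = simulate_ryr(ca=0.1)  # low Ca2+: channel mostly closed
p_high = simulate_ryr(ca=5.0)  # high Ca2+: channel opens far more often
```

Deterministic integration would instead evolve the open probability as an ODE; the per-step coin flips here are what let stochastic models reproduce spark-like random openings.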
Clinical Disease Risk Assessment System Based on Multi-source Genetic Information
NING Kaida, YU Zhengyang, ZHAO Xin, LI Ziyan, DAI Ju, XIA Li
2026, 48(3): 1105-1115. doi: 10.11999/JEIT251025
Abstract:
  Objective  Complex diseases are driven by polygenic inheritance and gene–environment interactions, resulting in highly heterogeneous pathogenic mechanisms and posing major challenges for both research and public health. Conventional single-trait Polygenic Risk Scores (PRS) aggregate genetic variants associated with individual diseases but are limited by their neglect of cross-trait genetic correlations and nonlinear genetic interactions. Although multi-trait PRS approaches have been proposed to improve prediction accuracy, existing statistical-learning frameworks predominantly rely on linear integration of PRS features, failing to capture nonlinear interactions among Single-Nucleotide Polymorphisms (SNPs) and to fully exploit shared genetic information across diseases. To address these limitations, we propose a nonlinear multi-source disease prediction framework, the SNP-PRS Fusion model, termed the mtSNPPRS_XGB (mtSNP-PRS XGBoost Integration Model).  Methods  The mtSNPPRS_XGB framework integrates raw SNP data of target traits with multi-trait PRS information to enhance genetic risk prediction for complex diseases through nonlinear modeling. SNPs significantly associated with target diseases were extracted from the GWAS Catalog (p < 5×10⁻⁸) and encoded as allele dosages (0/1/2), while PRS weights covering 80 traits were obtained from the PGS Catalog and used to compute individual PRS. After standardized preprocessing, SNP and PRS features were jointly fused and modeled using XGBoost to capture complex SNP-SNP and SNP-PRS interactions. This framework introduces two key innovations: (1) collaborative modeling of multi-trait genetic information by jointly leveraging disease-specific SNPs and cross-disease PRS, and (2) systematic learning of nonlinear genetic interactions to overcome the linear constraints of conventional PRS-based models.  Results and Discussions   The mtSNPPRS_XGB model was evaluated using UK Biobank data across 18 complex diseases. 
It achieved an average AUC of 66.70%, representing improvements of 1.04% over the elastic-net-based model and 4.39% over the conventional UniPRS model. The inclusion of SNP features substantially improved predictive performance in diseases such as coronary heart disease, psoriasis, and celiac disease, while the integration of multi-trait PRS further enhanced specificity, particularly in cardiovascular, autoimmune, and cancer-related conditions. SHAP-based interpretability analyses demonstrated that mtSNPPRS_XGB simultaneously captures global cross-disease genetic liability encoded by PRS and disease-specific localized SNP effects, as illustrated in Alzheimer’s disease, colorectal cancer, gout, and ischemic stroke. These findings support both the biological plausibility and interpretability of the proposed framework.  Conclusions  We present a novel statistical learning-based multi-trait genetic risk prediction model, mtSNPPRS_XGB, which introduces an SNP-PRS fusion architecture and employs XGBoost to capture nonlinear interactions among multi-source genetic features. By integrating raw SNP data with multi-trait PRS, the proposed framework significantly improves risk prediction performance for complex diseases. Validation across 18 diseases in the UK Biobank demonstrates consistent performance gains over traditional PRS-based methods. This study overcomes the linear modeling limitations of conventional PRS approaches and provides a new paradigm for nonlinear integration of SNPs and multi-trait PRS, offering a robust and interpretable tool for personalized genetic risk prediction in precision medicine.
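The SNP-PRS fusion idea can be sketched on synthetic data as follows, using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost; the feature sizes, effect sizes, and the SNP×PRS interaction term are all invented for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
snp = rng.integers(0, 3, size=(n, 20)).astype(float)  # allele dosages 0/1/2
prs = rng.normal(size=(n, 5))                          # multi-trait PRS features

# Synthetic outcome driven by a nonlinear SNP x PRS interaction plus noise,
# the kind of structure a purely linear PRS combination cannot capture
logit = 0.8 * snp[:, 0] * prs[:, 0] + 0.5 * prs[:, 1]
y = (logit + rng.normal(scale=1.0, size=n) > 0).astype(int)

X = np.hstack([snp, prs])  # SNP-PRS feature fusion
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

Tree ensembles pick up the product term through successive splits on `snp[:, 0]` and `prs[:, 0]`, which is the mechanism the paper relies on for modeling nonlinear genetic interactions.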
Hierarchical Fusion Multi-Instance Learning for Weakly Supervised Pathological Image Classification
CHEN Xiaohe, ZHANG Jiaang, LI Lingzhi, LI Guixiu, OU Zirong, BAO Yuehua, LIU Xinxin, YU Qiuchen, MA Yuhan, ZHAO Keyu, BAI Hua
2026, 48(3): 1116-1127. doi: 10.11999/JEIT250726
Abstract:
  Objective  Cancer mortality in China continues to rise, and pathological image classification has become central to diagnosis. Pathological images have a multilevel structure, yet many existing methods focus only on the highest resolution or use simple feature concatenation for multi-scale fusion. These strategies do not make effective use of hierarchical information. In addition, most approaches rely on random pseudo-bag division to handle high-resolution images. Because cancerous regions in positive slides are sparse, random sampling often produces incorrect pseudo-labels and low signal-to-noise ratios, which reduce classification accuracy. This study proposes a Hierarchical Fusion Multi-Instance Learning (HFMIL) method that integrates multilevel feature fusion with a pseudo-bag division strategy based on an attention evaluation function to improve accuracy and interpretability in pathological image classification.  Methods  A weakly supervised multilevel classification method is proposed to use the hierarchical characteristics of pathological images and improve cancer image classification performance. The method has three main steps. First, multilevel features are extracted. Blank regions are removed, low-resolution images are divided into patches, and these patches are indexed to their corresponding high-resolution regions. Semantic features capture low-resolution tissue structure and high-resolution cellular detail. Second, pseudo-bags are constructed using an attention-based evaluation function. Class activation mapping is used to compute patch-level scores. Patches are ranked, and high-scoring ones are selected as potential positive samples. Low-scoring patches are discarded to maintain pseudo-label relevance. High-resolution pseudo-bags are then generated using index mapping, which reduces incorrect pseudo-labels and improves the signal-to-noise ratio. Third, a two-stage classification model is developed. 
Low-resolution pseudo-bags are aggregated with a gated attention mechanism for preliminary classification. A cross-attention mechanism then fuses the most informative low-resolution features with their corresponding high-resolution features. The fused representation is concatenated with aggregated high-resolution pseudo-bags to form an image-level feature vector for final prediction. Training uses a two-stage loss that combines low-resolution and overall cross-entropy losses. Experiments on three pathological image datasets confirm the effectiveness of the method in weakly supervised settings.  Results and Discussions  The proposed method is compared with several recent weakly supervised classification approaches, including ABMIL, CLAM, TransMIL, and DTFD, using three pathological image datasets: the publicly available Camelyon16 and TCGA-LUNG datasets and a private skin cancer dataset, NBU-Skin. The results show clear performance gains. On Camelyon16, the method achieves 88.3% accuracy and an AUC of 0.979 (Table 2). On TCGA-LUNG, accuracy reaches 86.0% and AUC 0.931 (Table 2), exceeding the comparative methods. On the NBU-Skin dataset, accuracy reaches 90.5% and AUC 0.976 for multiclass tasks (Table 2). Ablation studies further examine the necessity of the multilevel feature fusion and pseudo-bag division modules. The combination of these modules improves classification performance. On the skin cancer dataset, removing the pseudo-bag division module reduces accuracy from 93.8% to 90.7%, and removing the multilevel feature fusion module reduces accuracy further to 80.0% (Table 3). These results confirm that each component contributes to the effectiveness of the method.  Conclusions  A weakly supervised pathological image classification method that integrates multilevel feature fusion and an attention-based pseudo-bag division strategy is proposed. 
The method uses hierarchical information effectively and reduces errors caused by incorrect pseudo-labels and low signal-to-noise ratios. Experiments show consistent improvements in accuracy and AUC across three datasets. The main contributions are: (1) a multilevel feature extraction and fusion strategy that uses a cross-attention mechanism to combine features across scales; (2) an attention-based pseudo-bag division method that identifies potential positive regions and improves pseudo-label correctness through a top-k strategy while reducing background noise; and (3) superior performance compared with recent weakly supervised classifiers. Future work may include optimizing cross-level attention mechanisms, extending the framework to prognosis prediction or lesion segmentation, and developing more efficient feature extraction and fusion modules for broader clinical use.
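The attention-based top-k pseudo-bag division can be sketched as follows; the round-robin assignment into bags is an assumed detail, since the abstract specifies only ranking by attention score, top-k selection, and discarding low-scoring patches:

```python
import numpy as np

def build_pseudo_bags(attn_scores, k, n_bags):
    """Attention-guided pseudo-bag division sketch: rank patches by an
    attention/CAM score, keep the top-k as candidate positive instances,
    and deal them round-robin so every pseudo-bag retains some
    high-scoring patches. Low-scoring patches are simply dropped."""
    top = np.argsort(attn_scores)[::-1][:k]  # indices of top-k patches
    return [top[i::n_bags].tolist() for i in range(n_bags)]

rng = np.random.default_rng(0)
scores = rng.random(100)                     # patch-level attention scores
bags = build_pseudo_bags(scores, k=30, n_bags=3)
```

Compared with random pseudo-bag division, restricting bags to high-attention patches is what keeps the pseudo-labels consistent with sparse cancerous regions and raises the signal-to-noise ratio.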
Lightweight Dual Convolutional Finger Vein Recognition Network Based on Attention Mechanism
ZHAO Bingyan, LIANG Yihuai, ZHANG Zhongxia, ZHANG Wenzheng
2026, 48(3): 1128-1138. doi: 10.11999/JEIT250380
Abstract:
  Objective  Finger vein recognition is an emerging biometric authentication technology valued for its physiological uniqueness and advantages in in vivo detection. However, mainstream deep learning recognition frameworks still face two challenges. High-precision recognition often depends on complex network structures, which increase parameter counts and hinder deployment in memory-limited embedded devices and edge scenarios with constrained computing resources. Model compression can reduce computational cost but often weakens feature representation, creating a conflict between recognition accuracy and efficiency. To address these issues, a lightweight dual convolutional model integrated with an attention mechanism is proposed. A parallel heterogeneous convolution module and an attention guidance mechanism are designed to extract diverse image features and improve recognition accuracy while preserving a lightweight network structure.  Methods  The proposed architecture adopts a three-level collaborative mechanism comprising feature extraction, dynamic calibration, and decision fusion. A dual convolutional feature extraction module is constructed using normalized ROI images. This module combines heterogeneous convolution kernels. Rectangular convolution branches with different shapes capture venous topological structures and diameter orientations, whereas square convolution branches employ stacked square kernels to extract local texture details and background intensity distributions. These branches operate in parallel with reduced channel numbers and generate complementary responses through kernel shape diversity. This design reduces parameter scale while improving feature discrimination. A parallel dual attention mechanism is then applied to achieve two-dimensional calibration through joint optimization of channel attention and spatial attention. 
Channel attention adaptively assigns weights to enhance discriminative venous texture features, whereas spatial attention constructs pixel-level dependency models that focus on effective discriminative regions. A parallel concatenation fusion strategy preserves structural information without introducing additional parameters and improves sensitivity to critical features. Finally, a three-level progressive feature optimization structure is implemented. A convolutional compression module with stride 2 nests multi-scale receptive fields and progressively refines primary features during dimensionality reduction. Two fully connected layers then perform feature space transformation. The first layer applies ReLU activation to form sparse representations, and the final layer applies Softmax for probability calibration. This structure balances shallow underfitting and deep overfitting while maintaining efficient forward inference.  Results and Discussions  The effectiveness and robustness of the proposed network are evaluated on three public datasets, namely USM, HKPU, and SDUMLA. Recognition accuracy is assessed using the Acc metric. Experimental results (Table 1) show strong recognition performance. Feature visualization heatmaps (Fig. 4, Fig. 6) confirm that the network extracts complete and discriminative venous features. Training visualizations (Fig. 7, Fig. 8) show stable loss and accuracy trends, achieving 100% classification performance and demonstrating training reliability and robustness. Quantitative comparisons (Tables 2 and 3) indicate that the proposed method effectively addresses the trade-off between model complexity and classification performance and achieves superior results across all three datasets. Ablation studies (Table 4) further verify the effectiveness of the proposed modules and show significant improvements in finger vein recognition performance.  
Conclusions  A lightweight dual convolutional neural network with an attention mechanism is proposed. The network consists of three core modules: a dual convolutional feature extraction module, a parallel dual-attention module, and a feature optimization classification module. During feature extraction, long-range venous features and background information are jointly encoded through a low-channel parallel design, which substantially reduces parameter counts while improving inter-individual discrimination. The attention module efficiently captures critical venous features without the parameter expansion commonly observed in conventional attention mechanisms. The feature optimization classification module applies progressive feature recalibration, which reduces underfitting and overfitting during stacked dimensionality reduction. Experimental results show recognition accuracies of 99.70%, 98.33%, and 98.27% on the USM, HKPU, and SDUMLA datasets, corresponding to an average improvement of 2.05% over existing state-of-the-art methods. Compared with representative lightweight finger vein recognition approaches, the proposed method reduces parameter scale by 11.35% to 60.19%, achieving a balance between lightweight design and performance improvement.
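As an illustrative aside (not the published architecture), the parallel heterogeneous-convolution idea described in the Methods can be sketched in plain NumPy: rectangular kernels follow elongated vein structures while square kernels capture local texture, and the branch responses are fused channel-wise. The kernel shapes, the crop-based fusion, and all names below are assumptions for demonstration only.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' cross-correlation shared by both branches."""
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(img, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def dual_branch(img, rect_kernels, square_kernels):
    """Parallel heterogeneous branches: rectangular kernels trace
    elongated vein topology, square kernels capture local texture.
    Responses are cropped to a common size and stacked channel-wise."""
    maps = [conv2d_valid(img, k) for k in rect_kernels + square_kernels]
    h = min(m.shape[0] for m in maps)
    w = min(m.shape[1] for m in maps)
    return np.stack([m[:h, :w] for m in maps])  # (branches, h, w)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))             # toy normalized ROI
rect = [np.ones((1, 7)) / 7, np.ones((7, 1)) / 7]   # elongated kernels
square = [np.ones((3, 3)) / 9]
out = dual_branch(img, rect, square)            # (3, 26, 26)
```

In a trained network these branches would carry learned weights and reduced channel counts; the sketch only shows how differently shaped kernels yield complementary response maps.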
Evaluation of Domestic Large Language Models as Educational Tools for Cancer Patients
ZHANG Junli, XU Weiran, WANG Zhao
2026, 48(3): 1139-1145. doi: 10.11999/JEIT251056
Abstract:
  Objective  With the rapid increase in cancer incidence and mortality worldwide, patient education has become a critical strategy for reducing the disease burden and improving patient outcomes. However, traditional education methods, such as paper-based materials or face-to-face consultations, are limited by time, space, and personalization constraints. The emergence of large language models (LLMs) has opened new opportunities for delivering intelligent, scalable, and personalized health education. Although domestic LLMs, such as Doubao, Kimi, and DeepSeek, have been widely applied in general scenarios, their utility in oncology education remains underexplored. This study aimed to systematically evaluate the performance of three domestic LLMs in cancer patient education across multiple dimensions, providing empirical evidence for their potential clinical application and optimization.  Methods  Frequently asked patient education questions were collected through group discussions with oncology nurses from a tertiary hospital. Nineteen oncology nurses with ≥1 year of clinical experience participated in item selection, and the ten most common questions were chosen, covering domains such as diet, nutrition, treatment, adverse drug reactions, and prognosis. Each question was independently input into Doubao (Pro, ByteDance, May 2024), Kimi (V1.1, Moonshot AI, Nov 2023), and DeepSeek (R1, DeepSeek AI, Jan 2025) under “new chat” conditions to avoid contextual interference. Responses were standardized to remove model identifiers and randomly coded. Quality evaluation followed a blinded design. Thirteen inpatients with cancer assessed responses for readability and effectiveness, while six senior oncologists rated responses for accuracy, comprehensiveness, and professionalism. A self-designed five-point Likert scale was used for each dimension. Statistical analyses were conducted using GraphPad Prism 9.5.1. 
One-way ANOVA with Bonferroni correction was applied for dimensional comparisons, while Welch’s ANOVA and Games-Howell post hoc tests were used for overall score analysis. Results were visualized with tables and radar plots.  Results and Discussions  Overall, the three models achieved mean total scores of 4.05±0.687 (Doubao), 4.17±0.791 (Kimi), and 4.19±0.640 (DeepSeek). Welch’s ANOVA showed significant overall differences (F=5.537, P=0.004). Games-Howell analysis revealed that Doubao performed significantly worse than Kimi and DeepSeek (P=0.005 and 0.042, respectively), while Kimi and DeepSeek did not differ significantly (P=0.975). From the patient perspective, Kimi outperformed its peers, achieving the highest scores in readability (4.615±0.534) and effectiveness (4.476±0.560), with statistically significant differences (P<0.05). Patients rated Kimi’s responses to lifestyle-related queries, such as managing nausea or loss of appetite during chemotherapy, as particularly clear and actionable. From the expert perspective, DeepSeek demonstrated superiority in accuracy (4.117±0.846), comprehensiveness (4.100±0.681), and professionalism (3.917±0.645), with significant advantages over Kimi (P<0.01) and moderate superiority over Doubao (P<0.05). DeepSeek was favored for handling technical and evidence-based questions, such as drug metabolism or integrative therapy evaluation. The divergence between patient and expert assessments highlighted a mismatch: the “most understandable” responses (Kimi) were not always the “most professional” (DeepSeek). This complementarity suggests that future research should explore layered output formats or dual verification mechanisms. Such approaches would balance readability with professional rigor, minimizing the risks of misinformation while improving accessibility. Despite promising findings, limitations exist. This single-center study involved a relatively small sample size, and only patients with lung and breast cancer were included. 
The evaluation simulated static Q&A interactions rather than dynamic multi-turn dialogues, which are more representative of real-world consultations. Additionally, technical enhancements such as retrieval-augmented generation (RAG), fine-tuning with oncology-specific corpora, and multi-agent collaboration were not implemented. Future studies should expand to multi-center designs, diverse cancer populations, and advanced LLM optimization methods.  Conclusions  Domestic LLMs demonstrated significant potential as tools for cancer patient education. Kimi excelled in communication and patient-centered knowledge translation, while DeepSeek showed strength in professional accuracy and comprehensiveness. Doubao, although moderate across all dimensions, lagged behind in overall performance. The results indicate that LLMs can complement traditional health education by bridging the gap between patient comprehension and clinical expertise.
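The Welch's ANOVA used for the overall score comparison can be reproduced from its standard formulas; the sketch below is a generic NumPy implementation for illustration, not the authors' GraphPad Prism workflow, and the toy data are invented.

```python
import numpy as np

def welch_anova(groups):
    """Welch's one-way ANOVA for groups with unequal variances.
    Returns the F statistic and its two degrees of freedom."""
    k = len(groups)
    n = np.array([len(g) for g in groups], float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                        # precision weights n_i / s_i^2
    mw = np.sum(w * m) / np.sum(w)   # variance-weighted grand mean
    a = np.sum(w * (m - mw) ** 2) / (k - 1)
    b = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f = a / (1 + 2 * (k - 2) / (k ** 2 - 1) * b)
    df2 = (k ** 2 - 1) / (3 * b)     # Welch-Satterthwaite denominator df
    return f, k - 1, df2

# identical groups -> F = 0; well-separated groups -> large F
f_same, df1, _ = welch_anova([[1, 2, 3, 4, 5]] * 3)
f_diff, _, _ = welch_anova([[1, 2, 3], [11, 12, 13], [21, 22, 23]])
```

A Games-Howell post hoc test builds on the same per-group weights, applying a studentized-range correction to each pairwise comparison.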
Split-architecture Non-contact Optical Seismocardiography Triggering System for Cardiac Magnetic Resonance Imaging
GAO Qiannan, ZHANG Jiayu, ZHU Yingen, WANG Wenjin, JI Jiansong, JI Xiaoyue
2026, 48(3): 1146-1156. doi: 10.11999/JEIT251098
Abstract:
  Objective  Cardiac-cycle synchronization is required in Cardiovascular Magnetic Resonance (CMR) to reduce motion artifacts and preserve quantitative accuracy. At high field strengths, the ElectroCardioGram (ECG) trigger is affected by magnetohydrodynamic effects and scanner-generated ElectroMagnetic Interference (EMI). Electrode placement and lead routing add setup burden. Contact-based mechanical sensors still require skin contact, and optical photoplethysmography introduces long physiological delay. A fully contactless and EMI-robust mechanical surrogate is therefore needed. This study develops a split-architecture, non-contact optical SeismoCardioGraphy (SCG) triggering system for CMR and evaluates its availability, beatwise detection performance, and timing characteristics under practical body-coil coverage.  Methods  The split-architecture system consists of a near-magnet optical acquisition unit and a far-magnet computation-and-triggering unit connected by fiber-optic links to minimize conductive pathways near the scanner (Fig. 2). The acquisition unit uses a defocused industrial camera and laser illumination to record speckle-pattern dynamics on the anterior chest without physical contact (Fig. 3). Dense optical flow is computed in a chest region of interest, and the displacement field is projected onto a principal motion direction to form a one-dimensional SCG sequence (Fig. 4). Drift suppression, smoothing, and short-window normalization are applied. Trigger timing is refined with a valley-constrained gradient search within a physiologically bounded window to reduce spurious detections and improve temporal consistency (Fig. 4). A benchmark dataset is acquired from 20 healthy volunteers under three coil configurations: no body coil, an ultra-flexible body coil, and a rigid body coil (Fig. 5, Fig. 6, Table 3). ECG serves as the reference, and CamPPG and radar are recorded for comparison. 
Beatwise precision, recall, and F1 score are computed against ECG R peaks, and availability is reported as the fraction of usable segments under unified quality criteria (Table 4). Backward and forward physiological delays and delay variability are summarized across subjects and coil conditions (Table 5, Table 6). Key windowing and refractory parameters are tested for sensitivity (Table 2). Runtime is measured to assess real-time feasibility, including the cost of dense optical flow and the overhead of one-dimensional processing and triggering (Table 7).  Results and Discussions  Under no-coil and ultra-flexible-coil conditions, the optical SCG trigger achieves high availability (about 97.6%) and strong beatwise performance. F1 reaches about 0.91 under the ultra-flexible coil (Table 4, Table 5). The backward physiological delay remains on the order of several tens of milliseconds, and delay jitter is generally within a few tens of milliseconds (Table 5, Table 6). Under the rigid body coil, performance decreases markedly. Mechanical decoupling between the coil surface and the chest wall weakens and distorts the vibration signature, which blurs AO-related features and increases false triggers (Fig. 1). This effect appears as lower precision and F1 and as a shift toward longer and more variable delays compared with the other conditions (Table 4, Table 6). Compared with CamPPG, which reflects peripheral blood-volume dynamics and typically lags further behind the ECG R peak, the optical SCG surrogate provides a more proximal mechanical marker with reduced trigger phase lag (Fig. 8, Table 5). EMI robustness is supported by representative segments: ECG waveforms show visible distortion under interference, whereas the optical SCG surrogate remains interpretable because acquisition and transmission near the scanner are fully optical and electrically isolated (Fig. 8). 
Parameter analysis supports a moderate processing window and a 0.5 s minimum interbeat interval as a stable choice across subjects (Table 2). Runtime analysis shows that dense optical flow dominates computational cost, whereas one-dimensional processing and triggering add little overhead. Throughput exceeds the acquisition frame rate, supporting real-time triggering (Table 7).  Conclusions  A split-architecture, non-contact optical SCG triggering system is developed and validated under three representative body-coil configurations. Fiber-optic separation between near-magnet acquisition and far-magnet processing improves EMI robustness while maintaining real-time trigger output. High availability, strong beatwise performance, and short physiological delay are demonstrated under no-coil and ultra-flexible-coil conditions (Table 4, Table 5). Rigid-coil coverage exposes a clear limitation caused by reduced mechanical coupling, which motivates further optimization for mechanically decoupled or heavily occluded scenarios (Fig. 1, Table 6).
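The projection of dense optical flow onto a principal motion direction, which yields the one-dimensional SCG sequence described in the Methods, can be sketched as follows. The PCA-based axis estimation, normalization, and synthetic data below are assumptions for illustration; the paper's exact pipeline (drift suppression, short-window normalization, valley-constrained search) is not reproduced.

```python
import numpy as np

def scg_from_flow(flows):
    """Project per-frame mean optical flow onto the principal motion
    direction to obtain a one-dimensional SCG-like sequence.
    flows: array (T, N, 2) of dense flow vectors in the chest ROI."""
    mean_flow = flows.mean(axis=1)                  # (T, 2) per-frame motion
    centred = mean_flow - mean_flow.mean(axis=0)
    cov = centred.T @ centred / len(centred)
    vals, vecs = np.linalg.eigh(cov)
    principal = vecs[:, np.argmax(vals)]            # dominant motion axis
    sig = centred @ principal
    return (sig - sig.mean()) / (sig.std() + 1e-8)  # normalized 1-D signal

# synthetic chest motion oscillating along a known oblique axis
t = np.linspace(0, 4 * np.pi, 200)
axis = np.array([1.0, 1.0]) / np.sqrt(2)
flows = np.sin(t)[:, None, None] * axis[None, None, :] * np.ones((200, 50, 2))
sig = scg_from_flow(flows)   # recovers the sinusoid up to sign
```

The sign of the principal eigenvector is arbitrary, which is why a downstream trigger detector works on peaks and valleys rather than raw polarity.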
Research on ECG Pathological Signal Classification Empowered by Diffusion Generative Data
GE Beining, CHEN Nuo, JIN Peng, SU Xin, LU Xiaochun
2026, 48(3): 1157-1166. doi: 10.11999/JEIT250404
Abstract:
  Objective  ElectroCardioGram (ECG) signals are key indicators of human health. However, their complex composition and diverse features make visual recognition prone to errors. This study proposes a classification algorithm for ECG pathological signals based on data generation. A Diffusion Generative Network (DGN), also known as a diffusion model, progressively adds noise to real ECG signals until they approach a noise distribution, thereby facilitating model processing. To improve generation speed and reduce memory usage, a Knowledge Distillation-Diffusion Generative Network (KD-DGN) is proposed, which demonstrates superior memory efficiency and generation performance compared with the traditional DGN. This work compares the memory usage, generation efficiency, and classification accuracy of DGN and KD-DGN, and analyzes the characteristics of the generated data after lightweight processing. In addition, the classification effects of the original MIT-BIH dataset and an extended dataset (MIT-BIH-PLUS) are evaluated. Experimental results show that convolutional networks extract richer feature information from the extended dataset generated by DGN, leading to improved recognition performance of ECG pathological signals.  Methods  The generative network-based ECG signal generation algorithm is designed to enhance the performance of convolutional networks in ECG signal classification. The process begins with a Gaussian noise-based image perturbation algorithm, which obscures the original ECG data by introducing controlled randomness. This step simulates real-world variability, enabling the model to learn more robust representations. A diffusion generative algorithm is then applied to reconstruct and reproduce the data, generating synthetic ECG signals that preserve the essential characteristics of the original categories despite the added noise. 
This reconstruction ensures that the underlying features of ECG signals are retained, allowing the convolutional network to extract more informative features during classification. To improve efficiency, the approach incorporates knowledge distillation. A teacher-student framework is adopted in which a lightweight student model is trained from the original, more complex teacher ECG data generation model. This strategy reduces computational requirements and accelerates the data generation process, improving suitability for practical applications. Finally, two comparative experiments are designed to validate the effectiveness and accuracy of the proposed method. These experiments evaluate classification performance against existing approaches and provide quantitative evidence of its advantages in ECG signal processing.  Results and Discussions  The data generation algorithm yields ECG signals with a Signal-to-Noise Ratio (SNR) comparable to that of the original data, while presenting more discernible signal features. The student model constructed through knowledge distillation produces ECG samples with the same SNR as those generated by the teacher model, but with substantially reduced complexity. Specifically, the student model achieves a 50% reduction in size, 37.5% lower memory usage, and a 57% shorter runtime compared with the teacher model (Fig. 6). When the convolutional network is trained with data generated by the KD-DGN, its classification performance improves across all metrics compared with a convolutional network trained without KD-DGN. Precision reaches 95.7%, and the misidentification rate is reduced to approximately 3% (Fig. 9).  Conclusions  The DGN provides an effective data generation strategy for addressing the scarcity of ECG datasets. By supplying additional synthetic data, it enables convolutional networks to extract more diverse class-specific features, thereby improving recognition performance and reducing misidentification rates. 
Optimizing DGN with knowledge distillation further enhances efficiency, while maintaining SNR equivalence with the original DGN. This optimization reduces computational cost, conserves machine resources, and supports simultaneous task execution. Moreover, it enables the generation of new data without information loss, allowing convolutional networks to learn from larger datasets at lower cost. Overall, the proposed approach markedly improves the classification performance of convolutional networks on ECG signals. Future work will focus on further algorithmic optimization for real-world applications.
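The forward (noising) half of a diffusion model, which the DGN applies to real ECG beats before learning to reverse it, has the standard closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$. The linear noise schedule and toy one-lead beat below are common defaults chosen for illustration, not necessarily the settings used in the paper.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample q(x_t | x_0): blend a clean signal with Gaussian noise
    according to the cumulative schedule alpha_bar at step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 2 * np.pi, 256))   # toy one-lead ECG beat
xt = forward_diffuse(x0, T - 1, alpha_bar, rng)  # near-pure noise at t=999
```

A distilled student model then learns to reverse this process in fewer steps than the teacher, which is where the reported reductions in size, memory, and runtime come from.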
Cross Modal Hashing of Medical Image Semantic Mining for Large Language Model
LIU Qinghai, WU Qianlin, LUO Jia, TANG Lun, XU Liming
2026, 48(3): 1167-1179. doi: 10.11999/JEIT250529
Abstract:
  Objective  A novel cross-modal hashing framework driven by Large Language Models (LLMs) is proposed to address the semantic misalignment between medical images and their corresponding textual reports. The objective is to enhance cross-modal semantic representation and improve retrieval accuracy by effectively mining and matching semantic associations between modalities.  Methods  The generative capacity of LLMs is first leveraged to produce high-quality textual descriptions of medical images. These descriptions are integrated with diagnostic reports and structured clinical data using a dual-stream semantic enhancement module, designed to reinforce inter-modality alignment and improve semantic comprehension. A structural similarity-guided hashing scheme is then developed to encode both visual and textual features into a unified Hamming space, ensuring semantic consistency and enabling efficient retrieval. To further enhance semantic alignment, a prompt-driven attention template is introduced to fuse image and text features through fine-tuned LLMs. Finally, a contrastive loss function with hard negative mining is employed to improve representation discrimination and retrieval accuracy.  Results and Discussions  Experiments are conducted on a multimodal medical dataset to compare the proposed method with existing cross-modal hashing baselines. The results indicate that the proposed method significantly outperforms baseline models in terms of precision and Mean Average Precision (MAP) (Table 3; Table 4). On average, a 7.21% improvement in retrieval accuracy and a 7.72% increase in MAP are achieved across multiple data scales, confirming the effectiveness of the LLM-driven semantic mining and hashing approach.  Conclusions  An LLM-driven cross-modal hashing framework is presented that mines semantic associations between medical images and their textual reports through dual-stream semantic enhancement, structural similarity-guided hashing, prompt-driven attention fusion, and contrastive learning with hard negative mining. Experiments on a multimodal medical dataset demonstrate average gains of 7.21% in retrieval accuracy and 7.72% in MAP over existing cross-modal hashing baselines, confirming the effectiveness of the proposed approach.
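Once both modalities are encoded into a unified Hamming space, retrieval reduces to binarization plus Hamming-distance ranking. The sign-based binarization below is a generic sketch of that final stage, not the paper's learned hash function, and all values are invented.

```python
import numpy as np

def to_hash(features):
    """Binarize real-valued embeddings into ±1 hash codes."""
    return np.where(features >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.
    For ±1 codes of length L, d_H = (L - q·c) / 2."""
    L = query_code.size
    dists = (L - db_codes @ query_code) // 2
    return np.argsort(dists), dists

# toy example: one image code queried against three report codes
q = to_hash(np.array([0.3, -1.2, 0.8, 0.1]))
db = to_hash(np.array([[0.5, -0.1, 0.9, 0.2],     # near duplicate
                       [-0.5, 0.1, -0.9, -0.2],   # opposite code
                       [0.5, 0.1, -0.9, 0.2]]))   # partial match
order, dists = hamming_rank(q, db)
```

Because Hamming distance is a dot product on ±1 codes, ranking a large database is a single integer matrix-vector product, which is what makes hashing-based retrieval efficient.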
Overviews
Intelligent Analysis Technologies for Encrypted Traffic: Current Status, Advances, and Challenges
GONG Bi, LIU Jian, TANG Xiaomei, YU Meiting, GONG Hang, HUANG Meigen
2026, 48(3): 1180-1197. doi: 10.11999/JEIT250416
Abstract:
  Significance   Encrypted traffic enables secure and reliable data transmission, yet introduces challenges to network security. These include the covert spread of malicious attacks, reduced effectiveness of security tools, and increased network resource overhead. Encrypted traffic analysis technologies are therefore essential. Traditional port filtering and deep packet inspection are inadequate in increasingly complex network environments. Intelligent encrypted traffic analysis integrates feature engineering, deep learning, Transformer architectures, federated learning, multimodal feature fusion, and generative models. These approaches address network security management from multiple perspectives. They support efficient detection of hidden attacks, improve network resource allocation, balance system security and privacy protection, enhance security defenses, and strengthen user experience.  Progress   Intelligent encrypted traffic analysis technologies provide new methods for network security. (1) Feature engineering: (a) Statistical features: Basic statistical features of encrypted traffic, such as packet size, count, arrival time, and rate, are selected through feature selection techniques so that the processed data reflect internal traffic characteristics. (b) Behavioral features: Observation and analysis of network traffic identify behavioral patterns such as access frequency and protocol usage habits. (2) Deep learning methods: (a) Convolutional Neural Network (CNN): Convolution and pooling layers automatically extract local features from encrypted traffic and capture key information. An improved multi-scale CNN achieves 86.77% accuracy on the ISCXVPN2016 dataset. (b) Recurrent Neural Network (RNN): RNNs process time-series data through memory units and capture long-term dependencies, enabling analysis of temporal features such as connection duration and traffic trends. 
(c) Graph Neural Network (GNN): GNNs are suited to relational data and model the graph structures of encrypted traffic to identify potential node relationships. (d) Transformer architectures: With parallel processing and support for long sequences, attention mechanisms capture long-distance dependencies. A traffic Transformer method using masked autoencoders reaches 98.07% accuracy on the ISCXVPN2016 dataset. (3) Other advanced methods: (a) Federated learning: Participants train a shared global model by exchanging sub-model parameters rather than raw traffic data, which protects privacy and improves performance. Reported results show performance gaps relative to centralized learning reduced to 0.8%. (b) Multimodal feature fusion: Features extracted from multiple traffic modalities are fused into a unified representation to build a comprehensive analysis architecture. This integration of heterogeneous features improves model performance, raising accuracy and F1-score for multitask classification to 93.75% and 91.95%, respectively. (c) Generative model-driven methods: Generative Adversarial Networks (GAN) and diffusion models learn real traffic distributions to generate synthetic samples, which mitigate data scarcity and class imbalance. Diffusion-based traffic generation increases similarity to real traffic in packet size and inter-arrival time by up to 43.4% and 39.02%, respectively, compared with baseline models.  Conclusions  This paper explains the necessity of intelligent encrypted traffic analysis technologies and summarizes key methods and related research. Remaining challenges include: (1) Network complexity: Modern networks are heterogeneous and dynamic, using diverse encryption algorithms and producing inconsistent traffic structures that traditional rules do not adapt to. Network adjustments and behavior changes also shift traffic features over time, which complicates analysis. (2) Insufficient model robustness: Encrypted traffic features depend strongly on environment. 
Accuracy decreases after model migration, and models remain sensitive to non-ideal inputs and adversarial examples, which affect model decisions. (3) Privacy protection and compliance: Encrypted traffic carries sensitive information, and conventional analysis risks exposing original features. Even metadata can be associated with identities, which complicates compliance with anonymization requirements.  Prospects   Future work may focus on: (1) Dynamic adaptability: Full-link adaptive mechanisms that integrate multi-dimensional information may support dynamic context awareness. Incremental learning frameworks may help models respond in real time to feature drift. Genetic algorithms and reinforcement learning may also support dynamic detection strategies. (2) Anti-attack capability: A comprehensive protection system that includes adversarial sample detection, model defense, and attack traceability may be established by designing monitoring modules and applying adversarial training. (3) Privacy protection and compliance: Differential privacy can be applied by adding controlled noise during feature extraction or to model parameters. Homomorphic encryption may support analytical tasks directly on ciphertext. (4) Synergy between reverse engineering and Explainable AI (XAI): Reverse engineering may deepen protocol analysis and enhance the quality of inputs for XAI, and XAI may improve model transparency. This supports closed-loop optimization between protocol analysis and model interpretation.
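The statistical-feature stage described under feature engineering works entirely on flow metadata, never on payload content. The minimal extractor below illustrates this with a feature set assumed from the examples the survey lists (packet size, count, arrival time, rate); it is a sketch, not any specific tool's implementation.

```python
def flow_features(sizes, times):
    """Basic statistical features of one (possibly encrypted) flow:
    packet count, mean/max size, mean inter-arrival time, byte rate.
    Only metadata is used; payload bytes are never inspected."""
    count = len(sizes)
    total = sum(sizes)
    duration = times[-1] - times[0] if count > 1 else 0.0
    iats = [b - a for a, b in zip(times, times[1:])]  # inter-arrival gaps
    return {
        "pkt_count": count,
        "mean_size": total / count,
        "max_size": max(sizes),
        "mean_iat": sum(iats) / len(iats) if iats else 0.0,
        "byte_rate": total / duration if duration > 0 else 0.0,
    }

# a toy 4-packet flow: sizes in bytes, timestamps in seconds
feats = flow_features([60, 1500, 1500, 60], [0.0, 0.1, 0.2, 0.4])
```

Vectors like this feed the downstream classifiers (CNN, RNN, GNN, Transformer) the survey reviews; feature selection then prunes the set to the most discriminative entries.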
Image and Intelligent Information Processing
Multi-Scale Region of Interest Feature Fusion for Palmprint Recognition
MA Yuxuan, ZHANG Feifei, LI Guanghui, TANG Xin, DONG Zhengyang
2026, 48(3): 1198-1207. doi: 10.11999/JEIT250940
Abstract:
  Objective  Accurate localization of the Region Of Interest (ROI) is a prerequisite for high-precision palmprint recognition. In contactless and uncontrolled application scenarios, complex background illumination and diverse hand postures frequently cause ROI localization offsets. Most existing deep learning-based recognition methods rely on a single fixed-size ROI as input. Although some approaches adopt multi-scale convolution kernels, fusion at the ROI level is not performed, which makes these methods highly sensitive to localization errors. Therefore, small deviations in ROI extraction often result in severe performance degradation, which restricts practical deployment. To overcome this limitation, a Multi-scale ROI Feature Fusion Mechanism is proposed, and a corresponding model, termed ROI3Net, is designed. The objective is to construct a recognition system that is inherently robust to localization errors by integrating complementary information from multiple ROI scales. This strategy reinforces shared intrinsic texture features while suppressing scale-specific noise introduced by positioning inaccuracies.  Methods  The proposed ROI3Net adopts a dual-branch architecture consisting of a Feature Extraction Network and a lightweight Weight Prediction Network (Fig. 4). The Feature Extraction Network employs a sequence of Multi-Scale Residual Blocks (MSRBs) to process ROIs at three progressive scales (1.00×, 1.25×, and 1.50×) in parallel. Within each MSRB, dense connections are applied to promote feature reuse and reduce information loss (Eq. 3). Convolutional Block Attention Modules (CBAMs) are incorporated to adaptively refine features in both the channel and spatial dimensions. The Weight Prediction Network is implemented as an end-to-end lightweight module. 
It takes raw ROI images as input and processes them using a serialized convolutional structure (Conv2d-BN-GELU-MaxPool), followed by a Multi-Layer Perceptron (MLP) head, to predict a dynamic weight vector for each scale. This subnetwork is optimized for efficiency, containing 2.38 million parameters, which accounts for approximately 6.2% of the total model parameters, and requiring 103.2 MFLOPs, which corresponds to approximately 2.1% of the total computational cost. The final feature representation is obtained through a weighted summation of multi-scale features (Eq. 1 and Eq. 2), which mathematically maximizes the information entropy of the fused feature vector.  Results and Discussions  Experiments are conducted on six public palmprint datasets: IITD, MPD, NTU-CP, REST, CASIA, and BMPD. Under ideal conditions with accurate ROI localization, ROI3Net demonstrates superior performance compared with state-of-the-art single-scale models. For instance, a Rank-1 accuracy of 99.90% is achieved on the NTU-CP dataset, and a Rank-1 accuracy of 90.17% is achieved on the challenging REST dataset (Table 1). Model robustness is further evaluated by introducing a random 10% localization offset. Under this condition, conventional models exhibit substantial performance degradation. For example, the Equal Error Rate (EER) of the CO3Net model on NTU-CP increases from 2.54% to 15.66%. In contrast, ROI3Net maintains stable performance, with the EER increasing only from 1.96% to 5.01% (Fig. 7, Table 2). The effect of affine transformations, including rotation (±30°) and scaling (0.85× to 1.15×), is also analyzed. Rotation causes feature distortion because standard convolution operations lack rotation invariance, whereas the proposed multi-scale mechanism effectively compensates for translation errors by expanding the receptive field (Table 3). 
Generalization experiments further confirm that embedding this mechanism into existing models, including CCNet, CO3Net, and RLANN, significantly improves robustness (Table 6). In terms of efficiency, although the theoretical computational load increases by approximately 150%, the actual GPU inference time increases by only about 20% (6.48 ms) because the multi-scale branches are processed independently and in parallel (Table 7).  Conclusions  A Multi-scale ROI Feature Fusion Mechanism is presented to reduce the sensitivity of palmprint recognition systems to localization errors. By employing a lightweight Weight Prediction Network to adaptively fuse features extracted from different ROI scales, the proposed ROI3Net effectively combines fine-grained texture details with global semantic information. Experimental results confirm that this approach significantly improves robustness to translation errors by recovering truncated texture information, whereas the efficient design of the Weight Prediction Network limits computational overhead. The proposed mechanism also exhibits strong generalization ability when integrated into different backbone networks. This study provides a practical and resilient solution for palmprint recognition in unconstrained environments. Future work will explore non-linear fusion strategies, such as graph neural networks, to further exploit cross-scale feature interactions.
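The weighted summation of multi-scale features, the role Eq. 1 and Eq. 2 play in the paper, can be sketched generically. The softmax normalization of the predicted scores and the toy dimensions below are assumptions, since the exact formulation is not reproduced in this abstract.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_scales(features, scores):
    """Blend per-scale feature vectors with weights predicted by a
    lightweight subnetwork (here stood in for by raw scores).
    features: (S, D) array, one D-dim embedding per ROI scale."""
    w = softmax(scores)
    return np.tensordot(w, features, axes=1), w

# three scales (1.00x, 1.25x, 1.50x), toy 8-dim embeddings
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8))
fused, w = fuse_scales(feats, np.array([0.0, 0.0, 0.0]))  # equal scores
```

With equal scores the fusion degenerates to a plain average; in ROI3Net the Weight Prediction Network makes the weights input-dependent, so a mislocalized scale can be down-weighted per image.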
CaRS-Align: Channel Relation Spectra Alignment for Cross-Modal Vehicle Re-identification
SA Baihui, ZHUANG Jingyi, ZHENG Jinjie, ZHU Jianqing
2026, 48(3): 1208-1218. doi: 10.11999/JEIT250917
Abstract:
  Objective  Visible and infrared images are two commonly used modalities in intelligent transportation scenarios and play a key role in vehicle re-identification. However, differences in imaging mechanisms and spectral responses lead to inconsistent visual characteristics between these modalities, which limits cross-modal vehicle re-identification. To address this problem, this paper proposes a Channel Relation Spectra Alignment (CaRS-Align) method that uses channel relation spectra, rather than channel-wise features, as the alignment target. This strategy reduces interference caused by imaging style differences at the relational-structure level. Within each modality, a channel relation spectrum is constructed to capture stable and semantically coordinated channel-to-channel relationships through correlation modeling. At the cross-modal level, the correlation between the corresponding channel relation spectra of the two modalities is maximized to achieve consistent alignment of relational structures. Experiments on the public MSVR310 and RGBN300 datasets show that CaRS-Align outperforms existing state-of-the-art methods. For example, on MSVR310, under infrared-to-visible retrieval, CaRS-Align achieves a Rank-1 accuracy of 64.35%, which is 2.58% higher than existing advanced methods.  
Methods  CaRS-Align adopts a hierarchical optimization paradigm: (1) for each modality, a channel-to-channel relation spectrum is constructed by mining inter-channel dependencies, yielding a semantically coordinated relation matrix that preserves the organizational structure of semantic cues; (2) cross-modal consistency is achieved by maximizing the correlation between the relation spectra of the two modalities, enabling progressive optimization from intra-modal construction to cross-modal alignment; and (3) relation spectrum alignment is integrated with standard classification and retrieval objectives commonly used in re-identification to supervise backbone training for the vehicle re-identification model.  Results and Discussions  Compared with several state-of-the-art cross-modal re-identification methods on the RGBN300 and MSVR310 datasets, CaRS-Align demonstrates strong performance and achieves the best or second-best results across both retrieval modes. As shown in (Table 1), on RGBN300 it attains 75.09% Rank-1 accuracy and 55.45% mean Average Precision (mAP) in the infrared-to-visible mode, and 76.60% Rank-1 accuracy and 56.12% mAP in the visible-to-infrared mode. As shown in (Table 2), similar advantages are observed on MSVR310, with 64.54% Rank-1 accuracy and 41.25% mAP in the visible-to-infrared mode, and 64.35% Rank-1 accuracy and 40.99% mAP in the infrared-to-visible mode. (Fig. 4) presents Top-10 retrieval results, where CaRS-Align reduces identity mismatches in both directions. (Fig. 5) illustrates feature distance distributions, showing substantial overlap between intra-class and inter-class distances without CaRS-Align (Fig. 5(a)), whereas clearer separation is observed with CaRS-Align (Fig. 5(b)), confirming improved feature discrimination. These results indicate that modeling channel-level relational structures improves both retrieval modes, increases adaptability to modality shifts, and effectively reduces mismatches caused by cross-modal differences.  
Conclusions  This paper proposes a visible-infrared cross-modal vehicle re-identification method based on CaRS-Align. Within each modality, a channel relation spectrum is constructed to preserve semantic co-occurrence structures. A CaRS-Align function is then designed to maximize the correlation between modalities, thereby achieving consistent alignment and improving cross-modal performance. Experiments on the MSVR310 and RGBN300 datasets demonstrate that CaRS-Align outperforms existing state-of-the-art methods in key metrics, including Rank-1 accuracy and mAP.
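The two-level idea summarized above, first build an intra-modal channel relation spectrum, then maximize its cross-modal correlation, can be sketched in a few lines. The correlation-matrix construction and the cosine-similarity alignment score below are plausible stand-ins chosen for illustration, not the paper's exact formulation.

```python
import numpy as np

def relation_spectrum(feats):
    """Channel-to-channel relation spectrum as a correlation matrix.

    feats: (C, N) array -- C channels, N spatial positions per channel.
    Returns a (C, C) matrix of inter-channel correlations in [-1, 1].
    """
    x = feats - feats.mean(axis=1, keepdims=True)
    x /= (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    return x @ x.T

def alignment_score(spec_vis, spec_ir):
    """Cosine similarity between flattened relation spectra of the two
    modalities; training would maximize this score to align relational
    structures across modalities."""
    a, b = spec_vis.ravel(), spec_ir.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```

Because the spectrum encodes channel-to-channel relationships rather than raw channel responses, matching spectra tolerates per-modality style differences that would dominate a direct feature-level comparison.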
A Multi-scale Spatiotemporal Correlation Attention and State Space Modeling-based Approach for Precipitation Nowcasting
ZHENG Hui, CHEN Fu, HE Shuping, QIU Xuexing, ZHU Hongfang, WANG Shaohua
2026, 48(3): 1219-1229. doi: 10.11999/JEIT250786
Abstract:
  Objective  Precipitation nowcasting is a representative task in meteorological forecasting. It uses radar echoes or precipitation sequences to predict precipitation distribution in the next 0~2 hours. It supports disaster warning and key decision-making and protects lives and property. Current mainstream methods show loss of local details, limited representation of conditional information, and weak adaptability in complex regions. This study proposes a PredUMamba model based on a diffusion model. The model introduces a Mamba block with an adaptive zigzag scanning mechanism that extracts key local detail information and reduces computational complexity. A multi-scale spatiotemporal correlation attention module is also designed to enhance interactions across spatiotemporal hierarchies and to achieve a comprehensive representation of conditional information. In addition, a radar echo dataset tailored for complex regions is constructed for the southern Anhui mountainous area to evaluate the model's ability to predict sudden and extreme rainfall. This work provides an intelligent solution and theoretical support for precipitation nowcasting.  Methods  The PredUMamba model adopts a two-stage diffusion network. In the first stage, a frame-by-frame Variational AutoEncoder (VAE) is trained to map precipitation data from pixel space to a low-dimensional latent space. In the second stage, a diffusion network is built on the encoded latent space. An adaptive zigzag Mamba module with a spatiotemporal alternating scanning strategy is proposed. Sequential scanning is performed within rows and turn-back scanning is performed between rows. This design captures detailed precipitation-field features while maintaining low computational complexity. A multi-scale spatiotemporal correlation attention module is further introduced on temporal and spatial scales. 
On the temporal scale, adaptive convolution kernels and attention-based convolution layers extract local and global information. On the spatial scale, a lightweight correlation attention mechanism aggregates spatial information and strengthens historical conditional information representation. A radar dataset for the southern Anhui mountainous area is constructed to evaluate model adaptability in complex terrain.  Results and Discussions  The adaptive zigzag Mamba module and multi-scale spatiotemporal correlation attention module strengthen the model’s ability to capture intrinsic spatiotemporal dependencies. They extract conditional information more accurately and yield prediction results closer to real conditions. Experiments show that PredUMamba achieves the best performance across all indicators on the southern Anhui mountainous area and Shanghai radar datasets. On the SEVIR dataset, FVD, CSI_pool4, and CSI_pool6 outperform other methods, and CSI and CRPS achieve competitive results. Visualization results further show that PredUMamba does not produce temporal blurring (Fig. 4). This indicates stronger stability and clear advantages in detail generation and motion-trend prediction. The model preserves edge details aligned with real precipitation fields and maintains accurate motion patterns.  Conclusions  This study proposes an innovative PredUMamba model based on a diffusion network architecture. Model performance is improved through a Mamba module with an adaptive zigzag scanning mechanism and a multi-scale spatiotemporal correlation attention module. The adaptive zigzag module captures fine-grained spatiotemporal features and reduces computational complexity. 
The multi-scale attention module strengthens historical conditional information extraction through temporal dual-branch processing and a lightweight spatial correlation mechanism, enabling joint representation of local and global features. A radar dataset for the southern Anhui mountainous area is also constructed to validate model applicability in complex terrain. The dataset covers precipitation under various terrain conditions and supports extreme rainfall prediction. Comparative experiments on the constructed dataset and on public datasets show that PredUMamba achieves the best results on the southern Anhui mountainous area and Shanghai datasets. On the SEVIR dataset, FVD, CSI_pool4, and CSI_pool6 outperform other methods, and CRPS and CSI achieve competitive results. As this work focuses on a data-driven forecasting approach, future research will integrate physical-condition constraints to improve interpretability and enhance prediction accuracy for small- and medium-scale convective systems.
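The "turn-back" scanning idea mentioned in the Methods can be illustrated with a few lines of Python: a simple boustrophedon traversal that keeps consecutive scan positions spatially adjacent when serializing a 2-D feature grid for a state-space model. This is a simplified stand-in, not the adaptive scanning mechanism actually used in PredUMamba.

```python
def zigzag_scan(rows, cols):
    """Row-wise zigzag ('turn-back') scan order over a rows x cols grid:
    sequential scanning within a row, with the direction reversed on
    each following row, so every step moves to a neighboring cell."""
    order = []
    for r in range(rows):
        line = [(r, c) for c in range(cols)]
        if r % 2 == 1:
            line.reverse()            # turn back on odd-indexed rows
        order.extend(line)
    return order
```

Keeping consecutive positions adjacent is what lets a 1-D sequence model such as Mamba preserve local detail when the 2-D precipitation field is flattened.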
Multi-UAV RF Signals CNN|Triplet-DNN Heterogeneous Network Feature Extraction and Type Recognition
ZHAO Shen, LI Guangxuan, ZHOU Xiancheng, HUANG Wendi, YANG Lingling, GAO Liping
2026, 48(3): 1230-1240. doi: 10.11999/JEIT250757
Abstract:
  Objective  This study addresses the detection requirements of simultaneous Unmanned Aerial Vehicle (UAV) operations. The strategy is based on extracting model-specific information features from Radio Frequency (RF) time-frequency spectra. A CNN|Triplet-DNN heterogeneous network is developed to optimize feature extraction and classification. The method resolves the problem of identifying individual UAV models within coexisting RF signals and supports efficient multi-UAV management in complex environments.  Methods  The CNN|Triplet-DNN architecture uses a parallel-branch structure that integrates a Convolutional Neural Network (CNN) and a Triplet Convolutional Neural Network (Triplet-CNN). Branch 1 employs a lightweight CNN to extract global features from RF time-frequency diagrams while reducing computational cost. Branch 2 adds an enhanced center-loss function to strengthen feature discrimination and address ambiguous feature boundaries under complex conditions. Branch 3, based on a Triplet-CNN framework, applies Triplet Loss to capture local and global features of RF time-frequency diagrams. The complementary features from the three branches are fused and processed through a fully connected DNN with a Softmax activation function to generate probability distributions for UAV signal classification. This structure improves UAV type recognition performance.  Results and Discussions  RF signals from the open-source DroneRFa dataset were superimposed to simulate multi-UAV coexistence, and real-world drone signals were collected through controlled flights to build a comprehensive signal database. (1) Based on single-UAV RF time-frequency diagrams from the open-source dataset, ablation experiments (Fig. 7) were conducted on the three-branch CNN|Triplet-DNN structure to validate its design, and each model was trained. (2) The simulated multi-UAV coexistence dataset was used for identification tasks to evaluate recognition performance under coexistence conditions. 
Results (Fig. 10) show that recognition accuracy for four or fewer UAV types ranges from 83% to 100%, confirming the effectiveness of the CNN|Triplet-DNN model. (3) Each model was trained using the flight dataset and then applied to real multi-UAV coexistence identification. The CNN|Triplet-DNN achieved recognition accuracies of 86%, 57%, and 73% for two, three, and four UAV types, respectively (Fig. 13). Comparison with the CNN, Triplet-CNN, and Transformer models shows that the CNN|Triplet-DNN has stronger generalizability. All models exhibited performance degradation on real-world data relative to the open-source dataset, mainly because drones dynamically adjust communication frequency bands, which reduces recognition performance under coexistence scenarios.  Conclusions  A CNN|Triplet-DNN heterogeneous network is proposed for identifying RF signals emitted by multiple UAVs. The three-branch structure and backpropagation algorithm improve the extraction of discriminative aircraft-model features, and the DNN enhances model generalization. Experiments using open-source datasets and real flight scenarios verify the method’s effectiveness and practical value. Future work will address dataset expansion, model optimization for dynamic frequency-band adaptation, and improved recognition under complex coexistence conditions.
Circuit and System Design
Component Placement Algorithm Considering Reagent Type Differences in Cell Reuse for FPVA Biochips
XU Yanbo, ZHU Yuhan, HUANG Xing, LIU Genggeng
2026, 48(3): 1241-1251. doi: 10.11999/JEIT250731
Abstract:
  Objective  Fully Programmable Valve Array (FPVA) biochips, a recent type of flow-based microfluidic biochip, offer high flexibility and programmability, which enables them to meet diverse and complex experimental needs. Component placement is a critical stage in FPVA architectural synthesis because it affects several performance metrics, including assay completion time, total fluid-transport length, and cross-contamination. Cell reuse, an essential feature of FPVA programmability, requires special consideration during placement. However, existing studies have largely ignored the effect of reagent type differences in cell reuse on these metrics.  Methods  This study presents a component placement algorithm for FPVA biochips that accounts for reagent type differences during cell reuse. The algorithm first introduces a cell reuse complexity metric that quantifies reuse complexity by considering the effects of reagent-type differences and component overlap on cross-contamination. It then integrates constraints, including placement-area limits and non-overlapping conditions for concurrent components, to ensure valid placement. The reward function is optimized to minimize reuse complexity and reduce the distance between components that use the same reagent type. The goal is to lower cross-contamination, total fluid-transport length, and assay completion time.  Results and Discussions  The algorithm is evaluated on benchmark FPVA instances with different chip sizes and functional requirements and compared with related methods. It reduces cell reuse complexity by 34.2%, assay completion time by 2.8%, and total fluid-transport length by 9.2% on average (Table 2). It also reduces the reagent-aware distance metric by 29.9% on average (Fig. 6). The learning agent’s decision trajectories show clear spatial structure, which reflects global placement awareness.  
Conclusions  This study is the first to investigate FPVA component placement with attention to reagent type differences in cell reuse. The main contributions are as follows: (1) a cell reuse complexity metric is proposed to assess reuse intensity in placement, (2) the FPVA placement problem is modeled as a Markov decision process to enable the use of double deep Q-networks for safe and efficient placement policy learning, and (3) compared with existing work, the model improves FPVA biochemical assay performance and reliability.
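The double deep Q-network learning rule referenced in contribution (2) can be sketched generically: the online network selects the next action and the target network evaluates it, which reduces the overestimation bias of plain Q-learning. This is a textbook form of the update, shown only for illustration; the actual reward would encode reuse complexity and reagent-aware distance, which are not modeled here.

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next,
                      gamma=0.99, done=False):
    """Double DQN bootstrap target for one transition.

    q_online_next / q_target_next: Q-value vectors over actions at the
    next state, from the online and target networks respectively.
    """
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))        # select with online net
    return reward + gamma * q_target_next[a_star]  # evaluate with target net
```

During training, the placement agent would regress its online Q-value for the taken action toward this target and periodically copy the online weights into the target network.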
Research on Low Leakage Current Voltage Sampling Method for Multi-cell Series Battery Packs
GUO Zhongjie, GAO Yuyang, DONG Jianfeng, BAI Ruokai
2026, 48(3): 1252-1261. doi: 10.11999/JEIT250733
Abstract:
  Objective  The battery voltage sampling circuit is a key component of the Battery Management Integrated Circuit (BMIC). It performs real-time monitoring of cell voltages, and its performance directly affects the safety of series battery packs. Traditional resistive voltage sampling circuits exhibit channel leakage current, which affects cell-voltage consistency and sampling accuracy. In addition, the level-shifting circuit in the high-voltage domain contains high-voltage operational amplifiers, and the use of many high-voltage MOSFETs increases area overhead.  Methods  This study proposes a low-leakage-current battery voltage sampling circuit for 14-series lithium batteries. Based on the traditional resistive sampling structure, channel leakage current is reduced to the pA level by designing an operational-amplifier-isolated active-drive technique. Voltage conversion methods are selected according to the voltage domain of each cell group. The first section of the battery uses a unity-gain buffer for isolation and then performs voltage conversion through resistive division. Sections 2 to 13 use operational-amplifier-isolated active driving to follow each cell voltage synchronously, after which the followed voltage is converted to a ground-referenced level through a level-shifting circuit. The voltage sampling process of the highest-section battery draws power from the entire battery stack and does not affect pack consistency; therefore, this section directly adopts the level-shifting circuit for voltage conversion.  Results and Discussions  The circuit was designed and verified using a 0.35 μm high-voltage BCD process. The overall layout area of the proposed sampling circuit is 3 105 μm × 638 μm (Fig. 10). Verification results show that, across different process corners and temperatures, the maximum channel leakage current after applying the isolated active-drive technique is only 48.9 pA. 
In contrast, the minimum leakage current of the traditional sampling circuit is 1.169 × 10^6 pA (Fig. 12, Fig. 13). The effect of the sampling process on cell-voltage inconsistency is reduced from 18.56% to 2.122 ppm (Fig. 14). Under full PVT verification, the maximum measurement error of the proposed sampling circuit is 0.9 mV (Fig. 15, Fig. 16, Fig. 17).  Conclusions  This study proposes an operational-amplifier-isolated active-drive technique to address the channel leakage issue in traditional resistive voltage sampling circuits, which affects cell-voltage consistency and measurement accuracy. Using the proposed circuit, the maximum channel leakage current is 48.9 pA, the cell-voltage inconsistency is 2.122 ppm, and the maximum measurement error is 1.25 mV. The circuit achieves very low leakage current while maintaining sampling accuracy. The proposed low-leakage-current sampling circuit is suitable for 14-series lithium battery management chips.
Hybrid PUF Tag Generation Technology for Battery Anti-counterfeiting
HE Zhangqing, LUO Siyu, ZHANG Junming, ZHANG Yin, WAN Meilin
2026, 48(3): 1262-1270. doi: 10.11999/JEIT250967
Abstract:
  Objective  A global shift toward a low-carbon economy has increased the importance of power batteries as energy storage devices. The traceability and security of their life cycle are central to industrial governance. In 2023, the Global Battery Alliance (GBA) proposed the Battery Passport, which requires each battery to carry a unique, tamper-resistant, and verifiable digital identity. Conventional digital tags, such as QR codes and RFID, rely on static pre-written storage and remain vulnerable to physical cloning, data extraction, and environmental degradation. This study proposes a battery anti-counterfeiting tag generation technology based on a hybrid Physical Unclonable Function (PUF). The method applies physical coupling among the battery, PCB, and IC to generate a unique battery ID, and ensures strong physical binding and system-level anti-counterfeiting performance.  Methods  The tag includes four modules: an off-chip RC battery fingerprint extraction circuit, an on-chip arbiter PUF module, an on-chip delay compensation module, and a reliability enhancement module. The off-chip RC circuit uses the physical coupling between the battery negative tab and the PCB copper-clad area to form a capacitor structure that introduces manufacturing variation as an entropy source. The arbiter PUF converts these deviations into a unique digital signature. To reduce bias caused by asymmetric routing and off-circuit delay, a programmable delay compensation module with coarse and fine-tuning stages is used. The reliability enhancement module filters unstable response bits by tracking delay deviation, and improves response reliability without complex error-correcting codes.  Results and Discussions  The structure was implemented and tested using an FPGA Spartan-6 chip, a custom PCB, and 100 Ah blade batteries. The randomness reached 48.85%, and uniqueness averaged 49.15% under normal conditions (Fig. 11). 
Stability (RA) reached 99.98% at room temperature and nominal voltage, and remained above 98% at 100 ℃ and 1.05 V (Fig. 12). To evaluate anti-desoldering performance, three tampering scenarios were tested: battery replacement, PCB replacement, and IC replacement. The average response change rates were 14.86%, 24.58%, and 41.66%, respectively (Fig. 13). These results show strong physical binding among the battery, PCB, and chip, and confirm that the triple physical coupling mechanism resists counterfeiting and tampering.  Conclusions  This study presents a battery anti-counterfeiting tag generation technology based on a triple physical coupling mechanism. By binding the battery tab, PCB, and chip into a unified physical structure and extracting fingerprints from manufacturing variation, the method provides high randomness, uniqueness, and stability. The tag is highly sensitive to physical tampering and supports reliable battery authentication across its life cycle. Future work will examine the structure using more advanced fabrication processes and different PCB manufacturers, and will further refine the design for broader application.
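The uniqueness figure reported above (ideally near 50%) is conventionally computed as the average pairwise Hamming distance between response bit-strings from different devices. The sketch below implements that generic PUF metric; it is not the paper's evaluation code, and the toy responses are invented for illustration.

```python
import numpy as np

def uniqueness(responses):
    """Average pairwise inter-device Hamming distance, as a percentage.

    responses: (n_devices, n_bits) array of 0/1 PUF response bits.
    A value near 50% indicates responses are maximally distinguishable.
    """
    n, bits = responses.shape
    dists = [np.count_nonzero(responses[i] != responses[j]) / bits
             for i in range(n) for j in range(i + 1, n)]
    return 100.0 * float(np.mean(dists))
```

Reliability (the stability metric) is measured analogously, but as the Hamming distance between repeated responses of the same device under varying temperature and voltage, subtracted from 100%.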
Research on Generation and Optimization of Dual-channel High-current Relativistic Electron Beams Based on a Single Magnet
AN Chenxiang, HUO Shaofei, SHI Yanchao, ZHAI Yonggui, XIAO Renzhen, CHEN Changhua, CHEN Kun, HUANG Huijie, SHEN Liuyang, LUO Kaiwen, WANG HongGuang, LI YuQing
2026, 48(3): 1271-1279. doi: 10.11999/JEIT250487
Abstract:
  Objective  High-Power Microwave (HPM) technology is a strategic frontier in defense, military, and civilian systems. The microwave output power of a single HPM source reaches a bottleneck because of physical limits, material constraints, and fabrication challenges. To address this issue, researchers have proposed HPM power synthesis, which increases peak power by integrating multiple HPM sources.  Methods  This study addresses the time synchronization problem in multipath HPM synthesis by designing a dual-channel high-current relativistic electron-beam generator. The device uses one pulse-power driver to drive two diodes simultaneously and applies one coil magnet to confine both electron beams. Three-dimensional particle-in-cell simulations revealed the angular nonuniformity of the beam current, and a cathode stalk modification is proposed to improve beam quality, whose effectiveness is subsequently validated by experiments.  Results and Discussions  Three-dimensional UNIPIC particle-in-cell simulations of the device’s physical processes revealed that, owing to side emission from the cathode stalk, the dual electron beams exhibit significant angular nonuniformity: the beam current density near the center of the magnetic field is relatively low, whereas it is higher in regions farther from the magnetic center. To address this issue, the structure of the cathode stalk was modified to suppress side emission. The angular current fluctuation of cathode emission in Tube 1 decreased dramatically from 35.61% to 2.93%, and that in Tube 2 decreased from 33.17% to 3.13%, improving beam quality. Simulations and experiments show that the device stably generates high-quality electron beams with a voltage of 800 kV and a current of 20 kA, reaching a total power of 16 GW. The current waveform remains stable within the 45 ns voltage half-width without impedance collapse.  
Conclusions  The study provides a reliable basis for generating multipath high-current relativistic electron beams and for synthesizing the power of multiple HPM sources, demonstrating strong application potential.
Design of a CNN Accelerator Based on Systolic Array Collaboration with Inter-Layer Fusion
LU Di, WANG Zhen Fa
2026, 48(3): 1280-1291. doi: 10.11999/JEIT250867
Abstract:
  Objective  With the rapid deployment of deep learning in edge computing, the demand for efficient Convolutional Neural Network (CNN) accelerators continues to increase. Although traditional CPUs and GPUs provide strong computational capability, they incur high power consumption, long latency, and limited scalability in real-time embedded scenarios. FPGA-based accelerators, due to their reconfigurability and parallelism, provide a viable alternative. However, current designs often show low resource utilization, memory access bottlenecks, and difficulty in balancing throughput and energy efficiency. To address these issues, a systolic array–based CNN accelerator with inter-layer fusion optimization is proposed. The design integrates an enhanced memory hierarchy and optimized computation scheduling. Hardware-oriented convolution mapping and lightweight quantization are adopted to improve computational efficiency and reduce resource consumption, while meeting real-time inference requirements for applications such as intelligent surveillance and autonomous driving.  Methods  This study addresses core challenges in FPGA-based CNN accelerators, including data transfer overhead, insufficient resource utilization, and low processing unit efficiency. A hybrid accelerator architecture based on systolic array–assisted inter-layer fusion is proposed. Computation-intensive adjacent layers are tightly coupled and executed sequentially within a single systolic array, which reduces frequent off-chip memory accesses for intermediate results. This reduces data transfer overhead and power consumption and improves computation speed and overall energy efficiency. A dynamically reconfigurable systolic array is further developed to support multi-dimensional matrix multiplications with varying scales. This design avoids resource waste caused by fixed-function hardware and reduces FPGA logic consumption, thereby improving hardware adaptability and flexibility. 
A streaming systolic array computation scheme is also introduced through coordinated computation flow and control logic. Processing elements maintain a high-efficiency operating state, and data flows continuously through the computation engine in a pipelined and parallel manner. This improves processing unit utilization, reduces idle cycles, and increases overall throughput.  Results and Discussions  To determine appropriate quantization precision, experiments are conducted on the MNIST dataset using VGG16 and ResNet50 under fixed-point quantization with 12-bit, 10-bit, 8-bit, and 6-bit precision. As shown in Table 1, inference accuracy decreases significantly when precision falls below 8 bits, indicating that excessively low precision weakens model representational capacity. On the proposed accelerator, VGG16, ResNet50, and YOLOv8n achieve peak computational performances of 390.25 GOPS, 360.27 GOPS, and 348.08 GOPS, respectively. Performance comparisons with FPGA accelerators reported in the literature are summarized in Table 4. Table 5 presents comparisons with CPU and GPU platforms in terms of throughput and energy efficiency. For VGG16, ResNet50, and YOLOv8n, the proposed accelerator delivers throughput that is 1.76×, 3.99×, and 2.61× higher than the corresponding CPU platforms. Energy efficiency improves by 3.1× (VGG16), 2.64× (ResNet50), and 2.96× (YOLOv8n) compared with GPU platforms, demonstrating superior energy utilization.  Conclusions  A systolic array–assisted inter-layer fusion CNN accelerator architecture is proposed. A theoretical analysis of computational density confirms the performance advantages of the design. To address variation in convolution window sizes in the second layer, a dynamically reconfigurable systolic array method is developed. A streaming systolic array scheme is also implemented to sustain pipelined and parallel data flow within the computation engine. This design reduces idle cycles and improves throughput. 
Experimental results show that the accelerator achieves high computational performance with minimal loss in inference accuracy. Peak performances of 390.25 GOPS, 360.27 GOPS, and 348.08 GOPS are achieved for VGG16, ResNet50, and YOLOv8n, respectively. Compared with CPU and GPU platforms, the proposed accelerator shows superior energy efficiency and is suitable for resource-constrained and energy-sensitive edge computing scenarios.
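The benefit of the inter-layer fusion described above comes from keeping intermediate feature maps on chip, so only the fused chain's input and output cross the off-chip memory boundary. The sketch below quantifies that saving with a simple traffic model; the byte counts are illustrative, not measurements from the paper.

```python
def offchip_traffic(layer_feature_bytes, fused):
    """Off-chip traffic (bytes) for a chain of layers.

    layer_feature_bytes: sizes of the chain's input, each intermediate
    feature map, and the final output, in order. Without fusion every
    intermediate is written off chip and read back (2x its size); with
    inter-layer fusion intermediates stay in on-chip buffers.
    """
    total = layer_feature_bytes[0] + layer_feature_bytes[-1]
    if not fused:
        for b in layer_feature_bytes[1:-1]:
            total += 2 * b            # write + read of each intermediate
    return total
```

For a two-layer fused pair with sizes [100, 50, 25], fusion cuts the modeled traffic from 225 to 125 bytes, which is the mechanism behind the reported power and latency savings.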
Tri-Frequency Wearable Antenna Loaded with Artificial Magnetic Conductors
JIN Bin, ZHANG Jialin, DU Chengzhu, CHU Jun
2026, 48(3): 1292-1300. doi: 10.11999/JEIT251050
Abstract:
  Objective   The rapid advancement of 5G communication technology has expanded the use of antennas in aviation, radar, medical, and other wireless systems. Wearable antennas have gained attention because of their conformability. Loading an Artificial Magnetic Conductor (AMC) is an effective way to enhance wearable antenna performance. This method increases gain, improves the Front-to-Back Ratio (FBR), and provides radiation isolation between the antenna and the human body. This study presents a tri-band wearable antenna loaded with an AMC for the ISM band and 5G frequency bands.  Methods   A trident-structured tri-band monopole antenna operating at 2.5 GHz, 3.5 GHz, and 5.8 GHz is designed together with a ring-shaped tri-band AMC tuned to the same frequency bands. Both structures use semi-flexible Rogers 4003 substrate. A 4×5 AMC array is placed on the back of the antenna to form a wearable integrated antenna. Simulation, physical measurement, and human safety assessment are performed.  Results and Discussions  The integrated antenna shows simulated operating bands of 2.40~2.50 GHz, 3.15~3.80 GHz, and 5.56~6.02 GHz, and measured bands of 2.38~2.52 GHz, 3.30~3.86 GHz, and 5.54~7.86 GHz (Fig. 4). These bands cover the ISM band (2.40~2.4835 GHz), the 5G-n78 band (3.3~3.8 GHz), and the 5G-WiFi 5.8 GHz band (5.725~5.875 GHz). Measured gains at 2.5 GHz, 3.5 GHz, and 5.8 GHz increase by 5.3 dB, 4.6 dB, and 2.2 dB compared with the unloaded state (Fig. 15). The FBR values reach 20.8 dB, 18.0 dB, and 18.8 dB, corresponding to improvements of 19.8 dB, 16.7 dB, and 12.4 dB relative to the unloaded state (Table 4). 
The AMC reflector reduces the Specific Absorption Rate (SAR), and all integrated antennas show SAR values below 0.025 W/kg (Table 6), well below the FCC and ETSI limits. Performance is also measured when the antenna is placed on the chest, back, and thigh (Fig. 16), confirming safe and flexible on-body use.  Conclusions   A tri-band wearable antenna incorporating an AMC array is developed using semi-flexible Rogers 4003 substrate. With a 4×5 AMC array integrated behind the antenna, the measured operating bands cover the ISM band (2.40~2.4835 GHz), the 5G-n78 band (3.3~3.8 GHz), and the 5G-WiFi 5.8 GHz band (5.725~5.875 GHz). The results confirm high gain, high FBR, and stable wearable performance suitable for human-worn devices.
Research on a Miniaturized Wide Stopband Folded Substrate Integrated Waveguide Filter
KE Rongjie, WANG Hongbin, CHENG Yujian
2026, 48(3): 1301-1310. doi: 10.11999/JEIT250869
Abstract:
To meet the requirements of 5G/6G communication systems for miniaturization, high integration, and a wide stopband, this paper proposes a fourth-order bandpass filter based on an eighth-mode Folded Substrate Integrated Waveguide (FSIW) using High-Temperature Co-Fired Ceramic (HTCC) technology. The design combines the miniaturization characteristics of FSIW with the three-dimensional integration capability of HTCC. Size reduction is achieved through an eighth-mode FSIW cavity structure with dimensions of 0.29λg × 0.29λg, where λg denotes the waveguide wavelength at the center operating frequency (f0). Metal vias suppress high-order mode coupling, a bent microstrip line introduces transmission zeros, and an L-shaped stub improves the high-frequency response. Three controllable transmission zeros are generated in the upper stopband, achieving 20 dB@3.73f0. Measurements show a center frequency of 6.4 GHz. Although slight frequency deviation and insertion loss are observed, the design provides clear advantages in miniaturization, stopband width, and the number of transmission zeros compared with reported work, indicating potential for high-density integrated communication systems.  Objective  The rapid development of 5G/6G communication systems increases the demand for Radio Frequency (RF) microwave devices that provide miniaturization, high integration, and wide stopband performance. As core components of RF transceiver front-ends, bandpass filters transmit useful signals and suppress interference. Conventional Substrate Integrated Waveguide (SIW) filters often show large size, limited stopband extension, and insufficient control of transmission zeros, which restrict their use in high-density integrated systems. To address these challenges, this paper presents a miniaturized wide stopband fourth-order bandpass filter based on an eighth-mode FSIW structure and HTCC technology to achieve compact size and broad stopband performance.  
Methods  The filter integrates the miniaturization capability of FSIW with the three-dimensional integration characteristics of HTCC. First, an eighth-mode FSIW cavity is developed by modifying a quarter-mode FSIW cavity. A square patch is replaced with a triangular patch (eighth-mode cavity I), followed by slot etching in the triangular patch (eighth-mode cavity II). Second, a fourth-order bandpass filter is constructed by symmetrically designing two triangular metal patches for each cavity type and stacking them vertically. A common metal layer (fifth layer) containing coupling windows enables coupling between the upper and lower cavities. Three techniques are used to optimize performance: metal vias to suppress high-order mode coupling, bent microstrip lines to generate transmission zeros, and an L-shaped stub to enhance high-frequency response. Parameter scanning of key dimensions (d2, s4, s6) verifies the controllability of transmission zeros. The filter is fabricated using HTCC on an Al2O3 substrate with relative permittivity 9.8 and loss tangent 0.000 2.  Results and Discussions  Measurements show a center frequency of 6.4 GHz. Although fabrication and assembly deviations cause slight frequency shift and additional insertion loss, the filter demonstrates strong performance compared with reported designs (Table 2). The size of 0.29λg × 0.29λg is smaller than that of most SIW filters. The upper stopband extends to 20 dB@3.73f0, outperforming filters of comparable size. Three controllable transmission zeros appear in the upper stopband, and parameter scanning confirms their tunability (Fig. 13).  Conclusions  A miniaturized wide stopband fourth-order bandpass filter based on an eighth-mode FSIW structure is presented. The eighth-mode cavity combined with HTCC technology achieves a compact footprint of 0.29λg × 0.29λg, meeting the integration requirements of 5G/6G systems. 
The use of metal vias, bent microstrip lines, and L-shaped stubs generates a wide stopband of 20 dB@3.73f0 and three tunable transmission zeros, strengthening interference suppression. Adjustable parameters enable flexible tuning of transmission zero frequencies without affecting the passband, improving the adaptability of the design to different interference conditions. These advances address key challenges in miniaturization, stopband extension, and design flexibility of SIW filters, offering a practical solution for RF front-ends in next-generation high-density integrated communication systems.
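The 0.29λg × 0.29λg footprint can be sanity-checked from the stated substrate parameters. A minimal sketch, assuming the simple dielectric-filled approximation λg ≈ λ0/√εr at f0 (this ignores SIW dispersion and effective-permittivity corrections, so it is only a rough estimate, not the paper's design equations):

```python
import math

def guided_wavelength_mm(f0_hz: float, eps_r: float) -> float:
    """Approximate guided wavelength (mm) in a dielectric-filled guide.
    Ignores SIW dispersion; adequate only for a footprint sanity check."""
    c = 299_792_458.0                  # speed of light, m/s
    lam0 = c / f0_hz                   # free-space wavelength, m
    return lam0 / math.sqrt(eps_r) * 1e3

# f0 = 6.4 GHz, Al2O3 substrate with eps_r = 9.8 (values from the abstract)
lam_g = guided_wavelength_mm(6.4e9, 9.8)
side = 0.29 * lam_g                    # cavity side length, mm
print(f"lambda_g ~= {lam_g:.1f} mm, cavity side ~= {side:.1f} mm")
```

Under this approximation the cavity side comes out on the order of a few millimeters, consistent with the claimed miniaturization.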
Cryptography and Network Information Security
Improved Related-tweak Attack on Full-round HALFLOOP-48
SUN Xiaomeng, ZHANG Wenying, YUAN Zhaozhong
2026, 48(3): 1311-1321. doi: 10.11999/JEIT251014
Abstract:
  Objective  HALFLOOP is a family of tweakable AES-like lightweight block ciphers used to encrypt automatic link establishment messages in fourth-generation high-frequency radio systems. Because the RotateRows and MixColumns operations diffuse differences rapidly, long differentials with high probability are difficult to construct, which limits attacks on the full cipher. This study examines full HALFLOOP-48 and evaluates its resistance to sandwich attacks in the related-tweak setting, a critical method in lightweight-cipher cryptanalysis.  Methods  A new truncated sandwich distinguisher framework is proposed to attack full HALFLOOP-48. The cipher is decomposed into three sub-ciphers, \begin{document}$ {{E}}_{0} $\end{document}, \begin{document}$ {E}_{\rm{m}} $\end{document}, and \begin{document}$ {{E}}_{1} $\end{document}. A model is built by applying an automatic search method based on the Boolean SATisfiability problem (SAT) to each part: byte-wise models for \begin{document}$ {{E}}_{0} $\end{document} and \begin{document}$ {{E}}_{1} $\end{document}, and a bit-wise model for \begin{document}$ {E}_{\rm{m}} $\end{document}. For \begin{document}$ {E}_{\rm{m}} $\end{document}, a method is proposed to model large S-boxes using SAT, the Affine subspace Dimensional Reduction method (ADR). ADR converts the modeling of a high-dimensional set into two sub-problems for a low-dimensional set. ADR ensures that the SAT-searched differentials exist and that their probabilities are accurate, while reducing the size of Conjunctive Normal Form (CNF) clauses. It also enables the SAT method to search longer differentials efficiently when large S-boxes appear. To improve probability accuracy in \begin{document}$ {E}_{\rm{m}} $\end{document}, dependencies between \begin{document}$ {{E}}_{0} $\end{document} and \begin{document}$ {{E}}_{1} $\end{document} are evaluated across three layers, and their probabilities are multiplied. 
Two key-recovery attacks, a sandwich attack and a rectangle-like sandwich attack, are mounted on the distinguisher in the related-tweak scenario.  Results and Discussions  The SAT-based model reveals a critical weakness in HALFLOOP-48. A practical sandwich distinguisher for the first 8 rounds with probability \begin{document}$ {2}^{-43.415} $\end{document} is identified. An optimal truncated sandwich distinguisher for 8-round HALFLOOP-48 with probability \begin{document}$ {2}^{-43.2} $\end{document} is then established by exploiting the clustering effect of the identified differentials. Compared with earlier results, this distinguisher is practical and extends the reach by two rounds. Using the 8-round distinguisher, both a sandwich attack and a rectangle-like sandwich attack are mounted on full-round HALFLOOP-48 under related tweaks. The sandwich attack requires data complexity of \begin{document}$ {2}^{32.8} $\end{document}, time complexity \begin{document}$ {2}^{96.2} $\end{document} and memory complexity \begin{document}$ {2}^{42.8} $\end{document}. For the rectangle-like sandwich attack, the data complexity is \begin{document}$ {2}^{16.2} $\end{document}, with time complexity \begin{document}$ {2}^{99.2} $\end{document} and memory complexity \begin{document}$ {2}^{26.2} $\end{document}. Compared with the previous results, these attacks reduce time complexity by a factor of \begin{document}$ {2}^{25.4} $\end{document} and memory complexity by a factor of \begin{document}$ {2}^{10} $\end{document}.  Conclusions  To handle the rapid diffusion of differences in HALFLOOP, a new perspective on sandwich attacks based on truncated differentials is developed by combining byte-wise and bit-wise models. The models for \begin{document}$ {{E}}_{0} $\end{document} and \begin{document}$ {{E}}_{1} $\end{document} are byte-wise and extend these two parts forward and backward into \begin{document}$ {E}_{\rm{m}} $\end{document}, which is modeled bit-wise. 
To model the 8-bit S-box in the bit-wise layer \begin{document}$ {E}_{\rm{m}} $\end{document} efficiently, an affine subspace dimensional reduction approach is proposed. This model ensures compatibility between the two truncated differential trails and covers as many rounds as possible with high probability. It supports a new 8-round truncated boomerang distinguisher that outperforms previous distinguishers for HALFLOOP-48. Based on this distinguisher, a key-recovery attack is achieved with success probability 63%. The results show that (1) the ADR method offers an efficient way to model large S-boxes in lightweight ciphers, (2) the truncated boomerang distinguisher construction can be applied to other AES-like lightweight block ciphers, and (3) HALFLOOP-48 does not provide an adequate security margin for use in the U.S. military standard.
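The differential probabilities that the SAT models search over ultimately come from an S-box's Difference Distribution Table (DDT). As a toy illustration only (using PRESENT's 4-bit S-box rather than the 8-bit AES S-box that HALFLOOP actually uses), the DDT and the best single-transition probability can be computed directly:

```python
# Toy 4-bit S-box (PRESENT's S-box), used purely to illustrate how
# differential probabilities are read off a DDT; HALFLOOP itself
# uses the 8-bit AES S-box, whose DDT is built the same way.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def ddt(sbox):
    """DDT: table[dx][dy] counts inputs x with sbox[x] ^ sbox[x ^ dx] == dy."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for dx in range(n):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table

T = ddt(SBOX)
# Best nontrivial transition: maximum count over nonzero input differences
best = max(T[dx][dy] for dx in range(1, 16) for dy in range(16))
print(f"best nontrivial transition: {best}/16 inputs")  # 4/16 = 2^-2 per S-box
```

Multiplying such per-S-box probabilities along a trail (and over the three layers, as the abstract describes) yields distinguisher probabilities like the 2^-43.2 reported above.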
A Class of Double-twisted Generalized Reed-Solomon Codes and Their Extended Codes
CHENG Hongli, ZHU Shixin
2026, 48(3): 1322-1332. doi: 10.11999/JEIT251045
Abstract:
  Objective  Twisted Generalized Reed-Solomon (TGRS) codes have attracted considerable attention in coding theory due to their flexible structural properties. However, studies on their extended codes remain limited. Existing results indicate that only a small number of works examine extended TGRS codes, leaving gaps in the understanding of their error-correcting capability, duality properties, and applications. In addition, previously proposed parity-check matrix forms for TGRS codes lack clarity and do not cover all parameter ranges. In particular, the case h = 0 is not addressed, which limits applicability in scenarios requiring diverse parameter settings. Constructing non-Generalized Reed-Solomon (non-GRS) codes is of interest because such codes resist Sidelnikov-Shestakov and Wieschebrink attacks, whereas GRS codes are vulnerable. Maximum Distance Separable (MDS) codes, self-orthogonal codes, and almost self-dual codes are valued for their error-correcting efficiency and structural properties. MDS codes achieve the Singleton bound and are essential for distributed storage systems that require data reliability under node failures. Self-orthogonal and almost self-dual codes, due to their duality structures, are applied in quantum coding, secret sharing schemes, and secure multi-party computation. 
Accordingly, this paper aims to: (1) characterize the MDS and Almost MDS (AMDS) properties of double-twisted GRS codes \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and their extended codes \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document}; (2) derive explicit and unified parity-check matrices for all valid parameter ranges, including h = 0; (3) establish non-GRS properties under specific parameter conditions; (4) provide necessary and sufficient conditions for self-orthogonality of the extended codes and almost self-duality of the original codes; and (5) construct a class of almost self-dual double-twisted GRS codes with flexible parameters for secure and reliable communication systems.  Methods   The study is based on algebraic coding theory and finite field methods. Explicit parity-check matrices are derived using properties of polynomial rings over \begin{document}$ {F}_{q} $\end{document}, Vandermonde matrix structures, and polynomial interpolation. The Schur product method is applied to determine non-GRS properties by comparing the dimensions of the Schur squares of the codes and their duals with those of GRS codes. Linear algebra and combinatorial techniques are used to characterize MDS and AMDS properties. Conditions are obtained by analyzing the nonsingularity of generator-matrix submatrices and solving systems involving symmetric sums of finite field elements. These conditions are expressed using the sets \begin{document}$ {S}_{k}(\boldsymbol{\alpha },{{\boldsymbol{\eta}} }) $\end{document},\begin{document}$ {L}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}, and \begin{document}$ {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. Duality theory is used to study orthogonality. 
A code C is self-orthogonal if \begin{document}$ C\subseteq {C}^{\bot } $\end{document} and its generator matrix satisfies \begin{document}$ {\boldsymbol{G}}{{\boldsymbol{G}}}^{\rm T}=\boldsymbol{O} $\end{document}. For almost self-dual codes with odd length and dimension (n-1)/2, this condition is combined with the structure of the dual code and symmetric sum relations of αi to obtain necessary and sufficient conditions.  Results and Discussions   For MDS and AMDS properties, the following results are obtained. The extended double-twisted GRS code \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} is MDS if and only if \begin{document}$ 1\notin {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document} and \begin{document}$ 1\notin {L}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. The double-twisted GRS code \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} is AMDS if and only if \begin{document}$ 1\in {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document} and \begin{document}$ (0,1)\notin {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. Consequently, the code \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} is not AMDS when \begin{document}$ (0,1)\in {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. Unified parity-check matrices of \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} are derived for all \begin{document}$ 0\leq h\leq k-1 $\end{document}, removing previous restrictions that exclude h = 0. 
For non-GRS properties, when \begin{document}$ k\geq 4 $\end{document} and \begin{document}$ n-k\geq 4 $\end{document}, both \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and its extended code \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} are non-GRS in both cases \begin{document}$ 2k\geq n $\end{document} and \begin{document}$ 2k \lt n $\end{document}. This conclusion follows from the fact that the dimensions of their Schur squares exceed those of the corresponding GRS codes, which ensures resistance to Sidelnikov-Shestakov and Wieschebrink attacks. Regarding orthogonality, the extended code \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} with \begin{document}$ h=k-1 $\end{document} is self-orthogonal under specific algebraic conditions. The code \begin{document}$ {C}_{k,\boldsymbol{h},{\boldsymbol{\eta }}}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} with \begin{document}$ h=k-1 $\end{document} and \begin{document}$ n=2k+1 $\end{document} is almost self-dual if and only if there exists \begin{document}$ \lambda \in F_{q}^{*} $\end{document} such that \begin{document}$ \lambda {u}_{j}=v_{j}^{2} (j=1,2,\cdots ,2k+1) $\end{document} together with a symmetric sum condition on \begin{document}$ {\alpha }_{i} $\end{document} involving \begin{document}$ {\eta }_{1} $\end{document} and \begin{document}$ {\eta }_{2} $\end{document}. For odd prime power \begin{document}$ q $\end{document}, an almost self-dual code with parameters \begin{document}$[q-t-1,(q-t-2)/2 $\end{document}, \begin{document}$ \geq (q-t-2)/2] $\end{document} is constructed using the roots of \begin{document}$ m(x)=({x}^{q}-x)/f(x) $\end{document} where \begin{document}$ f(x)={x}^{t+1}-x $\end{document}. 
An example over \begin{document}$ {F}_{11} $\end{document} yields a \begin{document}$ [5,2,\geq 2] $\end{document} code.  Conclusions   The study advances the theory of double-twisted GRS codes and their extensions through five contributions: (1) complete characterization of MDS and AMDS properties using the sets \begin{document}$ {S}_{k} $\end{document}, \begin{document}$ {L}_{k} $\end{document}, \begin{document}$ {D}_{k} $\end{document}; (2) unified parity-check matrices for all \begin{document}$ 0\leq h\leq k-1 $\end{document}; (3) non-GRS properties established for \begin{document}$ k\geq 4 $\end{document} and \begin{document}$ n-k\geq 4 $\end{document}, ensuring resistance to known structural attacks; (4) necessary and sufficient conditions for self-orthogonal extended codes and almost self-dual original codes; (5) a flexible construction of almost self-dual double-twisted GRS codes. These results extend the theoretical understanding of TGRS-type codes and support the design of secure and reliable coding systems.
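The Schur-square distinguisher underlying the non-GRS results can be reproduced on a toy scale. A minimal sketch over GF(11), using a plain [7,3] RS code rather than the paper's double-twisted construction: the Schur square of an [n,k] GRS code has dimension min(n, 2k-1), whereas a generic code of the same parameters typically reaches min(n, k(k+1)/2), and this dimension gap is what the Schur product method detects:

```python
import itertools

P = 11  # work over GF(11)

def rank_mod_p(rows, p=P):
    """Rank of an integer matrix over GF(p) via Gaussian elimination."""
    m = [r[:] for r in rows]
    rank, col, ncols = 0, 0, len(m[0])
    while rank < len(m) and col < ncols:
        piv = next((i for i in range(rank, len(m)) if m[i][col] % p), None)
        if piv is None:
            col += 1
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], p - 2, p)          # Fermat inverse mod p
        m[rank] = [(v * inv) % p for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
        col += 1
    return rank

# [n=7, k=3] RS code over GF(11): generator rows evaluate 1, x, x^2
alphas = [1, 2, 3, 4, 5, 6, 7]
G = [[pow(a, e, P) for a in alphas] for e in range(3)]

# Schur square: span of all coordinatewise products of pairs of rows
schur = [[(u * v) % P for u, v in zip(r1, r2)]
         for r1, r2 in itertools.combinations_with_replacement(G, 2)]
print(rank_mod_p(schur))  # min(n, 2k-1) = 5 for a GRS code
```

A non-GRS code of the same length and dimension would show a strictly larger Schur-square rank, which is the structural fingerprint exploited in the paper's argument.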
Total Coloring on Planar Graphs of Nested n-Pointed Stars
SU Rongjin, FANG Gang, ZHU Enqiang, XU Jin
2026, 48(3): 1333-1342. doi: 10.11999/JEIT250861
Abstract:
  Objective  Many combinatorial optimization problems can be regarded as graph coloring problems. A classic topic in this field is total coloring, which combines vertex coloring and edge coloring. Previous studies and current research focus on the Total Coloring Conjecture (TCC), proposed in the 1960s. For graphs, including planar graphs, with maximum degree less than six, the correctness of the TCC has been verified through case enumeration. For planar graphs with maximum degree greater than six, the discharging technique has been used to confirm the conjecture by identifying reducible configurations and establishing detailed discharging rules. This method becomes limited when applied to planar graphs with maximum degree exactly six. Only certain restricted classes of graphs have been shown to satisfy the TCC, such as graphs without 4-cycles and graphs without adjacent triangles. More recent work demonstrates that the TCC holds for planar graphs without 4-fan subgraphs and for planar graphs with maximum average degree less than twenty-three fifths. Thus, it remains unclear whether planar graphs with maximum degree six that contain a 4-fan subgraph or have maximum average degree at least twenty-three fifths satisfy the conjecture. To address this question, this paper studies total coloring of a class of planar graphs known as nested n-pointed stars and aims to show that the TCC holds for these graphs.  Methods  The study relies on theoretical methods, including mathematical induction, constructive techniques, and case enumeration. An n-pointed star is obtained by connecting each edge of an n-polygon (n ≥ 3) to a triangle and then joining the triangle vertices not on the polygon to form a new n-polygon. Repeating this operation produces a nested n-pointed star with l layers, denoted by \begin{document}$ G_{n}^{l} $\end{document}. These graphs have maximum degree exactly six. 
Their structural properties, including the presence of 4-fan subgraphs and maximum average degree greater than twenty-three fifths, are established. Induction on the number of layers is then used to show that \begin{document}$ G_{n}^{l} $\end{document} has a total 8-coloring: (1) show that \begin{document}$ G_{n}^{1} $\end{document} has a total 8-coloring; (2) suppose that \begin{document}$ G_{n}^{l-1} $\end{document} has a total 8-coloring; (3) prove that \begin{document}$ G_{n}^{l} $\end{document} has a total 8-coloring. A graph \begin{document}$ G_{n}^{l} $\end{document} is defined as a type I graph if it has a total 7-coloring. When \begin{document}$ n=3k $\end{document}, constructive arguments show that \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph. The value of \begin{document}$ k $\end{document} is considered in two cases, \begin{document}$ (k=2m-1) $\end{document} and \begin{document}$ (k=2m) $\end{document}. In both cases, a total 7-coloring of \begin{document}$ G_{3k}^{l} $\end{document} is obtained by directly assigning colors to all vertices and edges.  Results and Discussions  Induction on the number of layers of \begin{document}$ G_{n}^{l} $\end{document} shows that nested n-pointed stars satisfy the Total Coloring Conjecture (Fig. 5). Five colors are assigned to the vertices and edges of \begin{document}$ G_{3k}^{1} $\end{document} to obtain a total 5-coloring (Fig. 6(a) and Fig. 8(a)). Two additional colors are then applied alternately to the edges connecting the polygons in layers 1 and 2. This produces a total 7-coloring of \begin{document}$ G_{3k}^{2} $\end{document} (Fig. 7(a) and Fig. 9(a)). After a permutation of the colors, another total 7-coloring of \begin{document}$ G_{3k}^{3} $\end{document} is obtained (Fig. 7(b) and Fig. 9(b)). 
The coloring pattern on the outermost layer is identical to that of \begin{document}$ G_{3k}^{1} $\end{document}, which allows the same extension to construct total 7-colorings for \begin{document}$ G_{3k}^{4},G_{3k}^{5},\cdots ,G_{3k}^{l} $\end{document}. Therefore, \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph.  Conclusions  This study verifies that the Total Coloring Conjecture holds for nested n-pointed stars, which have maximum degree six and contain 4-fan subgraphs. It shows that \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph. A further question arises regarding whether \begin{document}$ G_{n}^{l} $\end{document} is a type I graph when \begin{document}$ n\neq 3k $\end{document}. A total 7-coloring can be constructed when \begin{document}$ n=4 $\end{document} or \begin{document}$ n=5 $\end{document}, and therefore both \begin{document}$ G_{4}^{l} $\end{document} and \begin{document}$ G_{5}^{l} $\end{document} are type I graphs. For other values of \begin{document}$ n\neq 3k $\end{document}, whether \begin{document}$ G_{n}^{l} $\end{document} is a type I graph remains open.
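A total coloring can be checked mechanically against its three constraints: adjacent vertices, incident edges, and each vertex versus its incident edges must all receive distinct colors. A minimal sketch verifying a hand-made total 3-coloring of the triangle K3 (a small stand-in, not one of the paper's colorings of \begin{document}$ G_{3k}^{l} $\end{document}):

```python
def is_total_coloring(edges, vcol, ecol):
    """Check the three total-coloring constraints on an undirected graph:
    adjacent vertices differ, an edge differs from both endpoints,
    and edges sharing a vertex differ."""
    for (u, v) in edges:
        if vcol[u] == vcol[v]:                  # adjacent vertices clash
            return False
        if ecol[(u, v)] in (vcol[u], vcol[v]):  # edge vs. its endpoints
            return False
    for e1 in edges:
        for e2 in edges:
            if e1 < e2 and set(e1) & set(e2):   # edges sharing a vertex
                if ecol[e1] == ecol[e2]:
                    return False
    return True

# Triangle K3 (the cycle C3, whose length is divisible by 3) with 3 colors
edges = [(0, 1), (1, 2), (0, 2)]
vcol = {0: 1, 1: 2, 2: 3}
ecol = {(0, 1): 3, (1, 2): 1, (0, 2): 2}
print(is_total_coloring(edges, vcol, ecol))  # True
```

The paper's constructions for \begin{document}$ G_{3k}^{l} $\end{document} amount to exhibiting explicit assignments that pass exactly this kind of check with 7 colors.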
Wireless Communication and Internet of Things
Neighboring Mutual-Coupling Channel Model and Tunable-Impedance Optimization Method for Reconfigurable-Intelligent-Surface Aided Communications
WU Wei, WANG Wennai
2026, 48(3): 1343-1353. doi: 10.11999/JEIT251109
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RIS) attract increasing attention due to their ability to controllably manipulate electromagnetic wave propagation. A typical RIS consists of a dense array of Reflecting Elements (REs) with inter-element spacing no greater than half a wavelength, a spacing at which electromagnetic mutual coupling inevitably occurs between adjacent REs. This effect becomes more pronounced when the element spacing is smaller than half a wavelength and can significantly affect the performance and efficiency of RIS-assisted systems. Accurate modeling of mutual coupling is therefore essential for RIS optimization. However, existing mutual-coupling-aware channel models usually suffer from high computational complexity because of the large dimensionality of the mutual-impedance matrix, which restricts their practical use. To address this limitation, a simplified mutual-coupling-aware channel model based on a sparse neighboring mutual-coupling matrix is proposed, together with an efficient optimization method for configuring RIS tunable impedances.  Methods  First, a simplified mutual-coupling-aware channel model is established through two main steps. (1) A neighboring mutual-coupling matrix is constructed by exploiting the exponential decay of mutual impedance with inter-element distance. (2) A closed-form approximation of the mutual impedance between the transmitter or receiver and the REs is derived under far-field conditions. By taking advantage of the rapid attenuation of mutual impedance as spacing increases, only eight or three mutual-coupling parameters, together with one self-impedance parameter, are retained. These parameters are arranged into a neighboring mutual-coupling matrix using predefined support matrices. 
To further reduce computational burden, the distance term in the mutual-impedance expression is approximated by a central value under far-field assumptions, which allows the original integral formulation to be simplified into a compact analytical expression. Based on the resulting channel model, an efficient optimization method for RIS tunable impedances is developed. Through impedance decomposition, a closed-form expression for the optimal tunable-impedance matrix is derived, enabling low-complexity RIS configuration with computational cost independent of the number of REs.  Results and Discussions  The accuracy and computational efficiency of the proposed simplified models, as well as the effectiveness of the proposed impedance optimization method, are validated through numerical simulations. First, the two simplified models are evaluated against a reference model. The first simplified model accounts for mutual coupling among elements separated by at most one intermediate unit, whereas the second model considers only immediately adjacent elements. Results indicate that channel gain increases as element spacing decreases, with faster growth observed at smaller spacings (Fig. 4). The modeling error between the simplified models and the reference model remains below 0.1 when the spacing does not exceed λ/4, but increases noticeably at larger spacings. Error curves further show that the modeling errors of both simplified models become negligible when the spacing is below λ/4, indicating that the second model can be adopted to further reduce complexity (Fig. 6). Second, the computational complexity of the proposed models is compared with that of the reference model. When the number of REs exceeds four, the complexity of computing the mutual-coupling matrix in the reference model exceeds that of the proposed neighboring mutual-coupling model. 
As the number of REs increases, the complexity of the reference model grows rapidly, whereas that of the proposed model remains constant (Fig. 5). Finally, the proposed impedance optimization method is compared with a benchmark method (Fig. 7, Fig. 8). When the element spacing is no greater than λ/4, the channel gain achieved by the proposed method approaches that of the benchmark method. As the spacing increases beyond this range, a clear performance gap emerges. In all cases, the proposed method yields higher channel gain than the coherent phase-shift optimization method.  Conclusions  The integration of a large number of densely arranged REs in an RIS introduces notable mutual coupling effects, which can substantially influence system performance and therefore must be considered in channel modeling and impedance optimization. A simplified mutual-coupling-aware channel model based on a neighboring mutual-coupling matrix has been proposed, together with an efficient tunable-impedance optimization method. By combining the neighboring mutual-coupling matrix with a simplified mutual-impedance expression derived under far-field assumptions, a low-complexity channel model is obtained. Based on this model, a closed-form solution for the optimal RIS tunable impedances is derived using impedance decomposition. Simulation results confirm that the proposed channel model and optimization method maintain satisfactory accuracy and effectiveness when the element spacing does not exceed λ/4. The proposed framework provides practical theoretical support and useful design guidance for analyzing and optimizing RIS-assisted systems under mutual coupling effects.
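The sparsity that makes the neighboring mutual-coupling model cheap can be sketched for a square RIS grid: keep the self-impedance plus couplings to at most the 8 nearest neighbors, so only three distinct impedance parameters are stored regardless of array size. A minimal numerical sketch with illustrative impedance values (not the paper's closed-form expressions or support-matrix construction):

```python
import numpy as np

def neighbor_coupling_matrix(rows, cols, z_self, z_side, z_diag):
    """Sparse mutual-coupling matrix for a rows x cols RIS grid:
    only the self term and the <= 8 nearest neighbors are nonzero."""
    n = rows * cols
    Z = np.zeros((n, n), dtype=complex)
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            Z[i, i] = z_self
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        j = rr * cols + cc
                        # Diagonal neighbors couple more weakly than side ones
                        Z[i, j] = z_diag if dr and dc else z_side
    return Z

# Illustrative values only: 3 coupling parameters describe the whole matrix
Z = neighbor_coupling_matrix(4, 4, z_self=50 + 0j, z_side=5 - 2j, z_diag=1 - 0.5j)
print(np.count_nonzero(Z), "nonzeros out of", Z.size)
```

The nonzero count grows linearly with the number of elements (about 9 entries per element), whereas a full mutual-impedance matrix grows quadratically, which mirrors the complexity comparison in Fig. 5.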
High-Efficiency Side-Channel Analysis: From Collaborative Denoising to Adaptive B-Spline Dimension Reduction
LUO Yuling, XU Haiyang, OUYANG Xue, FU Qiang, QIN Sheng, LIU Junxiu
2026, 48(3): 1354-1365. doi: 10.11999/JEIT251047
Abstract:
  Objective  The performance of side-channel attacks is often constrained by the low signal-to-noise ratio of raw power traces, the masking of local leakage by redundant high-dimensional data, and the reliance on empirically chosen preprocessing parameters. Existing studies typically optimize individual stages, such as denoising or dimensionality reduction, in isolation, lack a unified framework, and fail to balance signal-to-noise ratio enhancement with the preservation of local leakage features. A unified analysis framework is therefore proposed to integrate denoising, adaptive parameter selection, and dimensionality reduction while preserving local leakage characteristics. Through coordinated optimization of these components, both the efficiency and robustness of side-channel attacks are improved.  Methods  Based on the similarity of power traces corresponding to identical plaintexts and the local approximation properties of B-splines, a side-channel analysis method combining collaborative denoising and Adaptive B-Spline Dimension Reduction (ABDR) is presented. First, a Collaborative Denoising Framework (CDF) is constructed, in which high-quality traces are selected using a plaintext-mean template, and targeted denoising is performed via singular value decomposition guided by a singular-value template. Second, a Neighbourhood Asymmetry Clustering (NAC) method is applied to adaptively determine key thresholds within the CDF. Finally, an ABDR algorithm is proposed, which allocates knots non-uniformly according to the variance distribution of power traces, thereby enabling efficient data compression while preserving critical local leakage features.  Results and Discussions  Experiments conducted on two datasets based on 8-bit AVR (OSR2560) and 32-bit ARM Cortex-M4 (OSR407) architectures demonstrate that the CDF significantly enhances the signal-to-noise ratio, with improvements of 60% on OSR2560 (Fig. 2) and 150% on OSR407 (Fig. 4). 
The number of power traces required for successful key recovery is reduced from 3 000/2 400 to 1 200/1 500 for the two datasets, respectively (Figs. 3 and 5). Through adaptive threshold selection in the CDF, NAC achieves faster and more stable guessing-entropy convergence than fixed-threshold and K-means-based strategies, which enhances overall robustness (Fig. 6). The ABDR algorithm places knots densely in high-variance leakage regions and sparsely in low-variance regions. While maintaining a high attack success rate, it reduces the data dimensionality from 5 000 and 5 500 to 1 000 and 500, respectively, corresponding to a compression rate of approximately 80%. At the optimal dimensionality (Fig. 7), the correlation coefficients of the correct key reach 0.186 0 on OSR2560 and 0.360 5 on OSR407, both exceeding those obtained using other dimensionality reduction methods. These results indicate superior local information retention and attack efficiency (Tables 3 and 4).  Conclusions  The results confirm that the proposed CDF substantially improves the signal-to-noise ratio of power traces, while NAC enables adaptive parameter selection and enhances robustness. Through accurate local modeling, ABDR effectively alleviates the trade-off between high-dimensional data reduction and the preservation of critical leakage information. Comprehensive experimental validation shows that the integrated framework addresses key challenges in side-channel analysis, including low signal-to-noise ratio, redundancy-induced information masking, and dependence on empirical parameters, and provides a practical and scalable solution for real-world attack scenarios.
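The SVD-guided denoising step can be illustrated on synthetic data: traces sharing a plaintext share a deterministic leakage component, so truncating small singular values suppresses the independent noise. A minimal sketch with synthetic traces (this is not the paper's plaintext-mean or singular-value templates, just the underlying truncated-SVD idea):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 synthetic power traces, 256 samples each:
# one shared deterministic leakage shape plus independent Gaussian noise
t = np.linspace(0, 1, 256)
leakage = np.sin(2 * np.pi * 5 * t)
traces = leakage + 0.8 * rng.standard_normal((100, 256))

def svd_denoise(x, keep=1):
    """Rank-`keep` truncated-SVD reconstruction of the trace matrix."""
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    s[keep:] = 0.0            # discard the small (noise) singular values
    return (U * s) @ Vt

def snr(x, signal):
    """Per-sample SNR of traces against the known leakage component."""
    return float(np.var(signal) / np.var(x - signal))

den = svd_denoise(traces, keep=1)
print(f"SNR raw: {snr(traces, leakage):.2f}, denoised: {snr(den, leakage):.2f}")
```

On this synthetic example the rank-1 reconstruction raises the SNR by well over an order of magnitude; the paper's CDF additionally selects which traces and which singular values to trust via its templates.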
Spatio-Temporal Constrained Refined Nearest Neighbor Fingerprinting Localization
WANG Yifan, SUN Shunyuan, QIN Ningning
2026, 48(3): 1366-1376. doi: 10.11999/JEIT250777
Abstract:
  Objective  Indoor fingerprint-based localization faces three key challenges. First, Dimensionality Reduction (DR), used to reduce storage and computational costs, often disrupts the geometric correlation between signal features and physical space, which reduces mapping accuracy. Second, signal features present temporal variability caused by human movement or environmental changes. During online mapping, this variability introduces bias and distorts similarity between target and reference points in the low-dimensional space. Third, pseudo-neighbor interference persists because environmental noise or imperfect similarity metrics lead to inaccurate neighbor selection and skew position estimates. To address these issues, this study proposes a Spatio-Temporal Constrained Refined Nearest Neighbor (STC-RNL) fingerprinting localization algorithm designed to provide robust, high-accuracy localization under complex interference conditions.  Methods  In the offline phase, a robust DR framework is constructed by integrating two constraints into a MultiDimensional Scaling (MDS) model. A spatial correlation constraint uses physical distances between reference points and assigns stronger associations to proximate locations to preserve alignment between low-dimensional features and the real layout. A temporal consistency constraint clusters multiple temporal signal samples from the same location into a compact region to suppress feature drift. These constraints, combined with the MDS structure-preserving loss, form the optimization objective, from which low-dimensional features and an explicit mapping matrix are obtained. In the online phase, a progressive refinement mechanism is applied. An initial candidate set is selected using a Euclidean distance threshold. 
A hybrid similarity metric is then constructed by enhancing shared-neighbor similarity with a Sigmoid-based strategy, which truncates low and smooths high similarities, and fusing it with Euclidean distance to improve discrimination of true neighbors. Subsequently, an iterative Z-score-based filtering procedure removes reference points that deviate from local group characteristics in feature and coordinate domains. The final position is estimated through a similarity-weighted average over the refined neighbor set, assigning higher weights to more reliable references.  Results and Discussions  The performance of STC-RNL is assessed on a private ITEC dataset and a public SYL dataset. The spatio-temporal constraints enhance the robustness of the mapping matrix under noisy conditions (Table 2). Compared with baseline DR methods, the proposed module reduces mean localization error by at least 6.30% in high-noise scenarios (Fig. 9). In the localization stage, the refined neighbor selection reduces pseudo-neighbor interference. On the ITEC dataset, STC-RNL achieves an average error of 0.959 m, improving performance by 9.61% to 33.68% compared with SSA-XGBoost and SPSO (Table 1). End-to-end comparisons show that STC-RNL reduces the average error by at least 12.42% on ITEC and by at least 7.08% on SYL (Table 2), and its CDF curves demonstrate faster convergence and higher precision, especially within the 1.2 m range (Fig. 10). These results indicate that the algorithm maintains high stability and accuracy with a lower maximum error across datasets.  Conclusions  The STC-RNL algorithm addresses structural distortion and mapping bias found in traditional DR-based localization. By jointly optimizing offline feature embedding with spatio-temporal constraints and online neighbor selection with progressive refinement, the coupling between signal features and physical coordinates is strengthened. 
The main innovation lies in a synergistic framework that ensures only high-confidence neighbors contribute to the final estimate, improving accuracy and robustness in dynamic environments. Experiments show that the model reduces average localization error by 12.42%~32.80% on ITEC and by 7.08%~13.67% on SYL relative to baseline algorithms, while achieving faster error convergence. Future research may incorporate nonlinear manifold modeling to further improve performance in heterogeneous access point environments.
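The online refinement stage described in the abstract above (Euclidean candidate selection, Sigmoid-shaped shared-neighbor similarity fused with distance, Z-score filtering, and a similarity-weighted average) can be sketched as follows. This is a minimal illustration under simplified assumptions, not the authors' implementation; the function name, constants, and the identity fingerprint map in the usage example are all hypothetical.

```python
import numpy as np

def stc_rnl_estimate(query, ref_feats, ref_coords, k=8, dist_thresh=2.0, z_max=2.0):
    """Toy sketch of progressive neighbor refinement (illustrative constants)."""
    d = np.linalg.norm(ref_feats - query, axis=1)
    cand = np.where(d <= dist_thresh)[0]          # initial Euclidean candidate set
    if cand.size == 0:
        cand = np.argsort(d)[:k]
    knn_q = set(np.argsort(d)[:k])                # query's k nearest references
    sims = []
    for i in cand:
        di = np.linalg.norm(ref_feats - ref_feats[i], axis=1)
        shared = len(knn_q & set(np.argsort(di)[:k])) / k
        # Sigmoid shaping: truncate weak shared-neighbor overlaps, saturate strong ones.
        s = 1.0 / (1.0 + np.exp(-10.0 * (shared - 0.5)))
        sims.append(s / (1.0 + d[i]))             # fuse with Euclidean distance
    sims = np.array(sims)
    # Z-score filtering in the coordinate domain removes pseudo-neighbors that
    # deviate from the local group.
    mu = ref_coords[cand].mean(axis=0)
    sigma = ref_coords[cand].std(axis=0) + 1e-9
    keep = np.abs((ref_coords[cand] - mu) / sigma).max(axis=1) <= z_max
    cand, sims = cand[keep], sims[keep]
    w = sims / sims.sum()
    return w @ ref_coords[cand]                   # similarity-weighted position estimate

# Usage on a toy 5 x 5 reference grid where features equal coordinates.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
ref_coords = np.column_stack([xs.ravel(), ys.ravel()])
ref_feats = ref_coords.copy()
est = stc_rnl_estimate(np.array([2.1, 1.9]), ref_feats, ref_coords)
```

In this idealized setting the weighted estimate lands near the true query position; in practice the offline spatio-temporally constrained mapping would supply the low-dimensional features.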
Radar, Sonar, Navigation and Array Signal Processing
Detection and Parameter Estimation of Quadratic Frequency Modulated Signal Based on Non-uniform Quadrilinear Autocorrelation Function
YANG Yuchao, FANG Gang
2026, 48(3): 1377-1389. doi: 10.11999/JEIT250723
Abstract:
  Objective  Polynomial Phase Signal (PPS) analysis has attracted broad attention because many radar, sonar, and seismic signals are modeled as PPS of different orders. A first-order PPS can be focused into a frequency bin through the Fourier transform to estimate the center frequency. For higher order PPS, such as a Quadratic Frequency Modulated (QFM) signal, non-coherent characteristics limit the effectiveness of the Fourier transform for energy integration. Existing time–frequency distribution methods, such as the short-time Fourier transform and the Wigner-Ville distribution, do not resolve the conflicts between auto-terms and cross-terms or between time- and frequency-domain resolution. In addition, current algorithms face difficulties in balancing computational complexity and detection performance, which results in reduced parameter estimation accuracy. This study proposes a QFM detection method based on a non-uniform quadrilinear autocorrelation function to provide balanced performance for QFM parameter estimation with controlled computational cost.  Methods  A time-frequency distribution method for QFM detection and parameter estimation is presented. The method applies non-uniform sampling and maps a one-dimensional signal into a two-dimensional time domain through a fourth-order autocorrelation function. A non-uniform fast Fourier transform is used to resolve the time variable and concentrate the energy into a vertical line in the two-dimensional plane. Then, FFT is performed along this line to focus the signal into a peak, from which the chirp rate and quadratic chirp rate are estimated. Finally, dechirp processing compensates the high-order phase terms of the original signal, and the center frequency estimate is obtained through an FFT.  Results and Discussions  Theoretical analysis and simulation results show that the method balances computational complexity and detection performance. 
Under low signal-to-noise ratio conditions, it distinguishes targets effectively and produces accurate parameter estimates (Fig. 1). For multicomponent signals with large amplitude differences, it enables stepwise detection and estimation (Fig. 2). Comparative experiments with state-of-the-art algorithms show that the method is quasi-optimal in estimation accuracy and integration gain (Fig. 3 to Fig. 6). Compared with the ML estimator, it offers markedly higher computational efficiency.  Conclusions  A QFM detection and parameter estimation method based on non-uniform quadrilinear autocorrelation functions is proposed. The method maps the QFM signal into a two-dimensional time domain through a new autocorrelation kernel and achieves coherent integration through scaling and FFT. Mathematical analysis and simulation results show that, relative to the ML method, it sacrifices part of the detection performance but substantially reduces computational complexity. When computational efficiency is similar, it outperforms other classical methods in detection and parameter estimation accuracy. The method provides a balanced solution for QFM signal detection and parameter estimation.
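The final dechirp step described above can be illustrated numerically: once the chirp rate and quadratic chirp rate are known (or estimated), multiplying by the conjugate higher-order phase reduces the QFM signal to a pure tone whose FFT peak gives the center frequency. All signal parameters below are illustrative, not taken from the paper.

```python
import numpy as np

# Toy QFM signal: s(t) = exp(j*2*pi*(f0*t + k1*t^2/2 + k2*t^3/6)) plus noise.
fs = 1000.0                       # sampling rate, Hz (illustrative)
t = np.arange(1024) / fs
f0, k1, k2 = 100.0, 50.0, 30.0    # center frequency, chirp rate, quadratic chirp rate
phase = 2 * np.pi * (f0 * t + 0.5 * k1 * t**2 + k2 * t**3 / 6)
s = np.exp(1j * phase) + 0.1 * np.random.default_rng(1).standard_normal(t.size)

# Dechirp: compensate the second- and third-order phase terms with the
# (assumed estimated) k1 and k2, leaving a tone at the center frequency.
dechirped = s * np.exp(-1j * 2 * np.pi * (0.5 * k1 * t**2 + k2 * t**3 / 6))
spec = np.abs(np.fft.fft(dechirped))
f_hat = np.fft.fftfreq(t.size, 1 / fs)[np.argmax(spec)]
```

The estimate `f_hat` recovers the center frequency to within one FFT bin (about fs/N ≈ 1 Hz here); estimation error in the chirp rates would broaden the dechirped peak instead.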
An Optimized Multi-Layer Equivalent Source Method for Spatial Continuation of Magnetic Anomalies in the Geomagnetic Background
GUAN Yu, ZHANG Huiqiang
2026, 48(3): 1390-1400. doi: 10.11999/JEIT250958
Abstract:
  Objective  Spatial continuation of magnetic anomalies is a key technique in potential field data processing and supports geological interpretation and geomagnetic navigation. Existing methods remain limited: frequency-domain approaches are severely ill-posed and amplify high-frequency noise during downward continuation, whereas traditional single-layer equivalent source methods often fail to fit multi-scale anomalies generated by sources at different depths. Although the Multilayer Equivalent Source (MES) model improves depth resolution, its performance is constrained by subjective parameter selection and instability in large-scale inversion, which can lead to the loss of high-frequency structural information. This study proposes an optimized MES method for high-precision continuation in complex geological environments. The method establishes an objective parameterization scheme by combining Radially Averaged Power Spectrum (RAPS) analysis with Variational Mode Decomposition (VMD) to separate sources. It also introduces a collaborative inversion scheme based on the Fungal Growth Optimizer (FGO) and the Preconditioned Conjugate Gradient (PCG) method to adaptively optimize regularization parameters, suppress ill-posedness, and improve reconstruction robustness under noise.  Methods  A four-step technical framework is developed. (1) Model construction: An MES model is formed using uniformly magnetized rectangular prisms to represent subsurface sources. (2) Parameter configuration: An objective scheme combining RAPS and VMD is applied. RAPS estimates average source-layer depths from slope variations in the logarithmic power spectrum. VMD then decomposes the magnetic signal into intrinsic mode functions representing different depths, enabling calculation of layer thickness using the ratio of the Mean Total Horizontal Derivative (MTHD). (3) Collaborative inversion: A robust inversion strategy incorporates FGO into the PCG algorithm. 
Tikhonov regularization forms the objective function to mitigate ill-posedness, and FGO adaptively searches for optimal hyperparameters, including the regularization parameter, step-size scaling factor, and preconditioner weights, improving solution stability and convergence efficiency. (4) Comprehensive validation: Three evaluations are conducted. A five-prism theoretical model is used to benchmark performance against single-layer, double-layer, and frequency-domain methods. The global EMAG2 magnetic anomaly model with 5% Gaussian noise is applied to assess robustness. Finally, real aeromagnetic data from the Australian magnetic anomaly grid are tested in two sub-regions—a complex tectonic zone (Area A) and a sedimentary basin (Area B)—for downward continuation from 2 000 m to 0 m, using RMSE and GOF as indicators.  Results and Discussions  The performance of the proposed method is validated in three stages. (1) Theoretical model verification: The radial average logarithmic power spectrum (Fig. 3) and VMD analysis (Fig. 4) identify three equivalent source layers, demonstrating the objectivity of the parameter configuration framework. The FGO-optimized inversion accelerates convergence by approximately 5~6 times and reduces the residual norm by 13% compared with the traditional Conjugate Gradient (CG) method (Fig. 7). In the 100 m upward continuation (Fig. 8, Table 4) and downward continuation (Fig. 9, Table 5) tests, the proposed method attains the lowest RMSE and highest GOF, addressing the ill-posedness of frequency-domain methods and the large fitting errors of single- and double-layer models. (2) Robustness analysis: Using the EMAG2 data (Fig. 10), the method demonstrates strong noise resistance. With 5% Gaussian noise added to the 1 000 m observation data, the downward continuation results remain stable and free of noticeable artifacts. 
Quantitative evaluation (Table 6) yields an RMSE of 7.36 nT and a GOF of 82.65%, confirming robustness in low signal-to-noise conditions. (3) Generalization verification: When applied to Australian magnetic anomaly grid data, two different geological regions are examined (Fig. 11, Fig. 12). In Area B (sedimentary basin), which has smooth gradients, the method achieves high-fidelity reconstruction with a GOF of 84.28% and an RMSE of 29.06 nT. In Area A (complex tectonic zone), despite the exponential decay of high-frequency signals, the method recovers key structural features (GOF = 76.14%), although localized residuals appear in high-gradient areas because of physical limits in field transformation. These findings support the method’s applicability across varied geological textures.  Conclusions  This study proposes a robust spatial continuation method for magnetic anomalies based on an optimized MES framework. By integrating RAPS analysis with VMD, the method establishes an objective parameterization scheme that reduces subjectivity in model construction. The incorporation of the FGO into the inversion algorithm improves convergence speed and stability, mitigating the ill-posedness inherent in downward continuation. Experimental results show that: (1) the method exhibits strong robustness, maintaining high signal fidelity under 5% Gaussian noise, as confirmed by the EMAG2 model tests; and (2) the method has broad geological applicability. In real Australian aeromagnetic grid data, it achieves high-precision reconstruction in deep sedimentary basins (Area B) and recovers major structural features in complex tectonic zones (Area A), outperforming traditional single-layer and frequency-domain methods. A remaining limitation is high memory demand due to storage of large dense kernel matrices. Future work will explore matrix compression or matrix-free inversion strategies to improve computational efficiency for large-scale geomagnetic data processing.
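The core of the collaborative inversion described above is a Tikhonov-regularized least-squares solve for the equivalent source magnetizations. The sketch below shows only that regularized solve with a plain conjugate gradient iteration on the normal equations; the FGO hyperparameter search, the preconditioner, and the prism kernel construction are omitted, and the random matrix stands in for the dense kernel.

```python
import numpy as np

def tikhonov_cg(A, d, lam, n_iter=200, tol=1e-12):
    """Solve (A^T A + lam*I) m = A^T d by conjugate gradients.
    A: forward kernel (observations x sources), d: observed field,
    lam: Tikhonov regularization parameter (fixed here; FGO would tune it)."""
    ATd = A.T @ d
    m = np.zeros(A.shape[1])
    r = ATd - (A.T @ (A @ m) + lam * m)   # initial residual of the normal equations
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Ap = A.T @ (A @ p) + lam * p
        alpha = rs / (p @ Ap)
        m += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol * (ATd @ ATd):    # relative stopping criterion
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return m

rng = np.random.default_rng(2)
A = rng.normal(size=(120, 80))            # stand-in for the prism kernel matrix
m_true = rng.normal(size=80)
d = A @ m_true + 0.01 * rng.normal(size=120)   # synthetic noisy observations
m_hat = tikhonov_cg(A, d, lam=0.1)
```

Once `m_hat` is recovered, continuation amounts to evaluating the forward kernel at the target height; the normal-equation formulation used here is also why the dense kernel dominates memory, motivating the matrix-compression directions mentioned in the conclusions.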