2024 Vol. 46, No. 2

Cover
Cover
2024, 46(2)
Abstract:
2024, 46(2): 1-4.
Abstract:
Special Topic on Frontiers and Applications of New Generation Artificial Intelligence
Some Applications and Progress of Set Pair Theory in Artificial Intelligence
ZHAO Keqin
2024, 46(2): 383-407. doi: 10.11999/JEIT230889
Abstract:
Set Pair Theory (SPT) regards the spacetime of things as a Deterministic-Uncertainty (D-U) spacetime that is both definite and uncertain, treats the certainty and uncertainty of things as a single system, and has developed continuously in application through the “objective recognition, systematic description, quantitative description, concrete analysis and practical test” of uncertainty. This paper first reviews the source and properties of the Set Pair (SP) and its Connection Number (CN), the pairwise principles and uncertainty principle of set pair theory, the uncertainty system theory, the theory of similarities and differences, and the basic algorithms. It then summarizes some applications of set pair theory in intelligent definition, rapid evaluation of space data and multi-radar signal sorting, intelligent prediction of complex systems, intelligent decision-making under uncertainty, connection digitalization of natural numbers, and intelligent calculation of groups. Some progress of set pair theory in intelligent algorithm innovation is also briefly introduced, including green intelligent computation involving the calculation of partial connection coefficients and the conservation of system energy of connection numbers. It is expected that green intelligent algorithms based on “set-to-theory non-set-to-theory” integration will be applied more widely in the new generation of artificial intelligence.
Theory of Cognitive Relativity — The Road to Strong Artificial Intelligence
LI Yujian
2024, 46(2): 408-427. doi: 10.11999/JEIT230749
Abstract:
Artificial Intelligence (AI) is developing in full swing with great potential to surpass humans, leading many people to believe that a singularity is imminent and that strong AI is about to be realized. This is a misconception of strong AI, because the core of strong AI is not whether it is powerful, but whether it has consciousness. In this article, the connotation of strong AI is first explained, and the related problem of consciousness is discussed. Then, the ideas of the Theory of Cognitive Relativity, which aims to reveal the secret of consciousness, are elucidated, including the Principle of World’s Relativity, the Principle of Symbol’s Relativity, and the relationships between world, language and mind. Subsequently, another new principle, the Principle of Consciousness’ Equivalence, is expounded in order to show the physical conditions required for consciousness to arise from matter, to solve the hard problem of subjective experience or phenomenal consciousness, to establish the fundamental theorem of cognition that conscious ability is limited by sensory ability, with sensory capacity as its upper bound, and to analyze where consciousness may reside and what the self is. Finally, under the framework of the theory of cognitive relativity, a new creed for solving the puzzle of consciousness and a new guide for implementing machine consciousness are presented, and the future of strong AI is envisioned.
Review of Deep Gradient Inversion Attacks and Defenses in Federated Learning
SUN Yu, YAN Yu, CUI Jian, XIONG Gaojian, LIU Jianhua
2024, 46(2): 428-442. doi: 10.11999/JEIT230541
Abstract:
As a distributed machine learning approach that preserves data ownership while releasing data usage rights, federated learning overcomes the challenge of data silos that hinder large-scale modeling with big data. However, the characteristic of only sharing gradients without training data during the federated training process does not guarantee the confidentiality of users’ training data. In recent years, novel deep gradient inversion attacks have demonstrated the ability of adversaries to reconstruct private training data from shared gradients, which poses a serious threat to the privacy of federated learning. With the evolution of gradient inversion techniques, adversaries are increasingly capable of reconstructing large volumes of data from deep neural networks, which challenges the Privacy-Preserving Federated Learning (PPFL) with encrypted gradients. Effective defenses mainly rely on perturbation transformations to obscure original gradients, inputs, or features to conceal sensitive information. Firstly, the gradient inversion vulnerability in PPFL is highlighted and the threat model in gradient inversion is presented. Then a detailed review of deep gradient inversion attacks is conducted from the perspectives of paradigms, capabilities, and targets. The perturbation-based defenses are divided into three categories according to the perturbed objects: gradient perturbation, input perturbation, and feature perturbation. The representative works in each category are analyzed in detail. Finally, an outlook on future research directions is provided.
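The gradient perturbation category mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common clip-then-add-Gaussian-noise recipe, which is one representative technique and not necessarily any specific defense surveyed in the review. The function name `perturb_gradient` and the parameters `clip_norm` and `noise_std` are hypothetical.

```python
import math
import random


def perturb_gradient(grad, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a flat gradient to `clip_norm` (L2), then add Gaussian noise.

    Assumption: this is a generic clip-and-noise perturbation, used here
    only to illustrate how a client could obscure a gradient before
    sharing it; it is not taken from the reviewed paper.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(g * g for g in grad))
    # Scale down only when the gradient is longer than the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, noise_std) for g in grad]


if __name__ == "__main__":
    shared = perturb_gradient([3.0, 4.0], clip_norm=1.0, noise_std=0.05,
                              rng=random.Random(0))
    print(shared)  # a noisy version of the clipped gradient [0.6, 0.8]
```

A larger `noise_std` hides more of the original gradient but also degrades the aggregated model, which is the privacy and utility trade-off such defenses have to balance.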
A Review of Research Progress on Brain-Computer Interface Systems for Rapid Serial Visual Presentation Based on ElectroEncephaloGram
WEI Wei, QIU Shuang, LI Xujin, MAO Jiayu, WANG Yanzi, HE Huiguang
2024, 46(2): 443-455. doi: 10.11999/JEIT230952
Abstract:
A Brain-Computer Interface (BCI) system establishes a direct communication pathway between the brain and external devices. Combined with the Rapid Serial Visual Presentation (RSVP) paradigm, it can achieve high-throughput target image retrieval by utilizing the human visual system. In recent years, the RSVP-BCI system has made significant progress in research on paradigms, ElectroEncephaloGram (EEG) decoding, and system applications. Research on paradigms reveals the impact of different paradigm parameters on system performance and promotes its improvement. Research on EEG decoding improves the classification performance of algorithms and promotes applications in scenarios such as few training samples, zero training samples, and multimodality. Research on RSVP-BCI system applications has driven the system towards practical use and expanded its application fields. However, the system also faces challenges such as limited practical applications, difficulties in cross-domain decoding of EEG, and the rapid progress of computer vision. This article reviews and summarizes the research progress of RSVP-BCI in recent years, and looks forward to future development directions.
Analysis on Current Development Situation of Unmanned Ground Vehicle Clusters Collaborative Pursuit
XU Youchun, GUO Hongda, LOU Jingtao, YE Peng, SU Zhiyuan
2024, 46(2): 456-471. doi: 10.11999/JEIT230122
Abstract:
In recent years, unmanned ground vehicle clusters have attracted growing interest as a research topic in the unmanned driving field because of their low cost, good security, and high autonomy. Various collaborative strategies have been proposed for unmanned vehicle clusters, with collaborative pursuit being a particularly important application direction that has garnered significant attention in various fields. A systematic analysis of the strategy mechanism for collaborative pursuit in unmanned vehicle clusters is provided, considering relevant applications and architectures. The collaborative pursuit strategy is divided into three sub-modes: search, tracking, and roundup. The key methods for unmanned vehicle cluster collaborative pursuit are compared from the perspectives of game theory, probabilistic analysis, and machine learning, and the advantages and disadvantages of these algorithms are highlighted. Finally, comments and suggestions are provided for future research, offering references and ideas for further improving the efficiency and performance of collaborative pursuit in unmanned vehicle clusters.
Pseudo Supervised Attention Short-term Memory and Multi-Scale Deartifacting Network Based on Image Block Compressed Sensing
LI Junhui, HOU Xingsong
2024, 46(2): 472-480. doi: 10.11999/JEIT231069
Abstract:
Deep unfolding network based Block Compressed Sensing (BCS) methods typically remove some signal and retain certain block artifacts simultaneously during iterative deartifacting, which is unfavorable for signal recovery. To enhance reconstruction performance, an image BCS method with Pseudo Supervised Attention Short-term Memory and Multi-scale Deartifacting (PSASM-MD), based on the Learned Denoising Iterative Thresholding (LDIT) algorithm, is proposed in this paper. First, in each iteration, each image block is denoised separately in parallel using residual networks before being concatenated. Subsequently, in conjunction with the Pseudo-Supervised Attention Module (PSAM), the Multi-Scale Deartifacting Network (MSD-Net) is used to perform feature extraction on the concatenated images, enabling more efficient removal of block artifacts and improving the reconstruction performance. In this process, PSAM is utilized to extract useful signal components from the residuals containing block artifacts and to transfer this short-term memory to the subsequent iteration, minimizing the removal of useful signals. Experimental results demonstrate that this approach outperforms existing state-of-the-art BCS methods in both subjective visual perception and objective evaluation metrics.
Semi-paired Multi-modal Query Hashing Method
YU Jun, MA Jiangtao, XIAN Yang, HOU Ruixia, SUN Wei
2024, 46(2): 481-491. doi: 10.11999/JEIT231072
Abstract:
Multimodal hashing can convert heterogeneous multimodal data into unified binary codes. Due to its advantages of low storage cost and fast Hamming distance sorting, it has attracted widespread attention in large-scale multimedia retrieval. Existing multimodal hashing methods assume that all query data possess complete multimodal information to generate their joint hash codes. However, in practical applications, it is difficult to obtain fully complete multimodal information. To address the problem of missing modal information in semi-paired query scenarios, a novel Semi-paired Query Hashing (SPQH) method is proposed to solve the joint encoding problem of semi-paired query samples. Firstly, the proposed method performs projection learning and cross-modal reconstruction learning to maintain semantic consistency among multimodal data. Then, the semantic similarity structure information of the label space and complementary information among multimodal data are effectively captured to learn a discriminative hash function. During the query encoding stage, the missing modal features of unpaired sample data are completed using the learned cross-modal reconstruction matrix, and then the hash features are generated using the learned joint hash function. Compared to state-of-the-art baseline methods, the average retrieval accuracy on the Pascal Sentence, NUS-WIDE, and IAPR TC-12 datasets has improved by 2.48%. Experimental results demonstrate that the algorithm can effectively encode semi-paired multimodal query data and achieve superior retrieval performance.
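The fast Hamming distance sorting that the abstract above cites as an advantage of binary hash codes can be sketched as follows. This is a generic illustration that stores each code as a Python integer; it does not reproduce the SPQH hash function, and the names `hamming_distance` and `rank_by_hamming` are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database):
    """Return database indices sorted by Hamming distance to the query code."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


if __name__ == "__main__":
    codes = [0b0000, 0b1011, 0b1001]  # toy 4-bit hash codes
    print(rank_by_hamming(0b1011, codes))
```

Because the distance reduces to an XOR followed by a bit count, ranking is cheap even for large databases, which is why compact binary codes suit large-scale retrieval.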
PPNet: A Precipitation Nowcasting Model Based on Pre-Prediction
SONG Yi, ZHANG Hanyi, SUN Feng, ZHANG Jinglin, BAI Cong
2024, 46(2): 492-502. doi: 10.11999/JEIT230547
Abstract:
Precipitation nowcasting has always been a hot research topic in weather forecasting. Traditional forecasting methods are based on numerical weather prediction, but recently radar extrapolation methods using deep learning have attracted many researchers' attention. Among them, temporal prediction networks cannot be computed in parallel, which makes them slow and prone to gradient explosion. Fully convolutional networks can solve these two problems, but they lack the ability to extract temporal information. Therefore, based on the Taylor frozen hypothesis, a 2D fully convolutional Pre-predicted Precipitation nowcasting Network (PPNet) with a pre-prediction auxiliary inference structure is proposed. The network first extracts coarse-grained temporal and spatial information, and then uses the fully convolutional structure to refine the feature granularity, thereby effectively mitigating the drawback that 2D convolutional networks cannot extract temporal information. In addition, a temporal feature constraint structure is provided to constrain the pre-predicted features, making the extracted features more realistic. The ablation experiments show that the proposed pre-prediction auxiliary inference structure and temporal feature constraint structure extract temporal features well and improve the sensitivity of the network to temporal features. Compared with the current best rainfall prediction algorithms and video prediction algorithms, the proposed network achieves better prediction results, especially in rainstorm areas.
A Photovoltaic Power Prediction Model Integrating Multi-source Heterogeneous Meteorological Data
TAN Ling, KANG Ruixing, XIA Jingming, WANG Yue
2024, 46(2): 503-517. doi: 10.11999/JEIT230731
Abstract:
High-precision photovoltaic power prediction is of great significance for improving the operation efficiency of power system. Photovoltaic power is affected by many factors, among which cloud change is the most important uncertain factor. However, the traditional photovoltaic power prediction methods do not fully consider the influence of cloud three-dimensional structure and meteorological factors on photovoltaic power. To solve this problem, a Multi-source variables Photovoltaic power Prediction Model (MPPM) based on integrating multi-source heterogeneous meteorological data is proposed. The core of MPPM includes SpatioTemporal feature Conditional Diffusion Model (STCDM), Attention Stacked LSTM network (ASLSTM) and Multidimensional Feature Fusion Module (MFFM). STCDM accurately predicts the two-dimensional satellite cloud image, eliminating the blurring phenomenon at the cloud boundary. ASLSTM extracts the three-dimensional Weather Research and Forecasting model (WRF) meteorological element features. MFFM fuses the two-dimensional satellite cloud image features and three-dimensional WRF meteorological element features to obtain the photovoltaic power prediction results for the next 1 h. In this paper, satellite cloud image prediction experiment and photovoltaic power prediction experiment are carried out by using STCDM model and MPPM model respectively. The results show that the Structural SIMilarity index (SSIM) of STCDM in satellite cloud image prediction within 1 h is up to 0.914, and the CORRelation index (CORR) of MPPM in photovoltaic power prediction within 1 h is up to 0.949, which are superior to all comparison algorithms.
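The abstract above reports a correlation index (CORR) of 0.949 without defining it. Assuming CORR denotes the Pearson correlation coefficient between predicted and measured power (an assumption, since the abstract gives no formula), it could be computed as in this sketch; the function name `pearson_corr` is hypothetical.

```python
import math


def pearson_corr(pred, true):
    """Pearson correlation between two equal-length sequences.

    Assumption: CORR in the abstract is taken to be this coefficient.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    predicted = [1.0, 2.1, 2.9, 4.2]   # toy power values, not from the paper
    measured = [1.1, 2.0, 3.0, 4.0]
    print(round(pearson_corr(predicted, measured), 3))
```

A value close to 1 means the predicted curve follows the shape of the measured curve, although it does not by itself rule out a constant offset or scale error.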
A Cross-modal Person Re-identification Method Based on Hybrid Channel Augmentation with Structured Dual Attention
ZHUANG Jianjun, ZHUANG Yuchen
2024, 46(2): 518-526. doi: 10.11999/JEIT230614
Abstract:
In current research on cross-modal person re-identification, most existing methods reduce cross-modal differences by using single-modal original visible light images or locally shared features of adversarially generated images, resulting in unstable recognition accuracy in infrared image discrimination due to the loss of feature information. To solve this problem, a cross-modal person re-identification method based on swappable hybrid random channel augmentation with structured dual attention is proposed. The visible image after channel augmentation is used as a third modality, and single-channel and three-channel random hybrid augmentation of the visible image is performed by the Image Channel Swappable random mix Augmentation (I-CSA) module, so as to highlight the structural details of pedestrian posture and reduce modal differences in learning. The Structured joint Attention Feature Fusion (SAFF) module provides richer supervision for cross-modal feature learning and enhances the robustness of shared features to modal changes, while focusing on the structural relationship of pedestrian postures between modalities. Under the single-shot setting of the all-search mode on the SYSU-MM01 dataset, Rank-1 and mAP reach 71.2% and 68.1%, respectively, surpassing similar cutting-edge methods.
Pedestrian Trajectory Prediction Method Based on Information Fractals
YANG Tian, WANG Gang, LAI Jian, WANG Yang
2024, 46(2): 527-537. doi: 10.11999/JEIT230726
Abstract:
Pedestrian trajectory prediction has been widely used in several fields, such as autonomous driving and robot navigation. In trajectory prediction, some uncertain information, such as the uncertainty of trajectory information discrimination in the discriminator and complex interactive information, bring challenges to the trajectory prediction task. In the field of uncertain information processing, information fractals can effectively deal with the uncertainty and complexity of uncertain information. Inspired by this, a trajectory prediction method based on the information fractal is proposed to fully deal with the uncertainty of trajectory information discrimination in the discriminator and improve the prediction accuracy. First, the scene and historical trajectory information are extracted by the feature extraction module. Subsequently, the scene-pedestrian interaction and pedestrian-pedestrian interaction information are obtained through the attention module. Finally, reasonable trajectories are generated using generative adversarial networks and information fractals. Experiments on the two public datasets ETH and UCY reveal that the proposed method can effectively deal with the uncertainty of trajectory information and improve the accuracy of trajectory prediction. For example, the trajectories of sudden turns, overtaking, avoidance, and other behaviors can be effectively predicted. Moreover, the Average Displacement Error (ADE) and Final Displacement Error (FDE) are reduced by an average of 11.11% and 23.48%, respectively compared with the benchmark model error.
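The two metrics reported in the abstract above have standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is that distance at the final step. A minimal sketch for a single 2D trajectory follows; the function name `ade_fde` and the toy coordinates are illustrative.

```python
import math


def ade_fde(pred, truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points."""
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    observed = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(ade_fde(predicted, observed))
```

In benchmark reporting, both values are usually averaged over all pedestrians and scenes; this sketch covers only one trajectory.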
Multi-Agent Deep Reinforcement Learning with Clustering and Information Sharing for Traffic Light Cooperative Control
DU Tongchun, WANG Bo, CHENG Haoran, LUO Le, ZENG Nengmin
2024, 46(2): 538-545. doi: 10.11999/JEIT230857
Abstract:
In order to improve the joint control of multiple intersections, a Multi-Agent Deep Recurrent Q-Network (MADRQN) for real-time control of multi-intersection traffic signals is proposed in this paper. Firstly, traffic light control is modeled as a Markov decision process, in which the controller at each intersection is considered as an agent. Secondly, agents are clustered according to their position and observation. Then, information sharing and centralized training are conducted within each cluster. In addition, the value function network parameters of the agent with the highest critic value are shared with the other agents at the end of every training process. Simulation experiments in Simulation of Urban MObility (SUMO) show that the proposed method can reduce the amount of communication data and make information sharing among agents and centralized training more feasible and efficient. The average vehicle delay is clearly reduced compared with state-of-the-art traffic light control methods based on multi-agent deep reinforcement learning. The proposed method can effectively alleviate traffic congestion.
Parkinson's Disease Detection Method Based on Cross-Language Acoustic Analysis
JI Wei, WANG Chuanyu, WU Di, LI Yun, ZHENG Huifen
2024, 46(2): 546-554. doi: 10.11999/JEIT230981
Abstract:
Speech-based Parkinson’s disease detection has the advantages of being non-invasive and low-cost. The current publicly available speech datasets for Parkinson’s disease mostly consist of single-language speech, and are characterized by insufficient data capacity and small differences in the pronunciation characteristics of the subjects' mother tongue. A Parkinson’s disease detection model trained on a single-language dataset suffers performance degradation when faced with cross-language speech data. To avoid the impact of language differences and improve the detection performance of the model in cross-language scenarios, the ideas of adversarial transfer learning and feature decoupling are introduced, and a Parkinson’s disease Cross-Language Speech Analysis Model (CLSAM) is proposed in this paper. Firstly, the model cascades a multi-head self-attention encoder and a multi-layer neural network to form a feature extractor module, which decouples the original Fbank speech features extracted from the source domain and target domain into two vectors, namely a domain-invariant pathological information representation vector and a domain information representation vector. Secondly, a dual adversarial training module with inconsistent target tasks is designed, which explicitly separates domain-invariant pathological information from domain information. Finally, domain-invariant pathological information is extracted from cross-language speech data for Parkinson’s disease detection. The effectiveness of the proposed method is verified using ten-fold cross-validation on both the publicly available MaxLittle Parkinson’s disease speech dataset and a self-collected Parkinson’s disease speech dataset. Experimental results show that, compared with traditional machine learning methods and existing transfer learning algorithms, the proposed model significantly improves accuracy, sensitivity and F1 score in cross-language scenarios.
Intelligent Heart Sound Abnormal Diagnosis Chip Based on LSTM for Wearable Applications
ZHOU Weixin, GAO Zhaogang, XIAO Wan'ang
2024, 46(2): 555-563. doi: 10.11999/JEIT230934
Abstract:
The severity of cardiovascular disease makes prevention and early diagnosis critically important. Conventional manual auscultation techniques and computer-based diagnostic methods are inadequate to meet the demands of auscultation. Consequently, wearable devices attract increasing attention, but they are required to achieve both high accuracy and low power consumption. An intelligent heart sound abnormality diagnosis chip based on LSTM for wearable applications is presented. An abnormal heart sound diagnosis system is developed, including preprocessing, feature extraction, and abnormality diagnosis. Furthermore, an FPGA-based system for heart sound acquisition is constructed. The challenge of imbalanced datasets is addressed through data augmentation. Using a pre-trained model as a foundation, the intelligent heart sound abnormality diagnosis chip is developed, and the layout and MPW are completed in the SMIC 180 nm process. The post-simulation results demonstrate that the chip achieves a diagnostic accuracy of 98.6%, a power consumption of 762 μW, and an area of 3.06 mm × 2.45 mm, meeting the high-performance and low-power requirements of wearable devices.
Damaged Inscription Recognition Based on Hierarchical Decomposition Embedding and Bipartite Graph
LIN Guangfeng, WU Na, HE Menglan, ZHANG Erhu, SUN Qiang
2024, 46(2): 564-573. doi: 10.11999/JEIT230893
Abstract:
Ancient inscriptions carry rich historical and cultural information. However, due to natural weathering and man-made destruction, the text information on the inscriptions is incomplete. The semantic information of ancient inscriptions is diverse and text examples of ancient inscriptions are scarce, which makes it very difficult to learn the semantic information between Chinese characters for recognizing damaged characters. This paper attempts to solve the challenging task of recognizing and understanding damaged characters through spatial semantic modeling of Chinese characters. Based on Hierarchical Decomposition Embedding (HDE), the proposed DynamicGrape performs feature mapping on a character image and determines whether the character is damaged. If the character is not damaged, its image is directly converted into a hierarchical decomposition embedding to infer the edge weights of the bipartite graph for recognizing the Chinese character. If the character is damaged, it is necessary to search for possible Chinese characters and components in the encoding set, select the feature dimensions of HDE from the image mapping, and feed them into the bipartite graph to infer the possible Chinese character. On the self-built dataset and the Chinese Text in the Wild (CTW) dataset, the experimental results show that the bipartite graph network can not only effectively transfer and infer the Chinese character patterns of damaged characters, but also precisely recognize and understand damaged Chinese characters. This opens up new ideas for processing damaged structural information.
Bimodal Emotion Recognition With Adaptive Integration of Multi-level Spatial-Temporal Features and Specific-Shared Feature Fusion
SUN Qiang, CHEN Yuan
2024, 46(2): 574-587. doi: 10.11999/JEIT231110
Abstract:
There are usually two challenging issues in the field of bimodal emotion recognition combining ElectroEncephaloGram (EEG) and facial images: (1) How to learn more significant emotionally semantic features from EEG signals in an end-to-end manner; (2) How to effectively integrate bimodal information to capture the coherence and complementarity of emotional semantics among bimodal features. In this paper, a bimodal emotion recognition model is proposed via the adaptive integration of multi-level spatial-temporal features and the fusion of specific-shared features. On the one hand, in order to obtain more significant emotionally semantic features from EEG signals, a module, called adaptive integration of multi-level spatial-temporal features, is designed. The spatial-temporal features of EEG signals are firstly captured with a dual-flow structure before the features from each level are integrated by taking into consideration the weights deriving from the similarity of features. Finally, the relatively important feature information from each level is adaptively learned based on the gating mechanism. On the other hand, in order to leverage the emotionally semantic consistency and complementarity between EEG signals and facial images, one module fusing specific-shared features is devised. Emotionally semantic features are learned jointly through two branches: specific-feature learning and shared-feature learning. The loss function is also incorporated to automatically extract the specific semantic information for each modality and the shared semantic information among the modalities. On both the DEAP and MAHNOB-HCI datasets, cross-experimental verification and 5-fold cross-validation strategies are used to assess the performance of the proposed model. The experimental results and their analysis demonstrate that the model achieves competitive results, providing an effective solution for bimodal emotion recognition based on EEG signals and facial images.
Self-supervised Multimodal Emotion Recognition Combining Temporal Attention Mechanism and Unimodal Label Automatic Generation Strategy
SUN Qiang, WANG Shuyu
2024, 46(2): 588-601. doi: 10.11999/JEIT231107
Abstract:
Most multimodal emotion recognition methods aim to find an effective fusion mechanism to construct the features from heterogeneous modalities, so as to learn the feature representation with semantic consistency. However, these methods usually ignore the emotionally semantic differences between modalities. To solve this problem, a multi-task learning framework is proposed. By training one multimodal task and three unimodal tasks jointly, the emotionally semantic consistency information among multimodal features and the emotionally semantic difference information contained in each modality are respectively learned. Firstly, in order to learn the emotionally semantic consistency information, a Temporal Attention Mechanism (TAM) based on a multilayer recurrent neural network is proposed. The contribution degree of emotional features is described by assigning different weights to time series feature vectors. Then, for multimodal fusion, fine-grained feature fusion per semantic dimension is carried out in the semantic space. Secondly, a self-supervised Unimodal Label Automatic Generation (ULAG) strategy based on inter-modal feature vector similarity is proposed in order to effectively learn the difference information of emotional semantics in each modality. Extensive experimental results on three datasets, CMU-MOSI, CMU-MOSEI and CH-SIMS, confirm that the proposed TAM-ULAG model is highly competitive and improves the classification indices ($Acc_2$, $F_1$) and regression indices (MAE, Corr) compared with current benchmark models. For binary classification, the recognition rate is 87.2% and 85.8% on the CMU-MOSI and CMU-MOSEI datasets, and 81.47% on the CH-SIMS dataset. The results show that simultaneously learning the emotionally semantic consistency information and the emotionally semantic difference information for each modality helps improve the performance of self-supervised multimodal emotion recognition.
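The idea of assigning different weights to time series feature vectors, described in the abstract above, can be sketched as softmax-weighted pooling over time steps. In this sketch the per-step relevance scores are passed in directly (in the paper they would come from a learned recurrent network), so the function `temporal_attention` and its inputs are illustrative assumptions, not the TAM implementation.

```python
import math


def temporal_attention(features, scores):
    """Pool per-time-step feature vectors with softmax attention weights.

    features: list of T vectors (lists of equal length).
    scores:   list of T raw relevance scores (assumed given here).
    Returns (weights, pooled_vector).
    """
    top = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    w, pooled = temporal_attention(feats, [0.1, 2.0, 0.3])
    print(w, pooled)
```

Time steps with higher scores dominate the pooled representation, which is how such a mechanism can emphasize the emotionally salient parts of a sequence.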
Image and Intelligent Information Processing
Overview of Immersive Video Coding
ZENG Huanqiang, KONG Qingwei, CHEN Jing, ZHU Jianqing, SHI Yifan, HOU Junhui
2024, 46(2): 602-614. doi: 10.11999/JEIT230097
Abstract:
With the development of immersive media technologies such as virtual reality and augmented reality, the presentation, storage and transmission of immersive video have received a lot of attention in both research and industry. Due to the more complex video characteristics and huge data volume, traditional coding techniques are not efficient for immersive video coding. How to present and encode immersive video more efficiently is a challenge. Based on the Degree of Freedom (DoF), the 3DoF and 6DoF formats of immersive video are introduced respectively in this paper. Firstly, 3DoF video coding techniques, including projection models and motion estimation, are introduced, and then the 3DoF video coding standard is discussed. For 6DoF video coding, the video representation, virtual viewpoint synthesis techniques, 6DoF video coding techniques and the Moving Picture Experts Group (MPEG) Immersive Video (MIV) coding standard are described. Finally, the development of immersive video and its coding technology is summarized and its prospects are discussed.
Recognition of Basketball Tactics Based on Vision Transformer and Track Filter
XU Guoliang, SHEN Gang, LIANG Xupeng, LUO Jiangtao
2024, 46(2): 615-623. doi: 10.11999/JEIT230079
Abstract:
The analysis of player trajectory data using machine learning to obtain offensive or defensive tactics is a crucial component of understanding basketball video content. Traditional machine learning methods require the setting of feature variables manually, significantly reducing flexibility. Therefore, the key issue is how to automatically obtain feature information that can be used for tactic recognition. To address this issue, a basketball Tactic Vision Transformer (TacViT) recognition model is proposed based on player trajectory data from the National Basketball Association (NBA) games. The proposed model adopts Vision Transformer (ViT) as the backbone network and multi-head attention modules to extract rich global trajectory feature information. Trajectory filters are also incorporated in order to not only enhance the feature interaction between the court lines and player trajectories, but also strengthen the representation of player position features in this study. The trajectory filters learn the long-term spatial correlations in the frequency domain with log-linear complexity. A self-built basketball tactic dataset (PlayersTrack) is created from the sequence data of the Sport Vision System (SportVU), which are converted into trajectory graphs in this work. The experiments on this dataset showed that the accuracy of TacViT reached 82.5%, which is a 16.7% improvement over the accuracy of the Vision Transformer S model (ViT-S) without modifications.
CL-YOLOv5: PET/CT Lung Cancer Detection With Cross-modal Lightweight YOLOv5 Model
ZHOU Tao, YE Xinyu, LIU Fengzhen, LU Huiling
2024, 46(2): 624-632. doi: 10.11999/JEIT230052
Abstract:
Multimodal medical images can provide more semantic information about the same lesion. To address the problems that cross-modal semantic features are not fully considered and that model complexity is too high, a Cross-modal Lightweight YOLOv5 (CL-YOLOv5) lung cancer detection model is proposed. Firstly, a three-branch network is proposed to learn the semantic information of Positron Emission Tomography (PET), Computed Tomography (CT) and PET/CT images. Secondly, a Cross-modal Interactive Enhancement block is designed to fully learn multimodal semantic correlations: a cosine reweighted Transformer efficiently learns global feature relationships, and an interactive enhancement network extracts lesion features. Finally, a dual-branch lightweight block is proposed: in one branch, an ACtivate Or Not (ACON) bottleneck structure reduces parameters while increasing network depth and robustness; the other branch is a densely connected recursive re-parametric convolution that maximizes feature transfer, with recursive spatial interaction efficiently learning multimodal features. On a lung cancer PET/CT multimodal dataset, the proposed model achieves the best mAP of 94.76% and the highest efficiency of 3238 s with 0.81 M parameters, which is 7.7 times and 5.3 times lower than YOLOv5s and EfficientDet-d0, respectively. In multimodal comparative experiments, it generally outperforms existing state-of-the-art methods, which is further verified by ablation experiments and heat map visualization.
3D Hilbert Space Filling Curve Encoding and Decoding Algorithms Based on Efficient Prefix Reduction
JIA Lianyin, FAN Yao, DING Jiaman, LI Xiaowu, YOU Jinguo
2024, 46(2): 633-642. doi: 10.11999/JEIT230013
Abstract:
The encoding and decoding efficiency of the 3D Hilbert Space Filling Curve (3D HSFC) is key for applications such as spatial query processing and image processing. Existing 3D encoding and decoding algorithms encode and decode each point independently, ignoring the locality-preserving property of the Hilbert curve. To improve the efficiency of encoding and decoding, an efficient 3D state view is designed in this paper, and a new Prefix Reduction 3D HSFC Encoding Algorithm (PR-3HE) and Prefix Reduction 3D HSFC Decoding Algorithm (PR-3HD) are proposed. These two algorithms minimize the orders to be processed through the definition and identification of common prefixes, common prefix reduction and various optimization techniques, thus improving 3D HSFC encoding and decoding efficiency. A theoretical proof is provided in this paper, demonstrating that when encoding or decoding a k-order window (all $2^k \times 2^k \times 2^k$ points in a window), PR-3HE passively encodes no more than 2 orders for each coordinate on average, while PR-3HD passively decodes no more than 8/7 orders for each Hilbert code on average. The encoding and decoding time complexity can be reduced from $O(k)$ to $O(1)$. Experimental results on both synthetic and real datasets show the benefits of the proposed algorithms over their counterparts.
Low Light Image Enhancement With Adaptive Light Initialization
LIU Bo, TIAN Guangliang, XIAO Bin, MA Jianfeng, BI Xiuli
2024, 46(2): 643-651. doi: 10.11999/JEIT230056
Abstract:
Due to the high uncertainty in decomposing the light component, accurately estimating the light component of an image has been a challenge for image enhancement methods based on the Retinex model. An effective method is proposed in this paper to accurately estimate the initial illumination component. Specifically, the corresponding illumination weight matrices for different inputs are obtained to guide the adaptive initialization estimation; subsequently, the estimates of the initial illumination components are optimized under the constraints of the illumination structure, and a non-linear illumination adjustment is performed on them. Finally, the Retinex model is combined to obtain the enhanced images. Experiments show that our method not only achieves accurate image decomposition estimation, but also performs better in terms of both subjective visual effects and objective evaluation metrics on multiple datasets while maintaining good operational efficiency compared with existing methods for low-light image enhancement.
Wireless Communication, Internet of Things and Digital Signal Processing
Robust Secure Resource Allocation Algorithm for Cognitive Backscatter Communication with Hardware Impairment
XU Yongjun, JIANG Siqiao, ZHANG Haibo, WANG Zhengqiang, ZHOU Jihua
2024, 46(2): 652-661. doi: 10.11999/JEIT230117
Abstract:
To improve the spectral efficiency, transmission robustness, and information security of backscatter communication networks, a robust secure resource allocation algorithm is proposed for cognitive backscatter communication networks with hardware impairments. Firstly, considering the constraints on the minimum secure rate of each cognitive backscatter user, transmission time, energy harvesting, and reflection coefficients, a multivariable coupled resource allocation problem for throughput maximization is formulated under bounded channel uncertainties and spectrum sensing errors. Secondly, the original problem is transformed into a convex problem by using a worst-case approach, successive convex approximation, and alternating optimization, and an iteration-based robust resource allocation algorithm is proposed to solve it. Simulation results show that the proposed algorithm is more robust than existing algorithms.
Real-time Task Scheduling for Multi-access Edge Computing-enabled AI Quality Inspection Systems
ZHOU Xiaotian, SUN Shang, ZHANG Haixia, DENG Yiqin, LU Binbin
2024, 46(2): 662-670. doi: 10.11999/JEIT230129
Abstract:
AI-based quality inspection is an important part of intelligent manufacturing, where the devices produce a large number of computation-intensive and time-sensitive tasks. Owing to the insufficient computation capability of end devices, the latency of executing these inspection tasks is high, which greatly affects manufacturing efficiency. To this end, Multi-access Edge Computing (MEC) has been proposed to provide computation resources by offloading tasks to edge servers deployed nearby, thereby improving execution efficiency. However, the dynamic channel state and random task arrival greatly impact the task offloading efficiency and consequently bring challenges to task scheduling. In this paper, the joint task scheduling and resource allocation problem is studied, with the aim of minimizing the long-term delay of the MEC-enabled system. As the state space of the problem is large and the action space contains continuous variables, a Deep Deterministic Policy Gradient (DDPG) based real-time task scheduling algorithm is proposed. The proposed algorithm can make optimal decisions with real-time system state information. Simulation results confirm the promising performance of the proposed algorithm, which achieves lower task execution latency than the benchmark algorithm.
Robust Resource Allocation Algorithm for Low Orbit Satellite Communication System Based on Imperfect CSI
WU Cuixian, DONG Yiheng, XU Yongjun, ZHANG Haibo, XUE Qing
2024, 46(2): 671-679. doi: 10.11999/JEIT230086
Abstract:
To address the imbalance between power consumption and transmission in low orbit satellite communication systems caused by limited resources, a robust resource allocation algorithm is proposed to maximize the minimum energy efficiency of multiple users, taking into account the performance degradation that channel uncertainties cause in real satellite communication systems. Firstly, a robust resource allocation model with Gaussian channel uncertainties is formulated by jointly optimizing the beamforming vectors and power allocation factors of the multi-beam satellite, subject to the outage rate constraint of each user, the power allocation factor constraint and the maximum transmit power constraint. The formulated problem is non-convex and NP-hard with parametric perturbation, and is therefore difficult to solve directly. To this end, the original problem is converted into a convex one by using Dinkelbach’s method, Bernstein-type inequality, semi-definite relaxation and the alternating optimization technique, and an iteration-based hybrid robust beamforming and power allocation algorithm is proposed. Simulation results verify that the proposed algorithm has good energy efficiency and strong robustness.
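Dinkelbach's method handles a ratio objective such as energy efficiency by solving a sequence of subtractive problems: maximize N(x) - λD(x), then update λ to the ratio at the maximizer, until the subtractive optimum reaches zero. The sketch below is a minimal single-user illustration over a finite power grid, assuming a positive denominator. It is not the multi-user max-min beamforming problem of the paper, and `GAIN_TO_NOISE`, `CIRCUIT_POWER` and the grid are hypothetical values chosen only for the example.

```python
import math


def dinkelbach(candidates, numerator, denominator, tol=1e-9, max_iter=100):
    """Maximize numerator(x) / denominator(x) over a finite candidate set.

    Assumes denominator(x) > 0 for every candidate.
    Returns the last maximizer and the final ratio estimate lam.
    """
    lam = 0.0
    best = candidates[0]
    for _ in range(max_iter):
        # Parametric subproblem: maximize N(x) - lam * D(x).
        best = max(candidates, key=lambda x: numerator(x) - lam * denominator(x))
        gap = numerator(best) - lam * denominator(best)
        if gap < tol:  # lam can no longer be improved
            break
        lam = numerator(best) / denominator(best)
    return best, lam


# Toy single-user energy-efficiency problem (all numbers are illustrative).
GAIN_TO_NOISE = 4.0   # hypothetical channel gain divided by noise power
CIRCUIT_POWER = 0.5   # hypothetical static power consumption


def rate(p):
    """Achievable rate in bit/s/Hz for transmit power p."""
    return math.log2(1.0 + GAIN_TO_NOISE * p)


def total_power(p):
    """Transmit power plus static circuit power."""
    return p + CIRCUIT_POWER


power_grid = [i * 0.01 for i in range(201)]  # transmit power candidates in [0, 2]
best_power, best_ee = dinkelbach(power_grid, rate, total_power)
```

On a finite candidate set the iteration stops at the best ratio found on the grid; in a continuous problem each subproblem would instead be handed to a convex solver.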
Orthogonal Time Frequency Space Channel Estimation Based on Model-driven Deep Learning
PU Xumin, LIU Yanxiang, SONG Mixue, CHEN Qianbin
2024, 46(2): 680-687. doi: 10.11999/JEIT230072
Abstract:
In this paper, a channel estimation scheme based on model-driven deep learning algorithm is proposed for Single Input Single Output (SISO) Orthogonal Time Frequency Space (OTFS) modulation systems. First, the Denoising Approximate Message Passing (DAMP) algorithm is considerably expanded. Then the traditional denoiser is replaced by the Denoising Convolutional Neural Network (DnCNN) to estimate the delay-Doppler channel with additive white Gaussian noise. The State Evolution (SE) equation is provided to predict the theoretical Normalized Mean Square Error (NMSE) performance of the Learned Denoising based Approximate Message Passing (LDAMP) algorithm. Simulation results show that the scheme performs well under a low Signal-to-Noise Ratio (SNR) and has great robustness compared with other estimation schemes. When the total number of channel paths is invariant, increasing the number of OTFS two-dimensional grid points can effectively improve channel estimation accuracy.
Estimation of Underwater Acoustic Doppler Factor and Time Delay Based on Time-frequency Analysis of Multi-component LFM Signals
NING Gengxin, XIAO Ruojun, XIE Liang
2024, 46(2): 688-696. doi: 10.11999/JEIT230068
Abstract:
Multicomponent Linear Frequency Modulated (LFM) signals are increasingly used in practice to estimate the underwater acoustic Doppler factor and time delay. An adaptive chirp-mode-decomposition algorithm based on incomplete residual and ridge segment matching is proposed to solve the problem of inaccurate parameter estimation for multicomponent LFM signals with cross-terms in the time-frequency domain. The incomplete residual function is used to retain part of the time-frequency information at the intersection point, and the ridge segment matching method is used to provide a more accurate time-frequency ridge, improving the estimation accuracy of the frequency modulation slope and starting frequency of each LFM signal component. A combination of these two estimators provides the algorithm for estimating the Doppler factor and time delay. The results show that, compared with existing mode-decomposition algorithms, the proposed method effectively resolves the estimation error induced by breaks in the crossing interval. The accuracy of the proposed method for estimating the Doppler factor and time delay is better than that of existing methods in underwater acoustic multipath propagation.
DOA Estimation of Direction Vector Estimation Algorithm Based on Second-order Statistical Properties
HOU Jin, SHENG Yaobao, ZHANG Bo
2024, 46(2): 697-704. doi: 10.11999/JEIT230172
Abstract:
To reduce the influence of antenna array manifold errors on Direction of Arrival (DOA) estimation results, and to overcome the shortcoming that DOA estimation algorithms based on traditional blind source separation cannot be applied to direction-finding equipment with few-channel receivers, a DOA estimation algorithm based on direction vector estimation using second-order statistical properties is proposed. Firstly, according to the characteristics of the spectral function of the Deterministic Maximum Likelihood (DML) estimation algorithm, an optimization problem with unitary constraints on the covariance matrix is constructed. Then, the actual direction vector of each single signal is obtained by solving this optimization problem. Finally, the actual direction vectors of the single signals are input into the spatial spectral algorithm to achieve DOA estimation. Because the DOA estimation of multiple signals is transformed into the DOA estimation of multiple single signals, the proposed algorithm has better DOA estimation performance than the traditional DOA method when the antenna array manifold has errors. Because it only uses the covariance matrix, the proposed algorithm can be applied to direction-finding equipment with few-channel receivers. The simulation results show that the proposed algorithm has higher accuracy, immunity and resolution than the traditional DOA estimation algorithm when the array manifold has errors and the direction-finding equipment has few-channel receivers.
Electromagnetic Field and Electromagnetic Wave Technology
Double-sided Parallel-strip Line-based Leaky-wave Antenna with Full-space Beam Scanning Property
WANG Henghui, SUN Sheng, LIU Nengwu, LIU Yuanan
2024, 46(2): 705-712. doi: 10.11999/JEIT230067
Abstract:
A leaky-wave antenna designed to achieve continuous beam scanning from the backfire direction to the endfire direction is presented, based on a Double-Sided Parallel-Strip Line (DSPSL) structure. The unit cell comprises a DSPSL structure and a pair of inverted open-ended stubs attached to the top and bottom strips, respectively. The loaded stubs act as a dipole and provide omnidirectional radiation performance, contributing to the full-space beam scanning potential of the leaky-wave antenna. To eliminate the open-stop band effect, two pairs of series slots are etched on the transmission line to meet the frequency balanced condition and match the Bloch impedance. Experimental results for the designed prototype are consistent with the simulation, revealing that as the frequency varies from 7.6 GHz to 14.0 GHz, the proposed leaky-wave antenna radiates beams from the backfire direction through the broadside to the endfire direction.
Time Domain Parallel Calculation Method for the Coupling of Transmission Line Network Terminated with Complex Circuits
YE Zhihong, ZHANG Yu, LU Changchang
2024, 46(2): 713-719. doi: 10.11999/JEIT230098
Abstract:
Efficient field-circuit synchronous simulation techniques for the coupling analysis of Transmission Line (TL) networks with complex circuits excited by ambient waves are still rare. In this work, the TL equations are combined with Norton’s theorem, the Substitution theorem, the Finite-Difference Time-Domain (FDTD) method, NGSPICE software and a parallel technique based on the Message Passing Interface (MPI) to form an efficient parallel time domain hybrid method (FDTD-Transmission Line equation-NGSPICE, FDTDTL-NGSPICE). Firstly, the overall structure of the transmission line network is decomposed into the transmission line subsystem and complex circuit subsystems according to Norton’s theorem and the Substitution theorem, and the corresponding equivalent circuit models are constructed. Then the parallel FDTDTL method is employed to solve the voltage and current responses along the transmission line subsystem, which are utilized to extract the current sources and equivalent admittances of the Norton equivalent circuits. Finally, the NGSPICE software is applied for the conducted interference analysis of the complex circuit subsystems to obtain the transient responses on the ports and all elements of the complex circuits, and then the port voltages are fed back to the transmission line subsystem as boundary conditions. The significant feature of this time domain hybrid method is that it realizes the field-line-circuit synchronous simulation of the transmission line network. The reliability of this method is verified by comparing its results for three typical scenarios with those of the electromagnetic software CST Cable Studio (CS).
Parameter Estimation of Surface Nuclear Magnetic Resonance Signals Based on Total Least Squares-Estimation of Signal Parameters via Rotational Invariance Technique
YU Xiaohui, FENG Hai, TIAN Baofeng, SUN Haixin, SUN Xiaodong
2024, 46(2): 720-727. doi: 10.11999/JEIT230102
Abstract:
In the Surface Nuclear Magnetic Resonance (SNMR) water searching system, the parameters of SNMR signals can be used to predict the water storage, electrical conductivity and pore structure of underground aquifers. However, the SNMR signals collected on site are very weak in practical applications and are easily interfered with by environmental noise, so the parameters of the SNMR signal cannot be obtained directly. To solve this problem, an estimation method of SNMR signal parameters based on Total Least Squares-Estimation of Signal Parameters via Rotational Invariance Technique (TLS-ESPRIT) is proposed in this paper. Based on the similar signal characteristics of harmonic noise and SNMR signals, a mixed signal model consisting of multiple sinusoidal attenuation signals is constructed. TLS-ESPRIT is used to transform the problem of extracting mixed signal parameters into a generalized eigenvalue solution of a rotation invariant matrix, in order to obtain the Larmor frequency and the relaxation time of the SNMR signal, and its initial amplitude and phase are obtained by combining the least squares method. The experimental results on simulated and measured signals show that the proposed method can estimate the parameters of SNMR signals mixed with random noise and power frequency harmonic noise. Compared with the traditional harmonic modeling method, the parameter extraction accuracy is better.
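The mixed signal model of multiple attenuated sinusoids can be written out as a sketch. The notation below is assumed for illustration and is not taken from the paper: $A_k$, $\tau_k$, $f_k$ and $\varphi_k$ are the initial amplitude, relaxation time, frequency and initial phase of component $k$, $w(t)$ is random noise, and $T_s$ is the sampling period. A harmonic noise component fits the same form with $\tau_k \to \infty$, which is why one model can cover both signal types. Once a subspace method such as ESPRIT has estimated the complex poles $z_k$, the frequency and relaxation time follow from the pole's angle and modulus.

```latex
% Mixed model: damped sinusoids (SNMR signal and harmonics) plus noise
x(t) = \sum_{k=1}^{K} A_k \, e^{-t/\tau_k} \cos(2\pi f_k t + \varphi_k) + w(t)

% Sampled with period T_s, each component contributes a pole
% (and its complex conjugate, since x(t) is real)
z_k = \exp\!\big( (-1/\tau_k + \mathrm{j}\, 2\pi f_k)\, T_s \big)

% Mapping from an estimated pole back to the physical parameters
f_k = \frac{\arg z_k}{2\pi T_s}, \qquad \tau_k = -\frac{T_s}{\ln |z_k|}
```

The amplitudes and phases then enter linearly for fixed poles, which is consistent with the abstract's use of least squares for that final step.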
Circuit and System Design
Chaotic Power System Control Based on Improved Adaptive Synergetic Control Method
FANG Jie, ZHANG Shaohui, JIANG Yong
2024, 46(2): 728-737. doi: 10.11999/JEIT230075
Abstract:
An adaptive synergetic control scheme with fast convergence characteristics is proposed for a four-dimensional chaotic power system. Firstly, based on the Lyapunov stability theorem and global fast convergence theory, a synergetic controller with a fast convergence property is designed. The controller can make the macro variables reach the invariant manifold quickly and can obtain smooth and chatter-free control inputs to achieve the exact convergence of the macro variables. The designed controller is then applied to the chaotic control of a four-dimensional power system. Since excess energy in the power system can cause chaotic oscillations, an energy storage device is introduced in the control loop. The chaotic oscillations are suppressed by making the energy storage device absorb the excess active power in the power system. The complex terms that appear in the controller design process are eliminated through the adaptive law, which increases the practicality of the controller. Finally, the effectiveness and superiority of the control method are verified by numerical simulation.
Abnormal Battery On-line Detection Method Based on Dynamic Time Warping and Improved Variational Auto-Encoder
GUO Tiefeng, HE Jianjun, SHEN Shuai, WANG Xiang, ZHANG Binhan
2024, 46(2): 738-747. doi: 10.11999/JEIT230084
Abstract:
In battery production, traditional methods detect abnormal batteries with poor accuracy, and offline anomaly detection after production is inefficient. To solve these problems, a lithium battery anomaly online detection method integrating a Long Short-Term Memory Variational AutoEncoder and Dynamic Time Warping evaluation (VAE-LSTM-DTW) is proposed, which realizes the online detection of abnormal battery conditions and prevents the time and energy wastage caused by offline anomaly detection. Firstly, the Long Short-Term Memory (LSTM) is introduced into the Variational Auto-Encoder (VAE) model to train the battery time series reconstruction model. Secondly, in battery anomaly detection, the Dynamic Time Warping (DTW) value is introduced into the evaluation index, the optimal detection threshold is obtained based on Bayesian optimization, and anomalies are identified from the DTW value of each single battery's reconstructed data. The experimental results indicate that, compared with the traditional anomaly detection methods in this field, the VAE-LSTM-DTW model has superior performance: the accuracy and F1-score are greatly improved, and the model is effective and practical.
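The DTW-based evaluation step can be sketched in a few lines: a series is flagged when the DTW distance between the measured data and its reconstruction exceeds a threshold. The sketch assumes the reconstructions are already available (in the paper they come from the VAE-LSTM model) and that the threshold is given (in the paper it is tuned by Bayesian optimization). The names `dtw_distance` and `flag_anomalies` are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping with absolute-difference local cost.

    Runs in O(len(a) * len(b)) time and returns the accumulated cost of the
    cheapest monotonic alignment between the two sequences.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def flag_anomalies(originals, reconstructions, threshold):
    """Return indices of series whose DTW reconstruction error exceeds threshold."""
    return [i for i, (x, y) in enumerate(zip(originals, reconstructions))
            if dtw_distance(x, y) > threshold]
```

Unlike a point-wise error, DTW tolerates small time shifts: `[1, 2, 3]` and `[1, 2, 2, 3]` have distance zero, so a slightly stretched but otherwise faithful reconstruction is not penalized.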
Cryptography and Network Information Security
Privacy Crowdsourcing on Blockchain with Data Verification and Controllable Anonymity
XUE Kaiping, FAN Mao, WANG Feng, LUO Xingyi
2024, 46(2): 748-756. doi: 10.11999/JEIT230106
Abstract:
Considering the requirements of data verification, anonymous malicious behavior detection and cross-platform resource interaction in privacy crowdsourcing, a scheme under the consortium chain architecture is proposed, based on blockchain technology with zero-knowledge proofs and ring signatures. The proposed scheme relies on zero-knowledge proofs to achieve encrypted data verification, relies on an improved revocable-iff-linked ring signature to achieve controllable anonymity of workers, and introduces a consortium chain to realize resource interaction between crowdsourcing entities. In addition to completing the crowdsourcing process, the scheme also implements the data protection and identity protection required for privacy crowdsourcing. Security analysis shows that the proposed scheme satisfies privacy, verifiability, controllable anonymity and fairness. Experimental results verify the advantages of the proposed scheme in efficiency and performance.
Identity-Based Chameleon Signature Schemes over Lattices
ZHANG Yanhua, CHEN Yan, LIU Ximeng, YIN Yifeng, HU Yupu
2024, 46(2): 757-764. doi: 10.11999/JEIT230155
Abstract:
Chameleon Signature (CS) is an ideal designated verifier signature. It realizes non-transferability by using a chameleon hash function, so that any third party distrusts the content disclosed by a designated verifier, and it avoids the online interactive verification required by undeniable signatures. In addition to non-transferability, CS should also satisfy unforgeability, deniability, non-repudiation for the signer, and so on. To solve the problems that cryptosystems based on number-theoretic problems such as integer factorization or discrete logarithm cannot resist quantum computing attacks and that users rely on digital certificates, an Identity-Based Chameleon Signature (IBCS) scheme over lattices is proposed. The new scheme avoids the security vulnerability in existing schemes that the signer cannot reject a signature forged by the designated verifier, and reduces the transmission cost of the final signature from quadratic to linear. Furthermore, to solve the failure of non-transferability in the arbitration phase, an IBCS scheme with exposure-freeness over lattices is proposed, which enables the signer to reject a signature forged by any adversary without exposing the real message. In particular, based on the hardness of the small integer solution problem, both schemes can be proved secure in the random oracle model.
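Non-transferability rests on the defining property of a chameleon hash: whoever holds the trapdoor can open one hash value to any message. The sketch below illustrates that property with the textbook discrete-logarithm construction and toy parameters. It is deliberately not the lattice-based scheme of the paper, it is not quantum-resistant, and the parameters are far too small for real use. The names `chameleon_hash` and `find_collision` are hypothetical.

```python
# Toy parameters: P = 2 * Q + 1 with Q prime; G generates the order-Q subgroup.
P, Q, G = 23, 11, 4
TRAPDOOR = 7                 # secret key held by the designated verifier
H = pow(G, TRAPDOOR, P)      # matching public key


def chameleon_hash(message, randomness):
    """CH(m, r) = G^m * H^r mod P, with exponents taken modulo Q."""
    return (pow(G, message % Q, P) * pow(H, randomness % Q, P)) % P


def find_collision(message, randomness, new_message):
    """Use the trapdoor to find r' with CH(m, r) == CH(m', r').

    Solves m + x*r = m' + x*r' (mod Q) for r', where x is the trapdoor.
    """
    inverse = pow(TRAPDOOR, -1, Q)  # modular inverse, Python 3.8+
    return (randomness + (message - new_message) * inverse) % Q
```

Because the designated verifier could have produced the same hash for a different message, a signature over the hash that the verifier shows to a third party proves nothing about what the signer actually signed.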