2024 Vol. 46, No. 8
2024, 46(8): 3063-3072.
doi: 10.11999/JEIT231273
Abstract:
Visibility Region (VR) information can be used to reduce the complexity of transmission design in EXtremely Large-scale massive Multiple-Input Multiple-Output (XL-MIMO) systems. Existing theoretical analyses and transmission designs are mostly based on simplified VR models. To evaluate and analyze the performance of XL-MIMO in realistic propagation scenarios, this paper discloses a VR spatial distribution dataset for XL-MIMO systems, constructed through environmental parameter setting, ray-tracing simulation, field-strength data preprocessing, and VR determination. For typical urban scenarios, the dataset links user locations, field-strength data, and VR data, comprising hundreds of millions of entries. Furthermore, the VR distribution is visualized and analyzed, and a VR-based XL-MIMO user access protocol is taken as an example use case, with its performance evaluated on the proposed VR dataset.
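A minimal sketch of the VR-determination step described above, assuming VRs are obtained by thresholding the preprocessed field-strength data per antenna subarray; the array layout, threshold value, and data shapes are illustrative assumptions, not the dataset's actual format.

```python
# Minimal sketch (not the paper's exact pipeline): determine Visibility Regions
# by thresholding ray-traced field-strength data. The threshold and data layout
# are illustrative assumptions.
import numpy as np

def determine_vr(field_strength_dbm: np.ndarray, threshold_dbm: float = -100.0) -> np.ndarray:
    """field_strength_dbm: (num_users, num_subarrays) received field strength.
    Returns a boolean VR mask: True if a subarray is 'visible' to a user."""
    return field_strength_dbm >= threshold_dbm

# Example: 4 users, 8 XL-MIMO subarrays with random field strengths.
rng = np.random.default_rng(0)
fs = rng.uniform(-130.0, -70.0, size=(4, 8))
vr_mask = determine_vr(fs)
print(vr_mask.astype(int))  # 1 marks subarrays inside each user's VR
```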
2024, 46(8): 3073-3093.
doi: 10.11999/JEIT231255
Abstract:
Forward Scatter Radar (FSR) can obtain a high Radar Cross Section (RCS), so it plays an important role in anti-stealth applications. As a radiation source, the Global Navigation Satellite System (GNSS) offers all-weather, around-the-clock coverage, and a ground/sea/air target surveillance network can be built by deploying multiple receiving nodes. Based on the development status of GNSS-based FSR, the key technologies and open problems in target detection, target parameter estimation, Shadow Inverse Synthetic Aperture Radar (SISAR) imaging, and target classification are summarized. Moreover, the development trends of GNSS-based FSR are discussed from the aspects of network detection, multi-target localization, station optimization, and polarization information acquisition.
2024, 46(8): 3094-3116.
doi: 10.11999/JEIT231222
Abstract:
Time Series Classification (TSC) is one of the most important and challenging tasks in the field of data mining. Deep learning techniques have achieved revolutionary progress in natural language processing and computer vision, and have also demonstrated great potential in areas such as time series analysis. A detailed review of the latest research advances in deep learning-based TSC is provided in this paper. Firstly, key terms and related concepts are defined. Secondly, the latest time series classification models are categorized from four network-architecture perspectives: multilayer perceptrons, convolutional neural networks, recurrent neural networks, and attention mechanisms, along with their respective advantages and limitations. Additionally, the latest developments and challenges of time series classification in human activity recognition and electroencephalogram-based emotion recognition are outlined. Finally, unresolved issues and future research directions in applying deep learning to time series data are discussed. This paper provides researchers with a reference for understanding the latest research dynamics, new technologies, and development trends in deep learning-based time series classification.
2024, 46(8): 3117-3125.
doi: 10.11999/JEIT231324
Abstract:
To address the limitations of joint beamforming methods based on channel prior knowledge, which are constrained in diverse Vehicle-to-Infrastructure (V2I) communication scenes and suffer from the large overhead of channel estimation, a wireless propagation link prediction-based joint beamforming method assisted by environmental situation awareness is proposed in this paper. Firstly, a Reconfigurable Intelligent Surface (RIS)-assisted mmWave communication system model for V2I networks is established using a ray tracer. To build a dataset, diverse wireless propagation link data are obtained by varying the environmental situation. Then, this dataset is used to train a machine learning-based wireless propagation link prediction model. Finally, the joint beamforming problem under the maximum transmission power constraint is modeled, and, based on the prediction outcome, the beamforming matrix of the base station and the phase-shift matrix of the RIS are optimized using an Alternating Iterative Optimization Algorithm (AIOA) to maximize the minimum Signal to Interference plus Noise Ratio (SINR) among simultaneously communicating vehicle users. Simulation results validate the effectiveness of the proposed method. Being driven by non-channel prior knowledge reduces the channel estimation overhead and improves the feasibility of applying the proposed method to V2I scenes.
2024, 46(8): 3126-3135.
doi: 10.11999/JEIT231385
Abstract:
The challenges of scarce and unevenly allocated communication resources in multi-user communication networks are investigated in this article, with a focus on an Unmanned Aerial Vehicle (UAV)-assisted multi-user downlink network using Rate Splitting Multiple Access (RSMA). In complex real-world communication environments, interference from unwanted signals due to frequency reuse is inevitable. Considering the co-channel interference affecting data transmission between the UAV and the user nodes over Nakagami-m fading channels, exact closed-form expressions for the outage probability and channel capacity of the system are derived. It is demonstrated that, in the high Signal-to-Noise Ratio (SNR) regime, the diversity order is 0 as a result of co-channel interference. Moreover, under the same spatial model, the system performance of the RSMA scheme is superior to that of the Non-Orthogonal Multiple Access (NOMA) scheme. As the flight speed of the UAV increases, the probability of establishing line-of-sight links for ground-to-air communication decreases, degrading the system outage performance. Therefore, effective trade-offs for the overall UAV communication system need to be made by jointly considering the UAV flight speed, multiple access method, system performance, and communication connectivity, in order to meet the practical communication needs of users.
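A minimal Monte Carlo sketch related to the outage analysis above, estimating the outage probability of a single link under Nakagami-m fading with one co-channel interferer; the fading parameters, power levels, and rate threshold are illustrative assumptions, and the paper's closed-form expressions are not reproduced here.

```python
# Minimal Monte Carlo sketch (not the paper's closed-form derivation): estimate the
# outage probability of a link under Nakagami-m fading with one co-channel interferer.
import numpy as np

def outage_probability(m_s=2.0, m_i=2.0, snr_db=20.0, inr_db=10.0,
                       rate_threshold=1.0, n_trials=200_000, seed=0):
    rng = np.random.default_rng(seed)
    # Nakagami-m envelope => Gamma-distributed power with shape m and unit mean.
    g_s = rng.gamma(shape=m_s, scale=1.0 / m_s, size=n_trials)  # desired link
    g_i = rng.gamma(shape=m_i, scale=1.0 / m_i, size=n_trials)  # interferer
    snr = 10 ** (snr_db / 10) * g_s
    inr = 10 ** (inr_db / 10) * g_i
    sinr = snr / (inr + 1.0)                     # unit noise power
    capacity = np.log2(1.0 + sinr)               # achievable rate (bit/s/Hz)
    return np.mean(capacity < rate_threshold)    # outage event: rate below threshold

print(f"estimated outage probability: {outage_probability():.4f}")
```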
2024, 46(8): 3136-3145.
doi: 10.11999/JEIT231423
Abstract:
In response to the challenge of ensuring positioning accuracy in environments where the Global Navigation Satellite System (GNSS) is denied, a positioning scheme based on opportunistic New Radio (NR) signals is devised and an Interference Cancellation Subspace Pursuit (ICSP) algorithm is proposed in this paper. The algorithm addresses the inadequate precision of positioning observation extraction caused by co-channel interference within Ultra-Dense Networks (UDNs) and Heterogeneous Networks (HetNets). The effectiveness of the ICSP algorithm in optimizing the performance of 5G opportunistic signal receivers and enhancing positioning accuracy in complex network environments is validated through simulation experiments and semi-physical simulations using a Universal Software Radio Peripheral (USRP).
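For context, a minimal sketch of a standard Subspace Pursuit recovery step, the greedy sparse-recovery family that ICSP builds on; the interference-cancellation stage of ICSP is not reproduced, and the measurement model below is purely illustrative.

```python
# Minimal sketch of standard Subspace Pursuit (SP); ICSP adds an interference-
# cancellation stage that is not reproduced here.
import numpy as np

def subspace_pursuit(A, y, k, n_iter=10):
    """Recover a k-sparse x from y = A @ x (A: m x n, columns unit-norm)."""
    n = A.shape[1]
    support = np.argsort(np.abs(A.conj().T @ y))[-k:]          # initial support
    x = np.zeros(n, dtype=A.dtype)
    for _ in range(n_iter):
        r = y - A[:, support] @ np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        candidates = np.union1d(support, np.argsort(np.abs(A.conj().T @ r))[-k:])
        coef, *_ = np.linalg.lstsq(A[:, candidates], y, rcond=None)
        support = candidates[np.argsort(np.abs(coef))[-k:]]     # keep k best atoms
    x[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    return x

# Example: recover a 3-sparse vector from noisy compressed measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100)); A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(100); x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
y = A @ x_true + 0.01 * rng.standard_normal(40)
print(np.flatnonzero(np.abs(subspace_pursuit(A, y, 3)) > 0.1))  # expected: [ 5 40 77]
```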
2024, 46(8): 3146-3154.
doi: 10.11999/JEIT231240
Abstract:
To handle large-scale, diverse, and time-evolving data and machine learning tasks in industrial production processes, a Federated Incremental Learning (FIL) and optimization method based on information entropy is proposed in this paper. Within the federated framework, local computing nodes train the model on local data and compute the average entropy, which is transmitted to the server to assist in identifying class-incremental tasks. The global server then selects local nodes for the current training round based on the reported average entropy, decides whether a task increment has occurred, and performs global model deployment and aggregation updates. The proposed method combines the average entropy with thresholds for node selection in various situations, achieving stable model learning under low average entropy and incremental model expansion under high average entropy. Additionally, convex optimization is employed to adaptively adjust the aggregation frequency and resource allocation in resource-constrained scenarios, ultimately achieving effective model convergence. Simulation results demonstrate that the proposed method accelerates model convergence and enhances training accuracy in different scenarios.
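A minimal sketch of the average-entropy statistic a local node might report, with a hypothetical threshold rule for flagging a class-incremental task; the softmax-output format and the threshold value are assumptions for illustration.

```python
# Minimal sketch: compute the average prediction entropy a local node would report,
# and a hypothetical threshold rule the server might use (threshold is illustrative).
import numpy as np

def average_entropy(probs: np.ndarray, eps: float = 1e-12) -> float:
    """probs: (num_samples, num_classes) softmax outputs of the local model."""
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)   # per-sample entropy
    return float(entropy.mean())

def is_class_incremental(avg_entropy: float, threshold: float = 1.0) -> bool:
    # High average entropy suggests samples the current model cannot explain,
    # i.e. a possible new class.
    return avg_entropy > threshold

probs = np.full((8, 4), 0.25)                 # maximally uncertain predictions
print(average_entropy(probs), is_class_incremental(average_entropy(probs)))
```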
2024, 46(8): 3155-3164.
doi: 10.11999/JEIT231235
Abstract:
A hybrid active-passive Reconfigurable Intelligent Surface (RIS) and Artificial Noise (AN) based transmission scheme is proposed for secure communication in RIS-assisted wireless communication systems. Aiming at maximizing the secrecy rate, a joint optimization problem over the transmit beamforming and AN vectors of the base station and the reflecting coefficient matrix of the RIS is formulated. Then, the Alternating Optimization (AO) method, the weighted Minimum Mean Square Error (MMSE) algorithm, and the semi-definite relaxation algorithm are employed to solve this non-convex optimization problem with highly coupled variables. Simulation results show that the proposed hybrid RIS and AN based scheme can efficiently improve the secrecy rate of the considered system and overcome the secrecy rate loss caused by the "double fading" effect of the passive RIS. Compared with a fully active RIS, the proposed scheme achieves higher energy efficiency.
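A minimal sketch of the secrecy-rate quantity being maximized above, evaluated for randomly drawn channels and a passive unit-modulus RIS; the channel dimensions and beamformer are illustrative assumptions, and the artificial-noise and active-element aspects of the proposed scheme are omitted.

```python
# Minimal sketch: secrecy rate of a RIS-assisted link as the positive gap between
# the legitimate user's and the eavesdropper's achievable rates (AN omitted).
import numpy as np

def secrecy_rate(w, theta, G, h_ru, h_d, g_ru, g_d, noise=1.0):
    """w: (Nt,) transmit beamformer; theta: (N,) RIS coefficients;
    G: (N, Nt) BS-to-RIS channel; h_ru/g_ru: (N,) RIS-to-user/eve channels;
    h_d/g_d: (Nt,) direct BS-to-user/eve channels."""
    h_eff = h_d + (h_ru * theta) @ G          # cascaded + direct channel (user)
    g_eff = g_d + (g_ru * theta) @ G          # cascaded + direct channel (eavesdropper)
    r_user = np.log2(1 + np.abs(h_eff @ w) ** 2 / noise)
    r_eve = np.log2(1 + np.abs(g_eff @ w) ** 2 / noise)
    return max(r_user - r_eve, 0.0)           # secrecy rate in bit/s/Hz

rng = np.random.default_rng(2)
Nt, N = 4, 16
G = (rng.standard_normal((N, Nt)) + 1j * rng.standard_normal((N, Nt))) / np.sqrt(2)
h_ru, g_ru = [(rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2) for _ in range(2)]
h_d, g_d = [(rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)) / np.sqrt(2) for _ in range(2)]
theta = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # passive unit-modulus phase shifts
w = np.ones(Nt) / np.sqrt(Nt)                       # uniform-power beamformer
print(f"secrecy rate: {secrecy_rate(w, theta, G, h_ru, h_d, g_ru, g_d):.3f} bit/s/Hz")
```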
2024, 46(8): 3165-3173.
doi: 10.11999/JEIT231291
Abstract:
The study of information dissemination models is an important component of the Internet of Things (IoT) field, helping to improve the performance and efficiency of IoT systems and promoting the further development of IoT technology. In response to the complex and unstable factors affecting information dissemination in IoT communication, a double-layer coupled information dissemination model, SIVR-UAD (Susceptible, Infection, Variant, Recovered-Unknown, Aware, Disinterest), is proposed, which analyzes the impact of devices and users in different states on information dissemination in the IoT. Six coupling states are established, and a Markov method is used to analyze the state transitions of the coupled nodes and find the information dissemination equilibrium point. Finally, the uniqueness and stability of the model's equilibrium point are proved through theoretical analysis. Simulation results show that, under three different initial numbers of coupled nodes, the numbers of the six types of coupled nodes in the SIVR-UAD model always converge to the same stable levels, confirming the equilibrium point and stability of the model.
2024, 46(8): 3174-3183.
doi: 10.11999/JEIT231165
Abstract:
To mitigate the data heterogeneity caused by fully overlapping attribute skew between clients in Federated Learning (FL), a locally adaptive FL algorithm that incorporates channel-personalized normalization is proposed in this paper. Specifically, an FL model oriented to data attribute skew is constructed, and a series of random augmentation operations are performed on the client's image dataset before training begins. Next, each client computes the mean and standard deviation of its dataset separately for each color channel to achieve channel-personalized normalization. Furthermore, a locally adaptive update FL algorithm is designed, in which the global model and the local model are adaptively aggregated for local initialization. The uniqueness of this aggregation method is that it not only retains the personalized characteristics of the client model but also captures the necessary information in the global model to improve generalization performance. Finally, experimental results demonstrate that the proposed algorithm obtains competitive convergence speed compared with existing representative works, with accuracy 3%~19% higher.
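A minimal sketch of channel-personalized normalization as described above, assuming an NHWC image layout with values in [0, 1]; each client normalizes with its own per-color-channel statistics.

```python
# Minimal sketch (assumed NHWC image layout): each client computes its own
# per-color-channel mean/std and normalizes with those client-specific statistics.
import numpy as np

def channel_personalized_normalize(images: np.ndarray, eps: float = 1e-6):
    """images: (N, H, W, C) array holding one client's local dataset."""
    mean = images.mean(axis=(0, 1, 2))                  # per-channel mean
    std = images.std(axis=(0, 1, 2)) + eps              # per-channel std
    return (images - mean) / std, mean, std

client_images = np.random.default_rng(3).uniform(0, 1, size=(32, 28, 28, 3))
normalized, mean, std = channel_personalized_normalize(client_images)
print(mean.round(3), std.round(3), normalized.mean(axis=(0, 1, 2)).round(6))
```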
2024, 46(8): 3184-3192.
doi: 10.11999/JEIT231232
Abstract:
Narrowband radar is widely used in air defense guidance due to its low cost and long operating range. With the development of high-speed mobile platforms, traditional target recognition methods based on feature modeling of long-term observation echo sequences are no longer applicable. To address the poor feature discrimination ability of narrowband radar on Observe Echoes for a Short period of Time (OEST) sequences and its susceptibility to decoy target interference, which lead to unreliable recognition results, a narrowband radar OEST-sequence air target recognition method using multi-feature adaptive fusion is proposed in this paper. Firstly, the encoder and classification layers are constructed with channel-spatial attention modules and trained to adaptively enhance features with high separability. Then, a maximum-margin orthogonal loss function is proposed to increase the feature spacing between different classes, reduce the feature spacing within the same class, and make the feature vectors of different classes orthogonal. Finally, the parameters of the encoder and classification layers are fixed, and the decoder layer is trained with a reconstruction loss to ensure that the model can accurately identify decoy targets. With an observation sequence length of 100, the classification accuracy and discrimination rate in the experiments reach 94.37% and 96.78%, respectively. The proposed method can therefore effectively improve the classification performance of narrowband radar and its ability to discriminate decoy targets, thereby improving the reliability of recognition results.
2024, 46(8): 3193-3201.
doi: 10.11999/JEIT231335
Abstract:
Due to the non-uniform ground clutter observed by the forward-looking array of airborne weather radar, it is difficult to obtain enough independent and identically distributed samples, which affects the accurate estimation of the clutter covariance matrix and hence the wind speed estimation. In this paper, a novel low-altitude wind shear speed estimation method based on convolutional neural network STAP is proposed, which realizes high-resolution clutter space-time spectrum estimation with a small number of samples. First, the high-resolution clutter space-time spectrum convolutional neural network is trained; then the clutter covariance matrix is calculated from the estimated spectrum, and the optimal weight vector of the convolutional neural network STAP is computed for clutter suppression, so as to accurately estimate the low-altitude wind shear speed. The sparse recovery problem is solved by the convolutional neural network in the small-sample case, and the high-resolution clutter space-time spectrum is effectively estimated. Simulation results show that the proposed method can effectively estimate the space-time spectrum and complete the wind speed estimation.
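A minimal sketch of the classical STAP weighting step the method relies on once the clutter covariance matrix is available, w = R^{-1}s / (s^H R^{-1} s); the covariance matrix and steering vector below are randomly generated placeholders.

```python
# Minimal sketch: compute the optimal STAP weight vector from a clutter covariance
# matrix R and a space-time steering vector s. R here is random and illustrative only.
import numpy as np

def stap_weights(R: np.ndarray, s: np.ndarray) -> np.ndarray:
    Rinv_s = np.linalg.solve(R, s)                 # R^{-1} s without explicit inverse
    return Rinv_s / (s.conj() @ Rinv_s)            # distortionless response toward s

dim = 32                                           # stacked (elements x pulses) dimension
rng = np.random.default_rng(4)
A = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
R = A @ A.conj().T + np.eye(dim)                   # Hermitian positive-definite placeholder
s = np.exp(1j * 2 * np.pi * rng.uniform(size=dim)) / np.sqrt(dim)
w = stap_weights(R, s)
print(abs(w.conj() @ s))                           # ~1: unit gain in the look direction
```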
2024, 46(8): 3202-3209.
doi: 10.11999/JEIT231367
Abstract:
A sparse Bayesian estimation method for the spatial Radio Frequency Interference (RFI) of synthetic aperture microwave radiometers is proposed in this paper. Firstly, an interferometric measurement model of the visibility function for synthetic aperture microwave radiometers is established. The observed data are expressed as the product of the observation matrix, formed by the aperture-synthesis antenna baseline-correlated steering vectors, and the brightness temperature of the field of view. Due to the orthogonality of the observation matrix and the sparsity of the RFI spatial angular distribution, the transform coefficients of the brightness temperature in the support domain are sparse. Under the Sparse Bayesian Learning (SBL) framework, the brightness temperature is sparsely reconstructed. Notably, the method achieves high reconstruction performance without prior information on sparsity or regularization parameters. The effectiveness of the method is verified through computer simulations.
2024, 46(8): 3210-3218.
doi: 10.11999/JEIT231374
Abstract:
To address the inadequate performance of Radio Environment Map (REM) construction in complex scenarios containing obstacles that electromagnetic waves cannot penetrate, and the arbitrary selection of interpolation neighborhoods in Inverse Distance Weighted (IDW) interpolation, a Voronoi-based Inverse Obstacle Distance Weighted algorithm (VIODW) is proposed in this paper. The algorithm adaptively defines an interpolation neighborhood for each interpolation point by creating obstacle-aware Voronoi diagrams for numerical computation. Then, the ANY-Angle (ANYA) algorithm is used to calculate the obstacle distance between the interpolation point and each monitoring station within the interpolation neighborhood. Finally, the value at the point is obtained as a weighted mean, with the inverse power of the obstacle distance as the weight, achieving high-precision REM construction in complex scenarios. Both theoretical analysis and simulation results demonstrate that the method offers excellent construction accuracy and can accurately model the power distribution of electromagnetic waves in complex scenarios, providing an effective approach for high-precision REM construction.
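A minimal sketch of the final weighted-mean step of VIODW, assuming the Voronoi neighborhood and the ANYA obstacle distances have already been computed; the weight exponent and sample values are illustrative.

```python
# Minimal sketch of the inverse-obstacle-distance weighted mean used as the final
# interpolation step (neighborhood selection and ANYA distances assumed given).
import numpy as np

def inverse_obstacle_distance_weighted(values, obstacle_distances, p=2.0, eps=1e-9):
    """values: powers (dBm) measured at the neighborhood's monitoring stations;
    obstacle_distances: around-obstacle path lengths to the interpolation point."""
    values = np.asarray(values, dtype=float)
    d = np.asarray(obstacle_distances, dtype=float)
    if np.any(d < eps):                     # a station coincides with the point
        return float(values[np.argmin(d)])
    weights = 1.0 / d ** p                  # inverse power of the obstacle distance
    return float(np.sum(weights * values) / np.sum(weights))

print(inverse_obstacle_distance_weighted([-70.0, -85.0, -92.0], [12.0, 30.0, 55.0]))
```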
2024, 46(8): 3219-3227.
doi: 10.11999/JEIT231376
Abstract:
To address the problems that the maximum likelihood DOA estimation algorithm requires a multi-dimensional search, is computationally intensive, and suffers from grid mismatch, an off-grid alternating projection maximum likelihood algorithm based on Taylor expansion is proposed. Firstly, the alternating projection method is used to transform the multi-dimensional search into multiple one-dimensional searches, yielding coarse estimates on a preset coarse grid. Then, the one-dimensional cost function is expanded in a second-order Taylor series around the coarse estimate using matrix derivative theory. Finally, by setting the derivative of the second-order Taylor expansion to zero, a closed-form solution for the off-grid offset is obtained. Compared with the alternating projection maximum likelihood algorithm, the proposed algorithm breaks through the limitation of the search grid size: it effectively reduces the number of grid points to be evaluated while maintaining accuracy, and improves computational efficiency. Simulation results show the effectiveness of the algorithm.
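In generic notation (not the paper's exact symbols), the closed-form off-grid refinement described above takes the familiar Newton-like form: with $L(\theta)$ the one-dimensional cost function and $\theta_0$ the on-grid coarse estimate,

$$
L(\theta_0+\delta) \approx L(\theta_0) + L'(\theta_0)\,\delta + \tfrac{1}{2}L''(\theta_0)\,\delta^2,
\qquad
\frac{\mathrm{d}}{\mathrm{d}\delta}\!\left[L(\theta_0)+L'(\theta_0)\,\delta+\tfrac{1}{2}L''(\theta_0)\,\delta^2\right]=0
\;\Rightarrow\;
\delta^{\star}=-\frac{L'(\theta_0)}{L''(\theta_0)},
$$

so the refined off-grid estimate is $\hat{\theta}=\theta_0+\delta^{\star}$, obtained without any additional fine-grid search.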
2024, 46(8): 3228-3237.
doi: 10.11999/JEIT231139
Abstract:
For Semi-Coprime Arrays (SCA), the performance of classical Direction of Arrival (DoA) estimation algorithms degrades in the presence of coherent adjacent sources. To address this problem, a high-precision DoA estimation method for SCA is proposed. Firstly, the array is divided into three subarrays (subarrays 1 to 3), and a conventional beamforming algorithm is applied to obtain the output signal of each subarray. Then, the output signal of subarray 3 is weighted and added to the output signals of subarrays 1 and 2 to construct a sum beam, while the difference between the output signals of subarrays 1 and 2 is used to construct a difference beam. Finally, the final output signal is obtained from the sum-beam and difference-beam signals, and the azimuth spectrum is the power of this final output. The method exploits the structural characteristics of SCA to construct the sum and difference beams, fully utilizing the overlapping sensors of the three subarrays to improve estimation accuracy. Simulations and lake experiments validate the effectiveness of the proposed method for DoA estimation with SCA. The proposed method performs better than existing approaches, such as Minimum Variance Distortionless Response (MVDR) and Min Processing (MP), when facing adjacent coherent sources.
2024, 46(8): 3238-3245.
doi: 10.11999/JEIT231236
Abstract:
To achieve radar emitter identification unaffected by signal parameters and modulation methods, a method based on a Dual Radio Frequency Fingerprint Convolutional Neural Network (Dual RFF-CNN2) and feature fusion is proposed in this paper. Firstly, Raw In-phase/Quadrature (Raw-I/Q) signals are extracted from the received radio frequency signals. Secondly, Axially Integral Bispectrum (AIB) and Square Integral Bispectrum (SIB) dimensionality reduction are performed separately on the Raw-I/Q signals to construct the bispectrum integration matrix. Finally, both the Raw-I/Q signals and the bispectrum integration matrix are fed into the Dual RFF-CNN2 network for feature fusion to achieve radar emitter identification. Experimental results demonstrate that this method achieves high identification accuracy, and the extracted "fingerprint features" exhibit stability and robustness.
2024, 46(8): 3246-3255.
doi: 10.11999/JEIT231421
Abstract:
Non-Line-Of-Sight (NLOS) propagation causes pseudo-range measurement errors in Global Navigation Satellite System (GNSS) receivers and eventually leads to large positioning errors, which is especially prominent in complex environments such as urban canyons. To solve this problem, a robust positioning method based on Kernel Density Estimation (KDE) is proposed. The core idea is to introduce robust estimation theory into the positioning solution to alleviate the influence of NLOS. Considering that pseudo-range observation errors caused by NLOS deviate from the Gaussian distribution, the proposed method first uses KDE to estimate the probability density function of the observation error, and then uses this probability density function to construct a robust cost function for the positioning solution, thereby mitigating positioning errors caused by NLOS. Experimental results show that the proposed method effectively reduces GNSS positioning errors in NLOS propagation environments.
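A minimal sketch of the KDE-based robust cost idea, using synthetic pseudo-range residuals rather than real GNSS data: the negative log of the estimated density grows far more slowly for heavy-tailed NLOS-like errors than a Gaussian (squared) cost does.

```python
# Minimal sketch: fit a kernel density to pseudo-range residuals and use the negative
# log-density as a robust cost that down-weights heavy-tailed NLOS outliers.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
# Training residuals: mostly Gaussian receiver noise plus a positive-biased NLOS tail.
residuals = np.concatenate([rng.normal(0.0, 2.0, 900),
                            rng.normal(15.0, 10.0, 100)])
kde = gaussian_kde(residuals)

def robust_cost(r: np.ndarray) -> np.ndarray:
    return -np.log(kde(r) + 1e-12)            # negative log-likelihood per residual

test = np.array([0.0, 5.0, 30.0])
print(robust_cost(test))                       # grows slowly for large NLOS-like errors
print(0.5 * (test / 2.0) ** 2)                 # Gaussian cost grows quadratically
```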
2024, 46(8): 3256-3266.
doi: 10.11999/JEIT240142
Abstract:
Unsupervised Continual Learning (UCL) refers to the ability to learn over time while remembering previous patterns without supervision. Although significant progress has been made in this direction, existing works often assume strong prior knowledge about forthcoming data (e.g., known class boundaries), which may not be obtainable in complex and unpredictable open environments. Inspired by real-world scenarios, a more practical problem setting called unsupervised online continual learning without prior knowledge is proposed in this paper. This setting is challenging because the data are non-i.i.d. and lack external supervision or prior knowledge. To address these challenges, a method called EvolveNet is introduced, an adaptive unsupervised continual learning approach capable of purely extracting and memorizing representations from data streams. EvolveNet is designed around three main components: an adversarial pseudo-supervised learning loss, a self-supervised forgetting loss, and an online memory update with uniform subset selection, which are designed to work in synergy and maximize learning performance. Comprehensive experiments with EvolveNet are conducted on five public datasets. The results show that EvolveNet outperforms existing algorithms in all settings, achieving significantly improved accuracy on the CIFAR-10, CIFAR-100, and TinyImageNet datasets, and performing best on the multimodal datasets Core-50 and iLab-20M for incremental learning. Cross-dataset generalization experiments further demonstrate EvolveNet's robustness. Finally, the EvolveNet model and core code are open-sourced on GitHub, facilitating progress in unsupervised continual learning and providing a useful tool and platform for the research community.
2024, 46(8): 3267-3275.
doi: 10.11999/JEIT231290
Abstract:
In 3D maneuvering target tracking, unknown priors and coordinate-coupling errors can cause model-mode mismatch and biased state estimation. In this paper, the state transition matrices are modified based on the target velocity-orthogonality condition, the spherical feasible domain is approximated using primal-dual regularization, and an adaptive turn rate model is combined within the Unscented Kalman Filtering (UKF) framework to estimate the model-conditioned state, attaining consistent output processing; a 3D Variable Structure Multi-Model UKF (VSMMUKF) algorithm is thus derived. Simulation results show that, compared with the Multimode Importance UKF (MIUKF) algorithm, VSMMUKF fits the maneuvering motion of a 3D point target more accurately at comparable computational complexity; compared with the Interactive Multi-model Maximum Minimum Particle Filtering (IMM-MPF) algorithm, the filtering accuracy of VSMMUKF for tracking a fixed-wing Unmanned Aerial Vehicle (UAV) is improved by 2.8%~59.9%, and the overall computational burden is reduced by an order of magnitude.
2024, 46(8): 3276-3284.
doi: 10.11999/JEIT231264
Abstract:
Time series contain long-term dependencies, such as trends, seasonality, and periodicity, that may span several months, and existing methods are insufficient for modeling these long-term dependencies explicitly. To address this issue, this paper proposes a Statistical Feature-based Search method for multivariate time series Forecasting (SFSF). First, statistical features including smoothed values, variance, and interval-standardized values are extracted from the multivariate time series to enhance the perception of trends and periodicity. Next, these statistical features are used to search for similar sequences in the historical data. The current and historical sequence information is then blended using an attention mechanism to produce accurate predictions. Experimental results show that the SFSF method outperforms six state-of-the-art methods.
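A minimal sketch of the feature-then-search idea described above; the specific statistics (window mean, variance, interval-standardized endpoint) and the Euclidean retrieval are illustrative stand-ins for the paper's exact features and attention-based blending.

```python
# Minimal sketch: summarize a window of a multivariate series with simple statistics,
# then retrieve the most similar historical window in that feature space.
import numpy as np

def statistical_features(window: np.ndarray) -> np.ndarray:
    """window: (length, num_variables). Returns one feature vector per window."""
    smoothed = window.mean(axis=0)                                 # smoothed level
    variance = window.var(axis=0)                                  # dispersion
    span = window.max(axis=0) - window.min(axis=0) + 1e-9
    standardized_last = (window[-1] - window.min(axis=0)) / span   # interval-standardized endpoint
    return np.concatenate([smoothed, variance, standardized_last])

def most_similar_history(current: np.ndarray, history: np.ndarray, length: int) -> int:
    """history: (T, num_variables). Returns the start index of the closest past window."""
    q = statistical_features(current)
    starts = range(history.shape[0] - length + 1)
    dists = [np.linalg.norm(statistical_features(history[s:s + length]) - q) for s in starts]
    return int(np.argmin(dists))

rng = np.random.default_rng(6)
history = rng.standard_normal((500, 3)).cumsum(axis=0)   # synthetic multivariate series
current = history[-48:]                                   # the most recent window
print(most_similar_history(current, history[:-48], length=48))
```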
2024, 46(8): 3285-3294.
doi: 10.11999/JEIT231180
Abstract:
To address the problem that shape correspondence computation for non-isometric 3D point clouds is easily affected by large-scale distortions, often leading to distorted correspondences, low accuracy, and poor smoothness, a new shape correspondence algorithm for non-isometric 3D point clouds is proposed that combines smooth attention with spectral upsampling refinement. Firstly, a smooth attention mechanism and a smooth perception module are designed using the geometric feature information of the surface on which the points lie, improving the features' ability to perceive non-rigid transformations in large-scale deformation areas. Secondly, the deep functional maps module is combined with smooth regularization constraints to improve the smoothness of the functional map computation. Finally, the point-wise mapping is obtained using a multi-resolution reconstruction method in the spectral upsampling refinement module. Experimental results show that, compared with existing algorithms, the proposed algorithm yields the smallest geodesic error in the correspondences constructed on the FAUST, SCAPE, and SMAL datasets, and improves the smoothness and global accuracy of point-wise mapping for shapes with large-scale deformation.
2024, 46(8): 3295-3304.
doi: 10.11999/JEIT230945
Abstract:
For the trajectory prediction problem of drifting buoys, an end-to-end prediction model based on a deep learning framework is proposed in this paper. The hydrodynamic models of different sea areas differ considerably, and the calculation of the fluid load on floating buoys at the sea surface is complicated. Therefore, a more universal, data-driven trajectory prediction model based on the multidimensional time series formed by the historical trajectories of drifting buoys is proposed. In this model, Particle Swarm Optimization (PSO) is combined with a Gated Recurrent Unit (GRU) network, and PSO is used to initialize the hyperparameters of the GRU neural network. The optimal drifting buoy trajectory prediction model is obtained after multiple migration iterations of training. Finally, the method is verified with real drifting buoy track data from the North Atlantic. The results show that the PSO-GRU algorithm achieves accurate drifting buoy trajectory prediction.
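A minimal sketch of PSO-based hyperparameter initialization; in the paper the objective would be the GRU's validation loss for a candidate (hidden size, learning rate), while here a smooth stand-in objective keeps the sketch self-contained.

```python
# Minimal PSO sketch for hyperparameter initialization. The objective below is a
# stand-in for the GRU validation loss the paper would evaluate.
import numpy as np

def pso_minimize(objective, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    x = rng.uniform(lo, hi, size=(n_particles, len(bounds)))      # particle positions
    v = np.zeros_like(x)                                          # particle velocities
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iters):
        r1, r2 = rng.uniform(size=x.shape), rng.uniform(size=x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

# Stand-in objective: pretend validation loss is minimized near hidden_size~64, lr~0.01.
surrogate = lambda p: (p[0] - 64.0) ** 2 / 1e3 + (np.log10(p[1]) + 2.0) ** 2
best, loss = pso_minimize(surrogate, bounds=[(16, 256), (1e-4, 1e-1)])
print(f"hidden_size~{best[0]:.0f}, learning_rate~{best[1]:.4f}, loss={loss:.4f}")
```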
2024, 46(8): 3305-3313.
doi: 10.11999/JEIT231283
Abstract:
To better leverage complementary information from infrared and visible light images and generate fused images that align with human perception characteristics, a two-stage training strategy is proposed to obtain a novel infrared-visible image fusion Network based on pre-trained fixed Parameters and Deep feature modulation (PDNet). Specifically, in the self-supervised pre-training stage, a large dataset of clear natural images is used as both input and target for the UNet backbone, and pre-training is carried out as an autoencoder, so that the resulting encoder can proficiently extract multi-scale deep features from the input image while the decoder can faithfully reconstruct it with minimal deviation. In the unsupervised fusion training stage, the pre-trained encoder and decoder parameters remain fixed, and a fusion module with a Transformer structure is introduced between them. Within the Transformer, the multi-head self-attention mechanism rationally allocates weights to the deep features extracted by the encoder from the infrared and visible light images, fusing and modulating the multi-scale deep image features into the manifold space of deep features of clear natural images, thereby ensuring the visual quality of the fused image after reconstruction by the decoder. Extensive experimental results demonstrate that, compared with current mainstream fusion models (algorithms), the proposed PDNet exhibits substantial advantages across various objective evaluation metrics, and in subjective visual evaluations it aligns more closely with human visual perception.
2024, 46(8): 3314-3323.
doi: 10.11999/JEIT231161
Abstract:
Currently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in zero-shot 3D shape classification. However, there is a large modality gap between 3D shapes and texts, which limits further improvement of classification accuracy. To address this problem, a zero-shot 3D shape classification method based on semantic-enhanced CLIP is proposed in this paper. Firstly, 3D shapes are represented as multiple views. Then, to improve the recognition of unknown categories in zero-shot learning, a semantic descriptive text for each view and its corresponding category is obtained through a visual language generative model, serving as a semantic bridge between views and category prompt texts. The semantic descriptive texts are obtained through image captioning and visual question answering. Finally, the fine-tuned semantic encoder concretizes the semantic descriptive texts into semantic descriptions of each category, which carry rich semantic information and strong interpretability and effectively reduce the semantic gap between views and category prompt texts. Experiments show that the proposed method outperforms existing zero-shot classification methods on the ModelNet10 and ModelNet40 datasets.
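The sketch below illustrates a generic zero-shot scoring step in which view embeddings are matched against category text embeddings enriched with per-category semantic descriptions. The encoders are random stand-ins for the CLIP image/text encoders, and the descriptions, prompts, and aggregation rule are assumptions rather than the paper's pipeline.

```python
# Hypothetical zero-shot scoring: aggregate view embeddings vs. semantic-enhanced class texts.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, num_views, categories = 64, 6, ["chair", "table", "lamp"]

def image_encoder(views):             # stand-in: (V, 3, H, W) -> (V, D) embeddings
    return torch.randn(views.shape[0], D)

def text_encoder(texts):              # stand-in: list[str] -> (len(texts), D) embeddings
    return torch.randn(len(texts), D)

# Per-category semantic descriptions, e.g. produced by captioning / visual question answering.
descriptions = {
    "chair": "a seat with a backrest and four legs",
    "table": "a flat surface supported by legs",
    "lamp":  "a light source on a stand with a shade",
}

views = torch.rand(num_views, 3, 224, 224)                  # rendered views of one shape
v = F.normalize(image_encoder(views).mean(0), dim=0)        # aggregate view embedding

prompt_emb = F.normalize(text_encoder([f"a 3D model of a {c}" for c in categories]), dim=-1)
desc_emb = F.normalize(text_encoder([descriptions[c] for c in categories]), dim=-1)
class_emb = F.normalize(prompt_emb + desc_emb, dim=-1)      # semantic-enhanced class text

logits = 100.0 * v @ class_emb.t()                          # CLIP-style scaled cosine similarity
print("predicted category:", categories[int(logits.argmax())])
```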
2024, 46(8): 3324-3333.
doi: 10.11999/JEIT231170
Abstract:
To comprehensively explore the information contained in camouflaged target features, leverage the potential of target detection algorithms, and address issues such as low camouflaged target detection accuracy and high false positive rates, a camouflaged target detection algorithm named CAFM-YOLOv5 (Cross Attention Fusion Module based on YOLOv5) is proposed. Firstly, a camouflaged target multispectral dataset is constructed for performance validation of the multimodal image fusion method; secondly, a dual-stream convolution channel is constructed for visible and infrared image feature extraction; finally, a cross-attention fusion module is proposed based on the channel attention and spatial attention mechanisms to realize the effective fusion of the two different features. Experimental results demonstrate that the model achieves a detection accuracy of 96.4% and a recognition probability of 88.1%, surpassing the YOLOv5 baseline network. Moreover, compared with unimodal detection algorithms such as YOLOv8 and multimodal detection algorithms such as SLBAF-Net, the proposed algorithm exhibits superior detection accuracy. These findings highlight the practical value of the proposed method for battlefield military target detection, significantly enhancing situational awareness capabilities.
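A minimal sketch of a cross-attention fusion idea of the kind named above is given below: channel attention computed from one modality reweights the other's channels, spatial attention does the same per pixel, and the reweighted maps are merged. The module structure, reduction ratio, and names are assumptions, not the CAFM-YOLOv5 design.

```python
# Hypothetical cross-attention fusion of visible/infrared feature maps.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, c, r=8):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c), nn.Sigmoid())
    def forward(self, x):                       # (B, C, H, W) -> (B, C, 1, 1) weights
        return self.fc(x.mean(dim=(2, 3)))[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
    def forward(self, x):                       # (B, C, H, W) -> (B, 1, H, W) weights
        return self.conv(torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1))

class CrossAttentionFusion(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.ca_vis, self.ca_ir = ChannelAttention(c), ChannelAttention(c)
        self.sa_vis, self.sa_ir = SpatialAttention(), SpatialAttention()
        self.merge = nn.Conv2d(2 * c, c, 1)
    def forward(self, f_vis, f_ir):
        # Cross re-weighting: each modality is modulated by the other's attention maps.
        vis = f_vis * self.ca_ir(f_ir) * self.sa_ir(f_ir)
        ir = f_ir * self.ca_vis(f_vis) * self.sa_vis(f_vis)
        return self.merge(torch.cat([vis, ir], dim=1))

fuse = CrossAttentionFusion(c=64)
f_vis, f_ir = torch.rand(1, 64, 40, 40), torch.rand(1, 64, 40, 40)
print(fuse(f_vis, f_ir).shape)                  # torch.Size([1, 64, 40, 40])
```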
2024, 46(8): 3334-3342.
doi: 10.11999/JEIT231338
Abstract:
Realizing high accuracy with a low computational burden is a serious challenge faced by Convolutional Neural Networks (CNNs) for real-time semantic segmentation. In this paper, an efficient real-time semantic segmentation Adaptive Attention mechanism Fusion Network (AAFNet) is designed for complex urban street scenes with numerous target types and large lighting variations. The network extracts image spatial details and semantic information separately and then obtains accurate semantic images through a Feature Fusion Network (FFN). AAFNet adopts Dilated Depth-Wise separable convolution (DDW) to enlarge the receptive field of semantic feature extraction, and an Adaptive Attention mechanism Fusion Module (AAFM) is proposed, which combines adaptive average pooling (Avp) and adaptive max pooling (Amp) to refine target edge segmentation and reduce the missed-detection rate for small targets. Finally, semantic segmentation experiments are performed on the Cityscapes and CamVid datasets of complex urban street scenes. The designed AAFNet achieves 73.0% and 69.8% mean Intersection over Union (mIoU) at inference speeds of 32 fps (Cityscapes) and 52 fps (CamVid). Compared with Dilated Spatial Attention Network (DSANet), Multi-Scale Context Fusion Network (MSCFNet), and Lightweight Bilateral Asymmetric Residual Network (LBARNet), AAFNet attains the highest segmentation accuracy.
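For illustration, minimal sketches of the two building blocks named above are given below: a dilated depthwise-separable convolution and an adaptive-pooling attention fusion. The exact block structures, dilation rates, and gating form are assumptions, not AAFNet's implementation.

```python
# Hypothetical DDW (dilated depthwise-separable conv) and adaptive-pooling fusion blocks.
import torch
import torch.nn as nn

class DDW(nn.Module):
    """Depthwise conv with dilation (larger receptive field) followed by a pointwise conv."""
    def __init__(self, c, dilation=2):
        super().__init__()
        self.depthwise = nn.Conv2d(c, c, 3, padding=dilation, dilation=dilation, groups=c)
        self.pointwise = nn.Conv2d(c, c, 1)
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class AAFM(nn.Module):
    """Fuse detail/semantic branches with weights derived from adaptive avg + max pooling."""
    def __init__(self, c):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * c, c, 1), nn.Sigmoid())
    def forward(self, detail, semantic):
        pooled = torch.cat([nn.functional.adaptive_avg_pool2d(semantic, 1),
                            nn.functional.adaptive_max_pool2d(semantic, 1)], dim=1)
        w = self.gate(pooled)                 # (B, C, 1, 1) channel-wise fusion weights
        return w * detail + (1 - w) * semantic

x = torch.rand(1, 64, 64, 128)
detail, semantic = DDW(64)(x), DDW(64, dilation=4)(x)
print(AAFM(64)(detail, semantic).shape)       # torch.Size([1, 64, 64, 128])
```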
2024, 46(8): 3343-3352.
doi: 10.11999/JEIT231262
Abstract:
An end-to-end quadruple Super-Resolution Inpainting Generative Adversarial Network (SRIGAN) is proposed in this paper for low-resolution face images with random occlusion. The generative network consists of an encoder, a feature compensation subnetwork, and a decoder built with a pyramid attention module; the discriminator is an improved patch-based discriminator. Through the feature compensation subnetwork and a two-stage training strategy, the network effectively learns the missing features of the occluded region. The recovered information is then reconstructed by the decoder with the pyramid attention module and a multi-scale reconstruction loss. Hence, the generative network can transform a low-resolution occluded image into a quadruple-resolution complete image. Furthermore, improvements to the loss function and the patch discriminator ensure stable network training and enhance the performance of the generative network. The effectiveness of the proposed algorithm is verified by comparison and module verification experiments.
2024, 46(8): 3353-3362.
doi: 10.11999/JEIT231145
Abstract:
Factors such as pulse interference and outlier measurements usually lead to abnormal heavy-tailed noise, which sharply degrades the performance of Extended Target Tracking (ETT) estimators based on the Gaussian hypothesis. To address this problem, a Student’s t Inverse Wishart Smoothing (StIWS) algorithm based on the Random Matrix Model (RMM) is proposed. Firstly, the kinematic state of the target, the process noise and the measurement noise are modeled with Student’s t distributions to characterize the effect of anomalous noise on the probability distribution of the extended target, while the extended state is modeled as a random matrix obeying an inverse Wishart distribution. Then, within a Student’s t Bayesian smoothing framework, the StIWS algorithm is derived in detail, which can effectively estimate the target state as multiple characteristics of the extended target evolve dynamically. Finally, the effectiveness of the proposed algorithm is verified by simulation and by an engineering extended target tracking experiment.
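For readers unfamiliar with the modeling assumptions, one common way to write a heavy-tailed random-matrix extended-target model is sketched below. The scale factor $\rho$, the degrees of freedom $\nu_1,\nu_2$, and the exact factorization are illustrative and may differ from the paper's formulation.

```latex
% Hedged sketch of heavy-tailed random-matrix ETT assumptions (illustrative, not the paper's exact equations)
\begin{align}
  \mathbf{x}_k \mid \mathbf{x}_{k-1} &\sim \mathrm{St}\!\left(\mathbf{F}_k \mathbf{x}_{k-1},\ \mathbf{Q}_k,\ \nu_1\right)
    && \text{kinematic state with Student's t process noise} \\
  \mathbf{X}_k &\sim \mathcal{IW}\!\left(v_k,\ \mathbf{V}_k\right)
    && \text{extent as an inverse Wishart random matrix} \\
  \mathbf{z}_k \mid \mathbf{x}_k, \mathbf{X}_k &\sim \mathrm{St}\!\left(\mathbf{H}_k \mathbf{x}_k,\ \rho\,\mathbf{X}_k + \mathbf{R}_k,\ \nu_2\right)
    && \text{measurement with heavy-tailed noise}
\end{align}
```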
2024, 46(8): 3363-3371.
doi: 10.11999/JEIT231267
Abstract:
Using semantic segmentation technology to extract objects from high-resolution remote sensing images has important application prospects. With the rapid development of multi-sensor technology, the complementary advantages between multimodal remote sensing images have received widespread attention, and their joint analysis has become a research hotspot. This article analyzes optical remote sensing images together with elevation data and proposes a multi-task collaborative model based on multimodal remote sensing data (United Refined PSPNet, UR-PSPNet) to address the insufficient accuracy of fused classification caused by the lack of fully registered elevation data in real scenarios. The model extracts deep features of optical images, predicts semantic labels and elevation values, and embeds elevation data as supervision to improve the accuracy of target segmentation. A comparative experiment on the ISPRS dataset shows that the proposed algorithm better fuses multimodal data features and improves the accuracy of object segmentation in optical remote sensing images.
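The sketch below shows a generic multi-task training step in the spirit described above: one backbone with a semantic-label head and an elevation-regression head, with elevation supervision added as an auxiliary loss. The backbone, the number of classes, and the loss weight are assumptions, not the UR-PSPNet design.

```python
# Hypothetical multi-task step: joint semantic segmentation + elevation regression.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
seg_head = nn.Conv2d(32, 6, 1)            # 6 land-cover classes (assumed)
elev_head = nn.Conv2d(32, 1, 1)           # per-pixel elevation regression

opt = torch.optim.Adam(list(backbone.parameters()) +
                       list(seg_head.parameters()) + list(elev_head.parameters()), lr=1e-4)

img = torch.rand(2, 3, 128, 128)          # optical image patch
seg_gt = torch.randint(0, 6, (2, 128, 128))
elev_gt = torch.rand(2, 1, 128, 128)      # registered elevation data, where available

feat = backbone(img)
loss = (nn.functional.cross_entropy(seg_head(feat), seg_gt)
        + 0.5 * nn.functional.l1_loss(elev_head(feat), elev_gt))   # 0.5 is an assumed weight
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```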
2024, 46(8): 3372-3381.
doi: 10.11999/JEIT231274
Abstract:
Deep learning methods have gained popularity in multimodal sentiment analysis in recent years due to their impressive representation and fusion capabilities. Existing studies often analyze the emotions of individuals using multimodal information such as text, facial expressions, and speech intonation, primarily employing complex fusion methods. However, existing models inadequately consider the dynamic changes of emotions over long time sequences, resulting in suboptimal sentiment analysis performance. In response to this issue, a Multimodal Sentiment Analysis Model Enhanced with Non-verbal Information and Contrastive Learning is proposed in this paper. Firstly, long-term textual information is employed to enable the model to learn dynamic changes of audio and video across extended time sequences. Subsequently, a gating mechanism is employed to eliminate redundant information and semantic ambiguity between modalities. Finally, contrastive learning is applied to strengthen the interaction between modalities and enhance the model’s generalization. Experimental results demonstrate that the model improves the Pearson Correlation coefficient (Corr) and F1 score by 3.7% and 2.1% on the CMU-MOSI dataset, and by 1.4% and 1.1% on the CMU-MOSEI dataset, respectively. Therefore, the proposed model effectively utilizes intermodal interaction information while eliminating information redundancy.
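Two of the mechanisms mentioned above lend themselves to a short sketch: a gate that filters non-verbal (audio/video) features using the text representation, and an InfoNCE-style contrastive loss that pulls matched cross-modal pairs together. The gating form, feature sizes, and temperature are assumptions, not the paper's exact model.

```python
# Hypothetical gated fusion + contrastive objective for text and non-verbal features.
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 128
gate = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())

def gated_fusion(text, nonverbal):
    """Keep only the non-verbal information that the text representation supports."""
    g = gate(torch.cat([text, nonverbal], dim=-1))
    return text + g * nonverbal

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (a_i, b_i) pairs."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    labels = torch.arange(a.shape[0])
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

text_feat = torch.randn(8, d)            # e.g. from a long-sequence text encoder
audio_feat = torch.randn(8, d)           # aligned audio features for the same utterances
fused = gated_fusion(text_feat, audio_feat)
loss = info_nce(text_feat, audio_feat)
print(fused.shape, float(loss))
```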
2024, 46(8): 3382-3389.
doi: 10.11999/JEIT231417
Abstract:
Empathic dialogue aims to provide mental health support for anxious users, so building chatbots with empathic capabilities is a noteworthy issue. Existing methods can only identify users’ sentiment states but cannot effectively generate empathetic responses adapted to different sentiment states, let alone effectively relieve users’ negative emotions. Therefore, in building sentiment support chatbots, how to dynamically capture users’ fine-grained sentiment features and provide corresponding psychological support needs further exploration. This paper proposes an empathic dialogue generation method based on the fusion of emotion and strategy. Firstly, a sentiment classification network is used to dynamically perceive the user’s sentiment state. Then a strategy matching network, built on support strategies and conditioned on the conversation context, is introduced to generate responses. Finally, by comparing the experimental results of the proposed method with current advanced methods on the corresponding datasets, the effectiveness of the proposed method and the importance of sentiment support are verified.
2024, 46(8): 3390-3399.
doi: 10.11999/JEIT230826
Abstract:
Cervical cell classification plays a crucial role in assisting the diagnosis of cervical cancer. However, existing cervical cell classification methods do not sufficiently consider relationships among cells or the background information, and fail to effectively imitate the diagnostic approach of pathologists; as a result, their classification performance is limited. In this study, a novel approach that integrates cell relationships and background information for cervical cell classification is proposed. The proposed method consists of a Graph Attention Branch for Cell-Cell Relationships (GAB-CCR) and a Background Attention Branch for Whole Slide Images (BAB-WSI). GAB-CCR uses the cosine similarity of cell features to construct preliminary graphs representing similar and distinct cell relationships, and enhances the model’s ability to capture cell relationships through GATv2. BAB-WSI employs multi-head attention to capture crucial information on the slide background and reflect the importance of different regions. Finally, the enhanced cell and background features are fused to improve the classification performance of the network. Experimental results demonstrate that the proposed method achieves significant improvements over the baseline model, Swin Transformer-L, with gains in accuracy, sensitivity, specificity, and F1-score of 15.9%, 30.32%, 8.11%, and 31.62%, respectively.
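A minimal sketch of the graph-construction step described above is given below: cell feature vectors are compared by cosine similarity and thresholded into "similar" and "distinct" edge sets, which could then feed a graph attention layer such as GATv2. The feature dimension and thresholds are assumptions for illustration.

```python
# Hypothetical cosine-similarity graph construction over per-cell feature vectors.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
cell_feats = torch.randn(12, 16)                  # one feature vector per detected cell

sim = F.normalize(cell_feats, dim=-1) @ F.normalize(cell_feats, dim=-1).t()
sim.fill_diagonal_(0.0)                           # ignore self-similarity

similar_edges = (sim > 0.3).nonzero(as_tuple=False)    # assumed threshold
distinct_edges = (sim < -0.3).nonzero(as_tuple=False)  # assumed threshold

# Edge lists in (source, target) form, ready for a GNN library's edge_index format.
print("similar edges:", similar_edges.t().shape)
print("distinct edges:", distinct_edges.t().shape)
```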
2024, 46(8): 3400-3409.
doi: 10.11999/JEIT231272
Abstract:
In the information era, information security is a priority that cannot be ignored, and attacks on and protection of cryptographic devices are research hotspots in this field. In recent years, various attacks on cryptographic devices have become well known, all aimed at extracting the keys from the device. Among them, power side-channel attacks are one of the most closely studied attack techniques. Masking is an effective countermeasure against power side-channel attacks; however, with the continuous progress of attack methods, first-order masking is no longer sufficient to cope with second-order and higher-order power analysis attacks, so research on higher-order masking is of considerable significance. To enhance the attack resistance of the encryption circuit, a higher-order masking scheme, N-share masking, is implemented on the S-box in this paper, and a universal design method for Galois field secure multiplication is proposed, based on the secure scheme published by Ishai et al. at Crypto 2003 (the ISW framework). Experiments show that the adopted masking scheme does not affect the functionality of the encryption algorithm and can resist first-order and second-order correlation power analysis attacks.
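For reference, the textbook ISW construction that such designs build on can be sketched as follows for N-share multiplication over GF(2^8), the field used by the AES S-box. This is illustrative Python for the algorithm, not the paper's hardware design.

```python
# Sketch of ISW-style N-share secure multiplication over GF(2^8).
import secrets

def gf_mul(a, b, poly=0x11B):
    """Carry-less multiplication in GF(2^8) modulo the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def share(x, n):
    """Split x into n shares whose XOR equals x."""
    s = [secrets.randbits(8) for _ in range(n - 1)]
    last = x
    for v in s:
        last ^= v
    return s + [last]

def unshare(shares):
    out = 0
    for v in shares:
        out ^= v
    return out

def isw_mult(a_sh, b_sh):
    """ISW secure multiplication: returns shares of gf_mul(a, b)."""
    n = len(a_sh)
    c = [gf_mul(a_sh[i], b_sh[i]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = secrets.randbits(8)                                   # fresh randomness
            r_ji = r_ij ^ gf_mul(a_sh[i], b_sh[j]) ^ gf_mul(a_sh[j], b_sh[i])
            c[i] ^= r_ij
            c[j] ^= r_ji
    return c

a, b, n = 0x57, 0x83, 3          # classic AES example: {57} * {83} = {C1}
c_sh = isw_mult(share(a, n), share(b, n))
assert unshare(c_sh) == gf_mul(a, b) == 0xC1
print(hex(unshare(c_sh)))
```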
2024, 46(8): 3410-3418.
doi: 10.11999/JEIT231332
Abstract:
Perfect complementary sequences are a class of signals with ideal correlation functions, widely used in multiple access communication systems, radar waveform design, and so on. However, the set size of a perfect complementary sequence set is at most equal to the number of its subsequences. To expand the number of complementary sequences, construction methods for aperiodic low correlation zone complementary sequence sets are studied in this paper. First, two kinds of mapping functions over finite fields are proposed, and then two kinds of low correlation zone complementary sequence sets with asymptotically optimal parameters are obtained. The number of sequences in these low correlation zone complementary sequence sets exceeds that of perfect complementary sequence sets, thereby supporting more users in a communication system.
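For context, the aperiodic correlation notion behind such sets can be written in the following generic form; the notation (zone width Z, correlation bound δ) is illustrative and the paper's parameterization may differ.

```latex
% Generic aperiodic correlation of complementary sequence sets (illustrative notation)
\begin{align}
  R_{C_u, C_v}(\tau) \;=\; \sum_{m=1}^{M} \sum_{t=0}^{N-1-\tau} c_u^{(m)}(t)\, \overline{c_v^{(m)}(t+\tau)}, \qquad 0 \le \tau < N,
\end{align}
where $C_u = \{c_u^{(1)}, \dots, c_u^{(M)}\}$ is a complementary sequence consisting of $M$ subsequences of length $N$.
A family of such sequences forms a low correlation zone complementary sequence set with zone width $Z$ and bound $\delta$ if
$\left| R_{C_u, C_v}(\tau) \right| \le \delta$ for all $0 < \tau < Z$ when $u = v$, and for all $0 \le \tau < Z$ when $u \ne v$.
```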
2024, 46(8): 3419-3427.
doi: 10.11999/JEIT231234
Abstract:
In vertical federated learning, the clients’ datasets have overlapping sample IDs but features of different dimensions, so data alignment is necessary for model training. Since the intersection of the sample IDs is public in current data alignment technologies, how to align the data without any leakage of the intersection becomes a key issue. The proposed privacy-preserving data ALIGNment framework (ALIGN) is based on interchangeable encryption and homomorphic encryption technologies, and mainly includes data encryption, ciphertext blinding, private intersection, and feature splicing. The sample IDs are encrypted twice with an interchangeable encryption algorithm, under which identical ciphertexts correspond to identical plaintexts, while the sample features are encrypted and then randomly blinded with a homomorphic encryption algorithm. The intersection of the encrypted sample IDs is obtained, and the corresponding features are then spliced and secretly shared among the participants. Compared with existing technologies, the privacy of the ID intersection is protected, and the samples corresponding to IDs outside the intersection can be removed safely in this framework. The security proof shows that no participant can obtain any knowledge of the others except the data size, which guarantees the effectiveness of the privacy-preserving strategies. Simulation experiments demonstrate that, with every 10% reduction of redundant data, the runtime is shortened by about 1.3 seconds while the model accuracy remains above 85%, indicating that using the ALIGN framework for data alignment in vertical federated learning improves the efficiency and accuracy of subsequent model training.
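The toy sketch below illustrates the commutative-encryption idea commonly behind such "interchangeable encryption" on IDs: each party exponentiates hashed IDs with its own secret key, the two encryptions commute, and only doubly-encrypted IDs are compared. The group parameters and the protocol skeleton are assumptions for illustration, not the ALIGN framework itself (which additionally hides the intersection and handles feature blinding).

```python
# Toy commutative-encryption matching of sample IDs (illustrative only).
import hashlib
import secrets

P = 2**127 - 1                       # a Mersenne prime; toy group parameters

def h2g(identifier):
    """Hash an ID string into a nonzero group element."""
    return int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % (P - 1) + 1

def enc(x, key):
    """Commutative 'encryption' by exponentiation: enc(enc(x, k1), k2) == enc(enc(x, k2), k1)."""
    return pow(x, key, P)

ids_a = ["u01", "u02", "u03", "u07"]
ids_b = ["u02", "u03", "u05"]
k_a, k_b = secrets.randbelow(P - 3) + 2, secrets.randbelow(P - 3) + 2

# Each party encrypts its own IDs once, exchanges them, and the peer encrypts again.
double_a = {enc(enc(h2g(i), k_a), k_b): i for i in ids_a}   # A's IDs after both keys
double_b = {enc(enc(h2g(i), k_b), k_a) for i in ids_b}      # B's IDs after both keys

intersection = [double_a[c] for c in double_a if c in double_b]
print(sorted(intersection))          # ['u02', 'u03'] without exchanging plaintext IDs
```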
2024, 46(8): 3428-3435.
doi: 10.11999/JEIT231202
Abstract:
The complementary integration of satellite communication and terrestrial mobile communication has emerged as a prevailing trend, which means that the wireless radio-frequency front-end, with the Power Amplifier (PA) at its core, needs to tackle the dual challenges of high efficiency and large bandwidth. The input harmonic phase control method proposed in this paper effectively overcomes the bottleneck of mutual restriction between bandwidth and efficiency. By employing a continuous inverse Class-F operating mode, it enables reconstruction of the transistor drain waveform through precise control of the input second-harmonic phase, which ensures high efficiency while significantly enlarging the impedance design space. Based on the expanded design space, a continuous inverse Class-F PA is designed and fabricated over the 1.7~3.0 GHz band. Experimental results demonstrate an output power of 40.62~42.78 dBm, a drain efficiency of 72.2%~78.6%, and a gain of 10.6~14.8 dB.
2024, 46(8): 3436-3444.
doi: 10.11999/JEIT231257
Abstract:
Static power consumption dominates the power overhead of Networks-on-Chip (NoCs) as technology nodes shrink. Power gating, a widely used power-saving technique, turns off idle modules in NoCs to reduce static power consumption. However, conventional power gating brings problems such as packet wake-up delay and break-even time. To solve these problems, the Partition Bypass Transmission Infrastructure (PBTI), which replaces the power-gated router for packet transmission, is proposed in this paper, and a low-latency, low-power power gating scheme is designed upon this bypass mechanism. PBTI uses mutually independent bypasses to handle eastbound and westbound packets separately, and uses shared buffers within the bypasses to improve buffer utilization. PBTI can inject, transmit, and eject packets while the router is powered off, so packets can be delivered from the source node to the destination node even if all routers in the network are power gated. When the traffic grows beyond the transmission capacity of PBTI, the routers are woken up uniformly by columns. Experimental results show that, compared with an NoC without power gating, the proposed scheme reduces static power consumption by 83.4% and packet delay by 17.2%, while adding only 6.2% area overhead. Compared with the conventional power gating scheme, the power-gated design in this paper achieves lower power consumption and delay, which is a significant advantage.