2024 Vol. 46, No. 11
2024, 46(11): 1-4.
Abstract:
2024, 46(11): 4081-4091.
doi: 10.11999/JEIT240284
Abstract:
Computing-in-Memory (CiM) architectures based on Resistive Random Access Memory (ReRAM) are recognized as a promising solution for accelerating deep learning applications. As intelligent applications evolve, deep learning models grow ever larger, imposing higher computational and storage demands on processing platforms. However, due to the non-ideal characteristics of ReRAM, large-scale ReRAM-based computing systems face severe challenges of low yield and reliability. Chiplet-based architectures assemble multiple small chiplets into a single package, providing higher fabrication yield and lower manufacturing cost, and have become a major trend in chip design. Compared with on-chip wiring, however, expensive inter-chiplet communication becomes a performance bottleneck that limits the scalability of chiplet-based systems. As a countermeasure, a novel scaling framework for chiplet-based CiM accelerators, SMCA (SMT-based CiM chiplet Acceleration), is proposed in this paper. The framework comprises an adaptive deep learning task partitioning strategy and automated SMT-based workload deployment, generating the most energy-efficient DNN workload schedule with minimum data transmission on chiplet-based deep learning accelerators and effectively improving system performance and efficiency. Experimental results show that, compared with existing strategies, the schedule automatically generated by SMCA reduces the energy cost of inter-chiplet communication by 35%.
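The optimization problem SMCA hands to the SMT solver can be illustrated with a tiny brute-force stand-in: assign DNN layers to chiplets so that activation traffic crossing a chiplet boundary is minimized, subject to a per-chiplet storage budget. All sizes below are invented for the sketch; the paper's actual SMT formulation is not reproduced here.

```python
from itertools import product

# Hypothetical 4-layer DNN mapped onto 2 chiplets.
layer_weight_kb = [90, 110, 80, 60]   # per-layer weight storage
boundary_kb = [64, 128, 32]           # activations passed layer i -> i+1
n_chiplets, capacity_kb = 2, 200      # assumed per-chiplet weight budget

def inter_chiplet_cost(assign):
    """Total activation traffic that crosses a chiplet boundary."""
    return sum(boundary_kb[i]
               for i in range(len(boundary_kb))
               if assign[i] != assign[i + 1])

def feasible(assign):
    """Each chiplet must hold the weights of the layers mapped to it."""
    load = [0] * n_chiplets
    for layer, c in enumerate(assign):
        load[c] += layer_weight_kb[layer]
    return all(l <= capacity_kb for l in load)

# Exhaustive search over all layer-to-chiplet assignments; an SMT solver
# explores the same space symbolically and scales to realistic sizes.
best = min((a for a in product(range(n_chiplets), repeat=len(layer_weight_kb))
            if feasible(a)),
           key=inter_chiplet_cost)
```

Note how capacity pressure forces the cheapest feasible mapping to split the model at the low-traffic boundaries rather than the high-traffic one.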
2024, 46(11): 4092-4100.
doi: 10.11999/JEIT240162
Abstract:
As chip manufacturing advances to sub-micro/nanometer scales, shrinking technology nodes accelerate link failures in on-chip networks; the growth of failed links reduces the number of available routing paths and may lead to severe traffic congestion or even system crashes. Maintaining the correctness of on-chip systems becomes dramatically harder as technology nodes shrink. Previous schemes typically use deflection algorithms to route packets around faults; however, these incur additional transmission latency from extra hops and raise the probability of deadlock. To maintain normal packet transmission in the presence of faulty links, a self-Adaptive Fault-tolerant Link NoC design (AFL_NoC) is proposed, which redirects packets that encounter a faulty link onto a reversible link. The scheme includes a concrete implementation of the reversible link and the associated distributed control protocol. The dynamic fault-tolerant link design fully utilizes idle, available links and ensures that network communication is not interrupted by link failures. Compared with the advanced fault-tolerant deflection routing algorithm QFCAR-W, AFL_NoC reduces average delay by 10%, area overhead by 14.2%, and power overhead by 9.3%.
2024, 46(11): 4101-4111.
doi: 10.11999/JEIT240300
Abstract:
Physical Unclonable Functions (PUFs) and Exclusive-OR (XOR) operations both play an important role in information security. To break through the functional barrier between PUFs and logic operations, an integrated design combining a PUF with a multi-bit parallel XOR operation circuit is proposed, based on the random process variation of a cascade of Differential Cascode Voltage Switch Logic (DCVSL) XOR gates, by studying the working mechanisms of PUFs and DCVSL. By adding a pre-charge transistor at the differential output of the DCVSL XOR gate and a control gate at the ground terminal, the circuit switches freely among three operating modes: PUF feature-information extraction, XOR/Negated Exclusive-OR (XNOR) operation, and power control. To address the PUF response-stability problem, an unstable-bit hybrid screening technique is proposed in which both extreme and golden operating points participate in labeling. Based on a 65 nm TSMC process, a fully custom layout was designed for a 10-bit-input circuit with an area of 38.76 μm². Experimental results show that a 1024-bit output response can be generated in PUF mode, and a stable key of more than 512 bits can be obtained after hybrid screening, with good randomness and uniqueness. In operation mode, 10-bit parallel XOR and XNOR operations are achieved simultaneously, with power consumption of 2.67 μW and delay of 593.52 ps. In power-control mode, standby power consumption is only 70.5 nW. The proposed method provides a novel way to break the function wall of PUFs.
2024, 46(11): 4112-4122.
doi: 10.11999/JEIT240090
Abstract:
Modular high-voltage power supplies, characterized by high efficiency, reliability, and reconfigurability, are widely used in high-power, high-voltage devices. Among them, the input-series output-series topology based on the series-parallel resonant converter suits high-frequency, high-voltage operating environments, offering advantages such as reduced power losses and winding dielectric losses and exploitation of the parasitic parameters of the multi-stage transformer, and thus has broad application prospects. Current research on this topology focuses primarily on theoretical analysis and efficiency optimization; in practical high-voltage environments, the high-voltage isolation issues between windings of multi-stage transformers have not been effectively addressed. In this paper, a shared-primary-winding design for multi-stage transformers is proposed to simplify the high-voltage isolation problems inherent in traditional single-stage winding methods. However, this winding scheme can lead to non-uniform voltage distribution and voltage divergence across the transformer stages. Therefore, an improved topology that exploits the parasitic parameters of the transformers and of the diodes in the voltage-doubling rectifier circuits is proposed to effectively address the uneven voltage distribution. Simulations and experiments were conducted, and both confirm the effectiveness of the proposed shared-primary-winding high-voltage isolation structure and the improved topology.
2024, 46(11): 4123-4131.
doi: 10.11999/JEIT240224
Abstract:
Blind signal detection is of great significance in large-scale communication networks and is widely used. Obtaining blind-detection results quickly is an urgent need of new-generation real-time communication networks. To meet this demand, a Complex-valued Hopfield Neural Network (CHNN) circuit that accelerates blind signal detection from an analog-circuit perspective is designed; the circuit accelerates detection by performing massively parallel computation in a single step. The circuit is also programmable through the conductance and input voltage of its memristors. PSpice simulation results show that the computing accuracy of the proposed circuit exceeds 99%. Compared with MATLAB software simulation, the circuit is three orders of magnitude faster in computing time, and accuracy remains above 99% even under 20% noise interference.
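One synchronous update step of a complex-valued Hopfield network can be sketched in software as follows. The 3-neuron weights, initial phases, and 4-level phase quantizer below are invented for illustration; the paper realizes this update in one step in an analog memristor circuit rather than in code.

```python
import cmath

# Toy Hermitian weight matrix and unit-magnitude phasor states.
W = [[0, 1 - 1j, 0.5j],
     [1 + 1j, 0, -0.5],
     [-0.5j, -0.5, 0]]
state = [cmath.exp(1j * p) for p in (0.3, 1.2, -0.7)]

def quantize(z, levels=4):
    """Project a complex activation onto the nearest of `levels` phase states
    (the multi-valued signum used in complex Hopfield networks)."""
    step = 2 * cmath.pi / levels
    k = round(cmath.phase(z) / step) % levels
    return cmath.exp(1j * k * step)

# One fully parallel update: every neuron computes its weighted sum at once,
# mirroring the one-step massively parallel evaluation of the analog circuit.
state = [quantize(sum(W[i][j] * state[j] for j in range(3)))
         for i in range(3)]
```

In the hardware version, the weighted sums are produced by memristor conductances and the quantization by the neuron circuitry, which is why the whole update finishes in a single analog step.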
2024, 46(11): 4132-4140.
doi: 10.11999/JEIT240210
Abstract:
With the advancement of autonomous robot navigation, software-based path planning algorithms can no longer meet the needs of many real-time applications; fast and efficient hardware customization of the algorithm is required to achieve low-latency acceleration. In this work, High-Level Synthesis (HLS) of the classic A* algorithm is studied. Hardware-oriented data-structure and function optimizations and varying design constraints are explored to select the right architecture, followed by FPGA synthesis. Experimental results show that, compared with the conventional Register Transfer Level (RTL) method, the HLS-based FPGA implementation of the A* algorithm achieves better productivity, hardware performance, and resource utilization efficiency, demonstrating the advantages of high-level synthesis for hardware customization in algorithm-centric applications.
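For reference, the classic A* kernel being synthesized looks like this in software (a standard textbook version on a 4-connected grid with a Manhattan heuristic; the grid below is invented, and the paper's hardware-oriented data-structure optimizations are not reflected here):

```python
import heapq

def astar(grid, start, goal):
    """Classic A* on a 4-connected grid; 1 marks an obstacle.
    Returns the optimal path as a list of (row, col), or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_heap = [(h(start), 0, start)]
    g_cost = {start: 0}
    parent = {}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in parent:          # walk parents back to start
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        if g > g_cost.get(cur, float("inf")):
            continue                      # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and not grid[nxt[0]][nxt[1]]:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    parent[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```

The priority queue and the hash-map-like `g_cost`/`parent` tables are exactly the data structures that an HLS flow must re-express in hardware-friendly form.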
2024, 46(11): 4141-4150.
doi: 10.11999/JEIT240049
Abstract:
As a new generation of flow-based microfluidics, Fully Programmable Valve Array (FPVA) biochips have become a popular biochemical experimental platform offering high flexibility and programmability. Due to environmental and human factors, however, physical faults such as channel blockage and leakage often arise during manufacturing and can affect bioassay results. Moreover, as the first stage of architecture synthesis, high-level synthesis directly affects the quality of the subsequent design. The fault-tolerance problem in the high-level synthesis stage of FPVA biochips is addressed for the first time in this paper: dynamic fault-tolerant techniques, including a cell function conversion method, a bidirectional redundancy scheme, and a fault mapping method, are presented, providing a technical guarantee for efficient fault-tolerant design. By integrating these techniques into the high-level synthesis stage, a high-quality fault-tolerance-oriented high-level synthesis algorithm for FPVA biochips is realized, including a fault-aware real-time binding strategy and a fault-aware priority scheduling strategy, laying a good foundation for the robustness of the chip architecture and the correctness of assay outcomes. Experimental results confirm that the proposed algorithm obtains a high-quality, fault-tolerant high-level synthesis scheme for FPVA biochips, providing a strong guarantee for a subsequent fault-tolerant physical design.
2024, 46(11): 4151-4160.
doi: 10.11999/JEIT240219
Abstract:
With the rapid development of integrated circuit technology, malicious hardware Trojan logic can easily be implanted into chips during design, production, and packaging. Current security detection methods for IP soft cores are logically complex, prone to errors and omissions, and unable to handle encrypted IP soft cores. Exploiting the feature differences between grayscale maps of non-controllable IP soft cores and of hardware Trojan Register Transfer Level (RTL) code, a hardware Trojan detection method for IP soft cores based on graph feature analysis is proposed: map conversion and map enhancement yield a standard map, and a texture-feature extraction and matching algorithm performs the detection. The experimental subjects are functional logic units implanted with seven types of typical Trojans during the design phase. Detection results show accuracy above 90% for all seven Trojan types, and the number of successful feature-point matches grows by 13.24% on average after image enhancement, effectively improving the effectiveness of hardware Trojan detection.
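The map-conversion step above can be pictured as turning RTL source bytes into a 2D grayscale image. The encoding below (one pixel per byte at a fixed width, zero-padded) is an illustrative assumption, since the abstract does not specify the exact mapping, and the RTL snippet is hypothetical:

```python
import math

def rtl_to_grayscale(rtl_source, width=16):
    """Map RTL source bytes onto a fixed-width 2D grayscale image,
    one pixel (0-255) per byte, zero-padded to full rows. A stand-in
    for the paper's map-conversion step, whose encoding is unspecified."""
    data = rtl_source.encode("utf-8")
    rows = math.ceil(len(data) / width)
    padded = data.ljust(rows * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]

# Hypothetical RTL fragment: 35 bytes -> a 3-row, 16-column grayscale map.
img = rtl_to_grayscale("module trojan(input clk); endmodule")
```

Texture-feature extraction and matching (e.g. on feature points of such maps) would then operate on these pixel arrays.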
2024, 46(11): 4161-4169.
doi: 10.11999/JEIT240183
Abstract:
The XEX-based Tweaked-codebook mode with ciphertext Stealing (XTS) is widely used in storage encryption. With the emergence of big-data computing and novel side-channel analysis methods, the security of the XTS encryption mode has become a matter of concern. Recent studies have attempted side-channel analysis of the XTS mode, aiming to narrow the key search space by identifying partial keys and tweak values, but a comprehensive analysis of an XTS-mode system has not been achieved. In this paper, a side-channel analysis technique targeting the SM4-XTS circuit is proposed. Combining traditional Correlation Power Analysis (CPA) with a multi-stage fusion CPA technique, it addresses the bit-shifting issue caused by the iterative modular multiplication of the tweak values, enabling precise extraction of both tweak values and keys. To validate the technique, an SM4-XTS encryption module is implemented on an FPGA to simulate a real-world memory encryption scenario. Experimental results demonstrate that the technique successfully extracts partial tweak values and keys from the target encryption circuit using only 10 000 power traces.
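The iterative tweak multiplication referred to above is, in standard XTS (IEEE 1619), a per-block multiplication by α = x in GF(2^128), i.e. a shift with conditional feedback of the reduction polynomial. A minimal reference implementation of that update (the part whose shift/carry behavior a multi-stage CPA must model) might look like:

```python
def xts_mul_alpha(tweak: bytes) -> bytes:
    """Multiply a 16-byte XTS tweak by alpha (x) in GF(2^128) modulo
    x^128 + x^7 + x^2 + x + 1, little-endian byte order as in IEEE 1619."""
    t = int.from_bytes(tweak, "little")
    carry = t >> 127                     # bit shifted out of the top
    t = (t << 1) & ((1 << 128) - 1)      # left shift within 128 bits
    if carry:
        t ^= 0x87                        # reduction-polynomial feedback
    return t.to_bytes(16, "little")

# Wrap-around case: x^127 * x reduces to x^7 + x^2 + x + 1 (0x87).
t1 = xts_mul_alpha(bytes.fromhex("00000000000000000000000000000080"))
```

Each data unit's block j uses the initial tweak multiplied by α^j, so recovering one tweak value constrains all subsequent ones.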
2024, 46(11): 4170-4177.
doi: 10.11999/JEIT240161
Abstract:
Most existing lipreading models combine a single 3D convolution layer with 2D convolutional neural networks to extract joint spatio-temporal features from lip video sequences. However, given the limitations of single-layer 3D convolutions in capturing temporal information and the restricted ability of 2D convolutional networks to extract fine-grained lipreading features, a Multi-Scale Lipreading Network (MS-LipNet) is proposed to improve lipreading. In this paper, 3D spatio-temporal convolutions replace the traditional 2D convolutions in the Res2Net network to better extract joint spatio-temporal features, and a spatio-temporal coordinate attention module is proposed to make the network focus on task-relevant regional features. The effectiveness of the proposed method is verified by experiments on the LRW and LRW-1000 datasets.
2024, 46(11): 4178-4187.
doi: 10.11999/JEIT240316
Abstract:
Traditional explicit-representation Simultaneous Localization And Mapping (SLAM) systems discretize the scene and are ill-suited to continuous scene reconstruction. An RGB-D SLAM system based on a hybrid scene representation with Neural Radiance Fields (NeRF) is proposed in this paper. An extended explicit octree Signed Distance Function (SDF) prior coarsely represents the scene, while multi-resolution hash encoding represents it at different levels of detail, enabling fast initialization of scene geometry and making the geometry easier to learn. In addition, an appearance-decomposition method splits color into diffuse and view-dependent specular components to achieve lighting-consistent reconstruction, making the results more realistic. In experiments on the Replica and TUM RGB-D datasets, scene reconstruction completeness on Replica reaches 93.65%, and positioning accuracy surpasses Vox-Fusion by 87.50% on average on Replica and by 81.99% on TUM RGB-D.
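The multi-resolution hash encoding mentioned above can be sketched as follows. The XOR-of-primes hash and the level/table parameters follow the common Instant-NGP-style construction and are assumptions; the paper's exact parameters are not reproduced, and feature lookup plus trilinear interpolation are omitted for brevity.

```python
def grid_hash(ix, iy, iz, table_size):
    """XOR-of-primes spatial hash mapping an integer grid vertex to a
    feature-table slot (Instant-NGP-style)."""
    return ((ix * 1) ^ (iy * 2_654_435_761) ^ (iz * 805_459_861)) % table_size

def hash_indices(x, y, z, n_levels=4, base_res=16, growth=1.5, table_size=2 ** 14):
    """For a point in [0,1]^3, return the feature-table index of the lower
    corner of its cell at each resolution level. Coarse levels capture
    rough geometry; fine levels add detail."""
    out = []
    for lvl in range(n_levels):
        res = int(base_res * growth ** lvl)      # grid resolution at this level
        ix, iy, iz = int(x * res), int(y * res), int(z * res)
        out.append(grid_hash(ix, iy, iz, table_size))
    return out

idx = hash_indices(0.3, 0.7, 0.2)
```

Each level owns its own small feature table, so the same point indexes several tables at once and the concatenated features feed the scene MLP.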
2024, 46(11): 4188-4197.
doi: 10.11999/JEIT240359
Abstract:
Influenced by observation conditions and acquisition scenarios, underwater optical image data are usually high-dimensional small-sample data accompanied by noise, leaving many dimensionality reduction methods without robust recognition performance. To solve this problem, a novel 2DPCA method for underwater image recognition, Dual Flexible Metric Adaptive Weighted 2DPCA (DFMAW-2DPCA), is proposed. DFMAW-2DPCA not only uses a flexible robust distance metric to establish a dual-layer relationship between reconstruction error and variance, but also adaptively learns matching weights from the actual state of each sample, which effectively enhances robustness in noisy underwater environments and improves recognition accuracy. A fast non-greedy algorithm with good convergence is designed to obtain the optimal solution. Extensive experiments on three underwater image databases show that DFMAW-2DPCA outperforms other 2DPCA-based methods in overall performance.
2024, 46(11): 4198-4207.
doi: 10.11999/JEIT231394
Abstract:
To address the limited multi-scale feature expression and underuse of shallow features in memory-network algorithms, a Video Object Segmentation (VOS) algorithm based on multi-scale feature enhancement and global-local feature aggregation is proposed in this paper. First, a multi-scale feature enhancement module fuses feature information at different scales from the reference mask branch and the reference RGB branch, strengthening multi-scale feature expression. Meanwhile, a global-local feature aggregation module extracts features with convolutions of different receptive-field sizes and adaptively fuses global and local region features; this fusion better captures the target's global appearance and detailed information, improving segmentation accuracy. Finally, a cross-layer fusion module exploits the spatial details of shallow features to improve mask accuracy: fusing shallow with deep features better captures the target's details and edge information. Experimental results show that on the public DAVIS2016, DAVIS2017, and YouTube 2018 datasets, the algorithm achieves overall performance of 91.8%, 84.5%, and 83.0%, respectively, and runs in real time on both single- and multi-object segmentation tasks.
2024, 46(11): 4208-4218.
doi: 10.11999/JEIT240330
Abstract:
Many traditional imbalanced-learning algorithms suitable for low-dimensional data do not perform well on image data. Although oversampling algorithms based on Generative Adversarial Networks (GAN) can generate high-quality images, they are prone to mode collapse under class imbalance; oversampling algorithms based on AutoEncoders (AE) are easy to train, but the generated images are of lower quality. To improve the quality of samples generated by oversampling on imbalanced image data and the stability of training, a Balanced oversampling method with AutoEncoders and Generative Adversarial Networks (BAEGAN), which combines the ideas of GAN and AE, is proposed in this paper. First, a conditional embedding layer is introduced into the AutoEncoder, and the pre-trained conditional AutoEncoder is used to initialize the GAN to stabilize model training; then the output structure of the discriminator is improved, and a loss function combining Focal Loss and gradient penalty is proposed to alleviate the impact of class imbalance; finally, the Synthetic Minority Oversampling TEchnique (SMOTE) is used to generate high-quality images from the distribution of latent vectors. Experimental results on four image datasets show that the proposed algorithm is superior to oversampling methods such as Auxiliary Classifier Generative Adversarial Networks (ACGAN) and BAlancing Generative Adversarial Networks (BAGAN) in terms of image quality and post-oversampling classification performance, and can effectively solve the class-imbalance problem in image data.
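The final SMOTE step over latent vectors can be illustrated with a minimal sketch (function name, shapes, and neighbour count are illustrative assumptions, not the paper's code): synthetic minority samples are produced by interpolating between a minority-class latent code and one of its nearest neighbours.

```python
import numpy as np

def latent_smote(z_min, n_new, k=3, rng=None):
    """SMOTE-style interpolation among minority-class latent codes.

    z_min : (m, d) array of minority-class latent vectors.
    n_new : number of synthetic latent vectors to generate.
    k     : number of nearest neighbours considered per seed point.
    """
    rng = np.random.default_rng(rng)
    m = len(z_min)
    out = []
    for _ in range(n_new):
        i = rng.integers(m)
        # distances from the seed to all codes; skip index 0 (itself)
        d = np.linalg.norm(z_min - z_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]
        j = rng.choice(nn)
        lam = rng.random()  # interpolation factor in [0, 1)
        out.append(z_min[i] + lam * (z_min[j] - z_min[i]))
    return np.stack(out)
```

The synthetic latent vectors would then be passed through the trained decoder/generator to obtain new minority-class images.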
2024, 46(11): 4219-4228.
doi: 10.11999/JEIT240113
Abstract:
Multi-exposure image fusion is used to enhance the dynamic range of images, resulting in higher-quality outputs. However, for blurred long-exposure images captured in fast-motion scenes, such as autonomous driving, the image quality achieved by directly fusing them with low-exposure images using general fusion methods is often suboptimal, and end-to-end methods for fusing long- and short-exposure images with motion blur are currently lacking. To address this issue, a Deblur Fusion Network (DF-Net) is proposed to solve the fusion of long- and short-exposure images with motion blur in an end-to-end manner. A residual module combined with wavelet transform is proposed for constructing the encoder and decoder: a single encoder is designed for feature extraction from short-exposure images, a multilevel encoder-decoder structure is built for feature extraction from blurred long-exposure images, a residual mean excitation fusion module is designed to fuse the long- and short-exposure features, and the image is finally reconstructed by the decoder. Due to the lack of a benchmark dataset, a multi-exposure fusion dataset with motion blur is created from the SICE dataset for model training and testing. The designed model and method are compared experimentally, both qualitatively and quantitatively, with state-of-the-art step-by-step optimization methods for image deblurring and multi-exposure fusion, verifying their superiority for multi-exposure image fusion with motion blur. Validation on a multi-exposure dataset acquired from a moving vehicle further demonstrates the effectiveness of the proposed method in solving practical problems.
2024, 46(11): 4229-4235.
doi: 10.11999/JEIT240114
Abstract:
Given that the performance of the Deep Image Prior (DIP) denoising model highly depends on the search space determined by the target image, an improved denoising model called RS-DIP (Relatively clean image Space-based DIP) is proposed by comprehensively improving the network input, backbone network, and loss function. Initially, two state-of-the-art supervised denoising models are employed to preprocess two noisy images of the same scene; the results are referred to as relatively clean images. These two relatively clean images are then combined into the network input using a random sampling fusion method. At the same time, the noisy images are replaced with the two relatively clean images, which serve as dual-target images. This strategy narrows the search space, allowing exploration of potential images that closely resemble the ground-truth image. Finally, the multi-scale U-shaped backbone network in the original DIP model is simplified to a single scale, and Transformer modules are included to enhance the network's ability to model distant pixels, bolstering performance while preserving the network's search capability. Experimental results demonstrate that the proposed denoising model exhibits significant advantages over the original DIP model in both denoising effectiveness and execution efficiency, and its denoising effectiveness surpasses mainstream supervised denoising models.
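One plausible reading of the random sampling fusion step is a per-pixel random selection between the two relatively clean images; the sketch below is a hypothetical illustration of that idea (the function name, mask probability, and exact sampling scheme are assumptions, not the authors' implementation).

```python
import numpy as np

def random_sampling_fusion(img_a, img_b, p=0.5, rng=None):
    """Fuse two equally-sized images by drawing each pixel
    from img_a with probability p and from img_b otherwise."""
    rng = np.random.default_rng(rng)
    mask = rng.random(img_a.shape[:2]) < p
    if img_a.ndim == 3:          # broadcast mask over colour channels
        mask = mask[..., None]
    return np.where(mask, img_a, img_b)
```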
2024, 46(11): 4236-4246.
doi: 10.11999/JEIT240257
Abstract:
Considering the limited receptive field and insufficient feature interaction in vision-language tracking, a framework combining Bi-level routing Perception and Scattering Visual Transformation (BPSVTrack) is proposed in this paper. Initially, a Bi-level Routing Perception Module (BRPM) is designed, which combines Efficient Additive Attention (EAA) and a Dual Dynamic Adaptive Module (DDAM) in parallel to enable bidirectional interaction and expand the receptive field, enhancing the model's ability to efficiently integrate features across different windows and sizes and thereby to perceive objects in complex scenes. Secondly, a Scattering Vision Transform Module (SVTM) based on the Dual-Tree Complex Wavelet Transform (DTCWT) is introduced to decompose the image into low-frequency and high-frequency information, capturing the target structure and fine-grained details to improve the robustness and accuracy of the model in complex environments. The proposed framework achieves accuracies of 86.1%, 64.4%, and 63.2% on the OTB99, LaSOT, and TNL2K tracking datasets, respectively, and attains an accuracy of 70.21% on the RefCOCOg dataset, surpassing the baseline model in both tracking and localization performance.
2024, 46(11): 4247-4258.
doi: 10.11999/JEIT240295
Abstract:
Video compressed sensing reconstruction is a highly underdetermined problem, where the low quality of the initial reconstruction and the single motion-estimation approach limit effective modeling of inter-frame correlations. To improve video reconstruction performance, the Static and Dynamic-domain Prior Enhancement Two-stage reconstruction Network (SDPETs-Net) is proposed. Firstly, a strategy of reconstructing second-order static-domain residuals using reference frame measurements is proposed, and a corresponding Static-domain Prior Enhancement Network (SPE-Net) is designed to provide a reliable basis for dynamic-domain prior modeling. Secondly, the Pyramid Deformable-convolution Combined with Attention-search Network (PDCA-Net) is designed, which combines the advantages of deformable convolution and attention mechanisms, and a pyramid cascade structure is constructed to effectively model and exploit dynamic-domain prior knowledge. Lastly, the Multi-Feature Fusion Residual Reconstruction Network (MFRR-Net) extracts and fuses key information of each feature from multiple scales to reconstruct residuals, alleviating the training instability caused by the coupling of the two stages and suppressing feature degradation. Simulation results show that the Peak Signal-to-Noise Ratio (PSNR) is improved by an average of 3.34 dB over the representative two-stage network JDR-TAFA-Net on the UCF101 test set, and by an average of 0.79 dB over the recent multi-stage network DMIGAN.
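For reference, the PSNR metric used in these comparisons is computed from the mean squared error between the reconstructed and ground-truth frames; a standard definition (not specific to this paper) is:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images/frames."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A uniform error of 10 grey levels on an 8-bit image, for example, yields roughly 28.13 dB.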
2024, 46(11): 4259-4267.
doi: 10.11999/JEIT240253
Abstract:
Sea surface temperature is one of the key elements of the marine environment and is of great significance to marine dynamic processes and air-sea interaction. Buoys are a commonly used means of sea surface temperature observation; however, because buoys are irregularly distributed in space, the collected sea surface temperature data are also irregular, and occasional buoy failures are inevitable, leaving the collected data incomplete. Reconstructing such incomplete, irregular sea surface temperature data is therefore of great significance. In this paper, the sea surface temperature data are modeled as a time-varying graph signal, and graph signal processing is used to solve the missing-data reconstruction problem. Firstly, a sea surface temperature reconstruction model is constructed by using the low-rank property of the data and the joint variation characteristics of the time domain and graph domain. Secondly, a time-varying graph signal reconstruction method based on Low Rank and Joint Smoothness (LRJS) constraints is proposed, which solves the optimization problem within the framework of the Alternating Direction Method of Multipliers (ADMM); the computational complexity and the theoretical limit of the estimation error of the method are analyzed. Finally, sea surface temperature data of the South China Sea and the Pacific Ocean are used to evaluate the method. The results show that the proposed LRJS method improves reconstruction accuracy compared with existing missing-data reconstruction methods.
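In ADMM solvers for such low-rank models, the nuclear-norm term is typically handled by a singular-value thresholding (SVT) proximal step; the generic sketch below illustrates that building block only (it is not the paper's full LRJS algorithm, which additionally enforces time-domain and graph-domain smoothness).

```python
import numpy as np

def svt(X, tau):
    """Proximal operator of tau * ||X||_* (nuclear norm):
    shrink the singular values of X by tau, flooring at zero."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt
```

With tau = 0 the input is returned unchanged; a threshold larger than the largest singular value maps the matrix to zero.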
2024, 46(11): 4268-4277.
doi: 10.11999/JEIT240342
Abstract:
In modern electronic countermeasures, networking multiple joint radar and communication systems can improve the detection efficiency and collaborative detection capability of a single joint radar and communication system. However, the high peak-to-average power ratio of the joint radar and communication signal makes it easy to intercept, seriously threatening the system's survivability. To improve the Low Probability of Intercept (LPI) performance of the joint radar and communication signal, a time-frequency structure for a grouped LPI joint radar and communication signal is proposed under the filter bank multicarrier framework, with grouped power optimization of the communication subcarriers and interleaved equal-power optimization of the radar subcarriers. The performance assessment metrics of the system are then unified from an information-theoretic perspective; on this basis, minimizing the intercepted information divergence at the interceptor is taken as the optimization objective, and an LPI optimization model for the networked joint radar and communication signal is established. The optimization model is converted into a convex optimization problem and solved using the Karush-Kuhn-Tucker conditions. Simulation results show that, when detecting moving targets, the designed networked LPI joint radar and communication signal achieves inter-node radar interference as low as nearly –60 dB, the communication bit error rate remains on the order of 10^-6, and the signal-to-noise ratio of the intercepted signal is effectively reduced.
2024, 46(11): 4278-4286.
doi: 10.11999/JEIT240389
Abstract:
The design and optimization of a secure downlink transmission scheme for two users based on rate-splitting multiple access are studied. Considering a scenario in which some of the messages sent to the two users must be kept confidential from the other user, the sum rate of the non-confidential messages is maximized while ensuring the transmission rate of the confidential messages. The common stream carries only non-confidential messages, while the private streams carry both non-confidential and confidential messages in a time-sharing manner. The transmit precoding vectors for each message stream, the rate splitting, and the transmission time allocation for the private streams of non-confidential and confidential messages are jointly optimized. By decomposing the original problem into a two-level optimization problem and using methods such as binary search, slack variables, and successive convex approximation, the original problem is transformed and solved. Simulation results show that the proposed scheme achieves a higher non-confidential sum rate than rate-splitting multiple access in which the private streams carry only confidential messages, and than space-division multiple access with time-sharing between non-confidential and confidential messages.
2024, 46(11): 4287-4294.
doi: 10.11999/JEIT240275
Abstract:
To solve the bottleneck of constrained spectrum resources for Unmanned Aerial Vehicles (UAVs) in unlicensed bands, a co-optimization scheme for high spectral efficiency under the underlay mechanism is proposed for UAV-assisted monitoring communication networks in urban environments. Considering the high maneuverability of UAVs, the air-to-ground channel is modeled as a probabilistic Line-of-Sight (LoS) channel, and co-channel interference and maximum-speed constraints are adopted to formulate a hybrid resource optimization model for power allocation and trajectory planning, enabling UAVs to construct a fast transmission scheme for monitoring data over the occupied spectrum within the given time. The original problem is an NP-hard, non-convex integer problem; it is first decomposed into a two-layer programming problem, and then slack variables and Successive Convex Approximation (SCA) are applied to transform the trajectory design problem into a convex programming problem. Compared with the Particle Swarm Optimization (PSO) algorithm, the proposed joint optimization scheme is verified in simulations to improve spectral efficiency by up to about 19%. For high-dimensional trajectory planning problems, the SCA-based algorithm is shown to have lower complexity and faster convergence.
2024, 46(11): 4295-4304.
doi: 10.11999/JEIT240201
Abstract:
The Multi-Model Gaussian Mixture-Probability Hypothesis Density (MM-GM-PHD) filter is widely used in uncertain maneuvering target tracking, but it does not maintain parallel estimates under different models, so the model-related likelihood lags behind unknown target maneuvers. To solve this issue, a Joint Multi-Gaussian Mixture PHD (JMGM-PHD) filter is proposed and applied to bearings-only multi-target tracking in this paper. Firstly, a JMGM model is derived, in which each single-target state estimate is described by a set of parallel Gaussian functions with model probabilities, and the probability of the state estimate is characterized by a nonnegative weight. The weights, model-related probabilities, means, and covariances are collectively called JMGM components, and their updating method is derived according to the Bayesian rule. Then, the multi-target PHD is approximated using the JMGM model, and the interacting, prediction, and estimation methods of the JMGM components are derived according to the Interactive Multi-Model (IMM) rule. For Bearings-Only Tracking (BOT), a method based on the chain rule for composite functions is derived to compute the linearized observation matrix of observers that simultaneously perform translations and rotations. The proposed JMGM-PHD filter preserves the form of the regular single-model PHD filter but can adaptively track uncertain maneuvering targets. Simulations show that the algorithm overcomes the likelihood-lag issue and outperforms the MM-GM-PHD filter in terms of tracking accuracy and computation cost.
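For context, the linearized observation matrix in bearings-only tracking is the Jacobian of the bearing function with respect to the target state. The minimal sketch below covers only the simplest case of a static observer and a position-only 2-D state (the paper's derivation additionally handles observer translation and rotation, which this sketch omits):

```python
import numpy as np

def bearing_jacobian(tx, ty, ox, oy):
    """Jacobian of the bearing h = atan2(ty - oy, tx - ox)
    with respect to the target position (tx, ty)."""
    dx, dy = tx - ox, ty - oy
    r2 = dx * dx + dy * dy  # squared target-observer range
    # d(atan2(dy, dx))/d(tx) = -dy/r2,  d(atan2(dy, dx))/d(ty) = dx/r2
    return np.array([-dy / r2, dx / r2])
```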
2024, 46(11): 4305-4316.
doi: 10.11999/JEIT240242
Abstract:
Ground Penetrating Radar (GPR) is a non-destructive method for identifying underground targets. Existing methods often struggle with variable target sizes, complex image recognition, and precise target localization. To address these challenges, an innovative method is introduced that leverages a dual YOLOv8-pose model for the detection and precise localization of hyperbolic keypoints. This method, termed Dual YOLOv8-pose Keypoint Localization (DYKL), offers a practical solution to the challenges inherent in GPR-based target identification and positioning. The proposed model operates in two stages: first, the YOLOv8-pose model is employed for the preliminary detection of GPR targets, identifying regions that are likely to contain them; second, building upon the training weights from the first stage, the YOLOv8-pose network is further refined toward the precise detection of keypoints within the candidate target features, thereby enabling the automated identification and exact localization of underground targets with enhanced accuracy. Compared with four advanced deep-learning models, Cascade Region-based Convolutional Neural Networks (Cascade R-CNN), Faster Region-based Convolutional Neural Networks (Faster R-CNN), Real-Time Models for object Detection (RTMDet), and You Only Look Once v7 (YOLOv7-face), the proposed DYKL model achieves an average recognition accuracy of 98.8%, surpassing all four. The results demonstrate the DYKL model's high recognition accuracy and robustness, serving as a benchmark for the precise localization of subterranean targets.
2024, 46(11): 4317-4327.
doi: 10.11999/JEIT240286
Abstract:
A novel and effective radar target detection method is proposed based on the theory of matrix information geometry. Under a complex heterogeneous clutter background with low Signal-to-Clutter Ratio (SCR), the target and the clutter are poorly discriminated on the matrix manifold, so the conventional information geometry detector performs unsatisfactorily. To address this issue, a manifold transformation-based information geometry detector is proposed. Concretely, a manifold-to-manifold mapping scheme is designed, and a joint optimization method based on the geometric distance between the Cell Under Test (CUT) and the clutter centroid is presented to enhance the discrimination between the target and the clutter on the mapped manifold. Finally, the performance of the proposed method is evaluated on simulated and real clutter data. On simulated data, the detection probability of the proposed method exceeds 60% when the SCR exceeds 1 dB; on real data, the method achieves an SCR improvement of about 3 to 6 dB over the conventional information geometry detector.
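The abstract does not specify which geometric distance is used between the CUT and the clutter centroid; a common choice in matrix information geometry is the affine-invariant Riemannian metric on symmetric positive-definite covariance matrices. A minimal sketch under that assumption:

```python
import numpy as np

def spd_logm(M):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.log(w)) @ V.T

def spd_inv_sqrt(M):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

def airm_distance(A, B):
    """Affine-invariant Riemannian distance:
    d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F
    """
    S = spd_inv_sqrt(A)
    return np.linalg.norm(spd_logm(S @ B @ S), "fro")
```

In a detector of this family, `airm_distance` would compare the sample covariance of the CUT against the geometric centroid of the reference-cell covariances, declaring a target when the distance exceeds a threshold.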
2024, 46(11): 4328-4334.
doi: 10.11999/JEIT240188
Abstract:
Central symmetry of the virtual array is a necessary fundamental assumption for the structure transformation of Uniform Circular Arrays (UCAs). In this paper, the virtual signal model for circular arrays is used for eigenanalysis, and an efficient two-dimensional direction-finding algorithm is proposed for arbitrary UCAs and Non-Uniform Circular Arrays (NUCAs) that avoids the structure transformation to linear arrays. Specifically, the Forward/Backward average of the Array Covariance Matrix (FBACM) and the sum-difference transformation after separating the real and imaginary parts are both utilized to obtain a manifold and real-valued subspace with matching dimensions. Moreover, the linear relationship between the obtained real-valued subspace and the original complex-valued subspace is revealed, so that the spatial spectrum is reconstructed without spurious targets. The proposed method generalizes to NUCAs, enhancing the adaptability of real-valued algorithms to circular array structures. Numerical simulations demonstrate that, with significantly reduced complexity, the proposed method provides comparable performance and better angular resolution than traditional UCA methods based on the mode space. Meanwhile, the proposed method demonstrates high robustness to amplitude and phase errors in practical scenarios.
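The forward/backward averaging that FBACM refers to is a standard array-processing operation; a minimal numpy sketch (illustrative, not the paper's implementation):

```python
import numpy as np

def fb_average(R):
    """Forward/backward average of an array covariance matrix:
    R_fb = (R + J R* J) / 2, with J the exchange (anti-identity) matrix.
    """
    n = R.shape[0]
    J = np.eye(n)[::-1]          # exchange matrix: ones on the anti-diagonal
    return 0.5 * (R + J @ np.conj(R) @ J)
```

The FB-averaged matrix is persymmetric (R_fb = J R_fb* J), and for a centro-symmetric virtual array this is the property that allows the subsequent real/imaginary separation to yield a real-valued subspace of matching dimension.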