2025, 47(11): 4112-4128.
doi: 10.11999/JEIT250567
Abstract:
Significance: With the continuous advancement of information technology, digital images are evolving toward ultra-high-definition formats characterized by higher resolution, dynamic range, color depth, and sampling rates, as well as multi-viewpoint support. In parallel, the rapid development of artificial intelligence is reshaping both the generation and application paradigms of digital imagery. As visual big data converges with AI technologies, the volume and diversity of image data are expanding rapidly, creating unprecedented challenges for storage and transmission. As a core technology in digital image processing, image compression reduces storage costs and bandwidth requirements by eliminating internal information redundancy, thereby serving as a fundamental enabler for visual big data applications. However, traditional image compression standards increasingly struggle to meet rising industrial demands due to limited modeling capacity, inadequate perceptual adaptability, and poor compatibility with machine vision tasks. Deep Neural Network (DNN)-based image compression methods, which offer powerful modeling capabilities, end-to-end optimization mechanisms, and compatibility with both human perception and machine understanding, are progressively surpassing conventional coding approaches. These methods demonstrate clear advantages and broad potential across diverse application domains, drawing growing attention from both academia and industry.
Progress: This paper systematically reviews recent advances in DNN-based image compression from three core perspectives: signal fidelity, human visual perception, and machine analysis. First, in signal fidelity-oriented compression, the rate-distortion optimization framework is introduced, with a detailed discussion of key components in lossy image compression, including nonlinear transforms, quantization strategies, entropy coding mechanisms, and variable-rate techniques for multi-rate adaptation. The synergistic design of these modules underpins the architecture of modern DNN-based image compression systems. Second, in perceptual quality-driven compression, the principles of joint rate-distortion-perception optimization models are examined, together with a comparative analysis of two major perceptual paradigms: Generative Adversarial Network (GAN)-based models and diffusion model–based approaches. Both strategies employ perceptual loss functions or generative modeling techniques to markedly improve the visual quality of reconstructed images, aligning them more closely with the characteristics of the human visual system. Finally, in machine analysis-oriented compression, a co-optimization framework for rate-distortion-accuracy trade-offs is presented, with semantic fidelity as the primary objective. From the perspective of integrating image compression with downstream machine analysis architectures, this section analyzes how current methods preserve, during compression, the essential semantic information that supports tasks such as object detection and semantic segmentation.
Conclusions: DNN-based image compression shows strong potential across signal fidelity, human visual perception, and machine analysis. Through end-to-end jointly optimized neural network architectures, these methods provide comprehensive modeling of the encoding process and outperform traditional approaches in compression efficiency. By leveraging the probabilistic modeling and image generation capabilities of DNNs, they can accurately estimate distributional differences between reconstructed and original images, quantify perceptual losses, and generate high-quality reconstructions that align with human visual perception. Furthermore, their compatibility with mainstream image analysis frameworks enables the extraction of semantic features and the design of collaborative optimization strategies, allowing efficient compression tailored to machine vision tasks.
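The three optimization targets surveyed above are often written in a shared Lagrangian form. The block below is only a sketch in assumed notation: the symbols R, D, λ, β, γ, d, and L_task are illustrative choices and are not taken from the paper itself.

```latex
% Assumed notation: x is the source image, \hat{x} its reconstruction,
% R the expected bit rate, D a signal distortion such as MSE.
\begin{align}
  \text{signal fidelity:}  \quad & \min \; R + \lambda D \\
  \text{human perception:} \quad & \min \; R + \lambda D + \beta \, d\!\left(p_{x},\, p_{\hat{x}}\right) \\
  \text{machine analysis:} \quad & \min \; R + \lambda D + \gamma \, \mathcal{L}_{\text{task}}
\end{align}
```

Here d(·,·) stands for a divergence between the distributions of original and reconstructed images, L_task for a downstream task loss such as detection or segmentation error, and the nonnegative weights λ, β, and γ set the trade-off. Setting β or γ to zero recovers the plain rate-distortion case.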
Prospects: Despite significant progress in compression performance, perceptual quality, and task adaptability, DNN-based image compression still faces critical technical challenges and practical limitations. First, computational complexity remains high. Most high-performance models rely on deep and sophisticated architectures (e.g., attention mechanisms and Transformer models), which enhance modeling capability but also introduce substantial computational overhead and long inference latency. These limitations are particularly problematic for deployment on mobile and embedded devices. Second, robustness and generalization continue to be major concerns. DNN-based compression models are sensitive to input perturbations and vulnerable to adversarial attacks, which can lead to severe reconstruction distortions or even complete failure. Moreover, while they perform well on training data and similar distributions, their performance often degrades markedly in cross-domain scenarios. Third, the evaluation framework for perceptual- and machine vision-oriented compression remains immature. Although new evaluation dimensions have been introduced, no unified and objective benchmark exists. This gap is especially evident in machine analysis-oriented compression, where downstream tasks vary widely and rely on different visual models. As a result, comparability across methods is limited and consistent evaluation metrics are lacking, constraining both research and practical adoption. Overall, DNN-based image compression is in transition from laboratory research to real-world deployment. Although it demonstrates clear advantages over traditional approaches, further advances are needed in efficiency, robustness, generalization, and standardized evaluation protocols. Future research should strengthen the synergy between theoretical exploration and engineering implementation to accelerate widespread adoption and continued progress in areas such as multimedia communication, edge computing, and intelligent image sensing systems.