PSAQNet: A Perceptual Structure Adaptive Quality Network for Authentic-Distortion-Oriented No-Reference Image Quality Assessment
Abstract: To address the limited robustness, weak generalization, and insufficient geometric-structure modeling of no-reference image quality assessment (NR-IQA) methods in real-world scenarios, this paper proposes an NR-IQA method based on a Perceptual Structure Adaptive Quality Network (PSAQNet). First, multi-scale features are extracted with a pre-trained Convolutional Neural Network (CNN), and an advanced distortion enhancement module applies gated selection and adaptation to the multi-branch features, highlighting distortion-related regions while suppressing irrelevant interference. Second, channel-aware adaptive kernel convolution and spatial-guided convolution are introduced to strengthen the modeling and alignment of geometric degradations such as rotation and warping, through channel recalibration, adaptive sampling, and spatially guided modulation. Next, the enhanced multi-scale convolutional features are converted into token sequences by adaptive pooling and projection, and interact selectively with the global Transformer representation through a cross-attention mechanism, effectively fusing local details with global semantics. Finally, grouped convolutional attention further emphasizes distortion-salient regions during fusion, and a prediction head regresses the image quality score. Experiments on six classic databases show that PSAQNet outperforms a variety of representative NR-IQA methods on correlation metrics such as the Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank-order Correlation Coefficient (SRCC), and exhibits stronger robustness and generalization, especially under complex distortions and in cross-database tests.
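The processing order summarized above (multi-scale CNN features → gated distortion enhancement → geometric-aware refinement → tokenization → cross-attention with the global Transformer branch → quality regression) can be sketched in code. The following is a minimal PyTorch sketch under stated assumptions: the backbone interfaces, per-scale enhancer stacks, token counts, and head design are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PSAQNetSketch(nn.Module):
    """Minimal sketch of the PSAQNet forward pass described in the abstract.
    Assumes `cnn` returns a list of multi-scale feature maps, `vit` returns
    global tokens of shape (B, N, dim), and `enhancers` holds one per-scale
    stack of ADEM -> CA_AK -> SGC -> GroupCBAM stand-ins. All interfaces
    here are guesses, not the authors' code."""

    def __init__(self, cnn, vit, enhancers, chans, dim=384, heads=6, pooled=7):
        super().__init__()
        self.cnn, self.vit = cnn, vit
        self.enhance = nn.ModuleList(enhancers)
        self.proj = nn.ModuleList([nn.Conv2d(c, dim, 1) for c in chans])
        self.pool = nn.AdaptiveAvgPool2d(pooled)      # adaptive pooling before tokenization
        # AttInjector-style cross-attention: global tokens query local distortion cues
        self.inject = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, x):
        feats = [m(f) for m, f in zip(self.enhance, self.cnn(x))]
        toks = [self.pool(p(f)).flatten(2).transpose(1, 2)   # (B, pooled^2, dim) per scale
                for p, f in zip(self.proj, feats)]
        local = torch.cat(toks, dim=1)                # local distortion tokens
        g = self.vit(x)                               # global semantic tokens
        g = g + self.inject(g, local, local, need_weights=False)[0]
        return self.head(g.mean(dim=1)).squeeze(-1)   # scalar quality score
```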
Objective  No-Reference Image Quality Assessment (NR-IQA) is critical for practical imaging systems in which pristine reference images are unavailable. However, many existing methods face three major challenges: limited robustness under complex distortions; weak generalization when distortion distributions shift (e.g., from synthetic to real-world settings); and insufficient modeling of geometric or structural degradations such as spatially varying blur, misalignment, and texture-structure coupling. These limitations cause models to rely excessively on dataset-specific statistics and reduce their effectiveness on diverse scenes with mixed degradations. To address these issues, the Perceptual Structure Adaptive Quality Network (PSAQNet) is proposed to improve the accuracy and adaptability of NR-IQA under complex distortion conditions.

Methods  PSAQNet is a unified CNN-Transformer framework that preserves hierarchical perceptual cues and supports global context reasoning. Instead of relying on late-stage pooling, distortion evidence is progressively enhanced throughout the network. The architecture contains several key components. The Advanced Distortion Enhanced Module (ADEM) operates on multi-scale features extracted from a pre-trained backbone; it adopts multi-branch gating and a distortion-aware adapter to emphasize degradation-related signals and reduce interference from dominant image content. This mechanism dynamically selects the feature branches that correspond to perceptual degradation patterns, which benefits spatially non-uniform or mixed distortions. To model geometric degradations, PSAQNet integrates Spatial-Guided Convolution (SGC) and Channel-Aware Adaptive Kernel convolution (CA_AK). SGC improves spatial sensitivity by guiding convolutional responses with structure-aware cues, focusing on regions where geometric distortions are prominent. CA_AK further strengthens geometric modeling by adaptively adjusting the receptive field and recalibrating channels to preserve distortion-sensitive components. In addition, PSAQNet incorporates efficient feature-fusion strategies: the Group Convolutional Block Attention Module (GroupCBAM) enables lightweight attention-based fusion of multi-level CNN features, while AttInjector selectively injects local distortion cues into global Transformer representations. This design allows global semantic reasoning to be guided by localized degradation evidence without introducing redundancy or instability.
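As a rough illustration of the multi-branch gating idea attributed to ADEM above, the following PyTorch sketch re-weights parallel convolution branches with a squeeze-style gate and adds a lightweight residual adapter. The branch layout, gate design, and adapter here are assumptions for illustration, not the published module.

```python
import torch
import torch.nn as nn

class MultiBranchGate(nn.Module):
    """Hypothetical ADEM-style block: parallel branches with different
    receptive fields are re-weighted by a learned gate so that branches
    responding to degradation patterns dominate; a residual adapter then
    folds the gated evidence back into the feature map."""

    def __init__(self, c, reduction=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(c, c, 3, padding=1, groups=c),              # local texture
            nn.Conv2d(c, c, 3, padding=2, dilation=2, groups=c),  # wider context
            nn.Conv2d(c, c, 1),                                   # channel mixing
        ])
        self.gate = nn.Sequential(                 # squeeze-style gate over branches
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, len(self.branches), 1),
            nn.Softmax(dim=1),
        )
        self.adapter = nn.Sequential(              # lightweight distortion-aware adapter
            nn.Conv2d(c, c // reduction, 1), nn.GELU(),
            nn.Conv2d(c // reduction, c, 1),
        )

    def forward(self, x):
        w = self.gate(x)                                           # (B, K, 1, 1)
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        fused = (outs * w.unsqueeze(2)).sum(dim=1)                 # gated branch mixture
        return x + self.adapter(fused)                             # residual adaptation
```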
Results and Discussions  Extensive experiments on six benchmark datasets containing both synthetic and real-world distortions demonstrate that PSAQNet achieves strong performance and stable agreement with human subjective judgments. The proposed method outperforms several recent approaches, particularly on real-world distortion datasets. These results indicate that PSAQNet effectively enhances distortion evidence, models geometric degradation, and integrates local distortion cues with global semantic representations; these capabilities improve robustness under distribution shift and reduce reliance on narrow distortion priors. Ablation studies confirm the contribution of each module: ADEM increases distortion saliency, SGC and CA_AK improve sensitivity to geometric degradations, and GroupCBAM and AttInjector strengthen the interaction between local and global features. Cross-dataset evaluations further demonstrate that PSAQNet generalizes across content categories and distortion types, and scalability experiments show that the framework benefits from stronger pre-trained backbones without compromising its modular design.

Conclusions  PSAQNet addresses several key limitations of NR-IQA by integrating local distortion enhancement, geometric-aware feature modeling, and global semantic fusion within a unified framework. The modular architecture improves robustness and generalization across diverse distortion conditions and supports practical deployment in real-world scenarios. Future work will explore vision-language pre-training to improve cross-scene adaptability.
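The geometric-aware behavior attributed to CA_AK (adaptive receptive behavior plus channel recalibration) can be approximated with the generic dynamic-convolution recipe. The sketch below mixes several candidate kernels with input-conditioned weights and then applies an SE-style channel gate; it is a stand-in under these assumptions, not the paper's actual layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAwareAdaptiveConv(nn.Module):
    """Illustrative CA_AK stand-in: (i) a per-sample convolution kernel is
    assembled as a softmax-weighted mixture of K candidate kernels, and
    (ii) the output channels are recalibrated with a sigmoid gate."""

    def __init__(self, c, k=3, num_kernels=4, reduction=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_kernels, c, c, k, k) * 0.02)
        self.route = nn.Sequential(               # input-conditioned kernel mixing
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, num_kernels), nn.Softmax(dim=-1),
        )
        self.recal = nn.Sequential(               # SE-style channel recalibration
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, 1), nn.Sigmoid(),
        )
        self.pad = k // 2

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = self.route(x)                                    # (B, K) mixing weights
        kernel = torch.einsum('bk,koiuv->boiuv', alpha, self.weight)
        kernel = kernel.reshape(b * c, c, *kernel.shape[-2:])    # (B*O, I, k, k)
        y = F.conv2d(x.reshape(1, b * c, h, w), kernel,
                     padding=self.pad, groups=b)                 # per-sample convolution
        y = y.reshape(b, c, h, w)
        return y * self.recal(y)                                 # channel recalibration
```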
Table 1  The six datasets used in the experiments

Dataset     Distorted images  Distortion types  Type
LIVE        779               5                 Synthetic
CSIQ        866               6                 Synthetic
TID2013     3000              24                Synthetic
KADID-10k   10125             25                Synthetic
LIVEC       1162              -                 Authentic
KonIQ-10k   10073             -                 Authentic

Table 2  Comparison with classic and state-of-the-art models (PLCC / SRCC)

Method      CSIQ           TID2013        LIVE           KADID-10k      LIVEC          KonIQ-10k
TReS        0.942 / 0.922  0.883 / 0.863  0.968 / 0.969  0.858 / 0.859  0.877 / 0.846  0.928 / 0.915
LoDa        0.968 / 0.961  0.901 / 0.869  0.979 / 0.975  0.936 / 0.931  0.887 / 0.871  0.934 / 0.920
SaTQA       0.972 / 0.965  0.931 / 0.948  0.983 / 0.983  0.949 / 0.946  0.899 / 0.877  0.941 / 0.930
MDM-GFIQA   0.973 / 0.965  0.936 / 0.929  0.976 / 0.976  0.921 / 0.918  0.908 / 0.887  0.942 / 0.930
Ours        0.979 / 0.974  0.938 / 0.929  0.988 / 0.987  0.942 / 0.937  0.908 / 0.887  0.943 / 0.935
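Table 3 below ablates GroupCBAM alongside the other modules. As a point of reference for what a grouped CBAM-style block can look like, here is a minimal sketch that applies CBAM's channel and spatial attention [24] within channel groups; the grouping scheme is an assumption about GroupCBAM, not its published design.

```python
import torch
import torch.nn as nn

class GroupCBAM(nn.Module):
    """Sketch of grouped CBAM attention: channels are split into groups and
    CBAM's channel + spatial attention (Woo et al., ECCV 2018) is applied
    within each group independently."""

    def __init__(self, c, groups=4, reduction=4, k=7):
        super().__init__()
        assert c % groups == 0
        self.groups, gc = groups, c // groups
        self.mlp = nn.Sequential(                 # shared MLP for channel attention
            nn.Conv2d(gc, gc // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(gc // reduction, gc, 1),
        )
        self.spatial = nn.Conv2d(2, 1, k, padding=k // 2)  # spatial attention conv

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.reshape(b * self.groups, c // self.groups, h, w)
        # channel attention: avg- and max-pooled descriptors through a shared MLP
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention: channel-wise avg/max maps -> kxk conv -> sigmoid
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return (x * sa).reshape(b, c, h, w)
```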
Table 3  Ablation results (PLCC / SRCC)

Method                         KonIQ-10k      TID2013
baseline                       0.934 / 0.920  0.901 / 0.869
baseline+ADEM+CA_AK            0.936 / 0.925  0.924 / 0.922
baseline+ADEM+SGC              0.934 / 0.927  0.925 / 0.920
baseline+CA_AK+SGC             0.936 / 0.924  0.922 / 0.919
baseline+ADEM+CA_AK+SGC        0.938 / 0.927  0.932 / 0.925
baseline+ADEM+CA_AK+GroupCBAM  0.936 / 0.930  0.933 / 0.923
baseline+ADEM+SGC+GroupCBAM    0.939 / 0.931  0.930 / 0.926
PSAQNet                        0.943 / 0.935  0.938 / 0.929

Table 4  Cross-dataset generalization results

Training set  KonIQ-10k  KADID-10k  LIVE   LIVE
Test set      LIVEC      KonIQ-10k  CSIQ   TID2013
HyperIQA      0.785      0.648      0.744  0.551
TReS          0.786      0.606      0.761  0.562
LoDa          0.794      0.654      0.823  0.621
SaTQA         0.791      0.661      0.831  0.627
MDM-GFIQA     0.813      0.670      0.840  0.641
Ours          0.817      0.677      0.842  0.659
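Tables 2 to 4 report agreement with subjective scores via PLCC and SRCC. For reference, the sketch below computes both with SciPy; the four-parameter logistic mapping applied before PLCC is one common variant of the IQA evaluation protocol and may differ in detail from the protocol used in this paper.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def logistic_4(x, b1, b2, b3, b4):
    """Four-parameter logistic commonly used to map raw predictions onto
    the MOS scale before computing PLCC."""
    return (b1 - b2) / (1 + np.exp(-(x - b3) / np.abs(b4))) + b2

def plcc_srcc(pred, mos):
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    srcc = stats.spearmanr(pred, mos)[0]            # rank-order agreement
    p0 = [mos.max(), mos.min(), pred.mean(), pred.std() or 1.0]
    try:
        beta, _ = curve_fit(logistic_4, pred, mos, p0=p0, maxfev=10000)
        pred = logistic_4(pred, *beta)              # nonlinear mapping to MOS scale
    except RuntimeError:
        pass                                        # fall back to raw predictions
    plcc = stats.pearsonr(pred, mos)[0]             # linear agreement after mapping
    return plcc, srcc
```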
[1] HAN Yulan, CUI Yujie, LUO Yihong, et al. Frequency separation generative adversarial super-resolution reconstruction network based on dense residual and quality assessment[J]. Journal of Electronics & Information Technology, 2024, 46(12): 4563–4574. doi: 10.11999/JEIT240388.
[2] BAI Yuanchao, LIU Wenchang, JIANG Junjun, et al. Advances in deep neural network based image compression: A survey[J]. Journal of Electronics & Information Technology, 2025, 47(11): 4112–4128. doi: 10.11999/JEIT250567.
[3] WANG Zhou, BOVIK A C, SHEIKH H R, et al. Image quality assessment: From error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600–612. doi: 10.1109/TIP.2003.819861.
[4] WANG Zhou, SIMONCELLI E P, and BOVIK A C. Multiscale structural similarity for image quality assessment[C]. The 37th Asilomar Conference on Signals, Systems & Computers, Pacific Grove, USA, 2003: 1398–1402. doi: 10.1109/ACSSC.2003.1292216.
[5] YANG Jie, LYU Mengjin, QI Zhiquan, et al. Deep learning based image quality assessment: A survey[J]. Procedia Computer Science, 2023, 221: 1000–1005. doi: 10.1016/j.procs.2023.08.080.
[6] MOORTHY A K and BOVIK A C. Blind image quality assessment: From natural scene statistics to perceptual quality[J]. IEEE Transactions on Image Processing, 2011, 20(12): 3350–3364. doi: 10.1109/TIP.2011.2147325.
[7] MITTAL A, MOORTHY A K, and BOVIK A C. No-reference image quality assessment in the spatial domain[J]. IEEE Transactions on Image Processing, 2012, 21(12): 4695–4708. doi: 10.1109/TIP.2012.2214050.
[8] BOSSE S, MANIRY D, MÜLLER K R, et al. Deep neural networks for no-reference and full-reference image quality assessment[J]. IEEE Transactions on Image Processing, 2018, 27(1): 206–219. doi: 10.1109/TIP.2017.2760518.
[9] ZHANG Lin, ZHANG Lei, and BOVIK A C. A feature-enriched completely blind image quality evaluator[J]. IEEE Transactions on Image Processing, 2015, 24(8): 2579–2591. doi: 10.1109/TIP.2015.2426416.
[10] ZHANG Weixia, MA Kede, YAN Jia, et al. Blind image quality assessment using a deep bilinear convolutional neural network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(1): 36–47. doi: 10.1109/TCSVT.2018.2886771.
[11] KE Junjie, WANG Qifei, WANG Yilin, et al. MUSIQ: Multi-scale image quality transformer[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 5128–5137. doi: 10.1109/ICCV48922.2021.00510.
[12] CHEON M, YOON S J, KANG B, et al. Perceptual image quality assessment with transformers[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, USA, 2021: 433–442. doi: 10.1109/CVPRW53098.2021.00054.
[13] CHEN Zewen, QIN Haina, WANG Juan, et al. PromptIQA: Boosting the performance and generalization for no-reference image quality assessment via prompts[C]. The 18th European Conference on Computer Vision, Milan, Italy, 2024: 247–264. doi: 10.1007/978-3-031-73232-4_14.
[14] ZHANG Bo, WANG Luoxi, ZHANG Cheng, et al. No-reference image quality assessment based on improved vision transformer and transfer learning[J]. Signal Processing: Image Communication, 2025, 135: 117282. doi: 10.1016/j.image.2025.117282.
[15] GUO Yingcong, TANG Tianhang, and LIU Yiguang. A dual-branch no-reference image quality assessment network guided by Transformer and a weight token[J]. Journal of Sichuan University: Natural Science Edition, 2025, 62(4): 847–856. doi: 10.19907/j.0490-6756.240396.
[16] CHEN Yong, ZHU Kaixin, FANG Hao, et al. No-reference image quality evaluation for multiply-distorted images based on spatial domain coding[J]. Journal of Electronics & Information Technology, 2020, 42(10): 2533–2540. doi: 10.11999/JEIT190721.
[17] XU Kangmin, LIAO Liang, XIAO Jing, et al. Boosting image quality assessment through efficient transformer adaptation with local feature enhancement[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 2662–2672. doi: 10.1109/CVPR52733.2024.00257.
[18] SHI Jinsong, GAO Pan, and QIN Jie. Transformer-based no-reference image quality assessment via supervised contrastive learning[C]. The 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 4829–4837. doi: 10.1609/aaai.v38i5.28285.
[19] SU Shaolin, YAN Qingsen, ZHU Yu, et al. Blindly assess image quality in the wild guided by a self-adaptive hyper network[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 3664–3673. doi: 10.1109/CVPR42600.2020.00372.
[20] HOSU V, LIN Hanhe, SZIRANYI T, et al. KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment[J]. IEEE Transactions on Image Processing, 2020, 29: 4041–4056. doi: 10.1109/TIP.2020.2967829.
[21] LI Aobo, WU Jinjian, LIU Yongxu, et al. Bridging the synthetic-to-authentic gap: Distortion-guided unsupervised domain adaptation for blind image quality assessment[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 28422–28431. doi: 10.1109/CVPR52733.2024.02685.
[22] GU Liping, LI Tongyan, and HE Jiyong. Classification of diabetic retinopathy grade based on G-ENet convolutional neural network model: Convolutional neural networks are used to solve the problem of diabetic retinopathy grade classification[C]. 2023 7th International Conference on Electronic Information Technology and Computer Engineering, Xiamen, China, 2023: 1590–1594. doi: 10.1145/3650400.3650666.
[23] LI Yuhao and ZHANG Aihua. AKA-MobileNet: A cloud-noise-robust lightweight convolution neural network[C]. 2024 39th Youth Academic Annual Conference of Chinese Association of Automation (YAC), Dalian, China, 2024: 188–193. doi: 10.1109/YAC63405.2024.10598582.
[24] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 3–19. doi: 10.1007/978-3-030-01234-2_1.
[25] SHEIKH H R, BOVIK A C, and DE VECIANA G. An information fidelity criterion for image quality assessment using natural scene statistics[J]. IEEE Transactions on Image Processing, 2005, 14(12): 2117–2128. doi: 10.1109/TIP.2005.859389.
[26] LARSON E C and CHANDLER D M. Most apparent distortion: Full-reference image quality assessment and the role of strategy[J]. Journal of Electronic Imaging, 2010, 19(1): 011006. doi: 10.1117/1.3267105.
[27] PONOMARENKO N, JIN Lina, IEREMEIEV O, et al. Image database TID2013: Peculiarities, results and perspectives[J]. Signal Processing: Image Communication, 2015, 30: 57–77. doi: 10.1016/j.image.2014.10.009.
[28] LIN Hanhe, HOSU V, and SAUPE D. KADID-10k: A large-scale artificially distorted IQA database[C]. 2019 11th International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 2019: 1–3. doi: 10.1109/QoMEX.2019.8743252.
[29] GHADIYARAM D and BOVIK A C. Massive online crowdsourced study of subjective and objective picture quality[J]. IEEE Transactions on Image Processing, 2016, 25(1): 372–387. doi: 10.1109/TIP.2015.2500021.
[30] ZHAO Yongcan, ZHANG Yinghao, XIA Tianfeng, et al. No-reference image quality assessment based on multi-scale dynamic modulation and degradation information[J]. Displays, 2026, 91: 103207. doi: 10.1016/j.displa.2025.103207.