A Review of Causal Feature Learning in Deep Learning Image Classification Models

WANG Xiaodong; JIANG Ling; LI Huihui; WANG Buhong

doi:10.11999/JEIT250738

Volume 48 Issue 4

Apr. 2026

WANG Xiaodong, JIANG Ling, LI Huihui, WANG Buhong. A Review of Causal Feature Learning in Deep Learning Image Classification Models[J]. Journal of Electronics & Information Technology, 2026, 48(4): 1569-1590. doi: 10.11999/JEIT250738

doi: 10.11999/JEIT250738 cstr: 32379.14.JEIT250738

1. School of Artificial Intelligence and Computer Science, Xiamen University Tan Kah Kee College, Zhangzhou 363123, China
2. School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
3. School of Automation, Northwestern Polytechnical University, Xi’an 710129, China
4. Information and Navigation School, Air Force Engineering University, Xi’an 710077, China
5. Key Laboratory of Intelligent Manufacturing Equipment and Industrial Internet Technology, Fujian Provincial Universities, Zhangzhou 363123, China

Funds: The National Natural Science Foundation of China (62472437), The Natural Science Foundation of Fujian (2023J01035), The Natural Science Foundation of Xiamen (3502Z20227326)

Received Date: 2025-08-07
Revised Date: 2026-02-12
Accepted Date: 2026-02-13

Available Online: 2026-03-07

Publish Date: 2026-04-10

Abstract

Significance: Deep learning is built on statistical correlations rather than causal relationships, so deep models face major challenges in generalization, interpretability, and stability. Human cognition relies mainly on discovering and using causal relationships, whereas current deep learning models remain at the bottom (association) rung of the Pearl Causal Hierarchy (PCH). Integrating causal inference into deep learning has therefore become a major research goal. Image classification models, a core branch of deep learning represented by Convolutional Neural Networks (CNNs), exhibit these limitations particularly clearly, so causal inference is urgently needed to overcome this bottleneck. Among the approaches for incorporating causal inference into these models, Causal Feature Learning (CFL), a framework that combines unsupervised machine learning with causal inference, shows clear advantages. Previous studies have confirmed that, in image classification tasks, causal relationships are implicitly embedded in the pixel information of input images. According to the Causal Coarsening Theorem (CCT), causal knowledge can be obtained from observational image data at low experimental cost. In classification tasks, the optimal feature set is given by the Markov Boundary (MB) of the class variable in the causal Bayesian network. These theories provide strong support for connecting deep image classification models with causal inference through CFL. Overall, the importance of CFL has become increasingly evident, and it is regarded as a promising breakthrough direction for next-generation models.

Progress: This paper provides a comprehensive review of CFL in deep learning image classification models from three core aspects: statistical causal inference theory, correlation analysis methods, and CFL implementations.
First, the relevant definitions of CFL and its two mainstream statistical implementation frameworks are introduced: causal discovery based on the Structural Causal Model (SCM) and causal effect estimation based on the Rubin Causal Model (RCM). Second, correlation analysis methods for deep learning image classification models, which sit at the entry (association) level of the PCH, are systematically summarized from three perspectives: forward, backward, and horizontal. Third, building on these auxiliary tools, progress in CFL for image classification is classified into four main directions: Causal Feature Discovery (CFD), Causal Feature Effect Estimation (CFEE), Causal Representation Learning (CRL), and Spurious Correlation Removal (SCR). CFD is based on the SCM framework and aims to derive confounder-free causal graphs through explicit or implicit causal intervention analysis of image data or models. Under the RCM framework, CFEE uses observational image data to quantitatively evaluate the causal effects of features while addressing the lack of counterfactual samples and confounding bias. CRL selects or extracts features from high-dimensional image data to learn causal relationships and identify low-dimensional, cross-image representations. SCR removes non-causal features from images and preserves causal features by various means. In addition, available toolkits, resources from top conferences, and academic organizations are listed, and key technical issues and future research directions are discussed.

Conclusions: This review summarizes the technological development of CFL. Overall, substantial progress has been made, although challenges remain in each research direction. CFD follows the basic logic of causal theory, and its clear, simple structures are easy to understand. However, its methods for processing high-dimensional image data remain immature, and its generalization ability is limited.
CFEE can effectively distinguish causal features from confounding features; its evaluation results are closer to real decision-making logic, and it is broadly applicable. Common limitations of CFEE include the requirement that confounders be observable, strong dependence on causal assumptions, and limited computational efficiency. CRL offers greater flexibility in representation dimensionality and can identify the causal factors that drive classification while excluding non-causal factors; its main unresolved problems are generalization bias, factor coupling, dependence on priors, weak evaluation, and high cost. SCR is highly targeted but generalizes poorly. From a broader perspective, CFL should not be restricted to specific methods: any method that aims to construct causal relationships from microvariables, such as image pixels, to causal macrovariables, such as global semantics, can be considered part of this field. CFL therefore remains an open research topic.

Prospects: The goal of causal inference is to move beyond correlation and clarify the causal relationships among variables through more rigorous experimental design or more advanced statistical methods. This requires deeper assumptions about feature relationships and broader exploration of the underlying causal chains. Both remain highly challenging and are likely to become major research focuses in this field.
To address the technical challenges in CFL, this paper proposes the following future directions: (1) unifying construction paradigms and establishing standards for image-based SCMs to improve the standardization and consistency of causal discovery; (2) developing RCM methods supported by generative artificial intelligence to address sample scarcity in causal effect estimation; (3) redesigning models to learn new causal image representations, thereby fundamentally addressing the inherent limitations of CNNs in CFL; and (4) integrating spurious correlation analysis with reinforcement learning, using reinforcement learning to give deep learning image classification models a meta-learning capability for causal exploration. Once these key issues in CFL are resolved, the accuracy, generalization, interpretability, and stability of deep learning image classification models can be expected to improve substantially.
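The abstract states that, in classification tasks, the optimal feature set is the Markov Boundary (MB) of the class variable in a causal Bayesian network. When the causal DAG is known, this set can be read off directly: the class variable's parents, its children, and its children's other parents ("spouses"). The Python sketch below assumes such a known DAG stored as a hypothetical node-to-parents dictionary with illustrative variable names; in practice the DAG is unknown, and MB discovery algorithms must estimate the set from data.

```python
from typing import Dict, Set

# Hypothetical representation: each node maps to the set of its parents.
DAG = Dict[str, Set[str]]


def markov_boundary(dag: DAG, target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a known DAG.

    The blanket is the union of the target's parents, its children, and its
    children's other parents ("spouses"). Under the faithfulness assumption
    it is also the unique Markov boundary (the minimal blanket).
    """
    parents = set(dag.get(target, set()))
    children = {node for node, pa in dag.items() if target in pa}
    spouses: Set[str] = set()
    for child in children:
        spouses |= dag[child]  # every parent of each child
    spouses.discard(target)    # the target itself is not its own spouse
    return parents | children | spouses


# Hypothetical toy DAG over illustrative variables; "label" is the class.
TOY_DAG: DAG = {
    "background": set(),
    "lighting": set(),
    "shape": set(),
    "texture": {"background"},
    "label": {"shape", "texture"},
    "caption": {"label", "lighting"},
}

if __name__ == "__main__":
    print(sorted(markov_boundary(TOY_DAG, "label")))
    # prints ['caption', 'lighting', 'shape', 'texture']
```

In this toy graph `background` is excluded: once `texture` is known, it carries no additional information about `label`.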

Keywords:
- Deep learning
- Image classification models
- Causal inference
- Causal feature learning
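The abstract notes that CFEE, under the RCM framework, must correct confounding bias and requires observable confounders. As a minimal illustration, the Python sketch below implements inverse propensity weighting (IPW), a standard RCM estimator built on the propensity score, on a tiny hypothetical dataset. It is a generic textbook estimator, not a method proposed in this review; the record format, variable names, and data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical record format: (z, t, y) with an observed discrete confounder z,
# a binary "treatment" t (e.g. whether an image feature is present), and an
# outcome y (e.g. a classifier score).
Record = Tuple[int, int, float]


def naive_difference(records: List[Record]) -> float:
    """Difference in mean outcome between treated and control records."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(records: List[Record]) -> float:
    """Inverse-propensity-weighted estimate of E[Y(1)] - E[Y(0)].

    The propensity e(z) = P(T=1 | Z=z) is the empirical treated fraction
    within each stratum of z. This assumes z is observed and blocks all
    confounding (unconfoundedness) and that 0 < e(z) < 1 (positivity).
    """
    counts = defaultdict(lambda: [0, 0])  # z -> [number treated, total]
    for z, t, _ in records:
        counts[z][0] += t
        counts[z][1] += 1

    treated_sum = 0.0
    control_sum = 0.0
    for z, t, y in records:
        n_treated, n_total = counts[z]
        e = n_treated / n_total
        if not 0.0 < e < 1.0:
            raise ValueError(f"positivity violated in stratum z={z}")
        if t == 1:
            treated_sum += y / e
        else:
            control_sum += y / (1.0 - e)
    return (treated_sum - control_sum) / len(records)


# Hypothetical data generated by y = t + 2 * z, so the true effect of t is 1.
# z raises both the chance of treatment and the outcome, i.e. it confounds.
DATA: List[Record] = [
    (0, 1, 1.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0), (1, 0, 2.0),
]

if __name__ == "__main__":
    print(naive_difference(DATA))  # 2.0, biased by the confounder z
    print(ipw_ate(DATA))           # 1.0, matches the true effect
```

The naive difference overstates the effect because z raises both the treatment rate and the outcome; reweighting each record by the inverse of its propensity recovers the true value. The `ValueError` branch makes the positivity requirement explicit, and the need to observe z mirrors the CFEE limitation noted in the abstract.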

FullText(HTML)

References(124)

References

[1]	NGUYEN A, YOSINSKI J, and CLUNE J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images[C]. IEEE Conference On Computer Vision and Pattern Recognition (CVPR), Boston, USA, 2015: 427–436. doi: 10.1109/CVPR.2015.7298640.
[2]	SCHÖLKOPF B. Causality for machine learning[M]. GEFFNER H, DECHTER R, and HALPERN J Y. Probabilistic and Causal Inference: The Works of Judea Pearl. New York: ACM, 2022: 765–804. doi: 10.1145/3501714.3501755.
[3]	PEARL J and MACKENZIE D. The book of why: The new science of cause and effect[J]. Science, 2018, 361(6405): 855. doi: 10.1126/science.aau9731.
[4]	李珊珊, 赵清杰, 朱文龙, 等. 引入因果发现学习的跨领域知识泛化方法[J]. 智能系统学报, 2025, 20(4): 1033–1045. doi: 10.11992/tis.202501005. LI Shanshan, ZHAO Qingjie, ZHU Wenlong, et al. Cross-domain knowledge generalization method introducing causal discovery learning[J]. CAAI Transactions on Intelligent Systems, 2025, 20(4): 1033–1045. doi: 10.11992/tis.202501005.
[5]	车翔玖, 武宇宁, 刘全乐. 基于因果特征学习的有权同构图分类算法[J]. 吉林大学学报: 工学版, 2025, 55(2): 681–686. doi: 10.13229/j.cnki.jdxbgxb.20230384. CHE Xiangjiu, WU Yuning, and LIU Quanle. A weighted isomorphic graph classification algorithm based on causal feature learning[J]. Journal of Jilin University: Engineering and Technology Edition, 2025, 55(2): 681–686. doi: 10.13229/j.cnki.jdxbgxb.20230384.
[6]	杨强, 范力欣, 朱军, 等. 可解释人工智能导论[M]. 北京: 电子工业出版社, 2022: 59–60. YANG Qiang, FAN Lixin, ZHU Jun, et al. Introduction to Explainable Artificial Intelligence[M]. Beijing: Publishing House of Electronics Industry, 2022: 59–60.
[7]	CHALUPKA K, EBERHARDT F, and PERONA P. Causal feature learning: An overview[J]. Behaviormetrika, 2017, 44(1): 137–164. doi: 10.1007/s41237-016-0008-2.
[8]	LOPEZ-PAZ D, NISHIHARA R, CHINTALA S, et al. Discovering causal signals in images[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 58–66. doi: 10.1109/CVPR.2017.14.
[9]	CHALUPKA K, PERONA P, and EBERHARDT F. Visual causal feature learning[C]. The 31st Conference on Uncertainty in Artificial Intelligence, Amsterdam, Netherlands, 2015: 181–190.
[10]	吴兴宇,江兵兵,吕胜飞,等.基于马尔科夫边界发现的因果特征选择算法综述[J].模式识别与人工智能,2022,35(05):422-438. doi: 10.16451/j.cnki.issn1003-6059.202205004. WU Xingyu, JIANG Bingbing , LÜ Shengfei , et al. A Survey on Causal Feature Selection Based on Markov Boundary Discovery[J], Pattern Recognition and Artificial Intelligence,2022,35(05):422-438. doi: 10.16451/j.cnki.issn1003-6059.202205004.
[11]	王增珍, 李君荣. 关联度分析及其与相关分析的比较[J]. 中国卫生统计, 1991, 8(6): 22–25. WANG Zengzhen and LI Junrong. Incidence degree analysis and its comparison with correlation analysis[J]. Chinese Journal of Health Statistics, 1991, 8(6): 22–25.
[12]	王晓东, 张盖群, 胡钰琪, 等. 深度学习分类模型解释图的对象相关性消融分析[J]. 厦门大学学报: 自然科学版, 2024, 63(3): 562–569. doi: 10.6043/j.issn.0438-0479.202206021. WANG Xiaodong, ZHANG Gaiqun, HU Yuqi, et al. Ablation based correspondence analysis of objects in deep learning interpretable heatmap[J]. Journal of Xiamen University: Natural Science, 2024, 63(3): 562–569. doi: 10.6043/j.issn.0438-0479.202206021.
[13]	JIN Zhijing, LIU Jiarui, LYU Zhiheng, et al. Can large language models infer causation from correlation?[C]. The 12th International Conference on Learning Representations, Vienna, Austria, 2024.
[14]	李家宁, 熊睿彬, 兰艳艳, 等. 因果机器学习的前沿进展综述[J]. 计算机研究与发展, 2023, 60(1): 59–84. doi: 10.7544/issn1000-1239.202110780. LI Jianing, XIONG Ruibin, LAN Yanyan, et al. Overview of the frontier progress of causal machine learning[J]. Journal of Computer Research and Development, 2023, 60(1): 59–84. doi: 10.7544/issn1000-1239.202110780.
[15]	WANG Tan, HUANG Jianqiang, ZHANG Hanwang, et al. Visual commonsense R-CNN[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10757–10767. doi: 10.1109/CVPR42600.2020.01077.
[16]	Anticoder. 因果推断--uplift model 评估[EB/OL]. https://zhuanlan.zhihu.com/p/343747851, 2025. Anticoder. Causal inference--uplift model assessment[EB/OL]. https://zhuanlan.zhihu.com/p/343747851, 2025.
[17]	CHEN Hang, DU Keqing, YANG Xinyu, et al. A review and roadmap of deep learning causal discovery in different variable paradigms[EB/OL]. https://doi.org/10.48550/arXiv.2209.06367, 2022.
[18]	LIU Yang, WEI Yushen, YAN Hong, et al. Causal reasoning meets visual representation learning: A prospective study[J]. Machine Intelligence Research, 2022, 19(6): 485–511. doi: 10.1007/s11633-022-1362-z.
[19]	胡志远, 高锦涛. 因果发现技术研究综述[J]. 计算机工程与应用, 2025, 61(24): 40–67. doi: 10.3778/j.issn.1002-8331.2501-0440. HU Zhiyuan and GAO Jintao. Review of research on causal discovery techniques[J]. Computer Engineering and Applications, 2025, 61(24): 40–67. doi: 10.3778/j.issn.1002-8331.2501-0440.
[20]	CHICHARRO D and PANZERI S. Algorithms of causal inference for the analysis of effective connectivity among brain regions[J]. Frontiers in Neuroinformatics, 2014, 8: 64. doi: 10.3389/fninf.2014.00064.
[21]	RAMSEY J, SPIRTES P, and ZHANG Jiji. Adjacency-faithfulness and conservative causal inference[C]. The 22nd Conference on Uncertainty in Artificial Intelligence, Cambridge, USA, 2006: 401–408.
[22]	KALISCH M and BÜHLMANN P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm[J]. The Journal of Machine Learning Research, 2007, 8: 613–636.
[23]	CHICKERING D M. Optimal structure identification with greedy search[J]. The Journal of Machine Learning Research, 2002, 3: 507–554. doi: 10.1162/153244303321897717.
[24]	WANG Lei, HUANG Shanshan, WANG Shu, et al. A survey of causal discovery based on functional causal model[J]. Engineering Applications of Artificial Intelligence, 2024, 133: 108258. doi: 10.1016/j.engappai.2024.108258.
[25]	Rubin D B. Estimating causal effects of treatments in randomized and nonrandomized studies [J]. Journal of Educational Psychology, 1974, 66 (5): 688−701. doi: 10.1037/h0037350 doi: 10.1037/h0037350.
[26]	杨新新, 刘真, 卢思博, 等. 基于因果推断的推荐系统去偏研究综述[J]. 计算机学报, 2024, 47(10): 2307–2332. doi: 10.11897/SP.J.1016.2024.02307. YANG Xinxin, LIU Zhen, LU Sibo, et al. A survey on debiasing recommendation based on causal inference[J]. Chinese Journal of Computers, 2024, 47(10): 2307–2332. doi: 10.11897/SP.J.1016.2024.02307.
[27]	YAO Liuyi, CHU Zhixuan, LI Sheng, et al. A survey on causal inference[J]. ACM Transactions on Knowledge Discovery from Data, 2021, 15(5): 74. doi: 10.1145/3444944.
[28]	丁梦远, 兰旭光, 彭茹, 等. 机器推理的进展与展望[J]. 模式识别与人工智能, 2021, 34(1): 1–13. doi: 10.16451/j.cnki.issn1003-6059.202101001. DING Mengyuan, LAN Xuguang, PENG Ru, et al. Progress and prospect of machine reasoning[J]. Pattern Recognition and Artificial Intelligence, 2021, 34(1): 1–13. doi: 10.16451/j.cnki.issn1003-6059.202101001.
[29]	OLAH C, CAMMARATA N, SCHUBERT L, et al. Zoom in: An introduction to circuits[J]. Distill, 2020, 5(3): e00024.001. doi: 10.23915/distill.00024.001.
[30]	ZEILER M D and FERGUS R. Visualizing and understanding convolutional networks[C]. 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 818–833. doi: 10.1007/978-3-319-10590-1_53.
[31]	PETSIUK V, DAS A, and SAENKO K. RISE: Randomized input sampling for explanation of black-box models[C]. The British Machine Vision Conference, Newcastle, UK, 2018: 151.
[32]	YUAN Hao, CAI Lei, HU Xia, et al. Interpreting image classifiers by generating discrete masks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(4): 2019–2030. doi: 10.1109/TPAMI.2020.3028783.
[33]	RIBEIRO M T, SINGH S, and GUESTRIN C. "Why should i trust you?": Explaining the predictions of any classifier[C]. The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, USA, 2016: 1135–1144. doi: 10.1145/2939672.2939778.
[34]	FONG R C and VEDALDI A. Interpretable explanations of black boxes by meaningful perturbation[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 3449–3457. doi: 10.1109/ICCV.2017.371.
[35]	FONG R, PATRICK M, and VEDALDI A. Understanding deep networks via extremal perturbations and smooth masks[C]. IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019: 2950–2958. doi: 10.1109/iccv.2019.00304.
[36]	ZHOU Bolei, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 2921–2929. doi: 10.1109/CVPR.2016.319.
[37]	LUNDBERG S M and LEE S I. A unified approach to interpreting model predictions[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 4768–4777.
[38]	NGUYEN T, DO K, NGUYEN D T, et al. Causal inference via style transfer for out-of-distribution generalisation[C]. The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, USA, 2023: 1746–1757. doi: 10.1145/3580305.3599270.
[39]	TAI Yan, FAN Weichen, ZHANG Zhao, et al. Link-context learning for multimodal LLMs[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 27166–27175. doi: 10.1109/CVPR52733.2024.02566.
[40]	ZHANG Congzhi, ZHANG Linhai, and ZHOU Deyu. Causal walk: Debiasing multi-hop fact verification with front-door adjustment[C]. The 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024, 19533–19541. doi: 10.1609/aaai.v38i17.29925.
[41]	GOLDWASSER J and HOOKER G. Unifying image counterfactuals and feature attributions with latent-space adversarial attacks[EB/OL]. https://arxiv.org/abs/2504.15479, 2025.
[42]	SIMONYAN K, VEDALDI A, and ZISSERMAN A. Deep inside convolutional networks: Visualising image classification models and saliency maps[C]. 2nd International Conference on Learning Representations, Banff, Canada, 2014. doi: 10.5244/c.27.8.
[43]	EDELSON G S, SULLIVAN S F, ALSUP J M, et al. Deconvolutional beamforming for air and underwater acoustic sensor arrays[J]. The Journal of the Acoustical Society of America, 2003, 114(S4): 2367. doi: 10.1121/1.4777157.
[44]	SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity: The all convolutional net[C]. 3rd International Conference on Learning Representations, San Diego, USA, 2015.
[45]	BACH S, BINDER A, MONTAVON G, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation[J]. PLoS One, 2015, 10(7): e0130140. doi: 10.1371/journal.pone.0130140.
[46]	MONTAVON G, LAPUSCHKIN S, BINDER A, et al. Explaining nonlinear classification decisions with deep Taylor decomposition[J]. Pattern Recognition, 2017, 65: 211–222. doi: 10.1016/j.patcog.2016.11.008.
[47]	KIM B, SEO J, JEON S, et al. Why are saliency maps noisy? Cause of and solution to noisy saliency maps[C]. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 2019: 4149–4157. doi: 10.1109/ICCVW.2019.00510.
[48]	SUNDARARAJAN M, TALY A, and YAN Qiqi. Axiomatic attribution for deep networks[C]. The 34th International Conference on Machine Learning, Sydney, Australia, 2017: 3319–3328.
[49]	CHEN Xuexin, CAI Ruichu, HUANG Zhengting, et al. Feature attribution with necessity and sufficiency via dual-stage perturbation test for causal explanation[C]. The 41st International Conference on Machine Learning, Vienna, Austria, 2024: 250.
[50]	ZHU Zhiyu, CHEN Huaming, ZHANG Jiayu, et al. MFABA: A more faithful and accelerated boundary-based attribution method for deep neural networks[C]. The 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 17228–17236. doi: 10.1609/aaai.v38i15.29669.
[51]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]. The IEEE International Conference on Computer Vision, Venice, Italy, 2017: 618–626. doi: 10.1109/ICCV.2017.74.
[52]	CHATTOPADHAY A, SARKAR A, HOWLADER P, et al. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks[C]. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, USA, 2018: 839–847. doi: 10.1109/WACV.2018.00097.
[53]	KINDERMANS P J, SCHÜTT K T, ALBER M, et al. Learning how to explain neural networks: PatternNet and PatternAttribution[C]. 6th International Conference on Learning Representations, Vancouver, Canada, 2018. doi: 10.48550/arXiv.1705.05598
[54]	SHEN Z , CUI P , LIU J ,et al.Stable Learning via Differentiated Variable Decorrelation[C/OL]. The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.ACM, 2020:2185-2193. doi: 10.1145/3394486.3403269.
[55]	KIM B, WATTENBERG M, GILMER J, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)[C]. The 35th International Conference on Machine Learning, Stockholm, Sweden, 2018: 2668–2677.
[56]	GHORBANI A, WEXLER J, ZOU J, et al. Towards automatic concept-based explanations[C]. The 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 832. doi: 10.48550/arXiv.1902.03129.
[57]	ZHANG Ruihan, MADUMAL P, MILLER T, et al. Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors[C/OL]. Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 11682–11690. doi: 10.1609/aaai.v35i13.17389.
[58]	FU Jianlong, ZHENG Heliang, and MEI Tao. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition[C]. The 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4476–4484. doi: 10.1109/CVPR.2017.476.
[59]	赵小阳, 李仲年, 王文玉, 等. ADIC: 一种面向可解释图像识别的自适应解纠缠CNN分类器[J]. 计算机研究与发展, 2023, 60(8): 1754–1767. doi: 10.7544/issn1000-1239.202330231. ZHAO Xiaoyang, LI Zhongnian, WANG Wenyu, et al. ADIC: An adaptive disentangled CNN classifier for interpretable image recognition[J]. Journal of Computer Research and Development, 2023, 60(8): 1754–1767. doi: 10.7544/issn1000-1239.202330231.
[60]	WANG Xiaodong and ZHANG Gaiqun. Feature-granularity-based spurious correlation and causal analysis in CNNs exploring feature granularity to uncover spurious correlations[C]. The 2024 2nd International Conference on Internet of Things and Cloud Computing Technology, Paris, France, 2024: 336–341. doi: 10.1145/3702879.3702937.
[61]	望止洋. 因果推理初探(4)——干预[EB/OL]. https://zhuanlan.zhihu.com/p/111340526, 2025. WANG Zhiyang. A preliminary study on causal reasoning (4)——intervention[EB/OL]. https://zhuanlan.zhihu.com/p/111340526, 2025.
[62]	张盖群. 模型无关的深度学习因果特征相关分析研究[D]. [硕士论文], 厦门大学, 2025. ZHANG Gaiqun. Research on model-independent deep learning causal feature correlation analysis[D]. [Master dissertation], Xiamen University, 2025.
[63]	YANG Mengyue, LIU Furui, CHEN Zhitang, et al. CausalVAE: Disentangled representation learning via neural structural causal models[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 9588–9597. doi: 10.1109/CVPR46437.2021.00947.
[64]	WANG Dong, YANG Yuewei, CHEN Liqun, et al. Proactive pseudo-intervention: Pre-informed contrastive learning for interpretable vision models[C]. The 1st AAAI Bridge Program on AI for Medicine and Healthcare, Pennsylvania, USA, 2025: 20–34.
[65]	LI Xin, ZHANG Zhizheng, WEI Guoqiang, et al. Confounder identification-free causal visual feature learning[EB/OL]. https://arxiv.org/abs/2111.13420, 2021.
[66]	YUE Zhongqi, ZHANG Hanwang, SUN Qianru, et al. Interventional few-shot learning[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 230.
[67]	TANG Kaihua, HUANG Jianqiang, and ZHANG Hanwang. Long-tailed classification by keeping the good and removing the bad momentum causal effect[J]. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 128.
[68]	李祥宁, 潘晨, 何灵敏. 基于因果不变表示的领域泛化算法[J]. 中国计量大学学报, 2024, 35(2): 297–308. doi: 10.3969/j.issn.2096-2835.2024.02.013. LI Xiangning, PAN Chen, and HE Lingmin. A domain generalization algorithm based on causal invariant representation[J]. Journal of China Jiliang University, 2024, 35(2): 297–308. doi: 10.3969/j.issn.2096-2835.2024.02.013.
[69]	梁天飚, 刘天元, 汪俊亮, 等. 因果推理引导的复杂花纹织物缺陷视觉检测深度学习方法[J]. 中国科学: 技术科学, 2023, 53(7): 1138–1149. doi: 10.1360/SST-2022-0432. LIANG Tianbiao, LIU Tianyuan, WANG Junliang, et al. Causal inference-guided deep learning method for vision-based defect detection of complex patterned fabrics[J]. Scientia Sinica Technologica, 2023, 53(7): 1138–1149. doi: 10.1360/SST-2022-0432.
[70]	YANG C H H, HUNG D I T, LIU Y C, et al. Treatment learning causal transformer for noisy image classification[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2023: 6128–6139. doi: 10.1109/WACV56688.2023.00608.
[71]	MORGAN. Counterfactuals and causal inference[M]. New York: Cambridge University Press, 2007:140-187.
[72]	GU X S and ROSENBAUM P R. Comparison of multivariate matching methods: Structures, distances, and algorithms[J]. Journal of Computational and Graphical Statistics, 1993, 2(4): 405–420. doi: 10.1080/10618600.1993.10474623.
[73]	ROSENBAUM P R and RUBIN D B. The central role of the propensity score in observational studies for causal effects[J]. Biometrika, 1983, 70(1): 41–55. doi: 10.1093/biomet/70.1.41.
[74]	HAINMUELLER J. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies[J]. Political Analysis, 2012, 20(1): 25–46. doi: 10.1093/pan/mpr025.
[75]	KUANG Kun, CUI Peng, LI Bo, et al. Estimating treatment effect in the wild via differentiated confounder balancing[C]. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada, 2017: 265–274. doi: 10.1145/3097983.3098032.
[76]	SHEN Zheyan, CUI Peng, KUANG Kun, et al. On image classification: Correlation v. s. causality[EB/OL]. https://arxiv.org/abs/1708.06656v1, 2017.
[77]	鲍庆森. 深度学习驱动的因果效应评估及因果表征学习研究[D]. [硕士论文], 南京邮电大学, 2023. BAO Qingsen. Deep learning-driven causal effect estimation and causal representation learning[D]. [Master dissertation], Nanjing University of Posts and Telecommunications, 2023.
[78]	ARJOVSKY M, BOTTOU L, GULRAJANI I, et al. Invariant risk minimization[EB/OL]. https://arxiv.org/abs/1907.02893, 2019.
[79]	XU Zhengyi, JIANG Wen, and GENG Jie. Texture-aware causal feature extraction network for multimodal remote sensing data classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5103512. doi: 10.1109/TGRS.2024.3368091.
[80]	LV Fangrui, LIANG Jian, LI Shuang, et al. Causality inspired representation learning for domain generalization[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8036–8046. doi: 10.1109/CVPR52688.2022.00788.
[81]	XIA K and BAREINBOIM E. Neural causal abstractions[C]. The 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 20585–20595. doi: 10.1609/aaai.v38i18.30044.
[82]	黄珊珊, 王元浩, 龚志黎, 等. 基于因果表征学习的可控图像生成(英文)[J]. 信息与电子工程前沿(英文), 2024, 25(1): 135–148. doi: 10.1631/FITEE.2300303. HUANG Shanshan, WANG Yuanhao, GONG Zhili, et al. Controllable image generation based on causal representation learning[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(1): 135–148. doi: 10.1631/FITEE.2300303.
[83]	ACHILLE A and SOATTO S. Emergence of invariance and disentanglement in deep representations[J]. The Journal of Machine Learning Research, 2018, 19(1): 1947–1980. doi: 10.1109/ITA.2018.8503149.
[84]	ELAD A, HAVIV D, BLAU Y, et al. Direct validation of the information bottleneck principle for deep nets[C]. The IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Korea, 2019: 758–762. doi: 10.1109/ICCVW.2019.00099.
[85]	KIM J, LEE B K, and RO Y M. Distilling robust and non-robust features in adversarial examples by information bottleneck[C/OL]. The 35th International Conference on Neural Information Processing Systems, 2021: 1311. doi: 10.5555/3540261.3541572
[86]	GEIRHOS R, RUBISCH P, MICHAELIS C, et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness[C]. 7th International Conference on Learning Representations, New Orleans, USA, 2019.
[87]	杨朋波, 桑基韬, 张彪, 等. 面向图像分类的深度模型可解释性研究综述[J]. 软件学报, 2023, 34(1): 230–254. doi: 10.13328/j.cnki.jos.006415. YANG Pengbo, SANG Jitao, ZHANG Biao, et al. Survey on interpretability of deep models for image classification[J]. Journal of Software, 2023, 34(1): 230–254. doi: 10.13328/j.cnki.jos.006415.
[88]	DING P, VANDERWEELE T J, and ROBINS J M. Instrumental variables as bias amplifiers with general outcome and confounding[J]. Biometrika, 2017, 104(2): 291-302. doi: 10.1093/biomet/asx009.
[89]	SRIVASTAVA M, HASHIMOTO T, and LIANG P. Robustness to spurious correlations via human annotations[C/OL]. The 37th International Conference on Machine Learning, 2020: 845.
[90]	NAM J, KIM J, LEE J, et al. Spread spurious attribute: Improving worst-group accuracy with spurious attribute estimation[C/OL]. The 10th International Conference on Learning Representations, 2022.
[91]	PULI A M, JOSHI N, HE E, et al. Nuisances via negativa: Adjusting for spurious correlations via data augmentation[C]. The 11th International Conference on Learning Representations, Kigali, Rwanda, 2023.
[92]	YAO Huaxiu, WANG Yu, LI Sai, et al. Improving out-of-distribution robustness via selective augmentation[C]. The 39th International Conference on Machine Learning, Baltimore, USA, 2022: 25407–25437.
[93]	WU S, YUKSEKGONUL M, ZHANG Linjun, et al. Discover and cure: Concept-aware mitigation of spurious correlation[C]. The 40th International Conference on Machine Learning, Honolulu, USA, 2023: 1574.
[94]	SHALIT U, JOHANSSON F D, and SONTAG D. Estimating individual treatment effect: Generalization bounds and algorithms[C]. The 34th International Conference on Machine Learning, Sydney, Australia, 2017: 3076–3085.
[95]	HASSANPOUR N and GREINER R. CounterFactual regression with importance sampling weights[C]. The 28th International Joint Conference on Artificial Intelligence, Macao, China, 2019: 5880–5887. doi: 10.24963/ijcai.2019/815.
[96]	ZHANG Xingxuan, CUI Peng, XU Renzhe, et al. Deep stable learning for out-of-distribution generalization[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 5368–5378, doi: 10.1109/CVPR46437.2021.00533.
[97]	LIU Mingzhou, LEE C W, SUN Xinwei, et al. Learning causal alignment for reliable disease diagnosis[C]. The 13th International Conference on Learning Representations, Singapore, Singapore, 2025: 1–19.
[98]	郭礼华, 王广飞. 基于任务感知关系网络的少样本图像分类[J]. 电子与信息学报, 2024, 46(3): 977–985. doi: 10.11999/JEIT230162. GUO Lihua and WANG Guangfei. Few-shot Image classification based on task-aware relation network[J]. Journal of Electronics & Information Technology, 2024, 46(3): 977–985. doi: 10.11999/JEIT230162.
[99]	LEVY D, CARMON Y, DUCHI J C, et al. Large-scale methods for distributionally robust optimization[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 742.
[100]	YIN Mingzhang, WANG Yixin, and BLEI D M. Optimization-based causal estimation from heterogeneous environments[J]. The Journal of Machine Learning Research, 2024, 25(1): 168.
[101]	SAGAWA S, KOH P W, HASHIMOTO T B, et al. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization[C]. The International Conference on Learning Representations, 2020. doi: 10.48550/arXiv.1911.08731.
[102]	BAHNG H, CHUN S, YUN S, et al. Learning de-biased representations with biased representations[C]. The 37th International Conference on Machine Learning, Vienna, Austria, 2020: 50.
[103]	KIM N, HWANG S, AHN S, et al. Learning debiased classifier with biased committee[C]. The 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 1337.
[104]	LI Kunpeng, WU Ziyan, PENG Kuanchuan, et al. Tell me where to look: Guided attention inference network[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 9215–9223. doi: 10.1109/CVPR.2018.00960.
[105]	LUO Jianhao and WU Jianxin. A survey on fine-grained image categorization using deep convolutional features[J]. Acta Automatica Sinica, 2017, 43(8): 1306–1318. doi: 10.16383/j.aas.2017.c160425.
[106]	SHEN Zijun, MU Lina, GAO Jing, et al. Review of fine-grained image categorization[J]. Journal of Computer Applications, 2023, 43(1): 51–60. doi: 10.11772/j.issn.1001-9081.2021122090.
[107]	ZHONG Ling and WANG Tianjiao. A review of research methods for fine-grained image classification[J]. Information Recording Materials, 2024, 25(7): 57–61. doi: 10.16009/j.cnki.cn13-1295/tq.2024.07.039.
[108]	BICA I, ALAA A M, JORDON J, et al. Estimating counterfactual treatment outcomes over time through adversarially balanced representations[C]. The 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020.
[109]	LI Zongyu, GUO Xiaobo, and QIANG Siwei. A survey of deep causal models and their industrial applications[J]. Artificial Intelligence Review, 2024, 57(11): 298. doi: 10.1007/s10462-024-10886-0.
[110]	SCHÖLKOPF B, LOCATELLO F, BAUER S, et al. Toward causal representation learning[J]. Proceedings of the IEEE, 2021, 109(5): 612–634.
[111]	TANG Wenbo, SHIN J D, and JADHAV S P. Geometric transformation of cognitive maps for generalization across hippocampal-prefrontal circuits[J]. Cell Reports, 2023, 42(3): 112246.
[112]	DZANZA R and KABASO B. A survey on causal representation learning techniques to extract causal features for causal machine learning model building[C]. ICT for Intelligent Systems, Singapore, Singapore, 2024: 107–117. doi: 10.1007/978-981-97-5810-4_10.
[113]	CUI Peng and ATHEY S. Stable learning establishes some common ground between causal inference and machine learning[J]. Nature Machine Intelligence, 2022, 4(2): 110–115.
[114]	ZHANG Tianren, ZHAO Chujie, CHEN Guanyu, et al. Feature contamination: Neural networks learn uncorrelated features and fail to generalize[C]. The 41st International Conference on Machine Learning, Vienna, Austria, 2024: 2502.
[115]	ZHAO Feng, GENG Miaomiao, LIU Hanqiang, et al. Convolutional neural network and vision Transformer-driven cross-layer multi-scale fusion network for hyperspectral image classification[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2237–2248.
[116]	KARAGODIN N, POLYANSKIY Y, and RIGOLLET P. Clustering in causal attention masking[C]. The 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 3673.
[117]	LI Zhe, WANG Ke, WANG Biao, et al. Human-machine fusion intelligent decision-making: Concepts, framework, and applications[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3439–3464.
[118]	PACHETTI E and COLANTONIO S. A systematic review of few-shot learning in medical imaging[J]. Artificial Intelligence in Medicine, 2024, 156: 102949. doi: 10.1016/j.artmed.2024.102949.
[119]	FALLER P M, VANKADARA L C, MASTAKOURI A A, et al. Self-compatibility: Evaluating causal discovery without ground truth[C]. The 27th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 2024: 4132–4140.
[120]	LIU Xiaoyu, XU Paiheng, WU Junda, et al. Large language models and causal inference in collaboration: A comprehensive survey[C]. Findings of the Association for Computational Linguistics, Albuquerque, USA, 2025: 7668–7684. doi: 10.18653/v1/2025.findings-naacl.427.
[121]	YANG Linying, CLIVIO O, SHIRVAIKAR V, et al. A critical review of causal reasoning benchmarks for large language models[C]. AAAI 2024 Workshop on "Are Large Language Models Simply Causal Parrots?", Vancouver, Canada, 2024.
[122]	JIN Zhijing, CHEN Yuen, LEEB F, et al. CLADDER: Assessing causal reasoning in language models[C]. The 37th International Conference on Neural Information Processing Systems, New Orleans, USA, 2023: 31038–31065.
[123]	WANG Yihao, HUANG Jingying, and FAN Qinqin. A two-stage feature selection method based on causal model and multimodal multi-objective optimization[J]. Journal of Shaanxi Normal University: Natural Science Edition, 2023, 51(5): 25–34. doi: 10.15983/j.cnki.jsnu.2023023.
[124]	SUTTON R. The bitter lesson[EB/OL]. http://www.incompleteideas.net/IncIdeas/BitterLesson.html, 2025.