Citation: JI Hong, GAO Zhi, CHEN Boan, AO Wei, CAO Min, WANG Qiao. Knowledge-Guided Few-Shot Earth Surface Anomalies Detection[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251000

Knowledge-Guided Few-Shot Earth Surface Anomalies Detection

doi: 10.11999/JEIT251000 cstr: 32379.14.JEIT251000
Funds: The National Natural Science Foundation of China Major Program (42192580, 42192583) and the National Natural Science Foundation of China (42501503)
  • Received Date: 2025-09-26
  • Accepted Date: 2025-11-05
  • Rev Recd Date: 2025-11-05
  • Available Online: 2025-11-13
Objective   Earth Surface Anomalies (ESAs) are sudden natural or human-induced disasters on the Earth’s surface that pose severe risks and have widespread impacts. Timely and accurate earth surface anomalies detection is therefore crucial for social security and sustainable development. Remote sensing provides an effective means for such detection; however, the performance of existing deep learning models remains constrained by the scarcity of labeled data, the complexity of anomaly backgrounds, and the distribution shift across multi-source remote sensing imagery. To address these challenges, this paper proposes a knowledge-guided few-shot learning method. The method leverages large language models to generate abstract textual descriptions of normal and anomalous geospatial features, which are encoded and fused with visual prototypes to form a cross-modal joint representation. This integration improves prototype discriminability in few-shot settings and demonstrates the necessity of incorporating linguistic knowledge into earth surface anomalies detection, offering a promising direction for reliable disaster monitoring when annotated data are scarce.

Methods   The knowledge-guided few-shot learning method is built on a metric-based paradigm: each episode consists of a support set and a query set, and classification is performed by comparing query features with class prototypes using distance-based similarity and cross-entropy optimization (Figure 1). To supplement the limited visual prototypes, class-level textual descriptions are generated with ChatGPT through carefully designed prompts, producing semantic sentences that characterize the appearance, attributes, and contextual relations of both normal and anomalous categories (Figures 2 and 3). These descriptions encode domain-specific properties such as anomaly extent, morphology, and environmental impact, which are otherwise difficult to capture from scarce visual samples. The sentences are encoded with a CLIP (Contrastive Language–Image Pre-training) text encoder, and task-adaptive soft prompts are introduced by generating tokens from support features and concatenating them with the static embeddings, yielding adaptive word embeddings. The encoded sentence vectors are then processed by a lightweight self-attention module that models dependencies across the multiple descriptions, resulting in a coherent paragraph-level semantic representation (Figure 4). The resulting semantic prototypes are fused with the visual prototypes through weighted addition, producing cross-modal prototypes that combine visual grounding with linguistic abstraction. During training, query samples are compared with the cross-modal prototypes, and optimization is guided by two objectives: a classification loss that enforces accurate query–prototype alignment, and a prototype regularization loss that keeps the semantic prototypes discriminative and well separated. The entire process is implemented in an episodic training framework (Algorithm 1); a minimal sketch of the prototype construction follows.
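The sketch below illustrates the cross-modal prototype construction and distance-based query classification just described. It is a minimal PyTorch rendering under stated assumptions: the module names, feature dimension, attention configuration, the stand-in used for soft-prompt injection (the paper concatenates learnable tokens inside the text encoder), and the fusion weight are all illustrative, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalPrototypes(nn.Module):
    """Fuse visual prototypes with language-derived semantic prototypes (sketch)."""
    def __init__(self, feat_dim=512, num_learnable_tokens=2, alpha=0.2):
        super().__init__()
        self.alpha = alpha            # weight of the language branch (paper reports 0.2)
        self.num_tokens = num_learnable_tokens
        # Task-adaptive tokens: project pooled support features to prompt embeddings.
        self.token_proj = nn.Linear(feat_dim, num_learnable_tokens * feat_dim)
        # Lightweight self-attention over each class's sentence embeddings.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)

    def forward(self, visual_protos, sentence_embs, support_feats):
        # visual_protos: (C, D)    mean of support features per class
        # sentence_embs: (C, S, D) CLIP-encoded descriptions, S sentences per class
        # support_feats: (C, K, D) raw support features per class
        # 1) Generate task-adaptive tokens from the support set; their mean is
        #    added to the sentence embeddings as a simple stand-in for
        #    concatenation inside the text encoder (assumption).
        ctx = support_feats.mean(dim=1)                                   # (C, D)
        tokens = self.token_proj(ctx).view(-1, self.num_tokens, ctx.size(-1))
        sentence_embs = sentence_embs + tokens.mean(dim=1, keepdim=True)
        # 2) Model dependencies across the S descriptions and pool them into a
        #    paragraph-level semantic prototype per class.
        attended, _ = self.attn(sentence_embs, sentence_embs, sentence_embs)
        semantic_protos = attended.mean(dim=1)                            # (C, D)
        # 3) Weighted addition of visual grounding and linguistic abstraction.
        fused = (1 - self.alpha) * visual_protos + self.alpha * semantic_protos
        return F.normalize(fused, dim=-1), F.normalize(semantic_protos, dim=-1)

def classify_queries(query_feats, prototypes):
    # Negative squared Euclidean distance to each prototype serves as the logit.
    logits = -torch.cdist(F.normalize(query_feats, dim=-1), prototypes) ** 2
    return logits                                                          # (Q, C)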
Results and Discussions   The proposed method is evaluated under both cross-domain and in-domain few-shot settings. In the cross-domain case, models are trained on NWPU45 or AID and tested on ESAD to assess earth surface anomalies recognition. As shown in the comparisons (Table 2), traditional meta-learning methods such as MAML and Meta-SGD achieve accuracies below 50%, while metric-based baselines such as ProtoNet and RelationNet are more stable but still limited. The proposed method reaches 61.99% in the NWPU45→ESAD setting and 59.79% in the AID→ESAD setting, outperforming ProtoNet by 4.72% and 2.67%, respectively. In the in-domain setting, with training and testing on the same dataset, the method achieves 76.94% on NWPU45 and 72.98% on AID, consistently surpassing state-of-the-art baselines such as S2M2 and IDLN (Table 3). Ablation experiments further validate the contribution of each component: using only visual prototypes yields accuracies of 57.74% and 72.16%, while progressively incorporating simple class names, task-oriented templates, and ChatGPT-generated descriptions improves the results. The best performance is obtained by combining ChatGPT descriptions, learnable tokens, and the attention-based module, reaching 61.99% and 76.94% (Table 4). Parameter sensitivity analysis confirms that an appropriate weight for the language features ($\alpha $ = 0.2) and two learnable tokens yield optimal performance (Figure 5).

Conclusions   This paper addresses earth surface anomalies detection in remote sensing imagery by introducing a knowledge-guided few-shot learning method. The method exploits large language models to automatically generate abstract textual descriptions for both anomaly categories and conventional remote sensing scenes, thereby constructing multimodal training and testing resources. These descriptions are encoded into semantic feature vectors by a pretrained text encoder. To extract task-specific knowledge, a dynamic token learning strategy is designed in which a small number of learnable parameters, guided by the visual samples within each few-shot task, generate adaptive semantic vectors. An attention-based semantic knowledge module then models dependencies among the language features, producing a cross-modal semantic vector for each class. Fusing these vectors with the visual prototypes yields joint multimodal representations that are used for query–prototype matching and network optimization, as sketched below. Experimental evaluations demonstrate that the proposed method effectively leverages prior knowledge from pretrained models, compensates for the scarcity of visual data, and enhances feature discriminability for anomalies recognition. Both cross-domain and in-domain results confirm consistent improvements over competitive baselines, highlighting the method's potential for reliable application in real-world remote sensing anomalies detection scenarios.
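A companion sketch of the two training objectives is given below, again as a hedged illustration: the classification loss follows the prototypical cross-entropy described in the Methods, while the concrete form of the prototype regularization (pairwise cosine separation of the semantic prototypes) is an assumption and may differ from the paper's formulation.

import torch
import torch.nn.functional as F

def episode_loss(query_feats, query_labels, fused_protos, semantic_protos, lam=0.1):
    # Classification loss: align queries with the cross-modal prototypes.
    logits = -torch.cdist(F.normalize(query_feats, dim=-1), fused_protos) ** 2
    cls_loss = F.cross_entropy(logits, query_labels)
    # Prototype regularization (assumed form): push the semantic prototypes of
    # different classes apart by penalizing their positive cosine overlap.
    sim = semantic_protos @ semantic_protos.t()            # (C, C), rows unit-norm
    off_diag = sim - torch.eye(sim.size(0), device=sim.device)
    reg_loss = off_diag.clamp(min=0).mean()
    # lam balances the two objectives (illustrative value).
    return cls_loss + lam * reg_loss

This assumes fused_protos and semantic_protos are the normalized outputs of the sketch above; an episodic trainer would compute this loss per episode and backpropagate through both the vision backbone and the prompt parameters.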
  • [1]
    WANG Qiao. Research framework of remote sensing monitoring and real-time diagnosis of earth surface anomalies[J]. Acta Geodaetica et Cartographica Sinica, 2022, 51(7): 1141–1152. doi: 10.11947/j.AGCS.2022.20220124.
    [2]
    WEI Haishuo, JIA Kun, WANG Qiao, et al. Real-time remote sensing detection framework of the earth's surface anomalies based on a priori knowledge base[J]. International Journal of Applied Earth Observation and Geoinformation, 2023, 122: 103429. doi: 10.1016/j.jag.2023.103429.
    [3]
    GAO Zhi, HU Aohan, CHEN Boan, et al. A hierarchical geometry-to-semantic fusion GNN framework for earth surface anomalies detection[J]. National Remote Sensing Bulletin, 2024, 28(7): 1760–1770. doi: 10.11834/jrs.20243301.
    [4]
    LIU Siqi, GAO Zhi, CHEN Boan, et al. Earth surface anomaly detection using graph neural network-based representation and reasoning of remote sensing geographic object relationships[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1690–1703. doi: 10.11999/JEIT240883.
    [5]
    ZHAO Chuanwu, PAN Yaozhong, WU Hanyi, et al. A novel spectral index for vegetation destruction event detection based on multispectral remote sensing imagery[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, 17: 11290–11309. doi: 10.1109/JSTARS.2024.3412737.
    [6]
    WU Hanyi, ZHAO Chuanwu, ZHU Yu, et al. A multiscale examination of heat health risk inequality and its drivers in mega-urban agglomeration: A case study in the Yangtze River Delta, China[J]. Journal of Cleaner Production, 2024, 458: 142528. doi: 10.1016/j.jclepro.2024.142528.
    [7]
    WEI Haishuo, JIA Kun, WANG Qiao, et al. A remote sensing index for the detection of multi-type water quality anomalies in complex geographical environments[J]. International Journal of Digital Earth, 2024, 17(1): 2313695. doi: 10.1080/17538947.2024.2313695.
    [8]
    ROY D P, JIN Y, LEWIS P E, et al. Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data[J]. Remote Sensing of Environment, 2005, 97(2): 137–162. doi: 10.1016/j.rse.2005.04.007.
    [9]
    WANG Libo, GAO Zhi, and WANG Qiao. A novel earth surface anomaly detection method based on collaborative reasoning of deep learning and remote sensing indexes[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1669–1678. doi: 10.11999/JEIT240882.
    [10]
    ZHANG Zilun, ZHAO Tiancheng, GUO Yulong, et al. RS5M and GeoRSCLIP: A large-scale vision-language dataset and a large vision-language model for remote sensing[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5642123. doi: 10.1109/TGRS.2024.3449154.
    [11]
    GE Junyao, ZHANG Xu, ZHENG Yang, et al. RSTeller: Scaling up visual language modeling in remote sensing with rich linguistic semantics from openly available data and large language models[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2025, 226: 146–163. doi: 10.1016/j.isprsjprs.2025.05.002.
    [12]
    ZHENG Zhuo, ZHONG Yanfei, WANG Junjue, et al. Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters[J]. Remote Sensing of Environment, 2021, 265: 112636. doi: 10.1016/j.rse.2021.112636.
    [13]
    KYRKOU C and THEOCHARIDES T. EmergencyNet: Efficient aerial image classification for drone-based emergency monitoring using atrous convolutional feature fusion[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 1687–1699. doi: 10.1109/JSTARS.2020.2969809.
    [14]
    CHEN Boan, GAO Zhi, LI Ziyao, et al. Hierarchical GNN framework for earth’s surface anomaly detection in single satellite imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5627314. doi: 10.1109/TGRS.2024.3408330.
    [15]
    CHEN Wenyuan, LIU Yen-Cheng, KIRA Zsolt, et al. A closer look at few-shot classification[C]. International Conference on Learning Representations, New Orleans, USA, 2019.
    [16]
    SNELL J, SWERSKY K, and ZEMEL R. Prototypical networks for few-shot learning[C]. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 4080–4090.
    [17]
    FINN C, ABBEEL P, and LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]. Proceedings of the 34th International Conference on Machine Learning - Volume 70, Sydney, Australia, 2017: 1126–1135.
    [18]
    RADFORD A, KIM J, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]. Proceedings of the 38th International Conference on Machine Learning, 2021: 8748–8763.
    [19]
    XU Jingyi and LE H. Generating representative samples for few-shot classification[C]. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8993–9003. doi: 10.1109/CVPR52688.2022.00880.
    [20]
    ZHANG Baoquan, LI Xutao, YE Yunming, et al. Prototype completion with primitive knowledge for few-shot learning[C]. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 3753–3761. doi: 10.1109/CVPR46437.2021.00375.
    [21]
    LIU Fan, CHEN Delong, GUAN Zhangqingyun, et al. RemoteCLIP: A vision language foundation model for remote sensing[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5622216. doi: 10.1109/TGRS.2024.3390838.
    [22]
    张永军, 李彦胜, 党博, 等. 多模态遥感基础大模型: 研究现状与未来展望[J]. 测绘学报, 2024, 53(10): 1942–1954. doi: 10.11947/j.AGCS.2024.20240019.

    ZHANG Yongjun, LI Yansheng, DANG Bo, et al. Multi-modal remote sensing large foundation models: Current research status and future prospect[J]. Acta Geodaetica et Cartographica Sinica, 2024, 53(10): 1942–1954. doi: 10.11947/j.AGCS.2024.20240019.
    [23]
    OpenAI. Introducing ChatGPT[EB/OL]. https://openai.com/blog/chatgpt, 2022.
    [24]
    GUPTA R, HOSFELT D, DODGE S, et al. Creating xBD: A dataset for assessing building damage from satellite imagery[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, USA, 2019: 447–456.
    [25]
    RUDNER T G J, RUSSWURM M, FIL J, et al. Multi3Net: Segmenting flooded buildings via fusion of multiresolution, multisensor, and multitemporal satellite imagery[C]. The Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019, 33: 702–709. doi: 10.1609/aaai.v33i01.3301702.
    [26]
    ZENG Chao, CAO Zhenyu, SU Fenghuan, et al. A dataset of high-precision aerial imagery and interpretation of landslide and debris flow disaster in Sichuan and surrounding areas between 2008 and 2020[J]. China Scientific Data, 2022, 7(2): 195–205. doi: 10.11922/noda.2021.0005.zh.
    [27]
    CHENG Gong, HAN Junwei, and LU Xiaoqiang. Remote sensing image scene classification: Benchmark and state of the art[J]. Proceedings of the IEEE, 2017, 105(10): 1865–1883. doi: 10.1109/JPROC.2017.2675998.
    [28]
    LI Haifeng, CUI Zhenqi, ZHU Zhiqiang, et al. RS-MetaNet: Deep metametric learning for few-shot remote sensing scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(8): 6983–6994. doi: 10.1109/TGRS.2020.3027387.
    [29]
    LI Lingjun, HAN Junwei, YAO Xiwen, et al. DLA-MatchNet for few-shot remote sensing image scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(9): 7844–7853. doi: 10.1109/TGRS.2020.3033336.
    [30]
    XIA Guisong, HU Jingwen, HU Fan, et al. AID: A benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965–3981. doi: 10.1109/TGRS.2017.2685945.
    [31]
    MANGLA P, SINGH M, SINHA A, et al. Charting the right manifold: Manifold mixup for few-shot learning[C]. Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, 2020: 2207–2216. doi: 10.1109/WACV45572.2020.9093338.
    [32]
    SUNG F, YANG Yongxin, ZHANG Li, et al. Learning to compare: Relation network for few-shot learning[C]. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1199–1208. doi: 10.1109/CVPR.2018.00131.
    [33]
    VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning[C]. Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 3637–3645.
    [34]
    NICHOL A, ACHIAM J, and SCHULMAN J. On first-order meta-learning algorithms[J]. arXiv preprint arXiv:1803.02999, 2018. doi: 10.48550/arXiv.1803.02999.
    [35]
    CHENG Gong, CAI Liming, LANG Chunbo, et al. SPNet: Siamese-prototype network for few-shot remote sensing image scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5608011. doi: 10.1109/TGRS.2021.3099033.
    [36]
    ZENG Qingjie, GENG Jie, JIANG Wen, et al. IDLN: Iterative distribution learning network for few-shot remote sensing image scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 8020505. doi: 10.1109/LGRS.2021.3109728.
    [37]
    LI Xiaomin, SHI Daqian, DIAO Xiaolei, et al. SCL-MLNet: Boosting few-shot remote sensing scene classification via self-supervised contrastive learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5801112. doi: 10.1109/TGRS.2021.3109268.