Geometric-Consistency-Based Neural Radiance Field for Satellite City Scene Rendering and Digital Surface Model Generation under Sparse Viewpoints
Abstract: Satellite remote sensing provides global, continuous, and multi-scale observation of the Earth's surface. In recent years, Neural Radiance Fields (NeRF) have attracted wide attention for their continuous rendering and implicit reconstruction properties, showing strong robustness in areas such as autonomous driving and large-scale scene reconstruction. However, NeRF has seen limited success in satellite Earth observation, mainly because its training requires large numbers of multi-view images while satellite imagery is difficult to acquire in quantity. With sparse viewpoints, the model easily overfits the training views, degrading performance at novel viewpoints. To address this problem, this paper proposes a new method that introduces geometric constraints between scene depth and surface normals into NeRF training, aiming to improve rendering and Digital Surface Model (DSM) generation under sparse-view conditions. Extensive experiments on the DFC2019 dataset verify the effectiveness of the proposed method. The results show that the geometrically constrained NeRF achieves leading performance on both novel view synthesis and DSM generation under sparse viewpoints, demonstrating its potential for satellite observation scenes with sparse views.

Abstract:
Objective Satellite-based Earth observation enables global, continuous, multi-scale, and multi-dimensional surface monitoring through diverse remote sensing techniques. Recent progress in 3D modelling and rendering has seen widespread adoption of Neural Radiance Fields (NeRF), owing to their continuous view synthesis and implicit geometry representation. Although NeRF performs robustly in areas such as autonomous driving and large-scale scene reconstruction, its direct application to satellite observation scenarios remains limited. This limitation arises primarily from the nature of satellite imaging, which rarely provides the tens or hundreds of viewpoints typically required for NeRF training. Under sparse-view conditions, NeRF tends to overfit the available training perspectives, leading to poor generalization to novel viewpoints.

Methods To address the performance limitations of NeRF under sparse-view conditions, this study proposes an approach that introduces geometric constraints on scene depth and surface normals during model training. These constraints are designed to compensate for the lack of prior knowledge inherent in sparse-view satellite imagery and to improve rendering and Digital Surface Model (DSM) generation. The approach builds on the central role of scene geometry in both novel view synthesis and DSM generation, where spatial structures must be represented accurately. To mitigate the degradation in NeRF performance under limited viewpoint conditions, the geometric relationships between scene depth and surface normals are formulated as loss functions. These functions enforce consistency between estimated depth and surface orientation, enabling the model to learn more reliable geometric features despite limited input data. The proposed constraints guide the model toward geometrically coherent and realistic scene reconstructions.

Results and Discussions The proposed method is evaluated on the DFC2019 dataset to assess its effectiveness in novel view synthesis and DSM generation under sparse-view conditions. Experimental results demonstrate that the NeRF model with geometric constraints achieves superior performance across both tasks, confirming its applicability to satellite observation scenarios with limited viewpoints. For novel view synthesis, model performance is assessed using 2, 3, and 5 input images. The proposed method consistently outperforms existing approaches across all configurations. In the JAX 004 scene, Peak Signal-to-Noise Ratio (PSNR) values of 21.365 dB, 21.619 dB, and 23.681 dB are achieved under the 2-view, 3-view, and 5-view settings, respectively. Moreover, the method exhibits the smallest degradation in PSNR and Structural Similarity Index (SSIM) as the number of training views decreases, indicating greater robustness under sparse input conditions. Qualitative results further confirm that the method yields sharper and more detailed renderings across all view configurations. For DSM generation, the proposed method achieves comparable or better performance relative to other NeRF-based approaches in most test scenarios. In the JAX 004 scene, Mean Absolute Error (MAE) values of 2.414 m, 2.198 m, and 1.602 m are obtained under the 2-view, 3-view, and 5-view settings, respectively. Qualitative assessments show that the generated DSMs exhibit clearer structural boundaries and finer geometric details compared to those produced by baseline methods.
Conclusions Incorporating geometric consistency constraints between scene depth and surface normals enhances the model’s ability to capture the spatial structure of objects in satellite imagery. The proposed method achieves state-of-the-art performance in both novel view synthesis and DSM generation tasks under sparse-view conditions, outperforming both NeRF-based and traditional Multi-View Stereo (MVS) approaches.
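To make the Methods description concrete, the following is a minimal PyTorch-style sketch of a depth/normal geometric-consistency loss, assuming normals are recovered from the rendered depth (here, a grid of back-projected 3D surface points) by finite differences and compared against the normals predicted by the radiance field. The function names, the finite-difference derivation, and the cosine-similarity penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def normals_from_depth(points):
    """Estimate surface normals from a (H, W, 3) grid of 3D surface points
    (e.g. back-projected from a rendered depth map) by taking the cross
    product of finite-difference tangent vectors."""
    dx = points[:, 1:, :] - points[:, :-1, :]               # (H, W-1, 3) horizontal tangents
    dy = points[1:, :, :] - points[:-1, :, :]               # (H-1, W, 3) vertical tangents
    n = torch.cross(dx[:-1, :, :], dy[:, :-1, :], dim=-1)   # (H-1, W-1, 3) normals
    return F.normalize(n, dim=-1)

def geometric_consistency_loss(points, pred_normals):
    """Penalise disagreement between depth-derived normals and the normals
    predicted by the field via 1 - cosine similarity. Assumes both normal
    fields are consistently oriented (e.g. upward-facing for nadir views)."""
    n_depth = normals_from_depth(points)                    # (H-1, W-1, 3)
    n_pred = F.normalize(pred_normals[:-1, :-1, :], dim=-1) # crop to matching grid
    return (1.0 - (n_depth * n_pred).sum(dim=-1)).mean()
```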
Table 1 Details of the DFC2019 dataset
                                  JAX 004     JAX 068     JAX 214     JAX 260
Number of training/test samples   9/2         17/2        21/3        15/2
Altitude range (m)                [–24, 1]    [–27, 30]   [–29, 73]   [–30, 13]

Table 2 Experimental results of the compared methods on different AOIs
(a) JAX 004 results

Method                PSNR (2/3/5 views)          SSIM (2/3/5 views)       Altitude MAE, m (2/3/5 views)
NeRF [1]              12.168 / 13.376 / 14.279    0.421 / 0.474 / 0.493    5.273 / 5.137 / 4.493
S-NeRF [2]            15.867 / 16.972 / 18.687    0.691 / 0.717 / 0.757    4.513 / 4.337 / 4.001
Sat-NeRF [3]          20.619 / 20.831 / 22.186    0.794 / 0.794 / 0.814    2.917 / 2.907 / 2.738
Baseline model [29]   20.614 / 20.943 / 22.167    0.786 / 0.781 / 0.804    2.614 / 2.437 / 1.986
Proposed method       21.365 / 21.619 / 23.681    0.801 / 0.812 / 0.848    2.414 / 2.198 / 1.602
S2P [29]              – / – / –                   – / – / –                1.932 / 1.804 / 1.437

(b) JAX 068 results

Method                PSNR (2/3/5 views)          SSIM (2/3/5 views)       Altitude MAE, m (2/3/5 views)
NeRF [1]              7.329 / 7.845 / 8.772       0.411 / 0.420 / 0.456    9.132 / 6.467 / 4.545
S-NeRF [2]            15.874 / 17.129 / 20.191    0.781 / 0.799 / 0.816    6.179 / 4.777 / 3.826
Sat-NeRF [3]          16.536 / 18.425 / 21.215    0.803 / 0.810 / 0.843    5.842 / 4.387 / 3.411
Baseline model [29]   18.473 / 18.431 / 22.017    0.800 / 0.831 / 0.843    4.539 / 3.048 / 2.619
Proposed method       19.837 / 19.493 / 23.271    0.807 / 0.833 / 0.843    3.316 / 2.531 / 2.249
S2P [29]              – / – / –                   – / – / –                2.883 / 2.314 / 2.032

(c) JAX 214 results

Method                PSNR (2/3/5 views)          SSIM (2/3/5 views)       Altitude MAE, m (2/3/5 views)
NeRF [1]              10.892 / 12.543 / 12.632    0.421 / 0.433 / 0.487    9.964 / 9.176 / 8.721
S-NeRF [2]            12.816 / 12.997 / 14.573    0.557 / 0.571 / 0.599    8.142 / 8.157 / 7.134
Sat-NeRF [3]          15.834 / 16.214 / 17.691    0.804 / 0.806 / 0.814    5.101 / 5.147 / 4.721
Baseline model [29]   16.117 / 16.924 / 18.871    0.817 / 0.863 / 0.840    4.617 / 4.394 / 3.466
Proposed method       18.991 / 20.505 / 21.317    0.820 / 0.867 / 0.831    3.217 / 2.496 / 2.408
S2P [29]              – / – / –                   – / – / –                3.014 / 2.837 / 2.414

(d) JAX 260 results

Method                PSNR (2/3/5 views)          SSIM (2/3/5 views)       Altitude MAE, m (2/3/5 views)
NeRF [1]              8.762 / 8.712 / 10.753      0.318 / 0.346 / 0.384    7.816 / 7.591 / 6.622
S-NeRF [2]            11.259 / 11.573 / 13.836    0.543 / 0.551 / 0.654    6.165 / 6.189 / 5.432
Sat-NeRF [3]          16.007 / 15.715 / 16.314    0.641 / 0.644 / 0.694    3.857 / 3.423 / 2.995
Baseline model [29]   16.179 / 16.197 / 18.423    0.688 / 0.689 / 0.735    3.386 / 3.214 / 2.887
Proposed method       16.728 / 17.150 / 19.934    0.690 / 0.690 / 0.742    3.016 / 2.926 / 2.119
S2P [29]              – / – / –                   – / – / –                3.423 / 2.873 / 2.317
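For reference, PSNR, SSIM, and altitude MAE are standard metrics, and a minimal sketch of how such numbers can be computed with NumPy and scikit-image is given below. The array names and value ranges are assumptions, and this is not the authors' evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred_rgb, gt_rgb, pred_dsm, gt_dsm):
    """pred_rgb, gt_rgb: (H, W, 3) float images in [0, 1];
    pred_dsm, gt_dsm: (H, W) altitude grids in metres."""
    psnr = peak_signal_noise_ratio(gt_rgb, pred_rgb, data_range=1.0)              # dB
    ssim = structural_similarity(gt_rgb, pred_rgb, channel_axis=-1, data_range=1.0)
    mae = float(np.abs(pred_dsm - gt_dsm).mean())                                  # metres
    return psnr, ssim, mae
```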
Table 3 Experimental results of the proposed method with different loss functions on the JAX 068 scene

Method                         PSNR (2/3/5 views)          SSIM (2/3/5 views)       Altitude MAE, m (2/3/5 views)
Baseline                       18.473 / 18.431 / 22.017    0.800 / 0.831 / 0.843    4.539 / 3.048 / 2.619
Baseline + $L_{\text{DS}}$     18.857 / 19.011 / 22.719    0.796 / 0.831 / 0.837    3.817 / 2.764 / 2.418
Baseline + $L_{\text{GC}}$     19.017 / 19.082 / 22.947    0.804 / 0.831 / 0.840    3.782 / 2.716 / 2.298
Baseline + $L_{\text{full}}$   19.837 / 19.493 / 23.271    0.807 / 0.833 / 0.843    3.316 / 2.531 / 2.249
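Table 3 indicates that the two geometric terms are complementary, with their combination $L_{\text{full}}$ performing best. A hedged sketch of how the full objective might be assembled is shown below; the photometric MSE term and the weighting factors `w_ds` and `w_gc` are illustrative placeholders, as this section does not specify the actual loss form or weights.

```python
import torch

def full_loss(rgb_pred, rgb_gt, l_ds, l_gc, w_ds=0.1, w_gc=0.1):
    """L_full: photometric term plus depth-supervision (L_DS) and
    depth/normal geometric-consistency (L_GC) terms. The weights
    w_ds and w_gc are hypothetical, not the paper's values."""
    l_rgb = torch.mean((rgb_pred - rgb_gt) ** 2)   # standard NeRF MSE colour loss
    return l_rgb + w_ds * l_ds + w_gc * l_gc
```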
[1] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99–106. doi: 10.1145/3503250.
[2] DERKSEN D and IZZO D. Shadow neural radiance fields for multi-view satellite photogrammetry[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, USA, 2021: 1152–1161. doi: 10.1109/CVPRW53098.2021.00126.
[3] MARÍ R, FACCIOLO G, and EHRET T. Sat-NeRF: Learning multi-view satellite photogrammetry with transient objects and shadow modeling using RPC cameras[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, New Orleans, USA, 2022: 1310–1320. doi: 10.1109/CVPRW56347.2022.00137.
[4] LE SAUX B, YOKOYA N, HANSCH R, et al. 2019 Data Fusion Contest [technical committees][J]. IEEE Geoscience and Remote Sensing Magazine, 2019, 7(1): 103–105. doi: 10.1109/MGRS.2019.2893783.
[5] TANCIK M, MILDENHALL B, WANG T, et al. Learned initializations for optimizing coordinate-based neural representations[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2846–2855. doi: 10.1109/CVPR46437.2021.00287.
[6] JAIN A, TANCIK M, and ABBEEL P. Putting NeRF on a diet: Semantically consistent few-shot view synthesis[C]. The IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 5865–5874. doi: 10.1109/ICCV48922.2021.00583.
[7] HIRSCHMULLER H. Accurate and efficient stereo processing by semi-global matching and mutual information[C]. The IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, 2005: 807–814. doi: 10.1109/CVPR.2005.56.
[8] D'ANGELO P and KUSCHK G. Dense multi-view stereo from satellite imagery[C]. The IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 2012: 6944–6947. doi: 10.1109/IGARSS.2012.6352565.
[9] BEYER R A, ALEXANDROV O, and MCMICHAEL S. The Ames Stereo Pipeline: NASA's open source software for deriving and processing terrain data[J]. Earth and Space Science, 2018, 5(9): 537–548. doi: 10.1029/2018EA000409.
[10] DE FRANCHIS C, MEINHARDT-LLOPIS E, MICHEL J, et al. An automatic and modular stereo pipeline for pushbroom images[C]. The ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Zürich, Switzerland, 2014: 49–56. doi: 10.5194/isprsannals-ii-3-49-2014.
[11] GONG K and FRITSCH D. DSM generation from high resolution multi-view stereo satellite imagery[J]. Photogrammetric Engineering & Remote Sensing, 2019, 85(5): 379–387. doi: 10.14358/PERS.85.5.379.
[12] HIRSCHMULLER H. Stereo processing by semiglobal matching and mutual information[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2): 328–341. doi: 10.1109/TPAMI.2007.1166.
[13] YAO Yao, LUO Zixin, LI Shiwei, et al. MVSNet: Depth inference for unstructured multi-view stereo[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 785–801. doi: 10.1007/978-3-030-01237-3_47.
[14] GU Xiaodong, FAN Zhiwen, ZHU Siyu, et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 2492–2501. doi: 10.1109/CVPR42600.2020.00257.
[15] CHENG Shuo, XU Zexiang, ZHU Shilin, et al. Deep stereo using adaptive thin volume representation with uncertainty awareness[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 2521–2531. doi: 10.1109/CVPR42600.2020.00260.
[16] YANG Jiayu, MAO Wei, ALVAREZ J M, et al. Cost volume pyramid based depth inference for multi-view stereo[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 4876–4885. doi: 10.1109/CVPR42600.2020.00493.
[17] GAO Jian, LIU Jin, and JI Shunping. A general deep learning based framework for 3D reconstruction from multi-view stereo satellite images[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2023, 195: 446–461. doi: 10.1016/j.isprsjprs.2022.12.012.
[18] MARÍ R, FACCIOLO G, and EHRET T. Multi-date earth observation NeRF: The detail is in the shadows[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Vancouver, Canada, 2023: 2035–2045. doi: 10.1109/CVPRW59228.2023.00197.
[19] SUN Wenbo, LU Yao, ZHANG Yichen, et al. Neural radiance fields for multi-view satellite photogrammetry leveraging intrinsic decomposition[C]. The IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 2024: 4631–4635. doi: 10.1109/IGARSS53475.2024.10641455.
[20] YU A, YE V, TANCIK M, et al. pixelNeRF: Neural radiance fields from one or few images[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 4576–4585. doi: 10.1109/CVPR46437.2021.00455.
[21] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. The 9th International Conference on Learning Representations, 2021.
[22] XU Dejia, JIANG Yifan, WANG Peihao, et al. SinNeRF: Training neural radiance fields on complex scenes from a single image[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 736–753. doi: 10.1007/978-3-031-20047-2_42.
[23] AMIR S, GANDELSMAN Y, BAGON S, et al. Deep ViT features as dense visual descriptors[J]. arXiv preprint arXiv:2112.05814, 2021.
[24] DENG Kangle, LIU A, ZHU Junyan, et al. Depth-supervised NeRF: Fewer views and faster training for free[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 12872–12881. doi: 10.1109/CVPR52688.2022.01254.
[25] TRUONG P, RAKOTOSAONA M J, MANHARDT F, et al. SPARF: Neural radiance fields from sparse and noisy poses[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 4190–4200. doi: 10.1109/CVPR52729.2023.00408.
[26] WANG Guangcong, CHEN Zhaoxi, LOY C C, et al. SparseNeRF: Distilling depth ranking for few-shot novel view synthesis[C]. The IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 9031–9042. doi: 10.1109/ICCV51070.2023.00832.
[27] SHI Ruoxi, WEI Xinyue, WANG Cheng, et al. ZeroRF: Fast sparse view 360° reconstruction with zero pretraining[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 21114–21124. doi: 10.1109/CVPR52733.2024.01995.
[28] ZHANG Lulin and RUPNIK E. SparseSat-NeRF: Dense depth supervised neural radiance fields for sparse satellite images[J]. arXiv preprint arXiv:2309.00277, 2023.
[29] FACCIOLO G, DE FRANCHIS C, and MEINHARDT-LLOPIS E. Automatic 3D reconstruction from multi-date satellite images[C]. The IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, USA, 2017: 1542–1551. doi: 10.1109/CVPRW.2017.198.
[30] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255. doi: 10.1109/CVPR.2009.5206848.