Vegetation Height Prediction Dataset Oriented to Mountainous Forest Areas
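Abstract: Vegetation height is an important parameter for characterizing forest vertical structure, carbon storage, and ecosystem function, and it is widely used in ecology, climate change research, and biogeography. With the development of artificial intelligence, and of large-model technology in particular, forest ecology research increasingly needs large-scale, standardized training data. However, publicly available data still lack a canopy height prediction dataset that covers wide regions and follows unified specifications, which limits the application of advanced intelligent methods. To address this gap, this paper constructs a Vegetation Height Prediction Dataset oriented to mountainous forest areas (VHP-Dataset). The dataset fuses multi-source data, including multispectral remote sensing imagery, a Digital Elevation Model (DEM), vegetation cover fraction, and cover type, and uses Global Ecosystem Dynamics Investigation (GEDI) canopy height as the target variable, yielding an 18-dimensional input feature set. The paper describes the dataset construction workflow and evaluates the dataset through experiments on spatial distribution, model validation, and feature importance. The results show that the VHP-Dataset effectively supports supervised learning, demonstrates good scientific validity and applicability for vegetation height prediction across multiple landforms and regions, and provides standardized training samples for forest structure retrieval.
Abstract: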
Objective Vegetation height is a key ecological parameter that reflects forest vertical structure, biomass, ecosystem functions, and biodiversity. Existing open-source vegetation height datasets are often sparse, unstable, and poorly suited to mountainous forest regions, which limits their utility for large-scale modeling. This study constructs the Vegetation Height Prediction Dataset (VHP-Dataset), a standardized, large-scale training resource that integrates multi-source remote sensing features and supports supervised learning for vegetation height estimation.
Methods The VHP-Dataset integrates Landsat 8 multispectral imagery, the AW3D30 digital elevation model (ALOS World 3D, 30 m), the CGLS-LC100 land cover product (Copernicus Global Land Service, Land Cover 100 m), and the GFCC30TC tree canopy cover product (Global Forest Cover Change Tree Cover, 30 m). Canopy height from GEDI L2A (Global Ecosystem Dynamics Investigation, Level 2A) footprints serves as the target variable. A total of 18 input features are extracted, covering spatial location, spectral reflectance, topographic structure, vegetation indices, and vegetation cover information (Table 3, Fig. 4). To validate the dataset, five representative models are applied: Extremely Randomized Trees (ExtraTree), Random Forest (RF), Artificial Neural Network (ANN), Broad Learning System (BLS), and Transformer. Model performance is assessed with the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Standard Deviation (SD), and coefficient of determination (R²).
Results and Discussions The experimental results show that the VHP-Dataset supports stable vegetation height prediction across regions and terrain conditions, reflecting its scientific validity and practical applicability.
Model comparisons indicate that ExtraTree performs best in most regions (highest R² in all four study regions and lowest RMSE in three of them), while the Transformer performs well in specific areas, confirming that the dataset is compatible with different modeling approaches (Table 5). Stratified analyses show that prediction errors increase under high canopy cover and on steep slopes, whereas predictions remain more stable at higher elevations (Figs. 6 to 9). These findings indicate that the dataset captures the effects of complex topography and canopy structure on model accuracy. Feature importance analysis shows that spatial location, topographic factors, and canopy cover indices are the primary drivers of prediction accuracy, while spectral and land cover information provide complementary contributions (Fig. 10).
Conclusions The VHP-Dataset supports vegetation height prediction across regions and terrain types, reflecting its scientific validity and applicability. It enables robust predictions with traditional machine learning methods such as tree-based models and also provides a foundation for deep learning approaches such as Transformers, showing broad methodological compatibility. Stratified analyses by vegetation cover and terrain reveal how complex canopy structure and topography affect prediction accuracy, and feature importance analysis identifies spatial location, topographic attributes, and canopy cover indices as the primary drivers. Overall, the VHP-Dataset fills the gap in large-scale, high-quality datasets for vegetation height prediction in mountainous forests, provides a standardized benchmark for cross-regional model evaluation and comparison, and offers value for research on vegetation height prediction and forest ecosystem monitoring.
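The four evaluation metrics named in the Methods paragraph can be written out explicitly. The sketch below is a minimal standard-library Python version for paired observed and predicted heights; it is not the authors' evaluation code, and it assumes that SD is the population standard deviation of the residuals (prediction minus observation), because the excerpt does not define SD.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, SD and R^2 for paired canopy heights (metres).

    Assumption: SD is the population standard deviation of the residuals
    (prediction minus observation); the source does not define SD here.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mean_res = sum(residuals) / n

    # Error magnitude metrics.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n

    # Spread of the residuals around their mean (assumed SD definition).
    sd = math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / n)

    # Coefficient of determination relative to the mean observed height.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"RMSE": rmse, "MAE": mae, "SD": sd, "R2": r2}
```

Under this assumed definition RMSE² = SD² + (mean residual)², so SD never exceeds RMSE and the mean residual never exceeds MAE in magnitude. Some Table 5 rows, such as ExtraTree in the GP forest (RMSE 8.116 m, MAE 6.263 m, SD 3.335 m), cannot satisfy both bounds at once, which suggests the paper's SD column may follow a different definition (for example, the spread of the predictions).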
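Table 1 Basic attributes of the research data
| Product | Coverage | Coordinate system | Data type | Purpose |
|---|---|---|---|---|
| GEDI L2A | 51.6°N–51.6°S | WGS84 | LiDAR footprint observation product | Provides canopy height, ground elevation, and vertical structure parameters; used as the target variable. |
| Landsat 8 | Global | WGS84 | Multispectral and thermal infrared imagery | Spectral reflectance bands and normalized indices are extracted to characterize vegetation growth status. |
| AW3D30 | 84°N–84°S | WGS84 | DEM product | Provides terrain features such as elevation and slope, supporting modeling under complex terrain. |
| CGLS-LC100 | Global | WGS84 | Global land cover product | Provides land cover types to support canopy height modeling differentiated by cover type. |
| GFCC30TC | Global | WGS84 | Forest cover data | Provides vegetation cover percentage from 0% to 100%, characterizing forest density and continuity. |
Table 2 Screening criteria for valid GEDI L2A footprints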
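| Criterion | Description |
|---|---|
| sensitivity > 0.9 | sensitivity is the signal sensitivity. To obtain high-quality footprints, a sensitivity threshold of at least 0.9 is commonly used over land, so footprints with sensitivity below 0.9 were removed. |
| quality_flag = 1 | quality_flag is the quality flag. A value of 1 indicates that the waveform meets the set standards for energy, sensitivity, amplitude, and real-time surface tracking quality, so it can be treated as valid signal data. |
| degrade_flag = 0 | degrade_flag indicates a degraded state; a value of 1 means that pointing or geolocation accuracy is degraded. To ensure data quality, footprints flagged 1 were removed and only valid footprints flagged 0 were retained. |
| beam > 4 | beam identifies the laser beam. GEDI L2A contains 8 beam channels: beams 1 to 4 are coverage beams and beams 5 to 8 are full-power beams. Coverage beams have limited ability to penetrate dense canopy and penetrate effectively only where canopy cover does not exceed 95%. Full-power beams are therefore usually preferred to improve penetration and accuracy. |
| Solar_elevation < 0 | Solar_elevation is the solar elevation angle; a negative value means the sun is below the horizon, so only night-time GEDI acquisitions are retained. |
Table 3 Dataset features and target variable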
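| Data source | Type | Features | Target variable |
|---|---|---|---|
| GEDI L2A | Spatial information | Longitude and latitude | RH95 canopy height (all rows) |
| Landsat 8 | Spectral bands | Band 2, Band 3, Band 4, Band 5, Band 6, Band 7 | |
| Landsat 8 | Normalized indices | NDVI, NDWI, NDII, NDGI, NBR | |
| AW3D30 | Terrain structure | Elevation, slope, and aspect | |
| CGLS-LC100 | Vegetation type | Vegetation cover type | |
| GFCC30TC | Vegetation index | Vegetation cover index | |
Table 4 Basic information on the Landsat 8 spectral bands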
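| Band | Name | Wavelength (μm) | Resolution (m) | Purpose |
|---|---|---|---|---|
| Band 2 | Blue | 0.450–0.515 | 30 | Water body identification and atmospheric correction. |
| Band 3 | Green | 0.525–0.600 | 30 | Vegetation monitoring, land cover classification, etc. |
| Band 4 | Red | 0.630–0.680 | 30 | Vegetation reflectance and crop health monitoring, etc. |
| Band 5 | Near infrared | 0.845–0.885 | 30 | Highly sensitive to vegetation; used to compute vegetation indices. |
| Band 6 | Shortwave infrared 1 | 1.560–1.660 | 30 | Soil moisture, cloud identification, and geological surveys. |
| Band 7 | Shortwave infrared 2 | 2.100–2.300 | 30 | Surface feature classification, vegetation discrimination, and thermal anomaly detection. |
Table 5 Evaluation results of the VHP-Dataset with five representative methods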
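| Study region | Method | RMSE (m) | MAE (m) | SD (m) | R² |
|---|---|---|---|---|---|
| GP forest, USA | Transformer | 8.114 | 6.249 | 8.065 | 0.594 |
| | ANN | 8.399 | 6.566 | 8.384 | 0.572 |
| | RF | 9.036 | 7.215 | 8.984 | 0.508 |
| | BLS | 8.512 | 6.683 | 8.512 | 0.557 |
| | ExtraTree | 8.116 | 6.263 | 3.335 | 0.597 |
| GWJ-M forest, USA | Transformer | 6.443 | 4.971 | 6.398 | 0.409 |
| | ANN | 6.761 | 5.238 | 6.695 | 0.350 |
| | RF | 6.815 | 5.325 | 6.735 | 0.341 |
| | BLS | 6.616 | 5.139 | 6.616 | 0.375 |
| | ExtraTree | 6.351 | 4.883 | 5.311 | 0.417 |
| ENP forest, New Zealand | Transformer | 5.866 | 4.347 | 4.041 | 0.308 |
| | ANN | 5.430 | 4.403 | 5.332 | 0.321 |
| | RF | 4.852 | 3.850 | 4.534 | 0.460 |
| | BLS | 4.852 | 3.760 | 4.852 | 0.441 |
| | ExtraTree | 4.755 | 3.731 | 2.931 | 0.482 |
| TF forest, Germany | Transformer | 7.872 | 5.709 | 4.743 | 0.478 |
| | ANN | 7.002 | 5.424 | 6.888 | 0.537 |
| | RF | 6.595 | 5.114 | 5.942 | 0.584 |
| | BLS | 6.460 | 4.864 | 6.460 | 0.601 |
| | ExtraTree | 6.217 | 4.606 | 5.166 | 0.638 |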
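The screening rules of Table 2 and the feature layout of Table 3 can be combined into a small sample-building step. The sketch below assumes each GEDI footprint has already been co-located with the Landsat 8, AW3D30, CGLS-LC100, and GFCC30TC layers; the `Footprint` record, its field names, and the helper functions are hypothetical illustrations, not the authors' code. NDVI, NDII, and NBR use their standard Landsat 8 band pairs, while the NDWI (green vs. NIR, McFeeters form) and NDGI (green vs. red) formulas are assumptions, since the source does not define them here. The sensitivity test uses the strict `> 0.9` given in the criterion column of Table 2.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    """Hypothetical record: one GEDI L2A footprint plus co-located raster values."""
    lon: float
    lat: float
    rh95: float             # target: RH95 canopy height (m)
    sensitivity: float
    quality_flag: int
    degrade_flag: int
    beam: int               # 1-4 coverage beams, 5-8 full-power beams (Table 2 numbering)
    solar_elevation: float  # degrees; negative means night-time acquisition
    bands: dict             # Landsat 8 reflectance keyed "B2".."B7"
    elevation: float        # AW3D30
    slope: float
    aspect: float
    land_cover: int         # CGLS-LC100 class code
    tree_cover: float       # GFCC30TC tree cover (%)


def passes_screening(fp: Footprint) -> bool:
    """Apply the five screening conditions listed in Table 2."""
    return (fp.sensitivity > 0.9
            and fp.quality_flag == 1
            and fp.degrade_flag == 0
            and fp.beam > 4
            and fp.solar_elevation < 0)


def nd(a: float, b: float) -> float:
    """Normalized difference (a - b) / (a + b); returns 0.0 if a + b == 0."""
    s = a + b
    return (a - b) / s if s != 0 else 0.0


def normalized_indices(b: dict) -> list:
    """NDVI, NDWI, NDII, NDGI, NBR from Landsat 8 OLI bands (Table 4)."""
    return [
        nd(b["B5"], b["B4"]),  # NDVI: NIR vs red
        nd(b["B3"], b["B5"]),  # NDWI: assumed McFeeters form, green vs NIR
        nd(b["B5"], b["B6"]),  # NDII: NIR vs SWIR 1
        nd(b["B3"], b["B4"]),  # NDGI: assumed form, green vs red
        nd(b["B5"], b["B7"]),  # NBR: NIR vs SWIR 2
    ]


def feature_vector(fp: Footprint) -> list:
    """Assemble the 18 input features in the order of Table 3."""
    spectral = [fp.bands[k] for k in ("B2", "B3", "B4", "B5", "B6", "B7")]
    # The land cover class code is kept as a single numeric feature, a
    # simplification that tree-based models tolerate; one-hot encoding is an alternative.
    terrain_cover = [fp.elevation, fp.slope, fp.aspect,
                     float(fp.land_cover), fp.tree_cover]
    return [fp.lon, fp.lat] + spectral + normalized_indices(fp.bands) + terrain_cover


def build_samples(footprints):
    """Return (X, y) for the footprints that survive screening."""
    kept = [fp for fp in footprints if passes_screening(fp)]
    return [feature_vector(fp) for fp in kept], [fp.rh95 for fp in kept]
```

A day-time footprint (Solar_elevation ≥ 0) or a coverage-beam footprint (beam ≤ 4) is dropped before any features are built, and each retained footprint yields exactly the 18 features counted in Table 3 (2 spatial, 6 spectral, 5 index, 3 terrain, 1 cover type, 1 cover index).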