Citation: WANG Hongchang, XIAN Fengyu, XIE Zihui, DONG Miaomiao, JIAN Haifang. BIRD1445: Large-scale Multimodal Bird Dataset for Ecological Monitoring[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250647.
[1] ZHU Ruizhe, JIN Hai, HAN Yonghua, et al. Aircraft target detection in remote sensing images based on improved YOLOv7-tiny network[J]. IEEE Access, 2025, 13: 48904–48922. doi: 10.1109/ACCESS.2025.3551320.
[2] HOU Zhiqiang, DONG Jiale, MA Sugang, et al. Video object segmentation algorithm based on multi-scale feature enhancement and global-local feature aggregation[J]. Journal of Electronics & Information Technology, 2024, 46(11): 4198–4207. doi: 10.11999/JEIT231394. (in Chinese)
[3] ZHA Zhiyuan, YUAN Xin, ZHANG Jiachao, et al. Low-rank regularized joint sparsity modeling for image denoising[J]. Journal of Electronics & Information Technology, 2025, 47(2): 561–572. doi: 10.11999/JEIT240324. (in Chinese)
[4] TUIA D, KELLENBERGER B, BEERY S, et al. Perspectives in machine learning for wildlife conservation[J]. Nature Communications, 2022, 13(1): 792. doi: 10.1038/s41467-022-27980-y.
[5] WANG Hongchang, LU Huaxiang, GUO Huimin, et al. Bird-Count: A multi-modality benchmark and system for bird population counting in the wild[J]. Multimedia Tools and Applications, 2023, 82(29): 45293–45315. doi: 10.1007/s11042-023-14833-z.
[6] WANG Hongchang, XIA Fang, ZHANG Yuanyuan, et al. Bird and habitat recognition based on deep learning algorithm: A case study of Beijing Cuihu National Urban Wetland Park[J]. Chinese Journal of Ecology, 2024, 43(7): 2231–2238. doi: 10.13292/j.1000-4890.202407.045. (in Chinese)
[7] GUO Huimin, JIAN Haifang, WANG Yiyu, et al. CDPNet: Conformer-based dual path joint modeling network for bird sound recognition[J]. Applied Intelligence, 2024, 54(4): 3152–3168. doi: 10.1007/s10489-024-05362-9.
[8] NICHOLS J D and WILLIAMS B K. Monitoring for conservation[J]. Trends in Ecology & Evolution, 2006, 21(12): 668–673. doi: 10.1016/j.tree.2006.08.007.
[9] NOROUZZADEH M S, NGUYEN A, KOSMALA M, et al. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning[J]. Proceedings of the National Academy of Sciences, 2018, 115(25): E5716–E5725. doi: 10.1073/pnas.1719367115.
[10] HAMPTON S E, STRASSER C A, TEWKSBURY J J, et al. Big data and the future of ecology[J]. Frontiers in Ecology and the Environment, 2013, 11(3): 156–162. doi: 10.1890/120103.
[11] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset[R]. CNS-TR-2011-001. Pasadena, USA: California Institute of Technology, 2011.
[12] VAN HORN G, BRANSON S, FARRELL R, et al. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection[C]. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 595–604. doi: 10.1109/CVPR.2015.7298658.
[13] VAN HORN G, MAC AODHA O, SONG Yang, et al. The iNaturalist species classification and detection dataset[C]. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8769–8778. doi: 10.1109/CVPR.2018.00914.
[14] STEVENS S, WU Jiaman, THOMPSON M J, et al. BioCLIP: A vision foundation model for the tree of life[C]. Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 19412–19424. doi: 10.1109/CVPR52733.2024.01836.
[15] FERGUS P, CHALMERS C, LONGMORE S, et al. Harnessing artificial intelligence for wildlife conservation[J]. Conservation, 2024, 4(4): 685–702. doi: 10.3390/conservation4040041.
[16] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. doi: 10.1007/s11263-009-0275-4.
[17] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]. 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 740–755. doi: 10.1007/978-3-319-10602-1_48.
[18] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255. doi: 10.1109/CVPR.2009.5206848.
[19] KUZNETSOVA A, ROM H, ALLDRIN N, et al. The open images dataset V4: Unified image classification, object detection, and visual relationship detection at scale[J]. International Journal of Computer Vision, 2020, 128(7): 1956–1981. doi: 10.1007/s11263-020-01316-z.
[20] GOYAL Y, KHOT T, SUMMERS-STAY D, et al. Making the V in VQA matter: Elevating the role of image understanding in visual question answering[C]. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6325–6334. doi: 10.1109/CVPR.2017.670.
[21] AINSLIE J, LEE-THORP J, DE JONG M, et al. GQA: Training generalized multi-query transformer models from multi-head checkpoints[C]. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Singapore, 2023: 4895–4901. doi: 10.18653/v1/2023.emnlp-main.298.
[22] MARINO K, RASTEGARI M, FARHADI A, et al. OK-VQA: A visual question answering benchmark requiring external knowledge[C]. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3190–3199. doi: 10.1109/CVPR.2019.00331.
[23] KAHL S, WILHELM-STEIN T, KLINCK H, et al. Recognizing birds from sound: The 2018 BirdCLEF baseline system[J]. arXiv preprint arXiv: 1804.07177, 2018. doi: 10.48550/arXiv.1804.07177.
[24] HE Wei, HAN Kai, NIE Ying, et al. Species196: A one-million semi-supervised dataset for fine-grained species recognition[C]. Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, USA, 2023: 1949. doi: 10.5555/3666122.3668071.
[25] GUO Huimin, JIAN Haifang, WANG Yequan, et al. MAMGAN: Multiscale attention metric GAN for monaural speech enhancement in the time domain[J]. Applied Acoustics, 2023, 209: 109385. doi: 10.1016/j.apacoust.2023.109385.
[26] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. 9th International Conference on Learning Representations, Virtual Event, Austria, 2021.
[27] WANG Xinlong, ZHANG Xiaosong, CAO Yue, et al. SegGPT: Towards segmenting everything in context[C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 1130–1140. doi: 10.1109/ICCV51070.2023.00110.
[28] CHEN Yue, BAI Yalong, ZHANG Wei, et al. Destruction and construction learning for fine-grained image recognition[C]. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 5152–5161. doi: 10.1109/CVPR.2019.00530.
[29] CHOU P Y, LIN C H, and KAO W C. A novel plug-in module for fine-grained visual classification[J]. arXiv preprint arXiv: 2202.03822, 2022. doi: 10.48550/arXiv.2202.03822.
[30] LUO Wei, YANG Xitong, MO Xianjie, et al. Cross-X learning for fine-grained visual categorization[C]. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 8241–8250. doi: 10.1109/ICCV.2019.00833.
[31] WANG Jiahui, XU Qin, JIANG Bo, et al. Multi-granularity part sampling attention for fine-grained visual classification[J]. IEEE Transactions on Image Processing, 2024, 33: 4529–4542. doi: 10.1109/TIP.2024.3441813.
[32] LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 11966–11976. doi: 10.1109/CVPR52688.2022.01167.
[33] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
[34] HE Ju, CHEN Jieneng, LIU Shuai, et al. TransFG: A transformer architecture for fine-grained recognition[C]. Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual Event, 2022: 852–860. doi: 10.1609/aaai.v36i1.19967.
[35] DU Ruoyi, CHANG Dongliang, BHUNIA A K, et al. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 153–168. doi: 10.1007/978-3-030-58565-5_10.
[36] BAI Shuai, CHEN Keqin, LIU Xuejing, et al. Qwen2.5-VL technical report[J]. arXiv preprint arXiv: 2502.13923, 2025. doi: 10.48550/arXiv.2502.13923.