Citation: | Hongxin ZHI, Hongtao YU, Shaomei LI, Chao GAO, Yanchuan WANG. A Deep Metric Learning Based Video Classification Method[J]. Journal of Electronics & Information Technology, 2018, 40(11): 2562-2569. doi: 10.11999/JEIT171141 |
BRANSON S, VAN HORN G, PERONA P, et al. Improved bird species recognition using pose normalized deep convolutional nets[C]. Proceedings of the British Machine Vision Conference, Nottingham, British, 2014: 197–211. doi: 10.5244/C.28.87.
|
ZHANG Ning, DONAHUE J, GIRSHICK R, et al. Part-based R-CNNs for fine-grained category detection[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014, 8689: 834–849. doi: 10.1007/978-3-319-10590-1_54.
|
KRAUSE J, JIN Hailin, YANG Jianchao, et al. Fine-grained recognition without part annotations[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 5546–5555. doi: 10.1109/CVPR.2015.7299194.
|
LIN Tsungyu, ROYCHOWDHURY A, and MAJI S. Bilinear CNN models for fine-grained visual recognition[C]. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1449–1457.
|
CUI Yin, ZHOU Feng, LIN Yuanqing, et al. Fine-grained categorization and dataset bootstrapping using deep metric learning with humans in the loop[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 1153–1162.
|
JI Shuiwang, XU Wei, YANG Ming, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis&Machine Intelligence, 2013, 35(1): 221–231 doi: 10.1109/TPAMI.2012.59
|
SIMONYAN K and ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[J]. Advances in Neural Information Processing Systems, 2014, 1(4): 568–576.
|
MA Chihyao, CHEN Minhung, KIRA Z, et al. TS-LSTM and temporal-inception: exploiting spatiotemporal dynamics for activity recognition[OL]. arXiv preprint arXiv: 1703.10667, 2017.
|
IOFFE S and SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[OL]. arXiv preprint arXiv: 1502. 03167, 2015.
|
WANG Limin, XIONG Yuanjun, WANG Zhe, et al. Temporal segment networks: Towards good practices for deep action recognition[J]. ACM Transactions on Information Systems, 2016, 22(1): 20–36 doi: 10.1007/978-3-319-46484-8_2
|
SONG Hyunoh, XIANG Yu, JEGELKA S, et al. Deep metric learning via lifted structured feature embedding[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 4004–4012. doi: 10.1109/cvpr.2016.434.
|
YI Dong, LEI Zhen, LIAO Shengcai, et al. Deep metric learning for person re-identification[C]. International Conference on Pattern Recognition, Stockholm, Sweden, 2014: 34–39.
|
CHEN Xingyu, LAN Xuguang, LIANG Guoqiang, et al. Pose-and-illumination-invariant face representation via a triplet-loss trained deep reconstruction model[J]. Multimedia Tools&Applications, 2017(7): 1–16 doi: 10.1007/s11042-017-4782-y
|
BELL S and BALA K. Learning visual similarity for product design with convolutional neural networks[J]. ACM Transactions on Graphics, 2015, 34(4): 98–99 doi: 10.1145/2766959
|
SCHROFF F, KALENICHENKO D, and PHILBIN J. FaceNet: A unified embedding for face recognition and clustering[C]. IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 815–823.
|
NG Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: Deep networks for video classification[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 4694–4702. doi: 10.1109/cvpr.2015.7299101.
|
MCLAUGHLIN N, RINCON J M D, and MILLER P. Recurrent convolutional network for video-based person re-identification[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 1325–1334.
|
JIA Yangqing, SHELHAMER E, DONAHUE J, et al. Caffe: Convolutional architecture for fast feature embedding[C]. ACM International Conference on Multimedia, Orlando, USA, 2014: 675–678.
|
SOOMRO K, ZAMIR A R, and SHAH M. UCF101: A dataset of 101 human actions classes from videos in the wild[OL]. arXiv preprint arXiv: 1212. 0402, 2012.
|
KUEHNE H, JHUANG H, STIEFELHAGEN R, et al. HMDB51: A Large Video Database for Human Motion Recognition[M]. Heidelberg, Berlin: Springer, 2013: 2556–2563.
|
BRADSKI G. The opencv library[J]. Doctor Dobbs Journal, 2000, 25(11): 384–386.
|
ZACH C, POCK T, and BISCHOF H. A Duality Based Approach for Realtime TV-L1 Optical Flow[M]. Heidelberg, Berlin: Springer, 2007: 214–223.
|
DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. Proceedings of Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255.
|
EVERINGHAM M, GOOL LV, WILLIAMS CKI, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338 doi: 10.1007/s11263-009-0275-4
|
CAI Zhuowei, WANG Limin, PENG Xiaojiang, et al. Multi-view super vector for action recognition[C]. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 596–603.
|
PENG Xiaojiang, WANG Limin, WANG Xingxing, et al. Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice[J]. Computer Vision&Image Understanding, 2016, 150(C): 109–125 doi: 10.1016/j.cviu.2016.03.013
|
WANG Heng and SCHMID C. Lear-inria submission for the thumos workshop[C]. ICCV Workshop on Action Recognition with a Large Number of Classes, Sydney, Australia, 2013: 39–47.
|
WANG Limin, QIAO Yu, and TANG Xiaoou. MoFAP: A multi-level representation for action recognition[J]. International Journal of Computer Vision, 2016, 119(3): 254–271 doi: 10.1007/s11263-015-0859-0
|
SUN Lin, JIA Kui, YEUNG D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4597–4605.
|
WANG Limin, QIAO Yu, and TANG Xiaoou. Action recognition with trajectory-pooled deep-convolutional descriptors[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Santiago, Chile, 2015: 4305–4314.
|
VAROL G, LAPTEV I, and SCHMID C. Long-term temporal convolutions for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1510–1517 doi: 10.1109/tpami.2017.2712608
|