Research on a Multimodal Person Re-Identification System
Abstract:
Person re-identification (ReID) systems, which are used to locate specific pedestrians, play a pivotal role, and with natural-language descriptions they can also quickly retrieve the most similar person from a gallery. Such an application combines separate model structures for images and for text, which are trained to convergence with suitable loss functions. Multimodal learning thus makes it possible to search images by semantic description, which helps locate targets more efficiently in surveillance systems that hold very large amounts of data. Deep learning models are widely used for text classification, but their depth and related factors tend to give them high time complexity. FastText, by contrast, is an embedding-based model without a deep or complex architecture, yet it substantially increases training speed while maintaining accuracy, so that person search tasks can be completed faster and related industries can benefit. Applying FastText to a multimodal person re-identification system, this paper achieves a significant improvement in training speed and brings multimodal person re-identification closer to practical application.
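The abstract does not spell out how the text branch works. As a rough illustration only, assuming the FastText idea of representing a sentence as the average of its word and word-bigram embeddings (with n-grams mapped to buckets via the hashing trick), a text query can be turned into a vector and compared against gallery features by cosine similarity. This is a pure-Python sketch, not the paper's implementation: `DIM`, `NUM_BUCKETS`, the MD5-based bucket hash (standing in for FastText's own hash), the untrained random embedding table, and the gallery vectors are all hypothetical placeholders. In a trained system the table would be learned, and the gallery vectors would come from an image branch projected into the same embedding space.

```python
import hashlib
import math
import random
from typing import Dict, List

DIM = 16             # hypothetical embedding size
NUM_BUCKETS = 2**12  # hypothetical number of hash buckets for words and bigrams


def bucket(feature: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Stable hash: Python's built-in hash() is salted per process for strings.
    digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def make_table(num_buckets: int = NUM_BUCKETS, dim: int = DIM, seed: int = 0) -> List[List[float]]:
    # Untrained, randomly initialised embedding table (placeholder for learned weights).
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_buckets)]


def features(text: str) -> List[str]:
    # Word unigrams plus word bigrams, as in FastText's bag of n-grams.
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def encode_text(text: str, table: List[List[float]]) -> List[float]:
    # Average the embeddings of all hashed features; no deep layers involved.
    feats = features(text)
    dim = len(table[0])
    vec = [0.0] * dim
    if not feats:
        return vec
    for f in feats:
        row = table[bucket(f, len(table))]
        for i in range(dim):
            vec[i] += row[i]
    return [v / len(feats) for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank_gallery(query_vec: List[float], gallery: Dict[str, List[float]]) -> List[str]:
    # Return gallery IDs sorted from most to least similar to the query.
    scored = [(cosine(query_vec, emb), pid) for pid, emb in gallery.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [pid for _, pid in scored]
```

For example, `rank_gallery(encode_text(description, make_table()), gallery)` returns the gallery IDs ordered by similarity to the description. Averaging embeddings keeps encoding linear in the number of tokens, which is the property the abstract credits for FastText's faster training compared with deep text encoders.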