Deep Learning-Based Fruit Recognition on Mobile Devices
Abstract:
Fruit recognition in supermarkets still relies mainly on manual work, and computer vision offers an alternative. Three challenges remain: recognition accuracy is low for some fruits, models are difficult to deploy on terminal devices, and misrecognized images are hard to handle. This paper therefore studies deep learning-based fruit recognition on mobile devices, with the aim of replacing manual identification. First, it constructs DailyFruit-49, a supermarket fruit image dataset covering 49 types of fruit. To address the difficulty of recognizing fine-grained categories with highly similar features, fruits occluded by packaging, and fruits that are small and few in number, as well as the problem of deploying models on low-compute devices, a backbone model that meets the deployment requirements is selected. A new attention module, RMA, is designed and the ViT Block is improved to strengthen the model's ability to recognize details and to integrate deep semantic features, yielding the DenseRMA_ViT model; the loss function is also improved on the basis of Focal Loss. Ablation experiments on the public dataset Fruits-262 verify the effectiveness of these improvements. Finally, a fruit recognition system is implemented on actual devices to meet practical needs. The system collects misrecognized fruit images through user interaction and uses them to fine-tune the model weights automatically. As the system is used longer and collects more images, the model's recognition accuracy and generalization ability improve, which allows it to handle fruits that are misrecognized in real-world use.
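The abstract does not describe how the loss function was modified, so the following is only a minimal sketch of the standard multi-class Focal Loss of Lin et al. (2017) that the modification starts from, not the paper's improved loss. The function names, the example logits, and the defaults `gamma=2.0` and `alpha=1.0` are assumptions chosen for illustration. The sketch shows how the factor `(1 - p_t) ** gamma` shrinks the loss of confidently classified samples so that hard samples, such as visually similar fruit varieties, contribute relatively more.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t).

    `gamma` and `alpha` are illustrative defaults, not values from the paper.
    """
    p_t = softmax(logits)[target]  # predicted probability of the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, which is focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(logits, target, gamma=0.0, alpha=1.0)


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]  # true class 0 is predicted confidently
    hard = [0.0, 1.0, 0.0]  # true class 0 is confused with class 1
    for name, logits in (("easy", easy), ("hard", hard)):
        print(name, cross_entropy(logits, 0), focal_loss(logits, 0))
```

Running the script prints the cross-entropy and focal loss for one easy and one hard sample; the focal loss of the easy sample is reduced far more strongly than that of the hard one, which is the re-weighting effect the loss is designed for.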