Research on Deep Long-Tailed Hashing Retrieval Based on AttnDreamBooth Fine-Tuning
Abstract:
This paper focuses on the long-tail distribution problem in image retrieval and proposes a deep hashing retrieval framework based on data augmentation. Traditional image retrieval techniques often rely on manual or automatic annotation; however, the semantic gap between textual descriptions and images, together with the continuous growth of data volume, has made retrieval efficiency and cost major bottlenecks. Deep hashing uses deep neural networks to automatically extract image features and map them into a binary hash-code space, significantly improving retrieval efficiency. Nevertheless, the long-tail distributions common in real-world data cause deep hashing models to perform poorly on tail classes, especially on large-scale, imbalanced datasets. To address this issue, this paper introduces diffusion models as a generative technique that enriches data diversity by synthesizing samples for tail classes, thereby mitigating the impact of uneven data distribution and enhancing the model's ability to recognize tail classes. In addition, the AttnDreamBooth fine-tuning technique is employed to improve the quality of the synthesized data. Integrating this approach with deep hashing, experiments on the Donghua University silk dataset demonstrate significant performance improvements. By balancing the distribution at the data level and strengthening training at the model level, the proposed method effectively counteracts the negative effects of long-tail distribution on deep hashing retrieval, offering new insights and practical directions for the field of image retrieval.
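To make the deep hashing idea concrete, the sketch below shows in PyTorch how a model of the kind described above maps images to binary codes and ranks a database by Hamming distance. It is a minimal illustration under assumed choices (a ResNet-18 backbone, 64-bit codes, illustrative names such as HashNet), not the paper's actual implementation, and the diffusion-based tail-class augmentation and AttnDreamBooth fine-tuning steps are not shown.

```python
# Minimal sketch of deep hashing retrieval (illustrative, not the paper's code).
# A CNN backbone extracts continuous features, a hashing head relaxes them with
# tanh during training, and sign() yields {-1, +1} binary codes for retrieval.
import torch
import torch.nn as nn
from torchvision import models

class HashNet(nn.Module):
    def __init__(self, n_bits: int = 64):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any CNN feature extractor
        backbone.fc = nn.Identity()               # keep the 512-d features
        self.backbone = backbone
        self.hash_head = nn.Linear(512, n_bits)   # project features to hash space

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh is a differentiable relaxation of the binary code used for training
        return torch.tanh(self.hash_head(self.backbone(x)))

    @torch.no_grad()
    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # sign() produces the final binary hash codes used at retrieval time
        return torch.sign(self.forward(x))

def hamming_distance(query: torch.Tensor, db: torch.Tensor) -> torch.Tensor:
    # For {-1, +1} codes of length K, Hamming distance = (K - q . d) / 2
    k = query.shape[-1]
    return (k - db @ query.T) / 2

# Toy usage: encode a small random "database" and rank it against one query image
model = HashNet(n_bits=64).eval()
db_codes = model.encode(torch.randn(16, 3, 224, 224))  # 16 database images
q_code = model.encode(torch.randn(1, 3, 224, 224))     # one query image
ranking = hamming_distance(q_code, db_codes).squeeze().argsort()
```

Because comparison reduces to bitwise operations on compact codes, retrieval cost grows much more slowly with database size than exhaustive comparison of real-valued features, which is the efficiency advantage the abstract refers to.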