
Research on a Lightweight Wav2Lip Model Based on Global Channel Pruning

DOI: 10.12677/csa.2025.155133, PP. 606-614

Keywords: Wav2Lip, Deep Learning, Model Lightweighting, Global Channel Pruning


Abstract:

The Wav2Lip model suffers from heavy computation and slow inference, which makes it difficult to meet expectations in application scenarios with high real-time requirements or limited computing power. To address this, the paper proposes a global channel pruning method: the Wav2Lip model is pruned at three different pruning ratios and the resulting variants are compared. Experimental results show that the proposed global channel pruning scheme 1) improves inference speed, 2) reduces model size, and 3) maintains or improves the quality of the generated images. The scheme thus delivers efficient and stable inference while reducing computational cost.
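The abstract does not spell out the pruning criterion, so the following is only a minimal sketch of one common way to realize global channel pruning in PyTorch: rank the BatchNorm scale factors of every layer together (a network-slimming-style criterion) and mask the channels that fall below a single global quantile threshold. The toy model, the helper name global_channel_prune, and the ratios 0.3/0.5/0.7 are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of global channel pruning via BatchNorm scale factors.
# Assumption: a network-slimming-style criterion; the model and the ratios
# below are toy stand-ins, not the paper's actual Wav2Lip setup.
import copy
import torch
import torch.nn as nn

def global_channel_prune(model: nn.Module, ratio: float) -> int:
    """Zero the `ratio` fraction of channels with the smallest |gamma|,
    ranked globally across every BatchNorm2d layer in `model`."""
    # 1) Gather the scale factors of all channels in the whole network.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    # 2) One global threshold: the `ratio` quantile over all layers at once,
    #    which is what distinguishes global from layer-wise pruning.
    threshold = torch.quantile(gammas, ratio)
    # 3) Mask every channel whose scale factor falls below that threshold.
    pruned = 0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = (m.weight.detach().abs() >= threshold).float()
            m.weight.data.mul_(mask)
            m.bias.data.mul_(mask)
            pruned += int((mask == 0).sum().item())
    return pruned

# Toy stand-in for a Wav2Lip-style convolutional block (illustrative only).
toy = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
)
# BN gammas default to 1.0; in a trained model they are learned. Randomize
# them here so the toy demo has something meaningful to rank.
for m in toy.modules():
    if isinstance(m, nn.BatchNorm2d):
        nn.init.uniform_(m.weight, 0.0, 1.0)

for ratio in (0.3, 0.5, 0.7):        # three hypothetical pruning ratios
    candidate = copy.deepcopy(toy)   # prune a fresh copy per ratio
    n = global_channel_prune(candidate, ratio)
    print(f"ratio={ratio}: {n} of 96 channels masked globally")

In a real pipeline the masked channels and their corresponding convolution filters would then be physically removed and the slimmed network fine-tuned, which is what actually yields the smaller model files and faster inference the abstract reports.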

