Software Engineering and Applications 2021

Design and Implementation of Video Speech Extraction Text System Based on Deep Learning

DOI: 10.12677/SEA.2021.104057, PP. 528-541

谢煜颖

Keywords: Speech Recognition, Speech Synthesis, Video Processing, Deep Learning

Abstract:

The 21st century is the information age, and multimedia technology is increasingly common in online teaching. During the most severe period of COVID-19 prevention and control, online teaching played a major role thanks to its inherent advantages. However, the online video editing platforms currently on the market offer limited functionality, are inefficient, and have cumbersome user experiences. This paper uses speech recognition built on recurrent and convolutional neural networks to extract text from video, uses attention-based speech synthesis to apply edits to that text back to the video, processes the video with FFmpeg, and uses multithreading and asynchronous queues to improve system performance. The research focuses on how to implement speech recognition and speech synthesis, and on how to improve synthesis quality. The resulting system recognizes Mandarin with an accuracy of 90.52%, and its synthesis engine produces speech close to the ground truth. Applied in the product, these components yield a video speech-to-text extraction system that supports precise video editing and a good user experience.
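The abstract names FFmpeg, multithreading, and queues but gives no implementation details. As a minimal sketch of how such a stage might look, assuming the audio track is first extracted to a WAV file for the recognizer, the Python below feeds video paths through a queue to worker threads that each run one FFmpeg command. The function names, the 16 kHz mono output format, and the worker count are illustrative assumptions and are not taken from the paper.

```python
import os
import queue
import subprocess
import threading
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def build_extract_audio_cmd(video_path: str, wav_path: str) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes PCM WAV."""
    return [
        "ffmpeg", "-y",          # overwrite the output file without prompting
        "-i", video_path,        # input video
        "-vn",                   # no video in the output
        "-ac", "1",              # mono (assumed recognizer input format)
        "-ar", "16000",          # 16 kHz sample rate (assumed)
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        wav_path,
    ]


def run_ffmpeg(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code (needs ffmpeg on PATH)."""
    return subprocess.run(cmd, capture_output=True).returncode


def extract_all(
    videos: Iterable[str],
    workers: int = 2,
    runner: Callable[[Sequence[str]], int] = run_ffmpeg,
) -> Dict[str, Tuple[str, int]]:
    """Extract audio from every video using a job queue and worker threads.

    Returns {video_path: (wav_path, exit_code)}; exit_code is -1 if the
    runner raised (for example, because ffmpeg is not installed).
    """
    jobs: "queue.Queue" = queue.Queue()
    results: Dict[str, Tuple[str, int]] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            video = jobs.get()
            if video is None:          # sentinel: no more work for this thread
                jobs.task_done()
                break
            wav = os.path.splitext(video)[0] + ".wav"
            try:
                code = runner(build_extract_audio_cmd(video, wav))
            except Exception:
                code = -1
            with lock:
                results[video] = (wav, code)
            jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for v in videos:
        jobs.put(v)
    for _ in threads:
        jobs.put(None)                 # one sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
    return results
```

Passing a custom `runner` makes the queueing logic testable without FFmpeg installed. In a real deployment the extracted WAV files would then be handed to the recognition model.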
