![]() ![]() DenoiSpeechĬustomers swarmed into the store. Clean GTīasically we lost the game because we were outplayed. I saw the story, but that amount wouldn’t even pay my commission. Your browser does not support the audio element. We use the VCTK corpus for speech information and the Nonspeech100 for background noise. Audio SamplesĪll of the audio samples use Parallel WaveGAN (PWG) as vocoder. ![]() Experimental results show that DenoiSpeech can generate clean speech in complicated noisy situations. We carefully design a noise condition module that is jointly trained with the TTS model and can capture the fine-grained frame-level noise information. In this paper, we develop DenoiSpeech, a TTS system that can synthesize clean speech for a speaker with only noisy speech. However, they usually cannot handle noisy speech with complicated noise such as those with high variations along time. Previous works usually address the challenge in two ways: 1) training the TTS model using the speech denoised with an enhancement model 2) taking a single noise embedding as input when training with noisy speech. In many scenarios, only noisy speech of a target speaker is available, which presents challenges for TTS model training for this speaker. Chen Zhang (Zhejiang University) Yi Ren (Zhejiang University) Xu Tan (Microsoft Research) Jinglin Liu (Zhejiang University) Kejun Zhang (Zhejiang University) Tao Qin (Microsoft Research) Sheng Zhao (Microsoft STC Asia) Tie-Yan Liu (Microsoft Research) neural-based text to speech (TTS) models can synthesize natural and intelligible voice, they usually require high-quality speech data, which is costly to collect.DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |