Chunk-based prosody generation approach in English TTS
WANG Li^a, WANG Yongsheng^b (a. CAD Research Center; b. German College, Tongji Univ., Shanghai 200092, China)
To predict prosodic boundaries for the prosody generation module of an English Text-to-Speech (TTS) system, the prosodic structure of the synthesized speech is generated by marking boundaries of different lengths after intermediate phrases, intonation phrases, and part-of-speech tags, so that it resembles that of the human voice. Intermediate phrases are first segmented by chunk parsing, and a corpus for intonation phrase prediction is built from the result. Prosodic boundaries are then predicted by using transformation-based learning to learn rules for intonation phrase prediction. In the experiment, the tagging accuracy for intonation phrases is 81.32%, which can be further improved by adding constraint rules on intonation phrase syllable count and punctuation to the learning.
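The rule-learning step can be illustrated with a minimal sketch of greedy transformation-based learning. It is not the authors' implementation: the two features (a chunk tag and a "followed by punctuation" flag), the labels "B" (intonation phrase boundary after the token) and "N" (no boundary), the single rule template, and the toy data are all assumptions made for illustration.

```python
# Minimal transformation-based learning (TBL) sketch.
# Assumed, not from the paper: features, labels, rule template, toy data.
# A rule is (feature_index, feature_value, from_label, to_label).

def candidate_rules(tokens, labels, gold):
    """Propose rules that would correct at least one currently wrong label."""
    rules = set()
    for tok, cur, g in zip(tokens, labels, gold):
        if cur != g:
            for i, value in enumerate(tok):
                rules.add((i, value, cur, g))
    return rules

def apply_rule(rule, tokens, labels):
    """Relabel every token that matches the rule's feature and source label."""
    i, value, src, dst = rule
    return [dst if (lab == src and tok[i] == value) else lab
            for tok, lab in zip(tokens, labels)]

def score(labels, gold):
    """Number of labels that agree with the gold annotation."""
    return sum(l == g for l, g in zip(labels, gold))

def learn_tbl(tokens, gold, default="N", min_gain=1):
    """Greedily pick the rule with the largest net gain until none helps."""
    labels = [default] * len(tokens)
    learned = []
    while True:
        base = score(labels, gold)
        best, best_gain = None, 0
        for rule in sorted(candidate_rules(tokens, labels, gold)):
            gain = score(apply_rule(rule, tokens, labels), gold) - base
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None or best_gain < min_gain:
            return learned
        labels = apply_rule(best, tokens, labels)
        learned.append(best)

def tag(tokens, rules, default="N"):
    """Apply learned rules in order, starting from the default label."""
    labels = [default] * len(tokens)
    for rule in rules:
        labels = apply_rule(rule, tokens, labels)
    return labels

# Toy corpus: (chunk_tag, followed_by_punctuation) per token.
tokens = [("NP", "no"), ("VP", "no"), ("NP", "yes"), ("PP", "no"),
          ("NP", "yes"), ("VP", "no"), ("NP", "no")]
gold = ["N", "N", "B", "N", "B", "N", "N"]

rules = learn_tbl(tokens, gold)
predicted = tag(tokens, rules)
```

On this toy corpus the learner prefers the punctuation rule over the chunk-tag rule, because relabeling every NP would introduce as many errors as it fixes. Constraints such as a syllable-count limit could, under the same assumptions, be added as further feature columns.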