SSML

SpeechKit converts all content, whether imported automatically or manually, into Speech Synthesis Markup Language (SSML).

For example, plain text like this:

Cyril Ramaphosa is a South African politician and the fifth and current President of South Africa.

is converted into SSML like this:

<speak>
<p>
<s>
Cyril <phoneme alphabet="ipa" ph="ˌræməˈpoʊsə">Ramaphosa</phoneme> is a South African politician and the fifth and current President of South Africa.
</s>
</p>
</speak>

Text-to-Speech APIs from Amazon, Google, and Microsoft support SSML tags, but they do not convert content into SSML. SpeechKit automates this tagging process.

SpeechKit uses the <p> and <s> SSML tags to denote paragraphs and sentences in text. The <break> SSML tag is also used to insert breaks of varying duration between different elements of the text. Combined, these tags ensure appropriate pauses are inserted in the audio.
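
As a rough illustration of this structural tagging, here is a minimal Python sketch. It is not SpeechKit's implementation: the function name `to_ssml`, the naive punctuation-based sentence splitter, and the 500ms paragraph pause are all assumptions made for the example.

```python
import re
from xml.sax.saxutils import escape


def to_ssml(text, paragraph_break="500ms"):
    """Wrap plain text in <speak>, <p>, and <s> tags (illustrative only)."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    rendered = []
    for paragraph in paragraphs:
        # Naive sentence split: after ".", "!" or "?" followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        body = "".join(f"<s>{escape(s)}</s>" for s in sentences if s)
        rendered.append(f"<p>{body}</p>")
    # Assumed pause length; a <break> is placed between paragraphs.
    separator = f'<break time="{paragraph_break}"/>'
    return f"<speak>{separator.join(rendered)}</speak>"


if __name__ == "__main__":
    print(to_ssml("Hello there. How are you?\n\nFine."))
```

A production sentence splitter would also need to handle abbreviations, decimals, and quotations, which this sketch ignores.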

SpeechKit uses the <phoneme> SSML tag and our library of phonemes to improve the pronunciation of words, e.g. the names of people, businesses, and locations.
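
The lookup can be pictured with a short Python sketch. The `PHONEMES` dictionary and the `add_phonemes` function are hypothetical stand-ins for the phoneme library, seeded with the single entry from the example above. The real library's format and matching rules are not described here.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation library: word -> IPA transcription.
PHONEMES = {"Ramaphosa": "ˌræməˈpoʊsə"}


def add_phonemes(sentence, phonemes=PHONEMES):
    """Wrap every word found in the library in a <phoneme> tag."""
    out = []
    # Splitting on a capturing group keeps both words and the text between them.
    for part in re.split(r"(\w+)", sentence):
        ipa = phonemes.get(part)
        if ipa is None:
            out.append(escape(part))
        else:
            out.append(
                f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(part)}</phoneme>'
            )
    return "".join(out)


if __name__ == "__main__":
    print(add_phonemes("Cyril Ramaphosa is a South African politician."))
```

Matching here is exact and case-sensitive, which is a simplification chosen to keep the sketch short.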

SpeechKit uses the <sub> SSML tag and our library of acronyms to expand acronyms into their full spoken form.

SpeechKit uses the <say-as> SSML tag and our library of acronyms to ensure that acronyms that should be spelled out letter by letter are read that way.
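
The two acronym treatments can be sketched together in Python. Everything named here is an assumption for illustration: the `EXPANSIONS` and `SPELL_OUT` collections stand in for the acronym library, and the `interpret-as="characters"` value is one commonly supported by text-to-speech engines rather than something this page specifies.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical acronym library entries.
EXPANSIONS = {"WHO": "World Health Organization"}  # read as the full form
SPELL_OUT = {"BBC"}                                # read letter by letter


def tag_acronyms(sentence):
    """Expand some acronyms with <sub> and spell out others with <say-as>."""
    out = []
    for part in re.split(r"(\w+)", sentence):
        if part in EXPANSIONS:
            alias = quoteattr(EXPANSIONS[part])
            out.append(f"<sub alias={alias}>{escape(part)}</sub>")
        elif part in SPELL_OUT:
            out.append(f'<say-as interpret-as="characters">{escape(part)}</say-as>')
        else:
            out.append(escape(part))
    return "".join(out)


if __name__ == "__main__":
    print(tag_acronyms("The BBC quoted the WHO."))
```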

SpeechKit uses the <lang> SSML tag (where supported) and Language Detection APIs to ensure correct pronunciation of foreign words.
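
A minimal Python sketch of the tagging step follows. It assumes language detection has already run and produced a mapping from phrase to language code. That mapping, the `tag_foreign_phrases` name, and the sample phrase are hypothetical, and no real detection API is called.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def tag_foreign_phrases(sentence, detected):
    """Wrap detected phrases in <lang> tags.

    `detected` maps a phrase to a language code such as "fr-FR" and is
    assumed to come from a separate language detection step.
    """
    if not detected:
        return escape(sentence)
    # Longest phrases first so a longer phrase wins over a shorter one inside it.
    phrases = sorted(detected, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    out, pos = [], 0
    for match in pattern.finditer(sentence):
        out.append(escape(sentence[pos:match.start()]))
        code = quoteattr(detected[match.group(0)])
        out.append(f"<lang xml:lang={code}>{escape(match.group(0))}</lang>")
        pos = match.end()
    out.append(escape(sentence[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(tag_foreign_phrases("She wished him bon voyage.", {"bon voyage": "fr-FR"}))
```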

SpeechKit uses the <w> SSML tag (where supported) and part-of-speech (POS) tagging to specify a word's part of speech or alternate meaning, e.g. "read" (present tense) vs. "read" (past tense).
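
As a sketch of how POS tags might map onto the <w> tag, the Python below uses the role values that Amazon Polly documents for its <w> tag. Other engines may use different values or not support the tag at all. The `tag_words` function and its input format (word and POS tag pairs from some tagger) are assumptions for the example.

```python
from xml.sax.saxutils import escape

# Role values as documented for Amazon Polly's <w> tag (engine-specific).
POLLY_ROLES = {
    "VB": "amazon:VB",    # verb, present simple
    "VBD": "amazon:VBD",  # verb, past tense or past participle
    "NN": "amazon:NN",    # noun
}


def tag_words(tagged_tokens):
    """Render (word, pos_tag) pairs, adding <w role="..."> where a role is known.

    A pos_tag of None means the word needs no disambiguation.
    """
    out = []
    for word, pos in tagged_tokens:
        role = POLLY_ROLES.get(pos)
        if role is None:
            out.append(escape(word))
        else:
            out.append(f'<w role="{role}">{escape(word)}</w>')
    # Joining on spaces is a simplification; punctuation spacing is not handled.
    return " ".join(out)


if __name__ == "__main__":
    print(tag_words([("I", None), ("read", "VBD"), ("the", None), ("book", None)]))
```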

Demo

We've created a visual demo to illustrate how SpeechKit uses SSML and other natural language processing techniques to prepare text content for speech synthesis.