Author Topic: SSML round-up  (Read 31071 times)

Mike308

  • Newbie
  • *
  • Posts: 48
SSML round-up
« on: December 27, 2017, 06:01:39 PM »
So I have been hacking around with using SSML and an IVONA voice (in this case, Emma) and thought I would share my findings, consolidate some stuff that has appeared here prior, and see if anybody can add to all this.

HEADER
So if you are planning to use SSML, this header needs to be at the top of your Text-To-Speech textblock:

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
         xml:lang="en-US">
[option 1;option2;option 3]
</speak>

NOTE that the next-to-last line in this block is a placeholder for what you want to say. It can still use VA's own features, such as randomly picking one of several choices separated by semicolons, as shown in this example.

NOTE the third-from-bottom line of that block reads   xml:lang="en-US">   If, like me, you are using a non-US voice, you need to change "en-US" to match your voice's language. In my case it reads "en-UK". This is a small detail, but missing it will bork the whole block.
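
For anyone who wants to sanity-check a block before pasting it into VA, here is a minimal sketch in Python. It is not part of VoiceAttack; the helper names wrap_ssml and is_well_formed are made up for illustration. It only checks that the XML is well-formed, which catches things like a missing </speak> or a stray "&", and it says nothing about whether a given voice accepts the language code.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def wrap_ssml(body: str, lang: str = "en-US") -> str:
    """Wrap body text in the header from this post, with xml:lang set to lang.

    Hypothetical helper for illustration only, not a VoiceAttack feature.
    """
    return (
        '<?xml version="1.0"?>\n'
        f'<speak version="1.0" xmlns="{SSML_NS}"\n'
        '         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
        f'         xsi:schemaLocation="{SSML_NS}\n'
        '                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"\n'
        f'         xml:lang="{lang}">\n'
        f'{body}\n'
        '</speak>'
    )


def is_well_formed(ssml: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


# Build a block using the language value mentioned above and check it.
block = wrap_ssml("This is a test sentence.", lang="en-UK")
print(block)
print(is_well_formed(block))
```

The point of passing the language in as a parameter is the same as the note above: the header is identical for every voice except for that one attribute.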

======

PROSODY, or 'say it like you feel it'
So there are a number of SSML tags but thus far I have found very few that actually work. Thankfully, "prosody" is one of the magic ones. Prosody has a number of sub-parts, like volume, rate and pitch. I cannot detect any meaningful change from volume or pitch, but rate is a winner. I found it effective to express rate as such:

[This is a sentence <prosody rate="x-slow">with a lot</prosody> of emphasis. This is one without emphasis.]

This works pretty well to demonstrate the difference. The range of rate is expressed as x-slow, slow, medium, fast and x-fast. The faster rates seem to very quickly accelerate the speech into the land of Alvin and the Chipmunks, and medium sounds pretty much like the default rate.
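
If you want to audition all five rates back to back, a small Python sketch like this can generate the fragments for you. The function name rate_samples is hypothetical; the output is meant to be pasted between the header and </speak>, and how each rate actually sounds will depend on your voice.

```python
from xml.sax.saxutils import escape

# The five named rates listed above.
RATES = ["x-slow", "slow", "medium", "fast", "x-fast"]


def rate_samples(phrase: str) -> list[str]:
    """Return one <prosody> fragment per named rate, each wrapping the phrase."""
    safe = escape(phrase)  # turns &, < and > into XML entities
    return [f'<prosody rate="{rate}">{safe}</prosody>' for rate in RATES]


for fragment in rate_samples("The quick brown fox jumps over the lazy dog."):
    print(fragment)
```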

======

BREAK, or 'pause for effect'
I use break to insert a pause defined in seconds, such as <break time="1.5s"/>. That lets me place a pause of an exact length at an exact point in a block of text.
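
A tiny Python sketch for producing break tags from a number of seconds, in case you generate your text elsewhere. The name break_tag is made up for illustration. It sticks to the seconds form used above and rounds to milliseconds so the value never comes out in scientific notation.

```python
def break_tag(seconds: float) -> str:
    """Return an SSML break tag for a pause of the given length in seconds."""
    if seconds < 0:
        raise ValueError("pause length cannot be negative")
    # Fixed-point with 3 decimals, then trim trailing zeros: 1.500 -> 1.5, 2.000 -> 2
    value = f"{seconds:.3f}".rstrip("0").rstrip(".")
    return f'<break time="{value}s"/>'


print("Wait for it..." + break_tag(1.5) + " there it is.")
```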

======

I hope this is of some help. I would love to know if anybody has found success with any other SSML tags. Cheers and all the best in the New Year!
Mike


Pfeil

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4780
  • RTFM
Re: SSML round-up
« Reply #1 on: December 28, 2017, 05:28:13 AM »
I'd like to note that tokens will work with SSML, as they are parsed before the input is sent to the TTS engine.

E.G.
Code:
Set decimal [~randomPause] value as random from 0.1 to 1.5
Set Text [TTSLang] to 'en-US'
Say, '<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
         xml:lang="{TXT:TTSLang}">
This is a sentence <prosody rate="{TXTRANDOM:slow;x-slow;fast}">with a lot</prosody> of emphasis. This is one<break time="{DEC:~randomPause}s"/> without emphasis.
</speak>'
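
To make the order of operations concrete, here is a rough Python model of "tokens first, TTS second". This is only an illustration and an assumption about the general idea, not VoiceAttack's actual parser: the function expand_tokens is hypothetical, and it handles only the three token forms used in the example above ({TXT:...}, {DEC:...} and {TXTRANDOM:...}).

```python
import random
import re

TOKEN = re.compile(r"\{(TXTRANDOM|TXT|DEC):([^{}]*)\}")


def expand_tokens(text: str, variables: dict, rng=random) -> str:
    """Replace tokens with plain text, the way the TTS engine would then see it."""

    def replace(match: re.Match) -> str:
        kind, arg = match.group(1), match.group(2)
        if kind == "TXTRANDOM":
            # Pick one of the semicolon-separated options.
            return rng.choice(arg.split(";"))
        # TXT and DEC both look up a named variable and return it as text.
        return str(variables[arg])

    return TOKEN.sub(replace, text)


variables = {"TTSLang": "en-US", "~randomPause": 0.7}
template = (
    'xml:lang="{TXT:TTSLang}"> This is a sentence '
    '<prosody rate="{TXTRANDOM:slow;x-slow;fast}">with a lot</prosody> of emphasis. '
    'This is one<break time="{DEC:~randomPause}s"/> without emphasis.'
)
print(expand_tokens(template, variables))
```

By the time the text reaches the speech engine there are no braces left, only ordinary SSML, which is why tokens can stand in for attribute values.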

Mike308

  • Newbie
  • *
  • Posts: 48
Re: SSML round-up
« Reply #2 on: December 29, 2017, 06:53:04 PM »
So do I read this right, that you are setting a token that randomly picks a different prosody rate each time the SSML is spoken, creating more variance? If so, that's very slick!   8)


Pfeil

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4780
  • RTFM
Re: SSML round-up
« Reply #3 on: December 30, 2017, 09:27:51 AM »
That's one option, yes.

Because tokens always return text, you can replace any part of the SSML markup.


If you're adding a lot of phrase options to your TTS actions, you can also put the entire header into a token so it takes up much less space (you could set the header and other static options in another command):
Code:
Set decimal [~randomPause] value as random from 0.1 to 1.5
Set Text [~TTSLang] to 'en-US'
Set Text [SSMLHeader] to '<?xml version="1.0"?><speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:sc...
Set Text [SSMLFooter] to '</speak>'
Say, '{TXT:SSMLHeader}
This is a sentence <prosody rate="{TXTRANDOM:slow;x-slow;fast}">with a lot</prosody> of emphasis. This is one<break time="{DEC:~randomPause}s"/> without emphasis.
{TXT:SSMLFooter}'
Where "SSMLHeader" contains
Code:
<?xml version="1.0"?><speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="{TXT:~TTSLang}">
Which is just the header with carriage returns removed (which can also contain tokens, still).

I put the footer into a token as well for consistency, but you can insert it manually instead if you prefer (as it doesn't save any space).
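
If you'd rather not strip the carriage returns by hand, a one-liner in Python will do it. This is just a convenience sketch (collapse_whitespace is a made-up name); it squeezes every run of spaces and line breaks down to a single space, so the result has one space between the declaration and <speak> where the hand-made version above has none, which makes no difference to the XML.

```python
def collapse_whitespace(header: str) -> str:
    """Join a multi-line header into one line, single-spacing everything."""
    return " ".join(header.split())


multi_line_header = '''<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
         xml:lang="{TXT:~TTSLang}">'''

# Paste the printed line into the Set Text action for the header token.
print(collapse_whitespace(multi_line_header))
```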

Mike308

  • Newbie
  • *
  • Posts: 48
Re: SSML round-up
« Reply #4 on: December 30, 2017, 01:12:33 PM »
So I am sitting here feeling very "Inception" at the moment. Code... embedded within code... that contains another level of code.   :P

Still, I love the efficiency of that approach. In my normal view of a Say command in edit mode, the SSML header eats up almost all of the visible window, so I always have to scroll down. Using the token approach will clean up the clutter! As a bonus, if one were to swap from a UK to a US voice (or any other combo), you would only need to edit the header once, where the token is declared, and not in dozens or hundreds of individual blocks.

Egil Sandfeld

  • Newbie
  • *
  • Posts: 25
Re: SSML round-up
« Reply #5 on: October 09, 2020, 07:35:59 AM »
Sorry for necroposting, but is there a built-in way to get the selected TTS engine language?
I'm not talking about {STATE_SPEECHCULTURE} as that's for speaking to VA, but rather the culture of the TTS engine itself.

With that it would be trivial to add it to the above a la:
Set Text [~TTSLang] to {STATE_TTSCULTURE}

Gary

  • Administrator
  • Hero Member
  • *****
  • Posts: 2831
Re: SSML round-up
« Reply #6 on: October 09, 2020, 08:37:18 AM »
I can look into that - I don't have an ETA, as I'm preparing for a full release soon.  I'll see if I can squeeze it in, though.