To play Text To Speech (TTS) from the agent layer, send a message with a Speech (SP) parameter to Softdial CallGem™. The parameter contains the text to be played by the Text To Speech engine in Softdial Telephony Gateway™. Softdial CallGem™ passes the request on to Softdial Telephony Gateway™ in the CTI layer version of the Play Message [PM] message.
The telephony layer reports the success or failure of the request to Softdial CallGem™ in the Play Status [PS] message. Softdial CallGem™ then passes the result to the agent layer in the agent version of the Play Status [PS] message.
A successful TTS playback produces two Play Status [PS] messages. The first carries a Resource Status (RS) of 0 to indicate that playback of the message started successfully. The second carries a Resource Status (RS) of 1 to signify that the message has finished. Either the start or the termination acknowledgement may arrive unsolicited.
If the Play Message [PM] request was unsuccessful, only one Play Status [PS] message will be sent with Resource Status (RS) of 2.
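The status sequence described above can be tracked with a small state machine. The following Python sketch is illustrative only: it assumes the RS value has already been extracted from each Play Status [PS] message as an integer, and the names PlayState, next_state and track are hypothetical, not part of any Softdial API.

```python
from enum import Enum


class PlayState(Enum):
    PENDING = "pending"    # request sent, no Play Status [PS] received yet
    PLAYING = "playing"    # RS 0 received
    FINISHED = "finished"  # RS 1 received
    FAILED = "failed"      # RS 2 received


# Resource Status (RS) values as described in this section.
RS_STARTED = 0   # first status of a successful request
RS_FINISHED = 1  # the message has finished
RS_FAILED = 2    # the Play Message [PM] request was unsuccessful


def next_state(state: PlayState, rs: int) -> PlayState:
    """Return the playback state after a Play Status [PS] with the given RS.

    Acknowledgements may arrive unsolicited, so the current state is not
    used to reject a status; it is accepted purely for illustration.
    """
    if rs == RS_STARTED:
        return PlayState.PLAYING
    if rs == RS_FINISHED:
        return PlayState.FINISHED
    if rs == RS_FAILED:
        return PlayState.FAILED
    raise ValueError(f"unexpected Resource Status: {rs}")


def track(rs_values) -> PlayState:
    """Fold a sequence of RS values into the final playback state."""
    state = PlayState.PENDING
    for rs in rs_values:
        state = next_state(state, rs)
    return state
```

With this sketch, a successful playback (RS 0 followed by RS 1) ends in FINISHED, and a single RS 2 ends in FAILED.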
Text to Speech is implemented in Softdial Telephony Gateway™. However, the following details are included in this section for convenience.
The TTS engine uses the .NET System.Speech.AudioFormat object, which takes the following parameters:
Parameter | Value | Source |
---|---|---|
encodingFormat | ALaw / ULaw | taken from the "companding" setting in the STG config file |
samplesPerSecond | 8000 | hardcoded |
bitsPerSample | 8 | hardcoded |
channelCount | 1 | hardcoded |
averageBytesPerSecond | 8000 | hardcoded |
blockAlign | 1 | hardcoded |
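The hardcoded values in the table are consistent with one another: for 8-bit, single-channel audio, blockAlign is the number of bytes per sample frame and averageBytesPerSecond is that figure multiplied by the sample rate. The Python sketch below only checks this arithmetic. The AudioFormat class is a hypothetical stand-in written for illustration, not the .NET object itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    """Hypothetical stand-in mirroring the parameters in the table above."""

    encoding_format: str            # "ALaw" or "ULaw", from the "companding" setting
    samples_per_second: int = 8000  # hardcoded
    bits_per_sample: int = 8        # hardcoded
    channel_count: int = 1          # hardcoded

    @property
    def block_align(self) -> int:
        # Bytes per sample frame: one sample for every channel.
        return self.channel_count * self.bits_per_sample // 8

    @property
    def average_bytes_per_second(self) -> int:
        # Bytes per frame multiplied by frames per second.
        return self.samples_per_second * self.block_align


fmt = AudioFormat(encoding_format="ALaw")
print(fmt.block_align, fmt.average_bytes_per_second)  # 1 8000
```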
The above Text To Speech conversion object is set up on system startup. When Softdial Telephony Gateway™ receives text to be converted, it uses this object to convert the text to speech. TTS conversion is expensive in terms of processor time, so Softdial Telephony Gateway™ caches TTS objects and reuses them to minimise the processor load.
For this reason, we strongly recommend splitting any TTS string that includes both fixed and dynamic components into separate fixed and dynamic strings. The fixed (reused) components are then cached, and only the smaller dynamic component needs to be converted each time.
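The effect of this recommendation can be sketched in Python. The internal caching of Softdial Telephony Gateway™ is not documented in this section, so the sketch assumes a cache keyed on the exact text of each string. TtsCache, fake_synthesize and play_greeting are hypothetical names, and the synthesis step is a placeholder that performs no real speech conversion.

```python
class TtsCache:
    """Illustrative cache keyed on the exact TTS string (an assumption)."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._cache = {}
        self.conversions = 0  # number of (expensive) conversions performed

    def get(self, text):
        if text not in self._cache:
            self.conversions += 1
            self._cache[text] = self._synthesize(text)
        return self._cache[text]


def fake_synthesize(text):
    # Placeholder for the real text-to-speech conversion.
    return text.encode("utf-8")


def play_greeting(cache, client_name):
    # Fixed and dynamic components are requested as separate strings.
    fragments = [
        "Thank you for calling Acme Credit",
        client_name,  # dynamic component, e.g. from a database lookup
        "Please select an option from the following menu.",
    ]
    return [cache.get(fragment) for fragment in fragments]


cache = TtsCache(fake_synthesize)
play_greeting(cache, "Mr Jones")   # three conversions: nothing is cached yet
play_greeting(cache, "Mrs Smith")  # one conversion: only the new name
print(cache.conversions)  # 4
```

Under the same assumption, sending each greeting as one combined string would convert the full-length text again for every caller, because no two complete strings would match.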
For example, consider the following TTS message that consists of a greeting / instruction message that includes a client name extracted from a database (e.g. in response to an earlier authentication message):
"Thank you for calling Acme Credit [Mr Jones]. Please select an option from the following menu."
The TTS strings making up this message should be arranged as below:
"Thank you for calling Acme Credit"
"Mr Jones" (from db lookup)
"Please select an option from the following menu."