Adjusting Speech

SSML stands for Speech Synthesis Markup Language. A markup language is a computer language used to annotate text documents in order to describe their structure, presentation, and semantics. SSML elements are used to adjust the voice, style, prosody, volume, and other characteristics of a script.

Document Structure

An SSML document is built from SSML elements, also known as tags. These elements let you customize aspects of speech such as tone, style, pitch, prosody, and volume.

Here's an example of SSML in action, to demonstrate the basic structure and syntax:

[Image: SSML example]
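
As a rough sketch of the basic structure, a short script that uses the elements described on this page might look like the following. The script text is invented for illustration, and the enclosing speak root element is an assumption carried over from standard SSML; Pipio may not require it.

```xml
<!-- Illustrative only. The speak root element is assumed from standard SSML. -->
<speak>
  Welcome to the demo.
  <break time="500ms"/>
  <prosody rate="90%" pitch="+5%">
    This sentence is spoken a little slower and slightly higher.
  </prosody>
</speak>
```

Elements that wrap text, such as prosody, need both an opening and a closing tag. An element with no content, such as break, is written as a single self-closing tag.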

Supported Voices

Speech controls are currently limited to voices marked with the circular Pipio logo. We will be adding support for additional voices in the near future.

[Image: Pipio logo]

<break>

Adds a pause to the speech.

| Attribute | Description | Required |
| --- | --- | --- |
| time | The absolute duration of a pause in seconds (such as 2s) or milliseconds (such as 500ms). Valid values range from 0 to 5000 milliseconds. If you set a value greater than the supported maximum, the service will use 5000ms. If the time attribute is set, the strength attribute is ignored. | No |

Break Examples

[Image: break example 1]
[Image: break example 2]
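
The time attribute described above could be used as in this minimal sketch. The sentences are invented, and the speak root element is an assumption taken from standard SSML.

```xml
<!-- Illustrative only. The speak root element is assumed from standard SSML. -->
<speak>
  Let me think about that. <break time="2s"/> Okay, here is the answer.
  Step one. <break time="500ms"/> Step two.
  <!-- 8000ms is above the documented maximum, so it is treated as 5000ms. -->
  A long pause follows. <break time="8000ms"/> And we are back.
</speak>
```

Seconds and milliseconds are interchangeable here: 2s and 2000ms describe the same pause.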

<prosody>

Customizes the pitch and speaking rate of the text contained by the element. Currently, the rate and pitch attributes are supported.

<pitch>

The pitch attribute of the prosody element sets the baseline pitch for the contained text.

| Value | Description |
| --- | --- |
| hertz | A number followed by Hz, which represents the adjustment in pitch in hertz. For example, 20Hz. |
| percentage | A percentage, e.g. 10%, +15.2%, or -8%. The "+" sign is optional when raising the pitch. The "-" sign is required when lowering it. |

Pitch Examples

[Image: pitch example 1]
[Image: pitch example 2]
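
A minimal sketch of both value formats, assuming the standard SSML speak root element and using invented sentences. The 20Hz line follows the description in the table above, which treats the number as an adjustment in hertz.

```xml
<!-- Illustrative only. The speak root element is assumed from standard SSML. -->
<speak>
  <prosody pitch="+10%">This sentence is spoken at a higher pitch.</prosody>
  <prosody pitch="10%">The plus sign is optional, so this is raised the same way.</prosody>
  <prosody pitch="-8%">This sentence is spoken at a lower pitch.</prosody>
  <prosody pitch="20Hz">This sentence has its pitch adjusted by 20 hertz.</prosody>
</speak>
```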

<rate>

The rate attribute of the prosody element sets the change in speaking rate for the contained text.

| Value | Description |
| --- | --- |
| percentage | A non-negative percentage, e.g. 50% or +200%. A value of 100% means no change in speaking rate, 200% means twice the default rate, and 50% means half the default rate. |

The default rate for a voice depends on the language and dialect and on the personality of the voice.

Rate Example

[Image: rate example 1]
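
The percentages described above might be applied as in this sketch. The sentences are invented, and the speak root element is an assumption taken from standard SSML. The last line shows rate and pitch set together on one prosody element.

```xml
<!-- Illustrative only. The speak root element is assumed from standard SSML. -->
<speak>
  <prosody rate="50%">This sentence is spoken at half the default rate.</prosody>
  <prosody rate="100%">This sentence is spoken at the default rate.</prosody>
  <prosody rate="200%">This sentence is spoken at twice the default rate.</prosody>
  <prosody rate="120%" pitch="-5%">This sentence is slightly faster and slightly lower.</prosody>
</speak>
```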