Text to Speech

Prerequisites

Make sure to follow Getting Started to log in. If you are using the APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead. Refer to the Supported Languages page for language codes and supported AI voices.

Generate Speech from Text

Single Text to Speech API

Use this API to generate a speech audio file for a single piece of text.

curl --location --request POST 'https://platform.neuralspace.ai/api/tts/v1/single/synthesize' \
--header 'Authorization: <ACCESS-TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"data":{
"text": "your text goes here",
"gender": "male",
"language": "en-IN",
"speakerId": "aa3ff88b",
"requiredTime": 2,
"stretchMode": "none",
"meta":{
"foo": "bar"
}
}
}'
Field | Required | Description
text | true | The text you want to generate speech for.
gender | true | The gender of the voice to be used for speech generation. Supported values are male and female.
language | true | Long language code for generating speech (for example, en-IN).
speakerId | false | Optional. If specified, this particular speaker's voice is used for generating the speech. Refer to the Supported Languages page for a list of supported languages and voices.
requiredTime | false | Optional. The duration of the generated speech, in seconds. E.g., 1 means 1 second. This field works together with stretchMode, described below.
stretchMode | false | A categorical field that takes these values: none, both, squeeze, stretch. none does nothing; stretch expands the generated speech to the given requiredTime if its duration is less than the given requiredTime; squeeze shrinks the generated speech to the given requiredTime if its duration is greater than the given requiredTime; both either stretches or squeezes the audio based on the given requiredTime. Note that the pitch of the audio does not change and the audio is not trimmed.
meta | false | Any object can be set here. Use it to attach information related to the text, e.g., a unique identifier or other metadata. It is stored along with the results.
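
The interaction between requiredTime and stretchMode described above can be sketched as a small function. This is only an illustration of the documented rules, not the platform's implementation; the function name resulting_duration and the constant STRETCH_MODES are hypothetical, and the sketch assumes that a stretched or squeezed clip ends up at exactly requiredTime.

```python
from typing import Optional

# Values listed for stretchMode in the field table above.
STRETCH_MODES = {"none", "both", "squeeze", "stretch"}


def resulting_duration(generated: float,
                       required_time: Optional[float],
                       stretch_mode: str = "none") -> float:
    """Return the expected audio duration in seconds (hypothetical helper).

    generated is the natural duration of the synthesized speech.
    Assumption: when a mode applies, the result is exactly required_time.
    """
    if stretch_mode not in STRETCH_MODES:
        raise ValueError(f"unsupported stretchMode: {stretch_mode!r}")
    if required_time is None or stretch_mode == "none":
        return generated                      # nothing is changed
    if stretch_mode == "stretch":
        return max(generated, required_time)  # only lengthens short audio
    if stretch_mode == "squeeze":
        return min(generated, required_time)  # only shortens long audio
    return required_time                      # "both": always match


if __name__ == "__main__":
    for mode in ("none", "stretch", "squeeze", "both"):
        print(mode, resulting_duration(1.5, 2, mode), resulting_duration(3.0, 2, mode))
```

For example, with requiredTime set to 2, a 1.5 second clip is only affected by stretch and both, while a 3 second clip is only affected by squeeze and both.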

This API will return the following:

{
"success": true,
"message": "Successfully queued TTS batch.",
"data": {
"message": "Successfully queued TTS batch.",
"task_id": "<YOUR-TASK-ID>",
"file_id_text_mapping": {
"<SOME-FILE-ID>": "your text goes here"
}
}
}

The task_id can be used to fetch the status of this task later. Each text is also associated with a file ID in file_id_text_mapping, which can be used later to fetch the audio file.
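
As a rough Python counterpart to the curl call, the sketch below builds the same POST request with the standard library and extracts task_id and file_id_text_mapping from a response shaped like the one above. The function names are hypothetical, and the handling of "success": false is an assumption, since the failure response is not documented here. Nothing is sent over the network; passing the request to urllib.request.urlopen would do that.

```python
import json
import urllib.request
from typing import Dict, Tuple

SINGLE_URL = "https://platform.neuralspace.ai/api/tts/v1/single/synthesize"


def build_synthesize_request(token: str, item: dict) -> urllib.request.Request:
    """Build (but do not send) the single synthesize request."""
    body = json.dumps({"data": item}).encode("utf-8")
    return urllib.request.Request(
        SINGLE_URL,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def parse_synthesize_response(response: dict) -> Tuple[str, Dict[str, str]]:
    """Return (task_id, {file_id: text}) from a decoded JSON response."""
    # Assumption: a rejected request carries "success": false and a message.
    if not response.get("success"):
        raise RuntimeError(response.get("message", "request was not accepted"))
    data = response["data"]
    return data["task_id"], dict(data["file_id_text_mapping"])


if __name__ == "__main__":
    request = build_synthesize_request(
        "<ACCESS-TOKEN>",
        {"text": "your text goes here", "gender": "male", "language": "en-IN"},
    )
    print(request.get_method(), request.full_url)
```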

Batch Text to Speech API

Call this API to generate speech for multiple text chunks in one request.

curl --location --request POST 'https://platform.neuralspace.ai/api/tts/v1/batch/synthesize' \
--header 'Authorization: <ACCESS-TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"data": [
{
"text": "your text goes here",
"gender": "male",
"language": "en-IN",
"speakerId": "aa3ff88b",
"requiredTime": 2,
"stretchMode": "none",
"meta":{
"foo": "bar"
}
},
{
"text": "your text goes here",
"gender": "male",
"language": "en-IN",
"speakerId": "aa3ff88b",
"requiredTime": 2,
"stretchMode": "none",
"meta":{
"foo": "bar"
}
}
]
}'

Like the previous API, this returns the following:

{
"success": true,
"message": "Successfully queued TTS batch.",
"data": {
"message": "Successfully queued TTS batch.",
"task_id": "<YOUR-TASK-ID>",
"file_id_text_mapping": {
"<SOME-FILE-ID>": "your text goes here",
"<SOME-OTHER-FILE-ID>": "your text goes here"
}
}
}

Since two texts were passed in the request, you will get two file IDs.
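
When the texts come from a list, the batch request body can be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper name build_batch_payload is hypothetical, the validation only mirrors the values listed in the field table, and storing the list position under an "index" key in meta is an illustrative choice, not something the API requires.

```python
from typing import List, Optional

GENDERS = {"male", "female"}
STRETCH_MODES = {"none", "both", "squeeze", "stretch"}


def build_batch_payload(texts: List[str],
                        gender: str,
                        language: str,
                        speaker_id: Optional[str] = None,
                        required_time: Optional[float] = None,
                        stretch_mode: Optional[str] = None) -> dict:
    """Build the JSON body for the batch synthesize API (hypothetical helper)."""
    if gender not in GENDERS:
        raise ValueError(f"gender must be one of {sorted(GENDERS)}")
    if stretch_mode is not None and stretch_mode not in STRETCH_MODES:
        raise ValueError(f"stretchMode must be one of {sorted(STRETCH_MODES)}")
    items = []
    for index, text in enumerate(texts):
        if not text:
            raise ValueError(f"text at position {index} is empty")
        item = {"text": text, "gender": gender, "language": language}
        # Optional fields are only sent when a value was given.
        if speaker_id is not None:
            item["speakerId"] = speaker_id
        if required_time is not None:
            item["requiredTime"] = required_time
        if stretch_mode is not None:
            item["stretchMode"] = stretch_mode
        # Illustrative: remember the position so results can be matched later.
        item["meta"] = {"index": index}
        items.append(item)
    return {"data": items}


if __name__ == "__main__":
    print(build_batch_payload(["first text", "second text"], "male", "en-IN"))
```

Tagging each item through meta can help when several texts are identical, as in the example request above, because the text alone then does not tell the file IDs apart.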

Get Text to Speech Job Status

curl --location --request GET 'https://platform.neuralspace.ai/api/tts/v1/task/status?taskId=<YOUR-TASK-ID>' \
--header 'Authorization: <ACCESS-TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"taskId": "<YOUR-TASK-ID>"
}'

The response of this API will look something like this:

{
"success": true,
"message": "Task fetched successfully",
"data": [
{
"apikey": "...",
"taskId": "<YOUR-TASK-ID>",
"jobStatus": "Completed",
"jobProgress": [
"Queued",
"TTS Job Started",
"Synthesis Started",
"Synthesis Ended",
"Saving Synthesized Audio",
"Saved Synthesized Audio",
"TTS Job Ended",
"Completed"
],
"filePath": "uploads/7e5ac6d8",
"message": "Synthesis Job ended successfully",
"audioSaved": true,
"language": "en-IN",
"gender": "male",
"speakerId": "aa3ff88b",
"sampleRate": 24000,
"fileId": "<SOME-FILE-ID>",
"requiredTime": 2,
"stretchMode": "none",
"meta": {
"foo": "bar"
}
}
]
}

Note that the data field contains a list of objects. If you used the single text to speech API, only one object is returned; if you used the batch API, multiple objects are returned. Each of these objects represents a corresponding text that you passed in the request.

The audioSaved flag tells you whether speech has been generated. Once it is true, you can use the fileId to fetch the audio file. In addition, a jobStatus of Completed indicates that the text to speech job finished successfully and the audio file can be fetched.

If the job fails for some reason, jobStatus becomes Failed and the reason is given in the message field. Note that the meta field contains exactly what you sent in the request.
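
The rules above for audioSaved, jobStatus and message can be applied to a decoded status response as follows. This is a sketch, assuming the response has the shape shown in the example; the function name summarize_status and the three-way split into ready, failed and pending are illustrative choices.

```python
from typing import Dict, List, Tuple


def summarize_status(response: dict) -> Tuple[List[str], Dict[str, str], List[str]]:
    """Split the entries of a status response into ready, failed and pending.

    Returns (ready_file_ids, {failed_file_id: message}, pending_file_ids).
    """
    ready: List[str] = []
    failed: Dict[str, str] = {}
    pending: List[str] = []
    for entry in response.get("data", []):
        file_id = entry["fileId"]
        status = entry.get("jobStatus")
        if status == "Completed" and entry.get("audioSaved"):
            ready.append(file_id)        # audio can be fetched with this fileId
        elif status == "Failed":
            failed[file_id] = entry.get("message", "")
        else:
            pending.append(file_id)      # still queued or being processed
    return ready, failed, pending


if __name__ == "__main__":
    example = {
        "success": True,
        "data": [
            {"fileId": "file-a", "jobStatus": "Completed", "audioSaved": True},
            {"fileId": "file-b", "jobStatus": "Failed", "message": "some error"},
        ],
    }
    print(summarize_status(example))
```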

Text to Speech Status Codes

Since Text to Speech is an asynchronous API, every step is logged in the jobProgress attribute. All the status messages are shown in the example above, but if a text to speech job fails, a Failed status is set instead of Completed. These are all the status messages:

  • Queued
  • TTS Job Started
  • Synthesis Started
  • Synthesis Ended
  • Saving Synthesized Audio
  • Saved Synthesized Audio
  • TTS Job Ended
  • Completed
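
The status list above can be used to decide whether a job has reached a final state and roughly how far along it is. The sketch below assumes that jobProgress entries are appended in the order the steps happen, so the last entry is the latest one; the helper names are hypothetical.

```python
from typing import List

# Status messages in the order listed above.
STATUS_ORDER = [
    "Queued",
    "TTS Job Started",
    "Synthesis Started",
    "Synthesis Ended",
    "Saving Synthesized Audio",
    "Saved Synthesized Audio",
    "TTS Job Ended",
    "Completed",
]
# Completed ends a successful job; Failed is set instead when a job fails.
TERMINAL_STATUSES = {"Completed", "Failed"}


def is_finished(job_progress: List[str]) -> bool:
    """True once the latest logged step is Completed or Failed."""
    return bool(job_progress) and job_progress[-1] in TERMINAL_STATUSES


def progress_fraction(job_progress: List[str]) -> float:
    """Share of the known steps that have been logged so far (0.0 to 1.0)."""
    reached = sum(1 for status in STATUS_ORDER if status in job_progress)
    return reached / len(STATUS_ORDER)


if __name__ == "__main__":
    partial = ["Queued", "TTS Job Started"]
    print(is_finished(partial), progress_fraction(partial))
```

A polling loop could call the status API until is_finished returns true for every entry, then fetch the audio for the entries that completed.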

Fetching audio files for your text

You can fetch the generated audio files by following the instructions here.