Text to Speech
Prerequisites
Make sure to follow Getting Started to log in.
If you are using APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN
before moving ahead.
Refer to the Supported Languages page for language codes and supported AI voices.
Generate Speech from Text
Single Text to Speech API
Using this API you can generate a speech audio file for a single text.
curl --location --request POST 'https://platform.neuralspace.ai/api/tts/v1/single/synthesize' \
--header 'Authorization: <ACCESS-TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"data":{
"text": "your text goes here",
"gender": "male",
"language": "en-IN",
"speakerId": "aa3ff88b",
"requiredTime": 2,
"stretchMode": "none",
"meta":{
"foo": "bar"
}
}
}'
Field | Required | Description |
---|---|---|
text | true | The text you want to generate speech for |
gender | true | The gender of the voice to be used for speech generation. Supported values are male and female. |
language | true | Long language code for generating speech, e.g., en-IN. |
speakerId | false | If specified, this particular speaker's voice is used for generating the speech. Refer to the Supported Languages page for a list of supported languages and voices. |
requiredTime | false | The desired duration of the generated speech, in seconds. E.g., 1 means 1 second. This field works together with stretchMode, which is described below. |
stretchMode | false | A categorical field that takes one of these values: none, both, squeeze, stretch. none does nothing; stretch expands the generated speech to the given requiredTime if its duration is less than requiredTime; squeeze shrinks the generated speech to the given requiredTime if its duration is greater than requiredTime; both either stretches or squeezes the audio to match requiredTime. Note that the pitch of the audio does not change and the audio is not trimmed. |
meta | false | Any object can be set here. Use it to attach information related to the text, e.g., a unique ID or other metadata. It is stored along with the results. |
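The interaction between requiredTime and stretchMode can be sketched in a few lines of Python. This only mirrors the rules in the table above; the function name resulting_duration is hypothetical, and the real adjustment happens on the server.

```python
VALID_MODES = {"none", "both", "squeeze", "stretch"}


def resulting_duration(generated: float, required: float, stretch_mode: str = "none") -> float:
    """Return the expected audio duration in seconds under the documented rules.

    Hypothetical helper for illustration only, not the service's implementation.
    """
    if stretch_mode not in VALID_MODES:
        raise ValueError(f"unsupported stretchMode: {stretch_mode!r}")
    # "stretch" and "both" lengthen audio that is shorter than requiredTime.
    if stretch_mode in ("stretch", "both") and generated < required:
        return required
    # "squeeze" and "both" shorten audio that is longer than requiredTime.
    if stretch_mode in ("squeeze", "both") and generated > required:
        return required
    # "none", or a mode that does not apply in this direction: unchanged.
    return generated
```

For example, with requiredTime set to 2, a 1.5 second clip stays at 1.5 seconds under squeeze but becomes 2 seconds under stretch or both.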
This API will return the following:
{
"success": true,
"message": "Successfully queued TTS batch.",
"data": {
"message": "Successfully queued TTS batch.",
"task_id": "<YOUR-TASK-ID>",
"file_id_text_mapping": {
"<SOME-FILE-ID>": "your text goes here"
}
}
}
The task_id can be used to fetch the status of this task later (it is passed as taskId to the status API). A file ID is associated with every text in file_id_text_mapping, and it can be used to fetch the audio file later as well.
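The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names build_single_request, parse_queue_response and submit are hypothetical, and the response shape is assumed to match the example above.

```python
import json
import urllib.request

SINGLE_URL = "https://platform.neuralspace.ai/api/tts/v1/single/synthesize"


def build_single_request(text, gender, language, token, **optional):
    """Build (but do not send) the POST request for a single text."""
    data = {"text": text, "gender": gender, "language": language}
    data.update(optional)  # e.g. speakerId, requiredTime, stretchMode, meta
    body = json.dumps({"data": data}).encode("utf-8")
    headers = {"Authorization": token, "Content-Type": "application/json"}
    return urllib.request.Request(SINGLE_URL, data=body, headers=headers, method="POST")


def parse_queue_response(payload):
    """Return (task_id, {file_id: text}) from a response shaped like the example."""
    if not payload.get("success"):
        raise RuntimeError(payload.get("message", "request failed"))
    data = payload["data"]
    return data["task_id"], dict(data["file_id_text_mapping"])


def submit(request):
    """Send the request and parse the JSON reply (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return parse_queue_response(json.load(response))
```

A typical use would be `task_id, files = submit(build_single_request("your text goes here", "male", "en-IN", token))`, where token holds the value you saved as AUTHORIZATION_TOKEN.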
Batch Text to Speech API
Call this API to generate speech for multiple text chunks together.
curl --location --request POST 'https://platform.neuralspace.ai/api/tts/v1/batch/synthesize' \
--header 'Authorization: <ACCESS-TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"data": [
{
"text": "your text goes here",
"gender": "male",
"language": "en-IN",
"speakerId": "aa3ff88b",
"requiredTime": 2,
"stretchMode": "none",
"meta":{
"foo": "bar"
}
},
{
"text": "your text goes here",
"gender": "male",
"language": "en-IN",
"speakerId": "aa3ff88b",
"requiredTime": 2,
"stretchMode": "none",
"meta":{
"foo": "bar"
}
}
]
}'
Like the previous API, this returns the following:
{
"success": true,
"message": "Successfully queued TTS batch.",
"data": {
"message": "Successfully queued TTS batch.",
"task_id": "<YOUR-TASK-ID>",
"file_id_text_mapping": {
"<SOME-FILE-ID>": "your text goes here",
"<SOME-OTHER-FILE-ID>": "your text goes here"
}
}
}
Since two texts were passed in the request, you will get two file IDs.
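When the batch contains identical texts, as in the example above, file_id_text_mapping alone cannot tell you which file ID belongs to which input. One possible workaround, sketched below in Python, is to tag each item through meta, which is stored along with the results. The helper build_batch_payload and the "index" key are our own conventions, not part of the API.

```python
import json


def build_batch_payload(texts, gender, language, **shared_options):
    """Build the JSON body for the batch endpoint from a list of texts.

    Every item shares the same voice settings. An "index" key (a convention
    of this sketch, not an API field) is added to meta so results can be
    matched to inputs even when two texts are identical.
    """
    base_meta = shared_options.pop("meta", {})
    items = []
    for index, text in enumerate(texts):
        item = {"text": text, "gender": gender, "language": language}
        item.update(shared_options)  # e.g. speakerId, requiredTime, stretchMode
        item["meta"] = {**base_meta, "index": index}
        items.append(item)
    return {"data": items}


payload = build_batch_payload(
    ["your text goes here", "your text goes here"],
    gender="male",
    language="en-IN",
    stretchMode="none",
    meta={"foo": "bar"},
)
body = json.dumps(payload)  # use this string as the body of the batch request
```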
Get Text to Speech Job Status
curl --location --request GET 'https://platform.neuralspace.ai/api/tts/v1/task/status?taskId=<YOUR-TASK-ID>' \
--header 'Authorization: <ACCESS-TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"taskId": "<YOUR-TASK-ID>"
}'
The response of this API will look something like this:
{
"success": true,
"message": "Task fetched successfully",
"data": [
{
"apikey": "...",
"taskId": "<YOUR-TASK-ID>",
"jobStatus": "Completed",
"jobProgress": [
"Queued",
"TTS Job Started",
"Synthesis Started",
"Synthesis Ended",
"Saving Synthesized Audio",
"Saved Synthesized Audio",
"TTS Job Ended",
"Completed"
],
"filePath": "uploads/7e5ac6d8",
"message": "Synthesis Job ended successfully",
"audioSaved": true,
"language": "en-IN",
"gender": "male",
"speakerId": "aa3ff88b",
"sampleRate": 24000,
"fileId": "<SOME-FILE-ID>",
"requiredTime": 2,
"stretchMode": "none",
"meta": {
"foo": "bar"
}
}
]
}
Note that the data field contains a list of objects. If you used the single Text to Speech API, only one object is returned; if you used the batch API, multiple objects are returned.
Each of these objects represents a corresponding text that you passed in the request.
The audioSaved flag tells you whether speech has been generated. Once it is true, you can use the fileId to fetch the audio file.
Along with the audioSaved flag, the jobStatus field, when set to Completed, indicates that the text to speech job finished successfully and you can fetch the audio file.
If the job fails for some reason, jobStatus becomes Failed and you get a message in the message field.
Note that the meta field contains exactly what you sent in the request.
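The rules above translate into a simple polling loop. The Python sketch below assumes you supply a fetch_status callable that performs the GET request and returns the parsed JSON; that callable, the helper names, and the interval and attempt limits are all hypothetical choices, not values recommended by the API.

```python
import time


def split_by_outcome(status_response):
    """Group fileIds from a status response into ready, failed and pending."""
    ready, failed, pending = [], {}, []
    for job in status_response["data"]:
        if job.get("jobStatus") == "Failed":
            failed[job["fileId"]] = job.get("message", "")
        elif job.get("jobStatus") == "Completed" and job.get("audioSaved"):
            ready.append(job["fileId"])
        else:
            pending.append(job["fileId"])
    return ready, failed, pending


def wait_until_done(fetch_status, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until no job is pending, then return (ready_ids, failed_messages)."""
    for _ in range(max_attempts):
        ready, failed, pending = split_by_outcome(fetch_status())
        if not pending:
            return ready, failed
        sleep(interval)
    raise TimeoutError("text to speech task did not finish in time")
```

The file IDs in the first returned list are the ones whose audio can be fetched; the second value maps each failed file ID to the text of its message field.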
Text to Speech Status Codes
Since Text to Speech is an asynchronous API, every step is logged in the jobProgress attribute.
All the status messages are shown in the example above, but if a text to speech job fails, a Failed status is set instead of Completed.
These are all the status messages:
- Queued
- TTS Job Started
- Synthesis Started
- Synthesis Ended
- Saving Synthesized Audio
- Saved Synthesized Audio
- TTS Job Ended
- Completed
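If you want to show progress to a user, the list above can be turned into a step counter. This Python sketch assumes that jobProgress entries are appended in order, so the last entry is the current status; the helper describe_progress is hypothetical.

```python
STAGES = [
    "Queued",
    "TTS Job Started",
    "Synthesis Started",
    "Synthesis Ended",
    "Saving Synthesized Audio",
    "Saved Synthesized Audio",
    "TTS Job Ended",
    "Completed",
]


def describe_progress(job_progress):
    """Return (latest_status, completed_steps, total_steps) for one job.

    Only the documented stages are counted, so a trailing "Failed" entry
    is reported as the latest status without increasing the step count.
    """
    seen = [stage for stage in STAGES if stage in job_progress]
    latest = job_progress[-1] if job_progress else None
    return latest, len(seen), len(STAGES)
```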
Fetching audio files for your text
You can fetch the generated audio files by following the instructions here.