Transcribe File
In this article you will learn how to convert an audio file to text using our APIs.
Prerequisites
Make sure to follow Getting Started to log in and install the NeuralSpace STT Service.
If you are using APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead.
Refer to the Supported Languages page for language codes and supported domains.
Transcribe
STEP 1 - Upload a file
To transcribe a file, you first have to upload it and get the corresponding file ID. Follow the steps on this page to do so.
STEP 2 - Create a transcription job
Call this API to start a transcription job:
curl --location --request POST 'https://platform.neuralspace.ai/api/transcription/v1/file/transcribe' \
--header 'Authorization: <ACCESS-TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"fileId": "<YOUR-FILE-ID>",
"language": "<YOUR-LANGUAGE-CODE>",
"domain": "<YOUR-DOMAIN>"
}'
This will return a transcribeId, which you can use to check the status of your transcription job.
Here is a sample response:
{
"success": true,
"message": "File found successfully. Transcription will be prepared shortly",
"data": {
"transcribeId": "<YOUR-TRANSCRIBE-ID>"
}
}
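If you prefer calling the API from code, here is a minimal Python sketch of the same request, assuming the requests library is installed; the endpoint and payload fields are the ones shown above, and the placeholder values are yours to fill in.

import requests

# Placeholders: substitute your own token, file ID, language code, and domain.
AUTHORIZATION_TOKEN = "<ACCESS-TOKEN>"

response = requests.post(
    "https://platform.neuralspace.ai/api/transcription/v1/file/transcribe",
    headers={
        "Authorization": AUTHORIZATION_TOKEN,
        "Content-Type": "application/json",
    },
    json={
        "fileId": "<YOUR-FILE-ID>",
        "language": "<YOUR-LANGUAGE-CODE>",
        "domain": "<YOUR-DOMAIN>",
    },
)
response.raise_for_status()

# The transcribeId is needed in STEP 3 to fetch the result.
transcribe_id = response.json()["data"]["transcribeId"]
print(transcribe_id)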
STEP 3 - Fetch the transcription
Call this API to check the status of your transcription job and fetch the result:
curl --location --request GET 'https://platform.neuralspace.ai/api/transcription/v1/single/transcription?transcribeId=<YOUR-TRANSCRIBE-ID>' \
--header 'Authorization: <ACCESS-TOKEN>'
This will return an object like the one below, in which you can see the transcription status along with some metadata related to your transcription job.
{
"success": true,
"message": "Data fetched succssfully",
"data": {
"fileId": "<YOUR-FILE-ID>",
"language": "<LANGUAGE-CODE>",
"transcribeId": "<YOUR-TRANSCRIBE-ID>",
"fileName": "...",
"transcriptionStatus": "Completed",
"transcriptionProgress": [
"Queued",
"Loading Model",
"Model Loaded",
"Preparing File",
"Transcribing",
"Transcribed",
"Uploading Transcript",
"Transcript Uploaded",
"Updating Result",
"Result Updated",
"Completed"
],
"apikey": "<YOUR-APIKEY>",
"appType": "transcription",
"duration": 767,
"fileSize": 0,
"domain": "<YOUR-DOMAIN>",
"suburl": "...",
"message": "Transcription completed successfully",
"createAt": 1667205327349,
"transcribingTime": 390.335329,
"timestamp": [
{
"start": 764.8,
"end": 765,
"conf": 0.89118,
"word": "time."
},
{
"start": 764.8,
"end": 765,
"conf": 0.89118,
"word": "time."
}
],
"transcripts": "......",
"publicUrl": "https://platformlargefilestore.blob.core.windows.net/common/uploads/<YOUR-FILE-ID>"
}
}
Transcription Status Codes
Since files are transcribed asynchronously, every step is logged in the transcriptionProgress attribute. All the status messages are shown in the example above, but if a transcription job fails, a Failed status is set instead of Completed.
These are all the status messages:
- Queued
- Loading Model
- Model Loaded
- Preparing File
- Transcribing
- Transcribed
- Uploading Transcript
- Transcript Uploaded
- Updating Result
- Result Updated
- Completed
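Because jobs run asynchronously, a client typically polls the fetch endpoint from STEP 3 until the status settles on Completed or Failed. Here is a minimal Python polling sketch, assuming the requests library; the 10-second interval is an arbitrary choice, not something the API prescribes.

import time
import requests

AUTHORIZATION_TOKEN = "<ACCESS-TOKEN>"   # placeholder
transcribe_id = "<YOUR-TRANSCRIBE-ID>"   # returned in STEP 2

# Poll the fetch endpoint from STEP 3 until the job settles.
while True:
    response = requests.get(
        "https://platform.neuralspace.ai/api/transcription/v1/single/transcription",
        headers={"Authorization": AUTHORIZATION_TOKEN},
        params={"transcribeId": transcribe_id},
    )
    response.raise_for_status()
    data = response.json()["data"]

    status = data["transcriptionStatus"]
    print("status:", status)
    if status in ("Completed", "Failed"):
        break
    time.sleep(10)  # arbitrary poll interval; tune to your workload

if status == "Completed":
    print(data["transcripts"])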
What are timestamps?
When transcriptionStatus becomes Completed, you will see a list of objects in the timestamp field. These objects represent chunks of text and their respective start and end times in the audio file. conf is the confidence score of the model for predicting this text chunk.
You can use this information on your user interface or for any further post-processing.
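For instance, a simple post-processing pass might flag words whose confidence falls below a threshold for human review. Below is a minimal Python sketch; the input list mirrors the timestamp entries shown above, and the 0.8 threshold is an arbitrary choice for illustration.

# "timestamp" entries as returned in the fetch response above.
timestamps = [
    {"start": 764.8, "end": 765, "conf": 0.89118, "word": "time."},
]

LOW_CONFIDENCE = 0.8  # arbitrary threshold, tune for your use case

for chunk in timestamps:
    if chunk["conf"] < LOW_CONFIDENCE:
        # Flag uncertain chunks, e.g. for human review.
        print(f"{chunk['start']:.1f}-{chunk['end']:.1f}s: "
              f"{chunk['word']!r} (conf {chunk['conf']:.2f})")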
Get Segments For Transcription
If you have used the same audio file for both speaker identification and transcription, you can merge the results by calling the following API. It takes the speaker segments from the speaker identification API and the transcripts from the transcription API and merges them, giving you what each speaker has said.
E.g., if an audio file contains three speakers, speaker identification tells you which parts of the file belong to each of them, while the transcription API extracts what was spoken as text. Combining the two tells you what, and when, each of the three speakers said in the audio file.
curl --location --request POST 'https://platform.neuralspace.ai/api/transcription/v1/get_segments' \
--header 'Authorization: <YOUR-API-KEY>' \
--header 'Content-Type: application/json' \
--data-raw '{
"transcribeId": "<YOUR-TRANSCRIBE-ID>",
"speakerIdentificationTaskId": "<YOUR-SPEAKER-IDENTIFICATION-TASK-ID>"
}'
In the response you get a segmentsFileId; the corresponding file contains all the speaker segments, text, and start and end timestamps.
{
"success": true,
"message": "Data fetched succssfully",
"data": {
"segmentsFileId": "<YOUR-SEGMENTS-FILE-ID>"
}
}
For this API to return a success response, both transcribeId and speakerIdentificationTaskId have to be valid and in Completed status.
You can fetch the segments file using the segmentsFileId by following the instructions here.
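For reference, here is a minimal Python sketch of the same merge call, assuming the requests library; it only retrieves the segmentsFileId, and fetching the file itself follows the instructions linked above.

import requests

AUTHORIZATION_TOKEN = "<YOUR-API-KEY>"  # placeholder

response = requests.post(
    "https://platform.neuralspace.ai/api/transcription/v1/get_segments",
    headers={
        "Authorization": AUTHORIZATION_TOKEN,
        "Content-Type": "application/json",
    },
    json={
        "transcribeId": "<YOUR-TRANSCRIBE-ID>",
        "speakerIdentificationTaskId": "<YOUR-SPEAKER-IDENTIFICATION-TASK-ID>",
    },
)
response.raise_for_status()

# Both IDs must refer to jobs in Completed status for this call to succeed.
segments_file_id = response.json()["data"]["segmentsFileId"]
print(segments_file_id)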
Using the UI
You can also use our Platform UI to transcribe your files.
Have you tried it out yet? Head over to our step-by-step guide on quickly setting up and using the UI to transcribe your files! 🙌
What's Next?
- Check out the language support page for transcribing in different languages or for different domains.