Transcribe File

In this article you will learn how to convert an audio file to text using our APIs.

Prerequisites

Make sure to follow Getting Started to log in and install the NeuralSpace STT Service. If you are using APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead. Refer to the Supported Languages page for language codes and supported domains.

Transcribe

STEP 1 - Upload a file

To transcribe a file, you first need to upload it and get the corresponding file ID. Follow the steps on this page to do so.

STEP 2 - Create a transcription job

Call this API to start a transcription job:

curl --location --request POST 'https://platform.neuralspace.ai/api/transcription/v1/file/transcribe' \
--header "Authorization: $AUTHORIZATION_TOKEN" \
--header 'Content-Type: application/json' \
--data-raw '{
    "fileId": "<YOUR-FILE-ID>",
    "language": "<YOUR-LANGUAGE-CODE>",
    "domain": "<YOUR-DOMAIN>"
}'

This will return a transcribeId, which you can use to check the status of your transcription job.

Here is a sample response:

{
    "success": true,
    "message": "File found successfully. Transcription will be prepared shortly",
    "data": {
        "transcribeId": "<YOUR-TRANSCRIBE-ID>"
    }
}
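
If you prefer to call the API from code, here is a minimal Python sketch of the same request using the requests library. It assumes your token is stored in the AUTHORIZATION_TOKEN environment variable, as set up in the prerequisites; the placeholders are the same ones used in the curl command above.

import os
import requests

# Read the token saved during Getting Started (see Prerequisites).
token = os.environ["AUTHORIZATION_TOKEN"]

response = requests.post(
    "https://platform.neuralspace.ai/api/transcription/v1/file/transcribe",
    headers={"Authorization": token},
    json={
        "fileId": "<YOUR-FILE-ID>",
        "language": "<YOUR-LANGUAGE-CODE>",
        "domain": "<YOUR-DOMAIN>",
    },
)
response.raise_for_status()

# The job ID used to poll for results in Step 3.
transcribe_id = response.json()["data"]["transcribeId"]
print(transcribe_id)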

STEP 3 - Fetch the transcription

Call this API to fetch the status and result of your transcription job:

curl --location --request GET 'https://platform.neuralspace.ai/api/transcription/v1/single/transcription?transcribeId=<YOUR-TRANSCRIBE-ID>' \
--header "Authorization: $AUTHORIZATION_TOKEN"

This will return an object like this, in which you can see the transcription status along with some metadata related to your transcription job.

{
    "success": true,
    "message": "Data fetched successfully",
    "data": {
        "fileId": "<YOUR-FILE-ID>",
        "language": "<LANGUAGE-CODE>",
        "transcribeId": "<YOUR-TRANSCRIBE-ID>",
        "fileName": "...",
        "transcriptionStatus": "Completed",
        "transcriptionProgress": [
            "Queued",
            "Loading Model",
            "Model Loaded",
            "Preparing File",
            "Transcribing",
            "Transcribed",
            "Uploading Transcript",
            "Transcript Uploaded",
            "Updating Result",
            "Result Updated",
            "Completed"
        ],
        "apikey": "<YOUR-APIKEY>",
        "appType": "transcription",
        "duration": 767,
        "fileSize": 0,
        "domain": "<YOUR-DOMAIN>",
        "suburl": "...",
        "message": "Transcription completed successfully",
        "createAt": 1667205327349,
        "transcribingTime": 390.335329,
        "timestamp": [
            {
                "start": 764.8,
                "end": 765,
                "conf": 0.89118,
                "word": "time."
            }
        ],
        "transcripts": "......",
        "publicUrl": "https://platformlargefilestore.blob.core.windows.net/common/uploads/<YOUR-FILE-ID>"
    }
}

Transcription Status Codes

Since files are transcribed asynchronously, every step is logged in the transcriptionProgress attribute. All of the status messages are shown in the example above, but if a transcription job fails, a Failed status is set instead of Completed. These are all the status messages (a polling sketch follows the list):

  • Queued
  • Loading Model
  • Model Loaded
  • Preparing File
  • Transcribing
  • Transcribed
  • Uploading Transcript
  • Transcript Uploaded
  • Updating Result
  • Result Updated
  • Completed
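
Because the job runs asynchronously, a client typically polls the status endpoint until it reaches a terminal state. Below is a minimal Python sketch that polls until transcriptionStatus becomes Completed or Failed; the five-second interval is an arbitrary choice for illustration, not a recommendation from the API.

import os
import time
import requests

token = os.environ["AUTHORIZATION_TOKEN"]
transcribe_id = "<YOUR-TRANSCRIBE-ID>"

while True:
    response = requests.get(
        "https://platform.neuralspace.ai/api/transcription/v1/single/transcription",
        headers={"Authorization": token},
        params={"transcribeId": transcribe_id},
    )
    response.raise_for_status()
    data = response.json()["data"]

    status = data["transcriptionStatus"]
    print(f"Current status: {status}")

    if status == "Completed":
        print(data["transcripts"])
        break
    if status == "Failed":
        raise RuntimeError(data.get("message", "Transcription failed"))

    time.sleep(5)  # arbitrary polling interval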

What are timestamps?

When transcriptionStatus becomes Completed, you will see a list of objects in the timestamp field. These objects represent chunks of text with their respective start and end times in the audio file. conf is the model's confidence score for the predicted text chunk.

You can use this information on your user interface or for any further post-processing.
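
As one example of such post-processing, the Python sketch below converts timestamp entries into SRT-style subtitle lines. It is an illustration only: emitting one caption per entry is a simplification, and the entries variable stands in for the timestamp array fetched in Step 3.

def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:12:44,800."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"

# `entries` stands in for the "timestamp" array from the fetch response.
entries = [{"start": 764.8, "end": 765, "conf": 0.89118, "word": "time."}]

for i, entry in enumerate(entries, start=1):
    print(i)
    print(f"{to_srt_time(entry['start'])} --> {to_srt_time(entry['end'])}")
    print(entry["word"])
    print()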

Get Segments For Transcription

If you have used the same audio file for speaker identification and transcription, you can merge the results by calling the following API. It takes the speaker segments from the speaker identification API and the transcripts from the transcription API, and merges them so you get what each speaker said.

For example, if an audio file contains three speakers, speaker identification determines which parts of the file belong to each of the three speakers, while the transcription API extracts what was spoken as text. Combining the two tells you what each of the three speakers said, and when.

curl --location --request POST 'https://platform.neuralspace.ai/api/transcription/v1/get_segments' \
--header "Authorization: $AUTHORIZATION_TOKEN" \
--header 'Content-Type: application/json' \
--data-raw '{
    "transcribeId": "<YOUR-TRANSCRIBE-ID>",
    "speakerIdentificationTaskId": "<YOUR-SPEAKER-IDENTIFICATION-TASK-ID>"
}'

In the response you get a segmentsFileId; the corresponding file contains all the speaker segments, text, and start and end timestamps.

{
    "success": true,
    "message": "Data fetched successfully",
    "data": {
        "segmentsFileId": "<YOUR-SEGMENTS-FILE-ID>"
    }
}

For this API to return a success response, both transcribeId and speakerIdentificationTaskId must be valid and in Completed status.

You can fetch the segments file using the segmentsFileId by following the instructions here.
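
For completeness, here is the same merge call as a Python sketch. As noted above, both IDs must belong to jobs in Completed status; downloading the resulting file with the returned segmentsFileId follows the file-fetching instructions linked above.

import os
import requests

token = os.environ["AUTHORIZATION_TOKEN"]

response = requests.post(
    "https://platform.neuralspace.ai/api/transcription/v1/get_segments",
    headers={"Authorization": token},
    json={
        "transcribeId": "<YOUR-TRANSCRIBE-ID>",
        "speakerIdentificationTaskId": "<YOUR-SPEAKER-IDENTIFICATION-TASK-ID>",
    },
)
response.raise_for_status()

# Use this ID to download the merged segments file.
segments_file_id = response.json()["data"]["segmentsFileId"]
print(segments_file_id)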

Using the UI

You can also use our Platform UI to transcribe your files.
Have you tried it out yet? Head over to our step-by-step guide on quickly setting up and using the UI to transcribe your files! 🙌

What's Next?

  • Check out the language support page for transcribing in different languages or for different domains.