

Service Code: stt


Speech To Text (STT)

Speech is the most natural way for humans to communicate. However, current AI models cannot semantically understand speech nearly as well as they understand text. What if you had a bridge that lets you use speech as an interface while still interpreting the meaning behind it, so that your application can react meaningfully? The Speech To Text (STT) Service is that bridge. It is built on state-of-the-art AI models that provide accurate transcriptions of speech of any kind, whether in conversations or other forms. Once speech has been transcribed, it is simply text, which can then be interpreted by advanced, battle-tested text services such as Language Understanding.


  • State-of-the-art Models: Use our pre-trained state-of-the-art models through APIs and integrate them into any application.

  • Domain Specialization: Use our state-of-the-art models that are specialized in pre-defined domains such as finance or medicine. We also have specialized models for different accents. For example, our medical-domain English STT model can accurately transcribe medical terms, and our Indian-domain English STT model can accurately transcribe English spoken with an Indian accent.

    Find out more about available domains on the Language Support page.

  • Low Resource Language Support: Use our STT to support a wide range of languages from all over the world, including those that are not widely represented in the digital world.


Captioning for Videos or Meetings

You can use our APIs and CLIs to generate transcriptions for your videos or meetings.
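As a rough illustration of the captioning use case, the sketch below turns timed transcript segments into SubRip (SRT) caption text. It does not call the STT service. The `Segment` shape (start and end times in seconds plus text) and the helper names `srt_timestamp` and `to_srt` are assumptions made for this example, not the service's actual response format.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical timed transcript segment (not the service's real schema)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello"), Segment(2.5, 4.0, "and welcome")]
    print(to_srt(demo))
```

The resulting text can be saved with a `.srt` extension and loaded by most video players. Mapping the real transcription output onto `Segment` depends on the response format described in the API reference.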


With our Speech To Text service, you can extend your chatbot interface to voice while reusing the same NLU pipeline. The Speech To Text APIs also support various low-resource languages alongside standard high-resource languages.
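One way to picture reusing the same NLU pipeline: the voice interface only adds a transcription step in front of the existing text handler. In the minimal sketch below, `transcribe` and `handle_text` are placeholder callables supplied by the caller. They are assumptions for illustration and do not correspond to any specific function of the STT or NLU services.

```python
from typing import Callable


def make_voice_handler(
    transcribe: Callable[[bytes], str],
    handle_text: Callable[[str], str],
    fallback: str = "Sorry, I did not catch that.",
) -> Callable[[bytes], str]:
    """Wrap an existing text chatbot handler so it accepts audio input.

    transcribe:  placeholder for a call to a speech-to-text service.
    handle_text: the existing text/NLU pipeline, reused unchanged.
    """
    def handle_voice(audio: bytes) -> str:
        text = transcribe(audio).strip()
        if not text:
            # Nothing was recognized, so skip the NLU pipeline.
            return fallback
        return handle_text(text)

    return handle_voice


if __name__ == "__main__":
    # Stand-ins for the real services, for demonstration only.
    fake_transcribe = lambda audio: audio.decode("utf-8")
    echo_bot = lambda text: f"You said: {text}"

    voice_bot = make_voice_handler(fake_transcribe, echo_bot)
    print(voice_bot(b"what is my balance"))
```

Because the text handler is passed in unchanged, the same intents and responses serve both the chat and voice channels.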

Automatic Transcription

With our STT models, you can automatically transcribe long speech recordings within a few hours, work that could otherwise take days to do manually.
