Overview


Service Code: sentence-splitter

Sentence Splitter (coming soon)

  • Sentence splitting (also called sentence tokenization, segmentation, or boundary disambiguation) is the process of detecting sentence boundaries, i.e., where each sentence begins and ends. It is considered a difficult task in NLP because punctuation marks are ambiguous. For example, a period does not always mark the end of a sentence: it may be a decimal point, or part of an abbreviation or an email address. Moreover, in many languages (notably Chinese, Japanese, and Urdu) sentence endings are themselves ambiguous, i.e., a sentence sometimes has no definite boundary. Sentence splitting plays a vital role in text classification, chatbots, language translation, sentiment analysis, and many other applications.
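The ambiguity of the period can be seen with a minimal Python sketch. This is not the NeuralSpace API (which is not yet available); the function names `split_on_period` and `split_on_period_space` and the sample text are hypothetical, chosen only to show how two naive rules fail on an abbreviation, a decimal number, and an email address.

```python
import re


def split_on_period(text):
    # Naive rule 1: treat every period as a sentence boundary.
    return [part.strip() for part in text.split(".") if part.strip()]


def split_on_period_space(text):
    # Naive rule 2: treat '.', '!' or '?' followed by whitespace as a boundary.
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]


# Two real sentences containing an abbreviation, a decimal, and an email address.
text = "Dr. Smith paid 3.50 dollars. He emailed a.b@example.com today."

# Rule 1 also cuts inside "3.50" and "a.b@example.com".
print(split_on_period(text))

# Rule 2 keeps the decimal and the email intact but still splits after "Dr.".
print(split_on_period_space(text))
```

Neither rule recovers the two intended sentences, which is why practical splitters rely on abbreviation lists or trained models rather than punctuation alone.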

To overcome this issue, we have built the NeuralSpace Sentence Splitter, which can be used to tokenize words and sentences, similar to the popular Python library NLTK but for many more languages.

  • 👉 APIs (coming soon)

Features

  • State-of-the-art Models: Use our pre-trained state-of-the-art models through APIs and integrate them into any application.

  • Multi-language Support: Go global with 80+ supported languages.