Training Data

To train an NLU model that can recognize the intents and entities for your unique use case, you need training data from which the AI model can learn those intents and entities. In this article we will create a project and add training data to it. Training data consists of a piece of text together with its corresponding intent and entities. From these examples our AutoNLP learns to predict the intents and entities of text that the model has never seen before.

Prerequisites

  • Make sure to follow Getting Started to log in and install the Language Understanding Service. If you are using the APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead.
  • Create a Project:
    • Make sure to create a project and have the project id in a variable called PROJECT_ID
    • Make sure to have the language for which you added training examples in a variable called LANGUAGE

Upload a Single Training Example

Training data (or training examples) are what our AutoNLP learns from to predict the intents and entities of unseen text. The training example below is for an intent called set_alarm. Its text is "set an alarm for 25 minutes .", which indicates that the user wants to set an alarm for a given time. Since time is an entity, it has been added to the entities argument. For each entity you have to specify the start and end character index of the span you want to extract. In this case, "for 25 minutes" is tagged as an entity called datetime, with start index 13 and end index 27. Indices are zero-based, and in this example end points just past the last character of the span.

Note: Multiple entities can be added to the same training example.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"example\": {
\"text\": \"set an alarm for 25 minutes .\",
\"intent\": \"set_alarm\",
\"type\": \"train\",
\"entities\": [
{
\"value\": \"for 25 minutes\",
\"entity\": \"datetime\",
\"start\": 13,
\"end\": 27
}
]
}
}"

This will return a unique exampleId. Store this value in a variable.

EXAMPLE_ID="YOUR-EXAMPLE-ID"
| Attribute | Required | Type | Limits | Description |
| --- | --- | --- | --- | --- |
| projectId | true | str | - | Set to the ID of the project you want this example to be uploaded to. |
| language | true | str | Languages in project | Set to the language in the project you want this example to correspond to. |
| example | true | obj | - | A JSON object that holds the attributes of your training example. Its attributes are listed in the rows below. |
| example.text | true | str | 1000 characters | A piece of text that you want to predict the intent of. AutoNLP is trained to learn patterns from this text. |
| example.intent | true | str | 100 characters | The intent behind the text. In this case it is setting the alarm. |
| example.type | false | str | train or test | Defaults to train. AutoNLP takes two kinds of examples, train and test. Train examples are used for training AutoNLP and test examples are used for reporting the model's performance. |
| example.entities | false | list | - | Entities are the pieces of information in the text that you want AutoNLP to learn to extract. More details here. |
| example.entities.value | true | str | - | The substring that you want to extract from the text. In this case it is "for 25 minutes". This is the value that AutoNLP will learn to predict while training. |
| example.entities.entity | true | str | 30 characters | Name of the entity that value corresponds to. In this example it is datetime. |
| example.entities.start | true | int | Positive integers only | The character index where this entity value starts in the given text. In this example it is 13. Note that we follow zero indexing. |
| example.entities.end | true | int | Positive integers only | The character index where the entity value ends in the given text. In this example it is 27. |
| example.entities.entityType | false | text | One of trainable, pre-trained, lookup, regex | Defaults to trainable, which means AutoNLP learns to predict this entity. Apart from that we support three other entity types. More details here. In this case it is pre-trained, which means AutoNLP uses an off-the-shelf entity extractor for datetime. |
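
The start and end values are easy to get wrong when counted by hand. As a minimal sketch, the offsets can be derived from the text itself and dropped into the request body. The helper names make_entity and build_example_payload are hypothetical and not part of the NeuralSpace API; the sketch assumes the entity value occurs in the text, uses its first occurrence, and treats end as the index just past the last character, which matches the 13 and 27 used in the example above.

```python
import json


def make_entity(text, value, entity):
    """Locate `value` in `text` and return an entity dict with character offsets.

    Hypothetical helper: uses the first occurrence of `value`; `end` is the
    index just past the last character, so text[start:end] == value.
    """
    start = text.find(value)
    if start == -1:
        raise ValueError(f"{value!r} does not occur in {text!r}")
    return {"value": value, "entity": entity, "start": start, "end": start + len(value)}


def build_example_payload(project_id, language, text, intent, entities, example_type="train"):
    """Assemble a request body in the same shape as the curl request above."""
    return {
        "projectId": project_id,
        "language": language,
        "example": {
            "text": text,
            "intent": intent,
            "type": example_type,
            "entities": entities,
        },
    }


text = "set an alarm for 25 minutes ."
entity = make_entity(text, "for 25 minutes", "datetime")
payload = build_example_payload("YOUR-PROJECT-ID", "en", text, "set_alarm", [entity])
print(json.dumps(payload, indent=2))
```

If the same value appears more than once in a text, the first-occurrence rule may tag the wrong span, so in that case the offsets should be set explicitly.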

Upload a Single Test Example

Exactly like training examples, test examples can be added to a project by just changing the type attribute in the payload. Test data (examples) are not mandatory, but they are useful when you want to track how your models are performing using objective metrics.

In test examples you provide a text, its corresponding intents and the entities in it. Additionally you provide an attribute called type and set its value to test.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"example\": {
\"text\": \"remind me to get gas in car today .\",
\"intent\": \"set_alarm\",
\"type\": \"test\",
\"entities\": [
{
\"value\": \"today\",
\"entity\": \"datetime\",
\"start\": 28,
\"end\": 33
}
]
}
}"

Why do you need test examples?

Our AutoNLP never uses these examples for training. Once it has trained successfully, we feed the test examples through the trained models and generate evaluation metrics which you can use to track progress. Read more about evaluation metrics here.

There are various advantages of adding test examples to your project:

  • Get intent classification accuracy and entity extraction F1 (strict/partial) scores every time you train a model.
  • Adding test examples can help you track progress. It is a good practice to add test examples before you even add train examples. You can call this approach "Test Driven Modelling".
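
To make these two metrics concrete, here is a minimal sketch of how intent accuracy and a strict entity F1 can be computed from gold test examples and model predictions. It illustrates the standard definitions and is not the platform's own evaluation code. The function names and the (entity, start, end) tuple representation are assumptions, and "strict" here is taken to mean that an entity counts as correct only when its name and both offsets match exactly.

```python
def intent_accuracy(gold_intents, predicted_intents):
    """Fraction of examples whose predicted intent equals the gold intent."""
    if not gold_intents:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_intents, predicted_intents))
    return correct / len(gold_intents)


def strict_entity_f1(gold_entities, predicted_entities):
    """Micro-averaged F1 over (entity, start, end) tuples.

    Each argument is a list holding one set of tuples per example.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_entities, predicted_entities):
        tp += len(gold & pred)   # exact matches
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: "other_intent" and the second prediction are made up.
gold_intents = ["set_alarm", "set_alarm"]
pred_intents = ["set_alarm", "other_intent"]
gold_entities = [{("datetime", 28, 33)}, {("datetime", 13, 27)}]
pred_entities = [{("datetime", 28, 33)}, {("datetime", 17, 27)}]

print(intent_accuracy(gold_intents, pred_intents))
print(strict_entity_f1(gold_entities, pred_entities))
```

In the second example the predicted span overlaps the gold span but starts at a different character, so under the strict definition used here it counts as both a false positive and a false negative.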

Minimum Data Requirements to Train a Model

Training Data

To start training you need the following:

  • Two intents
  • Ten examples per intent

Our AutoNLP is extremely data-efficient, i.e., with 40-50 examples per intent you will get very good results.
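
Before starting a training job it can be handy to check these minimums locally. The sketch below counts train examples per intent and reports anything that falls short. The function name check_minimum_data and the intent cancel_alarm are hypothetical, and the sketch assumes each example is a dict with an intent key and an optional type key that defaults to train, as in the upload payload.

```python
from collections import Counter

MIN_INTENTS = 2
MIN_EXAMPLES_PER_INTENT = 10


def check_minimum_data(examples):
    """Return a list of problems; an empty list means the minimums are met.

    Only examples whose type is "train" (the default) are counted.
    """
    counts = Counter(
        example["intent"]
        for example in examples
        if example.get("type", "train") == "train"
    )
    problems = []
    if len(counts) < MIN_INTENTS:
        problems.append(f"need at least {MIN_INTENTS} intents, found {len(counts)}")
    for intent, count in sorted(counts.items()):
        if count < MIN_EXAMPLES_PER_INTENT:
            problems.append(
                f"intent {intent!r} has {count} examples, need at least {MIN_EXAMPLES_PER_INTENT}"
            )
    return problems


examples = [{"intent": "set_alarm"}] * 10 + [{"intent": "cancel_alarm"}] * 3
print(check_minimum_data(examples))
```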

Example Diversity

It is always better if you train AutoNLP with a diverse set of examples rather than repetitive or very similar examples.
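
One simple way to spot repetitive examples before uploading is to group texts that become identical once case, punctuation, and extra whitespace are ignored. This is only a rough heuristic sketched under that assumption, not a measure of diversity used by AutoNLP, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_repeated_examples(texts):
    """Group texts that are identical after normalization; keep only groups with repeats."""
    groups = defaultdict(list)
    for text in texts:
        groups[normalize(text)].append(text)
    return [group for group in groups.values() if len(group) > 1]


texts = [
    "set an alarm for 25 minutes .",
    "Set an alarm for 25 minutes!",
    "wake me up in 25 minutes .",
]
print(find_repeated_examples(texts))
```

Here the first two texts fall into the same group, while the third is phrased differently and is kept on its own.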

What Happens When You Upload an Example?

Once you add an example (train or test) to a project, we prepare it for training by passing it through a processing pipeline. Hence, every example has an attribute called prepared. In some cases an example might not get prepared. This is a rare event but it can happen. In that case you can re-prepare these examples using the following API.

Prepare unprepared examples
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\":\"${EXAMPLE_ID}\"
}"

Fetch Examples

Single Example

You can fetch a single example and all its attributes using this API.
curl --location --request GET "https://platform.neuralspace.ai/api/nlu/v1/example?exampleId=${EXAMPLE_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json'

Multiple Examples

Use this API to list the examples in a project. You can filter examples by search keyword, language, prepared status (true or false), and type of example (train or test). The request below filters by language en (English), prepared status true, and type train.

You can fetch multiple examples using this pagination API.
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/list/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"filter\": {
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"prepared\": \"true\",
\"type\": \"train\"
},
\"pageNumber\":1,
\"pageSize\": 20
}"

This is a pagination API: pageSize determines how many examples to retrieve per page, and pageNumber determines which page to fetch.
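
To walk through every page, you can keep requesting pages until one comes back with fewer items than pageSize. The sketch below shows that loop in isolation. The names iter_all_examples and fetch_page are hypothetical, and the sketch assumes that fetch_page wraps the list request and returns the examples on the requested page as a list, and that pages are numbered from 1 as in the request above. The response format of the API is not described here, so adapt the wrapper to whatever the service actually returns.

```python
def iter_all_examples(fetch_page, page_size=20):
    """Yield examples page by page until a page comes back short or empty.

    `fetch_page(page_number, page_size)` is a caller-supplied function that
    performs the list request and returns that page's examples as a list.
    """
    page_number = 1
    while True:
        page = fetch_page(page_number, page_size)
        yield from page
        if len(page) < page_size:
            break
        page_number += 1


# Stand-in for the real HTTP call, so the sketch runs without network access.
fake_store = [f"example-{i}" for i in range(45)]


def fake_fetch_page(page_number, page_size):
    start = (page_number - 1) * page_size
    return fake_store[start:start + page_size]


all_examples = list(iter_all_examples(fake_fetch_page, page_size=20))
print(len(all_examples))
```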

Update Example

Let's change the text of the training example we inserted before. We will also modify the entity accordingly.

Update any attribute of an example using this API
curl --location --request PUT 'https://platform.neuralspace.ai/api/nlu/v1/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\":\"${EXAMPLE_ID}\",
\"text\": \"wake me up in 25 minutes .\",
\"intent\": \"set_alarm\",
\"type\": \"train\",
\"entities\": [
{
\"value\": \"25 minutes\",
\"entity\": \"datetime\",
\"start\": 14,
\"end\": 24
}
]
}"

When you update an example, it gets prepared again.
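
Because the entity offsets refer to character positions in the text, they have to change whenever the text changes. A small local check, sketched here with the hypothetical helper validate_entities, can catch stale offsets before you send an update. It assumes end is the index just past the last character, as in the examples in this article.

```python
def validate_entities(text, entities):
    """Return the entities whose offsets do not reproduce their `value` in `text`."""
    mismatched = []
    for entity in entities:
        start, end = entity["start"], entity["end"]
        in_bounds = 0 <= start < end <= len(text)
        if not in_bounds or text[start:end] != entity["value"]:
            mismatched.append(entity)
    return mismatched


new_text = "wake me up in 25 minutes ."
# Entity copied unchanged from the original example: offsets no longer fit.
stale = [{"value": "for 25 minutes", "entity": "datetime", "start": 13, "end": 27}]
# Entity adjusted for the new text, as in the update request above.
updated = [{"value": "25 minutes", "entity": "datetime", "start": 14, "end": 24}]

print(validate_entities(new_text, stale))
print(validate_entities(new_text, updated))
```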

Prepare Unprepared Examples

In case an example is not prepared, you can call this API to prepare it.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\": \"${EXAMPLE_ID}\"
}"
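
If you have already listed your examples, you can pick out the ones that still need preparing and build one request body per example. The sketch below is illustrative only: it assumes each listed example is a dict carrying exampleId and prepared keys, which is an assumption about the response format rather than something documented here, and the helper names are hypothetical.

```python
def unprepared_example_ids(examples):
    """Return the exampleId of every example whose `prepared` flag is not true.

    ASSUMPTION: each item is a dict with "exampleId" and "prepared" keys.
    """
    return [e["exampleId"] for e in examples if not e.get("prepared", False)]


def build_prepare_body(example_id):
    """Request body in the same shape as the curl request above."""
    return {"exampleId": example_id}


# Made-up listing used only to exercise the helpers.
listed = [
    {"exampleId": "ex-1", "prepared": True},
    {"exampleId": "ex-2", "prepared": False},
    {"exampleId": "ex-3"},
]
bodies = [build_prepare_body(example_id) for example_id in unprepared_example_ids(listed)]
print(bodies)
```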

Cleanup

Delete Single Example

Delete example by exampleId
curl --location --request DELETE 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\": \"${EXAMPLE_ID}\"
}"

Delete Project

Delete a project using its unique projectId
curl --location --request DELETE "https://platform.neuralspace.ai/api/nlu/v1/single/project?projectId=${PROJECT_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}"