Training Data

To train an NLU model, you need data. In this article, we add training data to a project. Training data consists of pieces of text, each labeled with its intent and entities. From these examples, AutoNLP learns to predict the intent and entities of text it has never seen before.

Prerequisites

  • Follow Getting Started to log in and install the NLU app. If you are using the APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead.
  • Create a project:
    • Store the project ID in a variable called PROJECT_ID.
    • Store the language you are adding training examples for in a variable called LANGUAGE.

Upload a Single Training Example

Training examples are what AutoNLP learns to predict intents and entities from. This training example is for the intent set_alarm. Its text is set an alarm for 25 minutes ., which indicates that the user wants to set an alarm for a given time. Because the time is an entity, it is added to the entities argument. For each entity, you specify the start and end character indices of the substring you want to extract. Here, for 25 minutes is tagged as the entity datetime, with start index 13 and end index 27. Indices are zero-based and the end index is exclusive, so characters 13 through 26 make up the value.
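
Counting characters by hand is error-prone. The following is a minimal bash sketch, using a hypothetical helper called entity_offsets (it is not part of the NeuralSpace CLI), that finds the first occurrence of a value in a text and prints its zero-based start and exclusive end index. It assumes plain ASCII text: bash counts characters according to the current locale, which may not match how the API counts characters in other scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the zero-based start and exclusive end index
# of the first occurrence of an entity value in a text (ASCII text assumed).
entity_offsets() {
  local text=$1 value=$2
  # Remove everything from the first occurrence of value onwards;
  # what remains is the prefix before the entity.
  local prefix=${text%%"$value"*}
  if [[ -z $value || $prefix == "$text" ]]; then
    echo "value not found in text" >&2
    return 1
  fi
  local start=${#prefix}
  local end=$(( start + ${#value} ))
  echo "$start $end"
}

entity_offsets "set an alarm for 25 minutes ." "for 25 minutes"   # prints: 13 27
```

The same helper gives 28 33 for today in the test example and 14 24 for 25 minutes in the updated example later in this article.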

Note: A training example can contain multiple entities.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"example\": {
\"text\": \"set an alarm for 25 minutes .\",
\"intent\": \"set_alarm\",
\"type\": \"train\",
\"entities\": [
{
\"value\": \"for 25 minutes\",
\"entity\": \"datetime\",
\"start\": 13,
\"end\": 27
}
]
}
}"

This request returns a unique exampleId. Store it in a variable, because later sections use it:

EXAMPLE_ID="YOUR-EXAMPLE-ID"

| Attribute | Required | Type | Limits | Description |
|---|---|---|---|---|
| projectId | true | str | - | The ID of the project to upload this example to. |
| language | true | str | Languages in the project | The project language this example belongs to. |
| example | true | obj | - | A JSON object holding the attributes of your training example, listed in the rows below. |
| example.text | true | str | 1000 characters | The text whose intent you want to predict. AutoNLP learns patterns from this text. |
| example.intent | true | str | 100 characters | The intent behind the text; here, setting an alarm. For multiple intents, separate them with a + symbol, e.g., intent_a+intent_b. |
| example.type | false | str | train or test | Defaults to train. AutoNLP accepts two kinds of examples: train examples are used to train AutoNLP, and test examples are used to report the model's performance. |
| example.entities | false | list | - | The pieces of information in the text that you want AutoNLP to learn to extract. See the entities documentation for details. |
| example.entities.value | true | str | - | The substring you want to extract from the text; here, for 25 minutes. AutoNLP learns to predict this value during training. |
| example.entities.entity | true | str | 30 characters | The name of the entity that value belongs to; here, datetime. |
| example.entities.start | true | int | Positive integers only | The zero-based character index where the entity value starts in the text; here, 13. |
| example.entities.end | true | int | Positive integers only | The character index where the entity value ends in the text; here, 27. The end index is exclusive: it points one past the last character of the value. |
| example.entities.entityType | false | text | One of trainable, pre-trained, lookup, regex | Defaults to trainable, which means AutoNLP learns to predict this entity. The other three types are described in the entities documentation. Setting it to pre-trained, as you might for datetime, makes AutoNLP use an off-the-shelf entity extractor. |
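
Before uploading, you can check an example locally against the limits in this table. The bash sketch below uses a hypothetical function, validate_example (not part of the NeuralSpace CLI or API). It checks the text, intent, and entity-name lengths and the type value, and confirms that each span satisfies text[start:end] == value. Like any character count in bash, it assumes ASCII text.

```shell
#!/usr/bin/env bash
# Hypothetical local check of an example against the documented limits.
# Usage: validate_example TEXT INTENT TYPE [VALUE ENTITY START END]...
validate_example() {
  local text=$1 intent=$2 type=$3
  shift 3
  (( ${#text} <= 1000 )) || { echo "text is longer than 1000 characters" >&2; return 1; }
  (( ${#intent} <= 100 )) || { echo "intent is longer than 100 characters" >&2; return 1; }
  [[ $type == train || $type == test ]] || { echo "type must be train or test" >&2; return 1; }
  (( $# % 4 == 0 )) || { echo "entities must be given as VALUE ENTITY START END" >&2; return 1; }
  while (( $# > 0 )); do
    local value=$1 entity=$2 start=$3 end=$4
    shift 4
    (( ${#entity} <= 30 )) || { echo "entity name $entity is longer than 30 characters" >&2; return 1; }
    # Zero-based start, exclusive end: the span must lie inside the text.
    if (( start < 0 || end <= start || end > ${#text} )); then
      echo "span [$start, $end) is outside the text" >&2
      return 1
    fi
    # Bash substring expansion takes an offset and a length.
    local span=${text:start:end-start}
    [[ $span == "$value" ]] || { echo "span [$start, $end) is '$span', not '$value'" >&2; return 1; }
  done
  echo "ok"
}
```

For the training example above, validate_example "set an alarm for 25 minutes ." set_alarm train "for 25 minutes" datetime 13 27 prints ok, while an end index of 26 is rejected because the span would be for 25 minute.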

Upload a Single Test Example

Test examples are added to a project exactly like training examples; only the type attribute in the payload changes. Test examples are not mandatory, but they are useful for tracking how your models perform using objective metrics.

Like a training example, a test example has a text, its intent, and the entities in it. The difference is that you set the type attribute to test.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"example\": {
\"text\": \"remind me to get gas in car today .\",
\"intent\": \"set_alarm\",
\"type\": \"test\",
\"entities\": [
{
\"value\": \"today\",
\"entity\": \"datetime\",
\"start\": 28,
\"end\": 33
}
]
}
}"
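
Because train and test examples differ only in type, one function can produce both payloads. This is a sketch using a hypothetical helper, build_example_payload. It supports a single entity and assumes the strings need no JSON escaping (no double quotes, backslashes, or control characters); for arbitrary input, a JSON tool such as jq is safer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a single-entity payload for the single/example endpoint.
# Assumes none of the string arguments need JSON escaping.
build_example_payload() {
  local project_id=$1 language=$2 text=$3 intent=$4 type=$5
  local value=$6 entity=$7 start=$8 end=$9
  if [[ $type != train && $type != test ]]; then
    echo "type must be train or test" >&2
    return 1
  fi
  local format='{"projectId": "%s", "language": "%s", "example": {"text": "%s", "intent": "%s", '
  format+='"type": "%s", "entities": [{"value": "%s", "entity": "%s", "start": %d, "end": %d}]}}\n'
  # shellcheck disable=SC2059  # the format string is fixed above
  printf "$format" "$project_id" "$language" "$text" "$intent" "$type" \
    "$value" "$entity" "$start" "$end"
}
```

To send the payload, you could pipe the function's output into the curl command above, replacing its --data-raw argument with --data-binary @-, which tells curl to read the request body from standard input.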

Upload Entire Dataset

Using the CLI, you can upload an entire dataset to a project. Refer to Datasets Format and Converters to see how to format your dataset before uploading.

neuralspace nlu upload-dataset -p $PROJECT_ID -L "en" -d "PATH TO DATASET"

| Attribute | Required | Type | Description |
|---|---|---|---|
| --projectId or -p | true | str | The ID of the project to upload all examples to. |
| --dataset-file or -d | true | str | The path of the dataset file to upload. |
| --language or -L | true | str | The project language the examples belong to. |
| --skip-first or -s | false | int | The number of examples to skip from the beginning of the dataset. |
| --ignore-errors or -e | false | str | Set to true to ignore errors, if any. |

Why do you need test examples?

AutoNLP never uses test examples for training. After a model has trained successfully, we run the test examples through it and generate evaluation metrics that you can use to track progress. See the evaluation metrics documentation for more.

Adding test examples to your project has several advantages:

  • You get intent classification and entity extraction accuracy every time you train a model.
  • Test examples help you track progress. It is good practice to add test examples even before you add training examples; you can call it "Test Driven Modelling".
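
Intent classification accuracy is the fraction of test examples whose predicted intent matches the expected one. AutoNLP computes this for you after training; the awk-based sketch below only illustrates the arithmetic. It assumes a hypothetical input with one expected<TAB>predicted pair per line, and intent_accuracy is an illustrative name, not a NeuralSpace command.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: accuracy = correct predictions / total test examples.
# Input lines: expected_intent<TAB>predicted_intent
intent_accuracy() {
  awk -F '\t' '
    NF >= 2 { total++; if ($1 == $2) correct++ }
    END {
      if (total == 0) { print "no examples" > "/dev/stderr"; exit 1 }
      printf "%d/%d = %.2f\n", correct, total, correct / total
    }'
}

printf 'set_alarm\tset_alarm\nset_alarm\tcancel_alarm\nweather\tweather\nweather\tweather\n' | intent_accuracy
# prints: 3/4 = 0.75
```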

Minimum Data Requirements to Train a Model

Training Data

To start training, you need at least:

  • Two intents
  • Ten examples per intent

AutoNLP is data efficient: with 40-50 examples per intent, you will usually get the desired results.
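
You can check the minimum requirements locally before training. This awk sketch assumes a hypothetical file with one intent<TAB>text line per example (the real upload format is described in Datasets Format and Converters). It reports intents with fewer than ten examples, or fewer than two intents overall. The function name check_minimum_data is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical check: at least 2 intents and at least 10 examples per intent.
# Input lines: intent<TAB>text (reads the given files, or stdin if none).
check_minimum_data() {
  awk -F '\t' -v min_intents=2 -v min_examples=10 '
    NF >= 2 { count[$1]++ }
    END {
      ok = 1; n = 0
      for (intent in count) {
        n++
        if (count[intent] < min_examples) {
          printf "intent %s has %d examples (need %d)\n", intent, count[intent], min_examples
          ok = 0
        }
      }
      if (n < min_intents) { printf "found %d intents (need %d)\n", n, min_intents; ok = 0 }
      if (ok) print "ok"
      exit (ok ? 0 : 1)
    }' "$@"
}
```

For example, check_minimum_data examples.tsv prints ok and exits with status 0 when both requirements are met, and exits with status 1 otherwise.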

Example Diversity

It is always better if you train AutoNLP with a diverse set of examples rather than repetitive or very similar examples.

What Happens When You Upload an Example?

Once you add an example (train or test) to a project, we prepare it for training by passing it through a processing pipeline. Every example therefore has an attribute called prepared. Occasionally, an example fails to get prepared. This is rare, but if it happens, you can re-prepare the example using the following API.

Prepare unprepared examples
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\":\"${EXAMPLE_ID}\"
}"
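
If several examples are unprepared, you can call the same endpoint once per exampleId. This bash sketch wraps the request above in a hypothetical prepare_examples function. It assumes AUTHORIZATION_TOKEN is set and uses curl's --fail option so that HTTP error responses count as failures.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: re-prepare each exampleId passed as an argument.
# Returns non-zero if any request fails.
prepare_examples() {
  local id failed=0
  for id in "$@"; do
    # --fail makes curl exit non-zero on HTTP error responses.
    if ! curl --silent --show-error --fail --location --request POST \
      'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
      --header 'Accept: application/json, text/plain, */*' \
      --header "authorization: ${AUTHORIZATION_TOKEN}" \
      --header 'Content-Type: application/json' \
      --data-raw "{\"exampleId\": \"${id}\"}"; then
      echo "failed to prepare ${id}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, prepare_examples "$EXAMPLE_ID" re-prepares the example uploaded earlier; pass more IDs as extra arguments.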

Fetch Examples

To list your uploaded examples with the CLI, use the following command. Use the filters in the table below to narrow down the results.

neuralspace nlu list-examples -p $PROJECT_ID -L "en" -P true -t "train"

| Attribute | Required | Type | Description |
|---|---|---|---|
| --project-id or -p | true | str | The ID of the project to list examples for. |
| --language or -L | true | str | The language of the dataset. |
| --prepared or -P | false | str | Set to true to list only prepared examples. |
| --type or -t | false | str | Set to train or test to list only that type of example. |
| --intent or -i | false | str | Set to an intent to list only examples with that intent. |
| --page-number or -n | false | int | The page number to fetch. |
| --page-size or -s | false | int | The number of examples per page. |
| --verbose | false | - | Pass --verbose at the end of the command to get verbose output. |

Update Example

Let's change the text of the training example we inserted earlier and update its entity to match.

Update any attribute of an example using this API
curl --location --request PUT 'https://platform.neuralspace.ai/api/nlu/v1/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: text/plain' \
--data-raw "{
\"exampleId\":\"${EXAMPLE_ID}\",
\"text\": \"wake me up in 25 minutes .\",
\"intent\": \"set_alarm\",
\"type\": \"train\",
\"entities\": [
{
\"value\": \"25 minutes\",
\"entity\": \"datetime\",
\"start\": 14,
\"end\": 24
}
]
}"

When you update an example, it gets prepared again.

Cleanup

Delete Single Example

Delete example by exampleId
neuralspace nlu delete-example -e $EXAMPLE_ID
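
To remove several examples, you can loop over their IDs with the same CLI command. This is a hypothetical wrapper, delete_examples; it assumes the CLI exits with a non-zero status when a deletion fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: delete each exampleId passed as an argument
# using the CLI command shown above. Returns non-zero if any deletion fails.
delete_examples() {
  local id failed=0
  for id in "$@"; do
    neuralspace nlu delete-example -e "$id" || { echo "failed to delete ${id}" >&2; failed=1; }
  done
  return "$failed"
}
```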

Delete Project

Delete a project using its unique projectId
neuralspace nlu delete-project -p $PROJECT_ID