Training Data

To train an NLU model that can recognize the intents and entities for your use case, you need training data from which the model can learn them. In this article we will create a project and add training data to it. A training example consists of a piece of text with its corresponding intent and entities. Using these examples, our AutoNLP learns to predict the intents and entities of text that the model has never seen before.

Prerequisites

Make sure to follow Getting Started to log in and install the Language Understanding Service. If you are using the APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead.

Create a Project:
- Make sure to create a project and save the project ID in a variable called PROJECT_ID.
- Make sure to save the language for which you are adding training examples in a variable called LANGUAGE.

Upload a Single Training Example

Training data (or training examples) is what our AutoNLP learns from to predict the intents and entities of unseen text. The training example below is for an intent called set_alarm. The text for this intent is "set an alarm for 25 minutes .", which indicates that the user wants to set an alarm for a given time. Since time is an entity, it has been added to the entities argument. For each entity you have to specify the start and end character index of the text you want to extract. In this case, "for 25 minutes" is tagged as an entity called datetime, and its start and end indices are 13 and 27 respectively. The start index is zero-based, and the end index is one past the last character of the value.

Note: Multiple entities can be added to the same training example.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
    \"projectId\": \"${PROJECT_ID}\",
    \"language\": \"en\",
    \"example\": {
            \"text\": \"set an alarm for 25 minutes .\",
            \"intent\": \"set_alarm\",
            \"type\": \"train\",
            \"entities\": [
                {
                    \"value\": \"for 25 minutes\",
                    \"entity\": \"datetime\",
                    \"start\": 13,
                    \"end\": 27
                }
            ]
        }
}"

This will return a unique exampleId. Store this value in a variable.

EXAMPLE_ID="YOUR-EXAMPLE-ID"

Attribute	Required	Type	Limits	Description
`projectId`	true	str	-	Set to the ID of the project you want this example to be uploaded to.
`language`	true	str	Languages in project	Set to the language in the project you want this example to correspond to.
`example`	true	obj	-	A JSON object that contains the attributes of your training example. The following are its attributes.
`example.text`	true	str	1000 characters	A piece of text that you want to predict the intent of. AutoNLP is trained to learn patterns from this text.
`example.intent`	true	str	100 characters	The intent behind the text. In this case it is setting the alarm.
`example.type`	false	str	`train` or `test`	Defaults to `train`. AutoNLP takes two kinds of examples, `train` and `test`. `train` examples are used for training AutoNLP and `test` examples are used for reporting the model's performance.
`example.entities`	false	list	-	Entities are pieces of information in the text that you want AutoNLP to learn to extract. More details here.
`example.entities.value`	true	str	-	The substring that you want to extract from the text. In this case it is `for 25 minutes`. This is the value that AutoNLP will learn to predict while training.
`example.entities.entity`	true	str	30 characters	Name of the entity that value corresponds to. In this example it is `datetime`.
`example.entities.start`	true	int	Positive integers only	The character index where this entity value starts from in the given text. In this example it is `13`. Note that we follow zero indexing.
`example.entities.end`	true	int	Positive integers only	The character index where the entity value ends in the given text, i.e. one past its last character. In this example it is `27`.
`example.entities.entityType`	false	str	One of `trainable`, `pre-trained`, `lookup`, `regex`	Defaults to `trainable`, which means AutoNLP learns to predict this entity. Apart from that we support three other entity types. More details here. Setting it to `pre-trained` means AutoNLP uses an off-the-shelf entity extractor, for example for `datetime`.
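
Counting character indices by hand is error-prone, so the start and end values described in the table can be computed instead. The following is a minimal sketch of one way to do that locally; entity_span is a hypothetical helper name, not part of the platform, and it assumes plain ASCII text and that the first occurrence of the value is the one you want to tag.

```shell
# entity_span TEXT VALUE
# Prints "START END" for the first occurrence of VALUE in TEXT.
# START is zero-based and END is one past the last character of VALUE.
# Hypothetical local helper, not part of the NeuralSpace API.
# Assumes plain ASCII text; shells may count multi-byte characters differently.
entity_span() (
  text=$1
  value=$2
  # Remove the first occurrence of VALUE and everything after it.
  prefix=${text%%"$value"*}
  # If nothing was removed, VALUE does not occur in TEXT.
  if [ "$prefix" = "$text" ]; then
    echo "value not found in text" >&2
    exit 1
  fi
  start=${#prefix}
  echo "$start $((start + ${#value}))"
)

entity_span "set an alarm for 25 minutes ." "for 25 minutes"   # prints: 13 27
entity_span "remind me to get gas in car today ." "today"      # prints: 28 33
```

The two printed values can be used directly as the start and end attributes of an entity.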

Upload a Single Test Example

Exactly like training examples, test examples can be added to a project by just changing the type attribute in the payload. Test data (examples) are not mandatory, but they are useful when you want to track how your models are performing using objective metrics.

As with training examples, you provide a text, its corresponding intent, and the entities in it. The only difference is that you set the type attribute to test.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
    \"projectId\": \"${PROJECT_ID}\",
    \"language\": \"en\",
    \"example\": {
            \"text\": \"remind me to get gas in car today .\",
            \"intent\": \"set_alarm\",
            \"type\": \"test\",
            \"entities\": [
                {
                    \"value\": \"today\",
                    \"entity\": \"datetime\",
                    \"start\": 28,
                    \"end\": 33
                }
            ]
        }
}"

Why do you need test examples?

Our AutoNLP never uses these examples for training. Once it has trained successfully, we feed the test examples through the trained models and generate evaluation metrics that you can use to track progress. Read more about evaluation metrics here.

There are several advantages to adding test examples to your project:

- You get intent classification accuracy and entity extraction F1 (strict/partial) scores every time you train a model.
- Test examples help you track progress. It is good practice to add test examples before you even add training examples. You can call this approach "Test Driven Modelling".
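
To make the two metrics mentioned above concrete, here is a small worked sketch using their standard definitions: accuracy is the share of test examples whose predicted intent equals the expected intent, and F1 is computed from true positives, false positives and false negatives. How the platform decides whether an entity counts as a strict or partial match is not described in this article, so the counts passed to f1_score are assumed inputs. The function names and the play_music intent are hypothetical.

```shell
# intent_accuracy reads "expected predicted" intent pairs on stdin, one pair
# per line, and prints the fraction of pairs where both labels are equal.
intent_accuracy() {
  awk 'NF == 2 { total++; if ($1 == $2) correct++ }
       END { if (total == 0) exit 1; printf "%.2f\n", correct / total }'
}

# f1_score TP FP FN prints F1 = 2*TP / (2*TP + FP + FN).
f1_score() {
  awk -v tp="$1" -v fp="$2" -v fn="$3" 'BEGIN {
    denom = 2 * tp + fp + fn
    if (denom == 0) exit 1
    printf "%.2f\n", 2 * tp / denom
  }'
}

# Three of four intents predicted correctly.
printf '%s\n' "set_alarm set_alarm" "set_alarm play_music" \
  "play_music play_music" "play_music play_music" | intent_accuracy   # prints: 0.75

# 8 entities found correctly, 2 spurious, 2 missed.
f1_score 8 2 2   # prints: 0.80
```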

Minimum Data Requirements to Train a Model

Training Data

To start training you need the following:

- Two intents
- Ten examples per intent

Our AutoNLP is data-efficient: 40-50 examples per intent are usually enough to get good results.
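
The two minimum requirements above can be checked locally before you start a training job. The sketch below assumes you have one intent label per training example in a file or stream and that intent names contain no spaces; check_min_data and repeat_line are hypothetical helper names, and play_music is a made-up second intent.

```shell
# check_min_data reads one intent label per line on stdin (one line per
# training example). It succeeds only if there are at least 2 intents and
# every intent has at least 10 examples. Hypothetical local helper.
check_min_data() {
  sort | uniq -c | awk '
    NF < 2 { next }                      # skip blank labels
    { intents++ }
    $1 < 10 { printf "intent %s has only %d examples\n", $2, $1; bad = 1 }
    END {
      if (intents < 2) { printf "need at least 2 intents, found %d\n", intents; bad = 1 }
      exit bad
    }'
}

# repeat_line LABEL N prints LABEL N times (only used to build demo input).
repeat_line() {
  i=0
  while [ "$i" -lt "$2" ]; do echo "$1"; i=$((i + 1)); done
}

# 10 examples of set_alarm but only 4 of play_music: the check fails.
{ repeat_line set_alarm 10; repeat_line play_music 4; } | check_min_data \
  || echo "not ready to train"
```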

Example Diversity

It is always better if you train AutoNLP with a diverse set of examples rather than repetitive or very similar examples.
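
As a first, rough diversity check you can look for texts that are identical once case and spacing are ignored. This sketch only catches such trivial repeats; it says nothing about examples that are merely similar, and find_duplicate_texts is a hypothetical helper name.

```shell
# find_duplicate_texts reads one example text per line on stdin and prints
# each text that occurs more than once after lowercasing and collapsing
# repeated spaces. Hypothetical local helper.
find_duplicate_texts() {
  tr '[:upper:]' '[:lower:]' | tr -s ' ' | sed 's/^ //; s/ $//' | sort | uniq -d
}

printf '%s\n' "Set an alarm" "set an  alarm" "wake me up" | find_duplicate_texts
# prints: set an alarm
```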

What Happens When You Upload an Example?

Once you add an example (train or test) to a project, we prepare it for training by passing it through a processing pipeline. Hence, every example has an attribute called prepared. In some cases an example might not get prepared. This is a rare event but it can happen. In that case you can re-prepare these examples using the following API.

Prepare unprepared examples
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
    \"exampleId\":\"${EXAMPLE_ID}\"
}"

Fetch Examples

Single Example

You can fetch a single example and all its attributes using this API.
curl --location --request GET "https://platform.neuralspace.ai/api/nlu/v1/example?exampleId=${EXAMPLE_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json'

Multiple Examples

Use this API to list the examples in a project. You can filter examples by search keyword, language, prepared status (true or false), and type of example (train or test). Here we are filtering by language en (English), prepared status true, and type train.

You can fetch multiple examples using this pagination API.
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/list/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
    \"filter\": {
        \"projectId\": \"${PROJECT_ID}\",
        \"language\": \"en\",
        \"prepared\": \"true\",
        \"type\": \"train\"
    },
    \"pageNumber\":1,
    \"pageSize\": 20
}"

This is a paginated API: pageSize determines how many examples to retrieve per page, and pageNumber determines which page to fetch.
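
To walk through every page you need to know how many pages there are. The sketch below assumes the total number of examples is already known (the response field that reports it is not shown in this article) and that pages are numbered from 1, as in the request above. page_count and list_payload are hypothetical helper names; the payload reuses the filter object from the request above.

```shell
# page_count TOTAL PAGE_SIZE prints the number of pages needed (ceiling division).
page_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# list_payload PAGE_NUMBER PAGE_SIZE prints a request body for one page,
# reusing the filter object from the request above.
list_payload() {
  printf '{"filter": {"projectId": "%s", "language": "en", "prepared": "true", "type": "train"}, "pageNumber": %d, "pageSize": %d}\n' \
    "${PROJECT_ID:-}" "$1" "$2"
}

total=45        # assumed to be known, e.g. 45 training examples
page_size=20
pages=$(page_count "$total" "$page_size")   # 3 pages
page=1          # assumes the first page is number 1
while [ "$page" -le "$pages" ]; do
  list_payload "$page" "$page_size"   # pass this body to curl via --data-raw
  page=$((page + 1))
done
```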

Update Example

Let's change the text of the training example we inserted before. We will also modify the entity accordingly.

Update any attribute of an example using this API
curl --location --request PUT 'https://platform.neuralspace.ai/api/nlu/v1/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
    \"exampleId\":\"${EXAMPLE_ID}\",
    \"text\": \"wake me up in 25 minutes .\",
    \"intent\": \"set_alarm\",
    \"type\": \"train\",
    \"entities\": [
        {
            \"value\": \"25 minutes\",
            \"entity\": \"datetime\",
            \"start\": 14,
            \"end\": 24
        }
    ]
}"

When you update an example, it gets prepared again.
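
Because the text changed, the entity offsets changed with it (13 and 27 became 14 and 24). Before sending an update it can help to confirm locally that the start and end indices still select the intended value. This is a minimal sketch under the same indexing convention as the examples in this article (zero-based start, end one past the last character); check_entity is a hypothetical helper name and it assumes plain ASCII text.

```shell
# check_entity TEXT VALUE START END
# Succeeds if the characters of TEXT from START (zero-based) up to, but not
# including, END are exactly VALUE. Hypothetical local helper.
check_entity() (
  text=$1 value=$2 start=$3 end=$4
  # cut counts from 1 and includes both ends, so shift START by one.
  actual=$(printf '%s\n' "$text" | cut -c "$((start + 1))-$end")
  [ "$actual" = "$value" ]
)

check_entity "wake me up in 25 minutes ." "25 minutes" 14 24 \
  && echo "offsets match"
# The span from the previous text no longer matches the new text.
check_entity "wake me up in 25 minutes ." "for 25 minutes" 13 27 \
  || echo "offsets need updating"
```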

Prepare Unprepared Examples

In case an example is not prepared, you can call this API to prepare it.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
    \"exampleId\": \"${EXAMPLE_ID}\"
}"
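
If several examples are unprepared, the same request can be repeated for each one. The sketch below wraps the call shown above in a loop; prepare_examples is a hypothetical helper name, and it assumes you have already collected the IDs of the unprepared examples, one per line (for instance by listing examples with the prepared filter set to false).

```shell
# prepare_examples reads example IDs from stdin, one per line, and sends one
# prepare request per ID to the endpoint shown above.
# Hypothetical wrapper; requires AUTHORIZATION_TOKEN to be set.
prepare_examples() {
  while IFS= read -r example_id; do
    # Skip blank lines in the input.
    [ -n "$example_id" ] || continue
    curl --silent --show-error --request POST \
      'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
      --header "authorization: ${AUTHORIZATION_TOKEN:-}" \
      --header 'Content-Type: application/json' \
      --data-raw "{\"exampleId\": \"${example_id}\"}"
    echo   # newline after each response
  done
}

# Usage (not run here): printf '%s\n' "$EXAMPLE_ID" | prepare_examples
```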

Cleanup

Delete Single Example

Delete example by exampleId
curl --location --request DELETE 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
    \"exampleId\": \"${EXAMPLE_ID}\"
}"

Delete Project

Delete a project using its unique projectId
curl --location --request DELETE "https://platform.neuralspace.ai/api/nlu/v1/single/project?projectId=${PROJECT_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}"
