Training Data
To train an NLU model that can recognize the intents and entities for your use case, you need training data from which the model can learn them. In this article we create a project and add training data to it. A training example consists of a piece of text with its corresponding intent and entities. From these examples, AutoNLP learns to predict the intents and entities of text that the model has never seen before.
Prerequisites
- Follow Getting Started to log in and install the Language Understanding Service. If you are using the APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead.
- Create a Project:
  - Create a project and store the project ID in a variable called PROJECT_ID.
  - Store the language of your training examples in a variable called LANGUAGE.
Upload a Single Training Example
- API
Training data (or training examples) are what AutoNLP learns from to predict the intents and entities of unseen text.
The training example below is for an intent called set_alarm. The text for this intent is "set an alarm for 25 minutes .", which indicates that the user wants to set an alarm for a given time.
Since time is an entity, it has been added to the entities argument. For each entity you have to specify the start and end character index of the text you want to extract. In this case, "for 25 minutes" is tagged as an entity called datetime, and its start and end indices are 13 and 27 respectively. Indices are zero-based, and end points one past the last character of the value (13 plus the 14 characters of the value gives 27).
note
Multiple entities can be added to the same training example.
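Counting character positions by hand is error-prone, so the offsets can also be computed. The following is a minimal sketch in POSIX shell, not part of the NeuralSpace tooling: entity_span is a hypothetical helper name, it assumes plain ASCII text, and it only locates the first occurrence of the value.

```shell
#!/bin/sh
# entity_span TEXT VALUE
# Hypothetical helper: prints "START END" for the first occurrence of VALUE
# in TEXT. START is zero-based and END is exclusive. Assumes ASCII text.
entity_span() {
    text=$1
    value=$2
    case $text in
        *"$value"*) ;;                      # value is present, carry on
        *) echo "value not found in text" >&2; return 1 ;;
    esac
    prefix=${text%%"$value"*}               # everything before the value
    start=${#prefix}                        # length of prefix = start index
    end=$((start + ${#value}))              # one past the last character
    echo "$start $end"
}

entity_span "set an alarm for 25 minutes ." "for 25 minutes"   # prints: 13 27
```

The two numbers printed are the start and end values used in the request below.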
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"example\": {
\"text\": \"set an alarm for 25 minutes .\",
\"intent\": \"set_alarm\",
\"type\": \"train\",
\"entities\": [
{
\"value\": \"for 25 minutes\",
\"entity\": \"datetime\",
\"start\": 13,
\"end\": 27
}
]
}
}"
This returns a unique exampleId. Store this value in a variable.
EXAMPLE_ID="YOUR-EXAMPLE-ID"
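If you call the API from a script, the ID can be captured rather than copied by hand. This sketch assumes only that the response body contains a JSON string field named exampleId that appears once; the sample body and the extract_example_id helper are hypothetical, and the real response may be shaped differently.

```shell
#!/bin/sh
# Hypothetical sample response. Only the "exampleId" field name comes from
# the text above; replace this with the body returned by the curl call.
RESPONSE='{"success": true, "data": {"exampleId": "YOUR-EXAMPLE-ID"}}'

# Hypothetical helper: print the string value stored under "exampleId".
# Prints nothing if the field is missing.
extract_example_id() {
    printf '%s\n' "$1" |
        sed -n 's/.*"exampleId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

EXAMPLE_ID=$(extract_example_id "$RESPONSE")
echo "$EXAMPLE_ID"
```

A dedicated JSON parser is more robust than a regular expression if one is available in your environment.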
Attribute | Required | Type | Limits | Description |
---|---|---|---|---|
projectId | true | str | - | Set to the ID of the project you want this example to be uploaded to. |
language | true | str | Languages in project | Set to the language in the project you want this example to correspond to. |
example | true | obj | - | A JSON object that holds the attributes of your training example, listed below. |
example.text | true | str | 1000 characters | A piece of text that you want to predict the intent of. AutoNLP is trained to learn patterns from this text. |
example.intent | true | str | 100 characters | The intent behind the text. In this case it is setting the alarm. |
example.type | false | str | train or test | Defaults to train. AutoNLP takes two kinds of examples, train and test. train examples are used for training AutoNLP and test examples are used for reporting the model's performance. |
example.entities | false | list | - | Pieces of information in the text that you want AutoNLP to learn to extract are called entities. More details here. |
example.entities.value | true | str | - | The substring that you want to extract from the text. In this case it is "for 25 minutes". This is the value that AutoNLP will learn to predict while training. |
example.entities.entity | true | str | 30 characters | Name of the entity that value corresponds to. In this example it is datetime. |
example.entities.start | true | int | Positive integers only | The character index where this entity value starts in the given text. In this example it is 13. Note that we follow zero indexing. |
example.entities.end | true | int | Positive integers only | The character index where the entity value ends in the given text, one past its last character. In this example it is 27. |
example.entities.entityType | false | str | One of trainable, pre-trained, lookup, regex | Defaults to trainable, which means AutoNLP learns to predict this entity. Apart from that we support three other entity types. More details here. In this case it is pre-trained, which means AutoNLP uses an off-the-shelf entity extractor for datetime. |
Upload a Single Test Example
- API
Exactly like training examples, test examples can be added to a project by just changing the type attribute in the payload.
Test data (examples) are not mandatory, but they are useful when you want to track how your models are performing using objective metrics.
In a test example you provide a text, its corresponding intent, and the entities in it.
Additionally, you set the type attribute to test.
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"example\": {
\"text\": \"remind me to get gas in car today .\",
\"intent\": \"set_alarm\",
\"type\": \"test\",
\"entities\": [
{
\"value\": \"today\",
\"entity\": \"datetime\",
\"start\": 28,
\"end\": 33
}
]
}
}"
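A wrong offset is easy to introduce when editing a payload by hand. As a quick local sanity check before uploading, the slice of text between start and end can be compared with value. check_span below is a hypothetical helper, not a NeuralSpace command, and it assumes ASCII text.

```shell
#!/bin/sh
# check_span TEXT VALUE START END
# Hypothetical helper: succeeds when the characters of TEXT from START
# (zero-based) up to but not including END are exactly VALUE.
check_span() {
    text=$1
    value=$2
    start=$3
    end=$4
    # cut counts from 1 and includes both ends, so only START is shifted.
    actual=$(printf '%s\n' "$text" | cut -c "$((start + 1))-$end")
    [ "$actual" = "$value" ]
}

if check_span "remind me to get gas in car today ." "today" 28 33; then
    echo "span ok"
else
    echo "span mismatch"
fi
```

The same check applies to every entry in the entities list of either a train or a test example.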
Upload Entire Dataset
- CLI
Using the CLI, you can upload an entire dataset to a project. Refer to Datasets Format and Converters to learn how to format your dataset before uploading it.
neuralspace nlu upload-dataset -p $PROJECT_ID -L "en" -d "PATH TO DATASET"
Attribute | Required | Type | Description |
---|---|---|---|
--projectId or -p | true | str | Set to the ID of the project you want all examples to be uploaded to. |
--dataset-file or -d | true | str | Set to the path of the dataset you wish to upload. |
--language or -L | true | str | Set to the language in the project that the examples in the dataset correspond to. |
--skip-first or -s | false | int | Set to the number of examples to skip from the beginning of the dataset. |
--ignore-errors or -e | false | str | Set to true to ignore errors, if any, during the upload. |
Why do you need test examples?
AutoNLP never uses these examples for training. Once it has trained successfully, we feed the test examples through the trained models and generate evaluation metrics that you can use to track progress. Read more about evaluation metrics here.
There are various advantages of adding test examples to your project:
- Get intent classification accuracy and entity extraction F1 (strict/partial) scores every time you train a model.
- Adding test examples can help you track progress. It is good practice to add test examples before you even add train examples. You can call this approach "Test Driven Modelling".
Minimum Data Requirements to Train a Model
Training Data
To start training you need at least the following:
- Two intents
- Ten examples per intent
These are only the minimums. AutoNLP is data-efficient: with 40-50 examples per intent you will get very good results.
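The minimums above can be checked locally before you start a training job. The sketch below assumes a hypothetical tab-separated file with one example per line in the form intent, tab, text. This is an illustrative layout, not the NeuralSpace dataset format, and check_minimums is a made-up helper name.

```shell
#!/bin/sh
# check_minimums FILE
# Hypothetical helper: succeeds when FILE (lines of "intent<TAB>text")
# has at least two intents and at least ten examples for each of them.
check_minimums() {
    awk -F '\t' '
        NF >= 2 { count[$1]++ }             # tally examples per intent
        END {
            intents = 0
            ok = 1
            for (name in count) {
                intents++
                if (count[name] < 10) {
                    printf "intent %s has %d example(s), need 10\n", name, count[name]
                    ok = 0
                }
            }
            if (intents < 2) {
                printf "found %d intent(s), need 2\n", intents
                ok = 0
            }
            exit (ok ? 0 : 1)
        }
    ' "$1"
}

# Small demonstration with a file that is clearly too small.
demo=$(mktemp)
printf 'set_alarm\tset an alarm for 25 minutes .\n' > "$demo"
check_minimums "$demo" || echo "not ready for training"
rm -f "$demo"
```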
Example Diversity
It is always better if you train AutoNLP with a diverse set of examples rather than repetitive or very similar examples.
What Happens When You Upload an Example?
Once you add an example (train or test) to a project, we prepare it for training by passing it through a processing pipeline.
Hence, every example has an attribute called prepared.
In rare cases an example might not get prepared.
If that happens, you can re-prepare the example using the following API.
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\":\"${EXAMPLE_ID}\"
}"
Fetch Examples
- CLI
To see your uploaded examples using the CLI, use the following command. Apply the filters from the table below to narrow down the results.
neuralspace nlu list-examples -p $PROJECT_ID -L "en" -P true -t "train"
Attribute | Required | Type | Description |
---|---|---|---|
--project-id or -p | true | str | Set to the ID of the project you want to list examples for. |
--language or -L | true | str | Set to the language of the dataset. |
--prepared or -P | false | str | Set to true to list only prepared examples. |
--type or -t | false | str | Set to either train or test depending upon which type of examples you wish to see. |
--intent or -i | false | str | Set to a particular intent to list examples having this intent. |
--page-number or -n | false | int | Set to the page number to fetch. |
--page-size or -s | false | int | Set to the number of examples per page. |
--verbose | false | - | Pass --verbose at the end of the command to get verbose results. |
- API
Single Example
curl --location --request GET "https://platform.neuralspace.ai/api/nlu/v1/example?exampleId=${EXAMPLE_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json'
Multiple Examples
Use this API to list the examples in a project.
You can filter examples by search keyword, language, prepared status (true or false), and type of example (train or test).
Here we are filtering by the keyword Companion and the language en, which is English.
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/list/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json;charset=UTF-8' \
--data-raw "{
\"search\": \"Companion\",
\"filter\": {
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"en\",
\"prepared\": \"true\",
\"type\": \"train\"
},
\"pageNumber\":1,
\"pageSize\": 20
}"
This is a paginated API: pageSize determines how many examples to retrieve per page and pageNumber determines which page to fetch.
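To walk through every page, the number of requests follows from the total number of matching examples and pageSize. The sketch below assumes you already know the total (how the API reports it is not covered here) and that pageNumber starts at 1, as in the request above. pages_needed is a hypothetical helper.

```shell
#!/bin/sh
# pages_needed TOTAL PAGE_SIZE
# Hypothetical helper: ceiling of TOTAL / PAGE_SIZE using integer arithmetic.
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

TOTAL=45        # assumed number of matching examples
PAGE_SIZE=20
LAST=$(pages_needed "$TOTAL" "$PAGE_SIZE")

page=1
while [ "$page" -le "$LAST" ]; do
    # Each iteration stands for one list request with these two values.
    echo "pageNumber=$page pageSize=$PAGE_SIZE"
    page=$((page + 1))
done
```

With 45 examples and a page size of 20, the loop runs three times, and the last page holds the remaining 5 examples.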
Update Example
- API
Let's change the text of the training example we inserted before. We will also modify the entity accordingly.
curl --location --request PUT 'https://platform.neuralspace.ai/api/nlu/v1/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\":\"${EXAMPLE_ID}\",
\"text\": \"wake me up in 25 minutes .\",
\"intent\": \"set_alarm\",
\"type\": \"train\",
\"entities\": [
{
\"value\": \"25 minutes\",
\"entity\": \"datetime\",
\"start\": 14,
\"end\": 24
}
]
}"
When you update an example, it gets prepared again.
Prepare Unprepared Examples
- API
In case an example is not prepared, you can call this API to prepare it.
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/prepare/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\": \"${EXAMPLE_ID}\"
}"
Cleanup
Delete Single Example
- CLI
neuralspace nlu delete-example -e $EXAMPLE_ID
- API
curl --location --request DELETE 'https://platform.neuralspace.ai/api/nlu/v1/single/example' \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw "{
\"exampleId\": \"${EXAMPLE_ID}\"
}"
Delete Project
- CLI
neuralspace nlu delete-project -p $PROJECT_ID
- API
curl --location --request DELETE "https://platform.neuralspace.ai/api/nlu/v1/single/project?projectId=${PROJECT_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header "authorization: ${AUTHORIZATION_TOKEN}"