Train NLU Models using AutoNLP
To train NLU models on the NeuralSpace Platform you don't need any machine learning knowledge. In this article we will learn how to train our models.
Prerequisites
- Getting Started: Make sure to follow Getting Started to log in and install the Language Understanding Service. If you are using APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead.
- Create a Project:
  - Make sure to create a project and have the project ID in a variable called PROJECT_ID
  - Make sure to have the language for which you added training examples in a variable called LANGUAGE
- Add training data: Make sure to have at least two intents with 10 examples each
Train Model
Multiple Train Jobs
Sometimes, particularly when you have little training data, the same model trained separately several times can show a slight variation in performance (2-4%). To address this, you can run multiple training jobs in parallel on the same data and then select the model that performs best. By default, we run 5 training jobs for you, but you can set any number of your choice by changing the noOfTrainingJob parameter in the train API.
- API
The Train API launches a training job on our Platform and returns a unique model ID.
This model ID can be used to monitor the training status of the job.
Give the model a name by specifying it in the modelName parameter.
As mentioned above, you can also set the number of training jobs you want to run by specifying it in the noOfTrainingJob parameter.
If it is not set, 5 training jobs will be run.
projectId and language are mandatory parameters.
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/model/train/queue' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\":\"${LANGUAGE}\",
\"modelName\": \"My First Model\",
\"noOfTrainingJob\": 3
}"
Store this returned model ID in a variable.
MODEL_ID="YOUR-MODEL-ID"
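Instead of copying the ID by hand, a script can read it from the response body. The exact shape of the train response is not shown in this article, so the following is only a sketch under an assumption: that the response carries the ID in a "modelId" field, like the model object returned by Get Single Model. The helper name extract_model_id and the RESPONSE variable are hypothetical.

```shell
# Hypothetical helper: prints the first "modelId" value found in the JSON read from stdin.
# Assumption: the train response exposes the ID under a "modelId" key.
extract_model_id() {
  grep -o '"modelId"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (RESPONSE would hold the body returned by the train request):
# MODEL_ID="$(printf '%s' "${RESPONSE}" | extract_model_id)"
```

If the response uses a different key, only the pattern inside extract_model_id needs to change.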
note
Every time you call a train job for a given project and language a new model ID gets generated. The total number of training jobs you can queue at a time is equal to the number of trained models left in your subscription.
Get Model Status
After calling the train job, a new model ID is generated. You can use this ID to track your training progress as well as fetch model-related attributes.
- API
curl --location --request GET "https://platform.neuralspace.ai/api/nlu/v1/model?modelId=${MODEL_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}"
Status | Description |
---|---|
Initiated | A training job has been created but not yet queued. Only jobs which are valid get queued in the training pipeline. |
Queued | Queued jobs are ready for training. If you are a free plan user then your jobs have the lowest priority. Jobs by basic plan users get a higher priority than free plan users. Advanced plan users always get the highest priority. |
Pipeline Building | Our AutoNLP pipeline is getting built based on the data in your project. |
Pipeline Built | AutoNLP pipeline built and is ready to execute. |
Preparing Data | Our secret sauce gets poured on your data here. |
Data Prepared | Your data is ready for training. |
Training | AutoNLP has started training. |
Trained | AutoNLP trained successfully. |
Saved | Model artifacts saved in our secure cloud storage. |
Completed | Model is ready to be deployed. |
Failed | Training failed. The reason for the failure can be found in the message attribute of the model object. |
Timed Out | We have a hard time-out of 6 hours set for all NLU models. If a model takes longer than that, we cancel the job. This occurs rarely and only when our platform is overloaded. |
Dead | Training jobs which have not updated their status for more than 10 hours are declared dead. This means they are not responding. This is also a rare event and happens only when our platform is overloaded. |
A model is ready for deployment only once its trainingStatus becomes Completed.
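Because a job passes through several intermediate statuses, a script usually polls the status request until a final status is reached. The following is a minimal sketch, not an official client. The helper names (is_final_status, fetch_training_status, wait_for_model), the POLL_INTERVAL and MAX_POLLS variables with their defaults, and the sed-based extraction of trainingStatus are all illustrative assumptions. The final statuses are taken from the table above, and the query parameter is spelled modelId as in the Get Single Model request.

```shell
# Succeeds (exit code 0) when the status is one that a job does not leave again.
is_final_status() {
  case "$1" in
    "Completed"|"Failed"|"Timed Out"|"Dead") return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the trainingStatus of ${MODEL_ID}.
# Assumption: the response contains a "trainingStatus": "..." field.
fetch_training_status() {
  curl --silent --location --request GET \
    "https://platform.neuralspace.ai/api/nlu/v1/model?modelId=${MODEL_ID}" \
    --header "Authorization: ${AUTHORIZATION_TOKEN}" |
    sed -n 's/.*"trainingStatus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Polls until a final status is reached or MAX_POLLS attempts are used up.
# Succeeds only when the final status is Completed.
wait_for_model() {
  attempts="${MAX_POLLS:-720}"   # 720 polls x 30 seconds is roughly 6 hours (assumed default)
  while [ "${attempts}" -gt 0 ]; do
    status="$(fetch_training_status)"
    echo "trainingStatus: ${status}"
    if is_final_status "${status}"; then
      if [ "${status}" = "Completed" ]; then return 0; else return 1; fi
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_INTERVAL:-30}"
  done
  return 1
}

# Usage:
# if wait_for_model; then echo "Model ${MODEL_ID} is ready to deploy"; fi
```

Keeping the status check in its own function means the same test can be reused wherever a final status matters, for example before attempting to delete a model.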
note
Only models with status Completed, Failed, Timed Out, or Dead can be deleted.
List Models
You can also list all the models within a single project.
Read about how to list all your projects in this article.
- API
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/list/model' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"filter\": {
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"${LANGUAGE}\"
},
\"pageNumber\": 1,
\"pageSize\": 10
}"
This is a paginated API: pageSize determines how many models to retrieve per page, and pageNumber determines which page to fetch.
This API returns a list of all the models in the language you have specified for the given project ID.
Model attributes like trainingStatus, trainingTime, etc. are described in the next section.
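To walk through several pages, the same request can be repeated with a different pageNumber. The sketch below wraps the list request shown above in two helpers. The names build_list_payload and list_models_page are hypothetical, and the sketch assumes that the project ID and language contain no characters that need JSON escaping. How the response signals the last page is not covered here, so the usage example simply fetches a fixed number of pages.

```shell
# Hypothetical helper: builds the request body for one page of the list call.
# Arguments: project ID, language, page number, page size.
# Assumption: the values contain no quotes or backslashes (no JSON escaping is done).
build_list_payload() {
  printf '{"filter": {"projectId": "%s", "language": "%s"}, "pageNumber": %d, "pageSize": %d}' \
    "$1" "$2" "$3" "$4"
}

# Fetches one page of models and prints the raw JSON response.
# Arguments: page number, page size (defaults to 10).
list_models_page() {
  curl --silent --location --request POST \
    'https://platform.neuralspace.ai/api/nlu/v1/list/model' \
    --header 'Content-Type: application/json;charset=UTF-8' \
    --header "Authorization: ${AUTHORIZATION_TOKEN}" \
    --data-raw "$(build_list_payload "${PROJECT_ID}" "${LANGUAGE}" "$1" "${2:-10}")"
}

# Usage: fetch the first three pages with 10 models per page.
# for page in 1 2 3; do list_models_page "${page}" 10; done
```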
Get Single Model
- API
Use this API to fetch a single model and its attributes.
MODEL_ID="YOUR-MODEL-ID"
curl --location --request GET "https://platform.neuralspace.ai/api/nlu/v1/model?modelId=${MODEL_ID}" \
--header "Authorization: ${AUTHORIZATION_TOKEN}"
This will return a JSON object in the following format:
{
"success": true,
"message": "Model status fetched",
"data": {
"name": "...",
"appType": "nlu",
"projectId": "...",
"apikey": "...",
"createdBy": {
"email": "...",
"role": "provider",
"apikey": "...",
"referenceKey": "..."
},
"active": true,
"status": "active",
"createdAt": 1620059929722,
"updatedAt": 1620059929722,
"modelId": "...",
"replicas": 0,
"trainingStatus": "Completed",
"lastStatusUpdateAt": "2021-05-04T09:28:45.672Z",
"trainingProgress": [
"Initiated",
"Pipeline Building",
"Queued",
"Pipeline Built",
"Preparing Data",
"Data Prepared",
"Training",
"Trained",
"Saved",
"Completed"
],
"examplesPerIntent": {
"SOME-INTENT": NUMBER-OF-EXAMPLES-FOR-THIS-INTENT,
...
},
"metrics": {
"intentClassifierPerformance": {
"i_acc": 0.9894935488700867,
"i_f1": 0.9894935488700867
},
"nerPerformance": {
"e_f1_strict": 0.971139669418335,
"e_f1_partial": 0.971139669418335
}
},
"language": ".."
},
"timestamp": 1620120623094
}
Description of Fields
Fields | Description |
---|---|
name | Name of the model. |
appType | This will always be nlu. |
projectId | The ID of the project this model belongs to. |
apikey | Your API Key |
createdAt | Timestamp of when this model was created. |
updatedAt | Timestamp of when this model was updated. |
modelId | A unique ID for your model. |
replicas | This indicates how many replicas of this model are deployed on our platform. Multiple replicas ensure higher throughput and higher availability. |
trainingStatus | The current training status. |
lastStatusUpdateAt | During training, whenever the status changes this field is updated with a timestamp. |
trainingProgress | A list of all training statuses this model has gone through. |
examplesPerIntent | This is the distribution of your training dataset. Keys in this dictionary are intents and values are the number of examples you have in the training set for that intent. |
metrics | When you have test examples in a project, the model is evaluated on them. Here you will find some metrics that we calculate to gauge the performance of the model. These numbers are all zeros if you don't upload any test examples. |
metrics.intentClassifierPerformance | These are the metrics for the intent classifier. |
metrics.intentClassifierPerformance.i_acc | The fraction of test examples for which AutoNLP predicted the right intent. |
metrics.intentClassifierPerformance.i_f1 | [For advanced users only] This is the macro-averaged F1 score. |
metrics.nerPerformance | These are the metrics for entity recognition. |
metrics.nerPerformance.e_f1_strict | Here we consider an exact boundary match over the surface string and a match on the entity type. |
metrics.nerPerformance.e_f1_partial | Here we consider a partial boundary match over the surface string, regardless of the type. |
language | The language this model was trained for. |
When running machine learning models for entity recognition, it is common to report metrics (precision, recall and F1 score) at the individual token level. This may not be the best approach, as a named entity can be made up of multiple tokens. At the same time, regular NER evaluation schemes tend to ignore the possibility of partial matches, which are scenarios where the entity recognition system gets the named-entity surface string correct but the type wrong.
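The metrics in the response can also be read by a script, for example to check a model's intent accuracy before deploying it. The following is a rough sketch that relies only on grep, sed and awk. The helper names json_number and at_least, the RESPONSE variable, and the 0.9 threshold are illustrative assumptions rather than platform features. It also assumes that each metric key (such as i_acc) appears only once in the response.

```shell
# Hypothetical helper: prints the numeric value of the first occurrence of a
# JSON key (for example i_acc) in the JSON read from stdin.
json_number() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*[-+0-9.eE]*" | head -n 1 | sed 's/.*:[[:space:]]*//'
}

# Succeeds when the first number is greater than or equal to the second.
at_least() {
  awk -v value="$1" -v threshold="$2" 'BEGIN { exit !(value + 0 >= threshold + 0) }'
}

# Usage (RESPONSE would hold the body returned by the single-model request;
# 0.9 is an arbitrary example threshold):
# accuracy="$(printf '%s' "${RESPONSE}" | json_number i_acc)"
# if at_least "${accuracy}" 0.9; then echo "intent accuracy ok: ${accuracy}"; fi
```

Since the metrics are all zeros when a project has no test examples, a check like this only makes sense after test examples have been uploaded.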
Update Model Name
- API
This API lets you rename a model.
MODEL_ID="YOUR-MODEL-ID"
curl --location --request PUT 'https://platform.neuralspace.ai/api/nlu/v1/model' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"modelId\": \"${MODEL_ID}\",
\"modelName\": \"New Name\"
}"
Delete Model
- API
Delete your models using this API.
curl --location --request DELETE 'https://platform.neuralspace.ai/api/nlu/v1/model' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"modelId\": \"${MODEL_ID}\"
}"