Train NLU Models using AutoNLP

To train NLU models on the NeuralSpace Platform you don't need any machine learning knowledge. In this article we will learn how to train our models.

Prerequisites

  • Getting Started: Make sure to follow Getting Started to log in and install the Language Understanding Service. If you are using APIs, save your authorization token in a variable called AUTHORIZATION_TOKEN before moving ahead
  • Create a Project:
    • Make sure to create a project and have the project ID in a variable called PROJECT_ID
    • Make sure to have the language for which you added training examples in a variable called LANGUAGE
  • Add training data: Make sure to have at least two intents with 10 examples each

Train Model

Multiple Train Jobs

When you have little training data, the same model trained separately several times can show slight variation in performance (2-4%). To work around this, you can run multiple training jobs in parallel on the same data and then select the model that performs best. By default, we run 5 training jobs for you, but you can choose any number by setting the noOfTrainingJob parameter in the train API.

The train API launches a training job on our Platform and returns a unique model ID, which you can use to monitor the training status of the job. Name the model with the modelName parameter. As mentioned above, you can also set the number of training jobs with the noOfTrainingJob parameter; if it is not set, 5 jobs are run. projectId and language are mandatory parameters.

curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/model/train/queue' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"projectId\": \"${PROJECT_ID}\",
\"language\":\"${LANGUAGE}\",
\"modelName\": \"My First Model\",
\"noOfTrainingJob\": 3
}"

Store this returned model ID in a variable.

MODEL_ID="YOUR-MODEL-ID"
note

Every time you call a train job for a given project and language a new model ID gets generated. The total number of training jobs you can queue at a time is equal to the number of trained models left in your subscription.

Get Model Status

After calling the train job, a new model ID is generated. You can use this ID to track your training progress as well as fetch model related attributes.

Request
curl --location --request GET "https://platform.neuralspace.ai/api/nlu/v1/model?modelId=${MODEL_ID}" \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}"

| Status | Description |
| --- | --- |
| Initiated | A training job has been created but not yet queued. Only valid jobs get queued in the training pipeline. |
| Queued | Queued jobs are ready for training. Jobs from free plan users have the lowest priority, jobs from basic plan users have a higher priority, and jobs from advanced plan users always have the highest priority. |
| Pipeline Building | Our AutoNLP pipeline is being built based on the data in your project. |
| Pipeline Built | The AutoNLP pipeline is built and ready to execute. |
| Preparing Data | Our secret sauce gets poured on your data here. |
| Data Prepared | Your data is ready for training. |
| Training | AutoNLP has started training. |
| Trained | AutoNLP trained successfully. |
| Saved | Model artifacts are saved in our secure cloud storage. |
| Completed | The model is ready to be deployed. |
| Failed | Training failed. The reason for the failure can be found in the message attribute of the model object. |
| Timed Out | We have a hard time-out of 6 hours set for all NLU models. If a model takes longer than that, we cancel the job. This occurs rarely and only when our platform is overloaded. |
| Dead | Training jobs that have not updated their status for more than 10 hours are declared dead, meaning they are not responding. This is also a rare event and happens only when our platform is overloaded. |

A model is ready for deployment only once its trainingStatus becomes Completed.
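The status check can be repeated until training reaches a final state. The following is a minimal sketch, assuming AUTHORIZATION_TOKEN and MODEL_ID are set as above and that the status is returned in the trainingStatus attribute shown in the model details later in this article. The helper names and the POLL_SECONDS variable are hypothetical, and the sed pattern is only a dependency-free stand-in for a proper JSON parser.

```shell
# Pull the value of "trainingStatus" out of a JSON response read from stdin.
# A sed pattern keeps this sketch dependency-free; a JSON parser is more robust.
extract_training_status() {
  sed -n 's/.*"trainingStatus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 for statuses after which no further progress is expected.
is_final_status() {
  case "$1" in
    "Completed"|"Failed"|"Timed Out"|"Dead") return 0 ;;
    *) return 1 ;;
  esac
}

# Poll every POLL_SECONDS (hypothetical setting, default 30) until the model
# reaches a final status. Succeeds only if that status is Completed.
wait_for_model() {
  while true; do
    current_status=$(curl --silent --location --request GET \
      "https://platform.neuralspace.ai/api/nlu/v1/model?modelId=${MODEL_ID}" \
      --header "Authorization: ${AUTHORIZATION_TOKEN}" | extract_training_status)
    echo "trainingStatus: ${current_status:-unknown}"
    if is_final_status "$current_status"; then
      break
    fi
    sleep "${POLL_SECONDS:-30}"
  done
  [ "$current_status" = "Completed" ]
}
```

Calling wait_for_model after queuing a train job blocks until the model is Completed, Failed, Timed Out, or Dead. The loop has no upper bound on attempts, so a real script would likely add one.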

note

Only models with status Completed, Failed, Timed Out, or Dead can be deleted.

List Models

You can also list all your models within a single project.
Read about how to list all your projects in this article.

Request
curl --location --request POST 'https://platform.neuralspace.ai/api/nlu/v1/list/model' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"filter\": {
\"projectId\": \"${PROJECT_ID}\",
\"language\": \"${LANGUAGE}\"
},
\"pageNumber\": 1,
\"pageSize\": 10
}"

This is a paginated API: pageSize determines how many models to retrieve per page, and pageNumber determines which page to fetch. The API returns a list of all the models in the language you have specified for the given project ID. Model attributes like trainingStatus, trainingTime, etc. are described in the next section.
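To fetch more than one page, the same request can be repeated with an increasing pageNumber. The sketch below assumes PROJECT_ID, LANGUAGE and AUTHORIZATION_TOKEN are set as above. The function names and the MAX_PAGES cap are hypothetical. Because the response format of this endpoint is not shown here, the loop stops after a fixed number of pages rather than detecting the last page.

```shell
# Build the JSON body for one page of the list-models request.
# Arguments: page number, page size (defaults to 10).
# Assumes PROJECT_ID and LANGUAGE contain no characters that need JSON escaping.
build_list_payload() {
  printf '{"filter": {"projectId": "%s", "language": "%s"}, "pageNumber": %d, "pageSize": %d}' \
    "${PROJECT_ID}" "${LANGUAGE}" "$1" "${2:-10}"
}

# Fetch a single page of models and print the raw JSON response.
list_models_page() {
  curl --silent --location --request POST \
    'https://platform.neuralspace.ai/api/nlu/v1/list/model' \
    --header 'Accept: application/json, text/plain, */*' \
    --header 'Content-Type: application/json;charset=UTF-8' \
    --header "Authorization: ${AUTHORIZATION_TOKEN}" \
    --data-raw "$(build_list_payload "$1" "${2:-10}")"
}

# Walk the first MAX_PAGES pages (hypothetical cap, default 3),
# printing one response per line.
list_first_pages() {
  page=1
  while [ "$page" -le "${MAX_PAGES:-3}" ]; do
    list_models_page "$page" 10
    echo
    page=$((page + 1))
  done
}
```

For example, MAX_PAGES=5 list_first_pages would request pages 1 through 5 with 10 models each.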

Get Single Model

Use this API to fetch a single model and its attributes.

MODEL_ID="YOUR-MODEL-ID"

curl --location --request GET "https://platform.neuralspace.ai/api/nlu/v1/model?modelId=${MODEL_ID}" \
--header "Authorization: ${AUTHORIZATION_TOKEN}"

This will return a JSON object in the following format:

Model Details
{
"success": true,
"message": "Model status fetched",
"data": {
"name": "...",
"appType": "nlu",
"projectId": "...",
"apikey": "...",
"createdBy": {
"email": "...",
"role": "provider",
"apikey": "...",
"referenceKey": "..."
},
"active": true,
"status": "active",
"createdAt": 1620059929722,
"updatedAt": 1620059929722,
"modelId": "...",
"replicas": 0,
"trainingStatus": "Completed",
"lastStatusUpdateAt": "2021-05-04T09:28:45.672Z",
"trainingProgress": [
"Initiated",
"Pipeline Building",
"Queued",
"Pipeline Built",
"Preparing Data",
"Data Prepared",
"Training",
"Trained",
"Saved",
"Completed"
],
"examplesPerIntent": {
"SOME-INTENT": NUMBER-OF-EXAMPLES-FOR-THIS-INTENT,
...
},
"metrics": {
"intentClassifierPerformance": {
"i_acc": 0.9894935488700867,
"i_f1": 0.9894935488700867
},
"nerPerformance": {
"e_f1_strict": 0.971139669418335,
"e_f1_partial": 0.971139669418335
}
},
"language": ".."
},
"timestamp": 1620120623094
}

Description of Fields

| Field | Description |
| --- | --- |
| name | Name of the model. |
| appType | This will always be nlu. |
| projectId | The ID of the project this model belongs to. |
| apikey | Your API key. |
| createdAt | Timestamp of when this model was created. |
| updatedAt | Timestamp of when this model was updated. |
| modelId | A unique ID for your model. |
| replicas | How many replicas of this model are deployed on our platform. Multiple replicas ensure higher throughput and higher availability. |
| trainingStatus | The current training status. |
| lastStatusUpdateAt | During training, this field is updated with a timestamp whenever the status changes. |
| trainingProgress | A list of all training statuses this model has gone through. |
| examplesPerIntent | The distribution of your training dataset. Keys in this dictionary are intents and values are the number of examples you have in the training set for that intent. |
| metrics | When you have test examples in a project, the model is evaluated on them. Here you will find some metrics that we calculate to gauge the performance of the model. These numbers are all zeros if you don't upload any test examples. |
| metrics.intentClassifierPerformance | The metrics for the intent classifier. |
| metrics.intentClassifierPerformance.i_acc | The fraction of test examples for which AutoNLP predicted the right intent. |
| metrics.intentClassifierPerformance.i_f1 | [For advanced users only] The macro-averaged F1 score. |
| metrics.nerPerformance | The metrics for entity recognition. |
| metrics.nerPerformance.e_f1_strict | F1 score that counts a prediction as correct only on an exact boundary match of the surface string and a matching entity type. |
| metrics.nerPerformance.e_f1_partial | F1 score that accepts a partial boundary match over the surface string, regardless of the entity type. |
| language | The language this model was trained for. |
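As a worked illustration of macro averaging, the sketch below computes one F1 score per intent and then takes their unweighted mean. It follows the standard textbook definition and is not taken from the platform's implementation. The macro_f1 helper, the input format and the counts are made up for this example.

```shell
# Compute the macro-averaged F1 from per-intent counts read from stdin.
# Each input line: <intent> <true_positives> <false_positives> <false_negatives>
macro_f1() {
  awk '
    {
      tpos = $2; fpos = $3; fneg = $4
      denom = 2 * tpos + fpos + fneg
      # Per-intent F1 = 2*TP / (2*TP + FP + FN); treated as 0 when undefined.
      f1 = (denom > 0) ? (2 * tpos) / denom : 0
      sum += f1
      n += 1
    }
    # Macro average: unweighted mean of the per-intent F1 scores.
    END { if (n > 0) printf "%.4f\n", sum / n }
  '
}

# Made-up counts for two intents, purely for illustration.
printf '%s\n' 'greet 8 2 0' 'book_flight 5 0 5' | macro_f1
```

With these counts the per-intent F1 scores are 16/18 and 10/15, so the command prints 0.7778. Every intent contributes equally to the average, however many test examples it has.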

When running machine learning models for entity recognition, it is common to report metrics (precision, recall and F1 score) at the individual token level. This may not be the best approach, as a named entity can be made up of multiple tokens. At the same time, regular NER schemes tend to ignore the possibility of partial matches, that is, cases where the entity recognition system gets the named-entity surface string correct but the type wrong.

Update Model Name

This API lets you rename a model.

Request
MODEL_ID="YOUR-MODEL-ID"

curl --location --request PUT 'https://platform.neuralspace.ai/api/nlu/v1/model' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"modelId\": \"${MODEL_ID}\",
\"modelName\": \"New Name\"
}"

Delete Model

Delete your models using this API.

curl --location --request DELETE 'https://platform.neuralspace.ai/api/nlu/v1/model' \
--header 'Accept: application/json, text/plain, */*' \
--header 'Content-Type: application/json;charset=UTF-8' \
--header "Authorization: ${AUTHORIZATION_TOKEN}" \
--data-raw "{
\"modelId\": \"${MODEL_ID}\"
}"