Pipeline Classifier: Guide to Core ML Tools
Pipeline Classifier#
This example creates a model which can be used to train a simple drawing or sketch classifier based on user examples. The model is a pipeline composed of a drawing-embedding model and a nearest-neighbor classifier.
The model is updatable and starts off empty, meaning that the nearest-neighbor classifier has no examples or labels. Before updating with training examples, the model predicts “unknown” for all input.
The input to the model is a 28 x 28 grayscale drawing. The background is expected to be black (0), while the strokes of the drawing should be rendered as white (255). Right-click these 28 x 28 images for the following example:
- Drawing of a star:
- Drawing of a heart:
- Drawing of 5:
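As a rough illustration of the expected input format, the following pure-Python sketch builds a 28 x 28 canvas with a black background (0) and white strokes (255). The helper names `blank_canvas`, `draw_line`, and `invert` are hypothetical and are not part of Core ML Tools. The canvas is a plain list of rows rather than a real image object.

```python
SIZE = 28         # the model expects 28 x 28 drawings
BACKGROUND = 0    # black background
STROKE = 255      # white strokes


def blank_canvas(size=SIZE):
    """Return a size x size grid filled with the background value."""
    return [[BACKGROUND] * size for _ in range(size)]


def draw_line(canvas, start, end):
    """Rasterize a straight stroke using simple linear interpolation."""
    (x0, y0), (x1, y1) = start, end
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        canvas[y][x] = STROKE


def invert(canvas):
    """Flip polarity, e.g. to convert a black-on-white drawing."""
    return [[255 - pixel for pixel in row] for row in canvas]


# Draw an "X" as two diagonal strokes.
canvas = blank_canvas()
draw_line(canvas, (4, 4), (23, 23))
draw_line(canvas, (23, 4), (4, 23))
```

If a drawing was made with dark strokes on a light background, a polarity flip such as `invert` above would be needed before passing it to the model.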
Get the Embedding Model#
The drawing-embedding model is used as a feature extractor. Start by loading the embedding model, the first part of the pipeline, and getting its spec:
import coremltools
from coremltools.models import MLModel

# Load the drawing-embedding model and read its spec.
embedding_path = './models/TinyDrawingEmbedding.mlmodel'
embedding_model = MLModel(embedding_path)
embedding_spec = embedding_model.get_spec()
print(embedding_spec.description)
In the following output, the shortDescription indicates that the embedding model takes in a 28 x 28 grayscale image and outputs a 128-dimensional float vector:
tf.estimator package not installed.
tf.estimator package not installed.
input {
  name: "drawing"
  shortDescription: "Input sketch image with black background and white strokes"
  type {
    imageType {
      width: 28
      height: 28
      colorSpace: GRAYSCALE
    }
  }
}
output {
  name: "embedding"
  shortDescription: "Vector embedding of sketch in 128 dimensional space"
  type {
    multiArrayType {
      shape: 128
      dataType: FLOAT32
    }
  }
}
metadata {
  shortDescription: "Embeds a 28 x 28 grayscale image of a sketch into 128 dimensional space. The model was created by removing the last layer of a simple convolution based neural network classifier trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)."
  author: "Core ML Tools Example"
  license: "MIT"
}
Create the Nearest Neighbor Classifier#
Now that the feature extractor is in place, create the second model of your pipeline model. It is a nearest-neighbor classifier operating on the embedding:
from coremltools.models.nearest_neighbors import KNearestNeighborsClassifierBuilder
import coremltools.models.datatypes as datatypes

# Build an empty classifier over the 128-dimensional embedding.
knn_builder = KNearestNeighborsClassifierBuilder(input_name='embedding',
                                                 output_name='label',
                                                 number_of_dimensions=128,
                                                 default_class_label='unknown',
                                                 k=3,
                                                 weighting_scheme='inverse_distance',
                                                 index_type='linear')

knn_builder.author = 'Core ML Tools Example'
knn_builder.license = 'MIT'
knn_builder.description = 'Classifies 128 dimension vector based on 3 nearest neighbors'

knn_spec = knn_builder.spec
knn_spec.description.input[0].shortDescription = 'Input vector to classify'
knn_spec.description.output[0].shortDescription = "Predicted label. Defaults to 'unknown'"
knn_spec.description.output[1].shortDescription = 'Probabilities / score for each possible label.'

print(knn_spec.description)
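The builder arguments above (k=3, weighting_scheme='inverse_distance', default_class_label='unknown') can be read roughly as in the following pure-Python sketch. It is only an illustration of the general technique, not Core ML's implementation: the function name `knn_predict`, the small epsilon that avoids division by zero, and the 2-dimensional example vectors are all assumptions made for brevity.

```python
import math
from collections import defaultdict


def knn_predict(examples, query, k=3, default_label="unknown"):
    """Classify `query` from (vector, label) examples with inverse-distance voting.

    Returns (predicted_label, scores_per_label). With no examples, the
    default label is returned, mirroring the empty model's behavior.
    """
    if not examples:
        return default_label, {}

    def distance(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, query)))

    # Linear scan over all examples, keeping the k closest ones.
    nearest = sorted(
        ((distance(vec), label) for vec, label in examples),
        key=lambda pair: pair[0],
    )[:k]

    # Each neighbor votes with weight 1 / distance (epsilon is an assumption).
    scores = defaultdict(float)
    for dist, label in nearest:
        scores[label] += 1.0 / (dist + 1e-9)

    best = max(scores, key=scores.get)
    return best, dict(scores)


examples = [([0.0, 0.0], "star"), ([0.0, 1.0], "star"), ([5.0, 5.0], "heart")]
empty_label, _ = knn_predict([], [0.0, 0.0])
near_star, _ = knn_predict(examples, [0.0, 0.2])
near_heart, _ = knn_predict(examples, [5.0, 5.1])
```

In the last query, two of the three neighbors are labeled "star", but the single "heart" example is much closer, so its inverse-distance weight dominates. This is the practical difference from a plain majority vote.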
Create an Updatable Pipeline Model#
The last step is to create the pipeline model and insert the feature extractor and the nearest-neighbor classifier. The model will be set to be updatable. Follow these steps:
1. Create the spec, set it to be updatable, and set the specification version:
pipeline_spec = coremltools.proto.Model_pb2.Model()
pipeline_spec.specificationVersion = coremltools._MINIMUM_UPDATABLE_SPEC_VERSION
pipeline_spec.isUpdatable = True
2. Set the inputs to the inputs from the embedding model:
# Inputs are the inputs from the embedding model
pipeline_spec.description.input.extend(embedding_spec.description.input[:])
3. Set the outputs to the outputs from the classification model:
# Outputs are the outputs from the classification model
pipeline_spec.description.output.extend(knn_spec.description.output[:])
pipeline_spec.description.predictedFeatureName = knn_spec.description.predictedFeatureName
pipeline_spec.description.predictedProbabilitiesName = knn_spec.description.predictedProbabilitiesName
4. Set the training inputs:
# Training inputs
pipeline_spec.description.trainingInput.extend([embedding_spec.description.input[0]])
pipeline_spec.description.trainingInput[0].shortDescription = 'Example sketch'
pipeline_spec.description.trainingInput.extend([knn_spec.description.output[0]])
pipeline_spec.description.trainingInput[1].shortDescription = 'Associated true label of example sketch'
5. Provide the metadata:
# Provide metadata
pipeline_spec.description.metadata.author = 'Core ML Tools'
pipeline_spec.description.metadata.license = 'MIT'
pipeline_spec.description.metadata.shortDescription = ('An updatable model which can be used to train a tiny 28 x 28 drawing classifier based on user examples.'
' It uses a drawing embedding trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)')
6. Construct the pipeline by adding the embedding and the nearest-neighbor classifier:
# Construct pipeline by adding the embedding and then the nearest neighbor classifier
pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(embedding_spec)
pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(knn_spec)
7. Save the updated spec:
# Save the updated spec.
from coremltools.models import MLModel
mlmodel = MLModel(pipeline_spec)
output_path = './TinyDrawingClassifier.mlmodel'
mlmodel.save(output_path)
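To make the structure of the saved model more concrete, here is a toy pure-Python analogue of the two-stage updatable pipeline built in the steps above. It does not use Core ML. `toy_embed` is a hypothetical stand-in for the real embedding model, `ToyUpdatablePipeline` is an invented name, the drawings are 4 x 4 instead of 28 x 28, and a single nearest neighbor is used instead of k=3 to keep the sketch short.

```python
def toy_embed(drawing):
    """Hypothetical stand-in for the embedding model: mean brightness per row."""
    return [sum(row) / (255.0 * len(row)) for row in drawing]


class ToyUpdatablePipeline:
    """Frozen feature extractor followed by an updatable nearest-neighbor stage."""

    def __init__(self, embed, default_label="unknown"):
        self.embed = embed            # first stage, never changed by updates
        self.examples = []            # second stage starts empty
        self.default_label = default_label

    def update(self, drawing, label):
        """Training inputs: an example sketch and its associated true label."""
        self.examples.append((self.embed(drawing), label))

    def predict(self, drawing):
        """Embed the drawing, then return the label of the closest example."""
        if not self.examples:
            return self.default_label
        query = self.embed(drawing)

        def squared_distance(vec):
            return sum((a - b) ** 2 for a, b in zip(vec, query))

        _, label = min(self.examples, key=lambda ex: squared_distance(ex[0]))
        return label


top_bar = [[255] * 4, [0] * 4, [0] * 4, [0] * 4]
bottom_bar = [[0] * 4, [0] * 4, [0] * 4, [255] * 4]

model = ToyUpdatablePipeline(toy_embed)
before_update = model.predict(top_bar)      # no examples yet

model.update(top_bar, "top")
model.update(bottom_bar, "bottom")
after_update = model.predict([[255, 255, 255, 0], [0] * 4, [0] * 4, [0] * 4])
```

As in the real pipeline, updating only adds labeled examples to the classifier stage. The embedding stage stays fixed, which is why the training inputs are a drawing and a label rather than an embedding vector.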