Pipeline Classifier — Guide to Core ML Tools (original) (raw)

Pipeline Classifier#

This example creates a model which can be used to train a simple drawing or sketch classifier based on user examples. The model is a pipeline composed of a drawing-embedding model and a nearest-neighbor classifier.

The model is updatable and starts off empty, meaning that the nearest-neighbor classifier has no examples or labels. Before updating with training examples, the model predicts “unknown” for all input.

The input to the model is a 28 x 28 grayscale drawing. The background is expected to be black (0), while the strokes of the drawing should be rendered as white (255). Right-click the sample 28 x 28 images to save them for use with the following example.
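The input convention described above (black background, white strokes) is the opposite of a typical pencil-on-paper scan. As a minimal sketch, assuming the source drawing is a 28 x 28 grid of 0..255 values with dark strokes on a light background, the conversion could look like this. The helper name `to_model_convention` is hypothetical and not part of Core ML Tools:

```python
def to_model_convention(pixels):
    """Invert a dark-on-light 28 x 28 grayscale drawing.

    `pixels` is a list of 28 rows, each a list of 28 ints in 0..255.
    Returns a new grid with a black (0) background and white (255) strokes,
    assuming the input used white for the background and black for strokes.
    """
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28 x 28 grid")
    return [[255 - value for value in row] for row in pixels]


# Usage: a white page with a black diagonal stroke (illustrative data only).
page = [[255] * 28 for _ in range(28)]
for i in range(28):
    page[i][i] = 0

drawing = to_model_convention(page)
```

After the conversion, the diagonal pixels hold 255 and every other pixel holds 0, which matches the layout the embedding model expects.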

Get the Embedding Model#

The drawing-embedding model is used as a feature extractor. Start by loading this first part of the pipeline and getting its spec:

import coremltools
from coremltools.models import MLModel

# Load the pretrained drawing-embedding model
embedding_path = './models/TinyDrawingEmbedding.mlmodel'
embedding_model = MLModel(embedding_path)

# Get the protobuf spec and print its description
embedding_spec = embedding_model.get_spec()
print(embedding_spec.description)

In the following output, the shortDescription indicates that the embedding model takes in a 28 x 28 grayscale image and outputs a 128-dimensional float vector:

tf.estimator package not installed.
input {
  name: "drawing"
  shortDescription: "Input sketch image with black background and white strokes"
  type {
    imageType {
      width: 28
      height: 28
      colorSpace: GRAYSCALE
    }
  }
}
output {
  name: "embedding"
  shortDescription: "Vector embedding of sketch in 128 dimensional space"
  type {
    multiArrayType {
      shape: 128
      dataType: FLOAT32
    }
  }
}
metadata {
  shortDescription: "Embeds a 28 x 28 grayscale image of a sketch into 128 dimensional space. The model was created by removing the last layer of a simple convolution based neural network classifier trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)."
  author: "Core ML Tools Example"
  license: "MIT"
}

Create the Nearest Neighbor Classifier#

Now that the feature extractor is in place, create the second model of the pipeline: a nearest-neighbor classifier that operates on the embedding:

from coremltools.models.nearest_neighbors import KNearestNeighborsClassifierBuilder
import coremltools.models.datatypes as datatypes

# Build an empty k-NN classifier over the 128-dimensional embedding
knn_builder = KNearestNeighborsClassifierBuilder(input_name='embedding',
                                                 output_name='label',
                                                 number_of_dimensions=128,
                                                 default_class_label='unknown',
                                                 k=3,
                                                 weighting_scheme='inverse_distance',
                                                 index_type='linear')

knn_builder.author = 'Core ML Tools Example'
knn_builder.license = 'MIT'
knn_builder.description = 'Classifies 128 dimension vector based on 3 nearest neighbors'

# Describe the classifier's input and outputs
knn_spec = knn_builder.spec
knn_spec.description.input[0].shortDescription = 'Input vector to classify'
knn_spec.description.output[0].shortDescription = "Predicted label. Defaults to 'unknown'"
knn_spec.description.output[1].shortDescription = 'Probabilities / score for each possible label.'

print(knn_spec.description)
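To make the builder settings above more concrete, here is a rough pure-Python sketch of what a 3-nearest-neighbor classifier with inverse-distance weighting, a linear index, and a default label might compute. It is an illustration only, not Core ML's implementation: the Euclidean distance, the small epsilon, the score normalization, and the function name `knn_predict` are all assumptions made for this example.

```python
import math
from collections import defaultdict


def knn_predict(query, examples, k=3, default_label="unknown"):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest examples.

    `examples` is a list of (vector, label) pairs. Returns (label, scores),
    where `scores` maps each voting label to its normalized weight.
    """
    # An empty classifier has nothing to vote with, so it returns the default.
    if not examples:
        return default_label, {}

    # Linear index: measure the distance to every stored example, keep the k closest.
    nearest = sorted(
        ((math.dist(query, vector), label) for vector, label in examples),
        key=lambda pair: pair[0],
    )[:k]

    # Closer neighbors contribute larger weights (epsilon avoids division by zero).
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += 1.0 / (distance + 1e-9)

    total = sum(votes.values())
    scores = {label: weight / total for label, weight in votes.items()}
    return max(scores, key=scores.get), scores


# Usage with made-up 2-dimensional vectors (the real embedding has 128 dimensions).
examples = [((0.0, 0.0), "circle"), ((0.0, 1.0), "circle"),
            ((5.0, 5.0), "square"), ((5.0, 6.0), "square")]

empty_label, _ = knn_predict((0.2, 0.1), [])
near_circle, circle_scores = knn_predict((0.2, 0.1), examples)
near_square, _ = knn_predict((5.0, 5.4), examples)
```

With no stored examples the sketch falls back to the default label, which mirrors the "unknown" prediction the empty model gives before it is updated.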

Create an Updatable Pipeline Model#

The last step is to create the pipeline model and insert the feature extractor and the nearest-neighbor classifier. The model will be set to be updatable. Follow these steps:

  1. Create the spec, set it to be updatable, and set the specification version:
    pipeline_spec = coremltools.proto.Model_pb2.Model()
    pipeline_spec.specificationVersion = coremltools._MINIMUM_UPDATABLE_SPEC_VERSION
    pipeline_spec.isUpdatable = True
  2. Set the inputs to the inputs from the embedding model:
    # Inputs are the inputs from the embedding model
    pipeline_spec.description.input.extend(embedding_spec.description.input[:])
  3. Set the outputs to the outputs from the classification model:
    # Outputs are the outputs from the classification model
    pipeline_spec.description.output.extend(knn_spec.description.output[:])
    pipeline_spec.description.predictedFeatureName = knn_spec.description.predictedFeatureName
    pipeline_spec.description.predictedProbabilitiesName = knn_spec.description.predictedProbabilitiesName
  4. Set the training inputs:
    # Training inputs
    pipeline_spec.description.trainingInput.extend([embedding_spec.description.input[0]])
    pipeline_spec.description.trainingInput[0].shortDescription = 'Example sketch'
    pipeline_spec.description.trainingInput.extend([knn_spec.description.output[0]])
    pipeline_spec.description.trainingInput[1].shortDescription = 'Associated true label of example sketch'
  5. Provide the metadata:
    # Provide metadata
    pipeline_spec.description.metadata.author = 'Core ML Tools'
    pipeline_spec.description.metadata.license = 'MIT'
    pipeline_spec.description.metadata.shortDescription = ('An updatable model which can be used to train a tiny 28 x 28 drawing classifier based on user examples.'
                                                           ' It uses a drawing embedding trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)')
  6. Construct the pipeline by adding the embedding and the nearest-neighbor classifier:
    # Construct pipeline by adding the embedding and then the nearest neighbor classifier
    pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(embedding_spec)
    pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(knn_spec)
  7. Save the updated spec:
    # Save the updated spec
    from coremltools.models import MLModel
    mlmodel = MLModel(pipeline_spec)
    output_path = './TinyDrawingClassifier.mlmodel'
    mlmodel.save(output_path)
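Once the file is saved, it can be worth reading the spec back to confirm that the pipeline has the intended shape. The following is a minimal sketch, assuming the model was written to './TinyDrawingClassifier.mlmodel' as above. The helper names `describe_pipeline` and `describe_saved_model` are hypothetical. The sketch relies on `load_spec` from `coremltools.models.utils` and on the standard protobuf `WhichOneof` method to read which model type each stage holds:

```python
def describe_pipeline(spec):
    """Summarize a pipeline-classifier spec as a plain dict."""
    return {
        # True if the model was marked as updatable
        "updatable": bool(spec.isUpdatable),
        # Which model type the top-level spec holds
        "model_type": spec.WhichOneof("Type"),
        # Model type of each stage, in pipeline order
        "stages": [model.WhichOneof("Type")
                   for model in spec.pipelineClassifier.pipeline.models],
        # Names of the features required when updating the model
        "training_inputs": [feature.name
                            for feature in spec.description.trainingInput],
    }


def describe_saved_model(path="./TinyDrawingClassifier.mlmodel"):
    """Load a saved .mlmodel file and summarize it (requires coremltools)."""
    # Imported here so describe_pipeline stays usable without coremltools installed.
    from coremltools.models.utils import load_spec
    return describe_pipeline(load_spec(path))
```

Calling `describe_saved_model()` after the save step returns a dict that can be checked by eye or in a test, for example to confirm that the pipeline contains two stages and that the model is flagged as updatable.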