Debugging And Performance Utilities#
These utilities help identify and resolve both numerical and performance issues in exported Core ML models. When a model produces unexpected outputs, such as NaNs, infinities, or results that differ from the source model, or when it exhibits performance bottlenecks, these tools assist in isolating the problematic operations. Once identified, targeted fixes can be applied to correct these issues, improving both the accuracy and efficiency of the model.
Experimental
These APIs are currently located under the experimental namespace, which means they may change or become incompatible with previous versions in future releases. They will remain in this namespace until they have been refined and are ready to be promoted to stable APIs.
MLModelInspector#
MLModelInspector is a utility class that retrieves intermediate outputs from a Core ML model by modifying the model to expose specified internal operations as model outputs. MLModelInspector can be used to debug a model and is utilized by both MLModelComparator and MLModelValidator.
For example, to retrieve the output of a convolution operation identified by its output name var_1, you can use the following:
import coremltools as ct
import numpy as np

from coremltools.models.ml_program.experimental.debugging_utils import MLModelInspector

# Initialize the MLModelInspector.
inspector = MLModelInspector(
    model=model,
    compute_units=ct.ComputeUnit.CPU_ONLY,
)

# Use the inspector to retrieve intermediate outputs from the model.
# `inputs` specifies the input data for the model, and `output_names` lists the
# internal operations (e.g., variables) whose outputs you want to inspect.
outputs = await inspector.retrieve_outputs(
    inputs={"input": np.array([1, 2, 3])},  # Input data for the model
    output_names=["var_1"],                 # Name of the intermediate variable to retrieve
)

# Print the retrieved output for the specified internal operation ("var_1").
print(outputs["var_1"])
MLModelValidator#
If an exported Core ML model produces unexpected outputs, such as infinities or NaNs, the MLModelValidator can assist in identifying and isolating the problematic operations within the ML program.

For example, if an exported Core ML model produces NaN values as output, the find_failing_ops_with_nan_output method can be used to identify the specific operations responsible for generating NaNs in the model's output.
import coremltools as ct
import numpy as np

from coremltools.models.ml_program.experimental.debugging_utils import MLModelValidator

# Initialize MLModelValidator.
validator = MLModelValidator(
    model=model,
    compute_unit=ct.ComputeUnit.CPU_ONLY,
)

# Find the ops that are responsible for the NaN output.
failing_ops = await validator.find_failing_ops_with_nan_output(
    inputs={"input": np.array([1, 2, 3])}  # Inputs that produce NaN output
)

print(failing_ops)
If the exported Core ML model produces infinity values as outputs, the find_failing_ops_with_infinite_output method can be used to identify the specific operations responsible for generating infinities in the model's output.
import coremltools as ct
import numpy as np

from coremltools.models.ml_program.experimental.debugging_utils import MLModelValidator

# Initialize MLModelValidator.
validator = MLModelValidator(
    model=model,
    compute_unit=ct.ComputeUnit.CPU_ONLY,
)

# Find the ops that are responsible for the infinite output.
failing_ops = await validator.find_failing_ops_with_infinite_output(
    inputs={"input": np.array([1, 2, 3])}  # Inputs that produce infinities in the output
)

print(failing_ops)
MLModelValidator also supports passing a custom validation function, enabling more tailored debugging for specific use cases.
import coremltools as ct
import numpy as np

from coremltools.models.ml_program.experimental.debugging_utils import MLModelValidator
from coremltools import proto

# Initialize MLModelValidator.
validator = MLModelValidator(
    model=model,
    compute_unit=ct.ComputeUnit.CPU_ONLY,
)

def validate_output(op: proto.MIL_pb2.Operation, value: np.ndarray):
    # Check if the output is all zeros.
    return np.all(value == 0)

# Find the ops that are responsible for the unexpected output.
failing_ops = await validator.find_failing_ops(
    validate_output=validate_output,
    inputs={"input": np.array([1, 2, 3])}  # Inputs that produce the unexpected output
)

print(failing_ops)
After identifying the problematic operations, the issue may stem from either division by zero or numerical overflow. For division by zero, the source model can be updated to address the problem directly. In cases of overflow, employing higher precision for the affected operations is often sufficient to resolve the issue.
Note: The process of identifying failing operations may be time-consuming, as the duration depends on the model’s complexity.
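For the overflow case, one option is to keep the entire ML program in float32 when converting. The following is a minimal sketch and not taken from this guide: the traced_model variable and the input specification are placeholders that you would replace with your own source model and inputs.

import coremltools as ct
import numpy as np

# Convert while keeping all intermediate tensors in float32, instead of the
# default float16 compute precision used for ML programs.
ml_model_fp32 = ct.convert(
    traced_model,  # Placeholder: your traced or exported source model
    inputs=[ct.TensorType(name="input", shape=(1, 3), dtype=np.float32)],
    compute_precision=ct.precision.FLOAT32,
    minimum_deployment_target=ct.target.iOS16,
)

If the float32 model behaves correctly, it can also serve as the reference model for MLModelComparator, described below.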
MLModelComparator#
MLModelComparator is a utility designed to compare reference and target models derived from the same source model. It is particularly useful in scenarios where an exported Core ML model produces unexpected outputs on specific compute units or when using a particular precision (such as float16). By comparing the outputs of a reference model and a target model, MLModelComparator helps identify the operations responsible for these discrepancies.
For example, if an exported Core ML model produces correct outputs when using float32 precision but generates unexpected outputs with float16 precision, you can use MLModelComparator to identify the operations responsible for the discrepancies.
import coremltools as ct
import numpy as np

from coremltools.models.ml_program.experimental.debugging_utils import MLModelComparator

# Initialize MLModelComparator to compare reference and target models.
comparator = MLModelComparator(
    reference_model=reference_model,  # Model with expected behavior
    target_model=target_model,        # Model to be debugged
)

# Define a custom comparison function to evaluate output discrepancies.
def compare_outputs(operation, reference_output, target_output):
    # Compare outputs with a tolerance of 1e-1.
    # Return True if outputs are close, False otherwise.
    return np.allclose(reference_output, target_output, atol=1e-1)

# Identify operations causing discrepancies between models.
failing_ops = await comparator.find_failing_ops(
    inputs={"input": np.array([1, 2, 3])},  # Sample input for comparison
    compare_outputs=compare_outputs         # Custom comparison function
)

print(failing_ops)
After identifying the problematic operations, the issue might be related to the precision of those operations. In such cases, you can resolve the problem by using higher precision (e.g., float32) for the operations when exporting the model.
Note: The process of identifying failing operations may be time-consuming, as the duration depends on the model’s complexity.
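If only a handful of operations are affected, you can keep just those in float32 rather than the whole model. Below is a minimal sketch; the op names, the traced_model variable, and the name-based op_selector matching are assumptions you would adapt to your own model.

import coremltools as ct

# Output names of the ops identified by MLModelComparator (placeholders).
ops_to_keep_in_fp32 = ["var_1", "var_2"]

# Cast every op to float16 except the ones identified as failing;
# those remain in float32.
ml_model_mixed = ct.convert(
    traced_model,  # Placeholder: your traced or exported source model
    compute_precision=ct.transform.FP16ComputePrecision(
        op_selector=lambda op: op.name not in ops_to_keep_in_fp32
    ),
    minimum_deployment_target=ct.target.iOS16,
)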
TorchScriptMLModelComparator#
TorchScriptMLModelComparator is a utility designed to compare the outputs of a torch module and its corresponding exported Core ML model. It utilizes torch.jit.trace to convert the PyTorch model into a TorchScript representation, which is then converted into a Core ML model. This utility is useful for debugging cases where inconsistent outputs occur during the conversion process from PyTorch to Core ML using TorchScript. It helps to identify specific PyTorch modules that produce inconsistent results between the original torch model and the converted Core ML model.
Before employing this utility, first verify if the float32 precision model produces consistent results. If it does, it's preferable to use MLModelComparator with the float32 model as the reference and the problematic model as the target. TorchScriptMLModelComparator operates at the module level, which may require additional steps to pinpoint specific problematic operations.
For example, to find the modules that produce inconsistent results, you can use the following:
import coremltools as ct
import numpy as np
import torch

from coremltools.models.ml_program.experimental.torch.debugging_utils import TorchScriptMLModelComparator

# Define a simple PyTorch model.
class Model(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# Create an instance of the model.
torch_model = Model()

# Prepare example inputs for the model.
input1 = torch.full((1, 10), 1, dtype=torch.float)
input2 = torch.full((1, 10), 2, dtype=torch.float)
inputs = (input1, input2)

# Initialize the TorchScriptMLModelComparator.
comparator = TorchScriptMLModelComparator(
    model=torch_model,
    example_inputs=inputs,  # Inputs used to trace the PyTorch model
    inputs=[  # Define input tensor specifications for Core ML
        ct.TensorType(name="x", shape=inputs[0].shape, dtype=np.float32),
        ct.TensorType(name="y", shape=inputs[1].shape, dtype=np.float32),
    ],
    compute_unit=ct.ComputeUnit.CPU_ONLY,
)

# Define a custom comparison function.
def compare_outputs(module, reference_output, target_output):
    # Compare outputs with a tolerance of 0.1.
    return np.allclose(reference_output, target_output, atol=1e-1)

# Use the comparator to find failing modules.
modules = await comparator.find_failing_modules(
    inputs=inputs,
    compare_outputs=compare_outputs
)

# Print the modules that failed the comparison.
print(modules)
TorchExportMLModelComparator#
TorchExportMLModelComparator is a utility designed to compare the outputs of a torch module and its corresponding exported Core ML model. It utilizes torch.export.export to convert the PyTorch model into an ExportedProgram, which is then converted into a Core ML model. This utility is useful for debugging cases where inconsistent outputs occur during the conversion process from PyTorch to Core ML using torch.export.export. It helps to identify specific PyTorch operations that produce inconsistent results between the original torch model and the converted Core ML model.
Before employing this utility, first verify if the float32 precision model produces consistent results. If it does, it's preferable to use MLModelComparator with the float32 model as the reference and the problematic model as the target.
For example, to find the operations that produce inconsistent results, you can use the following:
import coremltools as ct
import numpy as np
import torch

from coremltools.models.ml_program.experimental.torch.debugging_utils import TorchExportMLModelComparator

# Define a simple PyTorch model.
class Model(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# Create an instance of the model.
torch_model = Model()

# Prepare example inputs for the model.
input1 = torch.full((1, 10), 1, dtype=torch.float)
input2 = torch.full((1, 10), 2, dtype=torch.float)
inputs = (input1, input2)
exported_program = torch.export.export(torch_model, inputs)

# Initialize the TorchExportMLModelComparator.
comparator = TorchExportMLModelComparator(
    model=exported_program,
    inputs=[  # Define input tensor specifications for Core ML
        ct.TensorType(name="x", shape=inputs[0].shape, dtype=np.float32),
        ct.TensorType(name="y", shape=inputs[1].shape, dtype=np.float32),
    ],
    compute_unit=ct.ComputeUnit.CPU_ONLY,
)

# Define a custom comparison function.
def compare_outputs(operation, reference_output, target_output):
    # Compare outputs with a tolerance of 0.1.
    return np.allclose(reference_output, target_output, atol=1e-1)

# Use the comparator to find failing operations.
operations = await comparator.find_failing_ops(
    inputs=inputs,
    compare_outputs=compare_outputs
)

# Print the ops that failed the comparison.
print(operations)
MLModelBenchmarker#
MLModelBenchmarker is a utility for analyzing the performance of Core ML models. It measures key metrics such as model loading time, prediction latency, and the execution times of individual operations.
For example, to benchmark a model’s load and prediction performance, you can use the following:
from coremltools.models.ml_program.experimental.perf_utils import MLModelBenchmarker

# Initialize the MLModelBenchmarker with the Core ML model.
benchmarker = MLModelBenchmarker(model=model)

# Benchmark the model's loading time over 5 iterations.
# This measures how long it takes to load the model.
load_measurement = await benchmarker.benchmark_load(iterations=5)

# Print the median loading time from the benchmark results.
print(load_measurement.statistics.median)

# Benchmark the model's prediction time over 5 iterations with a warmup phase.
# The warmup ensures that any initialization overhead is excluded from the measurements.
predict_measurement = await benchmarker.benchmark_predict(iterations=5, warmup=True)

# Print the median prediction time from the benchmark results.
print(predict_measurement.statistics.median)
To evaluate the execution performance of operations, you can use the following:
from coremltools.models.ml_program.experimental.perf_utils import MLModelBenchmarker

# Initialize the MLModelBenchmarker with the Core ML model.
benchmarker = MLModelBenchmarker(model=model)

# Benchmark operation execution times over 5 iterations with a warmup phase.
# The warmup ensures that any initialization overhead is excluded from the measurements.
execution_time_measurements = await benchmarker.benchmark_operation_execution(iterations=5, warmup=True)

# Print the median execution time of the most time-consuming operation.
# The operations are sorted in descending order of execution time.
print(f"Median execution time of the slowest operation: {execution_time_measurements[0].statistics.median} seconds")
Note: MLModelBenchmarker utilizes the model's compute plan to estimate the execution time of individual operations within the model.
TorchMLModelBenchmarker#
TorchMLModelBenchmarker is a specialized benchmarking tool designed for PyTorch models. It extends the capabilities of MLModelBenchmarker to offer tailored performance analysis for PyTorch models. While retaining all the functionality of its parent class, TorchMLModelBenchmarker introduces additional methods to estimate execution times for individual torch nodes and modules.
For example, to benchmark the execution time of individual nodes in the PyTorch model, you can use the following:
import coremltools as ct
import numpy as np
import torch

from coremltools.models.ml_program.experimental.torch.perf_utils import TorchMLModelBenchmarker

# Define a simple PyTorch model.
class Model(torch.nn.Module):
    def forward(self, x, y):
        # Perform addition and subtraction on inputs x and y.
        return (x + y, x - y)

# Create an instance of the PyTorch model.
torch_model = Model()

# Prepare example inputs for the model.
input1 = torch.full((1, 10), 1, dtype=torch.float)  # Tensor filled with ones
input2 = torch.full((1, 10), 2, dtype=torch.float)  # Tensor filled with twos

# Export the PyTorch model using torch.export or torch.jit.trace.
traced_model = torch.export.export(torch_model, (input1, input2))  # For PyTorch >= 2.0
# traced_model = torch.jit.trace(torch_model, (input1, input2))    # For older versions of PyTorch

# Initialize the TorchMLModelBenchmarker for benchmarking the Torch model.
benchmarker = TorchMLModelBenchmarker(
    model=traced_model,
    inputs=[
        ct.TensorType(name="x", shape=input1.shape, dtype=np.float16),  # Define input tensor x
        ct.TensorType(name="y", shape=input2.shape, dtype=np.float16),  # Define input tensor y
    ],
    minimum_deployment_target=ct.target.iOS16,  # Specify minimum deployment target (e.g., iOS16)
    compute_units=ct.ComputeUnit.ALL,  # Use all available compute units (CPU/GPU/Neural Engine)
)

# Benchmark node execution times in the model.
# Perform 5 iterations with a warmup phase for stable measurements.
node_execution_times = await benchmarker.benchmark_node_execution(iterations=5, warmup=True)

# Print the median execution time of the slowest PyTorch operation.
print(f"Median execution time of the slowest operation: {node_execution_times[0].measurement.statistics.median} ms")
Remote-Device#
Remote-Device is a utility that allows you to run and analyze Core ML models on connected devices, offering tools for debugging and benchmarking issues specific to those devices. It utilizes devicectl to establish communication with the connected device, facilitating the deployment and execution of Core ML models. To leverage this utility, you must have Xcode and the Xcode Command Line Tools installed on your local system, as well as a development device.
Make sure that the development device is connected and is unlocked. Running the following command will output the list of connected iPhone devices.
from coremltools.models.ml_program.experimental.remote_device import (
    AppSigningCredentials,
    Device,
    DeviceType,
)

# Get a list of connected iPhone devices.
connected_devices = Device.get_connected_devices(device_type=DeviceType.IPHONE)

# This will display information about each connected iPhone, which may include
# the device name, OS version, and other relevant details.
print(connected_devices)
The connected device should appear in the displayed information. The next step involves installing an application that coremltools uses to load and execute Core ML models on the device.
connected_device = connected_devices[0]

# Define the app signing credentials.
credentials = AppSigningCredentials(
    development_team="",                           # Your Apple Developer Team ID
    bundle_identifier="com.example.modelrunnerd",  # Unique identifier for your app
    provisioning_profile_uuid=None,                # UUID of provisioning profile (if applicable)
)

# Prepare the device for model debugging.
# This installs the application on the device.
prepared_device = await connected_device.prepare_for_model_debugging(credentials=credentials)
In this example, we use the Apple Developer Team ID. Xcode will automatically create and manage a team provisioning profile associated with the specified Team ID. However, if you have a specific provisioning profile UUID, you can use it instead. Ensure that the bundle_identifier matches the one defined in the provisioning profile to avoid any conflicts.
prepare_for_model_debugging builds and installs the ModelRunner application on the device. The initial launch may take some time, but subsequent launches should be significantly faster. Once prepare_for_model_debugging completes, the ModelRunner application will be launched on the connected device.
You can now execute the model on the connected device.
import coremltools as ct
import numpy as np
import torch

from coremltools.models.ml_program.experimental.async_wrapper import MLModelAsyncWrapper

# Define a simple PyTorch model.
class Model(torch.nn.Module):
    def forward(self, x, y):
        # Perform element-wise addition and subtraction on inputs x and y.
        return (x + y, x - y)

# Create example input tensors for the model.
input1 = torch.randn(1, 100)  # Random tensor with shape (1, 100)
input2 = torch.randn(1, 100)  # Random tensor with shape (1, 100)

# Instantiate the PyTorch model and set it to evaluation mode.
model = Model()
model.eval()

# Trace the PyTorch model to create a TorchScript representation.
traced_model = torch.jit.trace(model, (input1, input2))

# Convert the TorchScript model to a Core ML model.
ml_model = ct.convert(
    traced_model,
    inputs=[
        ct.TensorType(name="x", shape=input1.shape, dtype=np.float16),  # Define input tensor x
        ct.TensorType(name="y", shape=input2.shape, dtype=np.float16),  # Define input tensor y
    ],
    minimum_deployment_target=ct.target.iOS17,  # Specify the minimum deployment target (iOS 17)
    compute_units=ct.ComputeUnit.ALL,  # Use all available compute units (CPU/GPU/Neural Engine)
)

# Wrap the Core ML model for remote execution on a connected device.
remote_model = MLModelAsyncWrapper.from_spec_or_path(
    spec_or_path=ml_model.get_spec(),  # Provide the Core ML model specification
    weights_dir=ml_model.weights_dir,  # Specify the directory containing model weights
    device=prepared_device,            # Target device for remote execution
)

# Prepare example inputs for prediction.
x = np.full((1, 100), 1.0)  # Input tensor x filled with ones
y = np.full((1, 100), 2.0)  # Input tensor y filled with twos

# Perform prediction on the remote device and print the results.
print(await remote_model.predict(inputs={"x": x, "y": y}))
The remote device can also be utilized with other tools, such as MLModelBenchmarker, TorchMLModelBenchmarker, MLModelInspector, MLModelValidator, and MLModelComparator, to perform benchmarking and debugging directly on the device.
For instance, MLModelBenchmarker can be used with a connected device to benchmark the model's performance directly on the device.
from coremltools.models.ml_program.experimental.perf_utils import MLModelBenchmarker

# Initialize the MLModelBenchmarker with the Core ML model and the remote device.
benchmarker = MLModelBenchmarker(model=model, device=prepared_device)

# Benchmark operation execution times over 5 iterations with a warmup phase.
# The warmup ensures that any initialization overhead is excluded from the measurements.
execution_time_measurements = await benchmarker.benchmark_operation_execution(iterations=5, warmup=True)

# Print the median execution time of the most time-consuming operation.
# The operations are sorted in descending order of execution time.
print(f"Median execution time of the slowest operation: {execution_time_measurements[0].statistics.median} seconds")