The new sparse encoding model is not deployable

August 30, 2024, 7:29am

OpenSearch 2.16

Hi all, I would appreciate your help with the following. I am trying to test the new v2 neural sparse models described in the OpenSearch blog post "Improving search efficiency and accuracy with the newest v2 neural sparse models". I tried the following requests, but they both failed.

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
  "version": "1.0.0",
  "model_group_id": "JgsYopEBNoOdJzL1DevN",
  "function_name": "SPARSE_ENCODING",
  "model_format": "TORCH_SCRIPT",
  "model_content_size_in_bytes": 268867313,
  "model_content_hash_value": "a7a80f911838c402d74a7ce05e20672642fc63aafaa982b1055ab277abe808d2",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v2-distill-1.0.0-torch_script.zip"
}

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT"
}
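
For reference, each _register call returns a task_id; checking that ID against the ML Commons tasks API is a quick way to surface the failure reason without digging through the node logs (the task ID below is a placeholder):

GET /_plugins/_ml/tasks/<task_id>

The response reports the task state (for example CREATED, RUNNING, COMPLETED, FAILED) and, on failure, an error message.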

The error I am getting is shown below.

having an enabled model controller. Please use the create model controller api to create one if this is unexpected.
[2024-08-30T10:27:39,955][ERROR][o.o.m.e.a.DLModel ] [LAPTOP-C0J5I3L2] Failed to deploy model mQsuopEBNoOdJzL1buu9
ai.djl.engine.EngineException: open file failed because of errno 2 on fopen: No such file or directory, file path: D:\my-files\work\opensearch-releases\2.16.0\opensearch-2.16.0-windows-x64\opensearch-2.16.0\data\ml_cache\models_cache\models\mQsuopEBNoOdJzL1buu9\3\amazon\neural-sparse\opensearch-neural-sparse-encoding-v2-distill\opensearch-neural-sparse-encoding-v2-distill.pt
at ai.djl.pytorch.jni.PyTorchLibrary.moduleLoad(Native Method) ~[pytorch-engine-0.28.0.jar:?]
at ai.djl.pytorch.jni.JniUtils.loadModule(JniUtils.java:1756) ~[pytorch-engine-0.28.0.jar:?]
at ai.djl.pytorch.engine.PtModel.load(PtModel.java:99) ~[pytorch-engine-0.28.0.jar:?]
at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:166) ~[api-0.28.0.jar:?]
at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:174) ~[api-0.28.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:217) ~[opensearch-ml-algorithms-2.16.0.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:286) [opensearch-ml-algorithms-2.16.0.0.jar:?]
at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) [?:?]
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:252) [opensearch-ml-algorithms-2.16.0.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:142) [opensearch-ml-algorithms-2.16.0.0.jar:?]
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.16.0.0.jar:?]
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1083) [opensearch-ml-2.16.0.0.jar:2.16.0.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.16.0.jar:2.16.0]
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$73(MLModelManager.java:1703) [opensearch-ml-2.16.0.0.jar:2.16.0.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.16.0.jar:2.16.0]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.16.0.jar:2.16.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) [opensearch-2.16.0.jar:2.16.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.16.0.jar:2.16.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2024-08-30T10:27:40,014][ERROR][o.o.m.m.MLModelManager ] [LAPTOP-C0J5I3L2] Failed to retrieve model mQsuopEBNoOdJzL1buu9
org.opensearch.ml.common.exception.MLException: Failed to deploy model mQsuopEBNoOdJzL1buu9
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:300) ~[?:?]
at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:252) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:142) ~[?:?]
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1083) ~[?:?]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.16.0.jar:2.16.0]
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$73(MLModelManager.java:1703) [opensearch-ml-2.16.0.0.jar:2.16.0.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.16.0.jar:2.16.0]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.16.0.jar:2.16.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) [opensearch-2.16.0.jar:2.16.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.16.0.jar:2.16.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: ai.djl.engine.EngineException: open file failed because of errno 2 on fopen: No such file or directory, file path: D:\my-files\work\opensearch-releases\2.16.0\opensearch-2.16.0-windows-x64\opensearch-2.16.0\data\ml_cache\models_cache\models\mQsuopEBNoOdJzL1buu9\3\amazon\neural-sparse\opensearch-neural-sparse-encoding-v2-distill\opensearch-neural-sparse-encoding-v2-distill.pt
at ai.djl.pytorch.jni.PyTorchLibrary.moduleLoad(Native Method) ~[?:?]
at ai.djl.pytorch.jni.JniUtils.loadModule(JniUtils.java:1756) ~[?:?]
at ai.djl.pytorch.engine.PtModel.load(PtModel.java:99) ~[?:?]
at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:166) ~[?:?]
at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:174) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:217) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:286) ~[?:?]
… 14 more
[2024-08-30T10:27:40,020][ERROR][o.o.m.a.f.TransportForwardAction] [LAPTOP-C0J5I3L2] deploy model failed on all nodes, model id: mQsuopEBNoOdJzL1buu9
[2024-08-30T10:27:40,020][INFO ][o.o.m.a.f.TransportForwardAction] [LAPTOP-C0J5I3L2] deploy model done with state: DEPLOY_FAILED, model id: mQsuopEBNoOdJzL1buu9
[2024-08-30T10:27:40,021][INFO ][o.o.m.a.d.TransportDeployModelOnNodeAction] [LAPTOP-C0J5I3L2] deploy model task done mgsuopEBNoOdJzL1tuvT
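
After a failed deployment like this, the model is typically left in the DEPLOY_FAILED state. Before re-registering, it can be cleaned up with the undeploy and delete APIs (a sketch, using the model ID from the log above; _undeploy only matters if some node still reports the model as deployed):

POST /_plugins/_ml/models/mQsuopEBNoOdJzL1buu9/_undeploy
DELETE /_plugins/_ml/models/mQsuopEBNoOdJzL1buu9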

############################################################
Note that I was able to deploy the old sparse encoding model (v1) using the following:
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.0",
  "model_group_id": "JgsYopEBNoOdJzL1DevN",
  "function_name": "SPARSE_ENCODING",
  "model_format": "TORCH_SCRIPT",
  "model_content_size_in_bytes": 492184214,
  "model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip"
}
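
A quick way to confirm a registration like this one actually finished deploying is the model details API (the model ID is a placeholder for whatever the cluster assigned); the response should show "model_state": "DEPLOYED":

GET /_plugins/_ml/models/<model_id>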

When you call the _register API with the deploy query parameter, OpenSearch has DJL load the model (and download the PyTorch native files first if they are not already present). Please check that the path /opensearch/data/ml_cache/ contains two directories: pytorch and tokenizers. The native library files (.so files) in these two folders have to be in place before models are deployed.
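
It can also help to check what each node has actually loaded via the ML profile API (no parameters needed); it lists the deployed models and their current state per node, which narrows down whether the problem is cluster-wide or specific to one node:

GET /_plugins/_ml/profile/models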

I am getting this error on Windows 11. Also, I do have the pytorch and tokenizers folders with all the DLLs.

It worked after removing "amazon" from the model name!

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
  "version": "1.0.0",
  "model_group_id": "S6qUp5EBwNsUHK65A66F",
  "function_name": "SPARSE_ENCODING",
  "model_format": "TORCH_SCRIPT",
  "model_content_size_in_bytes": 268867313,
  "model_content_hash_value": "a7a80f911838c402d74a7ce05e20672642fc63aafaa982b1055ab277abe808d2",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v2-distill-1.0.0-torch_script.zip"
}
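
With the model deployed, a quick sanity check is the sparse encoding predict API (a sketch; the model ID is whatever your register call returned, and the sample text is arbitrary). The response should contain a token-to-weight map in the inference results:

POST /_plugins/_ml/_predict/sparse_encoding/<model_id>
{
  "text_docs": ["hello world"]
}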