Make Text Embedding Server compatible

I cloned the embedding model udever-bloom-1b1.
At first the server asked for ONNX files, which the model did not have, so I converted it to ONNX.
Then I pushed the result to my repo here.
I also set {"max_position_embeddings": 2048} in config.json. After that, I copied 1_Pooling/config.json from the jinaai embeddings:

{
  "word_embedding_dimension": 1536,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false
}
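
The pooling file above could also be generated instead of copied by hand. The following is a minimal sketch, not a confirmed requirement of the server: it assumes the model's config.json exposes the embedding width under `hidden_size` (falling back to `n_embed`), and `build_pooling_config` is a hypothetical helper name, not part of any library.

```python
import json


def build_pooling_config(model_config: dict) -> dict:
    """Build a mean-pooling payload for 1_Pooling/config.json (hypothetical helper)."""
    # Assumption: the embedding width is stored under `hidden_size` or `n_embed`.
    dim = model_config.get("hidden_size", model_config.get("n_embed"))
    if dim is None:
        raise KeyError("no hidden_size / n_embed key found in model config")
    return {
        "word_embedding_dimension": dim,
        "pooling_mode_cls_token": False,
        "pooling_mode_mean_tokens": True,
        "pooling_mode_max_tokens": False,
        "pooling_mode_mean_sqrt_len_tokens": False,
    }


if __name__ == "__main__":
    # Example values matching the post: 1536-wide embeddings, 2048 positions.
    example = {"hidden_size": 1536, "max_position_embeddings": 2048}
    print(json.dumps(build_pooling_config(example), indent=2))
```

Deriving `word_embedding_dimension` from the model config avoids carrying over a dimension that belongs to a different model when the file is copied from another repository.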

Now, when I run the embedding server, I get:

2024-08-05T09:52:57.224194Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2024-08-05T09:52:57.307450Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-05T09:53:00.166636Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-05T09:53:00.405920Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-05T09:53:00.405936Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-05T09:53:00.885433Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-05T09:53:05.337696Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-08-05T09:53:05.990842Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 5.584921999s
2024-08-05T09:53:06.456293Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[BOQ]' was expected to have ID '250680' but was given ID 'None'    
2024-08-05T09:53:06.456307Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[EOQ]' was expected to have ID '250681' but was given ID 'None'    
2024-08-05T09:53:06.456310Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[BOD]' was expected to have ID '250682' but was given ID 'None'    
2024-08-05T09:53:06.456313Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[EOD]' was expected to have ID '250683' but was given ID 'None'    
2024-08-05T09:53:06.457094Z  WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
2024-08-05T09:53:06.457162Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 2048
2024-08-05T09:53:06.457515Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 4 tokenization workers
2024-08-05T09:53:06.936365Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor h.8.input_layernorm.weight failed.GetFileLength for /data/models--Saugatkafley--udever-bloom-1b1-onnx/snapshots/698acf469fd193b51dd1125dbd460c8258c7b606/model.onnx_data failed:Invalid fd was supplied: -1
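
The error names `model.onnx_data`, while the log above only shows `model.onnx` being downloaded. One way to confirm what is actually on disk is to inspect the snapshot directory. The sketch below is only an illustration: the file names are taken from the log and the error message, `missing_files` is a hypothetical helper, and the idea that the export keeps its weights in `model.onnx_data` next to `model.onnx` is inferred from the error text alone.

```python
from pathlib import Path

# File names taken from the server log and the error message above.
EXPECTED = ["config.json", "tokenizer.json", "model.onnx", "model.onnx_data"]


def missing_files(snapshot_dir: str, expected=EXPECTED) -> list:
    """Return the expected files that are absent from snapshot_dir."""
    root = Path(snapshot_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_snapshot.py /path/to/snapshot
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print("missing:", missing_files(target))
```

If `model.onnx_data` shows up as missing in the mounted volume, the session failure is likely a missing-file problem rather than a problem with the conversion itself.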

regisss August 7, 2024, 10:12am 2

Hi @Saugatkafley, can you also share the command you used to launch the server? And which Docker image did you use?

I used the CPU version; I don't have a GPU.
How do I make an embedding model compatible with the server?
What are the requirements?

This is the shell script I use to run the embedding server:

MODEL="Saugatkafley/udever-bloom-1b1-onnx"
VOLUME=/home/saugat/Desktop/UBUNTU_desktop/MODELS/embeddings

docker run -p 8080:80 -v $VOLUME:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $MODEL