Minish
Hello, we're Minish!
About us
We're a two-person (pringled and stephantul) open-source lab, with a focus on Natural Language Processing.
We believe that if you make models fast enough, you unlock new possibilities.
Using our models and packages, you can:
- Embed the entire English Wikipedia in 5 minutes
- Classify tens of thousands of documents per second on a CPU
- Approximately deduplicate extremely large datasets in minutes
- Build the fastest RAG application in the world
- Easily evaluate which ANN algorithm works best for your data
Our projects:
- model2vec: tiny static embedding models with state-of-the-art performance.
- potion: the best small models in the world. 100-500x faster than a sentence-transformer, and almost as good.
- vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
- semhash: lightning-fast, highly accurate semantic deduplication and filtering for your text datasets.
- model2vec-rs: a Rust port of model2vec.
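To make the term "static embedding" concrete, here is a minimal sketch of the general idea: every token maps to one fixed vector, and a text is embedded by averaging the vectors of its tokens, with no contextual model run per input. This is not the model2vec implementation. The vocabulary, the vector values, and the `embed` and `cosine` helpers are all hypothetical and exist only for illustration; real models use a proper tokenizer and learned vectors.

```python
import math

# Hypothetical toy vocabulary: each token has exactly one fixed vector.
# Real static models learn these vectors; the values here are made up.
TOY_VECTORS = {
    "fast": [0.9, 0.1, 0.0],
    "quick": [0.8, 0.2, 0.1],
    "models": [0.1, 0.9, 0.2],
    "slow": [-0.7, 0.1, 0.3],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback vector for out-of-vocabulary tokens


def embed(text):
    """Embed text as the mean of the fixed vectors of its whitespace tokens."""
    tokens = text.lower().split()
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOY_VECTORS.get(token, UNKNOWN) for token in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


if __name__ == "__main__":
    # Texts sharing similar tokens end up closer than texts that do not.
    print(cosine(embed("fast models"), embed("quick models")))
    print(cosine(embed("fast models"), embed("slow models")))
```

Because embedding reduces to a table lookup and an average, the cost per text is small, which is the intuition behind the speed claims above.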
You can also find us on:
- 🤗 huggingface
- 👽 LinkedIn
- 💬 Discord
Pinned
- model2vec: Fast State-of-the-Art Static Embeddings. Python, 2k stars, 120 forks.
- semhash: Fast Multimodal Semantic Deduplication & Filtering. Python, 909 stars, 56 forks.
- vicinity: Lightweight Nearest Neighbors with Flexible Backends. Python, 336 stars, 10 forks.
- tokenlearn: Pre-train Static Word Embeddings. Python, 98 stars, 8 forks.
- model2vec-rs: Official Rust Implementation of Model2Vec. Rust, 167 stars, 17 forks.
Repositories
Showing 10 of 10 repositories
- model2vec-rs (Public): Official Rust Implementation of Model2Vec. Rust, MIT license, 167 stars, 17 forks, 0 open issues, 1 open pull request. Updated Apr 10, 2026.
- vicinity (Public): Lightweight Nearest Neighbors with Flexible Backends. Python, MIT license, 336 stars, 10 forks, 1 open issue, 1 open pull request. Updated Apr 10, 2026.
- model2vec (Public): Fast State-of-the-Art Static Embeddings.
- docs: MDX, 0 stars, 2 forks, 0 open issues, 0 open pull requests. Updated Mar 30, 2026.
- tokenlearn (Public): Pre-train Static Word Embeddings. Python, MIT license, 98 stars, 8 forks, 1 open issue, 0 open pull requests. Updated Mar 27, 2026.
- .github: 0 stars, 0 forks, 0 open issues, 0 open pull requests. Updated Mar 16, 2026.
- semhash (Public): Fast Multimodal Semantic Deduplication & Filtering. Python, MIT license, 909 stars, 56 forks, 0 open issues, 0 open pull requests. Updated Jan 20, 2026.
- evaluation (Public): Code to evaluate performance for embeddings. Python, MIT license, 11 stars, 0 forks, 0 open issues, 0 open pull requests. Updated Sep 20, 2025.
- minishlab.github.io: SCSS, MIT license, 0 stars, 1 fork, 0 open issues, 0 open pull requests. Updated Jun 1, 2025.
- watertemplate: Makefile, MIT license, 4 stars, 3 forks, 0 open issues, 1 open pull request. Updated Dec 9, 2024.