Embedding Models
Explore the delicious Mixedbread embed family, featuring state-of-the-art performance, size efficiency, and open-source availability.
Mixedbread embed is our flagship embedding model family. It combines easy access with strong retrieval performance, producing embeddings suited for search, classification, recommendation, and other downstream tasks.
What's new in the Mixedbread embed family?
The Mixedbread embed family has recently seen exciting developments:
- Release of our German/English embedding model deepset-mxbai-embed-de-large-v1
- Release of our English embedding model mxbai-embed-large-v1
- Introduction of the 2D embedding model mxbai-embed-2d-large-v1
Coming soon: We are working on specialized models to extend the family. Feel free to contact us for more information.
Model Family
Here's an overview of our current model lineup:
| Model | Status | Context Length | Dimensions | MTEB Average |
|---|---|---|---|---|
| deepset-mxbai-embed-de-large-v1 | API available | 512 | 1024 | 64.68 |
| mxbai-embed-large-v1 | API available | 512 | 1024 | 64.68 |
| mxbai-embed-2d-large-v1 | API available (research preview) | 512 | 1024 (base) | 63.25 (base) |
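The 2D model's "base" dimension hints at its key property: its embeddings can be truncated to smaller sizes to trade accuracy for storage. As a rough, self-contained illustration of how truncated embeddings are typically handled (using a random placeholder vector, not actual model output):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length."""
    truncated = emb[:dims]
    return truncated / np.linalg.norm(truncated)

# Placeholder: a random unit vector standing in for a 1024-dim model output.
rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
full /= np.linalg.norm(full)

small = truncate_embedding(full, 512)  # half the storage per vector
print(small.shape)
```

Re-normalizing after truncation keeps cosine similarity well-behaved when comparing vectors stored at different sizes.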
Why Choose Mixedbread Embeddings
The Mixedbread embed family offers several advantages:
- Powerful Performance: State-of-the-art results on benchmarks
- Size Efficiency: Optimized for resource utilization
- Open-Source: Fully accessible and customizable
- Versatility: Suitable for various NLP tasks
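The search use case boils down to ranking documents by the similarity of their embeddings to an embedded query. A minimal sketch with toy 4-dimensional vectors standing in for real model outputs (in practice these would come from an embedding model such as mxbai-embed-large-v1):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for model embeddings (illustration only).
docs = {
    "baking bread":   np.array([0.9, 0.1, 0.0, 0.1]),
    "training llms":  np.array([0.1, 0.9, 0.2, 0.0]),
    "sourdough tips": np.array([0.8, 0.2, 0.1, 0.1]),
}
query = np.array([0.85, 0.15, 0.05, 0.1])  # stand-in for an embedded query

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked[0])
```

Classification and recommendation follow the same pattern: compare embeddings, then act on the nearest neighbors.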
Performance Comparison
Our new mxbai-embed-large-v1 model outperforms other similarly sized open models and even surpasses some closed-source models on the MTEB benchmark:
| Model | MTEB Avg (56 datasets) |
|---|---|
| mxbai-embed-large-v1 | 64.68 |
| OpenAI text-embedding-3-large (proprietary) | 64.58 |
| Cohere embed-english-v3.0 (proprietary) | 64.47 |
| bge-large-en-v1.5 | 64.23 |
| jina-embeddings-v2-base-en | 60.38 |
API Benefits
While you can use our open-source models directly, our API offers additional advantages:
- Enhanced Performance: API-exclusive model versions with improvements such as better int8 quantization, served through our optimized inference pipeline
- Calibration Data: int8 quantization calibrated on over 50 million samples for a more accurate float32-to-int8 mapping
- Faster Response Times: optimized for low-latency retrieval tasks