October 14, 2024
Every Byte Matters: Introducing mxbai-embed-xsmall-v1
We are happy to introduce mxbai-embed-xsmall-v1, our smallest and most efficient embedding model to date. Despite its small size, it delivers competitive performance, making it ideal for retrieval tasks where resources are limited. It is released under the Apache 2.0 license and is available on Hugging Face.
Read on to learn more about our approach and to check out our benchmarks. If you want to skip right to the model instead, you can access it here:
- mxbai-embed-xsmall-v1: Tiny but mighty embedding model optimized for retrieval tasks.
TL;DR:
Introducing our smallest and most efficient English embedding model, mxbai-embed-xsmall-v1, which offers competitive performance in a small footprint. It supports long context, binary quantization, and Matryoshka representation learning!
Why Embeddings?
Embeddings are the backbone of many natural language processing applications. They transform complex textual data into numerical vectors, allowing us to measure semantic similarity between texts. This capability is crucial for tasks like recommendation systems, search engines, clustering, classification, and especially Retrieval-Augmented Generation (RAG).
In RAG, embeddings enable large language models (LLMs) to access and understand custom data. For instance, if you need a report based on internal documents, an embedding model can encode these documents into a vector database. When you query the system, it retrieves the most relevant information to inform the LLM, allowing it to generate accurate and context-specific responses.
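As a rough sketch of that flow (assuming the sentence-transformers library, with a small in-memory list standing in for a real vector database):

```python
from sentence_transformers import SentenceTransformer

# Toy "knowledge base"; in practice these documents would live in a vector database.
documents = [
    "Q3 revenue grew 12% quarter over quarter.",
    "The new office opens in Berlin next spring.",
    "Headcount increased by 40 engineers in 2024.",
]

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

doc_embeddings = model.encode(documents)                          # shape: (3, 384)
query_embedding = model.encode(["How fast is revenue growing?"])  # shape: (1, 384)

# Rank documents by cosine similarity and hand the best match to the LLM as context.
scores = model.similarity(query_embedding, doc_embeddings)        # shape: (1, 3)
best = scores.argmax().item()
print(documents[best])
```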
Introducing Our Smallest Embedding Model
We understand that not every application has the luxury of abundant computational resources. That's why we've developed mxbai-embed-xsmall-v1, a compact yet powerful embedding model optimized for English retrieval tasks. Based on sentence-transformers/all-MiniLM-L6-v2, our model has only 22.7 million parameters and is trained in float16 for efficiency.
Despite its small size, the model supports features like binary quantization and Matryoshka representation learning (MRL), allowing for significant reductions in storage and computational requirements without sacrificing much performance.
Specs
- Small Footprint: Only 22.7 million parameters and 384 dimensions.
- Long Context Support: Inputs up to 4096 tokens.
- Binary Quantization and MRL: Up to 32x smaller storage and up to 40x faster retrieval.
- Optimized for English Retrieval: Specifically trained for English retrieval tasks.
Model Evaluation with Benchmarks
Let's dive straight into the performance of our new model across various benchmarks.
MTEB Benchmark
On the Massive Text Embedding Benchmark (MTEB), our model performs well for its size:
Task | all-MiniLM-L6-v2 | mxbai-embed-xsmall-v1 |
---|---|---|
ArguAna | 48.42 | 49.58 |
SciDocs | 21.58 | 21.50 |
SciFact | 65.41 | 65.81 |
NFCorpus | 31.00 | 32.05 |
TREC-COVID | 46.09 | 48.90 |
Touche2020 | 16.39 | 17.09 |
FiQA2018 | 36.10 | 37.10 |
HotpotQA | 46.50 | 48.37 |
MSMARCO (dev) | 36.54 | 36.76 |
Fever | 50.93 | 56.45 |
NQ | 43.67 | 44.44 |
DBPedia | 32.30 | 32.19 |
Quora | 87.54 | 87.70 |
Climate-Fever | 20.29 | 22.42 |
cqadupstack | 40.69 | 41.59 |
Average | 41.56 | 42.80 |
Long Context Benchmark (LoCo)
Our model supports inputs of up to 4096 tokens, making it well suited for long documents:
Model | GovReport | QasperFA | QasperFT | QMSum | SummScreenFD | Average |
---|---|---|---|---|---|---|
all-MiniLM-L6-v2 | 86.31 | 81.9 | 78.88 | 34.86 | 54.75 | 67.34 |
mxbai-embed-xsmall-v1 | 95.6 | 94.15 | 86.91 | 26.75 | 78.27 | 76.34 |
LongEmbed Benchmark
We compared our model against sentence-transformers/all-MiniLM-L6-v2:
Model | NarrativeQA | QMSum | SummScreenFD | 2WikiMultiQA | Average |
---|---|---|---|---|---|
all-MiniLM-L6-v2 | 15.6 | 20.53 | 60.57 | 47.70 | 36.10 |
mxbai-embed-xsmall-v1 | 15.65 | 28.62 | 81.45 | 58.05 | 45.94 |
Our model shows significant improvements, especially in longer document tasks.
Optimized for Efficiency with Binary Quantization and MRL
One of the standout features of our model is its support for binary quantization and Matryoshka representation learning (MRL). These techniques allow you to drastically reduce the size of your embeddings and speed up computations, making large-scale deployments more feasible and cost-effective.
Matryoshka Representation Learning (MRL)
MRL enables the model to produce embeddings that are still effective even when truncated to smaller dimensions. This allows you to choose an embedding size that balances performance and storage requirements.
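As a minimal sketch (plain numpy slicing plus re-normalization; recent sentence-transformers releases also accept a truncate_dim argument in the model constructor that achieves the same effect):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

full = model.encode(["Matryoshka embeddings stay useful when truncated."])  # (1, 384)

# Keep only the first 256 dimensions and re-normalize so cosine similarity still works.
truncated = full[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

print(full.shape, truncated.shape)  # (1, 384) (1, 256)
```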
We evaluated the model's performance at different embedding sizes:
Dimension (d) | SciFact | SciDocs | NFCorpus | ArguAna | Average | Ratio |
---|---|---|---|---|---|---|
384 | 65.81 | 21.50 | 32.05 | 49.58 | 42.235 | 1 |
256 | 63.62 | 20.56 | 30.74 | 47.61 | 40.6325 | 0.962 |
128 | 58.55 | 17.20 | 26.81 | 40.11 | 35.6675 | 0.845 |
64 | 43.49 | 11.67 | 18.72 | 30.93 | 26.2025 | 0.620 |
As you can see, even at smaller dimensions, the model retains a good portion of its full-dimension performance.
Binary Quantization
By converting embeddings into binary format, you can reduce storage requirements by up to 32x and speed up retrieval by up to 40x. Learn more about binary quantization here.
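As a sketch of how this looks in practice with the quantize_embeddings helper shipped in sentence-transformers (the "ubinary" precision packs eight dimensions into one byte):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

embeddings = model.encode([
    "Binary quantization keeps one bit per dimension.",
    "384 dimensions shrink from 1536 bytes (float32) to 48 bytes.",
])

# One bit per dimension: 384 dims -> 48 uint8 values per vector.
binary_embeddings = quantize_embeddings(embeddings, precision="ubinary")
print(embeddings.shape, binary_embeddings.shape)  # (2, 384) (2, 48)
```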
We also tested the performance of the model with binary quantization:
Encoding | SciFact | SciDocs | NFCorpus | ArguAna | Average | Ratio |
---|---|---|---|---|---|---|
Raw | 65.81 | 21.50 | 32.05 | 49.58 | 42.235 | 1 |
Binary | 61.95 | 19.94 | 30.62 | 46.14 | 39.6625 | 0.939 |
The performance drop is minimal, and it can be largely recovered with on-disk rescoring!
Using the Model
To get started, install the necessary packages:
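All snippets in this post assume the sentence-transformers library (a recent release that also ships the quantization helpers):

```bash
pip install -U sentence-transformers
```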
Here’s how you can use the model:
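The snippet below is a minimal sketch using the sentence-transformers API; the model ID refers to our Hugging Face repository, and it is worth checking the model card for any recommended query prompt.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

query = "Which planet is known as the Red Planet?"
docs = [
    "Mars is often called the Red Planet because of its iron-rich surface.",
    "Venus is the hottest planet in the solar system.",
    "Jupiter is the largest planet in the solar system.",
]

query_embedding = model.encode([query])  # shape: (1, 384)
doc_embeddings = model.encode(docs)      # shape: (3, 384)

# Cosine similarity between the query and each document.
similarities = model.similarity(query_embedding, doc_embeddings)
print(doc_embeddings.shape)
print(similarities)
```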
Running this prints the document embedding shape, (3, 384), followed by a 1x3 matrix of cosine-similarity scores, one per document; the Mars passage should receive the highest score.
How We Built mxbai-embed-xsmall-v1
Building mxbai-embed-xsmall-v1 was a journey focused on maximizing efficiency without compromising performance. Here's how we achieved it:
Base Model and Training
We started with the sentence-transformers/all-MiniLM-L6-v2 model, known for its balance between performance and size. We then trained it using the AnglE loss function and Espresso, techniques that improve the model's ability to generate high-quality embeddings for retrieval tasks.
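Our exact training setup is not reproduced here, but for intuition, AnglE-style fine-tuning can be sketched with the AnglELoss implementation in sentence-transformers (the Espresso/Matryoshka part is omitted and the pairs below are toy placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Toy pairs with similarity labels in [0, 1]; real training uses large retrieval datasets.
train_examples = [
    InputExample(texts=["What is the capital of France?",
                        "Paris is the capital of France."], label=1.0),
    InputExample(texts=["What is the capital of France?",
                        "Bananas are rich in potassium."], label=0.0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# AnglE optimizes angle differences in complex space, avoiding the flat
# gradients that plain cosine similarity suffers from near its extremes.
train_loss = losses.AnglELoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```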
Focus on Retrieval Tasks
Our primary goal was to create an embedding model optimized for English retrieval tasks. By tailoring the training data and focusing on the relevant loss functions, we ensured that mxbai-embed-xsmall-v1 excels at finding semantically similar texts, which is essential for search engines, recommendation systems, and RAG applications.
Why Small Size Matters
In many real-world applications, computational resources are limited. Smaller models like mxbai-embed-xsmall-v1 offer several advantages:
- Faster inference: Reduced computation leads to quicker results.
- Easier deployment: Lower memory and processing requirements make the model a good fit for resource-constrained devices.
- Cost-effectiveness: Smaller models lower infrastructure costs when scaling up.
Give Us Feedback
We are excited to see how mxbai-embed-xsmall-v1 is used in your projects. Your feedback helps us improve our models and make them more user-friendly.
Please share your thoughts through our Discord community. We are here to assist and always ready to discuss the exciting field of machine learning!