October 14, 2024
Every Byte Matters: Introducing mxbai-embed-xsmall-v1
We are happy to introduce mxbai-embed-xsmall-v1, our smallest and most efficient embedding model to date. Despite its small size, it delivers competitive performance, making it ideal for retrieval tasks where resources are limited. It is released under the Apache 2.0 license and is available on Hugging Face.
Read on to learn more about our approach and to check out our benchmarks. If you want to skip right to the model instead, you can access it here:
- mxbai-embed-xsmall-v1: Tiny but mighty embedding model optimized for retrieval tasks.
TL;DR:
Introducing our smallest and most efficient English embedding model, mxbai-embed-xsmall-v1, which offers competitive performance in a small footprint. It supports long context, binary quantization, and Matryoshka representation learning!
Why Embeddings?
Embeddings are the backbone of many natural language processing applications. They transform complex textual data into numerical vectors, allowing us to measure semantic similarity between texts. This capability is crucial for tasks like recommendation systems, search engines, clustering, classification, and especially Retrieval-Augmented Generation (RAG).
In RAG, embeddings enable large language models (LLMs) to access and understand custom data. For instance, if you need a report based on internal documents, an embedding model can encode these documents into a vector database. When you query the system, it retrieves the most relevant information to inform the LLM, allowing it to generate accurate and context-specific responses.
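As a rough sketch of that flow (assuming the sentence-transformers library, with a small in-memory list standing in for a real vector database):

```python
from sentence_transformers import SentenceTransformer

# Toy "knowledge base"; in practice these documents would live in a vector database.
documents = [
    "Q3 revenue grew 12% quarter over quarter.",
    "The new office opens in Berlin next spring.",
    "Headcount increased by 40 engineers in 2024.",
]

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

doc_embeddings = model.encode(documents)                          # shape: (3, 384)
query_embedding = model.encode(["How fast is revenue growing?"])  # shape: (1, 384)

# Rank documents by cosine similarity and hand the best match to the LLM as context.
scores = model.similarity(query_embedding, doc_embeddings)        # shape: (1, 3)
best = scores.argmax().item()
print(documents[best])
```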
Introducing Our Smallest Embedding Model
We understand that not every application has the luxury of abundant computational resources. That's why we've developed mxbai-embed-xsmall-v1, a compact yet powerful embedding model optimized for English retrieval tasks. Based on sentence-transformers/all-MiniLM-L6-v2, our model has only 22.7 million parameters and is trained in float16 for efficiency.
Despite its small size, the model supports features like binary quantization and Matryoshka representation learning (MRL), allowing for significant reductions in storage and computational requirements without sacrificing much performance.
Specs
- Small Footprint: Only 22.7 million parameters and 384 dimensions.
- Long Context Support: Inputs up to 4096 tokens.
- Binary Quantization and MRL: Up to 32x smaller storage and up to 40x faster retrieval.
- Optimized for English Retrieval: Specifically trained for English retrieval tasks.
Model Evaluation with Benchmarks
Let's dive straight into the performance of our new model across various benchmarks.
MTEB Benchmark
On the Massive Text Embedding Benchmark (MTEB), our model performs well for its size:
Task | all-MiniLM-L6-v2 | mxbai-embed-xsmall-v1 |
---|---|---|
ArguAna | 48.42 | 49.58 |
SciDocs | 21.58 | 21.50 |
SciFact | 65.41 | 65.81 |
NFCorpus | 31.00 | 32.05 |
TREC-COVID | 46.09 | 48.90 |
Touche2020 | 16.39 | 17.09 |
FiQA2018 | 36.10 | 37.10 |
HotpotQA | 46.50 | 48.37 |
MSMARCO (dev) | 36.54 | 36.76 |
Fever | 50.93 | 56.45 |
NQ | 43.67 | 44.44 |
DBPedia | 32.30 | 32.19 |
Quora | 87.54 | 87.70 |
Climate-Fever | 20.29 | 22.42 |
cqadupstack | 40.69 | 41.59 |
Average | 41.56 | 42.80 |
Long Context Benchmark (LoCo)
Our model supports inputs of up to 4096 tokens, making it well suited for long documents:
Model | GovReport | QasperFA | QasperFT | QMSum | SummScreenFD | Average |
---|---|---|---|---|---|---|
all-MiniLM-L6-v2 | 86.31 | 81.9 | 78.88 | 34.86 | 54.75 | 67.34 |
mxbai-embed-xsmall-v1 | 95.6 | 94.15 | 86.91 | 26.75 | 78.27 | 76.34 |
LongEmbed Benchmark
We compared our model against sentence-transformers/all-MiniLM-L6-v2:
Model | NarrativeQA | QMSum | SummScreenFD | 2WikiMultiQA | Average |
---|---|---|---|---|---|
all-MiniLM-L6-v2 | 15.6 | 20.53 | 60.57 | 47.70 | 36.10 |
mxbai-embed-xsmall-v1 | 15.65 | 28.62 | 81.45 | 58.05 | 45.94 |
Our model shows significant improvements, especially in longer document tasks.
Optimized for Efficiency with Binary Quantization and MRL
One of the standout features of our model is its support for binary quantization and Matryoshka representation learning (MRL). These techniques allow you to drastically reduce the size of your embeddings and speed up computations, making large-scale deployments more feasible and cost-effective.
Matryoshka Representation Learning (MRL)
MRL enables the model to produce embeddings that are still effective even when truncated to smaller dimensions. This allows you to choose an embedding size that balances performance and storage requirements.
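As a minimal sketch (plain numpy slicing plus re-normalization; recent sentence-transformers releases also accept a truncate_dim argument in the model constructor that achieves the same effect):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

full = model.encode(["Matryoshka embeddings stay useful when truncated."])  # (1, 384)

# Keep only the first 256 dimensions and re-normalize so cosine similarity still works.
truncated = full[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

print(full.shape, truncated.shape)  # (1, 384) (1, 256)
```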
We evaluated the model's performance at different embedding sizes:
Dimension (d) | SciFact | SciDocs | NFCorpus | ArguAna | Average | Ratio |
---|---|---|---|---|---|---|
384 | 65.81 | 21.50 | 32.05 | 49.58 | 42.235 | 1 |
256 | 63.62 | 20.56 | 30.74 | 47.61 | 40.6325 | 0.962 |
128 | 58.55 | 17.20 | 26.81 | 40.11 | 35.6675 | 0.845 |
64 | 43.49 | 11.67 | 18.72 | 30.93 | 26.2025 | 0.620 |
As you can see, even at smaller dimensions, the model retains a good portion of its full-dimension performance.
Binary Quantization
By converting embeddings into binary format, you can reduce storage requirements by up to 32x and speed up retrieval by up to 40x. Learn more about binary quantization here.
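As a sketch of how this looks in practice with the quantize_embeddings helper shipped in sentence-transformers (the "ubinary" precision packs eight dimensions into one byte):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

embeddings = model.encode([
    "Binary quantization keeps one bit per dimension.",
    "384 dimensions shrink from 1536 bytes (float32) to 48 bytes.",
])

# One bit per dimension: 384 dims -> 48 uint8 values per vector.
binary_embeddings = quantize_embeddings(embeddings, precision="ubinary")
print(embeddings.shape, binary_embeddings.shape)  # (2, 384) (2, 48)
```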
We also tested the performance of the model with binary quantization:
Encoding | SciFact | SciDocs | NFCorpus | ArguAna | Average | Ratio |
---|---|---|---|---|---|---|
Raw | 65.81 | 21.50 | 32.05 | 49.58 | 42.235 | 1 |
Binary | 61.95 | 19.94 | 30.62 | 46.14 | 39.6625 | 0.939 |
The performance drop is minimal, and it can be largely recovered with on-disk rescoring!
Using the Model
To get started, install the necessary packages:
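All snippets in this post assume the sentence-transformers library (a recent release that also ships the quantization helpers):

```bash
pip install -U sentence-transformers
```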
Here’s how you can use the model:
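The snippet below is a minimal sketch using the sentence-transformers API; the model ID refers to our Hugging Face repository, and it is worth checking the model card for any recommended query prompt.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

query = "Which planet is known as the Red Planet?"
docs = [
    "Mars is often called the Red Planet because of its iron-rich surface.",
    "Venus is the hottest planet in the solar system.",
    "Jupiter is the largest planet in the solar system.",
]

query_embedding = model.encode([query])  # shape: (1, 384)
doc_embeddings = model.encode(docs)      # shape: (3, 384)

# Cosine similarity between the query and each document.
similarities = model.similarity(query_embedding, doc_embeddings)
print(doc_embeddings.shape)
print(similarities)
```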
Running this prints the document embedding shape, (3, 384), followed by a 1x3 matrix of cosine-similarity scores, one per document; the Mars passage should receive the highest score.
How We Built mxbai-embed-xsmall-v1
Building mxbai-embed-xsmall-v1 was a journey focused on maximizing efficiency without compromising performance. Here's how we achieved it:
Base Model and Training
We started with the sentence-transformers/all-MiniLM-L6-v2 model, known for its balance between performance and size. We then trained it using the AnglE loss function and Espresso, techniques that improve the model's ability to generate high-quality embeddings for retrieval tasks.
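Our exact training setup is not reproduced here, but for intuition, AnglE-style fine-tuning can be sketched with the AnglELoss implementation in sentence-transformers (the Espresso/Matryoshka part is omitted and the pairs below are toy placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Toy pairs with similarity labels in [0, 1]; real training uses large retrieval datasets.
train_examples = [
    InputExample(texts=["What is the capital of France?",
                        "Paris is the capital of France."], label=1.0),
    InputExample(texts=["What is the capital of France?",
                        "Bananas are rich in potassium."], label=0.0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# AnglE optimizes angle differences in complex space, avoiding the flat
# gradients that plain cosine similarity suffers from near its extremes.
train_loss = losses.AnglELoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```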
Focus on Retrieval Tasks
Our primary goal was to create an embedding model optimized for English retrieval tasks. By tailoring the training data and focusing on the relevant loss functions, we ensured that mxbai-embed-xsmall-v1 excels at finding semantically similar texts, which is essential for search engines, recommendation systems, and RAG applications.
Why Small Size Matters
In many real-world applications, computational resources are limited. Smaller models like mxbai-embed-xsmall-v1 offer several advantages:
- Faster inference: Reduced computation leads to quicker results.
- Easier deployment: Lower memory and processing requirements make the model a good fit for resource-constrained devices.
- Cost-effectiveness: Smaller models lower infrastructure costs when scaling up.
Give Us Feedback
We are excited to see how mxbai-embed-xsmall-v1 is used in your projects. Your feedback helps us improve our models and make them more user-friendly.
Please share your thoughts through our Discord community. We are here to assist and always ready to discuss the exciting field of machine learning!