October 14, 2024

Every Byte Matters: Introducing mxbai-embed-xsmall-v1

Authors

  • Sean Lee
  • Julius Lipp
  • Rui Huang
  • Darius Koenig

We are happy to introduce mxbai-embed-xsmall-v1, our smallest and most efficient embedding model to date. Despite its small size, it delivers competitive performance, making it ideal for retrieval tasks where resources are limited. It is released under the Apache 2.0 license and is available on Hugging Face.

Read on to learn more about our approach and to check out our benchmarks, or skip straight to the model on Hugging Face.

Why Embeddings?

Embeddings are the backbone of many natural language processing applications. They transform complex textual data into numerical vectors, allowing us to measure semantic similarity between texts. This capability is crucial for tasks like recommendation systems, search engines, clustering, classification, and especially Retrieval-Augmented Generation (RAG).

In RAG, embeddings enable large language models (LLMs) to access and understand custom data. For instance, if you need a report based on internal documents, an embedding model can encode these documents into a vector database. When you query the system, it retrieves the most relevant information to inform the LLM, allowing it to generate accurate and context-specific responses.
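To make that pipeline concrete, here is a minimal sketch of the retrieval step using sentence-transformers. The documents and query are invented for illustration, and a plain in-memory list stands in for the vector database:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the embedding model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

# Encode the internal documents (a real system would store these vectors in a vector database)
documents = [
    "Q3 revenue grew 12% quarter over quarter.",
    "The May 3rd incident was caused by an expired TLS certificate.",
    "The hiring plan adds four engineers to the platform team in 2025.",
]
doc_embeddings = model.encode(documents)

# Encode the user's query and retrieve the most relevant document
query_embedding = model.encode("Why did the May outage happen?")
scores = cos_sim(query_embedding, doc_embeddings)[0]
best = scores.argmax().item()
print(documents[best])  # the retrieved context that would be passed to the LLM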

Introducing Our Smallest Embedding Model

We understand that not every application has the luxury of abundant computational resources. That's why we've developed mxbai-embed-xsmall-v1, a compact yet powerful embedding model optimized for English retrieval tasks. Based on all-MiniLM-L6-v2, our model has only 22.7 million parameters and is trained in float16 for efficiency.

Despite its small size, the model supports binary quantization and Matryoshka representation learning (MRL), allowing for significant reductions in storage and computational requirements without sacrificing much performance.
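As a quick illustration of the float16 angle, the snippet below loads the model in half precision. This is a sketch that assumes a recent sentence-transformers release (which forwards model_kwargs to Hugging Face transformers) and a GPU, since float16 inference is often unsupported on CPU:

import torch
from sentence_transformers import SentenceTransformer

# Load the model weights in float16 (a GPU is assumed here)
model = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-xsmall-v1",
    device="cuda",
    model_kwargs={"torch_dtype": torch.float16},
)

embeddings = model.encode(["Every byte matters."])
print(embeddings.shape)  # (1, 384)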

Specs

  • Small Footprint: Only 22.7 million parameters and 384 dimensions.
  • Long Context Support: Inputs up to 4096 tokens.
  • Binary Quantization and MRL: Substantial storage and compute savings.
  • Optimized for English Retrieval: Specifically trained for English retrieval tasks.

Model Evaluation with Benchmarks

Let's dive straight into the performance of our new model across various benchmarks.

MTEB Benchmark

On the Massive Text Embedding Benchmark (MTEB), our model performs well for its size:

| Task | all-MiniLM-L6-v2 | mxbai-embed-xsmall-v1 |
| --- | --- | --- |
| ArguAna | 48.42 | 49.58 |
| SciDOCS | 21.58 | 21.50 |
| SciFact | 65.41 | 65.81 |
| NFCorpus | 31.00 | 32.05 |
| TREC-COVID | 46.09 | 48.90 |
| Touche2020 | 16.39 | 17.09 |
| FiQA2018 | 36.10 | 37.10 |
| HotpotQA | 46.50 | 48.37 |
| MSMARCO (dev) | 36.54 | 36.76 |
| Fever | 50.93 | 56.45 |
| NQ | 43.67 | 44.44 |
| DBPedia | 32.30 | 32.19 |
| Quora | 87.54 | 87.70 |
| Climate-Fever | 20.29 | 22.42 |
| CQADupstack | 40.69 | 41.59 |
| Average | 41.56 | 42.80 |

Long Context Benchmark (LoCo)

Our model supports inputs with a length of up to 4096 tokens, making it suitable for long documents:

| Model | GovReport | QasperFA | QasperFT | QMSum | SummScreenFD | Average |
| --- | --- | --- | --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 86.31 | 81.90 | 78.88 | 34.86 | 54.75 | 67.34 |
| mxbai-embed-xsmall-v1 | 95.60 | 94.15 | 86.91 | 26.75 | 78.27 | 76.34 |

LongEmb Benchmark

We compared our model against all-MiniLM-L6-v2:

| Model | NarrativeQA | QMSum | SummScreenFD | 2WikiMultiQA | Average |
| --- | --- | --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 15.60 | 20.53 | 60.57 | 47.70 | 36.10 |
| mxbai-embed-xsmall-v1 | 15.65 | 28.62 | 81.45 | 58.05 | 45.94 |

Our model shows significant improvements, especially in longer document tasks.

Optimized for Efficiency with Binary Quantization and MRL

One of the standout features of our model is its support for binary quantization and Matryoshka representation learning (MRL). These techniques allow you to drastically reduce the size of your embeddings and speed up computations, making large-scale deployments more feasible and cost-effective.

Matryoshka Representation Learning (MRL)

MRL enables the model to produce embeddings that are still effective even when truncated to smaller dimensions. This allows you to choose an embedding size that balances performance and storage requirements.
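With a recent sentence-transformers version (2.7 or later), you can request truncated embeddings directly through the truncate_dim argument. Here is a small sketch with made-up example sentences:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Keep only the first 256 of the 384 dimensions; this works well because the
# model was trained with Matryoshka representation learning
model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1", truncate_dim=256)

embeddings = model.encode([
    "The key to great croissants is high-quality butter.",
    "Our signature sourdough uses a 100-year-old starter.",
])
print(embeddings.shape)  # (2, 256)
print(cos_sim(embeddings[0], embeddings[1]))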

We evaluated the model's performance at different embedding sizes:

| Dimension (d) | SciFact | SciDocs | NFCorpus | ArguAna | Average | Ratio |
| --- | --- | --- | --- | --- | --- | --- |
| 384 | 65.81 | 21.50 | 32.05 | 49.58 | 42.235 | 1.000 |
| 256 | 63.62 | 20.56 | 30.74 | 47.61 | 40.6325 | 0.962 |
| 128 | 58.55 | 17.20 | 26.81 | 40.11 | 35.6675 | 0.845 |
| 64 | 43.49 | 11.67 | 18.72 | 30.93 | 26.2025 | 0.620 |

As you can see, even at smaller dimensions, the model retains a good portion of its full-dimension performance.

Binary Quantization

By converting embeddings into binary format, you can achieve substantial storage and compute savings.
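As a rough sketch of how this looks in code (the quantize_embeddings helper ships with sentence-transformers 2.7 and later; the example texts are illustrative):

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

# Encode as float vectors first, then pack them into binary embeddings
float_embeddings = model.encode([
    "Our bakery lies at the heart of the city.",
    "To create a perfect loaf of sourdough, let the dough rest for 24 hours.",
])
binary_embeddings = quantize_embeddings(float_embeddings, precision="binary")

# 384 float32 values become 48 packed int8 values per text, a 32x size reduction
print(float_embeddings.shape, float_embeddings.dtype)    # (2, 384) float32
print(binary_embeddings.shape, binary_embeddings.dtype)  # (2, 48) int8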

We also tested the performance of the model with binary quantization:

| Encoding | SciFact | SciDocs | NFCorpus | ArguAna | Average | Ratio |
| --- | --- | --- | --- | --- | --- | --- |
| Raw | 65.81 | 21.50 | 32.05 | 49.58 | 42.235 | 1.000 |
| Binary | 61.95 | 19.94 | 30.62 | 46.14 | 39.6625 | 0.939 |

The performance drop is minimal and can even be further improved with on-disk rescoring!
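One way to picture such a setup: search cheap binary vectors first, then re-rank the few survivors with the full-precision embeddings. The sketch below uses made-up documents and a plain NumPy Hamming-distance search; it illustrates the idea rather than our exact rescoring pipeline:

import numpy as np
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")

docs = [
    "The key to great croissants is high-quality butter.",
    "Our signature sourdough uses a 100-year-old starter.",
    "Our bakery lies at the heart of the city.",
]

# Full-precision vectors would live on disk; the packed sign bits stay in memory
# ("ubinary" packs the sign bits into uint8 values, convenient for Hamming distance)
doc_float = model.encode(docs, normalize_embeddings=True)
doc_bits = quantize_embeddings(doc_float, precision="ubinary")

query_float = model.encode(["How do I bake good sourdough bread?"], normalize_embeddings=True)
query_bits = quantize_embeddings(query_float, precision="ubinary")

# Stage 1: cheap candidate search on the packed binary vectors (Hamming distance)
hamming = (np.unpackbits(query_bits, axis=1) != np.unpackbits(doc_bits, axis=1)).sum(axis=1)
candidates = np.argsort(hamming)[:2]

# Stage 2: rescore only those candidates with the full-precision embeddings
rescored = doc_float[candidates] @ query_float[0]
print([docs[i] for i in candidates[np.argsort(-rescored)]])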

Using the Model

To get started, install the necessary packages:

pip install -U sentence-transformers

Here’s how you can use the model:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
 
# 1. Load the model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-xsmall-v1")
 
# 2. Prepare your data
query = 'What are the best ingredients for sourdough bread?'
docs = [
    query,
    "Our bakery lays at the heart of the city.",
    "The key to great croissants is high-quality butter.",
    "Our signature sourdough uses a 100-year-old starter for optimal flavor.",
    "To create a perfect loaf of sourdough, follow these steps: mix the flour and water, let it sit for 24 hours, knead the dough, shape it, let it rise, bake it, and enjoy!"
]
 
# 3. Encode the texts
embeddings = model.encode(docs)
 
# 4. Calculate cosine similarity
similarities = cos_sim(embeddings[0], embeddings[1:])
print(similarities)

The output will look like this:

tensor([[0.2590, 0.1946, 0.3870, 0.5797]])

How We Built mxbai-embed-xsmall-v1

Building mxbai-embed-xsmall-v1 was a journey focused on maximizing efficiency without compromising performance. Here's how we achieved it:

Base Model and Training

We started with all-MiniLM-L6-v2, a model known for its balance between performance and size. We then trained it with loss functions tailored to retrieval, improving its ability to generate high-quality embeddings for retrieval tasks.

Focus on Retrieval Tasks

Our primary goal was to create an embedding model optimized for retrieval tasks in English. By tailoring the training data and focusing on relevant loss functions, we made sure that mxbai-embed-xsmall-v1 is great at finding semantically similar texts, which is important for search engines, recommendation systems, and RAG applications.

Why Small Size Matters

In many real-world applications, computational resources are limited. Smaller models like mxbai-embed-xsmall-v1 offer several advantages. They provide faster inference, as reduced computation leads to quicker results. These models are also ideal for deployment on devices with limited memory and processing power due to their lower resource consumption. Additionally, they offer cost-effectiveness by lowering infrastructure costs when scaling up.

Give Us Feedback

We are excited to see how mxbai-embed-xsmall-v1 is used in your projects. Your feedback helps us improve our models and make them more user-friendly.

Please share your thoughts with us. We are here to assist and always ready to discuss the exciting field of machine learning!

Citation

@online{xsmall2024mxbai,
  title={Every Byte Matters: Introducing mxbai-embed-xsmall-v1},
  author={Sean Lee and Julius Lipp and Rui Huang and Darius Koenig},
  year={2024},
  url={https://www.mixedbread.ai/blog/mxbai-embed-xsmall-v1},
}