mxbai-rerank-base-v1
Blog Post
Boost Your Search With The Crispy mixedbread Rerank Models
mxbai-rerank-base-v1 is not available via API. Please use mxbai-rerank-large-v1 instead.
Model description
mxbai-rerank-base-v1 is part of the Mixedbread rerank model family, a set of best-in-class reranking models that are fully open-source under the Apache 2.0 license. These models are designed to boost search results by adding a semantic layer to existing search systems, making it easier to find relevant results.
The models were trained using a large collection of real-life search queries and the top-10 results from search engines for these queries. First, a large language model ranked the results according to their relevance to the query. These signals were then used to train the rerank models. Experiments show that these models significantly boost search performance, particularly for complex and domain-specific queries.
When used in combination with a keyword-based search engine, such as Elasticsearch, OpenSearch, or Solr, the rerank model can be added to the end of an existing search workflow, allowing users to incorporate semantic relevance into their keyword-based search system without changing the existing infrastructure. This is an easy, low-complexity method of improving search results by introducing semantic search technology into a user's stack with one line of code.
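The workflow described above, in which a reranker is appended to an existing keyword search, might be sketched as follows. This is a minimal illustration, not the Mixedbread API: `rerank`, `toy_score`, and `keyword_hits` are hypothetical names, and `toy_score` is a word-overlap stand-in for the relevance score a rerank model would produce.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each candidate against the query and return the top_k, best first."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_score(query: str, doc: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the document.

    In a real system this would be replaced by a call to the rerank model.
    """
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Pretend these came back from a keyword engine, in its original order.
keyword_hits = [
    "Solr is a search platform",
    "How to bake sourdough bread at home",
    "Bread baking tips for beginners",
]

results = rerank("how to bake bread", keyword_hits, toy_score, top_k=2)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
```

The keyword engine and its index are left untouched; only the ordering of the returned candidates changes, which is why this step can be added without modifying existing infrastructure.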
mxbai-rerank-base-v1 offers the best balance between size and performance in the Mixedbread rerank model family. On a subset of 11 BEIR datasets, mxbai-rerank-base-v1 achieves an NDCG@10 score of 46.9 and an Accuracy@3 score of 72.3, outperforming lexical search and other reranking models of similar size.
| Recommended Sequence Length | Language |
|---|---|
| 512 | English |
Suitable Scoring Methods
- Model Output: The model directly scores the relevance of each document to the query. You can use the model output directly. If you want a score between 0 and 1, you can use the sigmoid function on the scores.
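As a small sketch of the sigmoid step mentioned above, the snippet below maps raw scores to values between 0 and 1. The values in `raw_scores` are made up for illustration and are not real model output.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw relevance scores, one per document (illustrative only).
raw_scores = [4.2, 0.0, -3.1]

# Sigmoid is monotonic, so the ranking order is unchanged.
normalized = [sigmoid(s) for s in raw_scores]
print(normalized)
```

Because the sigmoid preserves ordering, applying it matters only when a bounded score is needed, for example to apply a fixed relevance threshold.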
Limitations
- Language: mxbai-rerank-base-v1 is trained on English text and is specifically designed for the English language.
- Sequence Length: The suggested maximum sequence length is 512 tokens. Longer sequences may be truncated, leading to a loss of information. Note that this limit applies to the query and document combined: `len(query) + len(document)`, measured in tokens, should not exceed 512.
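One way to respect the combined limit is to trim the document so that the query plus the document fits the budget. The sketch below assumes a whitespace tokenizer purely for illustration; `whitespace_tokenize` and `truncate_document` are hypothetical helpers, and real token counts come from the model's own tokenizer, which splits text into subwords and may add special tokens that also count toward the limit.

```python
from typing import Callable, List

MAX_TOKENS = 512  # suggested combined limit for query + document


def whitespace_tokenize(text: str) -> List[str]:
    """Illustrative tokenizer only; the model's tokenizer will count differently."""
    return text.split()


def truncate_document(
    query: str,
    document: str,
    max_tokens: int = MAX_TOKENS,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> str:
    """Trim the document so len(query) + len(document) <= max_tokens, in tokens."""
    query_len = len(tokenize(query))
    doc_tokens = tokenize(document)
    budget = max(max_tokens - query_len, 0)  # tokens left for the document
    return " ".join(doc_tokens[:budget])


query = "how to bake bread"
long_document = " ".join(f"word{i}" for i in range(600))

trimmed = truncate_document(query, long_document)
total = len(whitespace_tokenize(query)) + len(whitespace_tokenize(trimmed))
print(total)
```

Trimming the document rather than the query keeps the full search intent intact; for long documents, splitting into chunks and scoring each chunk separately is a common alternative.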