mxbai-rerank-large-v1
mxbai-rerank-large-v1 is the flagship model in the Mixedbread rerank family and the most accurate of the set. This open-source reranking model improves search results, particularly for complex and domain-specific queries, and can be added to existing keyword-based search systems.
- API Reference: Reranking
- Model Reference: mxbai-rerank-large-v1
- Blog Post: Boost Your Search With The Crispy Mixedbread Rerank Models
Model description
mxbai-rerank-large-v1 is part of the Mixedbread rerank family, a set of best-in-class reranking models that are fully open-source under the Apache 2.0 license. These models are designed to boost search results by adding a semantic layer to existing search systems, making it easier to find relevant results.
The models were trained using a large collection of real-life search queries and the top-10 results from search engines for these queries. First, a large language model ranked the results according to their relevance to the query. These signals were then used to train the rerank models. Experiments show that these models significantly boost search performance, particularly for complex and domain-specific queries.
When used with a keyword-based search engine such as Elasticsearch, OpenSearch, or Solr, the reranking model can be added to the end of the existing search workflow. This lets users incorporate semantic relevance into a keyword-based search system without changing the existing infrastructure. It is an easy, low-complexity way to improve search results: semantic search is introduced into the stack with one line of code.
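The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the Mixedbread API: `rerank`, `score_fn`, and `toy_overlap_score` are hypothetical names, the candidate list stands in for whatever a keyword engine would return, and the toy scorer is only a placeholder for the real model so that the example runs on its own.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Re-order keyword-search candidates by relevance score, highest first."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, document: str) -> float:
    """Placeholder scorer (NOT the real model): share of query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / max(len(query_words), 1)


# Candidates in the order a keyword engine might have returned them (made-up data).
candidates = [
    "Harper Lee was born in Monroeville",
    "To Kill a Mockingbird is a novel by Harper Lee",
    "Moby-Dick was written by Herman Melville",
]

# The reranking step is appended after retrieval; the retrieval stage is untouched.
reranked = rerank("who wrote to kill a mockingbird", candidates, toy_overlap_score, top_k=2)
```

In a real setup, `score_fn` would call the reranking model on each (query, document) pair, while the keyword engine keeps doing the initial retrieval.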
mxbai-rerank-large-v1 is the largest model in the family, delivering the highest accuracy and performance. On a subset of 11 BEIR datasets, mxbai-rerank-large-v1 achieves an NDCG@10 score of 48.8 and an Accuracy@3 score of 74.9, outperforming lexical search and other reranking models of similar or larger size.
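For readers unfamiliar with these metrics, the sketch below shows one common way to compute them for a single query. It rests on several assumptions: it uses the linear-gain form of DCG, it derives the ideal ordering from the ranked list itself instead of from the full set of judged documents, and it treats Accuracy@k as "at least one relevant result in the top k". The source does not state which variants were used for the reported numbers, which would be averages over all queries scaled to 0-100.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def accuracy_at_k(relevances: Sequence[float], k: int = 3) -> float:
    """1.0 if any of the first k results is relevant, else 0.0 (assumed definition)."""
    return 1.0 if any(rel > 0 for rel in relevances[:k]) else 0.0


# Relevance labels of the results in ranked order (made-up data, not benchmark results).
ranked_relevances = [0, 1, 0, 1]
ndcg = ndcg_at_k(ranked_relevances, k=10)
acc = accuracy_at_k(ranked_relevances, k=3)
```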
| Recommended Sequence Length | Language |
|---|---|
| 512 | English |
Suitable Scoring Methods
- Model Output: The model directly scores the relevance of each document to the query. You can use the model output directly. If you want a score between 0 and 1, you can use the sigmoid function on the scores.
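As a small illustration of the sigmoid step, the sketch below maps raw scores into the range 0 to 1. The `raw_scores` values are hypothetical placeholders, not real model outputs. Because the sigmoid is monotonic, the ranking order is unchanged.

```python
import math
from typing import List


def sigmoid(score: float) -> float:
    """Numerically stable logistic function mapping a real score into (0, 1)."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)


raw_scores: List[float] = [4.2, 0.0, -3.1]  # hypothetical model outputs
normalized = [sigmoid(s) for s in raw_scores]
```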
Limitations
- Language: mxbai-rerank-large-v1 is trained on English text and is specifically designed for the English language.
- Sequence Length: The suggested maximum sequence length is 512 tokens. Longer sequences may be truncated, leading to a loss of information. Note that the maximum sequence length applies to the query and document combined: len(query) + len(document), measured in tokens, should not exceed 512.
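The combined-length constraint can be checked before scoring, roughly as sketched below. All names here are hypothetical. `whitespace_token_count` is only a crude stand-in for the model's real tokenizer, which will usually produce more tokens than a whitespace split. The `reserved` parameter is an assumption that lets you set aside room for any special tokens the tokenizer adds.

```python
from typing import Callable


def whitespace_token_count(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fits_sequence_limit(
    query: str,
    document: str,
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = 512,
    reserved: int = 0,
) -> bool:
    """True if query and document together stay within the token limit."""
    return count_tokens(query) + count_tokens(document) + reserved <= max_tokens


def document_budget(
    query: str,
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = 512,
    reserved: int = 0,
) -> int:
    """Tokens left for the document once the query (and reserved tokens) are counted."""
    return max(max_tokens - reserved - count_tokens(query), 0)
```

Passing the real tokenizer's counting function as `count_tokens` gives an exact check. Documents over budget can then be shortened or split into passages deliberately, so nothing is truncated silently.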
Examples
Reranking
The following code demonstrates how to use mxbai-rerank-large-v1 to rerank a list of documents based on their relevance to a given query.