Some questions about the results in Table 5
#17
by begonie - opened
I am trying to reproduce the evaluation metrics reported for gte-multilingual-reranker-base, using the following setup:
Retrieval model: gte-multilingual-base
Ranker model: gte-multilingual-reranker-base
Dataset: MLDR (nDCG@10 [13])
CMD:
python -m FlagEmbedding.evaluation.mldr \
--eval_name mldr \
--dataset_dir ./mldr/data \
--dataset_names ar de en es fr hi it ja ko pt ru th zh \
--splits test \
--corpus_embd_save_dir ./mldr/corpus_embd \
--output_dir ./mldr/search_results \
--search_top_k 1000 \
--rerank_top_k 100 \
--overwrite False \
--k_values 10 100 \
--eval_output_method markdown \
--eval_output_path ./mldr/mldr_eval_results.md \
--eval_metrics ndcg_at_10 \
--embedder_name_or_path Alibaba-NLP/gte-multilingual-base \
--reranker_name_or_path Alibaba-NLP/gte-multilingual-reranker-base \
--embedder_passage_max_length 8192 \
--reranker_max_length 8192 \
--trust_remote_code True \
--embedder_batch_size 64 \
--reranker_batch_size 64
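To make clear what I expect `--search_top_k 1000` and `--rerank_top_k 100` to do, here is a minimal toy sketch of a retrieve-then-rerank pipeline. It is my assumption about the two-stage flow, not FlagEmbedding's actual implementation; `retrieve_then_rerank`, `overlap_score`, and `phrase_score` are hypothetical names, and the two scorers are simple stand-ins for the embedder and the cross-encoder.

```python
from typing import Callable, Dict, List


def retrieve_then_rerank(
    query: str,
    corpus: Dict[str, str],
    embed_score: Callable[[str, str], float],   # stand-in for embedder similarity
    rerank_score: Callable[[str, str], float],  # stand-in for cross-encoder score
    search_top_k: int = 1000,
    rerank_top_k: int = 100,
) -> List[str]:
    # Stage 1: rank the whole corpus with the first-stage scorer, keep search_top_k.
    first_stage = sorted(
        corpus, key=lambda d: embed_score(query, corpus[d]), reverse=True
    )[:search_top_k]
    # Stage 2: rescore only the top rerank_top_k candidates with the reranker.
    head = first_stage[:rerank_top_k]
    return sorted(head, key=lambda d: rerank_score(query, corpus[d]), reverse=True)


def overlap_score(query: str, doc: str) -> float:
    # Toy first stage: number of document tokens that also occur in the query.
    query_tokens = set(query.split())
    return float(sum(token in query_tokens for token in doc.split()))


def phrase_score(query: str, doc: str) -> float:
    # Toy reranker: 1.0 if the query appears verbatim in the document.
    return 1.0 if query in doc else 0.0


corpus = {
    "d1": "red storms red spot red clouds on jupiter",
    "d2": "mars is the red planet",
    "d3": "saturn has rings",
}
ranked = retrieve_then_rerank(
    "red planet", corpus, overlap_score, phrase_score, search_top_k=2, rerank_top_k=2
)
print(ranked)
```

In this toy run the first stage prefers `d1`, and the second stage moves `d2` to the top. Documents outside the top `rerank_top_k` can never be promoted, so if the paper used a different candidate depth, that alone could change the reranked score.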
Result:
| Model | Reranker | average | ar-test | de-test | en-test | es-test | fr-test | hi-test | it-test | ja-test | ko-test | pt-test | ru-test | th-test | zh-test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gte-multilingual-base | gte-multilingual-reranker-base | 72.875 | 77.082 | 68.048 | 69.663 | 94.798 | 88.294 | 65.428 | 82.078 | 67.169 | 70.880 | 88.400 | 83.732 | 47.039 | 44.763 |
| gte-multilingual-base | NoReranker | 56.602 | 54.981 | 55.155 | 51.032 | 81.228 | 76.218 | 45.197 | 66.926 | 52.053 | 46.773 | 79.298 | 64.037 | 35.472 | 27.461 |
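As a sanity check on the table, the average column can be recomputed from the 13 per-language scores. The short sketch below simply copies the numbers from the two rows above; the variable names are my own.

```python
# Per-language nDCG@10 scores copied from the table above.
reranked = {
    "ar": 77.082, "de": 68.048, "en": 69.663, "es": 94.798, "fr": 88.294,
    "hi": 65.428, "it": 82.078, "ja": 67.169, "ko": 70.880, "pt": 88.400,
    "ru": 83.732, "th": 47.039, "zh": 44.763,
}
no_reranker = {
    "ar": 54.981, "de": 55.155, "en": 51.032, "es": 81.228, "fr": 76.218,
    "hi": 45.197, "it": 66.926, "ja": 52.053, "ko": 46.773, "pt": 79.298,
    "ru": 64.037, "th": 35.472, "zh": 27.461,
}

# Unweighted mean over languages, which is how the "average" column appears to be built.
avg_reranked = sum(reranked.values()) / len(reranked)
avg_no_reranker = sum(no_reranker.values()) / len(no_reranker)

print(round(avg_reranked, 3), round(avg_no_reranker, 3))  # 72.875 56.602
```

Both averages match the table, so the reported 72.875 is an unweighted mean over languages and not an aggregation error on my side.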
I have a question. The score for gte-multilingual-base alone, 56.6, matches Table 5. After adding gte-multilingual-reranker-base, however, the average is only 72.875, well below the 78.7 reported in the paper. Is there something wrong with how I am using the reranker?
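For reference, this is the textbook definition of nDCG@10 that I am assuming when comparing numbers. It is only a sketch: the evaluation tooling may compute the metric through a different implementation, and `ndcg_at_k` is a hypothetical helper name.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gain: DCG = sum(rel_i / log2(i + 1)) over ranks i = 1..k."""
    # enumerate() is 0-based, so rank position i corresponds to log2(index + 2).
    dcg = sum(
        qrels.get(doc_id, 0) / math.log2(index + 2)
        for index, doc_id in enumerate(ranked_ids[:k])
    )
    # Ideal DCG: the same sum over the relevance labels sorted best-first.
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(index + 2) for index, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# One relevant document ("a"): rank 1 is perfect, rank 2 is discounted by 1/log2(3).
print(ndcg_at_k(["a", "b"], {"a": 1}))
print(ndcg_at_k(["b", "a"], {"a": 1}))
```

With a single relevant document per query, the score depends only on the rank of that document, so a few positions of difference in the reranked list translate directly into several points of nDCG@10.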
begonie changed discussion title from Table 5 中结果的一些疑问 to Some questions about the results in Table 5