Alibaba-NLP/gte-multilingual-reranker-base

Author: Alibaba-NLP

Tags: text-ranking, sentence-transformers, af, ar, safetensors, new, text-classification, transformers, text-embeddings-inference, apache-2.0

tensor([1.2315, 0.5923, 0.3041])


Usage with Infinity:

[Infinity](https://github.com/michaelfeil/infinity) is an MIT-licensed inference REST API server.

```bash
docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
  michaelf34/infinity:0.0.68 \
  v2 --model-id Alibaba-NLP/gte-multilingual-reranker-base --revision "main" --dtype bfloat16 --batch-size 32 --device cuda --engine torch --port 7997
```
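
Once the container is running, the server can be queried over HTTP. The sketch below is not taken from the Infinity documentation. It assumes that Infinity exposes a Cohere-style `POST /rerank` route on port 7997, that the request body takes `model`, `query`, and `documents` fields, and that the response holds a `results` list whose items carry `index` and `relevance_score`. Check these assumptions against the Infinity documentation for the version you deploy. The helper names are illustrative only.

```python
import json
import urllib.request

# Assumed route and port; adjust to match your deployment.
INFINITY_URL = "http://localhost:7997/rerank"
MODEL_ID = "Alibaba-NLP/gte-multilingual-reranker-base"


def build_infinity_request(query, documents, url=INFINITY_URL, model=MODEL_ID):
    """Build a POST request with the assumed Cohere-style rerank body."""
    body = {"model": model, "query": query, "documents": list(documents)}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_infinity(query, documents, timeout=30):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_infinity_request(query, documents)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def top_document(documents, response):
    """Return the best-scoring document and its score from an assumed response."""
    best = max(response["results"], key=lambda item: item["relevance_score"])
    return documents[best["index"]], best["relevance_score"]
```

With the server up, `top_document(docs, query_infinity(query, docs))` would return the highest-ranked candidate under these assumptions.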


Usage with [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference):

- CPU:

```bash
docker run --platform linux/amd64 \
  -p 8080:80 \
  -v $PWD/data:/data \
  --pull always \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 \
  --model-id Alibaba-NLP/gte-multilingual-reranker-base
```

- GPU:

```bash
docker run --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  --pull always \
  ghcr.io/huggingface/text-embeddings-inference:1.7 \
  --model-id Alibaba-NLP/gte-multilingual-reranker-base
```

Then you can send requests to the deployed API via the `/rerank` route (see the Text Embeddings Inference OpenAPI Specification for more details). The containers above serve plain HTTP, so the request uses `http://`:

```bash
curl http://0.0.0.0:8080/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "中国的首都在哪儿",
    "raw_scores": false,
    "return_text": false,
    "texts": [ "北京" ],
    "truncate": true,
    "truncation_direction": "right"
  }'
```
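
The same request can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the TEI container from the commands above is reachable at `http://0.0.0.0:8080` and that `/rerank` returns a JSON list of objects with `index` and `score` fields. The helper names (`build_rerank_payload`, `rerank`, `attach_texts`) and the second candidate text are illustrative and not part of TEI.

```python
import json
import urllib.request

# Assumed local deployment started with one of the docker commands above.
TEI_URL = "http://0.0.0.0:8080/rerank"


def build_rerank_payload(query, texts, raw_scores=False):
    """Build the JSON body with the same fields as the curl example."""
    return {
        "query": query,
        "raw_scores": raw_scores,
        "return_text": False,
        "texts": list(texts),
        "truncate": True,
        "truncation_direction": "right",
    }


def rerank(query, texts, url=TEI_URL, timeout=30):
    """POST the payload and return the decoded JSON (needs a running server)."""
    body = json.dumps(build_rerank_payload(query, texts)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def attach_texts(texts, ranks):
    """Pair each returned score with its input text, best match first."""
    ordered = sorted(ranks, key=lambda item: item["score"], reverse=True)
    return [(texts[item["index"]], item["score"]) for item in ordered]


def main():
    # "上海" is an illustrative extra candidate, not from the original example.
    candidates = ["北京", "上海"]
    ranks = rerank("中国的首都在哪儿", candidates)
    for text, score in attach_texts(candidates, ranks):
        print(f"{score:.4f}\t{text}")
```

Call `main()` once the server is up. Because each result carries the `index` of the input it scores, `attach_texts` can map scores back to the original strings even though `return_text` is `false`.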

Evaluation

Reranking results on multiple text retrieval datasets.

More detailed experimental results can be found in the paper.

Cloud API Services

In addition to the open-source GTE series models, GTE series models are also available as commercial API services on Alibaba Cloud.

- Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service.
- ReRank Models: The gte-rerank model service is available.

Note that the models behind the commercial APIs are not entirely identical to the open-source models.

Citation

If you find our paper or models helpful, please consider citing:

@inproceedings{zhang2024mgte,
  title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
  author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track},
  pages={1393--1412},
  year={2024}
}
