# BAAI/bge-multilingual-gemma2




By default, `FlagLLMModel` (the FlagEmbedding wrapper for this model) uses all available GPUs when encoding. Set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs, or set `os.environ["CUDA_VISIBLE_DEVICES"] = ""` to make all GPUs unavailable.
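
For example, the environment variable has to be set before CUDA is initialized (i.e., before the model is constructed); a minimal sketch:

```python
import os

# Encode on GPU 0 only; set this before the model (and CUDA) is initialized
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Or hide every GPU so encoding falls back to CPU
# os.environ["CUDA_VISIBLE_DEVICES"] = ""
```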

### Using Sentence Transformers

```python
from sentence_transformers import SentenceTransformer
import torch

# Load the model, optionally in float16 precision for faster inference
model = SentenceTransformer("BAAI/bge-multilingual-gemma2", model_kwargs={"torch_dtype": torch.float16})

# Prepare a prompt given an instruction
instruction = 'Given a web search query, retrieve relevant passages that answer the query.'
prompt = f'<instruct>{instruction}\n<query>'
# Prepare queries and documents
queries = [
    'how much protein should a female eat',
    'summit define',
]
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1  the highest point of a mountain : the top of a mountain. : 2  the highest level. : 3  a meeting or series of meetings between the leaders of two or more governments."
]

# Compute the query and document embeddings
query_embeddings = model.encode(queries, prompt=prompt)
document_embeddings = model.encode(documents)

# Compute the cosine similarity between the query and document embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5591,  0.0164],
#         [-0.0026,  0.4993]], dtype=torch.float16)
```

### Using HuggingFace Transformers

```python
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # With left padding, the final position holds the last real token of every sequence
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # With right padding, pick each sequence's last non-padding position
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'<instruct>{task_description}\n<query>{query}'


task = 'Given a web search query, retrieve relevant passages that answer the query.'
queries = [
    get_detailed_instruct(task, 'how much protein should a female eat'),
    get_detailed_instruct(task, 'summit define')
]
# No need to add instructions for documents
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1  the highest point of a mountain : the top of a mountain. : 2  the highest level. : 3  a meeting or series of meetings between the leaders of two or more governments."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-multilingual-gemma2')
model = AutoModel.from_pretrained('BAAI/bge-multilingual-gemma2')
model.eval()

max_length = 4096
# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt', pad_to_multiple_of=8)

with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T) * 100
print(scores.tolist())
# [[55.92064666748047, 1.6549524068832397], [-0.2698777914047241, 49.95653533935547]]
```

## Evaluation

bge-multilingual-gemma2 exhibits state-of-the-art (SOTA) results on benchmarks like MIRACL, MTEB-pl, and MTEB-fr. It also achieves excellent performance on other major evaluations, including MTEB, C-MTEB and AIR-Bench.

Result figures:

- MIRACL (nDCG@10)
- MIRACL (Recall@100)
- MTEB-fr/pl, MTEB, BEIR, C-MTEB
- AIR-Bench Long-Doc (en, Recall@10)
- AIR-Bench QA (en & zh, nDCG@10)
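
To reproduce scores on MTEB-style tasks, one option is the third-party `mteb` package together with Sentence Transformers; a minimal sketch (the task name and output folder below are illustrative choices, not part of this model card):

```python
import mteb
import torch
from sentence_transformers import SentenceTransformer

# Load the model in float16, as in the usage example above
model = SentenceTransformer(
    "BAAI/bge-multilingual-gemma2",
    model_kwargs={"torch_dtype": torch.float16},
)

# "NFCorpus" is just an illustrative retrieval task; pick the benchmarks you care about
tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)

# Scores are written as JSON files under the output folder
results = evaluation.run(model, output_folder="results/bge-multilingual-gemma2")
```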

## Model List

bge is short for BAAI general embedding.

| Model | Language | Description | query instruction for retrieval [1] |
|---|---|---|---|
| BAAI/bge-multilingual-gemma2 | Multilingual | A LLM-based multilingual embedding model, trained on a diverse range of languages and tasks. | |
| BAAI/bge-en-icl | English | A LLM-based dense retriever with in-context learning capabilities that can fully leverage the model's potential based on few-shot examples (4096 tokens) | Provide instructions and few-shot examples freely based on the given task. |
| BAAI/bge-m3 | Multilingual | Multi-functionality (dense retrieval, sparse retrieval, multi-vector (ColBERT)), multi-linguality, and multi-granularity (8192 tokens) | |
| BAAI/llm-embedder | English | A unified embedding model to support diverse retrieval augmentation needs for LLMs | See README |
| BAAI/bge-reranker-large | Chinese and English | A cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-reranker-base | Chinese and English | A cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-large-en-v1.5 | English | Version 1.5 with more reasonable similarity distribution | Represent this sentence for searching relevant passages: |
| BAAI/bge-base-en-v1.5 | English | Version 1.5 with more reasonable similarity distribution | Represent this sentence for searching relevant passages: |
| BAAI/bge-small-en-v1.5 | English | Version 1.5 with more reasonable similarity distribution | Represent this sentence for searching relevant passages: |
| BAAI/bge-large-zh-v1.5 | Chinese | Version 1.5 with more reasonable similarity distribution | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-base-zh-v1.5 | Chinese | Version 1.5 with more reasonable similarity distribution | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-small-zh-v1.5 | Chinese | Version 1.5 with more reasonable similarity distribution | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-large-en | English | :trophy: rank 1st in MTEB leaderboard | Represent this sentence for searching relevant passages: |
| BAAI/bge-base-en | English | A base-scale model but with similar ability to bge-large-en | Represent this sentence for searching relevant passages: |
| BAAI/bge-small-en | English | A small-scale model but with competitive performance | Represent this sentence for searching relevant passages: |
| BAAI/bge-large-zh | Chinese | :trophy: rank 1st in C-MTEB benchmark | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-base-zh | Chinese | A base-scale model but with similar ability to bge-large-zh | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-small-zh | Chinese | A small-scale model but with competitive performance | 为这个句子生成表示以用于检索相关文章: |
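
For the v1 and v1.5 models above, the query instruction is a plain text prefix added to queries only; documents are encoded without it. A minimal sketch with Sentence Transformers, using BAAI/bge-large-en-v1.5 purely as an illustration:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

instruction = "Represent this sentence for searching relevant passages: "
queries = ["how much protein should a female eat"]
documents = ["The CDC's average protein requirement for women ages 19 to 70 is 46 grams per day."]

# Prepend the instruction to queries only; documents need no instruction
query_embeddings = model.encode([instruction + q for q in queries], normalize_embeddings=True)
document_embeddings = model.encode(documents, normalize_embeddings=True)

# Cosine similarity via dot product of normalized embeddings
print(query_embeddings @ document_embeddings.T)
```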

## Citation

If you find this repository useful, please consider giving it a star :star: and a citation.

```bibtex
@misc{bge-m3,
      title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
      author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
      year={2024},
      eprint={2402.03216},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```



```bibtex
@misc{bge_embedding,
      title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
      author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
      year={2023},
      eprint={2309.07597},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
