jinaai/jina-embeddings-v2-base-code


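```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# trust_remote_code is required because the model ships its own modeling code
model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-base-code",
    trust_remote_code=True,
)

# control your input sequence length up to 8192
model.max_seq_length = 1024

embeddings = model.encode([
    'How do I access the index while iterating over a sequence with a for loop?',
    '# Use the built-in enumerator\nfor idx, x in enumerate(xs):\n    print(idx, x)',
])
print(cos_sim(embeddings[0], embeddings[1]))
```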
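To make the similarity score above more concrete, here is a minimal pure-Python sketch of cosine similarity and of ranking candidates against a query. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the 3-dimensional vectors are toy stand-ins for real model embeddings. The `cos_sim` utility in sentence-transformers works on tensors and returns a tensor, so treat this as an illustration of the idea rather than the library's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine_similarity(query_vec, vec))
        for i, vec in enumerate(candidate_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (hypothetical data, not model output).
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, different length
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query, candidates)
for index, score in ranking:
    print(index, round(score, 4))
```

Because cosine similarity depends only on direction, the second candidate ranks first even though it is twice as long as the query. With real embeddings, the same ranking step could be applied to the rows returned by `model.encode`, for example to pick the code snippet closest to a natural-language question.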


You can also use the [Transformers.js](https://huggingface.co/docs/transformers.js) library to compute embeddings in JavaScript.
```js
// npm i @xenova/transformers
import { pipeline, cos_sim } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'jinaai/jina-embeddings-v2-base-code', {
    quantized: false, // Comment out this line to use the 8-bit quantized version
});

const texts = [
    'How do I access the index while iterating over a sequence with a for loop?',
    '# Use the built-in enumerator\nfor idx, x in enumerate(xs):\n    print(idx, x)',
];
const embeddings = await extractor(texts, { pooling: 'mean' });

const score = cos_sim(embeddings[0].data, embeddings[1].data);
console.log(score);
// 0.7281748759529421
```

Plans

  1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
  2. Multimodal embedding models to enable multimodal RAG applications.
  3. High-performance rerankers.

Contact

Join our Discord community and chat with other community members about ideas.
