---
base_model:
- Snowflake/snowflake-arctic-embed-m-long
---
# CodeRankEmbed
`CodeRankEmbed` is a 137M-parameter bi-encoder with an 8192-token context length for code retrieval. It significantly outperforms open-source and proprietary code embedding models across a range of code retrieval benchmarks.
# Performance Benchmarks
| Name | Parameters | CSN | CoIR |
| :---------------: | :-----: | :------: | :------: |
| **CodeRankEmbed** | 137M | **77.9** | **60.1** |
| CodeSage-Large | 1.3B | 71.2 | 59.4 |
| Jina-Code-v2 | 161M | 67.2 | 58.4 |
| CodeT5+ | 110M | 74.2 | 45.9 |
| Voyage-Code-002 | Unknown | 68.5 | 56.3 |
# Usage
**Important**: every query *must* be prefixed with the following *task instruction*: "Represent this query for searching relevant code: ". Code snippets need no prefix.
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("cornstack/CodeRankEmbed", trust_remote_code=True)
# Queries must carry the task instruction prefix; code snippets are encoded as-is.
queries = ['Represent this query for searching relevant code: Calculate the n-th Fibonacci number']
codes = ["""def func(n):
if n <= 0:
return "Input should be a positive integer"
elif n == 1:
return 0
elif n == 2:
return 1
else:
a, b = 0, 1
for _ in range(2, n):
a, b = b, a + b
return b
"""]
query_embeddings = model.encode(queries)
print(query_embeddings)
code_embeddings = model.encode(codes)
print(code_embeddings)
```
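To turn the two embedding sets into a retrieval result, a common choice is to rank code snippets by cosine similarity against the query embedding. Below is a minimal, self-contained sketch using toy vectors in place of `model.encode` output; the `rank_by_cosine` helper is illustrative and not part of the model's API.

```python
import numpy as np

def rank_by_cosine(query_emb, code_embs):
    # L2-normalize, then a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = code_embs / np.linalg.norm(code_embs, axis=1, keepdims=True)
    sims = c @ q
    # Indices of code snippets, most similar first.
    return np.argsort(-sims), sims

# Toy 4-dimensional embeddings standing in for real model output.
query = np.array([1.0, 0.0, 1.0, 0.0])
codes = np.array([
    [1.0, 0.0, 1.0, 0.0],  # same direction as the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
])

order, sims = rank_by_cosine(query, codes)
print(order)  # [0 1]
```

In practice, pass `query_embeddings[0]` and `code_embeddings` from the snippet above instead of the toy arrays; recent `sentence-transformers` versions also expose a `model.similarity(...)` convenience method for the same purpose.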