---
base_model:
- Snowflake/snowflake-arctic-embed-m-long
---

# CodeRankEmbed

`CodeRankEmbed` is a 137M-parameter bi-encoder supporting an 8192-token context length for code retrieval. It significantly outperforms open-source and proprietary code embedding models on a range of code retrieval tasks.

# Performance Benchmarks

| Name | Parameters | CSN | CoIR |
| :---------------: | :-----: | :------: | :------: |
| **CodeRankEmbed** | 137M | **77.9** | **60.1** |
| CodeSage-Large | 1.3B | 71.2 | 59.4 |
| Jina-Code-v2 | 161M | 67.2 | 58.4 |
| CodeT5+ | 110M | 74.2 | 45.9 |
| Voyage-Code-002 | Unknown | 68.5 | 56.3 |

# Usage

**Important**: the query prompt *must* include the following *task instruction prefix*: "Represent this query for searching relevant code".

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cornstack/CodeRankEmbed", trust_remote_code=True)

# Queries must carry the task instruction prefix; code snippets are encoded as-is.
queries = ['Represent this query for searching relevant code: Calculate the n-th Fibonacci number']
codes = ["""def func(n):
    if n <= 0:
        return "Input should be a positive integer"
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        a, b = 0, 1
        for _ in range(2, n):
            a, b = b, a + b
        return b
"""]

query_embeddings = model.encode(queries)
print(query_embeddings)
code_embeddings = model.encode(codes)
print(code_embeddings)
```
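Once query and code embeddings are computed, retrieval reduces to ranking code snippets by their similarity to the query embedding. A minimal sketch of that ranking step using cosine similarity in NumPy (the small placeholder vectors here stand in for the outputs of `model.encode` above):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of the two vectors after normalizing each to unit length.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings standing in for real model.encode() outputs.
query_embedding = np.array([0.1, 0.8, 0.3])
code_embeddings = [
    np.array([0.1, 0.7, 0.4]),  # hypothetical embedding of a relevant snippet
    np.array([0.9, 0.1, 0.0]),  # hypothetical embedding of an unrelated snippet
]

# Score every candidate snippet against the query, then pick the best match.
scores = [cosine_similarity(query_embedding, c) for c in code_embeddings]
best = int(np.argmax(scores))
print(best, scores)
```

In a real pipeline the candidate code embeddings would typically be precomputed and indexed, with only the query encoded at search time.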