Update README.md
README.md
CHANGED
@@ -280,11 +280,34 @@ model-index:
 
 The successor of German_Semantic_STS_V2 is here!
 
-## Major updates:
+## Major updates and USPs:
 
-- **Sequence length: 8192, (16 times more than V2 and other models) => thanks to the alibi implementation of Jina-Team!**
-- **Matryoshka Embeddings: Your embeddings can be sized from 1024 down to 64**
-- **License: Apache 2.0**
+- **Sequence length:** 8192 (16 times more than V2 and other models), thanks to the ALiBi implementation of the Jina team!
+- **Matryoshka Embeddings:** The model is trained for embedding sizes from 1024 down to 64, allowing you to store much smaller embeddings with little quality loss.
+- **License:** Apache 2.0
+- **German only:** This model is German-only, which lets it learn more efficiently and deal better with shorter queries.
+
+## Usage:
+
+```python
+from sentence_transformers import SentenceTransformer
+
+
+matryoshka_dim = 1024 # How big your embeddings should be, choose from: 64, 128, 256, 512, 1024
+model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True, truncate_dim=matryoshka_dim)
+
+# Run inference
+sentences = [
+    'Eine Flagge weht.',
+    'Die Flagge bewegte sich in der Luft.',
+    'Zwei Personen beobachten das Wasser.',
+]
+embeddings = model.encode(sentences)
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+```
+
 
 
 ## Model Details
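Note on the added Usage section: the following is a minimal, dependency-free sketch of what picking a smaller `matryoshka_dim` amounts to, assuming similarity is measured with cosine similarity. It keeps only the leading components of each vector, rescales them to unit length, and compares them with dot products. The helper names (`truncate_and_normalize`, `cosine_matrix`) and the toy 8-dimensional vectors are made up for illustration; they are not part of the model or of `sentence_transformers`, and real embeddings would come from `model.encode`.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine_matrix(vectors):
    """Pairwise cosine similarity for unit-length vectors (plain dot products)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


# Toy 8-dimensional stand-ins for real embeddings (values are invented).
toy_embeddings = [
    [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1],
    [0.8, 0.2, 0.4, 0.1, 0.1, 0.0, 0.1, 0.2],
    [0.0, 0.9, 0.1, 0.7, 0.0, 0.3, 0.2, 0.0],
]

# Hypothetical target size; the model card lists 64, 128, 256, 512, 1024.
small = [truncate_and_normalize(v, 4) for v in toy_embeddings]
similarities = cosine_matrix(small)
```

With these toy values, the first two vectors stay far more similar to each other than to the third after truncation. That is the property Matryoshka training aims to preserve at smaller sizes.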