Update README.md
README.md
CHANGED
@@ -31,7 +31,7 @@ The successors of [German_Semantic_STS_V2](https://huggingface.co/aari1995/Germa
 
 **Note:** To run this model properly, see "Usage".
 
-
+# Major updates and USPs:
 
 - **Flexibility:** Trained with flexible sequence length and embedding truncation, flexibility is a core feature of the model. Yet, smaller dimensions bring a minor trade-off in quality.
 - **Sequence length:** Embed up to 8192 tokens (16 times more than V2 and other models)
@@ -42,7 +42,7 @@ The successors of [German_Semantic_STS_V2](https://huggingface.co/aari1995/Germa
 - **License:** Apache 2.0
 
 
-
+# Usage:
 
 This model has some built-in functionality that is rather hidden. To profit from it, use this code:
 
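The usage snippet itself falls outside the changed lines, so this diff does not show it. Purely as a sketch of how such a model is usually loaded with sentence-transformers, not taken from this README: the repo id `aari1995/German_Semantic_V3`, the `truncate_dim` value, and the example sentences below are assumptions, and `model.similarity` needs a recent sentence-transformers release.

```python
# Sketch only: repo id, truncate_dim, and sentences are illustrative assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "aari1995/German_Semantic_V3",  # assumed repo id
    trust_remote_code=True,          # the model ships custom (JinaBERT/ALiBi) code
    truncate_dim=512,                # optional: work with smaller embeddings
)

sentences = [
    "Der Himmel ist heute wolkenlos.",
    "Heute ist der Himmel völlig klar.",
]

embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)  # as in the context line below
print(similarities)
```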
@@ -74,7 +74,7 @@ similarities = model.similarity(embeddings, embeddings)
 
 ```
 
-
+## Full Model Architecture
 
 ```
 SentenceTransformer(
@@ -84,7 +84,7 @@ SentenceTransformer(
 ```
 
 
-
+# FAQ
 
 **Q: Is this Model better than V2?**
 
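The architecture listing between the two fences above is not reproduced in this diff. Assuming the model was loaded as in the earlier sketch, such a block is typically just the printed representation of the loaded model:

```python
# Printing a SentenceTransformer lists its module stack (typically a Transformer
# module followed by a Pooling module); this is the kind of output the
# "Full Model Architecture" section shows.
print(model)  # `model` from the usage sketch above
```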
@@ -111,17 +111,17 @@ Another noticeable difference is that V3 has a broader cosine_similarity spectrum
 **A:** Broadly speaking, when going from 1024 to 512 dimensions, there is very little trade-off (1 percent). When going down to 64 dimensions, you may face a decrease of up to 3 percent.
 
 
-
+# Evaluation
 
 Storage comparison:
 
 
 Benchmarks: soon.
 
-
-German_Semantic_V3_Instruct: Guiding your embeddings towards self-selected aspects
+# Up next:
+German_Semantic_V3_Instruct: Guiding your embeddings towards self-selected aspects. Planned for 2024.
 
-
+# Thank You and Credits
 
 - To [jinaAI](https://huggingface.co/jinaai) for their BERT implementation that is used, especially ALiBi
 - To [deepset](https://huggingface.co/deepset) for the gbert-large, which is a really great model
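The FAQ answer above quantifies the truncation trade-off, and the Evaluation hunk points to a storage-comparison image whose numbers are not spelled out in the diff text. Below is a small sketch of the underlying arithmetic and of manual Matryoshka-style truncation (keep the leading dimensions, then re-normalize); the corpus size of one million vectors is an assumption chosen only for illustration.

```python
# Sketch: float32 storage per embedding size, plus manual truncation of a
# 1024-dim embedding down to 64 dims followed by re-normalization.
import numpy as np

N_VECTORS = 1_000_000        # assumed corpus size, for illustration only
BYTES_PER_FLOAT32 = 4

for dim in (1024, 512, 64):  # the dimensions discussed in the FAQ
    gib = N_VECTORS * dim * BYTES_PER_FLOAT32 / 1024**3
    print(f"{dim:>4} dims -> {gib:.2f} GiB for {N_VECTORS:,} float32 vectors")

full = np.random.randn(1024).astype(np.float32)  # stand-in for a real embedding
full /= np.linalg.norm(full)
small = full[:64]                                # keep the leading 64 dimensions
small /= np.linalg.norm(small)                   # re-normalize after truncation
print(small.shape)
```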