Update README.md
README.md (changed)
@@ -48,7 +48,7 @@ You can use the model to ask questions about the latest developments in quantum

Although this model should be able to generalize well, quantum science terminology and context are very complex, so it might struggle with simplification and hence should not be used in that context.

- Since there is a risk of possible overfitting in certain cases, the model might be able to answer
+ Since there is a risk of overfitting in certain cases, the model might answer incorrectly when questions are changed only slightly.

## Bias, Risks, and Limitations

@@ -134,7 +134,7 @@ Given that GPT-4-turbo was already used in this context for the reference questions

| **ROUGE-2** | 0.4098 | 0.1751 | 0.3104 |
| **ROUGE-L** | 0.5809 | 0.2902 | 0.4856 |

- _quantum-research-bot-v1.0_ outperformed on all metrics, although _Gemini_ came close in BERTScore precision with the difference of only 0.001.
+ _quantum-research-bot-v1.0_ outperformed on all metrics, although _Gemini_ came close in BERTScore precision, with a difference of only 0.001. The Gemini model is better at recognizing subtle differences in the input, but it lacks the latest knowledge, which makes it perform worse overall.

Most other benchmarks, such as TruthfulQA and MMLU, are not applicable here because this model has been fine-tuned for a very specific domain of knowledge.
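For readers who want to sanity-check numbers like the ones in the table above, ROUGE and BERTScore can be computed with the `rouge-score` and `bert-score` packages. The sketch below is illustrative only: the reference/candidate strings are placeholders, not the evaluation data behind the reported scores.

```python
# Illustrative sketch: ROUGE-2, ROUGE-L, and BERTScore for one
# candidate/reference pair. The strings are placeholders, not the
# actual evaluation data. Requires: pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Researchers demonstrated error correction on a logical qubit."
candidate = "Error correction was demonstrated on a logical qubit by researchers."

# ROUGE measures n-gram (ROUGE-2) and longest-common-subsequence (ROUGE-L)
# overlap between the candidate and the reference.
scorer = rouge_scorer.RougeScorer(["rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)  # reference first, then candidate
print(f"ROUGE-2 F1: {rouge['rouge2'].fmeasure:.4f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.4f}")

# BERTScore compares contextual token embeddings instead of exact n-grams,
# so it is more forgiving of paraphrases; it returns precision/recall/F1
# tensors with one entry per candidate.
precision, recall, f1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore precision: {precision[0].item():.4f}")
```

That BERTScore precision is the one column where _Gemini_ nearly ties the fine-tuned model is consistent with embedding-based metrics rewarding surface rewording more than n-gram overlap does.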