Commit 5eff434 (verified) by nenad1002 · 1 Parent(s): 97546fe

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -48,7 +48,7 @@ You can use the model to ask questions about the latest developments in quantum
 
 Although this model should be able to generalize well, the quantum science terminology and context is very complex, so it might struggle with simplification, hence, should not be used in that context.
 
-Since there is a risk of possible overfitting in certain cases, the model might be able to answer correctly on some small changes to the questions.
+Since there is a risk of possible overfitting in certain cases, the model might be able to answer incorrectly on some small changes to the questions.
 
 ## Bias, Risks, and Limitations
 
@@ -134,7 +134,7 @@ Given that GPT-4-turbo was already used in this context for the reference questi
 | **ROUGE-2**| 0.4098 | 0.1751 | 0.3104 |
 | **ROUGE-L**| 0.5809 | 0.2902 | 0.4856 |
 
-_quantum-research-bot-v1.0_ outperformed on all metrics, although _Gemini_ came close in BERTScore precision with the difference of only 0.001.
+_quantum-research-bot-v1.0_ outperformed on all metrics, although _Gemini_ came close in BERTScore precision with the difference of only 0.001. The Gemini model is able to recognize subtle differences in the input better, but lacks the latest knowledge, making it perform worse in general.
 
 Most other metrics, such as TruthfulQA, MMLU, and similar benchmarks, are not applicable here because this model has been fine-tuned for a very specific domain of knowledge.
 
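
For readers unfamiliar with the ROUGE-L row in the evaluation table above, the following is a minimal sketch of how a ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of a reference and a candidate answer. The README does not say which ROUGE implementation or tokenizer produced the reported numbers, so the lowercase whitespace tokenization and the function names `lcs_length` and `rouge_l_f1` are assumptions made only for illustration; this sketch is not expected to reproduce the table values.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    # Rolling one-row dynamic program: prev[j] is the LCS length of the
    # tokens of `a` processed so far and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens (an assumed simplification)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Because ROUGE-L rewards in-order token overlap with the reference, a candidate that is shorter than the reference can still have perfect precision while losing recall, which is why the F1 combination is usually the figure reported.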