nenad1002/quantum-research-bot-v1.0

nenad1002 committed on Sep 3, 2024

Commit 5eff434 · verified · 1 Parent(s): 97546fe

Update README.md

Files changed (1)

README.md +2 -2

@@ -48,7 +48,7 @@ You can use the model to ask questions about the latest developments in quantum
 Although this model should be able to generalize well, the quantum science terminology and context is very complex, so it might struggle with simplification, hence, should not be used in that context.
-Since there is a risk of possible overfitting in certain cases, the model might be able to answer correctly on some small changes to the questions.
+Since there is a risk of possible overfitting in certain cases, the model might be able to answer incorrectly on some small changes to the questions.
 ## Bias, Risks, and Limitations
@@ -134,7 +134,7 @@ Given that GPT-4-turbo was already used in this context for the reference questi
 | **ROUGE-2**|  0.4098          | 0.1751    | 0.3104 |
 | **ROUGE-L**| 0.5809          |  0.2902    | 0.4856  |
-_quantum-research-bot-v1.0_ outperformed on all metrics, although _Gemini_ came close in BERTScore precision with the difference of only 0.001.
+_quantum-research-bot-v1.0_ outperformed on all metrics, although _Gemini_ came close in BERTScore precision with the difference of only 0.001. The Gemini model is able to recognize subtle differences in the input better, but lacks the latest knowledge, making it perform worse in general.
 Most other metrics, such as TruthfulQA, MMLU, and similar benchmarks, are not applicable here because this model has been fine-tuned for a very specific domain of knowledge.