emnlp 2023 committed
Commit 5bcec32 · 1 Parent(s): 48b6599
Update README.md

README.md CHANGED
@@ -67,13 +67,13 @@ which is subsequently served by extending model's decoder input context by addin
 - **Developed by:** Anonymous
 - **Model type:** Autoregressive Encoder-Decoder
 - **Language(s):** en
-- **Finetuned from:**
+- **Finetuned from:** t5-large

 ### Model Sources

 <!-- Provide the basic links for the model. -->

-- **Repository:** https://github.com/
+- **Repository:** https://github.com/emnlp2023sub/gadgets
 - **Paper:** Stay tuned!

 ## Usage
@@ -82,8 +82,8 @@ Additionally to conventional generation, using Tool-augmented generation require
 (1) implementation of the tool(s) and
 (2) a customization of generate() method augmenting input context on-demand with the outputs of the tools.

-You can find these two components implemented in the
-
+You can find these two components implemented in the **gadgets/gadget_assisted_model.py** and **gadgets/gadget.py** in the project's [home repo](https://github.com/emnlp2023sub/gadgets).
+

 After adding these two scripts to your directory, you can use the model as follows:

@@ -130,24 +130,17 @@ Final result is<result>800</result></s>
 Note that given the limited scope of the exercises' complexity in the training, this model will not work well for tasks requiring
 more complex algebraic operations, including equations, variables and operations outside the scope of (+-*/).

+
 ## Training Details

 ### Training Data
-
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+This model was trained on our Calculator-augmented set of

-
-[
-[gsm8k HF
-[aqua_rat](https://huggingface.co/datasets/aqua_rat)
+- [Calc Ape210k](https://huggingface.co/datasets/emnlp2023/Calc-ape210k) ([original Ape210k on github](https://github.com/Chenny0808/ape210k))
+- [Calc MathQA](https://huggingface.co/datasets/emnlp2023/Calc-math_qa) ([original MathQA on HF](https://huggingface.co/datasets/math_qa))
+- [Calc GSM8K](https://huggingface.co/datasets/emnlp2023/Calc-gsm8k) ([original GSM8K on HF](https://huggingface.co/datasets/gsm8k))
+- [Calc Aqua-RAT](https://huggingface.co/datasets/emnlp2023/Calc-aqua_rat) ([original Aqua-RAT on HF](https://huggingface.co/datasets/aqua_rat)
+
 in a standard auto-regressive setup i.e. for a conditional next-token prediction with teacher-forced prefix.

-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
-The model was fine-tuned from [google/calc-t5-large](https://huggingface.co/google/calc-t5-large) for TODO steps
-aiming to maximise exact-match ration on a validation split of the questions from [gsm8k dataset](https://huggingface.co/datasets/gsm8k).
-We fine-tune only TODO of the parameters finding that this circumvents overfitting to relatively small training dataset.
-
-The full training configuration can be identified from the [training script](https://github.com/emnlp2023/gadgets/blob/9185d1fc4b4812321179f8e5cad3e2f2a764f1df/examples/train_gsm8k_flan-t5-slice.py).
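The usage snippet referenced by "you can use the model as follows" lies outside the changed hunks, so it is not shown in this diff. Below is a minimal sketch of how the gadget-assisted generation described above might be wired up; the class and method names (`GadgetAssistedModel`, `Calculator`, `prepare_for_generate`) and the checkpoint id are assumptions inferred from the file names mentioned in the diff, not a confirmed API.

```python
# Hedged sketch only: class/method names and the checkpoint id below are assumptions
# inferred from gadgets/gadget_assisted_model.py and gadgets/gadget.py -- consult the
# repository for the actual interface.
from transformers import T5ForConditionalGeneration, T5Tokenizer

from gadgets.gadget import Calculator                          # assumed tool implementation
from gadgets.gadget_assisted_model import GadgetAssistedModel  # assumed generate() customization


class GadgetAssistedT5(GadgetAssistedModel, T5ForConditionalGeneration):
    """T5 whose generate() can pause, run the Calculator, and append its output to the context."""


checkpoint = "emnlp2023/calc-t5-large"  # placeholder model id
tokenizer = T5Tokenizer.from_pretrained(checkpoint)
model = GadgetAssistedT5.from_pretrained(checkpoint)

# Register the calculator so that decoding is augmented on demand with its outputs.
model.prepare_for_generate(tokenizer, enabled_gadgets=[Calculator()], default_max_tokens=512)

question = (
    "The profit from a business transaction is shared among 2 business partners "
    "in the ratio 2:5. If Mike got $2500, how much did Johnson get?"
)
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```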
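Similarly, the "standard auto-regressive setup, i.e. conditional next-token prediction with teacher-forced prefix" mentioned in the Training Data section is ordinary sequence-to-sequence fine-tuning. A rough sketch with Hugging Face `Seq2SeqTrainer` follows; the dataset column names and all hyperparameters are illustrative assumptions, not the project's actual training configuration.

```python
# Illustrative sketch of teacher-forced seq2seq fine-tuning on one of the Calc-* datasets.
# Column names ("question", "chain") and all hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
)

base_model = "t5-large"  # the card lists t5-large as the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = T5ForConditionalGeneration.from_pretrained(base_model)

dataset = load_dataset("emnlp2023/Calc-gsm8k")  # any of the Calc-* datasets listed above


def preprocess(example):
    # Encoder input is the question; the decoder is trained to reproduce the
    # calculator-annotated reasoning chain token by token (teacher-forced prefix).
    model_inputs = tokenizer(example["question"], truncation=True, max_length=512)
    labels = tokenizer(text_target=example["chain"], truncation=True, max_length=512)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


tokenized = dataset["train"].map(preprocess, remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="calc-t5-large-finetuned",
        per_device_train_batch_size=8,
        learning_rate=5e-5,
        max_steps=20_000,  # placeholder step count
        logging_steps=100,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```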