Update model card with revised readme
README.md CHANGED

@@ -1,14 +1,23 @@
 # QuantaMaths: `sub_d10_l2_h3_t75K_gf_s173289`

-This repository contains a transformer model that can predict addition questions.
+This repository contains a transformer model that can predict subtraction questions.

+### Model-specific metadata
+- **Operation type**: subtraction
+- **Num digits**: 10
+- **Layers**: 2
+- **Attention Heads**: 3
+- **Training steps**: 75,000
+**Contents**:
+- `model.pth`: The trained transformer model.
+- `training_loss.json`: Data gathered during model training (used to plot "loss over training batches").
+- `behaviors.json`: Facts gathered about the model by direct inspection (attention pattern data, PCA data, digit impact data, etc.).
+- `features.json`: Facts gathered about hypothesized algorithm features via experimentation, e.g. node P12L0H1 implements the feature A3.ST.

+**Provenance**:
+- `model.pth` and `training_loss.json` were created by [QuantaMathsTrain.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsTrain.ipynb).
+- `behaviors.json` and `features.json` were created by [QuantaMathsAnalyse.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAnalyse.ipynb).
+- The JSON files are used by [QuantaMathsAlgorithm.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAlgorithm.ipynb).

 **Folder name details**:
 - "add", "sub", or "mix": The types of questions the model can predict.
@@ -18,18 +27,4 @@ This repository contains a transformer model that can predict addition questions
 - "t15K" to "t85K", etc.: The number of batches the model was trained on.
 - "s372001", etc.: The random seed used in model training.

-Some folder names also contain:
-- "ins1": Before training, the model was initialized with a smaller, accurate addition model.
-- "ins2": Same as ins1, but the inserted attention heads were not allowed to change.
-- "ins3": Same as ins2, but the inserted MLP layers were also not allowed to change.
-
-**Contents**:
-- `model.pth`: The trained transformer model.
-- `training_loss.json`: Data gathered during model training (used to plot "loss over training batches").
-- `behaviors.json`: Facts gathered about the model by direct inspection (attention pattern data, PCA data, digit impact data, etc.).
-- `features.json`: Facts gathered about hypothesized algorithm features via experimentation, e.g. node P12L0H1 implements the feature A3.ST.
-
-**Provenance**:
-- `model.pth` and `training_loss.json` were created by [QuantaMathsTrain.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsTrain.ipynb).
-- `behaviors.json` and `features.json` were created by [QuantaMathsAnalyse.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAnalyse.ipynb).
-- The JSON files are used by [QuantaMathsAlgorithm.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAlgorithm.ipynb).
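The folder-name fields described under **Folder name details** (and echoed in the revised README's "Model-specific metadata" section) can be decoded mechanically. The Python sketch below shows one way to do it; the function name `parse_folder_name` and the choice to treat tags not documented above (such as "gf") as opaque extras are illustrative assumptions, not code from the QuantaMaths repository.

```python
import re

# Illustrative parser for QuantaMaths folder names such as
# "sub_d10_l2_h3_t75K_gf_s173289". Field meanings follow the
# "Folder name details" section of the README; tokens not documented
# there (e.g. "gf" or "ins1") are kept as opaque extra tags.
NAME_RE = re.compile(
    r"^(?P<op>add|sub|mix)"   # question types the model can predict
    r"_d(?P<digits>\d+)"      # number of digits
    r"_l(?P<layers>\d+)"      # transformer layers
    r"_h(?P<heads>\d+)"       # attention heads
    r"_t(?P<batches>\d+)K"    # training batches, in thousands
    r"(?:_(?P<extras>.+?))?"  # optional tags, e.g. "gf", "ins1"
    r"_s(?P<seed>\d+)$"       # random seed used in training
)


def parse_folder_name(name: str) -> dict:
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"Unrecognised folder name: {name!r}")
    return {
        "operation": m["op"],
        "digits": int(m["digits"]),
        "layers": int(m["layers"]),
        "heads": int(m["heads"]),
        "training_batches": int(m["batches"]) * 1000,
        "extra_tags": m["extras"].split("_") if m["extras"] else [],
        "seed": int(m["seed"]),
    }


if __name__ == "__main__":
    print(parse_folder_name("sub_d10_l2_h3_t75K_gf_s173289"))
    # {'operation': 'sub', 'digits': 10, 'layers': 2, 'heads': 3,
    #  'training_batches': 75000, 'extra_tags': ['gf'], 'seed': 173289}
```

For this repository the parsed fields agree with the metadata in the revised README: subtraction, 10 digits, 2 layers, 3 attention heads, 75,000 training batches.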
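As a usage sketch for the files listed under **Contents**, the snippet below loads `model.pth` and the three JSON files and plots the training-loss curve. Only the file names come from the README; the rest is assumption: the card does not document the checkpoint format or the JSON schemas, so the code only inspects what it finds, and actually instantiating the model would require the model class from the quanta_maths repository, which is not shown here.

```python
import json

import matplotlib.pyplot as plt
import torch

# Paths assume the repository files have been downloaded into the
# current directory (e.g. via `git clone` or huggingface_hub).
checkpoint = torch.load("model.pth", map_location="cpu")

# The card does not say whether model.pth holds a plain state_dict or a
# richer training checkpoint, so just report the top-level keys.
if isinstance(checkpoint, dict):
    print("model.pth keys:", list(checkpoint.keys())[:10])

# behaviors.json: facts gathered by direct inspection of the model.
# features.json: facts about hypothesized algorithm features.
with open("behaviors.json") as f:
    behaviors = json.load(f)
with open("features.json") as f:
    features = json.load(f)
print("behaviors.json entries:", len(behaviors))
print("features.json entries:", len(features))

# training_loss.json is described as the data behind the
# "loss over training batches" plot; its layout is assumed to be either
# a list of losses or a dict containing one such list.
with open("training_loss.json") as f:
    training_loss = json.load(f)
losses = (
    training_loss
    if isinstance(training_loss, list)
    else next(v for v in training_loss.values() if isinstance(v, list))
)

plt.plot(losses)
plt.xlabel("Training batch")
plt.ylabel("Loss")
plt.title("sub_d10_l2_h3_t75K_gf_s173289")
plt.show()
```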