PhilipQuirke/QuantaMaths_mix_d11_l3_h4_t80K_s572091

PhilipQuirke committed on Jan 8

Commit 0dd04fb (verified) · 1 parent: b54e2b9

Update model card with revised readme

Files changed (1)

README.md +17 -22

@@ -1,15 +1,24 @@
 # QuantaMaths: `mix_d11_l3_h4_t80K_s572091`
-### Model-specific metadata for `mix_d11_l3_h4_t80K_s572091`
-- **Operation type**: mix
-- **Max digits**: d11
-- **Layers**: l3
-- **Attention Heads**: h4
-- **Training steps**: t80K
-- **Random seed**: s572091
-This repository contains a transformer model that can predict addition questions, subtraction questions, or both.
 **Folder name details**:
 - "add", "sub", or "mix": The types of questions the model can predict.
@@ -19,18 +28,4 @@ This repository contains a transformer model that can predict addition questions
 - "t15K" to "t85K", etc.: The number of batches the model was trained on.
 - "s372001", etc.: The random seed used in model training.
-Some folder names also contain:
-- "ins1": Before training, the model was initialized with a smaller, accurate addition model.
-- "ins2": Same as ins1, but the inserted attention heads were not allowed to change.
-- "ins3": Same as ins2, but the inserted MLP layers were also not allowed to change.
-**Contents**:
-- `model.pth`: The trained transformer model.
-- `training_loss.json`: Data gathered during model training (used to plot "loss over training batches").
-- `behaviors.json`: Facts gathered about the model by direct inspection (attention pattern data, PCA data, digit impact data, etc.).
-- `features.json`: Facts gathered about hypothesized algorithm features via experimentation, e.g. node P12L0H1 implements the feature A3.ST.
-**Provenance**:
-- `model.pth` and `training_loss.json` were created by [QuantaMathsTrain.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsTrain.ipynb).
-- `behaviors.json` and `features.json` were created by [QuantaMathsAnalyse.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAnalyse.ipynb).
-- The JSON files are used by [QuantaMathsAlgorithm.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAlgorithm.ipynb).

 # QuantaMaths: `mix_d11_l3_h4_t80K_s572091`
+This repository contains a transformer model that can predict both addition and subtraction questions.
+### Model-specific metadata
+- **Operation type**: mixed
+- **Num digits**: 11
+- **Layers**: 3
+- **Attention Heads**: 4
+- **Training steps**: 80,000
+- **Random seed**: 572091
+**Contents**:
+- `model.pth`: The trained transformer model.
+- `training_loss.json`: Data gathered during model training (used to plot "loss over training batches").
+- `behaviors.json`: Facts gathered about the model by direct inspection (attention pattern data, PCA data, digit impact data, etc.).
+- `features.json`: Facts gathered about hypothesized algorithm features via experimentation, e.g. node P12L0H1 implements the feature A3.ST.
+**Provenance**:
+- `model.pth` and `training_loss.json` were created by [QuantaMathsTrain.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsTrain.ipynb).
+- `behaviors.json` and `features.json` were created by [QuantaMathsAnalyse.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAnalyse.ipynb).
+- The JSON files are used by [QuantaMathsAlgorithm.ipynb](https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAlgorithm.ipynb).
 **Folder name details**:
 - "add", "sub", or "mix": The types of questions the model can predict.
 - "t15K" to "t85K", etc.: The number of batches the model was trained on.
 - "s372001", etc.: The random seed used in model training.
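
The folder-name convention described in the README can be decoded mechanically. The following is a minimal sketch, assuming the fields always appear in the order used by this model's name (`mix_d11_l3_h4_t80K_s572091`) and that the `t` field is always a whole number of thousands followed by `K`. `ModelName` and `parse_model_name` are hypothetical names introduced for illustration; they are not part of the quanta_maths code base. The optional `ins1`/`ins2`/`ins3` variants are not handled, because the README does not state where they appear in the name.

```python
import re
from dataclasses import dataclass


@dataclass
class ModelName:
    """Fields decoded from a QuantaMaths folder name (hypothetical helper type)."""
    operation: str       # "add", "sub", or "mix"
    max_digits: int      # value after "d"
    layers: int          # value after "l"
    heads: int           # value after "h"
    training_steps: int  # value after "t", converted from thousands
    seed: int            # value after "s"


# Assumed field order: <op>_d<digits>_l<layers>_h<heads>_t<steps>K_s<seed>
_NAME_PATTERN = re.compile(
    r"^(?P<op>add|sub|mix)"
    r"_d(?P<digits>\d+)"
    r"_l(?P<layers>\d+)"
    r"_h(?P<heads>\d+)"
    r"_t(?P<steps>\d+)K"
    r"_s(?P<seed>\d+)$"
)


def parse_model_name(name: str) -> ModelName:
    """Split a folder name such as 'mix_d11_l3_h4_t80K_s572091' into its fields."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised model name: {name!r}")
    return ModelName(
        operation=match["op"],
        max_digits=int(match["digits"]),
        layers=int(match["layers"]),
        heads=int(match["heads"]),
        training_steps=int(match["steps"]) * 1000,  # "80K" means 80,000 batches
        seed=int(match["seed"]),
    )


if __name__ == "__main__":
    print(parse_model_name("mix_d11_l3_h4_t80K_s572091"))
```

Run as a script, this prints `ModelName(operation='mix', max_digits=11, layers=3, heads=4, training_steps=80000, seed=572091)`, which matches the model-specific metadata listed in the revised README. Names that do not fit the assumed pattern raise `ValueError` rather than being partially parsed.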