QuantaMaths: mix_d8_l3_h4_t60K_s173289

This repository contains a transformer model that predicts the answers to both addition and subtraction questions.

Model-specific metadata:

  • Operation type: mixed
  • Num digits: 8
  • Layers: 3
  • Attention heads: 4
  • Training steps: 60,000
  • Random seed: 173289

Contents:

  • model.pth: The trained transformer model (see the loading sketch after this list).
  • training_loss.json: Data gathered during model training (used to plot "loss over training batches").
  • behaviors.json: Facts gathered about the model by direct inspection (attention pattern data, PCA data, digit impact data, etc.).
  • features.json: Facts gathered about hypothesized algorithm features via experimentation, e.g. node P12L0H1 implements the feature A3.ST.
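
As a quick start, here is a minimal sketch of loading these artifacts in Python. It assumes model.pth holds a PyTorch state dict and training_loss.json a JSON list of per-batch losses; both are assumptions, so inspect the files before relying on this.

```python
import json

import torch

# Assumption: model.pth stores a state dict (a mapping from parameter
# names to tensors) rather than a pickled model object.
state_dict = torch.load("model.pth", map_location="cpu")
print(f"{len(state_dict)} parameter tensors, e.g. {next(iter(state_dict))}")

# Assumption: training_loss.json is a JSON list of per-batch loss values,
# suitable for plotting "loss over training batches".
with open("training_loss.json") as f:
    losses = json.load(f)
print(f"batches recorded: {len(losses)}, final loss: {losses[-1]}")
```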

Provenance:

The folder name encodes the model's training configuration (a parsing sketch follows this list):

  • "add", "sub", or "mix": The types of questions the model can predict.
  • "d5" to "d20": How many digits the model handles (e.g. a d5 sub model can predict the answer in 123450-345670=-0123230).
  • "l1", "l2", or "l3": The number of layers in the model.
  • "h3" or "h4": The number of attention heads in the model.
  • "t15K" to "t85K", etc.: The number of batches the model was trained on.
  • "s372001", etc.: The random seed used in model training.