QuantaMaths: mix_d8_l3_h4_t60K_s173289

This repository contains a transformer model that predicts the answers to both addition and subtraction questions.

Model-specific metadata:

  • Operation type: mixed
  • Num digits: 8
  • Layers: 3
  • Attention heads: 4
  • Training steps: 60,000
  • Random seed: 173289

Contents:

  • model.pth: The trained transformer model (see the loading sketch after this list).
  • training_loss.json: Data gathered during model training (used to plot "loss over training batches").
  • behaviors.json: Facts gathered about the model by direct inspection (attention pattern data, PCA data, digit impact data, etc.).
  • features.json: Facts gathered about hypothesized algorithm features via experimentation, e.g. node P12L0H1 implements the feature A3.ST.
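
As a quick start, here is a minimal sketch of loading these artifacts in Python. It assumes model.pth holds a PyTorch state dict and training_loss.json a JSON list of per-batch losses; both are assumptions, so inspect the files before relying on this.

```python
import json

import torch

# Assumption: model.pth stores a state dict (a mapping from parameter
# names to tensors) rather than a pickled model object.
state_dict = torch.load("model.pth", map_location="cpu")
print(f"{len(state_dict)} parameter tensors, e.g. {next(iter(state_dict))}")

# Assumption: training_loss.json is a JSON list of per-batch loss values,
# suitable for plotting "loss over training batches".
with open("training_loss.json") as f:
    losses = json.load(f)
print(f"batches recorded: {len(losses)}, final loss: {losses[-1]}")
```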

Provenance:

The folder name encodes the model's training configuration (a parsing sketch follows this list):

  • "add", "sub", or "mix": The types of questions the model can predict.
  • "d5" to "d20": How many digits the model handles (e.g. a d5 sub model can predict the answer in 123450-345670=-0123230).
  • "l1", "l2", or "l3": The number of layers in the model.
  • "h3" or "h4": The number of attention heads in the model.
  • "t15K" to "t85K", etc.: The number of batches the model was trained on.
  • "s372001", etc.: The random seed used in model training.