---
license: apache-2.0
datasets:
- pico-lm/pretokenized-paloma
language:
- en
metrics:
- pico-lm/perplexity
pipeline_tag: text-generation
---
# Pico Decoder Medium
**pico-decoder-medium** is a 181M parameter model in the `pico-decoder` suite, balancing scale and analyzability. Built with [`pico-train`](https://github.com/pico-lm) and instrumented with [`pico-analyze`](https://github.com/pico-lm), it enables detailed studies of layer-wise learning behavior during language model pretraining.
## πŸ”§ Model Details
| Field | Value |
|---------------------|------------------------------------|
| **Architecture** | Decoder-only transformer (LLaMA-style) |
| **Parameters** | 181M |
| **Layers** | 12 |
| **Hidden Size** | 768 |
| **Feed Forward Size**| 3072 |
| **Attention Heads** | 12 |
| **Key/Value Heads** | 4 |
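## 🚀 Example Usage
A minimal loading sketch via Hugging Face `transformers`. The repo id `pico-lm/pico-decoder-medium` and the `trust_remote_code=True` flag are assumptions not confirmed by this card; adjust them to match the actual hosting setup.
```python
# Hedged example: loading the checkpoint with Hugging Face transformers.
# The repo id and trust_remote_code=True are assumptions, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pico-lm/pico-decoder-medium"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Language models learn"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```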
## πŸ“š Training
- **Dataset**: [`pretokenized-dolma`](https://github.com/pico-lm)
- **Training steps**: 200,000
- **Batch size**: 1024
- **Sequence length**: 2048
- **Optimizer**: AdamW
- **Learning rate schedule**: Linear decay with warmup (see the sketch below)
- **Compute**: 16 A100-SXM4-80GB GPUs
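As a rough illustration of the schedule listed above, the sketch below computes the learning rate at a given step under a linear-decay-with-warmup rule. The warmup length and peak learning rate are placeholder assumptions; only the schedule shape and the 200,000-step horizon come from this card.
```python
# Illustrative sketch of a linear-decay-with-warmup schedule.
# warmup_steps and max_lr are placeholders, not values reported for this model.
def linear_warmup_decay(step: int, max_lr: float = 3e-4,
                        total_steps: int = 200_000,
                        warmup_steps: int = 2_500) -> float:
    if step < warmup_steps:
        return max_lr * step / warmup_steps            # linear warmup to the peak
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * max(0.0, 1.0 - progress)           # linear decay to zero
```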
## πŸ“ˆ Evaluation and Analysis
Compatible with [`pico-analyze`](https://github.com/pico-lm) for introspecting:
- Per-head loss and gradient stats
- Learning saturation across layers
- Token-level memorization dynamics
Evaluated on [`pico-paloma-tinsy`](https://huggingface.co/datasets/pico-lm/pretokenized-paloma-tinsy) using perplexity.
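For reference, perplexity can be computed with a plain `transformers` forward pass as in the sketch below. This is a generic illustration, not the `pico-lm/perplexity` metric implementation, and it reuses the `model` and `tokenizer` objects from the loading example above.
```python
# Hedged sketch: token-level perplexity from a causal LM forward pass.
import torch

def perplexity(model, tokenizer, text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean cross-entropy over predicted tokens
    return torch.exp(out.loss).item()

ppl = perplexity(model, tokenizer, "A sample passage from the evaluation corpus.")
print(f"perplexity: {ppl:.2f}")
```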
## πŸ“„ Citation
```bibtex
@software{pico2025,
author = {Diehl Martinez, Richard},
title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
year = {2025},
url = {https://github.com/pico-lm}
}
```