# Shakespeare GPT

A GPT-2 model fine-tuned on Shakespeare's works, capable of generating Shakespeare-style text.
## Project Overview

This project implements the GPT-2 architecture and trains it on Shakespeare's complete works to generate text in Shakespeare's style. The model uses a context window of 1024 tokens and includes several training optimizations, such as gradient accumulation and learning rate scheduling.
## Model Architecture

- Base Architecture: GPT-2 (124M parameters)
- Layers: 12
- Attention Heads: 12
- Embedding Dimension: 768
- Context Length: 1024 tokens
- Total Parameters: ~124M
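
For reference, these settings map onto a configuration object along the following lines (a minimal sketch; the dataclass name and the `vocab_size` value, standard for GPT-2's BPE tokenizer, are assumptions rather than code taken from `train_shakespeare.py`):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # GPT-2 (124M) hyperparameters, matching the list above.
    block_size: int = 1024    # context length in tokens
    vocab_size: int = 50257   # standard GPT-2 BPE vocabulary (assumed)
    n_layer: int = 12         # transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding dimension
```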
## Training Details

- Dataset: Shakespeare's complete works
- Training Device: GPU / MPS (Apple Silicon)
- Batch Size: 16 (effective batch size 64 with gradient accumulation)
- Learning Rate: 6e-4 with cosine decay
- Weight Decay: 0.1
- Training Steps: 10,000
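
Taken together, these hyperparameters correspond to a training loop roughly like the sketch below. This is illustrative only: the stand-in model, the dummy batches, the `min_lr` floor, and the 16 × 4 gradient-accumulation split are assumptions consistent with the numbers above, not the actual code in `train_shakespeare.py`.

```python
import math
import torch

# Hyperparameters from the list above; min_lr and the micro-batch /
# accumulation split are assumptions matching the stated effective batch.
max_lr, min_lr = 6e-4, 6e-5
max_steps = 10_000
micro_batch, grad_accum_steps = 16, 4   # 16 * 4 = effective batch of 64

def get_lr(step: int) -> float:
    # Cosine decay from max_lr down to min_lr over max_steps.
    progress = min(step / max_steps, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(768, 768)  # stand-in for the actual GPT-2 model
optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, weight_decay=0.1)

for step in range(max_steps):
    optimizer.zero_grad(set_to_none=True)
    for _ in range(grad_accum_steps):
        x = torch.randn(micro_batch, 768)          # stand-in batch
        loss = model(x).pow(2).mean()              # stand-in loss
        (loss / grad_accum_steps).backward()       # average across micro-batches
    for group in optimizer.param_groups:           # apply the cosine schedule
        group["lr"] = get_lr(step)
    optimizer.step()
```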
## Performance

- Best Validation Loss: [Insert your best validation loss]
- Training Time: [Insert your training time]
## Requirements

Install the dependencies with:

```bash
pip install -r requirements.txt
```
## Project Structure

```
├── src/
│   ├── train_shakespeare.py   # Training script
│   ├── app.py                 # Gradio interface
│   └── input.txt              # Training data
├── requirements.txt
└── README.md
```
## Usage

### Training

To train the model:

```bash
python src/train_shakespeare.py
```
### Inference

To run the Gradio interface locally:

```bash
python src/app.py
```
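
For orientation, the Gradio app presumably exposes a prompt box and simple sampling controls along these lines (a minimal sketch; the `generate` stub and its parameters are hypothetical stand-ins for the real model call in `src/app.py`):

```python
import gradio as gr

def generate(prompt: str, max_new_tokens: int) -> str:
    # Hypothetical stand-in: the real app samples from the fine-tuned
    # GPT-2 checkpoint; here we just echo the prompt for illustration.
    return prompt + " [generated text would continue here]"

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt", value="ROMEO:"),
        gr.Slider(minimum=16, maximum=512, value=128, step=16, label="Max new tokens"),
    ],
    outputs=gr.Textbox(label="Generated text"),
    title="Shakespeare GPT",
)

if __name__ == "__main__":
    demo.launch()
```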