---
license: mit
language:
- en
pipeline_tag: reinforcement-learning
tags:
- code
---
|
# Tetris-Neural-Network-Q-Learning |
|
|
|
|
|
## Overview |
|
A **PyTorch** implementation of a simplified Tetris-playing AI trained with **Q-Learning**.
|
The Tetris board is just 4×4, with the agent deciding in which of the 4 columns to drop the next piece. The agent’s neural network receives a **16-dimensional** board representation (flattened 4×4) and outputs **4** Q-values, one for each possible move. Through repeated training (via self-play and the Q-Learning algorithm), the agent learns to fill the board without making illegal moves—eventually achieving a perfect score. |
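For illustration, that flattening is just the 16 cells read row by row. A minimal sketch (the actual conversion lives in `representation.py`; this helper name is hypothetical):

```python
def encode(board):
    """Flatten a 4x4 board (rows of 0/1 cells) into the 16-int list the network consumes."""
    return [cell for row in board for cell in row]
```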
|
|
|
<img src="images/game.png" /> |
|
|
|
## Project Structure |
|
|
|
```plaintext
├── model.py           # Contains the TetrisAI class and TetrisNet model (PyTorch)
├── train.py           # Main training script
├── evaluate.py        # Script to load the model checkpoint and interactively run the game
├── tetris.py          # Defines the GameState and game logic
├── representation.py  # Defines how the game state is turned into a 1D list of ints
└── checkpoints/       # Directory where model checkpoints (.pth) are saved/loaded
```
|
|
|
## Model Architecture |
|
- **Input Layer (16 units):** Flattened 4×4 board state, where each cell is `0` (empty) or `1` (occupied).

- **Hidden Layers:** Dense layers (64 → 64 → 32) with ReLU activations.

- **Output Layer (4 units):** Linear activation, representing the estimated Q-value for each move (columns 1–4).
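A minimal sketch of what `TetrisNet` could look like with these layer sizes (the exact definition in `model.py` may differ):

```python
import torch.nn as nn

class TetrisNet(nn.Module):
    """Maps a flattened 4x4 board (16 values) to 4 Q-values, one per column."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(16, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 4),  # linear output layer: one Q-value per move
        )

    def forward(self, x):
        return self.layers(x)
```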
|
|
|
## Training |
|
1. **Game Environment:** A 4×4 Tetris-like grid where each move places a block in one of the four columns.
|
2. **Reward Function:** |
|
- **Immediate Reward:** Increase in the number of occupied squares, minus |
|
- **Penalty:** A scaled standard deviation of the “column depth” to encourage balanced play. |
|
3. **Q-Learning Loop:** |
|
- For each move, the model is passed the current game state and returns predicted Q-values. |
|
- An action (move) is chosen based on either: |
|
- **Exploitation:** Highest Q-value prediction (greedy choice). |
|
- **Exploration:** Random move to discover new states. |
|
- The agent observes the new state and reward, and stores this experience (state, action, reward, next_state) to update the model; a minimal sketch of this loop follows below.
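The sketch below illustrates one epsilon-greedy step and the Bellman update target described above. The helper names and the discount factor `gamma` are assumptions, not the exact code in `train.py`:

```python
import random
import torch

def choose_action(model, state, epsilon):
    """Epsilon-greedy selection over the 4 columns."""
    if random.random() < epsilon:
        return random.randrange(4)  # exploration: random column
    with torch.no_grad():
        q_values = model(torch.tensor(state, dtype=torch.float32))
    return int(q_values.argmax())   # exploitation: greedy column

def q_target(model, reward, next_state, gamma=0.9):
    """Bellman target: reward now plus discounted best Q-value of the next state."""
    with torch.no_grad():
        next_q = model(torch.tensor(next_state, dtype=torch.float32))
    return reward + gamma * float(next_q.max())
```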
|
|
|
## Reward Function |
|
|
|
The reward for each action combines two parts:
|
|
|
1. **Board Occupancy** |
|
- The reward starts with the number of occupied squares on the board (i.e., how many cells are filled). |
|
|
|
2. **Penalty for Unbalanced Columns** |
|
- Next, the standard deviation of each column's unoccupied cell count is calculated. |
|
- A higher standard deviation means one column may be much taller or shorter than others, which is undesirable in Tetris. |
|
- By *subtracting* this standard deviation from the occupancy-based reward, the agent is penalized for building unevenly and is encouraged to keep the board as level as possible. |
|
|
|
<img src="images/standard-deviation.png" /> |
|
|
|
Here \( \alpha \) is a weighting factor (effectively 1 in this project, though any scalar can be used) that determines the penalty's intensity. This keeps the board balanced and helps the agent learn a more efficient Tetris strategy.
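In code, the reward described above could be computed roughly like this (a sketch assuming a 4×4 list-of-lists board and the population standard deviation; the actual implementation in `tetris.py`/`train.py` may differ):

```python
import statistics

def compute_reward(board, alpha=1.0):
    """Occupancy reward minus a penalty for uneven columns."""
    occupied = sum(cell for row in board for cell in row)
    # Unoccupied cells per column; a large spread means an unbalanced board.
    unoccupied_per_column = [
        sum(1 - board[r][c] for r in range(4)) for c in range(4)
    ]
    penalty = statistics.pstdev(unoccupied_per_column)
    return occupied - alpha * penalty
```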
|
|
|
## Installation & Usage |
|
1. Clone this repo or download the source code. |
|
2. Install Python (3.8+ recommended). |
|
3. Install dependencies: |
|
|
|
```bash
pip install torch numpy
```
|
- Depending on your environment, you may also need other libraries such as `pandas` (the `statistics` module is part of the Python standard library).
|
|
|
4. Training:
|
|
|
- Adjust the hyperparameters (learning rate, exploration rate, etc.) in `train.py` if desired.
|
- Run: |
|
|
|
```bash
python train.py
```
|
|
|
- This script will generate a `checkpointX.pth` file in `checkpoints/` upon completion (or periodically during training).
|
|
|
5. Evaluation:
|
|
|
- Ensure you have a valid checkpoint saved, for example `checkpoint14.pth`.
|
- Run: |
|
```bash
python evaluate.py
```
|
- The script will load the checkpoint, instantiate the `TetrisAI`, and then interactively show how the AI plays Tetris. You can step through the game move by move in the console.
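If you'd rather load a checkpoint manually, here is a minimal sketch. It assumes the `.pth` file stores a plain `state_dict`; `evaluate.py` may save or load it differently:

```python
import torch
from model import TetrisNet  # assumes TetrisNet is importable from model.py

model = TetrisNet()
model.load_state_dict(torch.load("checkpoints/checkpoint14.pth"))
model.eval()  # switch to inference mode before querying Q-values
```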