Commit 03b0d13 · 0 parent(s)
init: tetris neural network model with q learning

Files changed:
- README.md +87 -0
- __pycache__/intelligence.cpython-312.pyc +0 -0
- __pycache__/model.cpython-312.pyc +0 -0
- __pycache__/representation.cpython-312.pyc +0 -0
- __pycache__/tetris.cpython-312.pyc +0 -0
- __pycache__/tools.cpython-312.pyc +0 -0
- checkpoints/checkpoint0.pth +0 -0
- checkpoints/checkpoint1.pth +0 -0
- checkpoints/checkpoint10.pth +0 -0
- checkpoints/checkpoint11.pth +0 -0
- checkpoints/checkpoint12.pth +0 -0
- checkpoints/checkpoint13.pth +0 -0
- checkpoints/checkpoint14.pth +0 -0
- checkpoints/checkpoint2.pth +0 -0
- checkpoints/checkpoint3.pth +0 -0
- checkpoints/checkpoint4.pth +0 -0
- checkpoints/checkpoint5.pth +0 -0
- checkpoints/checkpoint6.pth +0 -0
- checkpoints/checkpoint7.pth +0 -0
- checkpoints/checkpoint8.pth +0 -0
- checkpoints/checkpoint9.pth +0 -0
- checkpoints/log.txt +0 -0
- evaluate.py +41 -0
- images/game.png +0 -0
- model.py +102 -0
- play.py +20 -0
- representation.py +9 -0
- tetris.py +121 -0
- train.py +141 -0
README.md
ADDED
@@ -0,0 +1,87 @@
# Tetris-Neural-Network-Q-Learning

## Overview
**PyTorch** implementation of a simplified Tetris-playing AI using **Q-Learning**.

The Tetris board is just 4×4, and the agent decides which of the 4 columns to drop the next piece into. The agent's neural network receives a **16-dimensional** board representation (the flattened 4×4 grid) and outputs **4** Q-values, one for each possible move. Through repeated self-play and the Q-Learning update, the agent learns to fill the board without making illegal moves, eventually achieving a perfect score.
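For a concrete sense of what the network sees, `representation.BoardState` (added in this commit as `representation.py`) flattens the board row by row, top to bottom, into 16 ints. A small illustrative example using the modules from this commit:

```python
import tetris
import representation

gs = tetris.GameState()
gs.drop(0)  # one block into column 0 (lands on the bottom row)
gs.drop(0)  # a second block stacks on top of it
gs.drop(2)  # one block into column 2

# Rows are flattened top to bottom; occupied cells become 1.
print(representation.BoardState(gs))
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
```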
<img src="images/game.png" />

## Project Structure

```plaintext
├── model.py           # Contains the TetrisAI class and TetrisNet model (PyTorch)
├── train.py           # Main training script
├── evaluate.py        # Script to load a model checkpoint and interactively run the game
├── play.py            # Play the game manually from the console (no AI)
├── tetris.py          # Defines the GameState and game logic
├── representation.py  # Defines how the game state is turned into a 1D list of ints
└── checkpoints        # Directory where model checkpoints (.pth) are saved/loaded
```
## Model Architecture
- **Input Layer (16 units):** Flattened 4×4 board state, where each cell is `0` (empty) or `1` (occupied).
- **Hidden Layers:** Dense layers (64 → 64 → 32) with ReLU activations.
- **Output Layer (4 units):** Linear activation, representing the estimated Q-value for each of the four columns (0–3).
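This is a condensed view of the `TetrisNet` module defined in `model.py`:

```python
import torch
import torch.nn as nn

class TetrisNet(nn.Module):
    """16 board cells in, 4 Q-values (one per column) out."""

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(16, 64)
        self.layer2 = nn.Linear(64, 64)
        self.layer3 = nn.Linear(64, 32)
        self.output = nn.Linear(32, 4)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        x = self.relu(self.layer3(x))
        return self.output(x)  # linear output: one Q-value per column
```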
## Training
1. **Game Environment:** A 4×4 Tetris-like grid where each move places a block in one of the four columns.
2. **Reward Function:**
   - **Immediate Reward:** the increase in the number of occupied squares, minus
   - **Penalty:** a scaled standard deviation of the "column depths", to encourage balanced play.
3. **Q-Learning Loop:**
   - For each move, the model is passed the current game state and returns predicted Q-values.
   - An action (move) is chosen by either:
     - **Exploitation:** the highest predicted Q-value (greedy choice), or
     - **Exploration:** a random move, to discover new states.
   - The agent observes the new state and reward, and stores this experience (state, action, reward, next_state) to update the model; the update itself is sketched below.
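The per-experience update in `train.py` is the standard Q-Learning target: keep the model's current predictions for the old state, but replace the entry for the action that was taken with the observed reward plus the discounted best Q-value of the next state. Condensed into a helper (the function name `q_update` is illustrative; `train.py` does this inline):

```python
import model

def q_update(tmodel: model.TetrisAI, exp: model.Experience, gamma: float = 0.5) -> None:
    """One Q-Learning update step, as performed per experience in train.py."""
    if exp.done:
        new_target = exp.reward                      # terminal state: no future reward
    else:
        best_next = max(tmodel.predict(exp.next_state))
        new_target = exp.reward + gamma * best_next  # blend immediate and future reward

    qvalues = tmodel.predict(exp.state)  # current predictions for the old state
    qvalues[exp.action] = new_target     # overwrite only the chosen action's value
    tmodel.train(exp.state, qvalues)     # one supervised step toward the new targets
```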
## Reward Function

The reward function for each action is based on two parts:

1. **Board Occupancy**
   - The reward starts with the number of occupied squares on the board (i.e., how many cells are filled).

2. **Penalty for Unbalanced Columns**
   - Next, the standard deviation of each column's unoccupied cell count (its "depth") is calculated.
   - A higher standard deviation means one column is much taller or shorter than the others, which is undesirable in Tetris.
   - By *subtracting* this standard deviation from the occupancy-based reward, the agent is penalized for building unevenly and encouraged to keep the board as level as possible.

In other words:

\[
\text{Reward} = \text{OccupiedSquares} - \alpha \times \text{StdDev}(\text{ColumnDepths})
\]

where \( \alpha \) is a weighting factor that sets the penalty's intensity (this implementation uses \( \alpha = 2 \); see `score_plus` in `tetris.py`). The reward actually returned for a move is the *change* in this quantity from before the drop to after it. Keeping the board balanced in this way helps the agent learn a more efficient Tetris strategy.
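Concretely, this is the quantity `score_plus` computes in `tetris.py`. A standalone restatement of it (the real version is a method on `GameState` with no arguments):

```python
import statistics

def score_plus(occupied_squares: int, column_depths: list[int]) -> float:
    """Board 'quality': occupancy minus a penalty for uneven columns (alpha = 2)."""
    return occupied_squares - 2 * statistics.pstdev(column_depths)

# A level board beats an uneven one with the same number of blocks:
print(score_plus(4, [3, 3, 3, 3]))  # 4.0        -> one block in every column
print(score_plus(4, [0, 4, 4, 4]))  # about 0.54 -> all four blocks in one column
```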
## Installation & Usage
1. Clone this repo or download the source code.
2. Install Python (3.9 or newer; the code uses built-in generic type hints such as `list[int]`).
3. Install dependencies:

   ```bash
   pip install torch numpy
   ```

   (Everything else the scripts use, such as `statistics`, `random`, and `pathlib`, is part of the Python standard library.)

4. Training:
   - Adjust the hyperparameters (learning rate, exploration rate, etc.) in `train.py` if desired; the defaults are listed below.
   - Run:

     ```bash
     python train.py
     ```

   - The script periodically saves `checkpointX.pth` files to `checkpoints/` (every 5,000 training experiences by default).

5. Evaluation:
   - Ensure you have a valid checkpoint saved, for example `checkpoint14.pth`.
   - Run:

     ```bash
     python evaluate.py
     ```

   - The script loads the checkpoint, instantiates the `TetrisAI`, and interactively shows how the AI plays Tetris. You can step through the game move by move in the console.
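For reference, the tunable settings defined near the top of `train.py` (the defaults in this commit):

```python
# training settings (from train.py)
gamma: float = 0.5    # discount factor applied to the best Q-value of the next state
epsilon: float = 0.2  # probability of playing a random (exploratory) move

# training config
batch_size: int = 100                     # experiences collected per training round
save_model_every_experiences: int = 5000  # how often a checkpoint is written
```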
__pycache__/intelligence.cpython-312.pyc    ADDED    Binary file (5.86 kB)
__pycache__/model.cpython-312.pyc           ADDED    Binary file (5.86 kB)
__pycache__/representation.cpython-312.pyc  ADDED    Binary file (654 Bytes)
__pycache__/tetris.cpython-312.pyc          ADDED    Binary file (4.96 kB)
__pycache__/tools.cpython-312.pyc           ADDED    Binary file (541 Bytes)
checkpoints/checkpoint0.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint1.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint10.pth                ADDED    Binary file (99.1 kB)
checkpoints/checkpoint11.pth                ADDED    Binary file (99.1 kB)
checkpoints/checkpoint12.pth                ADDED    Binary file (99.1 kB)
checkpoints/checkpoint13.pth                ADDED    Binary file (99.1 kB)
checkpoints/checkpoint14.pth                ADDED    Binary file (99.1 kB)
checkpoints/checkpoint2.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint3.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint4.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint5.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint6.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint7.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint8.pth                 ADDED    Binary file (99 kB)
checkpoints/checkpoint9.pth                 ADDED    Binary file (99 kB)
checkpoints/log.txt                         ADDED    (diff too large to render)
evaluate.py
ADDED
@@ -0,0 +1,41 @@
import model
import tetris
import representation
from pathlib import Path

script_dir = Path(__file__).parent.resolve()
checkpoints_dir = script_dir / "checkpoints"
checkpoints_dir.mkdir(parents=True, exist_ok=True)
checkpoint_filename = "checkpoint14.pth"
save_path = checkpoints_dir / checkpoint_filename

# If you need it as a standard Python string:
save_path_str = str(save_path)

tai = model.TetrisAI(save_path)

while True:
    gs = tetris.GameState()
    while True:

        print("Board:")
        print(str(gs))

        # get move
        predictions: list[float] = tai.predict(representation.BoardState(gs))
        shift: int = predictions.index(max(predictions))
        print("Move: " + str(shift))
        input("Enter to execute the move it selected: ")

        # make move
        gs.drop(shift)

        # if game over
        if gs.over():
            print(str(gs))
            print("Game is over!")
            print("Final score: " + str(gs.score()))
            print("Going to next game...")
            gs = tetris.GameState()
            gs.randomize()
images/game.png                             ADDED    Binary file
model.py
ADDED
@@ -0,0 +1,102 @@
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

class Experience:
    def __init__(self):
        self.state: list[int] = None
        self.action: int = None
        self.reward: float = None
        self.next_state: list[int] = None
        self.done: bool = False

class TetrisNet(nn.Module):
    """
    The PyTorch neural network (equivalent of the original Keras model):
    Input: 16-dimensional board
    Hidden layers: 64 -> 64 -> 32, ReLU activation
    Output: 4-dimensional, linear
    """
    def __init__(self):
        super(TetrisNet, self).__init__()
        self.layer1 = nn.Linear(16, 64)
        self.layer2 = nn.Linear(64, 64)
        self.layer3 = nn.Linear(64, 32)
        self.output = nn.Linear(32, 4)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        x = self.relu(self.layer3(x))
        x = self.output(x)
        return x

class TetrisAI:
    """
    PyTorch implementation of the TetrisAI class.
    - Loads a saved model if save_file_path is provided.
    - Otherwise, constructs a fresh model.
    - Has methods to save, predict, and train the model.
    """

    def __init__(self, save_file_path: str = None):
        # Create the model
        self.model = TetrisNet()

        # Define the optimizer and loss function
        self.optimizer = optim.Adam(self.model.parameters(), lr=0.003)
        self.criterion = nn.MSELoss()

        # Load from file if path is provided
        if save_file_path is not None:
            checkpoint = torch.load(save_file_path, map_location=torch.device('cpu'))
            self.model.load_state_dict(checkpoint['model_state_dict'])
            self.optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
            self.model.eval()

    def save(self, path: str) -> None:
        """
        Saves the PyTorch model and optimizer state to a file.
        """
        torch.save({
            'model_state_dict': self.model.state_dict(),
            'optimizer_state_dict': self.optimizer.state_dict()
        }, path)

    def predict(self, board: list[int]) -> list[float]:
        """
        Performs a forward pass to predict the Q-values for each possible move.
        Returns these Q-values as a list of floats.
        """
        # Convert board to a float tensor with shape [1, 16]
        x = torch.tensor([board], dtype=torch.float32)

        # Put model in evaluation mode and disable gradient tracking
        self.model.eval()
        with torch.no_grad():
            prediction = self.model(x)

        # Convert the single batch output (shape [1, 4]) to a Python list of floats
        return prediction[0].tolist()

    def train(self, board: list[int], qvalues: list[float]) -> None:
        """
        Trains the model on one step using the given board as input and qvalues as the desired output.
        """
        # Put model in training mode
        self.model.train()

        # Convert data to tensors
        x = torch.tensor([board], dtype=torch.float32)
        y = torch.tensor([qvalues], dtype=torch.float32)

        # Zero the parameter gradients
        self.optimizer.zero_grad()

        # Forward + Backward + Optimize
        predictions = self.model(x)
        loss = self.criterion(predictions, y)
        loss.backward()
        self.optimizer.step()
play.py
ADDED
@@ -0,0 +1,20 @@
import tetris

while True:
    gs = tetris.GameState()

    while True:
        print("Board:")
        print(str(gs))

        i: str = input("How many shifts? > ")
        shifts: int = int(i)
        reward = gs.drop(shifts)
        print("REWARD: " + str(reward))

        # if game over
        if gs.over():
            print("Game over!")
            print("Score: " + str(gs.score()))
            input("Enter to go to next game.")
            gs = tetris.GameState()
representation.py
ADDED
@@ -0,0 +1,9 @@
import tetris

def BoardState(gs: tetris.GameState) -> list[int]:
    """Represents the board as a state of flattened integers."""
    ToReturn: list[int] = []
    for row in gs.board:
        for col in row:
            ToReturn.append(int(col))
    return ToReturn
tetris.py
ADDED
@@ -0,0 +1,121 @@
import statistics
import random

class InvalidDropException(Exception):
    def __init__(self, message):
        self.message = message
        super().__init__(self.message)

class GameState:
    def __init__(self):
        self.board: list[list[bool]] = [
            [False, False, False, False],
            [False, False, False, False],
            [False, False, False, False],
            [False, False, False, False],
        ]  # 4 rows of 4 columns, 4x4

    def __str__(self):
        ToReturn: str = ""
        ToReturn = " ┌────┐" + "\n"
        onRow: int = 0
        for row in self.board:
            # add the row number in
            ToReturn = ToReturn + str(onRow) + "│"

            # print every square
            for column in row:
                if column:
                    ToReturn = ToReturn + "█"
                else:
                    ToReturn = ToReturn + " "
            ToReturn = ToReturn + "│\n"
            onRow = onRow + 1
        ToReturn = ToReturn + " └────┘"
        ToReturn = ToReturn + "\n" + " 0123"
        return ToReturn

    def column_depths(self) -> list[int]:
        """Calculates how 'deep' the available space on each column goes, from the top down."""

        # record the depth of every column
        column_depths: list[int] = [0, 0, 0, 0]
        column_collisions: list[bool] = [False, False, False, False]

        # In this sense, "depth" is the number of squares that are still clear
        for ri in range(0, len(self.board)):  # for every row
            for ci in range(0, len(self.board[0])):  # for every column (use first row to know how many columns there are)
                if column_collisions[ci] == False and self.board[ri][ci] == False:
                    # this column has not hit a floor yet and this cell is empty, so increment the depth
                    column_depths[ci] = column_depths[ci] + 1
                else:  # we hit a floor!
                    column_collisions[ci] = True

        return column_depths

    def over(self) -> bool:
        """Determines whether the game is over (all columns in the top row are occupied)."""
        # True compares equal to 1, so this checks that every top-row cell is occupied
        return self.board[0] == [1, 1, 1, 1]

    def drop(self, column: int) -> float:
        """Drops a single block into the column, returns the reward of doing so."""
        if column < 0 or column > 3:
            raise InvalidDropException(
                "Invalid move! Column to drop in must be 0, 1, 2, or 3."
            )

        reward_before: float = self.score_plus()
        cds: list[int] = self.column_depths()
        if cds[column] == 0:
            raise InvalidDropException(
                "Unable to drop on column " + str(column) + ", it is already full!"
            )
        self.board[cds[column] - 1][column] = True
        reward_after: float = self.score_plus()
        return reward_after - reward_before

    def score(self) -> int:
        ToReturn: int = 0
        for row in self.board:
            for col in row:
                if col:
                    ToReturn = ToReturn + 1
        return ToReturn

    def score_plus(self) -> float:
        # start at score
        ToReturn: float = float(self.score())

        # penalize for standard deviation
        stdev: float = statistics.pstdev(self.column_depths())
        ToReturn = ToReturn - (stdev * 2)

        return ToReturn

    def randomize(self) -> float:
        """Sets the board to a random setup."""

        # first, clear all values
        for ri in range(0, len(self.board)):
            for ci in range(0, len(self.board[0])):
                self.board[ri][ci] = False

        # drop a random number of blocks in each column
        for ci in range(0, 4):
            random_drops: int = random.randint(0, 4)
            for _ in range(0, random_drops):
                self.drop(ci)

        # if all 16 are filled up, delete one
        if self.score() == 16:
            self.board[0][random.randint(0, 3)] = False  # turn off a random square in the top row
train.py
ADDED
@@ -0,0 +1,141 @@
import model
import tetris
import sys
import representation
import random
from pathlib import Path

script_dir = Path(__file__).parent.resolve()
checkpoints_dir = script_dir / "checkpoints"
checkpoints_dir.mkdir(exist_ok=True)
log_file_path = checkpoints_dir / "log.txt"

# if you want to start from a checkpoint, fill this in with the path to the .pth file. If wanting to start from a new NN, leave blank!
model_save_path = r""

# training settings
gamma: float = 0.5
epsilon: float = 0.2

# training config
batch_size: int = 100  # the number of experiences that will be collected and trained on
save_model_every_experiences: int = 5000
################

# construct/load model
tmodel: model.TetrisAI = None
if model_save_path != None and model_save_path != "":
    print("Loading model checkpoint at '" + model_save_path + "'...")
    tmodel = model.TetrisAI(model_save_path)
    print("Model loaded!")
else:
    print("Constructing new model...")
    tmodel = model.TetrisAI()

# variables to track
experiences_trained: int = 0  # the number of experiences the model has been trained on
model_last_saved_at_experiences_trained: int = 0  # the number of trained experiences at the last save
on_checkpoint: int = 0

def log(path: str, content: str) -> None:
    if path != None and path != "":
        f = open(path, "a")
        f.write(content + "\n")
        f.close()

# training loop
while True:

    # collect X number of experiences
    gs: tetris.GameState = tetris.GameState()
    experiences: list[model.Experience] = []
    for ei in range(0, batch_size):

        # print!
        sys.stdout.write("\r" + "Collecting experience " + str(ei+1) + " / " + str(batch_size) + "... ")
        sys.stdout.flush()

        # get board representation
        state_board: list[int] = representation.BoardState(gs)

        # select move to play
        move: int
        if random.random() < epsilon:  # if by chance we should select a random move
            move = random.randint(0, 3)  # choose move at random
        else:
            predictions: list[float] = tmodel.predict(state_board)  # predict Q-Values
            move = predictions.index(max(predictions))  # select the move (index) with the highest Q-Value

        # play the move
        IllegalMovePlayed: bool = False
        MoveReward: float
        try:
            MoveReward = gs.drop(move)
        except tetris.InvalidDropException as ex:  # the model (or random exploration) tried to play an illegal move
            IllegalMovePlayed = True
            MoveReward = -3.0  # small penalty for illegal moves
        except Exception as ex:
            print("Unhandled exception in move execution: " + str(ex))
            input("Press enter key to continue, if you want to.")

        # store this experience
        exp: model.Experience = model.Experience()
        exp.state = state_board
        exp.action = move
        exp.reward = MoveReward
        exp.next_state = representation.BoardState(gs)  # the state we find ourselves in now.
        exp.done = gs.over() or IllegalMovePlayed  # it is over if the game is completed OR an illegal move was played
        experiences.append(exp)

        # if game is over or an illegal move was played, reset the game!
        if gs.over() or IllegalMovePlayed:
            gs = tetris.GameState()

    print()

    # print avg rewards
    rewards: float = 0.0
    for exp in experiences:
        rewards = rewards + exp.reward
    status: str = "Average reward over those " + str(len(experiences)) + " experiences on model w/ " + str(experiences_trained) + " trained experiences: " + str(round(rewards / len(experiences), 2))
    log(log_file_path, status)
    print(status)

    # train!
    for ei in range(0, len(experiences)):
        exp = experiences[ei]

        # print training number
        sys.stdout.write("\r" + "Training on experience " + str(ei+1) + " / " + str(len(experiences)) + "... ")
        sys.stdout.flush()

        # determine new target based on whether the game ended (if it did not, factor in future rewards)
        new_target: float
        if exp.done:
            new_target = exp.reward
        else:
            max_q_of_next_state: float = max(tmodel.predict(exp.next_state))
            new_target = exp.reward + (gamma * max_q_of_next_state)  # blend immediate vs. future rewards

        # ask the model to predict again for this experience's state
        qvalues: list[float] = tmodel.predict(exp.state)

        # plug in the new target where it belongs
        qvalues[exp.action] = new_target

        # now train on the updated qvalues (with 1 changed)
        tmodel.train(exp.state, qvalues)
        experiences_trained = experiences_trained + 1

    print("Training complete!")

    # save model!
    if (experiences_trained - model_last_saved_at_experiences_trained) >= save_model_every_experiences:
        print("Time to save model!")
        path = checkpoints_dir / f"checkpoint{on_checkpoint}.pth"

        tmodel.save(path)
        print("Checkpoint # " + str(on_checkpoint) + " saved to " + str(path) + "!")
        on_checkpoint = on_checkpoint + 1
        model_last_saved_at_experiences_trained = experiences_trained
        print("Model saved to " + str(path) + "!")