Run GPT-OSS models locally with Docker!

Tech Stack Used
GPT-OSS Chatbot with Docker
Run OpenAI's open-weight GPT-OSS models (117B or 21B parameters) locally using Docker inside your own codebase. This project avoids third-party GUIs like Open WebUI or LM Studio to help you learn how to use GPT-based models directly in your applications.
Everything is containerized with Docker for a clean, reproducible setup. This is a fun side project designed to help others explore running powerful language models locally.
Contributions are welcome!
Feel free to fork the repo and submit a pull request if you'd like to collaborate or enhance the project further.
Project Repository: https://github.com/Imran-ml/gpt-oss-app-open-source
Features
- Simple Chat Interface: Clean frontend to interact with the chatbot.
- Powered by GPT-OSS: Uses OpenAI's open GPT model (117B or 21B).
- Dockerized: Fully containerized using Docker Compose.
- FastAPI Backend: Handles the API and logic.
- Ollama Integration: Use Ollama to serve GPT-OSS models locally.
GPT-OSS Resources
OpenAI released the GPT-OSS models under Apache 2.0. Here's the learning path:
- Intro to GPT-OSS: https://openai.com/index/introducing-gpt-oss
- Model Card & Specs: https://openai.com/index/gpt-oss-model-card/
- Dev Overview: https://cookbook.openai.com/topic/gpt-oss
- vLLM Setup Guide: https://cookbook.openai.com/articles/gpt-oss/run-vllm
- Harmony Format (I/O schema): https://github.com/openai/harmony
- PyTorch Reference Code: https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation
- Community Site: https://gpt-oss.com/
- Ollama: https://ollama.com/library/gpt-oss
- HuggingFace: https://huggingface.co/openai/gpt-oss-20b
Technical Details & Versions
- Language Model: GPT-OSS (117B or 21B parameters)
- Python: 3.11-slim
- Backend: FastAPI + Uvicorn (see the sketch below)
- Frontend Web Server: nginx:alpine
- Serving: ollama/ollama image
- HTTP Client: HTTPX
- Containerization: Docker & Docker Compose
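The repository's backend/main.py is not reproduced in this README, but to make the stack concrete, here is a minimal sketch of what such a backend could look like. The /chat route name and the request shape are assumptions made for illustration, and the ollama hostname assumes the Docker Compose network; /api/generate is Ollama's standard REST endpoint.

```python
# Illustrative sketch of a FastAPI backend, not the repository's actual code.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

OLLAMA_URL = "http://ollama:11434/api/generate"  # Ollama's standard REST endpoint
MODEL_NAME = "gpt-oss"  # swap for "gpt-oss:120b" to use the larger model

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")  # route name is an assumption for this sketch
async def chat(req: ChatRequest) -> dict:
    # Forward the user's message to Ollama and return the model's reply.
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            OLLAMA_URL,
            json={"model": MODEL_NAME, "prompt": req.message, "stream": False},
        )
        resp.raise_for_status()
    return {"reply": resp.json()["response"]}
```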
Directory Structure
gpt-oss-chatbot-dockerized/
├── backend/
│   ├── main.py
│   └── requirements.txt
├── frontend/
│   └── index.html
├── docker-compose.yml
└── Dockerfile
Prerequisites
- Docker
- Docker Compose (usually included with Docker Desktop); see the quick check below
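You can confirm both are installed before building:

```bash
docker --version
docker compose version   # or "docker-compose --version" on older standalone installs
```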
How to Run with Docker
Clone the repository:
git clone https://github.com/Imran-ml/gpt-oss-app-open-source
cd gpt-oss-app-open-source
Navigate to the project directory: make sure you are inside gpt-oss-app-open-source.

Build and run the application using Docker Compose:

docker-compose up --build

This will:
- Build the backend Docker image based on Dockerfile.
- Pull the ollama/ollama and nginx:alpine images.
- Start all defined services (Ollama, backend, frontend).
- Make the ollama service automatically pull the gpt-oss model upon starting. This might take some time during the first run as the model is downloaded. In docker-compose.yml, in the line sh -c "ollama serve & sleep 5 && ollama pull gpt-oss && tail -f /dev/null", you can replace gpt-oss with gpt-oss:120b if you want the 120B model. You can verify the pull as shown below.

You will see logs from all the containers in your terminal.
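Once the containers are up, you can confirm that the model finished downloading. This assumes the Ollama service is named ollama in docker-compose.yml:

```bash
docker compose exec ollama ollama list
```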
Access the Chatbot: once the services are up and running:
- Open your web browser and go to http://localhost:8080 to interact with the chatbot.
- The backend API is accessible at http://localhost:8000.
- The Ollama API is at http://localhost:11434 (see the example below).
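To test the model without going through the UI, you can call the Ollama REST API directly. /api/generate is Ollama's standard generation endpoint; the backend's own route names are not documented in this README, so this example targets Ollama:

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "gpt-oss", "prompt": "Say hello in one sentence.", "stream": false}'
```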
Usage
- Open http://localhost:8080 in your browser; the chat interface should load.
- Type your message in the input field and press Enter or click the send button to chat with the gpt-oss model.
About Author
Name: Muhammad Imran Zaman
Company: DOCUFY GmbH
Role: Lead Machine Learning Engineer
Professional Links:
- HuggingFace: Profile
- Kaggle: Profile
- LinkedIn: Profile
- Google Scholar: Profile
- Medium: Profile
- Project Repository: https://github.com/Imran-ml/gpt-oss-app-open-source