Soutrik committed on
Commit 0ca9ca4 · 1 Parent(s): 9e67266

added: poetry dvc-s3 env

Files changed (6)
  1. .dvc/config +7 -0
  2. .gitignore +1 -0
  3. basic_setup.md +382 -0
  4. poetry.lock +0 -0
  5. pyproject.toml +43 -11
  6. todo.md +11 -0
.dvc/config CHANGED
@@ -0,0 +1,7 @@
+ [core]
+     autostage = true
+     remote = aws_remote
+ ['remote "myremote"']
+     url = /tmp/dvcstore
+ ['remote "aws_remote"']
+     url = s3://deep-bucket-s3

.gitignore CHANGED
@@ -19,3 +19,4 @@ src/.vscode-test/
  app/core/__pycache__/
  src/__pycache__/test_infra.cpython-310.pyc
  app/core/__pycache__/config.cpython-310.pyc
+ data/

basic_setup.md ADDED

## __POETRY SETUP__

```bash
# Install poetry inside a dedicated conda environment
conda create -n poetry_env python=3.10 -y
conda activate poetry_env
pip install poetry
poetry env info
poetry new pytorch_project
cd pytorch_project/
# Fill in the pyproject.toml file, leaving out pytorch and torchvision for now
poetry install

# Add the pytorch and torchvision dependencies from the CPU wheel index
poetry source add --priority explicit pytorch_cpu https://download.pytorch.org/whl/cpu
poetry add --source pytorch_cpu torch torchvision
poetry lock
poetry show

# Add the remaining dependencies
poetry add matplotlib
poetry add hydra-core
poetry add omegaconf
poetry add hydra_colorlog
poetry add --dev black
poetry lock
poetry show

# Add a dependency pinned to a specific version
poetry add <package_name>@<version>
```

| Type | Purpose | Installation Command |
| --- | --- | --- |
| Normal dependency | Required for the app to run in production. | `poetry add <package>` |
| Development dependency | Needed only during development (e.g., testing, linting). | `poetry add --dev <package>` |
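
As a quick sanity check (not part of the original steps), you can confirm that torch actually resolved from the CPU wheel index:

```bash
# run inside the project environment created above; a CPU wheel reports a "+cpu" local version
poetry run python -c "import torch; print(torch.__version__)"
poetry run python -c "import torchvision; print(torchvision.__version__)"
```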

## __MULTISTAGEDOCKER SETUP__

#### Step-by-Step Guide to Creating Dockerfile and docker-compose.yml for a New Code Repo

If you're new to the project and need to set up Docker and Docker Compose to run the training and inference steps, follow the steps below.

---

### 1. Setting Up the Dockerfile

A Dockerfile is a set of instructions that Docker uses to create an image. In this case, we'll use a __multi-stage build__ to keep the final image lightweight while managing dependencies with `Poetry`.

#### Step-by-Step Process for Creating the Dockerfile

1. __Choose a Base Image__:
   - Choose a Python image that matches the project's required version (e.g., Python 3.10.14).
   - Use the lightweight __`slim`__ variant to minimize image size.

   ```Dockerfile
   FROM python:3.10.14-slim as builder
   ```

2. __Install Dependencies in the Build Stage__:
   - We'll use __Poetry__ for dependency management. Install it using `pip`.
   - Next, copy the `pyproject.toml` and `poetry.lock` files to the `/app` directory so the dependencies can be installed.

   ```Dockerfile
   RUN pip3 install poetry==1.7.1
   WORKDIR /app
   COPY pytorch_project/pyproject.toml pytorch_project/poetry.lock /app/
   ```

3. __Configure Poetry__:
   - Configure Poetry to install the dependencies in a virtual environment inside the project directory (not globally). This keeps everything contained and avoids conflicts with the system environment.

   ```Dockerfile
   ENV POETRY_NO_INTERACTION=1 \
       POETRY_VIRTUALENVS_IN_PROJECT=1 \
       POETRY_VIRTUALENVS_CREATE=true \
       POETRY_CACHE_DIR=/tmp/poetry_cache
   ```

4. __Install Dependencies__:
   - Use `poetry install --only main --no-root` to install only the main dependencies and not the project package itself, since the application code isn't needed at this stage.

   ```Dockerfile
   RUN --mount=type=cache,target=/tmp/poetry_cache poetry install --only main --no-root
   ```

5. __Build the Runtime Stage__:
   - Now, set up the final runtime image. This stage includes only the application code and the virtual environment created in the first stage.
   - The final image uses the same Python base image but stays small by avoiding a re-installation of the dependencies.

   ```Dockerfile
   FROM python:3.10.14-slim as runner
   WORKDIR /app
   COPY src /app/src
   COPY --from=builder /app/.venv /app/.venv
   ```

6. __Set Up the Path to Use the Virtual Environment__:
   - Update the `PATH` environment variable so the Python binaries from the virtual environment are used.

   ```Dockerfile
   ENV PATH="/app/.venv/bin:$PATH"
   ```

7. __Set a Default Command__:
   - Finally, set the command that is executed by default when the container runs. You can override this later in the Docker Compose file.

   ```Dockerfile
   CMD ["python", "-m", "src.train"]
   ```

### Final Dockerfile

```Dockerfile
# Stage 1: Build environment with Poetry and dependencies
FROM python:3.10.14-slim as builder
RUN pip3 install poetry==1.7.1
WORKDIR /app
COPY pytorch_project/pyproject.toml pytorch_project/poetry.lock /app/
ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=true \
    POETRY_CACHE_DIR=/tmp/poetry_cache
RUN --mount=type=cache,target=/tmp/poetry_cache poetry install --only main --no-root

# Stage 2: Runtime environment
FROM python:3.10.14-slim as runner
WORKDIR /app
COPY src /app/src
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "src.train"]
```
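
To try the multi-stage image locally (the image tag below is an assumption, not from the original doc):

```bash
# BuildKit is required for the --mount=type=cache used in the builder stage
DOCKER_BUILDKIT=1 docker build -t pytorch-project:latest .
docker run --rm pytorch-project:latest                      # runs the default CMD: python -m src.train
docker run --rm pytorch-project:latest python -m src.infer  # override the default command
```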

---

### 2. Setting Up the docker-compose.yml File

The `docker-compose.yml` file is used to define and run multiple Docker containers as services. In this case, we need two services: one for __training__ and one for __inference__.

### Step-by-Step Process for Creating docker-compose.yml

1. __Define the Version__:
   - Docker Compose uses a versioning system. Use version `3.8`, which is widely supported and offers features such as networking and volume support.

   ```yaml
   version: '3.8'
   ```

2. __Set Up the `train` Service__:
   - The `train` service is responsible for running the training script. It builds the Docker image, runs the training command, and uses volumes to store the data, checkpoints, and artifacts.

   ```yaml
   services:
     train:
       build:
         context: .
       command: python -m src.train
       volumes:
         - data:/app/data
         - checkpoints:/app/checkpoints
         - artifacts:/app/artifacts
       shm_size: '2g' # Increase shared memory to prevent DataLoader issues
       networks:
         - default
       env_file:
         - .env # Load environment variables
   ```

3. __Set Up the `inference` Service__:
   - The `inference` service runs after training has completed. It waits for a file (e.g., `train_done.flag`) to be created by the training process and then runs the inference script.

   ```yaml
   inference:
     build:
       context: .
     command: /bin/bash -c "while [ ! -f /app/checkpoints/train_done.flag ]; do sleep 10; done; python -m src.infer"
     volumes:
       - checkpoints:/app/checkpoints
       - artifacts:/app/artifacts
     shm_size: '2g'
     networks:
       - default
     depends_on:
       - train
     env_file:
       - .env
   ```

4. __Define Shared Volumes__:
   - Volumes allow services to share data. Here, we define three shared volumes:
     - `data`: Stores the input data.
     - `checkpoints`: Stores the model checkpoints and the flag indicating training is complete.
     - `artifacts`: Stores the final model outputs or artifacts.

   ```yaml
   volumes:
     data:
     checkpoints:
     artifacts:
   ```

5. __Set Up Networking__:
   - Use the default network to allow the services to communicate.

   ```yaml
   networks:
     default:
   ```

### Final docker-compose.yml

```yaml
version: '3.8'

services:
  train:
    build:
      context: .
    command: python -m src.train
    volumes:
      - data:/app/data
      - checkpoints:/app/checkpoints
      - artifacts:/app/artifacts
    shm_size: '2g'
    networks:
      - default
    env_file:
      - .env

  inference:
    build:
      context: .
    command: /bin/bash -c "while [ ! -f /app/checkpoints/train_done.flag ]; do sleep 10; done; python -m src.infer"
    volumes:
      - checkpoints:/app/checkpoints
      - artifacts:/app/artifacts
    shm_size: '2g'
    networks:
      - default
    depends_on:
      - train
    env_file:
      - .env

volumes:
  data:
  checkpoints:
  artifacts:

networks:
  default:
```
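
A typical way to exercise this file, assuming a `.env` file exists at the project root:

```bash
docker compose build                 # build the image for both services
docker compose up -d                 # start train; inference blocks until train_done.flag appears
docker compose logs -f inference     # follow the inference service once it starts
docker compose down -v               # stop everything and remove the named volumes
```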

---

### Summary

1. __Dockerfile__:
   - A multi-stage Dockerfile is used to create a lightweight image in which the dependencies are installed with Poetry and the application code runs from a virtual environment.
   - It ensures that all dependencies are isolated in a virtual environment and that the final container includes only what is necessary at runtime.

2. __docker-compose.yml__:
   - The `docker-compose.yml` file defines two services:
     - __train__: Runs the training script and stores checkpoints.
     - __inference__: Waits for training to finish and runs inference with the saved model.
   - Shared volumes ensure that the services can access data, checkpoints, and artifacts.
   - `shm_size` is increased to prevent issues with the PyTorch DataLoader when using multiple workers.

This setup allows for easy management of multiple services using Docker Compose, ensuring reproducibility and simplicity.

## __References__

- <https://stackoverflow.com/questions/53835198/integrating-python-poetry-with-docker>
- <https://github.com/fralik/poetry-with-private-repos/blob/master/Dockerfile>
- <https://medium.com/@albertazzir/blazing-fast-python-docker-builds-with-poetry-a78a66f5aed0>
- <https://www.martinrichards.me/post/python_poetry_docker/>
- <https://gist.github.com/soof-golan/6ebb97a792ccd87816c0bda1e6e8b8c2>

8. ## __DVC SETUP__

First, initialize DVC, enable auto-staging, track the data directory, and push to a local remote:

```bash
dvc init
dvc version
dvc init -f
dvc config core.autostage true
dvc add data
dvc remote add -d myremote /tmp/dvcstore
dvc push
```
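
For reference, `dvc add data` writes a small pointer file, `data.dvc`, which is what git actually tracks; a sketch of its shape (the hash and sizes below are placeholders):

```bash
cat data.dvc
# outs:
# - md5: d8e8fca2dc0f896fd7cb4cb0031ba249.dir
#   size: 1048576
#   nfiles: 42
#   path: data
# with core.autostage = true, DVC stages data.dvc and .gitignore for you
git commit -m "track data with DVC"
```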

Add some more files to the data directory and run the following commands:

```bash
dvc add data
dvc push
dvc pull
```

Next, go back one commit and run the following commands:

```bash
git checkout HEAD~1
dvc checkout
# you will get one file less
```

Next, go back to the latest commit and run the following commands:

```bash
git checkout -
dvc checkout
dvc pull
dvc commit
```

Next, run the following commands to add Google Drive as a remote:

```bash
dvc remote add --default gdrive gdrive://1w2e3r4t5y6u7i8o9p0
dvc remote modify gdrive gdrive_acknowledge_abuse true
dvc remote modify gdrive gdrive_client_id <>
dvc remote modify gdrive gdrive_client_secret <>
# does not work from a VM with port forwarding to the local machine
```

Next, run the following commands to add Azure Blob Storage as a remote:

```bash
dvc remote remove azblob
dvc remote add --default azblob azure://mycontainer/myfolder
dvc remote modify --local azblob connection_string "<>"
dvc remote modify azblob allow_anonymous_login true
dvc push -r azblob
# this works and requires no explicit login
```

Next, we will add S3 as a remote; a minimal sketch, mirroring the `.dvc/config` added in this commit:

```bash
poetry add dvc-s3
dvc remote add --default aws_remote s3://deep-bucket-s3
dvc push -r aws_remote
```
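
DVC's S3 remote authenticates through the standard AWS credential chain (dvc-s3 is built on boto3), so exported keys or `~/.aws/credentials` both work; the values below are placeholders:

```bash
export AWS_ACCESS_KEY_ID=<your_key_id>
export AWS_SECRET_ACCESS_KEY=<your_secret_key>
export AWS_DEFAULT_REGION=<region>
dvc push -r aws_remote
```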

9. ## __HYDRA SETUP__

```bash
# Install hydra
pip install hydra-core hydra_colorlog omegaconf
# Fill up the configs folder with the files as per the project
# Run the following commands to run the hydra experiment
# for train
python -m src.hydra_test experiment=catdog_experiment ++task_name=train ++train=True ++test=False
# for eval
python -m src.hydra_test experiment=catdog_experiment ++task_name=eval ++train=False ++test=True
# for both
# (+ adds a new key-value pair to the existing config; ++ adds or overrides an existing one)
python -m src.hydra_test experiment=catdog_experiment task_name=train train=True test=True
```
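
A hypothetical illustration of the `+` vs `++` override prefixes (the key names here are assumptions, not from the project configs):

```bash
python -m src.hydra_test +new_key=42          # + appends a key that must not already exist
python -m src.hydra_test ++task_name=debug    # ++ sets the key whether or not it exists
```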

10. ## __LOCAL SETUP__

```bash
python -m src.train experiment=catdog_experiment ++task_name=train ++train=True ++test=False
python -m src.train experiment=catdog_experiment ++task_name=eval ++train=False ++test=True
python -m src.infer experiment=catdog_experiment
```

11. ## __DVC PIPELINE SETUP__

Run the pipeline defined in `dvc.yaml` end to end:

```bash
dvc repro
```
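
`dvc repro` expects a `dvc.yaml` describing the stages; a minimal hypothetical sketch (the stage name, deps, and outs are assumptions for this repo):

```bash
cat > dvc.yaml <<'EOF'
stages:
  train:
    cmd: python -m src.train experiment=catdog_experiment
    deps:
      - data
      - src
    outs:
      - checkpoints
EOF
dvc repro
```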

12. ## __DVC Experiments__

- To run the DVC experiments, keep separate experiment_<>.yaml files under the experiment folder inside configs.
- Make sure to override the default values in each experiment_<>.yaml file for every parameter that you want to change.

13. ## __HYDRA Experiments__

- Make sure to declare the config file in yaml format in the hparam folder inside configs (a hypothetical sketch follows the commands below).
- Set hparam to null in the train and eval config files.
- Run the following commands to run the hydra experiment:

```bash
python -m src.train --multirun experiment=catdog_experiment_convnext ++task_name=train ++train=True ++test=False hparam=catdog_classifier_covnext
python -m src.create_artifacts
```
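
A hypothetical sketch of what `configs/hparam/catdog_classifier_covnext.yaml` could look like with the Optuna sweeper from `hydra-optuna-sweeper` (the swept parameter names are assumptions):

```bash
cat > configs/hparam/catdog_classifier_covnext.yaml <<'EOF'
# @package _global_
defaults:
  - override /hydra/sweeper: optuna

hydra:
  sweeper:
    n_trials: 10
    params:
      model.lr: interval(1e-4, 1e-2)
      data.batch_size: choice(32, 64)
EOF
```
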
poetry.lock CHANGED
The diff for this file is too large to render. See raw diff
 
pyproject.toml CHANGED
@@ -1,32 +1,60 @@
  [tool.poetry]
- name = "fastapi-aws-template"
+ name = "pytorch_fastapi_project"
  version = "0.1.0"
- description = "Basic template testing pytorch fastapi application on aws infra using GHA"
- authors = ["Soutrik Chowdhury"]
+ description = "Consolidated PyTorch and FastAPI project for AWS deployment and GHA testing"
+ authors = ["soutrik71 <[email protected]>"]
  license = "Apache-2.0"
  readme = "README.md"

  [tool.poetry.dependencies]
  python = "3.10.15"
+ black = "24.8.0"
+ coverage = ">=7.6.1"
+ hydra-colorlog = "1.2.0"
+ hydra-core = "1.3.2"
+ lightning = {version = "2.4.0", extras = ["extra"]}
+ loguru = "0.7.2"
+ pytest = "^8.3.3"
+ rich = "13.8.1"
+ rootutils = "1.0.7"
+ tensorboard = "2.17.1"
+ timm = "1.0.9"
+ pandas = "^2.2.3"
+ numpy = "^1.26.0"
+ ruff = "^0.1.0"
+ torch = {version = "^2.4.1+cpu", source = "pytorch_cpu"}
+ torchvision = {version = "^0.19.1+cpu", source = "pytorch_cpu"}
+ seaborn = "^0.13.2"
+ pydantic = "^2.9.2"
+ kaggle = "^1.6.17"
+ pytest-cov = "^5.0.0"
+ pytest-mock = "^3.14.0"
+ flake8 = "^7.1.1"
+ dvc-gdrive = "^3.0.1"
+ dvc-azure = "^3.1.0"
+ transformers = "^4.45.2"
  fastapi = "^0.115.4"
- loguru = "^0.7.2"
+ pydantic-settings = "^2.6.1"
+ uvicorn = "^0.32.0"
+ tenacity = "^9.0.0"
+ gunicorn = "^23.0.0"
+ aim = "^3.25.0"
+ mlflow = "^2.17.1"
+ hydra-optuna-sweeper = "^1.2.0"
+ dvc = "^3.56.0"
+ platformdirs = "3.10"
  fastapi-utils = "^0.7.0"
  httpx = "^0.27.2"
  typing-inspect = "^0.9.0"
  requests = "^2.32.3"
- pydantic = "^2.9.2"
- pydantic-settings = "^2.6.1"
- uvicorn = "^0.32.0"
  fastapi-restful = {extras = ["all"], version = "^0.6.0"}
  aioredis = "^2.0.1"
- tenacity = "^9.0.0"
  psycopg2-binary = "^2.9.10"
  asyncpg = "^0.30.0"
  confluent-kafka = "^2.6.0"
  aiokafka = "^0.12.0"
  azure-servicebus = "^7.12.3"
  aiohttp = "^3.10.10"
- gunicorn = "^23.0.0"
  aiofiles = "^24.1.0"
  aiologger = "^0.7.0"
  pyyaml = "^6.0.2"
@@ -36,17 +64,21 @@ alembic = "^1.13.3"
  fastapi-limiter = "^0.1.6"
  redis = "5.0.8"
  redisearch = "2.0.0"
- pandas = "^2.2.3"
  python-multipart = "^0.0.17"
  python-dotenv = "^1.0.1"
  celery = "^5.4.0"
  fastapi-cache2 = "^0.2.2"
  aiocache = "^0.12.3"
+ dvc-s3 = "^3.2.0"

  [tool.poetry.dev-dependencies]
- pytest = "^7.2.0"
  pytest-asyncio = "^0.20.3"

+ [[tool.poetry.source]]
+ name = "pytorch_cpu"
+ url = "https://download.pytorch.org/whl/cpu"
+ priority = "explicit"
+
  [build-system]
  requires = ["poetry-core"]
  build-backend = "poetry.core.masonry.api"

todo.md ADDED

**__Pytorch Lightning Classifier with Hydra, DVC, Linting, and Pytest, Deployed on AWS EC2__**:

- Data loading script with a PyTorch Lightning datamodule
- PyTorch Lightning classifier
- Hydra configuration for the datamodule, trainer, and callbacks
- DVC for versioning data and models using an S3 bucket
- Linting with flake8 and black
- Pytest for testing
- Hyperparameter optimization with Optuna, executed using the base package
- Dockerized application, tested via docker-compose
- Deployed on an AWS EC2 instance using GitHub Actions
- GitHub Actions for CI/CD and docker image push to Elastic Container Registry (ECR)