Soutrik committed on
Commit 0ca9ca4 · 1 Parent(s): 9e67266

added: poetry dvc-s3 env

Files changed (6)
  1. .dvc/config +7 -0
  2. .gitignore +1 -0
  3. basic_setup.md +382 -0
  4. poetry.lock +0 -0
  5. pyproject.toml +43 -11
  6. todo.md +11 -0
.dvc/config CHANGED
@@ -0,0 +1,7 @@
+ [core]
+     autostage = true
+     remote = aws_remote
+ ['remote "myremote"']
+     url = /tmp/dvcstore
+ ['remote "aws_remote"']
+     url = s3://deep-bucket-s3

.gitignore CHANGED
@@ -19,3 +19,4 @@ src/.vscode-test/
  app/core/__pycache__/
  src/__pycache__/test_infra.cpython-310.pyc
  app/core/__pycache__/config.cpython-310.pyc
+ data/

basic_setup.md ADDED

## __POETRY SETUP__

```bash
# Install poetry inside a dedicated conda environment
conda create -n poetry_env python=3.10 -y
conda activate poetry_env
pip install poetry
poetry env info
poetry new pytorch_project
cd pytorch_project/
# Fill in the pyproject.toml file, leaving out pytorch and torchvision for now
poetry install

# Add the pytorch and torchvision dependencies from the CPU wheel index
poetry source add --priority explicit pytorch_cpu https://download.pytorch.org/whl/cpu
poetry add --source pytorch_cpu torch torchvision
poetry lock
poetry show

# Add the remaining dependencies
poetry add matplotlib
poetry add hydra-core
poetry add omegaconf
poetry add hydra_colorlog
poetry add --dev black
poetry lock
poetry show

# Add a dependency pinned to a specific version
poetry add <package_name>@<version>
```

| Type | Purpose | Installation Command |
| --- | --- | --- |
| Normal dependency | Required for the app to run in production. | `poetry add <package>` |
| Development dependency | Needed only during development (e.g., testing, linting). | `poetry add --dev <package>` |
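
As a quick sanity check (not part of the original steps), you can confirm that torch actually resolved from the CPU wheel index:

```bash
# run inside the project environment created above; a CPU wheel reports a "+cpu" local version
poetry run python -c "import torch; print(torch.__version__)"
poetry run python -c "import torchvision; print(torchvision.__version__)"
```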

## __MULTISTAGEDOCKER SETUP__

#### Step-by-Step Guide to Creating Dockerfile and docker-compose.yml for a New Code Repo

If you're new to the project and need to set up Docker and Docker Compose to run the training and inference steps, follow the steps below.

---

### 1. Setting Up the Dockerfile

A Dockerfile is a set of instructions that Docker uses to create an image. In this case, we'll use a __multi-stage build__ to keep the final image lightweight while managing dependencies with `Poetry`.

#### Step-by-Step Process for Creating the Dockerfile

1. __Choose a Base Image__:
   - Choose a Python image that matches the project's required version (e.g., Python 3.10.14).
   - Use the lightweight __`slim`__ variant to minimize image size.

   ```Dockerfile
   FROM python:3.10.14-slim as builder
   ```

2. __Install Dependencies in the Build Stage__:
   - We'll use __Poetry__ for dependency management. Install it using `pip`.
   - Next, copy the `pyproject.toml` and `poetry.lock` files to the `/app` directory so the dependencies can be installed.

   ```Dockerfile
   RUN pip3 install poetry==1.7.1
   WORKDIR /app
   COPY pytorch_project/pyproject.toml pytorch_project/poetry.lock /app/
   ```

3. __Configure Poetry__:
   - Configure Poetry to install the dependencies in a virtual environment inside the project directory (not globally). This keeps everything contained and avoids conflicts with the system environment.

   ```Dockerfile
   ENV POETRY_NO_INTERACTION=1 \
       POETRY_VIRTUALENVS_IN_PROJECT=1 \
       POETRY_VIRTUALENVS_CREATE=true \
       POETRY_CACHE_DIR=/tmp/poetry_cache
   ```

4. __Install Dependencies__:
   - Use `poetry install --only main --no-root` to install only the main dependencies and not the project package itself, since the application code isn't needed at this stage.

   ```Dockerfile
   RUN --mount=type=cache,target=/tmp/poetry_cache poetry install --only main --no-root
   ```

5. __Build the Runtime Stage__:
   - Now, set up the final runtime image. This stage includes only the application code and the virtual environment created in the first stage.
   - The final image uses the same Python base image but stays small by avoiding a re-installation of the dependencies.

   ```Dockerfile
   FROM python:3.10.14-slim as runner
   WORKDIR /app
   COPY src /app/src
   COPY --from=builder /app/.venv /app/.venv
   ```

6. __Set Up the Path to Use the Virtual Environment__:
   - Update the `PATH` environment variable so the Python binaries from the virtual environment are used.

   ```Dockerfile
   ENV PATH="/app/.venv/bin:$PATH"
   ```

7. __Set a Default Command__:
   - Finally, set the command that is executed by default when the container runs. You can override this later in the Docker Compose file.

   ```Dockerfile
   CMD ["python", "-m", "src.train"]
   ```

### Final Dockerfile

```Dockerfile
# Stage 1: Build environment with Poetry and dependencies
FROM python:3.10.14-slim as builder
RUN pip3 install poetry==1.7.1
WORKDIR /app
COPY pytorch_project/pyproject.toml pytorch_project/poetry.lock /app/
ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=true \
    POETRY_CACHE_DIR=/tmp/poetry_cache
RUN --mount=type=cache,target=/tmp/poetry_cache poetry install --only main --no-root

# Stage 2: Runtime environment
FROM python:3.10.14-slim as runner
WORKDIR /app
COPY src /app/src
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "src.train"]
```
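
To try the multi-stage image locally (the image tag below is an assumption, not from the original doc):

```bash
# BuildKit is required for the --mount=type=cache used in the builder stage
DOCKER_BUILDKIT=1 docker build -t pytorch-project:latest .
docker run --rm pytorch-project:latest                      # runs the default CMD: python -m src.train
docker run --rm pytorch-project:latest python -m src.infer  # override the default command
```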

---

### 2. Setting Up the docker-compose.yml File

The `docker-compose.yml` file is used to define and run multiple Docker containers as services. In this case, we need two services: one for __training__ and one for __inference__.

### Step-by-Step Process for Creating docker-compose.yml

1. __Define the Version__:
   - Docker Compose uses a versioning system. Use version `3.8`, which is widely supported and offers features such as networking and volume support.

   ```yaml
   version: '3.8'
   ```

2. __Set Up the `train` Service__:
   - The `train` service is responsible for running the training script. It builds the Docker image, runs the training command, and uses volumes to store the data, checkpoints, and artifacts.

   ```yaml
   services:
     train:
       build:
         context: .
       command: python -m src.train
       volumes:
         - data:/app/data
         - checkpoints:/app/checkpoints
         - artifacts:/app/artifacts
       shm_size: '2g' # Increase shared memory to prevent DataLoader issues
       networks:
         - default
       env_file:
         - .env # Load environment variables
   ```

3. __Set Up the `inference` Service__:
   - The `inference` service runs after training has completed. It waits for a file (e.g., `train_done.flag`) to be created by the training process and then runs the inference script.

   ```yaml
   inference:
     build:
       context: .
     command: /bin/bash -c "while [ ! -f /app/checkpoints/train_done.flag ]; do sleep 10; done; python -m src.infer"
     volumes:
       - checkpoints:/app/checkpoints
       - artifacts:/app/artifacts
     shm_size: '2g'
     networks:
       - default
     depends_on:
       - train
     env_file:
       - .env
   ```

4. __Define Shared Volumes__:
   - Volumes allow services to share data. Here, we define three shared volumes:
     - `data`: Stores the input data.
     - `checkpoints`: Stores the model checkpoints and the flag indicating training is complete.
     - `artifacts`: Stores the final model outputs or artifacts.

   ```yaml
   volumes:
     data:
     checkpoints:
     artifacts:
   ```

5. __Set Up Networking__:
   - Use the default network to allow the services to communicate.

   ```yaml
   networks:
     default:
   ```

### Final docker-compose.yml

```yaml
version: '3.8'

services:
  train:
    build:
      context: .
    command: python -m src.train
    volumes:
      - data:/app/data
      - checkpoints:/app/checkpoints
      - artifacts:/app/artifacts
    shm_size: '2g'
    networks:
      - default
    env_file:
      - .env

  inference:
    build:
      context: .
    command: /bin/bash -c "while [ ! -f /app/checkpoints/train_done.flag ]; do sleep 10; done; python -m src.infer"
    volumes:
      - checkpoints:/app/checkpoints
      - artifacts:/app/artifacts
    shm_size: '2g'
    networks:
      - default
    depends_on:
      - train
    env_file:
      - .env

volumes:
  data:
  checkpoints:
  artifacts:

networks:
  default:
```
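
A typical way to exercise this file, assuming a `.env` file exists at the project root:

```bash
docker compose build                 # build the image for both services
docker compose up -d                 # start train; inference blocks until train_done.flag appears
docker compose logs -f inference     # follow the inference service once it starts
docker compose down -v               # stop everything and remove the named volumes
```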

---

### Summary

1. __Dockerfile__:
   - A multi-stage Dockerfile is used to create a lightweight image in which the dependencies are installed with Poetry and the application code runs from a virtual environment.
   - It ensures that all dependencies are isolated in a virtual environment and that the final container includes only what is necessary at runtime.

2. __docker-compose.yml__:
   - The `docker-compose.yml` file defines two services:
     - __train__: Runs the training script and stores checkpoints.
     - __inference__: Waits for training to finish and runs inference with the saved model.
   - Shared volumes ensure that the services can access data, checkpoints, and artifacts.
   - `shm_size` is increased to prevent issues with the PyTorch DataLoader when using multiple workers.

This setup allows for easy management of multiple services using Docker Compose, ensuring reproducibility and simplicity.

## __References__

- <https://stackoverflow.com/questions/53835198/integrating-python-poetry-with-docker>
- <https://github.com/fralik/poetry-with-private-repos/blob/master/Dockerfile>
- <https://medium.com/@albertazzir/blazing-fast-python-docker-builds-with-poetry-a78a66f5aed0>
- <https://www.martinrichards.me/post/python_poetry_docker/>
- <https://gist.github.com/soof-golan/6ebb97a792ccd87816c0bda1e6e8b8c2>

8. ## __DVC SETUP__

First, initialize DVC, enable auto-staging, track the data directory, and push to a local remote:

```bash
dvc init
dvc version
dvc init -f
dvc config core.autostage true
dvc add data
dvc remote add -d myremote /tmp/dvcstore
dvc push
```
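
For reference, `dvc add data` writes a small pointer file, `data.dvc`, which is what git actually tracks; a sketch of its shape (the hash and sizes below are placeholders):

```bash
cat data.dvc
# outs:
# - md5: d8e8fca2dc0f896fd7cb4cb0031ba249.dir
#   size: 1048576
#   nfiles: 42
#   path: data
# with core.autostage = true, DVC stages data.dvc and .gitignore for you
git commit -m "track data with DVC"
```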

Add some more files to the data directory and run the following commands:

```bash
dvc add data
dvc push
dvc pull
```

Next, go back one commit and run the following commands:

```bash
git checkout HEAD~1
dvc checkout
# you will get one file less
```

Next, go back to the latest commit and run the following commands:

```bash
git checkout -
dvc checkout
dvc pull
dvc commit
```

Next, run the following commands to add Google Drive as a remote:

```bash
dvc remote add --default gdrive gdrive://1w2e3r4t5y6u7i8o9p0
dvc remote modify gdrive gdrive_acknowledge_abuse true
dvc remote modify gdrive gdrive_client_id <>
dvc remote modify gdrive gdrive_client_secret <>
# does not work from a VM with port forwarding to the local machine
```

Next, run the following commands to add Azure Blob Storage as a remote:

```bash
dvc remote remove azblob
dvc remote add --default azblob azure://mycontainer/myfolder
dvc remote modify --local azblob connection_string "<>"
dvc remote modify azblob allow_anonymous_login true
dvc push -r azblob
# this works and requires no explicit login
```

Next, we will add S3 as a remote; a minimal sketch, mirroring the `.dvc/config` added in this commit:

```bash
poetry add dvc-s3
dvc remote add --default aws_remote s3://deep-bucket-s3
dvc push -r aws_remote
```
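
DVC's S3 remote authenticates through the standard AWS credential chain (dvc-s3 is built on boto3), so exported keys or `~/.aws/credentials` both work; the values below are placeholders:

```bash
export AWS_ACCESS_KEY_ID=<your_key_id>
export AWS_SECRET_ACCESS_KEY=<your_secret_key>
export AWS_DEFAULT_REGION=<region>
dvc push -r aws_remote
```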

9. ## __HYDRA SETUP__

```bash
# Install hydra
pip install hydra-core hydra_colorlog omegaconf
# Fill up the configs folder with the files as per the project
# Run the following commands to run the hydra experiment
# for train
python -m src.hydra_test experiment=catdog_experiment ++task_name=train ++train=True ++test=False
# for eval
python -m src.hydra_test experiment=catdog_experiment ++task_name=eval ++train=False ++test=True
# for both
# (+ adds a new key-value pair to the existing config; ++ adds or overrides an existing one)
python -m src.hydra_test experiment=catdog_experiment task_name=train train=True test=True
```
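
A hypothetical illustration of the `+` vs `++` override prefixes (the key names here are assumptions, not from the project configs):

```bash
python -m src.hydra_test +new_key=42          # + appends a key that must not already exist
python -m src.hydra_test ++task_name=debug    # ++ sets the key whether or not it exists
```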

10. ## __LOCAL SETUP__

```bash
python -m src.train experiment=catdog_experiment ++task_name=train ++train=True ++test=False
python -m src.train experiment=catdog_experiment ++task_name=eval ++train=False ++test=True
python -m src.infer experiment=catdog_experiment
```

11. ## __DVC PIPELINE SETUP__

Run the pipeline defined in `dvc.yaml` end to end:

```bash
dvc repro
```
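
`dvc repro` expects a `dvc.yaml` describing the stages; a minimal hypothetical sketch (the stage name, deps, and outs are assumptions for this repo):

```bash
cat > dvc.yaml <<'EOF'
stages:
  train:
    cmd: python -m src.train experiment=catdog_experiment
    deps:
      - data
      - src
    outs:
      - checkpoints
EOF
dvc repro
```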

12. ## __DVC Experiments__

- To run the DVC experiments, keep separate experiment_<>.yaml files under the experiment folder inside configs.
- Make sure to override the default values in each experiment_<>.yaml file for every parameter that you want to change.

13. ## __HYDRA Experiments__

- Make sure to declare the config file in yaml format in the hparam folder inside configs (a hypothetical sketch follows the commands below).
- Set hparam to null in the train and eval config files.
- Run the following commands to run the hydra experiment:

```bash
python -m src.train --multirun experiment=catdog_experiment_convnext ++task_name=train ++train=True ++test=False hparam=catdog_classifier_covnext
python -m src.create_artifacts
```
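
A hypothetical sketch of what `configs/hparam/catdog_classifier_covnext.yaml` could look like with the Optuna sweeper from `hydra-optuna-sweeper` (the swept parameter names are assumptions):

```bash
cat > configs/hparam/catdog_classifier_covnext.yaml <<'EOF'
# @package _global_
defaults:
  - override /hydra/sweeper: optuna

hydra:
  sweeper:
    n_trials: 10
    params:
      model.lr: interval(1e-4, 1e-2)
      data.batch_size: choice(32, 64)
EOF
```
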
poetry.lock CHANGED
The diff for this file is too large to render. See raw diff
 
pyproject.toml CHANGED
@@ -1,32 +1,60 @@
  [tool.poetry]
- name = "fastapi-aws-template"
+ name = "pytorch_fastapi_project"
  version = "0.1.0"
- description = "Basic template testing pytorch fastapi application on aws infra using GHA"
- authors = ["Soutrik Chowdhury"]
+ description = "Consolidated PyTorch and FastAPI project for AWS deployment and GHA testing"
+ authors = ["soutrik71 <[email protected]>"]
  license = "Apache-2.0"
  readme = "README.md"

  [tool.poetry.dependencies]
  python = "3.10.15"
+ black = "24.8.0"
+ coverage = ">=7.6.1"
+ hydra-colorlog = "1.2.0"
+ hydra-core = "1.3.2"
+ lightning = {version = "2.4.0", extras = ["extra"]}
+ loguru = "0.7.2"
+ pytest = "^8.3.3"
+ rich = "13.8.1"
+ rootutils = "1.0.7"
+ tensorboard = "2.17.1"
+ timm = "1.0.9"
+ pandas = "^2.2.3"
+ numpy = "^1.26.0"
+ ruff = "^0.1.0"
+ torch = {version = "^2.4.1+cpu", source = "pytorch_cpu"}
+ torchvision = {version = "^0.19.1+cpu", source = "pytorch_cpu"}
+ seaborn = "^0.13.2"
+ pydantic = "^2.9.2"
+ kaggle = "^1.6.17"
+ pytest-cov = "^5.0.0"
+ pytest-mock = "^3.14.0"
+ flake8 = "^7.1.1"
+ dvc-gdrive = "^3.0.1"
+ dvc-azure = "^3.1.0"
+ transformers = "^4.45.2"
  fastapi = "^0.115.4"
- loguru = "^0.7.2"
+ pydantic-settings = "^2.6.1"
+ uvicorn = "^0.32.0"
+ tenacity = "^9.0.0"
+ gunicorn = "^23.0.0"
+ aim = "^3.25.0"
+ mlflow = "^2.17.1"
+ hydra-optuna-sweeper = "^1.2.0"
+ dvc = "^3.56.0"
+ platformdirs = "3.10"
  fastapi-utils = "^0.7.0"
  httpx = "^0.27.2"
  typing-inspect = "^0.9.0"
  requests = "^2.32.3"
- pydantic = "^2.9.2"
- pydantic-settings = "^2.6.1"
- uvicorn = "^0.32.0"
  fastapi-restful = {extras = ["all"], version = "^0.6.0"}
  aioredis = "^2.0.1"
- tenacity = "^9.0.0"
  psycopg2-binary = "^2.9.10"
  asyncpg = "^0.30.0"
  confluent-kafka = "^2.6.0"
  aiokafka = "^0.12.0"
  azure-servicebus = "^7.12.3"
  aiohttp = "^3.10.10"
- gunicorn = "^23.0.0"
  aiofiles = "^24.1.0"
  aiologger = "^0.7.0"
  pyyaml = "^6.0.2"
@@ -36,17 +64,21 @@ alembic = "^1.13.3"
  fastapi-limiter = "^0.1.6"
  redis = "5.0.8"
  redisearch = "2.0.0"
- pandas = "^2.2.3"
  python-multipart = "^0.0.17"
  python-dotenv = "^1.0.1"
  celery = "^5.4.0"
  fastapi-cache2 = "^0.2.2"
  aiocache = "^0.12.3"
+ dvc-s3 = "^3.2.0"

  [tool.poetry.dev-dependencies]
- pytest = "^7.2.0"
  pytest-asyncio = "^0.20.3"

+ [[tool.poetry.source]]
+ name = "pytorch_cpu"
+ url = "https://download.pytorch.org/whl/cpu"
+ priority = "explicit"
+
  [build-system]
  requires = ["poetry-core"]
  build-backend = "poetry.core.masonry.api"

todo.md ADDED

**__Pytorch Lightning Classifier with Hydra, DVC, Linting, and Pytest, Deployed on AWS EC2__**:

- Data loading script with a PyTorch Lightning datamodule
- PyTorch Lightning classifier
- Hydra configuration for the datamodule, trainer, and callbacks
- DVC for versioning data and models using an S3 bucket
- Linting with flake8 and black
- Pytest for testing
- Hyperparameter optimization with Optuna, executed using the base package
- Dockerized application, tested via docker-compose
- Deployed on an AWS EC2 instance using GitHub Actions
- GitHub Actions for CI/CD and docker image push to Elastic Container Registry (ECR)